This is the mail archive of the mailing list for the Cygwin project.

Re: Wget ignores robot.txt entry


No, I don't think cURL does recursive retrieval. I don't think it does Web page dependency retrieval (fetching the images, stylesheets and other files a page needs), either. Both of these are a big deal for me. How could a tool of wget's versatility be replaced by something inferior? Whatever happened to technological meritocracy? (Please, no laughing.)

I was actually hoping to find some time to work on an extension to wget of my own. I wanted to add an option that would make wget consult one directory hierarchy to learn which files exist and what their modification times are, compare that against the files and modification times on the server, and download any new or newer files to a different location. That way I could easily maintain mirror copies on CD-ROM: I'd tell wget to use the CD's contents as the reference for file existence and modification times, and to download to a location on my hard drive (of course). Then I could incrementally update the ROM with whatever was downloaded.
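A rough sketch of the comparison step such an option would need, written in Python rather than wget's C purely for illustration. It assumes the server's file list has already been gathered into a dict mapping relative paths to modification times (the hypothetical `remote_listing`); `files_to_fetch` and `destination_path` are made-up names, not anything that exists in wget. wget's existing -N (--timestamping) option makes a similar new-or-newer decision, but against the local copy in the same tree it downloads into; the idea here is to separate the reference tree from the download tree.

```python
import os
from typing import Dict, List


def files_to_fetch(reference_root: str, remote_listing: Dict[str, float]) -> List[str]:
    """Return the remote paths that are missing from, or newer than, the reference tree.

    reference_root: read-only reference copy (e.g. the mounted CD).
    remote_listing: hypothetical {relative path: server mtime in seconds} map.
    """
    wanted = []
    for rel_path, remote_mtime in sorted(remote_listing.items()):
        local_path = os.path.join(reference_root, rel_path)
        if not os.path.isfile(local_path):
            wanted.append(rel_path)  # not on the CD yet: a new file
        elif remote_mtime > os.path.getmtime(local_path):
            wanted.append(rel_path)  # server copy is newer than the CD copy
    return wanted


def destination_path(download_root: str, rel_path: str) -> str:
    """Where a selected file would be written: under the hard-drive tree, never the CD."""
    return os.path.join(download_root, rel_path)
```

Everything selected this way would land under the hard-drive download root, leaving the CD reference untouched; the contents of that download root are what would go onto the next incremental update of the disc.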

Of course I can still do that and I may yet. Does that sound like a desirable feature to anyone? I don't know how many people share my mania for keeping local archives of content from the Internet.

What happens to an open source project when it devolves to this state? Who, for example, could hand out write access to the wget CVS repository? Surely this isn't an unrecoverable state of affairs, is it?

Randall Schulz

At 19:04 2003-02-13, Max Bowsher wrote:
Randall R Schulz wrote:
> Wget is orphaned? That's bad news, since it seems to have it all over
> cURL. (Sure. Go ahead and prove me wrong. I might as well get it over
> with... for now.)

cURL doesn't do recursive web-suck (does it?)

Yes, wget is orphaned. There's no one on the wget mailing list who has CVS
write access. Which is a great shame, as a surprising number of
patches are being sent in.
