This is the mail archive of the
mailing list for the Cygwin project.
Re: Wget ignores robots.txt entry
No, I don't think cURL does recursive retrieval. I don't think it does
Web page dependency retrieval, either. Both of these are a big deal for
me. How could a tool of wget's versatility be replaced by something
inferior? Whatever happened to technological meritocracy? (Please, no
I was actually hoping to get some time to work on an extension to wget
of my own. I wanted to add an option that would make wget check one
local hierarchy for file existence and modification times, compare
those against the files and mod times on the server, and download
new or newer files to a different location. That way I can easily
maintain mirror copies on a CD-ROM. I'd tell wget to use the CD's
contents as the file and mod-time reference and to download to a
location on my hard drive (of course). Then I could incrementally
update the ROM with whatever was downloaded.
Of course I can still do that and I may yet. Does that sound like a
desirable feature to anyone? I don't know how many people share my
mania for keeping local archives of content from the Internet.
What happens to an open source project when it devolves to this state?
Who, for example, could hand out writable access to the wget CVS
repository? Surely this isn't an unrecoverable state of affairs, is it?
At 19:04 2003-02-13, Max Bowsher wrote:
Randall R Schulz wrote:
> Wget is orphaned? That's bad news, since it seems to have it all over
> cURL. (Sure. Go ahead and prove me wrong. I might as well get it over
> with... for now.)
cURL doesn't do recursive web-suck (does it?)
Yes, wget is orphaned. There's no one on the wget mailing list who has CVS
write access. Which is a great shame, as a surprising number of
patches are being sent in.
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html