This is the mail archive of the
mailing list for the Cygwin project.
Wget ignores robot.txt entry
- From: L Anderson <lowella at serv dot net>
- To: cygwinList <cygwin at cygwin dot com>
- Date: Thu, 13 Feb 2003 18:14:54 -0800
- Subject: Wget ignores robot.txt entry
- Organization: TBD
Using the latest of things Cygwin, I downloaded some stuff with wget
from <http://cygwin.com> to peruse off-line and noticed a problem I
The <http://cygwin.com/robots.txt> file has the entries:
so wget should not download /cgi-bin/.
However, "wget -o cygwincom.log -m -p --no-parent -X /cygwin,/ml
http://cygwin.com/" downloads /cgi-bin anyway.
NB. "wget -o cygwincom.log -m -p --no-parent -X /cgi-bin,/cygwin,/ml
http://cygwin.com/" doesn't download /cgi-bin.
I ran a validity check on <http://cygwin.com/robots.txt> and found no
Is this a bug in wget or am I doing something wrong?
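
For anyone wanting to cross-check the robots.txt reading independently of
wget, here is a minimal sketch using Python's standard urllib.robotparser.
The robots.txt content below is an assumption for illustration (the actual
entries are not reproduced in this message), as are the helper name
is_allowed, the "Wget" agent string, and the /index.html test path. It only
shows how one parser interprets a Disallow rule; it says nothing about what
wget itself does.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: hypothetical robots.txt content, not the real file from the site.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Wget") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # /index.html is a hypothetical path used only for contrast.
    for url in ("http://cygwin.com/cgi-bin/", "http://cygwin.com/index.html"):
        verdict = "allowed" if is_allowed(ASSUMED_ROBOTS_TXT, url) else "disallowed"
        print(url, verdict)
```

Pasting the real contents of robots.txt into ASSUMED_ROBOTS_TXT would show
whether a rule-following client is expected to skip /cgi-bin/.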