Message-ID: <200603231941.k2NJf3g3015750@turing-police.cc.vt.edu>
Date: Thu Mar 23 19:41:13 2006
From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks@...edu)
Subject: Re: Re: Re: Links to Google's cache
	of 626 FrSIRT exploits

On Thu, 23 Mar 2006 15:15:00 GMT, Dave Korn said:

> difference?  robots.txt is enforced (or ignored) by the client.  If a server 
> returns a 403 or doesn't, depending on what UserAgent you specified, then 
> how could making the client ignore robots.txt somehow magically make the 
> server not return a 403 when you try to fetch a page?

It *can*, however, make the client *issue* a request it otherwise would not have issued.

If the client had snarfed a robots.txt, and it said "don't snarf anything
under /dontlook/here/", and a link pointed there, it wouldn't follow the link.

If you tell it 'robots=off', then it *would* follow the link.
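
To make that concrete, here's a quick sketch (Python's stock robotparser,
with made-up paths) of the check a polite client does before it ever
talks to the server:

    from urllib.robotparser import RobotFileParser

    # Suppose the server's robots.txt contained:
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /dontlook/here/",
    ])

    url = "http://example.com/dontlook/here/secret.html"
    if rp.can_fetch("*", url):
        print("polite client would send the GET for", url)
    else:
        print("robots.txt says no, so the request is never sent")
    # 'robots=off' simply skips the can_fetch() check and sends the
    # GET anyway; the server answers it like any other request
    # (200, 403, whatever), exactly as Dave says.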

Remember - robots.txt *isn't* for the pages that would 403.  It's for the pages
that *would* work, that you don't want automatically snarfed by things like
wget and googlebots....
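
(With GNU wget specifically, that knob is "-e robots=off" on the command
line, or "robots = off" in .wgetrc; nothing changes server-side, it only
changes which URLs the client is willing to ask for.)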