Robots.txt
Overview

The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that "honest" web crawlers are expected to follow: they should not access the parts of a website listed in the site's robots.txt file. Nothing enforces the convention, so the file often serves as a map of content the site owner would rather keep hidden.

Discovery Methodology

Browse to the robots.txt file in Mutillidae and read its contents.

Exploitation

Follow the paths in the robots.txt file to see whether any sensitive directories or files are exposed. Try to list the contents of those directories, since servers are sometimes misconfigured to show directory listings.

Print robots.txt pages for a list of servers
while read -r HOST; do echo -n "$HOST:"; curl -v --silent --connect-timeout 2 --max-time 3 "$HOST/robots.txt" 2>&1 | grep -A 100 Disallow; echo; done < hosts.txt
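Once robots.txt entries have been collected, the Exploitation step above can be automated: pull each Disallow path out of the file and build candidate URLs to probe for directory listings. The sketch below is an assumption-laden illustration, not the tool's own method; the base URL and the sample robots.txt written to disk are hypothetical stand-ins so it runs offline.

```shell
#!/bin/sh
# Hypothetical base URL; point this at the real target.
BASE="http://mutillidae"

# Stand-in robots.txt so the sketch runs offline; in practice this
# would be the file fetched by the curl loop above.
cat > robots-sample.txt <<'EOF'
User-agent: *
Disallow: /passwords/
Disallow: /config.inc
EOF

# Pull the path after each "Disallow:", strip any trailing CR,
# and print a candidate URL for each entry.
awk '/^Disallow:/ {print $2}' robots-sample.txt | tr -d '\r' |
while read -r ENTRY; do
    echo "$BASE$ENTRY"
    # Against a live host you could then probe each URL, e.g.:
    # curl -s "$BASE$ENTRY" | grep -qi "index of" && echo "  -> listing exposed"
done
```

Against a live host, uncomment the probing line; a response containing "Index of" is the usual sign of an exposed Apache directory listing.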
Nmap: Sweeping for robots.txt files
nmap -p 80,443 -v -Pn --script=http-robots.txt --open -iL hosts.txt
Example

With a default installation, robots.txt is located at http://[server]/mutillidae/robots.txt. On Samurai WTF the path is http://mutillidae/robots.txt.

Videos

How to grab robots.txt file with CURL
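The video above covers grabbing robots.txt with curl; a minimal sketch of the same idea follows. It fetches from a local file:// URL purely so the example runs without a live Mutillidae instance (an assumption for illustration); against a real install you would substitute the http://mutillidae/robots.txt path given above.

```shell
# Write a stand-in robots.txt, then fetch it with curl.
# file:// is used only so the example runs offline; swap in
# http://mutillidae/robots.txt against a live target.
printf 'User-agent: *\nDisallow: /passwords/\n' > /tmp/robots-demo.txt
curl -s "file:///tmp/robots-demo.txt"
```

The -s flag suppresses the progress meter so only the file contents are printed.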