80legs
Type of site: Web crawling service
Available in: English
Owner: Datafiniti, LLC
Created by: Shion Deysarkar
URL: www.80legs.com
Launched: September 2009

80legs is a web crawling service that lets users create and run web crawls through its software-as-a-service platform.

History

80legs was created by Computational Crawling, a company in Houston, Texas. The company launched the private beta of 80legs in April 2009 and publicly launched the service at the DEMOfall 09 conference. At the time of its public launch, 80legs offered customized web crawling and scraping services. It has since added subscription plans and other product offerings.[1][2]

Technology

80legs is built on top of a distributed grid computing network.[3] This grid consists of approximately 50,000 individual computers, distributed across the world, and uses bandwidth monitoring technology to prevent bandwidth cap overages.[4]

80legs has been criticised by numerous site owners because its crawler can effectively act as a distributed denial-of-service (DDoS) attack and does not obey robots.txt.[5][6][7][8] Because the average webmaster is unaware that 80legs exists, its crawler is often blocked only when it is already too late: the server has been overloaded, and the source is identified only after a time-consuming, in-depth analysis of the log files.
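The kind of log analysis described above can be sketched in a few lines of Python. This is only an illustration, assuming the web server writes the common "combined" access log format (client IP first, User-Agent as the last quoted field); the function name `summarize_by_user_agent` and the sample log lines are hypothetical and not taken from any cited source. Grouping requests by User-Agent rather than by IP address makes a distributed crawler stand out, since the same agent string appears from many different addresses.

```python
import re
from collections import Counter, defaultdict

# Assumption: "combined" log format, i.e. the line starts with the client IP
# and ends with the quoted User-Agent field.
_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def summarize_by_user_agent(lines):
    """Return {user_agent: (request_count, distinct_ip_count)}, busiest first."""
    requests = Counter()
    ips = defaultdict(set)
    for line in lines:
        match = _LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        ua = match.group("ua")
        requests[ua] += 1
        ips[ua].add(match.group("ip"))
    return {ua: (count, len(ips[ua])) for ua, count in requests.most_common()}


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '198.51.100.7 - - [10/Jan/2012:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.9 - - [10/Jan/2012:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "ExampleBrowser/2.0"',
]

if __name__ == "__main__":
    for agent, (count, distinct_ips) in summarize_by_user_agent(SAMPLE_LOG).items():
        print(f"{count:6d} requests from {distinct_ips:4d} IPs  {agent}")
```

An agent with a high request count spread over many distinct IP addresses is a candidate for the User-Agent based blocking discussed in this section.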

Some ModSecurity rulesets (such as the one from Atomicorp[9]) block all access to the web server from 80legs in order to prevent a DDoS. WebKnight and Wrecksite also block 80legs by default. Because the crawler is distributed, it cannot practically be blocked by IP address. The best way found to block 80legs is by its User-Agent, "008".[10]
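As a rough illustration of User-Agent based blocking, the following Python sketch wraps a WSGI application and answers with 403 Forbidden when the request's User-Agent header contains the crawler's token. It is a minimal sketch under stated assumptions, not a description of how ModSecurity or WebKnight implement their rules: it assumes the crawler announces itself with a product token of the form "008/<version>", and the names `is_blocked_user_agent` and `BlockUserAgentMiddleware` are hypothetical.

```python
import re

# Assumption: the crawler sends a product token such as "008/0.83".
# Matching "008/" as a separate token (not preceded by a word character,
# dot, or slash) avoids false positives on strings like "Gecko/2008032620",
# which contain "008" only as part of a longer number.
_BLOCKED_TOKEN = re.compile(r"(?<![\w./])008/")


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent string carries the blocked token."""
    return bool(_BLOCKED_TOKEN.search(user_agent or ""))


class BlockUserAgentMiddleware:
    """WSGI middleware that rejects requests from the blocked User-Agent."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked_user_agent(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Any other client is passed through to the wrapped application.
        return self.app(environ, start_response)
```

A plain substring test for "008" would also match many ordinary browser strings, which is why the sketch matches the token followed by a slash. Wrapping an existing application is a one-liner, for example `application = BlockUserAgentMiddleware(application)`.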

References

  1. "80legs sets its web crawler free". https://venturebeat.com/2009/12/21/80legs-web-crawler-free/
  2. "Thoughts From the Man Who Would Sell The World, Nicely". http://www.readwriteweb.com/archives/bulk_social_data_80legs.php Archived 2010-07-22 at the Wayback Machine.
  3. "80legs is Where SETI@home Meets Google". http://gigaom.com/2009/09/22/80legs-is-where-setihome-meets-google/
  4. "80legs Cares About Your Bandwidth Cap". http://gigaom.com/2009/05/14/80legs-cares-about-your-bandwidth-cap/
  5. "DDOSed by 80legs". http://www.datamadness.com/2012/01/ddosed-by-80legs/ Archived 2012-01-16 at the Wayback Machine.
  6. HackerNews thread. http://news.ycombinator.com/item?id=1056960
  7. Webmasterworld thread. http://www.webmasterworld.com/search_engine_spiders/4457359.htm
  8. Complaint from OpenStreetMap. https://twitter.com/openstreetmap/status/221188821721681920
  9. "Atomicorp". Retrieved 2013-02-05.
  10. "80legs - Most Powerful Web Crawler Ever". Archived from the original on 2013-10-31. Retrieved 2013-11-06.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.