Web Scraping Tools - TSV

Web scraping tools are software applications and libraries that automatically collect structured data from websites. Different tools suit different approaches and use cases, ranging from static HTML parsing to dynamic JavaScript rendering and browser automation. They are implemented in several programming languages, including Python, JavaScript, and Java, and are used for projects ranging from small-scale data collection to large-scale crawling.

Tags: web scraping, data collection, crawling, automation, Python, JavaScript
code	slug	name	description	githubUrl	javascriptSupport	language	officialUrl	type
01	scrapy	Scrapy	A high-level Python web crawling and scraping framework	https://github.com/scrapy/scrapy	false	Python	https://scrapy.org/	Framework
02	beautifulsoup	Beautiful Soup	A Python library for parsing HTML and XML documents		false	Python	https://www.crummy.com/software/BeautifulSoup/	Library
03	selenium	Selenium	A cross-platform tool for browser automation	https://github.com/SeleniumHQ/selenium	true	Multi-language	https://www.selenium.dev/	Framework
04	playwright	Playwright	Microsoft's end-to-end testing and automation framework	https://github.com/microsoft/playwright	true	Multi-language	https://playwright.dev/	Framework
05	puppeteer	Puppeteer	Google's Node.js library for Chrome and Firefox automation	https://github.com/puppeteer/puppeteer	true	JavaScript/Node.js	https://pptr.dev/	Library
06	octoparse	Octoparse	A no-code visual web scraping tool		true	N/A	https://www.octoparse.com/	No-code Tool
07	apify	Apify	A cloud-based web scraping and automation platform	https://github.com/apify	true	JavaScript/Node.js	https://apify.com/	Cloud Platform
08	parsehub	ParseHub	A machine learning-powered cloud-based scraping tool		true	N/A	https://www.parsehub.com/	Cloud Tool
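
The table above can be read with any tab-aware parser. The following is a minimal sketch using only Python's standard library, assuming the column names shown in the header row. The function names `load_tools` and `tools_with_javascript` are hypothetical helpers introduced for illustration, and the embedded sample contains only three of the rows so that the block stays self-contained.

```python
import csv
import io

# Three rows copied from the table above (tabs written as \t).
# In practice, read the full file instead of an embedded string.
TSV_TEXT = (
    "code\tslug\tname\tdescription\tgithubUrl\tjavascriptSupport"
    "\tlanguage\tofficialUrl\ttype\n"
    "01\tscrapy\tScrapy\tA high-level Python web crawling and scraping framework"
    "\thttps://github.com/scrapy/scrapy\tfalse\tPython\thttps://scrapy.org/\tFramework\n"
    "04\tplaywright\tPlaywright\tMicrosoft's end-to-end testing and automation framework"
    "\thttps://github.com/microsoft/playwright\ttrue\tMulti-language"
    "\thttps://playwright.dev/\tFramework\n"
    "06\toctoparse\tOctoparse\tA no-code visual web scraping tool"
    "\t\ttrue\tN/A\thttps://www.octoparse.com/\tNo-code Tool\n"
)


def load_tools(text):
    """Parse the TSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    tools = []
    for row in reader:
        # Convert the "true"/"false" strings into real booleans.
        row["javascriptSupport"] = row["javascriptSupport"].strip().lower() == "true"
        # An empty githubUrl cell means no repository is listed.
        row["githubUrl"] = row["githubUrl"] or None
        tools.append(row)
    return tools


def tools_with_javascript(tools):
    """Return the names of tools whose javascriptSupport flag is true."""
    return [tool["name"] for tool in tools if tool["javascriptSupport"]]


if __name__ == "__main__":
    for name in tools_with_javascript(load_tools(TSV_TEXT)):
        print(name)
```

Filtering on `javascriptSupport` is one way to separate the tools that render dynamic pages from those that only parse static HTML. Quoting is disabled in the reader because the descriptions contain apostrophes but no quoted fields.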