Web scraping means having a program extract data from the web in a fraction of the time it would take you to do it manually. It can be used to compare product reviews and prices across e-commerce sites, scan job boards for openings in your area, or monitor social media for the latest trends and hashtags. You can also automate your browser to perform tasks such as buying tickets for your favourite band's concert as soon as they go on sale, or notifying you when your exam results are available.
This article covers web scraping in Python, one of the most popular languages for the purpose.
- This is an amazing tutorial to help you get started. Brownie points for doing the Practice Projects listed at the end.
- Beautiful Soup is a tool that can be used to easily parse HTML code.
- When a website is dynamic and requires some sort of interaction (clicking, hovering, entering text) to reveal its data, browser automation comes in handy. Selenium is one of the best browser automation tools available. Check out the excellent unofficial documentation on Selenium.
- This link contains an exhaustive list of tools and libraries used for browser automation and web scraping in Python. You can also check out the original repository for information about the tools and libraries used for web scraping in other languages.
- Some websites prohibit the use of robots (i.e. web scrapers) to gather information from them, so it is best to read a website's User Agreement before scraping it.
- Here is a helpful article on the legality of web scraping.
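
To give a feel for what parsing HTML with Beautiful Soup looks like, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed and uses a made-up product listing embedded as a string, so it runs without any network access. The HTML, the class names and the `extract_products` helper are all hypothetical and only serve as illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing, inlined so the sketch needs no network access.
HTML = """
<html><body>
  <ul id="products">
    <li class="product"><span class="name">Kettle</span><span class="price">24.99</span></li>
    <li class="product"><span class="name">Toaster</span><span class="price">31.50</span></li>
  </ul>
</body></html>
"""


def extract_products(html):
    """Return a list of (name, price) tuples found in the given HTML."""
    # "html.parser" is Python's built-in parser, so no extra parser is required.
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # CSS selectors pick out every <li class="product"> element.
    for item in soup.select("li.product"):
        name = item.select_one("span.name").get_text(strip=True)
        price = float(item.select_one("span.price").get_text(strip=True))
        products.append((name, price))
    return products


products = extract_products(HTML)
for name, price in products:
    print(f"{name}: {price:.2f}")
```

On a real site you would first download the page (for example with an HTTP client library) and pass the response text to the same kind of function. The selectors would have to match that site's actual markup.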
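
Alongside the User Agreement, many sites publish a `robots.txt` file that states which paths automated clients may visit. The sketch below shows one way to check such rules with Python's standard `urllib.robotparser` module. The rules, the `my-scraper` user agent and the `example.com` URLs are invented for illustration. Passing this check does not replace reading the site's terms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Create a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines rather than one big string.
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "my-scraper" is a made-up user agent name for this example.
allowed_public = parser.can_fetch("my-scraper", "https://example.com/products")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")

print("products page allowed:", allowed_public)
print("private page allowed:", allowed_private)
```

For a live site, `RobotFileParser` can also load the file itself via `set_url()` followed by `read()`, after which `can_fetch()` is used in the same way.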