Developing a Web Scraper

Abstract

Mining content from the web is an increasingly common activity for individuals and organizations that want to make informed decisions or simply collect information on topics of interest. Web mining is the process of automatically crawling (browsing) websites and scraping (extracting) relevant content from their pages. This paper describes the development of a web scraper for gathering marketing intelligence. It reviews related work in web mining that bears on web scraper development, details the development process, and concludes with a discussion of the technical, ethical, and legal issues that should be considered.