Exploring Web Mining

Abstract

Mining content from the web is an increasingly common activity for individuals and organizations seeking to make informed decisions or simply to collect information on topics of interest. Web mining is the process of automatically crawling (browsing) websites and scraping (extracting) relevant content from their pages. This paper presents a literature review of web crawler components and crawl strategies, and of web scraper components and scraping strategies. It also reviews current research on web mining approaches and provides a sample listing of publicly available data sources. Finally, it discusses potential security, ethical, and legal issues common to the use of web mining tools.
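
To make the distinction between crawling and scraping concrete, the following is a minimal illustrative sketch and not a method taken from the reviewed literature. It assumes a hypothetical in-memory site (`PAGES`, with invented `example.test` URLs) in place of real HTTP requests, a breadth-first crawl order, and the page title as the "relevant content". The names `PageParser` and `mine` are likewise assumptions introduced for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" (URL -> HTML). A real crawler would fetch
# pages over HTTP; this stand-in keeps the sketch runnable offline.
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/b">B</a>',
    "http://example.test/b": '<title>Page B</title><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects outgoing links (used for crawling) and the title (scraped content)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def mine(seed):
    """Breadth-first crawl starting at seed; returns {url: scraped title}."""
    seen = {seed}            # URLs already queued, so no page is visited twice
    queue = deque([seed])    # crawl frontier
    scraped = {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:     # unknown page: skip it, as a failed fetch would be
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        scraped[url] = parser.title           # scraping step
        for href in parser.links:             # crawling step
            target = urljoin(url, href)       # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return scraped
```

Calling `mine("http://example.test/")` visits each of the three pages once and returns their titles keyed by URL. A practical tool would additionally need to honor robots.txt, rate-limit its requests, and handle network errors, all of which this sketch omits.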