Creating a web crawler in Go with Colly

Post Details

Company

LogRocket

Date Published

Dec. 22, 2020

Author

Michael Okoko

Word Count

1,481

Company Posts That Month

75

Source URL

blog.logrocket.com/web-scraping-with-go-and-colly

Summary

Web scraping and crawling are techniques for extracting data from websites that do not offer a dedicated API. Colly is a Go package for this purpose: it builds on Go's net/http for network communication and on goquery for targeting HTML elements. The article walks through a project that uses Colly to scrape celebrity birthday data from IMDb. It focuses on setting up and configuring Colly's Collector component, which manages network requests and can be customized with callbacks such as OnRequest and OnHTML. The tutorial covers initializing a Go module, installing Colly as a dependency, and writing the scraping functions, which fetch the list of celebrities born on a given date and recursively follow next-page links to move through the results. It also explains how to parse the HTML into Go structs representing celebrity profiles and movies, and how to read command-line arguments so the crawler can fetch data for any specified date. The complete source code is available on GitLab, and the article encourages readers to explore Colly's capabilities for more complex tasks.
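The command-line handling and the data structs mentioned in the summary can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the article's code: the struct fields, the flag names -month and -day, the helper names parseDate and startURL, and the birth_monthday query format are all hypothetical choices made for this example, and the Colly calls themselves are left out so the snippet runs without third-party dependencies.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
)

// movie and star are hypothetical shapes for the scraped data;
// the field names are assumptions, not taken from the article.
type movie struct {
	Title string `json:"title"`
	Year  string `json:"year"`
}

type star struct {
	Name      string  `json:"name"`
	Photo     string  `json:"photo"`
	Bio       string  `json:"bio"`
	TopMovies []movie `json:"top_movies"`
}

// parseDate reads -month and -day from args and checks that both
// fall in a plausible calendar range.
func parseDate(args []string) (month, day int, err error) {
	fs := flag.NewFlagSet("crawler", flag.ContinueOnError)
	fs.IntVar(&month, "month", 1, "birth month (1-12)")
	fs.IntVar(&day, "day", 1, "birth day (1-31)")
	if err = fs.Parse(args); err != nil {
		return 0, 0, err
	}
	if month < 1 || month > 12 {
		return 0, 0, fmt.Errorf("month %d out of range 1-12", month)
	}
	if day < 1 || day > 31 {
		return 0, 0, fmt.Errorf("day %d out of range 1-31", day)
	}
	return month, day, nil
}

// startURL builds the listing URL a crawler would visit first.
// The query format is an assumption about IMDb's name search page.
func startURL(month, day int) string {
	return fmt.Sprintf("https://www.imdb.com/search/name/?birth_monthday=%d-%d", month, day)
}

func main() {
	month, day, err := parseDate(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("would crawl:", startURL(month, day))

	// Placeholder record showing how a scraped profile could be
	// serialized; no real data is fetched here.
	s := star{
		Name:      "Example Person",
		TopMovies: []movie{{Title: "Example Film", Year: "2020"}},
	}
	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```

Run as, for example, `go run . -month 12 -day 22`. In a full crawler, the URL returned by startURL would be handed to the Collector, and the OnHTML callbacks would fill star values in place of the placeholder record.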
