Fajri Abdillah

Forum Crawler

This project was built to collect forum data. I used the number one forum in Indonesia as the starting point, so it is specifically called Scrapy Kaskus Crawler.

Challenge

  • Python

    Writing good Python code is fun, because the language enforces clean indentation

  • Scrapy & XPATH

    Confusing at first, but it went well after some trial and error with XPath expressions just to capture a string inside an HTML element

  • Pagination

    Every forum paginates its results, so to crawl a whole thread I have to know how its pagination works. That was challenging for me
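
The XPath and pagination challenges above can be illustrated with a minimal sketch. It is not the project's actual code: it uses Python 3's standard library (xml.etree.ElementTree, which supports only a subset of XPath) instead of Scrapy's selectors, and the URLs, markup, class names, and the functions parse_page and crawl_thread are all hypothetical. The idea is to pull posts out of each page with XPath and keep following the "next" link until there is none.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed markup standing in for real forum pages.
# Real forum HTML is rarely valid XML and would need an HTML-aware parser.
PAGES = {
    "http://forum.example/thread/1": """
<html><body>
  <div class="post"><span class="author">andi</span><p class="body">First post</p></div>
  <div class="post"><span class="author">budi</span><p class="body">Second post</p></div>
  <a class="next" href="/thread/1?page=2">Next</a>
</body></html>""",
    "http://forum.example/thread/1?page=2": """
<html><body>
  <div class="post"><span class="author">citra</span><p class="body">Third post</p></div>
</body></html>""",
}


def parse_page(url, html):
    """Return (posts, next_url) for a single thread page."""
    root = ET.fromstring(html)
    posts = []
    # XPath: every <div class="post"> anywhere in the page.
    for post in root.findall(".//div[@class='post']"):
        posts.append({
            "author": post.findtext("span[@class='author']"),
            "body": post.findtext("p[@class='body']"),
        })
    # XPath: the pagination link, if this page has one.
    next_link = root.find(".//a[@class='next']")
    next_url = None
    if next_link is not None:
        # Resolve the relative href against the current page URL.
        next_url = urljoin(url, next_link.get("href"))
    return posts, next_url


def crawl_thread(start_url, fetch):
    """Follow 'next' links from start_url and collect every post."""
    url, seen, posts = start_url, set(), []
    while url and url not in seen:  # 'seen' guards against pagination loops
        seen.add(url)
        page_posts, url = parse_page(url, fetch(url))
        posts.extend(page_posts)
    return posts


# 'fetch' is a dictionary lookup here; a real crawler would do an HTTP request.
posts = crawl_thread("http://forum.example/thread/1", PAGES.__getitem__)
for post in posts:
    print(post["author"], "-", post["body"])
```

In Scrapy the same loop would normally be expressed by yielding a new request for the next page from the spider's parse callback, rather than by an explicit while loop.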

Year

2012

Stack

  • Python 2.6
  • Ubuntu Server 12.04
  • Apache 2
  • Scrapy 0.16
  • Redis

Code

Scrapy Kaskus Crawler

Lesson learned

Scrapy is a solid web crawling framework, and it is fast.