News Web Crawler is my thesis project for my Bachelor's degree, so I worked on it as hard as I could.
This project has three parts:
1) Capture RSS data from the given providers
2) Crawl news data from the RSS feeds
3) Show the data using
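Part 1 can be sketched with Python's standard library alone. This is a minimal sketch, assuming each provider publishes a plain RSS 2.0 feed (channel, item, title, link, pubDate). The project itself collected feeds through Google Reader, so this is not its original code, and the function names are hypothetical. The links it returns are what part 2 would hand to the crawler.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(url, timeout=10):
    """Download raw feed bytes; ElementTree honours the XML encoding declaration."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_rss(xml_text):
    """Return a list of {title, link, pub_date} dicts from an RSS 2.0 document.

    Accepts either str or bytes. Missing child elements become empty strings.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pub_date": (item.findtext("pubDate") or "").strip(),
        })
    return items
```

In use, `parse_rss(fetch_feed(feed_url))` would run once per provider on each 30-minute cycle, where `feed_url` is whatever address the provider publishes.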
Guests can search either the RSS data or the crawled news. Some news sites carry a lot of distractions; in this project those distractions are removed, so only the headline and body of each article go into the database: no ads, no click-through ads, just plain news.
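One generic way to keep only the headline and body is to walk the HTML and drop elements that usually carry ads or navigation. The sketch below uses the standard library's HTMLParser. The SKIP_TAGS list and the rule "first h1 is the headline, every p is body text" are assumptions for illustration, and this tag-filtering approach is a swapped-in technique, not the per-provider XPath method the project relies on.

```python
from html.parser import HTMLParser

# Assumed list of tags whose whole content is treated as distraction.
SKIP_TAGS = {"script", "style", "iframe", "aside", "nav", "footer", "ins"}


class PlainNewsParser(HTMLParser):
    """Collects the first <h1> and all <p> text outside skipped elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.current = None   # "h1" or "p" while collecting text
        self.buffer = []
        self.headline = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ("h1", "p"):
            self.current = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if tag == "h1" and not self.headline:
                self.headline = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self.current = None

    def handle_data(self, data):
        if self.current and self.skip_depth == 0:
            self.buffer.append(data)


def extract_plain_news(html):
    """Return {"headline": ..., "body": ...} with paragraphs joined by newlines."""
    parser = PlainNewsParser()
    parser.feed(html)
    parser.close()
    return {"headline": parser.headline, "body": "\n".join(parser.paragraphs)}
```

Only the two returned fields would be stored, which matches the "headline and body only" rule described above.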
New data arrives every 30 minutes, so every query result is cached in Redis for 30 minutes. At the time, to collect RSS I used
Google Reader; sadly, that excellent project was shut down. I can't find the best replacement for
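A possible Django cache configuration for the 30-minute Redis cache is sketched below. It assumes Django 4.0 or later, whose built-in `django.core.cache.backends.redis.RedisCache` backend is used here (an older project would likely have needed a third-party backend), and a Redis server on localhost. `search_cache_key` is a hypothetical helper that gives each normalized query its own cache entry.

```python
import hashlib

CRAWL_INTERVAL_SECONDS = 30 * 60  # the crawler adds new data every 30 minutes

CACHES = {
    "default": {
        # Built-in Redis backend (Django 4.0+); an assumption, not the original setup.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",  # assumed local Redis instance
        "TIMEOUT": CRAWL_INTERVAL_SECONDS,      # entries expire with each crawl cycle
    }
}


def search_cache_key(kind, query):
    """Hypothetical helper: one cache entry per (kind, normalized query).

    kind is e.g. "rss" or "news"; the query is lower-cased and its whitespace
    collapsed so trivially different spellings share the same entry.
    """
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return f"search:{kind}:{digest}"
```

A search view could then call `cache.get(key)` and, on a miss, run the query and store it with `cache.set(key, results)`, which falls back to the 1800-second default timeout. For whole pages, Django's `cache_page(60 * 30)` decorator achieves a similar effect.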
Django is an excellent web framework with an automatic admin backend. It is useful for checking the database and for creating, updating, and deleting data through a simple interface. If the models are designed correctly,
foreign keys will show up here as well.
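The Django models are not part of this write-up, so the schema below is a hypothetical illustration of the relationships the admin would display: each RSS item belongs to a provider, and each crawled article belongs to one RSS item. It uses plain sqlite3 so it runs on its own; in Django each REFERENCES column would be a `models.ForeignKey` field, which the admin renders as a select box on the edit form.

```python
import sqlite3

# Hypothetical table and column names, chosen only to show the foreign keys.
SCHEMA = """
CREATE TABLE provider (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE rss_item (
    id          INTEGER PRIMARY KEY,
    provider_id INTEGER NOT NULL REFERENCES provider(id),
    title       TEXT NOT NULL,
    link        TEXT NOT NULL UNIQUE
);
CREATE TABLE article (
    id          INTEGER PRIMARY KEY,
    rss_item_id INTEGER NOT NULL UNIQUE REFERENCES rss_item(id),
    headline    TEXT NOT NULL,
    body        TEXT NOT NULL
);
"""


def make_db():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```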
Scrapy & XPath
I had to write separate XPath expressions for each news provider, because every provider's pages have a different structure.
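A minimal sketch of per-provider extraction rules follows. The provider names and XPath expressions are hypothetical. It uses the standard library's ElementTree, which understands only a subset of XPath and needs well-formed markup, as a self-contained stand-in for Scrapy selectors; with Scrapy the same kind of expression would go to `response.xpath(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical providers and paths: each site lays out its article pages
# differently, so each one gets its own headline and body expressions.
PROVIDER_XPATHS = {
    "example-news": {
        "headline": ".//h1[@class='headline']",
        "body": ".//div[@class='article-body']/p",
    },
    "other-daily": {
        "headline": ".//header/h2",
        "body": ".//section[@id='content']//p",
    },
}


def extract(provider, page):
    """Apply one provider's XPath rules to a well-formed (XHTML) page."""
    paths = PROVIDER_XPATHS[provider]
    root = ET.fromstring(page)
    head = root.find(paths["headline"])
    # itertext() joins text from nested inline tags such as <em> or <a>.
    paragraphs = ["".join(p.itertext()).strip() for p in root.findall(paths["body"])]
    return {
        "headline": "".join(head.itertext()).strip() if head is not None else "",
        "body": "\n".join(t for t in paragraphs if t),
    }
```

With this layout, supporting a new provider means adding one entry to the table rather than writing new parsing code.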
Django really is a web framework for meeting tight deadlines.
Writing good Python code is fun, because its enforced indentation keeps the code tidy.
Use Case Diagram
Django is fun, a good choice of Python web framework, and fast to develop with.