InfiniteLegend/habrahabr-spider
# Habrahabr spider

Spiders that crawl the Habrahabr site using the Scrapy framework. All scraped items are sent to MongoDB and/or an AWS Elasticsearch server (if the corresponding pipelines are defined in local_settings.py).

Example:

```python
ITEM_PIPELINES = {
    'habrahabr.pipelines.MongoPipeline': 0,
    'habrahabr.pipelines.ElasticsearchPipeline': 1,
}

ELASTICSEARCH_URL = ""
ELASTICSEARCH_INDEX = ""

MONGO_URI = ""
MONGO_DATABASE = ""
MONGO_COLLECTION = ""
```

To disable either pipeline, simply leave it out of ITEM_PIPELINES (it's all in your hands, you have been warned :))
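For reference, a MongoDB item pipeline in Scrapy generally follows the pattern sketched below. This is an illustrative sketch, not the repository's actual `habrahabr.pipelines.MongoPipeline`; it only assumes the `MONGO_*` settings shown above and the `pymongo` client library.

```python
# Hypothetical sketch of a MongoDB item pipeline; the real
# habrahabr.pipelines.MongoPipeline may be implemented differently.
class MongoPipeline:
    def __init__(self, mongo_uri, mongo_db, mongo_collection):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.mongo_collection = mongo_collection

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook; read the MONGO_* values
        # defined in local_settings.py.
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE'),
            mongo_collection=crawler.settings.get('MONGO_COLLECTION'),
        )

    def open_spider(self, spider):
        # pymongo is only needed once the spider actually opens.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Store each scraped item as one MongoDB document.
        self.db[self.mongo_collection].insert_one(dict(item))
        return item
```

The priority numbers in ITEM_PIPELINES (0 and 1 above) control the order in which pipelines receive each item: lower runs first.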

Installing Scrapy requires quite a few system packages.

For Linux (Debian/Ubuntu):

```shell
sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
```
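With the system packages in place, Scrapy itself is typically installed with pip (an assumption here; the repository may instead pin versions in a requirements file). `pymongo` is assumed to be needed for the Mongo pipeline:

```shell
pip install scrapy pymongo
```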

Create a `local_settings.py` file with YOUR Mongo and AWS ES credentials.
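Once `local_settings.py` exists, spiders are run the standard Scrapy way from the project directory. The spider name below is a placeholder; use `scrapy list` to see the names this project actually defines:

```shell
scrapy list                 # show the spiders available in this project
scrapy crawl <spider_name>  # run one of them
```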
