详解scrapy内置中间件的顺序


Posted in Python onSeptember 28, 2020

1. 内置下载器中间件顺序

{'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500}

2. 内置爬虫中间件顺序

{'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800}

3. 内置scrapy的settings

{'AJAXCRAWL_ENABLED': False,
 'AUTOTHROTTLE_DEBUG': False,
 'AUTOTHROTTLE_ENABLED': False,
 'AUTOTHROTTLE_MAX_DELAY': 60.0,
 'AUTOTHROTTLE_START_DELAY': 5.0,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
 'BOT_NAME': 'scrapybot',
 'CLOSESPIDER_ERRORCOUNT': 0,
 'CLOSESPIDER_ITEMCOUNT': 0,
 'CLOSESPIDER_PAGECOUNT': 0,
 'CLOSESPIDER_TIMEOUT': 0,
 'COMMANDS_MODULE': '',
 'COMPRESSION_ENABLED': True,
 'CONCURRENT_ITEMS': 100,
 'CONCURRENT_REQUESTS': 16,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
 'CONCURRENT_REQUESTS_PER_IP': 0,
 'COOKIES_DEBUG': False,
 'COOKIES_ENABLED': True,
 'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
 'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Language': 'en'},
 'DEPTH_LIMIT': 0,
 'DEPTH_PRIORITY': 0,
 'DEPTH_STATS_VERBOSE': False,
 'DNSCACHE_ENABLED': True,
 'DNSCACHE_SIZE': 10000,
 'DNS_TIMEOUT': 60,
 'DOWNLOADER': 'scrapy.core.downloader.Downloader',
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLS',
 'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
 'DOWNLOADER_MIDDLEWARES': {},
 'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500},
 'DOWNLOADER_STATS': True,
 'DOWNLOAD_DELAY': 0,
 'DOWNLOAD_FAIL_ON_DATALOSS': True,
 'DOWNLOAD_HANDLERS': {},
 'DOWNLOAD_HANDLERS_BASE': {'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
 'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
 'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
 'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
 'DOWNLOAD_MAXSIZE': 1073741824,
 'DOWNLOAD_TIMEOUT': 180,
 'DOWNLOAD_WARNSIZE': 33554432,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'EDITOR': 'D:\\Program Files (x86)\\Notepad++\\notepad++.exe',
 'EXTENSIONS': {},
 'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
 'scrapy.extensions.corestats.CoreStats': 0,
 'scrapy.extensions.feedexport.FeedExporter': 0,
 'scrapy.extensions.logstats.LogStats': 0,
 'scrapy.extensions.memdebug.MemoryDebugger': 0,
 'scrapy.extensions.memusage.MemoryUsage': 0,
 'scrapy.extensions.spiderstate.SpiderState': 0,
 'scrapy.extensions.telnet.TelnetConsole': 0,
 'scrapy.extensions.throttle.AutoThrottle': 0},
 'FEED_EXPORTERS': {},
 'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
 'jl': 'scrapy.exporters.JsonLinesItemExporter',
 'json': 'scrapy.exporters.JsonItemExporter',
 'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
 'marshal': 'scrapy.exporters.MarshalItemExporter',
 'pickle': 'scrapy.exporters.PickleItemExporter',
 'xml': 'scrapy.exporters.XmlItemExporter'},
 'FEED_EXPORT_ENCODING': None,
 'FEED_EXPORT_FIELDS': None,
 'FEED_EXPORT_INDENT': 0,
 'FEED_FORMAT': 'jsonlines',
 'FEED_STORAGES': {},
 'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
 'file': 'scrapy.extensions.feedexport.FileFeedStorage',
 'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
 's3': 'scrapy.extensions.feedexport.S3FeedStorage',
 'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
 'FEED_STORE_EMPTY': False,
 'FEED_TEMPDIR': None,
 'FEED_URI': None,
 'FEED_URI_PARAMS': None,
 'FILES_STORE_GCS_ACL': '',
 'FILES_STORE_S3_ACL': 'private',
 'FTP_PASSIVE_MODE': True,
 'FTP_PASSWORD': 'guest',
 'FTP_USER': 'anonymous',
 'HTTPCACHE_ALWAYS_STORE': False,
 'HTTPCACHE_DBM_MODULE': 'dbm',
 'HTTPCACHE_DIR': 'httpcache',
 'HTTPCACHE_ENABLED': False,
 'HTTPCACHE_EXPIRATION_SECS': 0,
 'HTTPCACHE_GZIP': False,
 'HTTPCACHE_IGNORE_HTTP_CODES': [],
 'HTTPCACHE_IGNORE_MISSING': False,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
 'HTTPCACHE_IGNORE_SCHEMES': ['file'],
 'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
 'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
 'HTTPPROXY_AUTH_ENCODING': 'latin-1',
 'HTTPPROXY_ENABLED': True,
 'IMAGES_STORE_GCS_ACL': '',
 'IMAGES_STORE_S3_ACL': 'private',
 'ITEM_PIPELINES': {},
 'ITEM_PIPELINES_BASE': {},
 'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
 'LOGSTATS_INTERVAL': 0,
 'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
 'LOG_ENABLED': True,
 'LOG_ENCODING': 'utf-8',
 'LOG_FILE': None,
 'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
 'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
 'LOG_LEVEL': 'DEBUG',
 'LOG_SHORT_NAMES': False,
 'LOG_STDOUT': False,
 'MAIL_FROM': 'scrapy@localhost',
 'MAIL_HOST': 'localhost',
 'MAIL_PASS': None,
 'MAIL_PORT': 25,
 'MAIL_USER': None,
 'MEMDEBUG_ENABLED': False,
 'MEMDEBUG_NOTIFY': [],
 'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
 'MEMUSAGE_ENABLED': True,
 'MEMUSAGE_LIMIT_MB': 0,
 'MEMUSAGE_NOTIFY_MAIL': [],
 'MEMUSAGE_WARNING_MB': 0,
 'METAREFRESH_ENABLED': True,
 'METAREFRESH_MAXDELAY': 100,
 'NEWSPIDER_MODULE': '',
 'RANDOMIZE_DOWNLOAD_DELAY': True,
 'REACTOR_THREADPOOL_MAXSIZE': 10,
 'REDIRECT_ENABLED': True,
 'REDIRECT_MAX_TIMES': 20,
 'REDIRECT_PRIORITY_ADJUST': 2,
 'REFERER_ENABLED': True,
 'REFERRER_POLICY': 'scrapy.spidermiddlewares.referer.DefaultReferrerPolicy',
 'RETRY_ENABLED': True,
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 522, 524, 408],
 'RETRY_PRIORITY_ADJUST': -1,
 'RETRY_TIMES': 2,
 'ROBOTSTXT_OBEY': False,
 'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
 'SCHEDULER_DEBUG': False,
 'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
 'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
 'SCHEDULER_PRIORITY_QUEUE': 'queuelib.PriorityQueue',
 'SPIDER_CONTRACTS': {},
 'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
 'scrapy.contracts.default.ScrapesContract': 3,
 'scrapy.contracts.default.UrlContract': 1},
 'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
 'SPIDER_LOADER_WARN_ONLY': False,
 'SPIDER_MIDDLEWARES': {},
 'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
 'SPIDER_MODULES': [],
 'STATSMAILER_RCPTS': [],
 'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
 'STATS_DUMP': True,
 'TELNETCONSOLE_ENABLED': 1,
 'TELNETCONSOLE_HOST': '127.0.0.1',
 'TELNETCONSOLE_PASSWORD': None,
 'TELNETCONSOLE_PORT': [6023, 6073],
 'TELNETCONSOLE_USERNAME': 'scrapy',
 'TEMPLATES_DIR': 'd:\\python36\\lib\\site-packages\\scrapy\\templates',
 'URLLENGTH_LIMIT': 2083,
 'USER_AGENT': 'Scrapy/1.6.0 (+https://scrapy.org)',
 'KEEP_ALIVE': True}

到此这篇关于详解scrapy内置中间件的顺序的文章就介绍到这了,更多相关scrapy 中间件顺序内容请搜索三水点靠木以前的文章或继续浏览下面的相关文章希望大家以后多多支持三水点靠木!

Python 相关文章推荐
Python 字符串定义
Sep 25 Python
python实现Decorator模式实例代码
Feb 09 Python
对TensorFlow的assign赋值用法详解
Jul 30 Python
Python3对称加密算法AES、DES3实例详解
Dec 06 Python
Python批量生成特定尺寸图片及图画任意文字的实例
Jan 30 Python
对python读取zip压缩文件里面的csv数据实例详解
Feb 08 Python
简单了解python高阶函数map/reduce
Jun 28 Python
Python使用Beautiful Soup爬取豆瓣音乐排行榜过程解析
Aug 15 Python
PyTorch中Tensor的拼接与拆分的实现
Aug 18 Python
django drf框架自带的路由及最简化的视图
Sep 10 Python
python3中sorted函数里cmp参数改变详解
Mar 12 Python
详解Python flask的前后端交互
Mar 31 Python
Python爬虫代理池搭建的方法步骤
Sep 28 #Python
浅析python 通⽤爬⾍和聚焦爬⾍
Sep 28 #Python
Scrapy 配置动态代理IP的实现
Sep 28 #Python
Scrapy中如何向Spider传入参数的方法实现
Sep 28 #Python
详解向scrapy中的spider传递参数的几种方法(2种)
Sep 28 #Python
小结Python的反射机制
Sep 28 #Python
scrapy与selenium结合爬取数据(爬取动态网站)的示例代码
Sep 28 #Python
You might like
PHP将MySQL的查询结果转换为数组并用where拼接的示例
2016/05/13 PHP
php版微信公众号接口实现发红包的方法
2016/10/14 PHP
php实现断点续传大文件示例代码
2020/06/19 PHP
让广告代码不再影响你的网页加载速度
2006/07/07 Javascript
jsp+javascript打造级连菜单的实例代码
2013/06/14 Javascript
z-blog SyntaxHighlighter 长代码无法换行解决办法(jquery)
2014/11/16 Javascript
详谈jQuery操纵DOM元素属性 attr()和removeAtrr()方法
2015/01/22 Javascript
7个去伪存真的JavaScript面试题
2016/01/07 Javascript
js+canvas绘制五角星的方法
2016/01/28 Javascript
使用getBoundingClientRect方法实现简洁的sticky组件的方法
2016/03/22 Javascript
BootStrap中Datepicker控件带中文的js文件
2016/08/10 Javascript
Iscrool下拉刷新功能实现方法(推荐)
2017/06/26 Javascript
Vue.js搭建移动端购物车界面
2020/06/28 Javascript
vue中的过滤器及其时间格式化问题
2020/04/09 Javascript
Python中的exec、eval使用实例
2014/09/23 Python
Windows下Python的Django框架环境部署及应用编写入门
2016/03/10 Python
PyQt5每天必学之工具提示功能
2018/04/19 Python
python使用magic模块进行文件类型识别方法
2018/12/08 Python
python中如何使用分步式进程计算详解
2019/03/22 Python
使用python将最新的测试报告以附件的形式发到指定邮箱
2019/09/20 Python
如何设置PyCharm中的Python代码模版(推荐)
2020/11/20 Python
CSS3实现闪烁动画效果的方法
2015/02/09 HTML / CSS
如何在Canvas上的图形/图像绑定事件监听的实现
2020/09/16 HTML / CSS
Under Armour瑞典官方网站:美国高端运动科技品牌
2018/11/21 全球购物
Rossignol金鸡美国官网:始于1907年法国百年雪具品牌
2019/03/06 全球购物
屈臣氏菲律宾官网:Watsons菲律宾
2020/06/30 全球购物
Eclipse面试题
2014/03/22 面试题
办公室文书岗位职责
2013/12/16 职场文书
教育课题研究自我鉴定范文
2013/12/28 职场文书
化工专业大学生职业生涯规划书
2014/01/14 职场文书
元旦晚会邀请函
2014/01/27 职场文书
小学校园文化建设汇报材料
2014/08/19 职场文书
2015年主婚人婚礼致辞
2015/07/28 职场文书
初中体育教学随笔
2015/08/15 职场文书
适合毕业生创业的项目怎么找?
2019/08/08 职场文书
JavaScript实现九宫格拖拽效果
2022/06/28 Javascript