详解scrapy内置中间件的顺序


Posted in Python onSeptember 28, 2020

1. 内置下载器中间件顺序

{'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500}

2. 内置爬虫中间件顺序

{'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800}

3. 内置scrapy的settings

{'AJAXCRAWL_ENABLED': False,
 'AUTOTHROTTLE_DEBUG': False,
 'AUTOTHROTTLE_ENABLED': False,
 'AUTOTHROTTLE_MAX_DELAY': 60.0,
 'AUTOTHROTTLE_START_DELAY': 5.0,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 1.0,
 'BOT_NAME': 'scrapybot',
 'CLOSESPIDER_ERRORCOUNT': 0,
 'CLOSESPIDER_ITEMCOUNT': 0,
 'CLOSESPIDER_PAGECOUNT': 0,
 'CLOSESPIDER_TIMEOUT': 0,
 'COMMANDS_MODULE': '',
 'COMPRESSION_ENABLED': True,
 'CONCURRENT_ITEMS': 100,
 'CONCURRENT_REQUESTS': 16,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 8,
 'CONCURRENT_REQUESTS_PER_IP': 0,
 'COOKIES_DEBUG': False,
 'COOKIES_ENABLED': True,
 'DEFAULT_ITEM_CLASS': 'scrapy.item.Item',
 'DEFAULT_REQUEST_HEADERS': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'Accept-Language': 'en'},
 'DEPTH_LIMIT': 0,
 'DEPTH_PRIORITY': 0,
 'DEPTH_STATS_VERBOSE': False,
 'DNSCACHE_ENABLED': True,
 'DNSCACHE_SIZE': 10000,
 'DNS_TIMEOUT': 60,
 'DOWNLOADER': 'scrapy.core.downloader.Downloader',
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory',
 'DOWNLOADER_CLIENT_TLS_METHOD': 'TLS',
 'DOWNLOADER_HTTPCLIENTFACTORY': 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory',
 'DOWNLOADER_MIDDLEWARES': {},
 'DOWNLOADER_MIDDLEWARES_BASE': {'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware': 560,
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware': 400,
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware': 350,
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300,
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 590,
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware': 580,
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware': 600,
 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 550,
 'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': 100,
 'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500},
 'DOWNLOADER_STATS': True,
 'DOWNLOAD_DELAY': 0,
 'DOWNLOAD_FAIL_ON_DATALOSS': True,
 'DOWNLOAD_HANDLERS': {},
 'DOWNLOAD_HANDLERS_BASE': {'data': 'scrapy.core.downloader.handlers.datauri.DataURIDownloadHandler',
 'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
 'ftp': 'scrapy.core.downloader.handlers.ftp.FTPDownloadHandler',
 'http': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 'https': 'scrapy.core.downloader.handlers.http.HTTPDownloadHandler',
 's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler'},
 'DOWNLOAD_MAXSIZE': 1073741824,
 'DOWNLOAD_TIMEOUT': 180,
 'DOWNLOAD_WARNSIZE': 33554432,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'EDITOR': 'D:\\Program Files (x86)\\Notepad++\\notepad++.exe',
 'EXTENSIONS': {},
 'EXTENSIONS_BASE': {'scrapy.extensions.closespider.CloseSpider': 0,
 'scrapy.extensions.corestats.CoreStats': 0,
 'scrapy.extensions.feedexport.FeedExporter': 0,
 'scrapy.extensions.logstats.LogStats': 0,
 'scrapy.extensions.memdebug.MemoryDebugger': 0,
 'scrapy.extensions.memusage.MemoryUsage': 0,
 'scrapy.extensions.spiderstate.SpiderState': 0,
 'scrapy.extensions.telnet.TelnetConsole': 0,
 'scrapy.extensions.throttle.AutoThrottle': 0},
 'FEED_EXPORTERS': {},
 'FEED_EXPORTERS_BASE': {'csv': 'scrapy.exporters.CsvItemExporter',
 'jl': 'scrapy.exporters.JsonLinesItemExporter',
 'json': 'scrapy.exporters.JsonItemExporter',
 'jsonlines': 'scrapy.exporters.JsonLinesItemExporter',
 'marshal': 'scrapy.exporters.MarshalItemExporter',
 'pickle': 'scrapy.exporters.PickleItemExporter',
 'xml': 'scrapy.exporters.XmlItemExporter'},
 'FEED_EXPORT_ENCODING': None,
 'FEED_EXPORT_FIELDS': None,
 'FEED_EXPORT_INDENT': 0,
 'FEED_FORMAT': 'jsonlines',
 'FEED_STORAGES': {},
 'FEED_STORAGES_BASE': {'': 'scrapy.extensions.feedexport.FileFeedStorage',
 'file': 'scrapy.extensions.feedexport.FileFeedStorage',
 'ftp': 'scrapy.extensions.feedexport.FTPFeedStorage',
 's3': 'scrapy.extensions.feedexport.S3FeedStorage',
 'stdout': 'scrapy.extensions.feedexport.StdoutFeedStorage'},
 'FEED_STORE_EMPTY': False,
 'FEED_TEMPDIR': None,
 'FEED_URI': None,
 'FEED_URI_PARAMS': None,
 'FILES_STORE_GCS_ACL': '',
 'FILES_STORE_S3_ACL': 'private',
 'FTP_PASSIVE_MODE': True,
 'FTP_PASSWORD': 'guest',
 'FTP_USER': 'anonymous',
 'HTTPCACHE_ALWAYS_STORE': False,
 'HTTPCACHE_DBM_MODULE': 'dbm',
 'HTTPCACHE_DIR': 'httpcache',
 'HTTPCACHE_ENABLED': False,
 'HTTPCACHE_EXPIRATION_SECS': 0,
 'HTTPCACHE_GZIP': False,
 'HTTPCACHE_IGNORE_HTTP_CODES': [],
 'HTTPCACHE_IGNORE_MISSING': False,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': [],
 'HTTPCACHE_IGNORE_SCHEMES': ['file'],
 'HTTPCACHE_POLICY': 'scrapy.extensions.httpcache.DummyPolicy',
 'HTTPCACHE_STORAGE': 'scrapy.extensions.httpcache.FilesystemCacheStorage',
 'HTTPPROXY_AUTH_ENCODING': 'latin-1',
 'HTTPPROXY_ENABLED': True,
 'IMAGES_STORE_GCS_ACL': '',
 'IMAGES_STORE_S3_ACL': 'private',
 'ITEM_PIPELINES': {},
 'ITEM_PIPELINES_BASE': {},
 'ITEM_PROCESSOR': 'scrapy.pipelines.ItemPipelineManager',
 'LOGSTATS_INTERVAL': 0,
 'LOG_DATEFORMAT': '%Y-%m-%d %H:%M:%S',
 'LOG_ENABLED': True,
 'LOG_ENCODING': 'utf-8',
 'LOG_FILE': None,
 'LOG_FORMAT': '%(asctime)s [%(name)s] %(levelname)s: %(message)s',
 'LOG_FORMATTER': 'scrapy.logformatter.LogFormatter',
 'LOG_LEVEL': 'DEBUG',
 'LOG_SHORT_NAMES': False,
 'LOG_STDOUT': False,
 'MAIL_FROM': 'scrapy@localhost',
 'MAIL_HOST': 'localhost',
 'MAIL_PASS': None,
 'MAIL_PORT': 25,
 'MAIL_USER': None,
 'MEMDEBUG_ENABLED': False,
 'MEMDEBUG_NOTIFY': [],
 'MEMUSAGE_CHECK_INTERVAL_SECONDS': 60.0,
 'MEMUSAGE_ENABLED': True,
 'MEMUSAGE_LIMIT_MB': 0,
 'MEMUSAGE_NOTIFY_MAIL': [],
 'MEMUSAGE_WARNING_MB': 0,
 'METAREFRESH_ENABLED': True,
 'METAREFRESH_MAXDELAY': 100,
 'NEWSPIDER_MODULE': '',
 'RANDOMIZE_DOWNLOAD_DELAY': True,
 'REACTOR_THREADPOOL_MAXSIZE': 10,
 'REDIRECT_ENABLED': True,
 'REDIRECT_MAX_TIMES': 20,
 'REDIRECT_PRIORITY_ADJUST': 2,
 'REFERER_ENABLED': True,
 'REFERRER_POLICY': 'scrapy.spidermiddlewares.referer.DefaultReferrerPolicy',
 'RETRY_ENABLED': True,
 'RETRY_HTTP_CODES': [500, 502, 503, 504, 522, 524, 408],
 'RETRY_PRIORITY_ADJUST': -1,
 'RETRY_TIMES': 2,
 'ROBOTSTXT_OBEY': False,
 'SCHEDULER': 'scrapy.core.scheduler.Scheduler',
 'SCHEDULER_DEBUG': False,
 'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleLifoDiskQueue',
 'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.LifoMemoryQueue',
 'SCHEDULER_PRIORITY_QUEUE': 'queuelib.PriorityQueue',
 'SPIDER_CONTRACTS': {},
 'SPIDER_CONTRACTS_BASE': {'scrapy.contracts.default.ReturnsContract': 2,
 'scrapy.contracts.default.ScrapesContract': 3,
 'scrapy.contracts.default.UrlContract': 1},
 'SPIDER_LOADER_CLASS': 'scrapy.spiderloader.SpiderLoader',
 'SPIDER_LOADER_WARN_ONLY': False,
 'SPIDER_MIDDLEWARES': {},
 'SPIDER_MIDDLEWARES_BASE': {'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
 'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': 500,
 'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware': 800},
 'SPIDER_MODULES': [],
 'STATSMAILER_RCPTS': [],
 'STATS_CLASS': 'scrapy.statscollectors.MemoryStatsCollector',
 'STATS_DUMP': True,
 'TELNETCONSOLE_ENABLED': 1,
 'TELNETCONSOLE_HOST': '127.0.0.1',
 'TELNETCONSOLE_PASSWORD': None,
 'TELNETCONSOLE_PORT': [6023, 6073],
 'TELNETCONSOLE_USERNAME': 'scrapy',
 'TEMPLATES_DIR': 'd:\\python36\\lib\\site-packages\\scrapy\\templates',
 'URLLENGTH_LIMIT': 2083,
 'USER_AGENT': 'Scrapy/1.6.0 (+https://scrapy.org)',
 'KEEP_ALIVE': True}

到此这篇关于详解scrapy内置中间件的顺序的文章就介绍到这了,更多相关scrapy 中间件顺序内容请搜索三水点靠木以前的文章或继续浏览下面的相关文章希望大家以后多多支持三水点靠木!

Python 相关文章推荐
Python中用于检查英文字母大写的isupper()方法
May 19 Python
Python中操作mysql的pymysql模块详解
Sep 13 Python
python 表达式和语句及for、while循环练习实例
Jul 07 Python
Python进阶学习之特殊方法实例详析
Dec 01 Python
python 接口测试response返回数据对比的方法
Feb 11 Python
python计算阶乘和的方法(1!+2!+3!+...+n!)
Feb 01 Python
python程序快速缩进多行代码方法总结
Jun 23 Python
使用python绘制温度变化雷达图
Oct 18 Python
WxPython实现无边框界面
Nov 18 Python
Django视图、传参和forms验证操作
Jul 15 Python
Python列表嵌套常见坑点及解决方案
Sep 30 Python
Python用requests库爬取返回为空的解决办法
Feb 21 Python
Python爬虫代理池搭建的方法步骤
Sep 28 #Python
浅析python 通⽤爬⾍和聚焦爬⾍
Sep 28 #Python
Scrapy 配置动态代理IP的实现
Sep 28 #Python
Scrapy中如何向Spider传入参数的方法实现
Sep 28 #Python
详解向scrapy中的spider传递参数的几种方法(2种)
Sep 28 #Python
小结Python的反射机制
Sep 28 #Python
scrapy与selenium结合爬取数据(爬取动态网站)的示例代码
Sep 28 #Python
You might like
php-perl哈希算法实现(times33哈希算法)
2013/12/30 PHP
php读取本地json文件的实例
2018/03/07 PHP
php微信公众号开发之欢迎老朋友
2018/10/20 PHP
javascript import css实例代码
2008/07/18 Javascript
浏览器脚本兼容 文本框中,回车键触发事件的兼容
2010/06/21 Javascript
理解Javascript_10_对象模型
2010/10/16 Javascript
HTML5附件拖拽上传drop & google.gears实现代码
2011/04/28 Javascript
JS常用正则表达式总结
2013/11/12 Javascript
js调用iframe实现打印页面内容的方法
2014/03/04 Javascript
JavaScript多图片上传案例
2015/09/28 Javascript
使用jQuery的easydrag插件实现可拖动的DIV弹出框
2016/02/19 Javascript
关于动态执行代码(js的Eval)实例详解
2016/08/15 Javascript
JS 动态加载js文件和css文件 同步/异步的两种简单方式
2016/09/23 Javascript
在JSP中如何实现MD5加密的方法
2016/11/02 Javascript
JavaScript中的普通函数和箭头函数的区别和用法详解
2017/03/21 Javascript
微信小程序之页面拦截器的示例代码
2017/09/07 Javascript
详解.vue文件中监听input输入事件(oninput)
2017/09/19 Javascript
浅谈Vue SSR 的 Cookies 问题
2017/11/20 Javascript
js合并两个数组生成合并后的key:value数组
2018/05/09 Javascript
Vue.js实现开发购物车功能的方法详解
2019/02/22 Javascript
史上最为详细的javascript继承(推荐)
2019/05/18 Javascript
Vee-validate 父组件获取子组件表单校验结果的实例代码
2019/05/20 Javascript
Bootstrap实现省市区三级联动(亲测可用)
2019/07/26 Javascript
JS实现商城秒杀倒计时功能(动态设置秒杀时间)
2019/12/12 Javascript
openlayers 3实现车辆轨迹回放
2020/09/24 Javascript
简单介绍Python中的几种数据类型
2016/01/02 Python
Python网络爬虫神器PyQuery的基本使用教程
2018/02/03 Python
使用pygame写一个古诗词填空通关游戏
2019/12/03 Python
Speedo美国:澳大利亚顶尖泳衣制造商
2016/08/03 全球购物
英国外籍人士的在线超市:British Corner Shop
2019/06/03 全球购物
新东网科技Java笔试题
2012/07/13 面试题
护理专业自我鉴定
2014/01/30 职场文书
2014年3.15团委活动总结
2014/03/16 职场文书
项目经理岗位职责
2015/01/31 职场文书
2015年酒店前台工作总结
2015/04/20 职场文书
java实现web实时消息推送的七种方案
2022/07/23 Java/Android