Scrapy Basic Commands and settings.py Configuration

Posted in Python on February 06, 2020

This article walks through the basic Scrapy commands and the settings in settings.py, with examples, for reference.

Scrapy basic commands

1. Create a crawler project

scrapy startproject [project name]

2. Create a spider file

scrapy genspider [spider name] [domain]

3. Run a spider (crawl)

scrapy crawl [spider name]
# -o (output) writes the scraped items to a file
scrapy crawl [spider name] -o zufang.json
scrapy crawl [spider name] -o zufang.csv
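
The crawl command can also be assembled from a Python script, which may be convenient when the spider name or output file varies. The sketch below only builds the argument list. `build_crawl_command` is a hypothetical helper, not part of Scrapy, and `zufang` is an assumed spider name.

```python
import shlex


def build_crawl_command(spider_name, output_path=None):
    """Return the argv list for `scrapy crawl`, optionally with -o (hypothetical helper)."""
    argv = ["scrapy", "crawl", spider_name]
    if output_path is not None:
        # -o names the output file for the scraped items
        argv += ["-o", output_path]
    return argv


cmd = build_crawl_command("zufang", "zufang.json")
print(shlex.join(cmd))  # scrapy crawl zufang -o zufang.json
```

To actually start the crawl, the list could be passed to `subprocess.run(cmd, check=True)` from inside the project directory.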

4. check: run the contract checks on the project's spiders

scrapy check

5. list: list all spiders in the project

scrapy list

6. view: download a page and open it in the browser as Scrapy sees it

scrapy view http://www.baidu.com

7. shell: start the interactive Scrapy shell for a URL

scrapy shell https://www.baidu.com

8. runspider: run a spider contained in a single Python file, without a project

scrapy runspider zufang_spider.py

Scrapy: settings.py configuration

# -*- coding: utf-8 -*-
# Scrapy settings for maitian project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#   https://doc.scrapy.org/en/latest/topics/settings.html
#   https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#   https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'maitian'
SPIDER_MODULES = ['maitian.spiders']
NEWSPIDER_MODULE = 'maitian.spiders'
# USER_AGENT takes a single string; a list of user agents cannot be set here
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'maitian (+http://www.yourdomain.com)'
# The generated project template sets ROBOTSTXT_OBEY = True; it is switched off here, so robots.txt is ignored
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Write log output to this file
LOG_FILE="maitian.log"
# Five log levels, in increasing severity: DEBUG, INFO, WARNING, ERROR, CRITICAL
# The higher the level, the less log output is produced
# LOG_LEVEL="INFO"
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
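# Configure a delay for requests for the same website (default: 0)
# With DOWNLOAD_DELAY = 3, Scrapy waits about 3 seconds between consecutive
# requests to the same site; it does not send requests in batches of 16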
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
# Disable cookies (COOKIES_ENABLED defaults to True)
#COOKIES_ENABLED = False
# Disable the Telnet console used for remote inspection (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#  'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#  'maitian.middlewares.MaitianSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#  'maitian.middlewares.MaitianDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#  'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines (uncomment this setting to enable them)
# Values are conventionally in the 0-1000 range; pipelines with lower values run first
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
#  'maitian.pipelines.MaitianPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
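
Scrapy's logging is built on Python's standard logging module, so the LOG_LEVEL names in the settings above correspond to the standard levels. The sketch below uses only the standard library (no Scrapy) to show why a higher threshold produces less output. The `ListHandler` class, the `emitted_at` function and the logger names are made up for this illustration.

```python
import logging

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


class ListHandler(logging.Handler):
    """Collect the level name of every record that reaches the handler."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(record.levelname)


def emitted_at(threshold):
    """Log one message per level and return the levels that pass the threshold."""
    logger = logging.getLogger("demo." + threshold)
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo output off the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    for name in LEVELS:
        logger.log(getattr(logging, name), "message at %s", name)
    logger.removeHandler(handler)
    return handler.seen


for threshold in LEVELS:
    print(threshold, "->", emitted_at(threshold))
```

With the threshold at INFO, the DEBUG message is dropped and the other four pass; at CRITICAL only one message remains.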

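The ITEM_PIPELINES values decide the order in which items pass through the pipelines: lower numbers run first. The sketch below mimics that ordering with a plain sort. It is an illustration only, not Scrapy's internal code, and the `DedupPipeline` and `ExportPipeline` paths are hypothetical names added for the example.

```python
ITEM_PIPELINES = {
    "maitian.pipelines.MaitianPipeline": 300,
    "maitian.pipelines.DedupPipeline": 100,   # hypothetical pipeline
    "maitian.pipelines.ExportPipeline": 800,  # hypothetical pipeline
}


def pipeline_order(pipelines):
    """Return the class paths sorted by ascending priority value."""
    return sorted(pipelines, key=pipelines.get)


order = pipeline_order(ITEM_PIPELINES)
for path in order:
    print(ITEM_PIPELINES[path], path)
```

Here the item would reach the pipeline registered at 100 first, then the one at 300, then the one at 800.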

I hope this article helps with Python programming based on the Scrapy framework.

- Author -

hankleo
