
Several Ways to Retry Requests That Time Out in a Python Crawler (6 Methods)

Posted in Python on December 01, 2020

Method 1

import requests

headers = dict()
url = 'https://www.baidu.com'
try:
    proxies = None
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
except requests.RequestException:
    # logdebug('requests failed one time')
    try:
        proxies = None
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    except requests.RequestException:
        # logdebug('requests failed two times')
        print('requests failed two times')

Summary: the code is redundant. The more retries you nest, the more lines you need, although logging each failure is straightforward.

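Method 2

import requests

def requestDemo(url):
    headers = dict()
    trytimes = 3  # number of attempts
    for i in range(trytimes):
        try:
            proxies = None
            response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
            # Note: the status code may also be something else here, such as 302
            if response.status_code == 200:
                return response
        except requests.RequestException:
            # logdebug(f'requests failed {i + 1} time')
            print(f'requests failed {i + 1} time')
    return None  # every attempt failed

Summary: looping is clearly much simpler than the first method, and logging is still easy.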
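The loop can also be separated from the request itself, which makes the retry logic easy to try without a network. This is a minimal sketch, not part of the original article: `call_with_retries` and `flaky_fetch` are hypothetical names, and `flaky_fetch` merely simulates a request that times out twice before succeeding.

```python
import time


def call_with_retries(func, attempts=3, delay=0.0):
    """Call func() up to `attempts` times; return its result or re-raise the last error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # real code would catch requests.RequestException
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
            if attempt < attempts and delay:
                time.sleep(delay)  # optional pause before the next attempt
    raise last_error


# Hypothetical stand-in for requests.get: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "<html>ok</html>"


result = call_with_retries(flaky_fetch, attempts=3)
```

Re-raising the last error after the final attempt, rather than returning nothing, lets the caller tell a failed request apart from an empty response.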

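Method 3

import requests

def requestDemo(url, times=1):
    headers = dict()
    try:
        proxies = None
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
        html = response.text  # .text is a property, not a method
        # TODO: normal processing logic goes here
        return html
    except Exception:
        # logdebug(f'requests failed {times} time')
        trytimes = 3  # maximum number of attempts
        if times < trytimes:
            times += 1
            return requestDemo(url, times)
        return 'out of maxtimes'

Summary: recursion looks sophisticated, and an error raised anywhere in the processing code still triggers a retry. The downsides are that it is harder to follow and easy to get wrong, and that a try block wrapping a lot of code makes it hard to tell which failure caused the retry.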
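To see the recursive control flow in isolation, the request can be replaced with a function passed in as an argument. The sketch below rests on assumptions: `fetch_recursive` and `fake_fetch` are hypothetical names, and `fake_fetch` only simulates one failure followed by a success.

```python
def fetch_recursive(fetch, url, attempt=1, max_attempts=3):
    """Recursive retry: call fetch(url); on failure recurse until max_attempts is reached."""
    try:
        return fetch(url)
    except Exception as exc:
        if attempt >= max_attempts:
            return None  # give up; the caller must check for None
        print(f"attempt {attempt} failed ({exc}); retrying")
        return fetch_recursive(fetch, url, attempt + 1, max_attempts)


attempts_seen = []


def fake_fetch(url):
    """Hypothetical stand-in for a request: fails once, then succeeds."""
    attempts_seen.append(url)
    if len(attempts_seen) < 2:
        raise ConnectionError("simulated failure")
    return f"body of {url}"


page = fetch_recursive(fake_fetch, "https://example.com")

# A fetcher that always raises exhausts the attempts and yields None.
always_failing = fetch_recursive(lambda url: 1 / 0, "https://example.com")
```

The recursion depth is bounded by `max_attempts`, so a small limit keeps it far from Python's recursion limit.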

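Method 4

import requests

def retry(times):
    def wrapper(func):
        def inner_wrapper(*args, **kwargs):
            i = 0
            while i < times:
                try:
                    print(i)
                    return func(*args, **kwargs)
                except Exception:
                    # Log here; func.__name__ is the name of the decorated function
                    print("logdebug: {}()".format(func.__name__))
                    i += 1
            # If every attempt fails, the call falls through and returns None
        return inner_wrapper
    return wrapper

# retry must be defined before it is used as a decorator
@retry(3)  # number of attempts: 3
def requestDemo(url):
    headers = dict()
    proxies = None
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    html = response.text  # .text is a property, not a method
    # TODO: normal processing logic goes here
    return html

Summary: a decorator can be reused across many functions and is very convenient to use.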
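One possible refinement of the decorator above, offered as a sketch rather than as the author's method: restrict which exceptions are retried, preserve the wrapped function's name with `functools.wraps`, and re-raise after the last attempt instead of silently returning None. The `exceptions` parameter and the `unstable` function are assumptions added for illustration.

```python
import functools


def retry(times, exceptions=(Exception,)):
    """Retry the decorated function up to `times` times on the given exception types."""
    def decorator(func):
        @functools.wraps(func)  # keep func.__name__ and the docstring intact
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    print(f"{func.__name__}() failed on attempt {attempt}: {exc}")
                    if attempt == times:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator


state = {"calls": 0}


@retry(3, exceptions=(TimeoutError,))
def unstable():
    """Hypothetical function that times out twice before succeeding."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return "done"


value = unstable()
```

Exceptions that are not listed in `exceptions` propagate immediately, so a programming error such as a `NameError` is not retried.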

Method 5

#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import time
import json
from lxml import etree
import warnings
warnings.filterwarnings("ignore")

def get_xiaomi():
 try:
  # for n in range(5): # retry 5 times
  #  print("attempt " + str(n))
  for a in range(5): # retry 5 times
   print(a)
   url = "https://www.mi.com/"
   headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    # "Cookie": "xmuuid=XMGUEST-D80D9CE0-910B-11EA-8EE0-3131E8FF9940; Hm_lvt_c3e3e8b3ea48955284516b186acf0f4e=1588929065; XM_agreement=0; pageid=81190ccc4d52f577; lastsource=www.baidu.com; mstuid=1588929065187_5718; log_code=81190ccc4d52f577-e0f893c4337cbe4d|https%3A%2F%2Fwww.mi.com%2F; Hm_lpvt_c3e3e8b3ea48955284516b186acf0f4e=1588929099; mstz=||1156285732.7|||; xm_vistor=1588929065187_5718_1588929065187-1588929100964",
    "Host": "www.mi.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36"
   }
   response = requests.get(url,headers=headers,timeout=10,verify=False)
   html = etree.HTML(response.text)
   # print(html)
   result = etree.tostring(html)
   # print(result)
   print(result.decode("utf-8"))
   title = html.xpath('//head/title/text()')[0]
   print("title==",title)
   if "左左" in title:
   # print(response.status_code)
   # if response.status_code ==200:
    break
  return title

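 except Exception:
  # The try wraps the whole loop, so an exception ends all attempts;
  # only an unexpected title is actually retried
  result = "exception"
  return result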

if __name__ == '__main__':
 print(get_xiaomi())
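In the code above the try block wraps the whole loop, so an exception stops every remaining attempt. A hedged sketch of a variant that retries both on exceptions and on unexpected content follows. The names `fetch_until_valid` and `fake_title` are hypothetical, and the fake fetcher stands in for the request and XPath lookup so the example runs without a network.

```python
def fetch_until_valid(fetch, is_valid, attempts=5):
    """Call fetch() until is_valid(result) is true; an exception counts as a failed attempt."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
        except Exception as exc:
            print(f"attempt {attempt} raised {exc!r}")
            continue  # move on to the next attempt instead of giving up
        if is_valid(result):
            return result
        print(f"attempt {attempt} returned an unexpected result: {result!r}")
    return None  # no attempt produced a valid result


# Hypothetical page titles returned by three successive requests.
titles = iter(["", "Loading...", "Xiaomi Store"])


def fake_title():
    return next(titles)


title = fetch_until_valid(fake_title, lambda t: "Xiaomi" in t)
exhausted = fetch_until_valid(lambda: "x", lambda t: False, attempts=2)
```

Keeping the try inside the loop is the design choice that matters here: each attempt fails or succeeds on its own.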

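Method 6

The Python retry module retrying

import requests
from retrying import retry

# Set the maximum number of attempts
@retry(stop_max_attempt_number=5)
def get_proxies(self):
    r = requests.get('proxy address')  # placeholder for the proxy service URL
    print('fetching')
    # Raised deliberately to trigger the retry; the lines below never run
    raise Exception("exception")
    print('latest proxy = %s' % r.text)
    params = dict()
    if r and r.status_code == 200:
        proxy = str(r.content, encoding='utf-8')
        params['http'] = 'http://' + proxy
        params['https'] = 'https://' + proxy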

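# Each decorator below is a separate alternative; put one directly above the function to retry.

# Set the method's maximum delay in milliseconds, 100 ms by default (the total time spent running and retrying the method)
@retry(stop_max_attempt_number=5, stop_max_delay=50)
# With the value set to 50, the task ends before it has run 5 times!

# Add a fixed wait between attempts
@retry(stop_max_attempt_number=5, wait_fixed=2000)
# Wait a random amount of time
@retry(stop_max_attempt_number=5, wait_random_min=100, wait_random_max=2000)
# Increase the wait by a fixed amount after each call
@retry(stop_max_attempt_number=5, wait_incrementing_increment=1000)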

# Retry depending on the exception; a simple example first
def retry_if_io_error(exception):
 return isinstance(exception, IOError)

@retry(retry_on_exception=retry_if_io_error)
def read_a_file():
 with open("file", "r") as f:
  return f.read()

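If read_a_file raises an exception, the exception is passed to the function given as retry_on_exception, and the module checks whether that function returns True or False. If it returns True, the call is retried for the configured number of attempts before the exception is raised; if it returns False, the exception is raised immediately.

When I tested this myself, the web was full of posts copied from one another claiming that retry_on_exception takes a function that returns the specified exception, and that the call is retried when it does and exits when it does not. That is badly misleading! The function returns True or False.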
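The True/False behavior described above can be imitated in a few lines. This is a self-contained sketch of the semantics only, not the retrying library's implementation: `run_with_predicate` and the two failing functions are hypothetical names introduced for illustration.

```python
def run_with_predicate(func, should_retry, max_attempts=3):
    """Retry func() only while should_retry(exception) returns True."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # False from the predicate, or no attempts left: raise immediately
            if not should_retry(exc) or attempt == max_attempts:
                raise


def retry_if_io_error(exception):
    """Predicate: returns a boolean, not an exception."""
    return isinstance(exception, IOError)


counts = {"io": 0, "value": 0}


def raises_io_error():
    counts["io"] += 1
    raise IOError("simulated I/O failure")


def raises_value_error():
    counts["value"] += 1
    raise ValueError("not an I/O problem")


for task in (raises_io_error, raises_value_error):
    try:
        run_with_predicate(task, retry_if_io_error, max_attempts=3)
    except Exception as exc:
        print(f"{task.__name__}: gave up after {type(exc).__name__}")
```

The function that raises IOError runs three times before the error surfaces, while the one that raises ValueError runs once, because the predicate returns False for it.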

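The get_proxies function at the start of this section shows the module applied to fetching a proxy (it exists only to test the retrying module).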


- Author -

莫贞俊晗
