Python 3: urllib Usage Examples

Posted in Python on November 29, 2017

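urllib is Python's standard-library package for fetching URLs (Uniform Resource Locators). You can use it to retrieve remote data and save it. This article collects common urllib techniques: setting headers, using proxies, timeouts, authentication, and exception handling.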

1. Basic methods

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

url: the address to open
data: the data to submit in a POST request (a bytes object)
timeout: the timeout for the request, in seconds

Call urlopen() from the urllib.request module to fetch a page. The data returned by read() is of type bytes, so it must be decoded with decode() to get a str.

from urllib import request
response = request.urlopen(r'http://python.org/') # an http.client.HTTPResponse object, e.g. <http.client.HTTPResponse object at 0x00000000048BC908>
page = response.read()
page = page.decode('utf-8')
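
The hard-coded 'utf-8' above works for python.org, but not every page uses that encoding. As a minimal sketch, the charset can be read from the response's Content-Type header, with a fallback when the server declares none. The helper name fetch_text and the data: URLs in the usage lines are illustrative assumptions, not part of the original article.

```python
from urllib import request


def fetch_text(url, timeout=10, fallback="utf-8"):
    """Fetch url and decode the body using the charset the server declared."""
    with request.urlopen(url, timeout=timeout) as response:
        body = response.read()  # bytes
        # response.headers is an email.message.Message; get_content_charset()
        # returns None when the Content-Type header carries no charset.
        charset = response.headers.get_content_charset() or fallback
    return body.decode(charset, errors="replace")


# data: URLs keep the demo offline; swap in 'http://python.org/' to hit the network.
print(fetch_text("data:text/plain;charset=utf-8,hello%20urllib"))
print(fetch_text("data:text/plain,abc"))  # no charset declared, fallback is used
```

The with block also closes the response, which the shorter example above leaves to the garbage collector.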

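The object returned by urlopen provides these methods:

read(), readline(), readlines(), fileno(), close(): operate on the HTTPResponse data
info(): returns an HTTPMessage object holding the headers sent by the remote server
getcode(): returns the HTTP status code. For an HTTP request, 200 means the request succeeded and 404 means the URL was not found
geturl(): returns the URL of the resource actually retrieved, which differs from the requested URL after a redirect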
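
The methods listed above can be exercised together as follows. This is only a sketch: the helper name describe is hypothetical, and a data: URL is used so that it runs without network access.

```python
from urllib import request


def describe(response):
    """Collect the metadata exposed by the object urlopen() returns."""
    headers = response.info()  # header block sent with the response
    return {
        "status": response.getcode(),  # e.g. 200 for a successful HTTP request
        "url": response.geturl(),      # URL actually retrieved
        "content_type": headers.get_content_type(),
        "headers": dict(headers.items()),
    }


# A data: URL has no HTTP status, so "status" is None here.
# An http:// URL would report a real status code instead.
with request.urlopen("data:text/html,hello") as response:
    summary = describe(response)

print(summary["url"], summary["content_type"], summary["status"])
```

On Python 3.9 and later the same values are also available as the attributes response.status, response.url and response.headers.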

1. Reading a web page

import urllib.request 
response = urllib.request.urlopen('http://python.org/') 
html = response.read()

2. Using Request

urllib.request.Request(url, data=None, headers={}, method=None)

Wrap the request in a Request() object, then pass it to urlopen() to fetch the page.

import urllib.request 
req = urllib.request.Request('http://python.org/') 
response = urllib.request.urlopen(req) 
the_page = response.read()
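
The headers and method parameters from the signature above are what make Request worth using. A minimal sketch follows; the User-Agent value and the choice of HEAD are assumptions picked only for illustration, and nothing is sent over the network.

```python
import urllib.request

# Hypothetical header value and method, chosen only for illustration.
req = urllib.request.Request(
    "http://python.org/",
    headers={"User-Agent": "urllib-example/0.1"},
    method="HEAD",  # headers only; the default is GET, or POST when data is set
)

# Nothing has been sent yet: a Request only describes the call.
print(req.full_url)
print(req.host)
print(req.get_method())
print(req.get_header("User-agent"))  # Request stores header names capitalized like this
```

Passing req to urllib.request.urlopen() is what actually performs the request.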

3. Sending data, using a Zhihu login as the example

'''
Created on May 31, 2016

@author: gionee
'''
import gzip 
import re 
import urllib.request 
import urllib.parse 
import http.cookiejar 
 
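def ungzip(data):
  try:
    print("Trying to decompress...")
    data = gzip.decompress(data)
    print("Decompression finished")
  except (OSError, EOFError):
    # not valid gzip data: return the input unchanged
    print("Data is not compressed, nothing to decompress")

  return data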
     
def getXSRF(data):
  # non-greedy match, so the capture stops at the closing quote of value
  cer = re.compile(r'name="_xsrf" value="(.*?)"')
  strlist = cer.findall(data)
  return strlist[0]
 
def getOpener(head): 
  # cookie handling
  cj = http.cookiejar.CookieJar() 
  pro = urllib.request.HTTPCookieProcessor(cj) 
  opener = urllib.request.build_opener(pro) 
  header = [] 
  for key,value in head.items(): 
    elem = (key,value) 
    header.append(elem) 
  opener.addheaders = header 
  return opener 
# header values can be captured with Firebug
header = { 
  'Connection': 'Keep-Alive', 
  'Accept': 'text/html, application/xhtml+xml, */*', 
  'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3', 
  'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0', 
  'Accept-Encoding': 'gzip, deflate', 
  'Host': 'www.zhihu.com', 
  'DNT': '1' 
} 
 
url = 'http://www.zhihu.com/' 
opener = getOpener(header) 
op = opener.open(url) 
data = op.read() 
data = ungzip(data) 
_xsrf = getXSRF(data.decode()) 
 
url += "login/email" 
email = "your account email"
password = "your password"
postDict = { 
  '_xsrf': _xsrf, 
  'email': email, 
  'password': password, 
  'rememberme': 'y'  
} 
postData = urllib.parse.urlencode(postDict).encode() 
op = opener.open(url,postData) 
data = op.read() 
data = ungzip(data) 
 
print(data.decode())
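
Two steps of the script above can be isolated and checked without a network connection: building the POST body, and deciding whether to decompress. The sketch below is not the original author's code. The helper names, the endpoint and the credentials are hypothetical, and it checks the Content-Encoding header instead of relying on try/except.

```python
import gzip
import urllib.parse
import urllib.request


def build_form_request(url, fields):
    """Return a Request whose body is the URL-encoded form of `fields`."""
    body = urllib.parse.urlencode(fields).encode("ascii")  # data must be bytes
    req = urllib.request.Request(url, data=body)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def decode_body(raw, content_encoding):
    """Decompress `raw` only when the server says it used gzip."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(raw)
    return raw


# Hypothetical endpoint and credentials, for illustration only.
req = build_form_request(
    "http://www.example.com/login",
    {"email": "user@example.com", "password": "secret", "rememberme": "y"},
)
print(req.get_method())  # a Request that carries data is sent as POST
print(req.data)
print(decode_body(gzip.compress(b"hi"), "gzip"))
```

With the opener from the script above, the second helper would be called as decode_body(op.read(), op.headers.get("Content-Encoding")).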

4. HTTP errors

import urllib.request
import urllib.error

req = urllib.request.Request('http://www.lz881228.blog.163.com')
try:
  urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
  # HTTPError carries the status code and the body of the error response
  print(e.code)
  print(e.read().decode("utf8"))
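
An HTTPError behaves like a response object as well as an exception, so the reason phrase and headers are available alongside code and read(). In the sketch below the error is built by hand so that it runs offline; the URL, status and body are made-up values.

```python
import io
import urllib.error
from email.message import Message

# Build an HTTPError by hand so the sketch runs offline; urlopen() raises
# the same class when a server answers with a 4xx or 5xx status.
headers = Message()
headers["Content-Type"] = "text/plain; charset=utf-8"
error = urllib.error.HTTPError(
    "http://www.example.com/missing",  # hypothetical URL
    404,
    "Not Found",
    headers,
    io.BytesIO(b"page not found"),
)

try:
    raise error
except urllib.error.HTTPError as e:
    status = e.code                              # numeric status
    reason = e.reason                            # reason phrase
    content_type = e.headers.get_content_type()  # headers of the error response
    body = e.read().decode("utf-8")              # error page sent by the server
    print(status, reason, content_type, body)
```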

5. Exception handling

from urllib.request import Request, urlopen 
from urllib.error import URLError, HTTPError 
 
req = Request("http://www.abc.com/") 
try: 
  response = urlopen(req) 
except HTTPError as e: 
  print("The server couldn't fulfill the request.") 
  print('Error code: ', e.code) 
except URLError as e: 
  print('We failed to reach a server.') 
  print('Reason: ', e.reason) 
else: 
  print("good!") 
  print(response.read().decode("utf8"))
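
The order of the except clauses above matters because HTTPError is a subclass of URLError: with the clauses reversed, the URLError branch would catch HTTP errors too. The same pattern can be wrapped in a function, sketched here with the hypothetical name try_fetch and URLs chosen so that no network access is needed.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def try_fetch(url, timeout=5):
    """Return (ok, detail) instead of raising, mirroring the branches above."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return True, response.read()
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError.
        return False, "HTTP error %d" % e.code
    except URLError as e:
        # No HTTP response at all: DNS failure, refused connection, bad scheme...
        return False, "unreachable: %s" % e.reason


# Both calls stay offline: a data: URL succeeds, an unknown scheme fails.
print(try_fetch("data:text/plain,good!"))
print(try_fetch("nosuchscheme://www.example.com/"))
```

A timeout that occurs while the response is being read may surface as socket.timeout (TimeoutError on recent Python versions) rather than URLError, so code that sets timeouts may need an extra except clause for it.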

6. HTTP authentication

import urllib.request 
 
# create a password manager 
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm() 
 
# Add the username and password. 
# If we knew the realm, we could use it instead of None. 
top_level_url = "https://3water.com/" 
password_mgr.add_password(None, top_level_url, 'rekfan', 'xxxxxx') 
 
handler = urllib.request.HTTPBasicAuthHandler(password_mgr) 
 
# create "opener" (OpenerDirector instance) 
opener = urllib.request.build_opener(handler) 
 
# use the opener to fetch a URL 
a_url = "https://3water.com/" 
x = opener.open(a_url) 
print(x.read()) 
 
# Install the opener. 
# Now all calls to urllib.request.urlopen use our opener. 
urllib.request.install_opener(opener) 
a = urllib.request.urlopen(a_url).read().decode('utf8') 
 
print(a)
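
The password manager decides which URLs receive the credentials by prefix matching, and this can be checked without contacting a server. A minimal sketch follows; the host names are placeholders, and the header is assembled by hand only to show what Basic authentication transmits.

```python
import base64
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Placeholder site and credentials, for illustration only.
password_mgr.add_password(None, "https://www.example.com/", "rekfan", "xxxxxx")

# Credentials apply to any URL below the registered prefix...
inside = password_mgr.find_user_password(None, "https://www.example.com/docs/page")
# ...but not to other hosts.
outside = password_mgr.find_user_password(None, "https://other.example.org/")
print(inside, outside)

# The Authorization header value used by HTTP Basic authentication:
user, password = inside
token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode("ascii")
auth_header = "Basic " + token
print(auth_header)
```

Basic authentication only base64-encodes the credentials and does not encrypt them, which is one reason to use it over HTTPS, as the example URL above does.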

7. Using a proxy

import urllib.request 
 
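# ProxyHandler maps the scheme of the requested URL to a proxy URL, so the key must be
# 'http' or 'https'; urllib has no built-in SOCKS support, and the original 'sock5' key
# would never match. This assumes an HTTP proxy is listening on localhost:1080.
proxy_support = urllib.request.ProxyHandler({'http': 'http://localhost:1080'}) 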
opener = urllib.request.build_opener(proxy_support) 
urllib.request.install_opener(opener) 
 
a = urllib.request.urlopen("http://www.baidu.com").read().decode("utf8") 
print(a)
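
The proxy mapping can be inspected without sending a request. The sketch below assumes an HTTP proxy on localhost:1080, as in the example above, and builds openers without calling install_opener() so that the change stays local to the code that uses them.

```python
import urllib.request

# Keys are the schemes of the URLs being requested, values are proxy URLs.
# localhost:1080 is assumed to be an HTTP proxy.
proxy_support = urllib.request.ProxyHandler({
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
})
# Use this opener directly (proxied.open(url)) instead of installing it globally.
proxied = urllib.request.build_opener(proxy_support)

# An empty mapping disables proxies, including those from environment variables.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))

print(proxy_support.proxies)
print(urllib.request.getproxies())  # proxies urllib would pick up by default
```

When no ProxyHandler is supplied, urllib reads proxy settings from the environment (for example the http_proxy variable), which is what getproxies() reports.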

8. Timeouts

import socket 
import urllib.request 
 
# timeout in seconds 
timeout = 2 
socket.setdefaulttimeout(timeout) 
 
# this call to urllib.request.urlopen now uses the default timeout 
# we have set in the socket module 
req = urllib.request.Request('https://3water.com/') 
a = urllib.request.urlopen(req).read() 
print(a)
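
socket.setdefaulttimeout() changes the timeout for every socket created afterwards in the process. Two narrower options are sketched below: a context manager that restores the previous default, and the timeout argument of urlopen() from the signature in section 1. The names default_timeout and fetch are hypothetical, and the data: URL keeps the demo offline.

```python
import contextlib
import socket
import urllib.request


@contextlib.contextmanager
def default_timeout(seconds):
    """Temporarily change the process-wide socket timeout, then restore it."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)


def fetch(url, timeout=2):
    """Per-call alternative: the timeout applies to this request only."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


before = socket.getdefaulttimeout()
with default_timeout(2):
    inside = socket.getdefaulttimeout()
after = socket.getdefaulttimeout()
print(before, inside, after)
print(fetch("data:text/plain,ok"))
```

For a single request, passing timeout to urlopen() is usually the simpler choice, since it leaves unrelated sockets untouched.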

That is all for this article. We hope it helps with your studies, and we hope you will continue to support 三水点靠木.

- Author -

Data&Truth
