Python Code Example: Multi-threaded File Download


Posted in Python on June 01, 2014

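A simple multi-threaded downloader needs to handle three things:
1. File size: read it from the response header; for example, "Content-Length: 911" means the file is 911 bytes.
2. Task splitting: assign each thread its own block of the file by adding a header such as "Range: bytes=300-400" to the request (this fetches bytes 300 through 400, both ends inclusive). Note that the valid byte range for a file is [0, size-1].
3. Merging the downloaded blocks: each thread saves its block to a temporary file; once all threads have finished, the temporary files are written in order into a single final file.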
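The first two points can be sketched in a few lines. This is only an illustration written for Python 3's standard library (the listing in this post targets Python 2); the names `split_ranges` and `build_range_request` and the example.com URL are made up for the sketch and are not part of paxel.py. No request is actually sent.

```python
import urllib.request


def split_ranges(total_size, parts):
    """Split [0, total_size - 1] into `parts` inclusive (start, end) byte ranges."""
    block = total_size // parts
    ranges = [(i * block, (i + 1) * block - 1) for i in range(parts - 1)]
    # The last range absorbs the remainder so the final byte is total_size - 1.
    ranges.append((block * (parts - 1), total_size - 1))
    return ranges


def build_range_request(url, start, end):
    """Build (but do not send) a request for bytes start..end, both inclusive."""
    headers = {"Range": "bytes=%d-%d" % (start, end)}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # 911 is the Content-Length value used as the example in the list above.
    for start, end in split_ranges(911, 4):
        # Hypothetical URL, used only to show the header each thread would send.
        req = build_range_request("http://example.com/file.bin", start, end)
        print(req.get_header("Range"))
```

Because both ends of a range are inclusive, the sizes of the blocks add up to exactly the Content-Length value, and the last block ends at size-1.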

Implementation:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# filename: paxel.py
# FROM: http://3water.com/code/view/58/full/
# Jay modified it a little and saved it for potential future use.
'''It is a multi-thread downloading tool
    It was developed following axel.
        Author: volans
        E-mail: volansw [at] gmail.com
'''
import sys
import os
import time
import urllib
from threading import Thread
# in case you want to use http_proxy
local_proxies = {'http': 'http://131.139.58.200:8080'}
 
class AxelPython(Thread, urllib.FancyURLopener):
    '''Multi-thread downloading class.
        run() is a virtual method of Thread.
    '''
    def __init__(self, threadname, url, filename, ranges=0, proxies={}):
        Thread.__init__(self, name=threadname)
        urllib.FancyURLopener.__init__(self, proxies)
        self.name = threadname
        self.url = url
        self.filename = filename
        self.ranges = ranges
        self.downloaded = 0
    def run(self):
        '''virtual function in Thread'''
        try:
            self.downloaded = os.path.getsize(self.filename)
        except OSError:
            #print 'never downloaded'
            self.downloaded = 0
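        # rebuild start point
        self.startpoint = self.ranges[0] + self.downloaded
        # This part is already complete. ranges[1] is inclusive, so the part
        # is finished only once the start point has moved past it.
        if self.startpoint > self.ranges[1]:
            print 'Part %s has already been downloaded.' % self.filename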
            return
        self.oneTimeSize = 16384  # 16kByte/time
        print 'task %s will download from %d to %d' % (self.name, self.startpoint, self.ranges[1])
        self.addheader("Range", "bytes=%d-%d" % (self.startpoint, self.ranges[1]))
        self.urlhandle = self.open(self.url)
        data = self.urlhandle.read(self.oneTimeSize)
        while data:
            filehandle = open(self.filename, 'ab+')
            filehandle.write(data)
            filehandle.close()
            self.downloaded += len(data)
            #print "%s" % (self.name)
            #progress = u'\r...'
            data = self.urlhandle.read(self.oneTimeSize)
 
def GetUrlFileSize(url, proxies={}):
    '''Return the size in bytes reported by the Content-Length header, or 0.'''
    urlHandler = urllib.urlopen(url, proxies=proxies)
    # Look the header up by name (case-insensitive) rather than matching
    # any header line that happens to contain 'Length'.
    length = urlHandler.info().getheader('Content-Length', 0)
    urlHandler.close()
    return int(length)
 
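def SpliteBlocks(totalsize, blocknumber):
    '''Split [0, totalsize - 1] into blocknumber inclusive (start, end) ranges.'''
    # explicit floor division: byte offsets must be integers
    blocksize = totalsize // blocknumber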
    ranges = []
    for i in range(0, blocknumber - 1):
        ranges.append((i * blocksize, i * blocksize + blocksize - 1))
    ranges.append((blocksize * (blocknumber - 1), totalsize - 1))
    return ranges
 
def islive(tasks):
    for task in tasks:
        if task.isAlive():
            return True
    return False
 
def paxel(url, output, blocks=6, proxies=local_proxies):
    '''Download url to output using `blocks` threads, one byte range each.
    '''
    size = GetUrlFileSize(url, proxies)
    ranges = SpliteBlocks(size, blocks)
    threadname = ["thread_%d" % i for i in range(0, blocks)]
    filename = ["tmpfile_%d" % i for i in range(0, blocks)]
    tasks = []
    for i in range(0, blocks):
        # pass the proxies on so the worker threads use the same proxy settings
        task = AxelPython(threadname[i], url, filename[i], ranges[i], proxies)
        task.setDaemon(True)
        task.start()
        tasks.append(task)
    time.sleep(2)
    while islive(tasks):
        downloaded = sum([task.downloaded for task in tasks])
        process = downloaded / float(size) * 100
        show = u'\rFilesize:%d Downloaded:%d Completed:%.2f%%' % (size, downloaded, process)
        sys.stdout.write(show)
        sys.stdout.flush()
        time.sleep(0.5)
    filehandle = open(output, 'wb+')
    for i in filename:
        f = open(i, 'rb')
        filehandle.write(f.read())
        f.close()
        try:
            os.remove(i)
        except OSError:
            # leaving a temporary file behind is not fatal
            pass
    filehandle.close()
if __name__ == '__main__':
    url = 'http://dldir1.qq.com/qqfile/QQforMac/QQ_V3.1.1.dmg'
    output = 'download.file'
    paxel(url, output, blocks=4, proxies={})
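The merge step at the end of paxel() reads each temporary file fully into memory with f.read() before writing it out. For large downloads, a streaming copy may be preferable. The following is a minimal sketch of that idea, written for Python 3; `merge_parts` and its `remove_parts` flag are hypothetical names for this illustration, not part of the original script.

```python
import os
import shutil
import tempfile


def merge_parts(part_paths, output_path, remove_parts=True):
    """Concatenate the part files, in the order given, into output_path."""
    with open(output_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                # copyfileobj streams in chunks instead of loading a whole part.
                shutil.copyfileobj(part, out)
            if remove_parts:
                os.remove(path)


if __name__ == "__main__":
    # Build three small stand-in part files in a scratch directory.
    workdir = tempfile.mkdtemp()
    parts = []
    for i, payload in enumerate([b"abc", b"def", b"gh"]):
        path = os.path.join(workdir, "tmpfile_%d" % i)
        with open(path, "wb") as f:
            f.write(payload)
        parts.append(path)
    merged = os.path.join(workdir, "download.file")
    merge_parts(parts, merged)
    with open(merged, "rb") as f:
        print(f.read())
```

The order of `part_paths` determines the order of the bytes in the output, so the list must follow the order of the byte ranges, as the tmpfile_0, tmpfile_1, ... naming in paxel() does.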