Python 批量刷博客园访问量脚本过程解析


Posted in Python onAugust 30, 2019

今早无聊。。。7点起来突然想写个刷访问量的。。那就动手吧

仅供测试,不建议刷访问量哦~~

很简单的思路,第一步提取代理ip,第二步模拟访问。

提取HTTP代理IP

网上很多收费的代理和免费的代理IP

如:

Python 批量刷博客园访问量脚本过程解析

无论哪个网站,我们需要的就是爬取上面的ip和端口号,整理到一起。

具体的网站根据具体的结构爬取 比如上面那个网站,ip和端口在td标签

Python 批量刷博客园访问量脚本过程解析

这里利用bs4爬取即可。贴上脚本

##获取代理ip
def Get_proxy_ip():
  print("==========批量提取ip刷博客园访问量 By 卿=========")
  print("     Blogs:https://www.cnblogs.com/-qing-/")
  print("     Started!   ")
  global proxy_list
  proxy_list = []
  url = "https://www.kuaidaili.com/free/inha/"
  headers = {
      "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      "Accept-Encoding":"gzip, deflate, sdch, br",
      "Accept-Language":"zh-CN,zh;q=0.8",
      "Cache-Control":"max-age=0",
      "Connection":"keep-alive",
      "Cookie":"channelid=0; sid=1561681200472193; _ga=GA1.2.762166746.1561681203; _gid=GA1.2.971407760.1561681203; _gat=1; Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203; Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203",
      "Host":"www.kuaidaili.com",
      "Upgrade-Insecure-Requests":"1",
      "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0",
      "Referrer Policy":"no-referrer-when-downgrade",
      }
  for i in range(1,100):
    url = url = "https://www.kuaidaili.com/free/inha/"+str(i)
    html = requests.get(url = url,headers = headers).content
    soup = BeautifulSoup(html,'html.parser')
    ip_list = '';
    port_list = '';
    protocol_list = '';
    for ip in soup.find_all('td'):
      if "IP" in ip.get('data-title') :
        ip_list = ip.get_text()##获取ip       
      if "PORT" in ip.get('data-title'):
        port_list = ip.get_text()##获取port
      if ip_list != '' and port_list != '':
        proxy = ip_list+":"+port_list
        ip_list = '';
        port_list = '';
        proxy_list.append(proxy)
    iv_main()
    time.sleep(2)
    proxy_list = []

这样就把 提取的ip和端口放到列表里

模拟访问刷博客园文章

这里就很简单了 ,遍历上面那个代理ip的列表,使用requests模块取访问就是了

def iv_main():
  proxies = {}
  requests.packages.urllib3.disable_warnings()
  #proxy_ip = random.choice(proxy_list)
  url = 'https://www.cnblogs.com/-qing-/p/11080845.html'
  for proxy_ip in proxy_list:
    headers2 = {
      'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'accept-encoding':'gzip, deflate, sdch, br',
      'accept-language':'zh-CN,zh;q=0.8',
      'cache-control':'max-age=0',
      'cookie':'__gads=ID=8c6fd85d91262bb1:T=1561554219:S=ALNI_MZwz0CMKQJK-L19DrX5DPDtYvp63Q; _gat=1; _ga=GA1.2.359634670.1561535095; _gid=GA1.2.1087331661.1561535095',
      'if-modified-since':'Fri, 28 Jun 2019 02:10:23 GMT',
      'referer':'https://www.cnblogs.com/',
      'upgrade-insecure-requests':'1',
      'user-agent':random.choice(user_agent_list),
      }
    proxies['HTTP'] = proxy_ip
    #user_agent = random.choice(user_agent_list)
    try:
      r = requests.get(url,headers=headers2,proxies=proxies,verify=False) #verify是否验证服务器的SSL证书
      print("[*]"+proxy_ip+"访问成功!")
    except:
      print("[-]"+proxy_ip+"访问失败!")

最好带上随机的ua请求头

user_agent_list = [
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3",
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
  "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]

优化整合

这里可以稍微优化下,加入队列线程优化(虽然python这个没啥用)

最终代码整合:

# -*- coding:utf-8 -*-
#By 卿 
#Blog:https://www.cnblogs.com/-qing-/

import requests
from bs4 import BeautifulSoup
import re
import time
import random
import threading

print("==========批量提取ip刷博客园访问量 By 卿=========")
print("     Blogs:https://www.cnblogs.com/-qing-/")
print("     Started!   ")
user_agent_list = [
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/45.0.2454.85 Safari/537.36 115Browser/6.0.3",
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
  "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
  "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]
def iv_main():
  proxies = {}
  requests.packages.urllib3.disable_warnings()
  #proxy_ip = random.choice(proxy_list)
  url = 'https://www.cnblogs.com/-qing-/p/11080845.html'
  for proxy_ip in proxy_list:
    headers2 = {
      'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'accept-encoding':'gzip, deflate, sdch, br',
      'accept-language':'zh-CN,zh;q=0.8',
      'cache-control':'max-age=0',
      'cookie':'__gads=ID=8c6fd85d91262bb1:T=1561554219:S=ALNI_MZwz0CMKQJK-L19DrX5DPDtYvp63Q; _gat=1; _ga=GA1.2.359634670.1561535095; _gid=GA1.2.1087331661.1561535095',
      'if-modified-since':'Fri, 28 Jun 2019 02:10:23 GMT',
      'referer':'https://www.cnblogs.com/',
      'upgrade-insecure-requests':'1',
      'user-agent':random.choice(user_agent_list),
      }
    proxies['HTTP'] = proxy_ip
    #user_agent = random.choice(user_agent_list)
    try:
      r = requests.get(url,headers=headers2,proxies=proxies,verify=False) #verify是否验证服务器的SSL证书
      print("[*]"+proxy_ip+"访问成功!")
    except:
      print("[-]"+proxy_ip+"访问失败!")

##获取代理ip
def Get_proxy_ip():
  global proxy_list
  proxy_list = []
  url = "https://www.kuaidaili.com/free/inha/"
  headers = {
      "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      "Accept-Encoding":"gzip, deflate, sdch, br",
      "Accept-Language":"zh-CN,zh;q=0.8",
      "Cache-Control":"max-age=0",
      "Connection":"keep-alive",
      "Cookie":"channelid=0; sid=1561681200472193; _ga=GA1.2.762166746.1561681203; _gid=GA1.2.971407760.1561681203; _gat=1; Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203; Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31=1561681203",
      "Host":"www.kuaidaili.com",
      "Upgrade-Insecure-Requests":"1",
      "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0",
      "Referrer Policy":"no-referrer-when-downgrade",
      }
  for i in range(1,100):
    url = url = "https://www.kuaidaili.com/free/inha/"+str(i)
    html = requests.get(url = url,headers = headers).content
    soup = BeautifulSoup(html,'html.parser')
    ip_list = '';
    port_list = '';
    protocol_list = '';
    for ip in soup.find_all('td'):
      if "IP" in ip.get('data-title') :
        ip_list = ip.get_text()##获取ip       
      if "PORT" in ip.get('data-title'):
        port_list = ip.get_text()##获取port
      if ip_list != '' and port_list != '':
        proxy = ip_list+":"+port_list
        ip_list = '';
        port_list = '';
        proxy_list.append(proxy)
    iv_main()
    time.sleep(2)
    proxy_list = []    
th=[]
th_num=10
for x in range(th_num):
    t=threading.Thread(target=Get_proxy_ip)
    th.append(t)
for x in range(th_num):
    th[x].start()
for x in range(th_num):
    th[x].join()

结果

Python 批量刷博客园访问量脚本过程解析

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python实现从一组颜色中找出与给定颜色最接近颜色的方法
Mar 19 Python
python将unicode转为str的方法
Jun 21 Python
python 对多个csv文件分别进行处理的方法
Jan 07 Python
详解用Python练习画个美队盾牌
Mar 23 Python
pandas如何处理缺失值
Jul 31 Python
pycharm配置git(图文教程)
Aug 16 Python
python GUI库图形界面开发之PyQt5开发环境配置与基础使用
Feb 25 Python
python logging通过json文件配置的步骤
Apr 27 Python
如何利用Python 进行边缘检测
Oct 14 Python
python读写数据读写csv文件(pandas用法)
Dec 14 Python
python源文件的字符编码知识点详解
Mar 04 Python
python 常用的异步框架汇总整理
Jun 18 Python
快速解决docker-py api版本不兼容的问题
Aug 30 #Python
Python 使用 Pillow 模块给图片添加文字水印的方法
Aug 30 #Python
python pillow模块使用方法详解
Aug 30 #Python
docker-py 用Python调用Docker接口的方法
Aug 30 #Python
tesserocr与pytesseract模块的使用方法解析
Aug 30 #Python
Django获取应用下的所有models的例子
Aug 30 #Python
Django自带日志 settings.py文件配置方法
Aug 30 #Python
You might like
一个ORACLE分页程序,挺实用的.
2006/10/09 PHP
PHP实现恶意DDOS攻击避免带宽占用问题方法
2015/05/27 PHP
[原创]php正则删除html代码中class样式属性的方法
2017/05/24 PHP
让你的PHP,APACHE,NGINX支持大文件上传
2021/03/09 PHP
JavaScrip单线程引擎工作原理分析
2010/09/04 Javascript
JQuery中使用Ajax赋值给全局变量异常的解决方法
2014/01/10 Javascript
判断iframe里的页面是否加载完成
2014/06/06 Javascript
jQuery实现在列表的首行添加数据
2015/05/19 Javascript
AngularJS 在同一个界面启动多个ng-app应用模块详解
2016/12/20 Javascript
前端开发必知的15个jQuery小技巧
2017/01/22 Javascript
Vue的Flux框架之Vuex状态管理器
2017/07/30 Javascript
Angular ElementRef简介及其使用
2018/10/01 Javascript
Vue 递归多级菜单的实例代码
2019/05/05 Javascript
微信小程序使用canvas自适应屏幕画海报并保存图片功能
2019/07/25 Javascript
基于layui轮播图满屏是高度自适应的解决方法
2019/09/16 Javascript
vue如何使用async、await实现同步请求
2019/12/09 Javascript
Vue.js 中制作自定义选择组件的代码附演示demo
2020/02/28 Javascript
JavaScript定时器使用方法详解
2020/03/26 Javascript
javascript设计模式 ? 迭代器模式原理与用法实例分析
2020/04/17 Javascript
详解vite+ts快速搭建vue3项目以及介绍相关特性
2021/02/25 Vue.js
[02:34]肉山说——泡妞篇
2014/09/16 DOTA
[00:12]2018DOTA2亚洲邀请赛 Somnus丶M出阵单挑
2018/04/06 DOTA
简单介绍Python中的len()函数的使用
2015/04/07 Python
对Python实现累加函数的方法详解
2019/01/23 Python
python的内存管理和垃圾回收机制详解
2019/05/18 Python
使用python进行广告点击率的预测的实现
2019/07/04 Python
python实现邮件自动发送
2019/08/10 Python
Python爬虫库requests获取响应内容、响应状态码、响应头
2020/01/25 Python
Python虚拟环境virtualenv创建及使用过程图解
2020/12/08 Python
美国知名生活购物网站:Goop
2017/11/03 全球购物
阿里巴巴的Oracle DBA笔试题答案-SQL tuning类
2016/04/03 面试题
remote接口和home接口主要作用
2013/05/15 面试题
小学新学期寄语
2014/04/02 职场文书
高中班主任评语
2014/12/30 职场文书
2016年11月份红领巾广播稿
2015/12/21 职场文书
Java 垃圾回收超详细讲解记忆集和卡表
2022/04/08 Java/Android