Querying Baidu Keyword Rankings with Python

Posted in Python on March 30, 2014

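This is a simple Python function for looking up where a domain ranks on Baidu for a given keyword. Its main features:
1. Random User-Agent: each request picks a UA string at random from a fixed list.
2. Simple to use: just call getRank(keyword, domain).
3. Encoding conversion: the keyword is decoded as UTF-8 with a GB2312 fallback, so encoding should not be a problem.
4. Rich results: besides the rank, it returns the result's title, URL, and cache (snapshot) date, which covers typical SEO needs.
5. Easy to wrap into a tool or to use on its own.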
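
Features 1 and 3 can be sketched in isolation as follows. This is a minimal Python 3 illustration, not the article's own code: the names `UA_LIST`, `pick_headers`, and `decode_any` are hypothetical, and the list holds only two of the original User-Agent strings, with the `+` separators written as spaces (an assumption about the intended form).

```python
import random

# Two User-Agent strings taken from the article's list; '+' replaced by spaces (assumption).
UA_LIST = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0",
]


def pick_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(UA_LIST)}


def decode_any(raw):
    """Decode bytes as UTF-8, falling back to GB2312; pass str through unchanged."""
    if isinstance(raw, str):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy GB2312 encoding.
        return raw.decode("gb2312")
```

Catching only `UnicodeDecodeError` keeps unrelated errors (for example, passing a non-bytes object) visible instead of silently retrying with another codec.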

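The implementation is single-threaded and therefore slow; feel free to adapt it to your own needs. The listing below is written for Python 2 and BeautifulSoup 3.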
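
One way to lift the single-thread limit is a small thread pool. The sketch below is written for Python 3 and makes several assumptions: `rank_many` is a hypothetical helper, `get_rank` stands for any callable with the same `(keyword, domain)` signature as getRank, and the pool size of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_many(keywords, domain, get_rank, max_workers=4):
    """Call get_rank(keyword, domain) for every keyword using a thread pool.

    Returns a dict mapping each keyword to its result, in input order.
    """
    keywords = list(keywords)  # allow any iterable and reuse it for zip below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = pool.map(lambda kw: get_rank(kw, domain), keywords)
        return dict(zip(keywords, results))
```

Because the work is dominated by waiting on HTTP responses, threads are a reasonable fit here despite the interpreter lock.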

#coding=utf-8
import requests
import BeautifulSoup
import re
import random
def decodeAnyWord(w):   #decode a byte string as UTF-8, falling back to GB2312
    try:
        return w.decode('utf-8')
    except UnicodeDecodeError:
        return w.decode('gb2312')
def createURL(checkWord):   #create baidu URL with search words
    checkWord = checkWord.strip()
    checkWord = checkWord.replace(' ', '+').replace('\n', '')
    baiduURL = 'http://www.baidu.com/s?wd=%s&rn=100' % checkWord
    return baiduURL
def getContent(baiduURL):   #get the content of the serp
    uaList = ['Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322;+TencentTraveler)',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729)',
    'Mozilla/5.0+(Windows+NT+5.1)+AppleWebKit/537.1+(KHTML,+like+Gecko)+Chrome/21.0.1180.89+Safari/537.1',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)',
    'Mozilla/5.0+(Windows+NT+6.1;+rv:11.0)+Gecko/20100101+Firefox/11.0',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+SV1)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+GTB7.1;+.NET+CLR+2.0.50727)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+KB974489)']
    headers = {'User-Agent': random.choice(uaList)}
    r = requests.get(baiduURL, headers = headers)
    return r.content
def getLastURL(rawurl): #get final URL while there're redirects
    r = requests.get(rawurl)
    return r.url
def getAtext(atext):    #get the text with <a> and </a>
    pat = re.compile(r'<a .*?>(.*?)</a>')
    match = pat.findall(atext.replace('\n', ''))
    pureText = match[0].replace('<em>', '').replace('</em>', '')
    return pureText.replace('\n', '')
def getCacheDate(t):    #get the date of cache
    pat = re.compile(r'<span class="g">.*?(\d{4}-\d{1,2}-\d{1,2}) </span>')
    match = pat.findall(t)
    cacheDate = match[0]
    return cacheDate
def getRank(checkWord, domain): #main line
    checkWord = checkWord.replace('\n', '')
    checkWord = decodeAnyWord(checkWord)
    baiduURL = createURL(checkWord)
    cont = getContent(baiduURL)
    soup = BeautifulSoup.BeautifulSoup(cont)
    results = soup.findAll('table', {'class': 'result'})    #find all results in this page
    for result in results:
        checkData = unicode(result.find('span', {'class': 'g'}))
        if re.compile(r'^[^/]*%s.*?' % re.escape(domain)).match(checkData.replace('<b>', '').replace('</b>', '')): #match the domain before the first '/', with its dots escaped
            nowRank = result['id']  #get the rank if match the domain info
            resLink = result.find('h3').a
            resURL = resLink['href']
            domainURL = getLastURL(resURL)  #get the target URL
            resTitle = getAtext(unicode(resLink))   #get the title of the target page
            rescache = result.find('span', {'class': 'g'})
            cacheDate = getCacheDate(unicode(rescache)) #get the cache date of the target page
            res = u'%s, 第%s名, %s, %s, %s' % (checkWord, nowRank, resTitle, cacheDate, domainURL)
            return res.encode('gb2312')
    return '>100'   #domain not found in the first 100 results

domain = 'www.baidu.com' #set the domain which you want to search.
print getRank('百度', domain)
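
Two details of the listing are worth isolating: createURL only swaps spaces for `+` and does not percent-encode the keyword, and the ranking check looks for the domain before the first `/` of the displayed URL. The following Python 3 sketch shows one way to handle both; `create_url` and `matches_domain` are hypothetical names, and passing the keyword as UTF-8 is an assumption about what the search endpoint accepts.

```python
import re
from urllib.parse import urlencode


def create_url(check_word, rn=100):
    """Build the search URL with the keyword percent-encoded (UTF-8 assumed)."""
    query = urlencode({"wd": check_word.strip(), "rn": rn})
    return "http://www.baidu.com/s?" + query


def matches_domain(display_url, domain):
    """Return True if domain occurs before the first '/' of display_url."""
    # re.escape keeps the dots in the domain from matching arbitrary characters.
    pattern = re.compile(r"^[^/]*" + re.escape(domain))
    return pattern.match(display_url) is not None
```

For example, `matches_domain("www.baidu.com/ 2014-3-30", "www.baidu.com")` is true, while a path that merely contains the domain after a slash is rejected.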
