Simulating a login to a CAPTCHA-protected website with PHP cURL


Posted in PHP on November 30, 2015

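The requirement: log in to a website that is protected by a CAPTCHA and collect data from it. Recording the data by hand all the time is not practical, so the collection should be automated. The code below is what came out of my experiments; use it as a reference if you need it.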

<?php
namespace Home\Controller;
use Think\Controller;
class LoginController extends Controller
{
  protected $cookieName = array('cookie_verify', 'cookie_verify');
  protected $cookiePath = '/cookie/';
  protected $cookiePathFile = array();
  public function index()
  {
    $this->display();
  }
  public function _initialize(){
    foreach($this->cookieName as $key => $name)
    {
      $this->cookiePathFile[] = ROOT_PATH . $this->cookiePath . $this->cookieName[$key] . '_xxx.txt';
    }
  }
  /**
   * Log in to xxx
   */
  public function xxxLogin()
  {
    $username = I('username');
    $password = I('password');
    $verifyCode = I('verify');
    $loginData = array(
      '__VIEWSTATE' => '/wEPDwUKMTU0MzAzOTU4NmQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFDExvZ2luX1N1Ym1pdL/yae69NsY163G3yuP0lxjz8oXu',              // the server may not respond unless all parameters are supplied
      '__VIEWSTATEGENERATOR' => 'DC42DE27',
      'txt_UserName' => $username,
      'txt_PWD' => $password,
      'txt_VerifyCode' => $verifyCode,
      'SMONEY' => 'ABC',
      'Login_Submit.x' => '52',
      'Login_Submit.y' => '19',
    );
    $getBack = $this->_cookieRequest('http://xxx.com/noLogin.aspx', $loginData);
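    if(preg_match('/<div[^>]*?id\s*=\s*[\'"]div_msg[\'"][^>]*>(.*?)<\/div>/s', $getBack, $match)){
      echo "matched\r\n";
      print_r($match);
    }else{
      echo $getBack, '<br />';
      $paramsFull = parse_url($getBack);
      // parse_url() returns false for malformed input, and 'query' is absent when there is no query string
      if(!is_array($paramsFull)) {
        $paramsFull = array();
      }
      $query = isset($paramsFull['query']) ? $paramsFull['query'] : '';
      parse_str($query, $paramsFull['parsedQuery']);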
      if(!empty($paramsFull['parsedQuery']['Warn'])) {
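        $msg = "Hello, welcome to P. Please log in first.";
        switch ($paramsFull['parsedQuery']['Warn'])
        {
          case '2':
            $msg = 'The verification code you entered is incorrect, please try again';
            break;
          case '3':
            $msg = 'This account does not exist. No account yet?';
            break;
          case '5':
            $msg = 'This account has been cancelled';
            break;
          case '6':
            $msg = 'Wrong password. After 3 consecutive failures you cannot log in for half an hour!';
            break;
          case '20':
            $msg = 'Wrong password 3 or more times today. Please try again in half an hour!';
            break;
          case '21':
            $msg = 'Accounts from your IP have entered wrong passwords more than 9 times today. Please try again in half an hour!';
            break;
          case '22':
            $msg = 'Login failed: too many accounts have logged in from your IP today!';
            break;
          case '23':
            $msg = 'Login failed: the verification code has expired!';
            break;
          case '32':
            $msg = 'This account is already bound to another xx account!';
            break;
          case '33':
            $msg = 'Only one account can be registered per computer per day!';
            break;
        }
        $this->error($msg, '', 5);
      }else{
        $_SESSION['user_id'] = '123456';      // set the session on successful login
        $this->success('Logged in to site P successfully', U('Index/index'), 5);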
      }
    }
  }
  /**
   * Fetch the CAPTCHA image
   */
  public function getVerifyCode()
  {
    $img = $this->_cookieRequest('http://xxx.com/VerifyCode_Login.aspx?id=' . rand(10000,999999), null, true, 1);
    echo $img;
  }
  /**
   * Delete the cookies
   */
  public function clearCookie()
  {
    for($i = 0; $i <count($this->cookieName); $i++)
    {
      setcookie($this->cookieName[$i], '', time() - 3600);
    }
//    unlink($this->cookiePathFile);
    $this->success('Cookies cleared!');
  }
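  /**
   * cURL request that carries cookies
   * @param string $url request URL
   * @param bool|array $data POST data; the request is sent as a POST when this is non-empty
   * @param bool $redirect when false and the request was redirected, return the final URL instead of the body
   * @param int $cookieNum index into $cookieName / $cookiePathFile
   * @return mixed the final URL or the response body
   */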
  public function _cookieRequest($url, $data = null, $redirect = false, $cookieNum = 0)
  {
    $ch = curl_init();
    $params[CURLOPT_URL] = $url;     // request URL
    $params[CURLOPT_HEADER] = false; // do not include the response headers in the output
    $params[CURLOPT_RETURNTRANSFER] = true; // return the result instead of printing it
    $params[CURLOPT_FOLLOWLOCATION] = true; // follow redirects
    $params[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 5.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1';
    if($data)
    {
      $params[CURLOPT_POST] = true;
      $params[CURLOPT_POSTFIELDS] = http_build_query($data);
    }
    // If a cookie file already exists, use it directly
    if (!empty($_COOKIE[$this->cookieName[$cookieNum]]) && is_file($this->cookiePathFile[$cookieNum]))
    {
      $params[CURLOPT_COOKIEFILE] = $this->cookiePathFile[$cookieNum];   // read cookies from the existing file
    }
    else
    {
//      $cookie_jar = tempnam($cookie_path, 'cookie');        // create a cookie file
      $params[CURLOPT_COOKIEJAR] = $this->cookiePathFile[$cookieNum];    // write cookies to the file
      setcookie($this->cookieName[$cookieNum], $this->cookiePathFile[$cookieNum], time() + 120);   // remember the cookie file path in a browser cookie
    }
    curl_setopt_array($ch, $params);                // apply the cURL options
    $content = curl_exec($ch);
    $headers = curl_getinfo($ch);
//    echo $content;
    curl_close($ch);
    if ($url != $headers["url"] && $redirect == false)
    {
      return $headers["url"];
    }
    return $content;
  }
}
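
The branch of xxxLogin() that reads the `Warn` parameter from the redirect URL can be pulled out into a small helper that is easy to check without a network connection. The following is only a minimal sketch: `extractWarnCode()` is a hypothetical name that does not appear in the controller above, and the sample URLs are made up for illustration.

```php
<?php
/**
 * Return the value of the "Warn" query parameter of a redirect URL,
 * or null when the input has no query string or no such parameter.
 * (Hypothetical helper; not part of the original controller.)
 */
function extractWarnCode($url)
{
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['query'])) {
        return null;
    }
    parse_str($parts['query'], $query);
    if (isset($query['Warn']) && is_string($query['Warn']) && $query['Warn'] !== '') {
        return $query['Warn'];
    }
    return null;
}

// Example URLs invented for illustration only.
var_dump(extractWarnCode('http://xxx.com/noLogin.aspx?Warn=2'));
var_dump(extractWarnCode('http://xxx.com/index.aspx'));
```

The returned code can then be fed to the same `switch` that maps codes to messages, while a `null` result is treated as a successful login.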

Once logged in, you can request other pages with the saved cookies.
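
As a rough illustration of that follow-up request, the sketch below reuses a cookie file for a later page fetch. It assumes the cURL extension is loaded; `buildCookieOptions()` and `fetchWithCookies()` are hypothetical helper names, and the URL and file path in the usage comment are placeholders rather than values from the site above.

```php
<?php
/**
 * Build cURL options for a request that reuses a saved cookie file.
 * (Hypothetical helper; names are illustrative.)
 */
function buildCookieOptions($url, $cookieFile)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // send the cookies stored at login
        CURLOPT_COOKIEJAR      => $cookieFile, // write back any cookies the server updates
    );
}

/**
 * Fetch a page with the saved cookies and return its body.
 */
function fetchWithCookies($url, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCookieOptions($url, $cookieFile));
    $body = curl_exec($ch);
    $error = curl_error($ch);
    // The handle is released when $ch goes out of scope.
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    return $body;
}

// Usage (placeholder URL and path):
// $html = fetchWithCookies('http://xxx.com/some_page.aspx', '/path/to/cookie_verify_xxx.txt');
```

Setting both `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` to the same file is a design choice made here so that the session stays current across requests; the controller above sets only one of the two per request.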

PS: logging in to Taobao with PHP cURL

After the form is submitted, the page still asks for the CAPTCHA and the login does not go through.

Fill in the CAPTCHA and submit:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>    
   <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
   <title></title>
  </head>
  <body>
  <iframe id='img' src="b.php" width="950" height="300" scrolling="No" frameborder="0"></iframe>
  <form action="tb.php" method="POST">
  <textarea name="vv" cols="50" rows="10">umto=&action=Authenticator&event_submit_do_login=anything&from=tb&fc=default&style=default&css_style=&tid=XOR_1_000000000000000000000000000000_63504554470A7C717F750278&support=000001&CtrlVersion=1,0,0,7&loginType=3&minititle=&minipara=&pstrong=&llnick=&sign=&need_sign=&isIgnore=&full_redirect=&popid=&callback=&guf=&amp;not_duplite_str=&need_user_id=&poy=XOR_1_000000000000000000000000000000_625A424A45137C6F7A7F0B786D08&gvfdcname=&gvfdcre=&from_encoding=&TPL_redirect_url=http:www.taobao.com&TPL_username=xxx&TPL_password=xxxx&need_check_code=&&TPL_checkcode=</textarea>
  <input type="submit" />
  </form>
 
  </body>
</html>
<?php
session_start();
if(empty($_SESSION['cookie_jar'])) exit();
$cookie_jar=$_SESSION['cookie_jar'];
$post_fields=$_POST["vv"];
$ch = curl_init('https://login.taobao.com/member/login.jhtml'); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_USERAGENT, 
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQWubi 133; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Tablet PC 2.0; .NET4.0C; .NET4.0E; InfoPath.3; Media Center PC 6.0)"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); 
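curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // the value 1 is no longer supported by libcurl; use 2 (or 0)
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar); // send the cookies saved by the CAPTCHA script
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar); // save the cookies set by the login response
$data = curl_exec($ch); 
curl_close($ch);
// Debug output: the script stops here, so the request below only runs once this line is removed
echo $data;exit;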
$ch = curl_init('http://www.taobao.com'); 
curl_setopt($ch, CURLOPT_USERAGENT, 
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQWubi 133; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Tablet PC 2.0; .NET4.0C; .NET4.0E; InfoPath.3; Media Center PC 6.0)"); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, TRUE); 
curl_exec($ch); 
curl_close($ch); 
?>

Fetching the CAPTCHA:

<?php
session_start();
$cookie_jar=tempnam("./temp/","cookie");
$_SESSION['cookie_jar']=$cookie_jar;
$post_fields = "action=Authenticator&event_submit_do_login=anything&from=tb&fc=default&style=default&css_style=&tid=XOR_1_000000000000000000000000000000_63504554470A7C717F750278&support=000001&CtrlVersion=1,0,0,7&loginType=3&minititle=&minipara=&pstrong=&llnick=&sign=&need_sign=&isIgnore=&full_redirect=&popid=&callback=&guf=&not_duplite_str=&need_user_id=&poy=XOR_1_000000000000000000000000000000_625A424A45137C6F7A7F0B786D08&gvfdcname=&gvfdcre=&from_encoding=&TPL_redirect_url=http:www.taobao.com&TPL_username=xxx&TPL_password=xxx"; 
$ch = curl_init('https://login.taobao.com/member/login.jhtml'); 
curl_setopt($ch, CURLOPT_USERAGENT, 
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQWubi 133; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Tablet PC 2.0; .NET4.0C; .NET4.0E; InfoPath.3; Media Center PC 6.0)"); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
$data = curl_exec($ch); 
curl_close($ch); 
preg_match("/id=\"um_to\" name=\"umto\" value=\"(.*?)\"\/>/", $data, $arr); 
$post_fields = "umto=" . $arr[1] . "&" . $post_fields."&TPL_checkcode="; 
echo "<textarea cols=50 rows=10>" . $post_fields . "</textarea><br/>" ; 
$ch = curl_init('https://login.taobao.com/member/login.jhtml'); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_setopt($ch, CURLOPT_USERAGENT, 
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQWubi 133; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Tablet PC 2.0; .NET4.0C; .NET4.0E; InfoPath.3; Media Center PC 6.0)"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
curl_setopt($ch,CURLOPT_COOKIEJAR,$cookie_jar);
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie_jar); 
$data = curl_exec($ch); 
curl_close($ch);
preg_match("/img id=\"J_StandardCode_m\" src=\"(.*?)\" data-src=/", $data, $arr1); 
echo "<img src=".$arr1[1]." />";
exit;
?>
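
The script above reads `$arr[1]` and `$arr1[1]` without checking whether `preg_match()` found anything, so a change in the login page markup produces notices and an empty image tag. A minimal sketch of a guarded version of the two extractions follows. `firstMatch()` is a hypothetical helper, and the sample markup is invented to have the shape the two patterns expect; it is not a real Taobao response.

```php
<?php
/**
 * Return the first capture group of $pattern in $html, or null when there is no match.
 * (Hypothetical helper; not part of the original script.)
 */
function firstMatch($pattern, $html)
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Invented sample markup, shaped like what the two patterns in the script look for.
$sample = '<input type="hidden" id="um_to" name="umto" value="T123"/>'
        . '<img id="J_StandardCode_m" src="about:blank" data-src="https://example.com/captcha.png"/>';

// Same patterns as in the script, written as single-quoted strings.
$umto = firstMatch('/id="um_to" name="umto" value="(.*?)"\/>/', $sample);
$captchaSrc = firstMatch('/img id="J_StandardCode_m" src="(.*?)" data-src=/', $sample);

if ($umto === null || $captchaSrc === null) {
    exit('Login page layout not recognised');
}

// Escape the values before writing them into HTML.
echo htmlspecialchars($umto), "\n";
echo '<img src="', htmlspecialchars($captchaSrc), '" />', "\n";
```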