Web Scraping: A Basic Simulated Login

Category: C++ · Published: 7 years ago


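How do POST and GET differ?

Per the HTTP specification, GET is meant for retrieving or querying a resource and should be safe and idempotent, while POST is generally used to create or update a resource.
GET passes data in the URL, as a query string on the request line (not in the request headers). POST passes data in the request body.
GET can carry only a small amount of data, because everything has to fit in the URL and browsers and servers cap URL length. POST can carry much more: HTTP itself sets no limit, although servers usually configure one.
GET exposes its parameters in the address bar, browser history, and server logs, so it is the weaker choice for sensitive data. POST keeps them out of the URL, but the body is still readable in transit unless the connection uses HTTPS. On the other hand, GET responses can be cached, which often makes GET cheaper than POST. Recommendations:
1. Submit confidential information with POST rather than GET.
2. Use GET for queries, and POST for adding, modifying, or deleting data.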
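The difference in where the data travels can be checked without sending anything. The sketch below uses only the standard library to build one GET and one POST request object from the same parameters; the `https://example.com/search` URL and the `q`/`page` parameters are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical parameters, used only to show where the data ends up.
params = {"q": "python", "page": "1"}
encoded = urlencode(params)

# GET: the parameters are appended to the URL as a query string; there is no body.
get_request = Request("https://example.com/search?" + encoded)

# POST: the same parameters travel in the request body, as bytes.
post_request = Request("https://example.com/search", data=encoded.encode("ascii"))

# Nothing is sent over the network; we only inspect the request objects.
print(get_request.get_method(), get_request.full_url, get_request.data)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The GET request carries everything in its URL and has no body, while the POST request keeps the URL clean and carries the encoded form data as its body.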

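Overview

In day-to-day scraping, some sites only show their content to logged-in users. In that case we first have to simulate a login; only then can we fetch the pages we want and parse them.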

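Simulating a login to Chouti

1. Open the Chouti site and choose to log in (register first if you have no account). Enter an account and password as requested, but deliberately type a wrong password. Open the F12 developer tools, switch to the Network tab, then click the login button and look at the request that is captured (the original post showed a screenshot here).

The captured request contains form data with three submitted fields. We can also see the response the server returned, which is the same error the page itself already displayed.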

Simulating the login

Now that we know how the data is submitted, we can simulate it with a POST request:

import requests

# Submit the same three form fields the browser sent when logging in.
response = requests.post(
    url='https://dig.chouti.com/login',
    data={
        'phone': '8613185007919',
        'password': '155560',
        'oneMonth': '1',
    },
)
print(response.text)

Here is the result:

<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>网站防火墙</title>
<style>
p {
	line-height:20px;
}
ul{ list-style-type:none;}
li{ list-style-type:none;}
</style>
</head>

<body style=" padding:0; margin:0; font:14px/1.5 Microsoft Yahei, 宋体,sans-serif; color:#555;">

 <div style="margin: 0 auto; width:1000px; padding-top:70px; overflow:hidden;">
  
  
  <div style="width:600px; float:left;">
    <div style=" height:40px; line-height:40px; color:#fff; font-size:16px; overflow:hidden; background:#6bb3f6; padding-left:20px;">网站防火墙 </div>
    <div style="border:1px dashed #cdcece; border-top:none; font-size:14px; background:#fff; color:#555; line-height:24px; height:220px; padding:20px 20px 0 20px; overflow-y:auto;background:#f3f7f9;">
      <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"><span style=" font-weight:600; color:#fc4f03;">您的请求带有不合法参数，已被拦截！请勿在恶意提交。</span></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">可能原因：您提交的内容包含危险的攻击请求, 自动记录 ip 相关信息通知管理员</p>
<p style=" margin-top:12px; margin-bottom:12px; margin-left:0px; margin-right:0px; -qt-block-indent:1; text-indent:0px;">如何解决：</p>
<ul style="margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; -qt-list-indent: 1;"><li style=" margin-top:12px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">1）检查提交内容；</li>
<li style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">2）普通网站访客，请联系网站管理员；</li></ul>
    </div>
  </div>
</div>
</body></html>


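Instead of a login result, the site returns a "website firewall" page saying that the request contained illegal parameters and was blocked. A response like this should make us think of an anti-scraping measure. The fix is to add a browser-like 'User-Agent' header to the POST request. The updated code is: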

import requests

response = requests.post(
    url='https://dig.chouti.com/login',
    # Present a browser User-Agent so the firewall does not reject the request.
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
    },
    data={
        'phone': '8613185007919',
        'password': '155560',
        'oneMonth': '1',
    },
)
print(response.text)
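To see why the header matters, it helps to compare what `requests` sends by default with what the code above sends. The sketch below prepares the login request without sending it. `build_login_request` is a hypothetical helper, and the phone and password values are placeholders, not real credentials.

```python
import requests

LOGIN_URL = "https://dig.chouti.com/login"  # URL taken from the article
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
)


def build_login_request(session, phone, password):
    """Hypothetical helper: prepare (but do not send) the login POST."""
    request = requests.Request(
        "POST",
        LOGIN_URL,
        data={"phone": phone, "password": password, "oneMonth": "1"},
    )
    # prepare_request merges the session's headers into the request.
    return session.prepare_request(request)


# Without an explicit header, requests announces itself as python-requests/<version>.
default_ua = requests.utils.default_user_agent()

session = requests.Session()
# Set the browser User-Agent once; it applies to every request made via this session.
session.headers.update({"User-Agent": BROWSER_UA})

prepared = build_login_request(session, "8613100000000", "secret")  # placeholder values
print(default_ua)
print(prepared.headers["User-Agent"])
print(prepared.body)
```

The default value identifies the client as a script, which is presumably what the firewall reacted to. Using a `Session` is optional here, but it also keeps any cookies the server sets, so later requests made through the same session reuse the logged-in state.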

Running the updated code gives:

{"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_53360741818"}}}

This response means the login succeeded, which completes the simulated login.
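Rather than reading the output by eye, a script can check the result itself. The sketch below is a minimal example that assumes, based only on the run shown above, that code "9999" signals success; the meaning of other codes is not documented in this article. `login_succeeded` is a hypothetical helper name.

```python
import json


def login_succeeded(response_text):
    """Return True if the body is JSON whose result code is "9999".

    Assumption: "9999" means success, as in the article's successful run.
    """
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # For example, the firewall's HTML page is not valid JSON.
        return False
    if not isinstance(payload, dict):
        return False
    result = payload.get("result", {})
    return isinstance(result, dict) and result.get("code") == "9999"


# The successful response printed in the article.
sample = (
    '{"result":{"code":"9999", "message":"", '
    '"data":{"complateReg":"0","destJid":"cdu_53360741818"}}}'
)
print(login_succeeded(sample))
print(login_succeeded("<html><title>网站防火墙</title></html>"))
```

In the login script this would be called as `login_succeeded(response.text)`, which also catches the case where the firewall page comes back instead of JSON.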
