Requests库网络爬虫实战

栏目: 编程工具 · 发布时间: 6年前

1.京东商品页面的爬取

>>> import requests
>>> r = request.get('https://item.jd.com/14815925977.html')
>>> r.encoding
'gbk'
>>> r.text[:1000]
'<!DOCTYPE HTML>\n<html lang="zh-CN">\n<head>\n    <!-- shouji -->\n    <meta http-equiv="Content-Type" content="text/html; charset=gbk" />\n    <title>安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装【图片 价格 品牌 报价】-京东</title>\n    <meta name="keywords" content="安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装,安佳(Anchor),,京东,网上购物"/>\n    <meta name="description" content="安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装图片、价格、品牌样样齐全!【京东正品行货,全国 配送,心动不如行动,立即购买享受更多优惠哦!】" />\n    <meta name="format-detection" content="telephone=no">\n    <meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/14815925977.html">\n    <meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/14815925977.html">\n    <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n    <link rel="canonical" href="//item.jd.com/14815925977.html"/>\n        <link rel="dns-prefetch" href="//misc.360buyimg.com"/>\n    <link rel="dns-prefetch" href="//static.360buyimg.com"/>\n    <link rel="dns-prefetch" href="//img10.360buyimg.com"/>\n    <link rel="dns-pre'

全代码

import requests
url = 'https://item.jd.com/14815925977.html'
try:
    r = requests.get(url)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except:
    print('爬取失败')

2.亚马逊商品页面的爬取

>>> import requests
>>> r = requests.get('https://www.amazon.cn/dp/B01ION3VWI')
>>> r.stauts_code
503
>>> r.encoding
'ISO-8859-1'
>>> r.encoding = r.apparent_encoding
>>> r.text
'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="zh-CN" class="a-no-js a-lt-ie9 a-lt-ie8 a-lt-ie7"> <![endif]-->\n<!--[if IE 7]>    <html lang="zh-CN" class="a-no-js a-lt-ie9 a-lt-ie8"> <![endif]-->\n<!--[if IE 8]>    <html lang="zh-CN" class="a-no-js a-lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!-->\n<html class="a-no-js" lang="zh-CN"><!--<![endif]--><head>\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n<title dir="ltr">Amazon CAPTCHA</title>\n<meta name="viewport" content="width=device-width">\n<link rel="stylesheet" href="https://images-na.ssl-images-amazon.com/images/G/01/AUIClients/AmazonUI-3c913031596ca78a3768f4e934b1cc02ce238101.secure.min._V1_.css">\n<script>\n\nif (true === true) {\n    var ue_t0 = (+ new Date()),\n        ue_csm = window,\n        ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },\n        ue_furl = "fls-cn.amazon.cn",\n        ue_mid = "AAHKV2X7AFYLW",\n        ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],\n        ue_sn = "opfcaptcha.amazon.cn",\n        ue_id = \'WHD3D9ZKD1CAMR1ZBMA9\';\n}\n</script>\n</head>\n<body>\n\n<!--\n        To discuss automated access to Amazon data please contact api-services-support@amazon.com.\n        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com.cn/index.html/ref=rm_c_sv, or our Product Advertising API at https://associates.amazon.cn/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.\n-->\n\n<!--\nCorreios.DoNotSend\n-->\n\n<div class="a-container a-padding-double-large" style="min-width:350px;padding:44px 0 !important">\n\n    <div class="a-row a-spacing-double-large" style="width: 350px; margin: 0 auto">\n\n        <div class="a-row a-spacing-medium a-text-center"><i class="a-icon a-logo"></i></div>\n\n        <div class="a-box a-alert a-alert-info a-spacing-base">\n            <div class="a-box-inner">\n                <i class="a-icon a-icon-alert"></i>\n                <h4>请输入您在下方看到的字符</h4>\n                <p class="a-last">抱歉,我们只是想确认一下当前访问者并非自动程序。为了达到最佳效果,请确保您浏览器上的 Cookie 已启用。</p>\n                </div>\n            </div>\n\n            <div class="a-section">\n\n                <div class="a-box a-color-offset-background">\n                    <div class="a-box-inner a-padding-extra-large">\n\n                        <form method="get" action="/errors/validateCaptcha" name="">\n                            <input type=hidden name="amzn" value="Ds6Fb8xSEP8SX63xhhDPcw==" /><input type=hidden name="amzn-r" value=" dp B01ION3VWI" />\n                            <div class="a-row a-spacing-large">\n                                <div class="a-box">\n                                    <div class="a-box-inner">\n                                        <h4>请输入您在这个图片中看到 的字符:</h4>\n                                        <div class="a-row a-text-center">\n                                            <img src="https://images-na.ssl-images-amazon.com/captcha/qujzzelu/Captcha_jtbnpmutnr.jpg">\n                                        </div>\n                                        <div class="a-row a-spacing-base">\n                                            <div class="a-row">\n                                                <div class="a-column a-span6">\n                                                    <label for="captchacharacters">输入字符</label>\n                                                </div>\n                                                <div class="a-column a-span6 a-span-last a-text-right">\n                                                    <a onclick="window.location.reload()">换一张图</a>\n                                                </div>\n                                            </div>\n                                            <input autocomplete="off" spellcheck="false" id="captchacharacters" name="field-keywords" class="a-span12" autocapitalize="off" autocorrect="off" type="text">\n                                        </div>\n                                    </div>\n                                </div>\n                            </div>\n\n                            <div class="a-section a-spacing-extra-large">\n\n                                <div class="a-row">\n                                    <span class="a-button a-button-primary a-span12">\n                                        <span class="a-button-inner">\n                                            <button type="submit" class="a-button-text">继续购物</button>\n                                        </span>\n                                    </span>\n                                </div>\n\n                            </div>\n                        </form>\n\n                    </div>\n                </div>\n\n            </div>\n\n        </div>\n\n        <div class="a-divider a-divider-section"><div class="a-divider-inner"></div></div>\n\n        <div class="a-text-center a-spacing-small a-size-mini">\n            <a href="https://www.amazon.cn/gp/help/customer/display.html/ref=footer_claim?ie=UTF8&nodeId=200347160">使用条件</a>\n            <span class="a-letter-space"></span>\n            <span class="a-letter-space"></span>\n            <span class="a-letter-space"></span>\n            <span class="a-letter-space"></span>\n            <a href="https://www.amazon.cn/gp/help/customer/display.html/ref=footer_privacy?ie=UTF8&nodeId=200347130">隐私声明</a>\n        </div>\n\n        <div class="a-text-center a-size-mini a-color-secondary">\n          © 1996-2015, Amazon.com, Inc. or its affiliates\n          <script>\n           if (true === true) {\n             document.write(\'<img src="https://fls-cn.amaz\'+\'on.cn/\'+\'1/oc-csi/1/OP/requestId=WHD3D9ZKD1CAMR1ZBMA9&js=1" />\');\n           };\n          </script>\n          <noscript>\n            <img src="https://fls-cn.amazon.cn/1/oc-csi/1/OP/requestId=WHD3D9ZKD1CAMR1ZBMA9&js=0" />\n          </noscript>\n        </div>\n    </div>\n    <script>\n    if (true === true) {\n        var elem = document.createElement("script");\n        elem.src = "https://images-cn.ssl-images-amazon.com/images/G/01/csminstrumentation/csm-captcha-instrumentation.min._V" + (+ new Date()) + "_.js";\n        document.getElementsByTagName(\'head\')[0].appendChild(elem);\n    }\n    </script>\n</body></html>\n'

以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

The Black Box Society

The Black Box Society

Frank Pasquale / Harvard University Press / 2015-1-5 / USD 35.00

Every day, corporations are connecting the dots about our personal behavior—silently scrutinizing clues left behind by our work habits and Internet use. The data compiled and portraits created are inc......一起来看看 《The Black Box Society》 这本书的介绍吧!

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码