1. Crawling a JD.com product page
>>> import requests
>>> r = requests.get('https://item.jd.com/14815925977.html')
>>> r.encoding
'gbk'
>>> r.text[:1000]
'<!DOCTYPE HTML>\n<html lang="zh-CN">\n<head>\n <!-- shouji -->\n <meta http-equiv="Content-Type" content="text/html; charset=gbk" />\n <title>安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装【图片 价格 品牌 报价】-京东</title>\n <meta name="keywords" content="安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装,安佳(Anchor),,京东,网上购物"/>\n <meta name="description" content="安佳脱脂牛奶 新西兰进口轻欣脱脂250ml*24整箱装图片、价格、品牌样样齐全!【京东正品行货,全国 配送,心动不如行动,立即购买享受更多优惠哦!】" />\n <meta name="format-detection" content="telephone=no">\n <meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/14815925977.html">\n <meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/14815925977.html">\n <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n <link rel="canonical" href="//item.jd.com/14815925977.html"/>\n <link rel="dns-prefetch" href="//misc.360buyimg.com"/>\n <link rel="dns-prefetch" href="//static.360buyimg.com"/>\n <link rel="dns-prefetch" href="//img10.360buyimg.com"/>\n <link rel="dns-pre'
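Full code
import requests

url = 'https://item.jd.com/14815925977.html'
try:
    r = requests.get(url)
    r.raise_for_status()  # raise an exception for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # decode using the encoding detected from the body
    print(r.text[:1000])
except requests.RequestException:
    print('Crawl failed')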
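The line `r.encoding = r.apparent_encoding` is easy to overlook. As far as I know, requests takes `r.encoding` from the charset in the Content-Type response header and falls back to ISO-8859-1 for text responses that declare none, while `r.apparent_encoding` is guessed from the body bytes. The following is a minimal sketch of why a wrong encoding matters. It uses only the standard library, and the sample string and the choice of UTF-8 are assumptions for illustration, not values taken from the page above.

```python
# Illustration only: the sample text and encodings are assumptions,
# not values taken from the JD.com page above.
sample = '京东商品页面'

# Bytes as a server might send them for a UTF-8 encoded page.
raw = sample.encode('utf-8')

# Decoding with the wrong codec does not raise an error; it yields mojibake,
# because ISO-8859-1 maps every byte to exactly one character.
wrong = raw.decode('iso-8859-1')

# Decoding with the codec that matches the bytes recovers the text.
right = raw.decode('utf-8')

print(repr(wrong))
print(right)
```

Because the wrong decoding fails silently, the garbled text only shows up when the output is inspected. Resetting the encoding before reading `r.text` is a cheap safeguard against this.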
2. Crawling an Amazon product page
>>> import requests
>>> r = requests.get('https://www.amazon.cn/dp/B01ION3VWI')
>>> r.status_code
503
>>> r.encoding
'ISO-8859-1'
>>> r.encoding = r.apparent_encoding
>>> r.text
'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="zh-CN" class="a-no-js a-lt-ie9 a-lt-ie8 a-lt-ie7"> <![endif]-->\n<!--[if IE 7]> <html lang="zh-CN" class="a-no-js a-lt-ie9 a-lt-ie8"> <![endif]-->\n<!--[if IE 8]> <html lang="zh-CN" class="a-no-js a-lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!-->\n<html class="a-no-js" lang="zh-CN"><!--<![endif]--><head>\n<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n<title dir="ltr">Amazon CAPTCHA</title>\n<meta name="viewport" content="width=device-width">\n<link rel="stylesheet" href="https://images-na.ssl-images-amazon.com/images/G/01/AUIClients/AmazonUI-3c913031596ca78a3768f4e934b1cc02ce238101.secure.min._V1_.css">\n<script>\n\nif (true === true) {\n var ue_t0 = (+ new Date()),\n ue_csm = window,\n ue = { t0: ue_t0, d: function() { return (+new Date() - ue_t0); } },\n ue_furl = "fls-cn.amazon.cn",\n ue_mid = "AAHKV2X7AFYLW",\n ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],\n ue_sn = "opfcaptcha.amazon.cn",\n ue_id = \'WHD3D9ZKD1CAMR1ZBMA9\';\n}\n</script>\n</head>\n<body>\n\n<!--\n To discuss automated access to Amazon data please contact api-services-support@amazon.com.\n For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com.cn/index.html/ref=rm_c_sv, or our Product Advertising API at https://associates.amazon.cn/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.\n-->\n\n<!--\nCorreios.DoNotSend\n-->\n\n<div class="a-container a-padding-double-large" style="min-width:350px;padding:44px 0 !important">\n\n <div class="a-row a-spacing-double-large" style="width: 350px; margin: 0 auto">\n\n <div class="a-row a-spacing-medium a-text-center"><i 
class="a-icon a-logo"></i></div>\n\n <div class="a-box a-alert a-alert-info a-spacing-base">\n <div class="a-box-inner">\n <i class="a-icon a-icon-alert"></i>\n <h4>请输入您在下方看到的字符</h4>\n <p class="a-last">抱歉,我们只是想确认一下当前访问者并非自动程序。为了达到最佳效果,请确保您浏览器上的 Cookie 已启用。</p>\n </div>\n </div>\n\n <div class="a-section">\n\n <div class="a-box a-color-offset-background">\n <div class="a-box-inner a-padding-extra-large">\n\n <form method="get" action="/errors/validateCaptcha" name="">\n <input type=hidden name="amzn" value="Ds6Fb8xSEP8SX63xhhDPcw==" /><input type=hidden name="amzn-r" value=" dp B01ION3VWI" />\n <div class="a-row a-spacing-large">\n <div class="a-box">\n <div class="a-box-inner">\n <h4>请输入您在这个图片中看到 的字符:</h4>\n <div class="a-row a-text-center">\n <img src="https://images-na.ssl-images-amazon.com/captcha/qujzzelu/Captcha_jtbnpmutnr.jpg">\n </div>\n <div class="a-row a-spacing-base">\n <div class="a-row">\n <div class="a-column a-span6">\n <label for="captchacharacters">输入字符</label>\n </div>\n <div class="a-column a-span6 a-span-last a-text-right">\n <a onclick="window.location.reload()">换一张图</a>\n </div>\n </div>\n <input autocomplete="off" spellcheck="false" id="captchacharacters" name="field-keywords" class="a-span12" autocapitalize="off" autocorrect="off" type="text">\n </div>\n </div>\n </div>\n </div>\n\n <div class="a-section a-spacing-extra-large">\n\n <div class="a-row">\n <span class="a-button a-button-primary a-span12">\n <span class="a-button-inner">\n <button type="submit" class="a-button-text">继续购物</button>\n </span>\n </span>\n </div>\n\n </div>\n </form>\n\n </div>\n </div>\n\n </div>\n\n </div>\n\n <div class="a-divider a-divider-section"><div class="a-divider-inner"></div></div>\n\n <div class="a-text-center a-spacing-small a-size-mini">\n <a href="https://www.amazon.cn/gp/help/customer/display.html/ref=footer_claim?ie=UTF8&nodeId=200347160">使用条件</a>\n <span class="a-letter-space"></span>\n <span class="a-letter-space"></span>\n <span 
class="a-letter-space"></span>\n <span class="a-letter-space"></span>\n <a href="https://www.amazon.cn/gp/help/customer/display.html/ref=footer_privacy?ie=UTF8&nodeId=200347130">隐私声明</a>\n </div>\n\n <div class="a-text-center a-size-mini a-color-secondary">\n © 1996-2015, Amazon.com, Inc. or its affiliates\n <script>\n if (true === true) {\n document.write(\'<img src="https://fls-cn.amaz\'+\'on.cn/\'+\'1/oc-csi/1/OP/requestId=WHD3D9ZKD1CAMR1ZBMA9&js=1" />\');\n };\n </script>\n <noscript>\n <img src="https://fls-cn.amazon.cn/1/oc-csi/1/OP/requestId=WHD3D9ZKD1CAMR1ZBMA9&js=0" />\n </noscript>\n </div>\n </div>\n <script>\n if (true === true) {\n var elem = document.createElement("script");\n elem.src = "https://images-cn.ssl-images-amazon.com/images/G/01/csminstrumentation/csm-captcha-instrumentation.min._V" + (+ new Date()) + "_.js";\n document.getElementsByTagName(\'head\')[0].appendChild(elem);\n }\n </script>\n</body></html>\n'
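The session above returned status 503 and a page titled "Amazon CAPTCHA" rather than the product page, so the site treated the request as automated. With the try/except pattern from section 1, `r.raise_for_status()` would raise on the 503 and the failure branch would run. A minimal sketch of an explicit check follows. `looks_blocked` is a hypothetical helper, and the "captcha" marker is an assumption based on the title in the output above, not a documented signal.

```python
def looks_blocked(status_code, text):
    """Heuristic: True if the response looks like an error or CAPTCHA page.

    Hypothetical helper for illustration; the 'captcha' marker is an
    assumption based on the page title in the session above.
    """
    if status_code >= 400:
        return True
    return 'captcha' in text.lower()


# Stand-in values modeled on the session above (not a live request).
blocked_page = '<title dir="ltr">Amazon CAPTCHA</title>'
print(looks_blocked(503, blocked_page))
print(looks_blocked(200, '<title>Some product</title>'))
```

In a live session the call would be `looks_blocked(r.status_code, r.text)`. Checking both the status code and the body is deliberate here: a site could also serve a verification page with a 200 status, which `raise_for_status()` alone would not catch.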