Python web scraping with requests, with a page-collection example: scraping person information from Sogou

刘玉霞 · Posted on 2023-3-12 19:31:52

1. Getting started with web scraping: using requests

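Introduction to requests:
Requests supports HTTP keep-alive and connection pooling, cookie persistence across requests through sessions, file uploads, automatic decoding of response content, internationalized URLs, and automatic encoding of POST data. Keep-alive needs no extra code: within a requests.Session, connections to the same host are pooled and reused automatically.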
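Session-level keep-alive and cookie persistence both come from requests.Session. The sketch below is a minimal illustration that builds the request with Session.prepare_request, so nothing is sent over the network; the helper name make_session and the cookie demo_cookie are hypothetical, chosen only for the example.

```python
import requests


def make_session(user_agent: str) -> requests.Session:
    """Hypothetical helper: a Session whose headers apply to every request it sends."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session("Mozilla/5.0 (demo)")

# Cookies stored on the session (normally filled in from Set-Cookie responses)
# are attached automatically to later requests made through the same session.
session.cookies.set("demo_cookie", "42")

# Build the request the session would send, without touching the network.
prepared = session.prepare_request(requests.Request("GET", "https://www.sogou.com/"))
print(prepared.url)
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

In a real crawl you would call session.get(url, timeout=10) repeatedly on the same session object. requests then reuses pooled connections to the same host, whereas each bare requests.get() call creates and closes its own temporary session.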

```python
# Import the module
import requests

# Target URL
url = 'https://www.sogou.com/'
# Send a GET request and receive the response
response = requests.get(url=url)
# Get the response body as decoded text
page_text = response.text
# Print it
print(page_text)
# Save it locally
with open('sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print("Done")
```
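The example above sets no timeout and does not check the HTTP status, and response.text depends on the encoding requests infers: for a text/* response whose Content-Type carries no charset, requests falls back to ISO-8859-1, which garbles Chinese pages. The sketch below is one possible hardening, assuming you know the page is UTF-8 and pass encoding='utf-8'. The names fetch_html and DemoHandler are hypothetical, and the tiny local http.server only stands in for a real site so the example runs offline.

```python
import http.server
import threading
from typing import Optional

import requests


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Local stand-in for a real site: serves a UTF-8 page with no charset header."""

    def do_GET(self):
        body = "<html><body>搜狗</body></html>".encode("utf-8")
        self.send_response(200)
        # No charset, so requests will assume ISO-8859-1 for this text/* response
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_html(url: str, encoding: Optional[str] = None, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page, fail on HTTP errors, decode as requested."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    if encoding is not None:
        response.encoding = encoding  # override the encoding guessed from headers
    return response.text


# Start the local stand-in server on a free port in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}/"

page = fetch_html(base_url, encoding="utf-8")  # decoded correctly
garbled = fetch_html(base_url)                 # decoded as ISO-8859-1
server.shutdown()

print(page)
print(garbled)
```

Against the real site you would call fetch_html('https://www.sogou.com/') (with or without an explicit encoding) and write the result to sogou.html as before.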


2. Page-collection example: scraping person information from Sogou

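```python
# Import the module and set the URL
import requests

# requests appends the '?' and the encoded parameters itself
url = 'https://www.sogou.com/web'
# Send a browser User-Agent so the request is less obviously from a crawler
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}
# Prompt for the search term (a person's name)
name = input("Enter a person's name: ")
# Query-string parameters, mimicking what the browser sends
param = {
    'type': 'getpinyin',
    'query': name
}
# Send the request and receive the response
response = requests.get(url, params=param, headers=headers)
# Read the body as text
page_txt = response.text
# Save the page, named after the search term
filename = name + '.html'
with open(filename, 'w', encoding='utf-8') as fp:
    fp.write(page_txt)
print("succeed")
```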
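To check what the search request actually contains, you can prepare it without sending it. The sketch below reuses the URL, User-Agent, and parameters from the example, with a fixed sample name in place of input(); the value 李白 is only an illustrative choice. It shows requests percent-encoding a non-ASCII query value as UTF-8 and appending the parameters after '?'.

```python
import requests

url = "https://www.sogou.com/web"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}
# Sample search term standing in for input(); any name would do.
param = {"type": "getpinyin", "query": "李白"}

# Prepare the request instead of sending it, to inspect exactly what would go out.
prepared = requests.Request("GET", url, params=param, headers=headers).prepare()
print(prepared.url)  # query string with the name percent-encoded as UTF-8
print(prepared.headers["User-Agent"])
```

Printing prepared.url this way is a quick offline check that the parameters and headers match what you expect before hitting the live site.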


Source: https://www.cnblogs.com/shuxi/p/17208509.html