凑个小热闹：python采集《狂飙》评论

皓扬发表于 2023-2-9 17:24:14

2023年首部爆款剧集《狂飙》一度冲上热搜第一，害的我两倍速熬夜看完了。

“是非面前稍不留神，就会步入万丈深渊，唯有坚守信仰，才能守得初心”

面对这么多广大网友的讨论，我也来凑上一个热闹
用python爬取《狂飙》评论数据
代码展示
部分代码展示
import requests
import parsel
# 我还录制了详细讲解的视频，直接在这个裙 708525271 自取，包括完整代码

headers = {
'Cookie': '数据我都删除了，建议用自己的',
'Host': '',
'User-Agent': '',
}
for page in range(0, 4000):
print(page)
url = f'https://movie.douban.com/subject/35465232/comments?start={page*20}&limit=20&status=P&sort=new_score'
response = requests.get(url=url, headers=headers)
select = parsel.Selector(response.text)
comments = select.css('.comment-item .comment')
for comment in comments:
   name = comment.css('.comment-info a::text').get()
   try:
         score_str = comment.css('.comment-info .rating::attr(class)').get()
         score = score_str.replace('0 rating', '').replace('allstar', '')
   except:
         score = 0
   comment_time = comment.css('.comment-info .comment-time::text').get().strip()
   vote_count = comment.css('.comment-vote .votes.vote-count::text').get()
   comment_content = comment.css('.comment-content span::text').get()
   print(name, score, comment_time, vote_count, comment_content)
效果展示

不登录的话，只能采集部分，全部评论需要登录后才能爬取。
浏览器数据容易泄密，我都删掉了，大家自己修改一下。

最后

感谢你观看我的文章~本次航班到这里就结束
来源:https://www.cnblogs.com/hahaa/p/17105263.html
免责声明：由于采集信息均来自互联网，如果侵犯了您的权益，请联系我们【E-Mail:cb@itdo.tech】我们会及时删除侵权内容，谢谢合作！

页: [1]

翼度科技's Archiver

凑个小热闹：python采集《狂飙》评论