[Python] Scraping the Baidu Hot Search Top 50 and Visualizing It [Source Code Included] [Data Analysis Book Giveaway]

Yingjie Community: https://bbs.csdn.net/topics/617804998

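I. Importing the required modules:

This post shows how to write a Python scraper that fetches the Baidu hot search board, saves the keywords to an Excel file, and plots them as a bar chart. It uses requests to send HTTP requests and receive responses, BeautifulSoup (bs4) to parse the page, openpyxl to write the Excel file, and matplotlib to draw the chart.

If an import fails with a missing-module error,

run the following in a terminal (a mirror inside China is recommended for faster downloads):

pip install requests beautifulsoup4 openpyxl matplotlib -i https://mirrors.aliyun.com/pypi/simple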

Here are several commonly used mirrors in China:

Tsinghua University
https://pypi.tuna.tsinghua.edu.cn/simple

Alibaba Cloud
https://mirrors.aliyun.com/pypi/simple/

Douban
https://pypi.douban.com/simple/

Baidu Cloud
https://mirror.baidu.com/pypi/simple/

USTC (University of Science and Technology of China)
https://pypi.mirrors.ustc.edu.cn/simple/

Huawei Cloud
https://mirrors.huaweicloud.com/repository/pypi/simple/

Tencent Cloud
https://mirrors.cloud.tencent.com/pypi/simple/

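II. Sending a GET request and reading the response:

The function below sets a request header to mimic a browser and returns the response body parsed as JSON. Note that response.json() only works when the server returns JSON; the Baidu hot search page used in the rest of this post is HTML, so the later steps read response.content instead.

import requests

def get_html(url):
    # Browser-like User-Agent so the request looks like it comes from Chrome
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    }
    response = requests.get(url=url, headers=header)
    # print(response.json())
    html = response.json()
    return html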

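How to find your request headers:

In Firefox:

Open the developer tools (F12), switch to the Network tab, reload the page, select a request, and copy the User-Agent value from its request headers.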
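As a rough sketch of how the copied header could be reused for the Baidu page, the snippet below defines a helper that returns HTML text rather than JSON. The names `HEADERS` and `fetch_html`, the 10-second timeout, and the `raise_for_status()` check are assumptions added for illustration and are not part of the original code. The last lines build the request without sending it, which is one way to check which headers would go out.

```python
import requests

URL = 'https://top.baidu.com/board?tab=realtime'

# User-Agent copied from the browser's developer tools (same value as above)
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch a page and return its HTML as text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


# Prepare the request without sending it, to inspect what would be sent
prepared = requests.Request('GET', URL, headers=HEADERS).prepare()
print(prepared.method, prepared.url)
print(prepared.headers['User-Agent'])
```

Calling `fetch_html(URL)` performs the actual download; it is left out of the snippet so that the sketch runs without network access.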

III. Code walkthrough

1. Import the required libraries:

import requests
from bs4 import BeautifulSoup
import openpyxl

2. Send an HTTP request to fetch the Baidu hot search page:

url = 'https://top.baidu.com/board?tab=realtime'
response = requests.get(url)
html = response.content

3. Parse the page with BeautifulSoup:

soup = BeautifulSoup(html, 'html.parser')

4. Extract the hot search keywords:

hot_searches = []
for item in soup.find_all('div', {'class': 'c-single-text-ellipsis'}):
    hot_searches.append(item.text)

5. Create the Excel workbook that will hold the data:

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.title = 'Baidu Hot Searches'

6. Write the title row:

sheet.cell(row=1, column=1, value='百度热搜排行榜—博主:Yan-英杰')

7. Write the hot search data, starting from row 2:

for row, title in enumerate(hot_searches, start=2):
    sheet.cell(row=row, column=1, value=title)

8. Save the Excel file:

workbook.save('百度热搜.xlsx')

9. Print a confirmation message:

print('热搜数据已保存到 百度热搜.xlsx')
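The extraction in step 4 can be tried offline before touching the live site. The sketch below runs the same `find_all` lookup on a small hand-written HTML string; `SAMPLE_HTML` is a hypothetical stand-in and the real page markup may differ. The whitespace stripping, duplicate removal, and `limit=50` cut-off are assumptions added here to match the "Top 50" in the title; the original code keeps every match as-is.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ
SAMPLE_HTML = """
<div class="c-single-text-ellipsis">  Topic A </div>
<div class="other">ignored</div>
<div class="c-single-text-ellipsis">Topic B</div>
<div class="c-single-text-ellipsis">Topic A</div>
"""


def extract_titles(html, limit=50):
    """Return up to `limit` unique, whitespace-stripped titles in page order."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for item in soup.find_all('div', class_='c-single-text-ellipsis'):
        text = item.get_text(strip=True)
        # Skip empty strings and titles that were already collected
        if text and text not in titles:
            titles.append(text)
    return titles[:limit]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

If the live page returns an empty list, the class name has probably changed and needs to be looked up again in the browser's developer tools.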

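IV. Complete code:

If you are interested in CSDN merchandise and paid cashback tasks: https://bbs.csdn.net/topics/617804998

Message the author to join the discussion group and learn together:
You can add the author: Yan--yingjie
To get books for free, you can also add the author on WeChat; dozens of books are given away every week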


import requests
from bs4 import BeautifulSoup
import openpyxl

# Send an HTTP request to fetch the Baidu hot search page
url = 'https://top.baidu.com/board?tab=realtime'
response = requests.get(url)
html = response.content

# Parse the page with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Extract the hot search keywords
hot_searches = []
for item in soup.find_all('div', {'class': 'c-single-text-ellipsis'}):
    hot_searches.append(item.text)

# Create the Excel workbook that will hold the data
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.title = 'Baidu Hot Searches'

# Write the title row
sheet.cell(row=1, column=1, value='百度热搜排行榜—博主:Yan-英杰')

# Write the hot search data, starting from row 2
for row, title in enumerate(hot_searches, start=2):
    sheet.cell(row=row, column=1, value=title)

workbook.save('百度热搜.xlsx')
print('热搜数据已保存到 百度热搜.xlsx')

Result (screenshot):
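One way to check what ended up in the spreadsheet without opening Excel is to read the file back with openpyxl. The sketch below is a variation on the code above, not the original layout: it writes a header row plus a rank column, saves to a temporary directory, and reloads the rows. The helper names `save_ranked` and `load_ranked`, the sample titles, and the file name are all hypothetical.

```python
import os
import tempfile

import openpyxl


def save_ranked(titles, path):
    """Write a header row followed by one (rank, keyword) row per title."""
    workbook = openpyxl.Workbook()
    sheet = workbook.active
    sheet.title = 'Baidu Hot Searches'
    sheet.append(['Rank', 'Keyword'])
    for rank, title in enumerate(titles, start=1):
        sheet.append([rank, title])
    workbook.save(path)


def load_ranked(path):
    """Read the data rows back as tuples, skipping the header row."""
    workbook = openpyxl.load_workbook(path)
    sheet = workbook.active
    return [tuple(row) for row in sheet.iter_rows(min_row=2, values_only=True)]


# Made-up sample data standing in for the scraped keywords
sample = ['Topic A', 'Topic B', 'Topic C']
path = os.path.join(tempfile.mkdtemp(), 'hot_searches_demo.xlsx')
save_ranked(sample, path)
rows = load_ranked(path)
print(rows)
```

Storing the rank in its own column makes the sheet easier to sort or filter later, at the cost of diverging from the single-column layout used in the post.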

Complete visualization code:


import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

# Send an HTTP request to fetch the Baidu hot search page
url = 'https://top.baidu.com/board?tab=realtime'
response = requests.get(url)
html = response.content

# Parse the page with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Extract the hot search keywords
hot_searches = []
for item in soup.find_all('div', {'class': 'c-single-text-ellipsis'}):
    hot_searches.append(item.text)

# Use a font with Chinese glyphs so the labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Draw a horizontal bar chart; bar length is N - rank + 1,
# so the top-ranked keyword gets the longest bar
plt.figure(figsize=(15, 10))
x = range(len(hot_searches))
y = list(reversed(range(1, len(hot_searches)+1)))
plt.barh(x, y, tick_label=hot_searches, height=0.8)  # height sets bar thickness, and so the gap between bars

# Add the title and axis labels
plt.title('百度热搜排行榜')
plt.xlabel('排名')
plt.ylabel('关键词')

# Set the x-axis ticks
plt.xticks(range(1, len(hot_searches)+1))

# Show the figure
plt.tight_layout()
plt.show()

Result (screenshot):
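The chart logic can also be tried without scraping or a display. The sketch below is a variation on the code above under a few assumptions: it uses made-up English titles (so no Chinese font is needed), the non-interactive Agg backend, a hypothetical helper `plot_ranking`, and `invert_yaxis()` so that the top-ranked keyword is drawn at the top of the chart. It saves the figure to a temporary PNG instead of calling `plt.show()`.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_ranking(titles, out_path):
    """Draw one horizontal bar per title and save the chart to out_path."""
    n = len(titles)
    # Bar length is N - rank + 1, so rank 1 gets the longest bar
    scores = [n - i for i in range(n)]
    fig, ax = plt.subplots(figsize=(8, max(2, 0.4 * n)))
    ax.barh(range(n), scores, tick_label=titles, height=0.8)
    ax.invert_yaxis()  # put the first (top-ranked) title at the top
    ax.set_title('Hot search ranking (sample data)')
    ax.set_xlabel('Score (N - rank + 1)')
    ax.set_ylabel('Keyword')
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return scores


# Made-up sample data standing in for the scraped keywords
sample_titles = ['Topic A', 'Topic B', 'Topic C']
out_path = os.path.join(tempfile.mkdtemp(), 'ranking_demo.png')
scores = plot_ranking(sample_titles, out_path)
print(scores, out_path)
```

Because the page exposes only an ordering here, the bar lengths carry no information beyond rank; if a real popularity figure is scraped later, it could replace `scores` directly.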

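[Book giveaway]

About the book

《Pandas数据分析》 (Pandas Data Analysis) covers practical approaches to data analysis with Pandas. Its topics include an introduction to data analysis, working with Pandas DataFrames, data wrangling with Pandas, aggregating Pandas DataFrames, visualizing data with Pandas and Matplotlib, plotting with Seaborn and customization techniques, financial analysis, rule-based anomaly detection, getting started with machine learning in Python, making better predictions, and machine learning anomaly detection. The book also provides examples and code to help readers understand how each approach is implemented.
The book is suitable as a textbook or teaching reference for computer science and related programs at universities, and as a self-study guide and reference manual for developers.

Purchase links:

JD.com: https://item.jd.com/14065178.html

Dangdang: http://product.dangdang.com/29599087.html

Note: after the event ends, the winners will be announced on the author's profile feed as scheduled, and the books will be shipped free of charge.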