Regular Expressions in Web Scraping
Steps
- Specify the URL
- Send the request
- Get the response data
- Parse the data
- Persist the results to storage
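The five steps above can be sketched as a handful of small functions. This is only an illustration, not a fixed recipe: the names `fetch`, `parse`, `save`, and `scrape` are hypothetical, and the sketch uses the standard library's `urllib.request` in place of `requests` so that it runs without third-party packages.

```python
import re
import urllib.request


def fetch(url, headers):
    """Steps 1-3: take a URL, send a GET request, return the decoded body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        # Assumes the page is UTF-8 encoded.
        return response.read().decode("utf-8")


def parse(html, pattern):
    """Step 4: pull every match of the pattern out of the page text."""
    # re.S lets "." match newlines, so a match may span several lines.
    return re.findall(pattern, html, re.S)


def save(items, path):
    """Step 5: persist the parsed items, one per line."""
    with open(path, "w", encoding="utf-8") as fp:
        for item in items:
            fp.write(item + "\n")


def scrape(url, headers, pattern, path):
    """Run the five steps in order and return the parsed items."""
    html = fetch(url, headers)
    items = parse(html, pattern)
    save(items, path)
    return items
```

Keeping the parsing step separate from the request step makes it possible to test a pattern against a saved copy of the page before sending any real requests.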
Example
Goal: scrape the one-line review quotes of the first 25 movies on the Douban Top 250 list.
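import re

import requests

# Send a browser-like User-Agent so the request looks like it comes from a browser.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36 Edg/94.0.992.50"
}
url = "https://movie.douban.com/top250"
# Fetching a page is a GET request, not a POST; the first page lists 25 movies.
response = requests.get(url=url, headers=headers).text
# Capture the text of <span class="inq"> inside each <p class="quote"> block.
x = '<p class="quote">.*?<span class="inq">(.*?)</span>.*?</p>'
# re.S lets "." match newlines, because each block spans several lines.
list1 = re.findall(x, response, re.S)
# Write one quote per line so the entries do not run together.
with open("电影评分", "w", encoding="utf-8") as fp:
    for quote in list1:
        fp.write(quote + "\n")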
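To see what the pattern and the `re.S` flag are doing without touching the network, the pattern can be tried on a small hand-written fragment. The `sample` string below is hypothetical: it is shaped to fit the pattern and is not copied from the Douban page.

```python
import re

# Hypothetical fragment shaped to match the pattern; not real Douban markup.
sample = """
<p class="quote">
    <span class="inq">First sample quote.</span>
</p>
<p class="quote">
    <span class="inq">Second sample quote.</span>
</p>
"""

pattern = '<p class="quote">.*?<span class="inq">(.*?)</span>.*?</p>'

# With re.S, "." also matches newlines, so each multi-line block can match.
with_dotall = re.findall(pattern, sample, re.S)
# Without re.S, "." stops at the newline after <p class="quote">.
without_dotall = re.findall(pattern, sample)

print(with_dotall)     # ['First sample quote.', 'Second sample quote.']
print(without_dotall)  # []
```

The lazy quantifier `.*?` matters as well: it stops at the first `</span>` it can, so each match stays inside a single quote block instead of running on to the last one on the page.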