Querying Sogou SEO Keyword Rankings with Async Coroutines

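A while ago I wrote a Baidu SEO keyword-ranking checker using the traditional requests module plus multiprocessing. This time I used async coroutines (asyncio with aiohttp) to do the same job for Sogou keyword rankings, and the queries run quite fast. Below I walk through the core parts of the code.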


1. Request headers


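Each request needs a random User-Agent and valid cookies. The fake_headers package generates a different User-Agent on each call, and the cookies (SUV and SNUID) are obtained by requesting Sogou Video at https://v.sogou.com.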


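import aiohttp
from fake_headers import Headers

session = None  # shared aiohttp session, created on first use

async def get_new_cookies():
    global session
    # Reuse the shared session instead of opening a new one on every call.
    if session is None or session.closed:
        session = aiohttp.ClientSession()
    url = 'https://v.sogou.com'
    # fake_headers builds a random User-Agent plus matching browser headers.
    headers = Headers(headers=True).generate()
    async with session.get(url=url, headers=headers, allow_redirects=False) as response:
        cookies = response.cookies
        # response.cookies is a SimpleCookie; .value is the raw cookie value.
        snuid = cookies['SNUID'].value
        suv = cookies['SUV'].value
    return suv, snuid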
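
The SUV and SNUID values returned above still have to be sent along with each search request. Here is a minimal sketch of one way to do that, assuming the two values are simply joined into a `Cookie` header. The helper names `build_cookie_header` and `extract_cookie_values` and the sample values are hypothetical and not part of the original script:

```python
from http.cookies import SimpleCookie


def build_cookie_header(suv: str, snuid: str) -> str:
    """Join the two Sogou cookie values into a single Cookie header value."""
    return f"SUV={suv}; SNUID={snuid}"


def extract_cookie_values(raw_cookie: str, names=("SUV", "SNUID")) -> dict:
    """Parse a raw cookie string and return the requested values.

    Missing cookies come back as None instead of raising, so the caller
    can decide whether to retry with fresh headers.
    """
    jar = SimpleCookie()
    jar.load(raw_cookie)
    return {name: (jar[name].value if name in jar else None) for name in names}


# Made-up sample values, for illustration only.
values = extract_cookie_values("SUV=abc123; SNUID=XYZ789")
headers = {"Cookie": build_cookie_header(values["SUV"], values["SNUID"])}
```

Returning None for a missing cookie makes it easy to detect a response that did not set SNUID and request new cookies before continuing.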





2. Creating the keyword tasks


Connect to the database, load the keywords, and turn each keyword into a coroutine task waiting to be run.


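async def get_keyword(*args):
    # args = (status, engine); `cursor` is a module-level DB cursor whose setup is not shown in this post.
    sql = 'select keyword,web_url from seo_ranking where status=%s and engine=%s'
    # Parameterized query: the driver substitutes the %s placeholders.
    cursor.execute(sql, args)
    keywords = cursor.fetchall()
    return keywords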
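
Note that `cursor.execute` is a blocking call, so declaring `get_keyword` with `async def` does not by itself make the query asynchronous. Below is a hedged sketch of one option: running the blocking query in a worker thread with `asyncio.to_thread` (Python 3.9+). It uses an in-memory `sqlite3` database purely as a stand-in for the real database, and the rows and the names `fetch_keywords_blocking` and `make_demo_db` are assumptions made for illustration:

```python
import asyncio
import sqlite3


def fetch_keywords_blocking(conn, status, engine):
    """Ordinary blocking query; sqlite3 uses '?' placeholders instead of '%s'."""
    sql = "select keyword, web_url from seo_ranking where status=? and engine=?"
    return conn.execute(sql, (status, engine)).fetchall()


async def get_keyword(conn, status, engine):
    # Run the blocking query in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fetch_keywords_blocking, conn, status, engine)


def make_demo_db():
    """Build a throwaway in-memory table with made-up rows."""
    # check_same_thread=False lets the worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(
        "create table seo_ranking (keyword text, web_url text, status int, engine int)"
    )
    conn.executemany(
        "insert into seo_ranking values (?, ?, ?, ?)",
        [("python", "example.com", 1, 1), ("golang", "example.org", 0, 1)],
    )
    return conn


demo_rows = asyncio.run(get_keyword(make_demo_db(), 1, 1))
```

For a single small query at startup the blocking call is harmless. The thread hand-off matters mainly when database calls are interleaved with many in-flight requests.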



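import asyncio

async def main():
    global session
    session = aiohttp.ClientSession()
    try:
        # connect_db() and spider_url() are not shown in this post.
        await connect_db()
        keywords = await get_keyword(1, 1)
        # One task per (keyword, web_url) row; gather runs them concurrently.
        spider_url_tasks = [asyncio.ensure_future(spider_url(keyword[0], keyword[1])) for keyword in keywords]
        spider_urls = await asyncio.gather(*spider_url_tasks)
        await get_html(spider_urls)
    finally:
        # Close the session even if one of the tasks raises.
        await session.close()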
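
When the keyword list is long, launching every task at once may trigger rate limiting on the search engine side. Here is a small sketch of capping concurrency with `asyncio.Semaphore`. `fake_spider_url` is a hypothetical stand-in for the script's `spider_url`, and the limit of 5 is an arbitrary assumption:

```python
import asyncio


async def fake_spider_url(keyword, web_url):
    """Stand-in for spider_url: yield to the event loop, then echo the inputs."""
    await asyncio.sleep(0)
    return f"query:{keyword}", web_url, keyword


async def run_limited(keywords, limit=5):
    semaphore = asyncio.Semaphore(limit)

    async def guarded(keyword, web_url):
        # At most `limit` coroutines are inside this block at any moment.
        async with semaphore:
            return await fake_spider_url(keyword, web_url)

    tasks = [asyncio.ensure_future(guarded(k, u)) for k, u in keywords]
    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_limited([("python", "example.com"), ("go", "example.org")]))
```

Because `gather` preserves input order, each result tuple still lines up with the keyword row it came from.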



3. Getting the target site's ranking


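Only the first 10 result pages are checked for each keyword. A loop fetches and parses one page at a time and checks whether the target site appears on it; as soon as the site is found, the loop stops.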


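import datetime

async def get_html(spider_urls):
    for spiderUrl, web_name, keyword in spider_urls:
        # Skip entries that came back without a search URL.
        if not spiderUrl:
            continue
        # Offsets 0, 10, ..., 90 cover the first 10 result pages.
        for page in range(0, 100, 10):
            html = await spider_page(spiderUrl, page)
            ranking = await parse_html(html, web_name, page)
            if ranking:
                print(keyword, ranking, web_name)
                q_time = datetime.datetime.today().strftime("%Y-%m-%d %H:%M:%S")
                # site_cx and status are not defined in this snippet; they must come from elsewhere in the script.
                await update_db(ranking, site_cx, status, q_time, keyword)
                break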
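
The early-exit behaviour can be tried without touching the network. Below is a sketch with a stubbed page fetcher: `fake_pages`, `spider_page_stub`, `find_ranking` and the sample domains are all invented for illustration, and the rank is computed as offset + index + 1, which is a simplifying assumption rather than the expression used by the script's `parse_html`:

```python
import asyncio

# Made-up data: page offset -> list of domains shown on that result page.
fake_pages = {
    0: ["a.example", "b.example"],
    10: ["c.example", "target.example"],
    20: ["never-reached.example"],
}
fetched_offsets = []


async def spider_page_stub(offset):
    """Pretend to download one result page and record that it was requested."""
    fetched_offsets.append(offset)
    await asyncio.sleep(0)
    return fake_pages.get(offset, [])


async def find_ranking(web_name, max_offset=100, step=10):
    for offset in range(0, max_offset, step):
        domains = await spider_page_stub(offset)
        for index, domain in enumerate(domains):
            if web_name in domain:
                # Stop at the first hit; later pages are never requested.
                return offset + index + 1
    return None  # not found in the pages that were checked


ranking = asyncio.run(find_ranking("target.example"))
```

After the run, `fetched_offsets` holds only the offsets that were actually requested, which makes it easy to confirm that the loop stopped on the page where the site was found.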


from lxml import etree

async def parse_html(html, web_name, page):
    e_obj = etree.HTML(html)
    # Each result block is a div.vrwrap under #main.
    divs = e_obj.xpath("//div[@id='main']//div[@class='vrwrap']")
    for index, div in enumerate(divs):
        # If the cite line contains an <i> element, read the domain from the second span.
        i = div.xpath("./div[@class='citeurl']/i")
        if i:
            yu_name = div.xpath(
                "./div[@class='citeurl']/span[2]/text() | .//div[contains(@class,'citeurl')]/span[1]/text()")
        else:
            yu_name = div.xpath(
                "./div[@class='citeurl']/span[1]/text() | .//div[contains(@class,'citeurl')]/span[1]/text()")
        if yu_name:
            if web_name in yu_name[0]:
                # As written, index 0 and index 1 both map to page + 1.
                ranking = index + 1 + page if index == 0 else index + page
                return ranking
    # Target site not found on this page.
    return None
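
The `web_name in yu_name[0]` check is a plain substring test, so a target of `example.com` would also match `notexample.com`. Here is a hedged sketch of a stricter comparison, assuming the displayed text is a URL or a bare domain (the actual text Sogou shows may contain extra characters that need stripping first). `normalize_host` and `same_site` are hypothetical helpers, not part of the original script:

```python
from urllib.parse import urlsplit


def normalize_host(text: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    text = text.strip().lower()
    if "://" not in text:
        text = "//" + text  # lets urlsplit treat the string as a network location
    host = urlsplit(text).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_site(displayed: str, target: str) -> bool:
    """True when the displayed host equals the target or is a subdomain of it."""
    shown, wanted = normalize_host(displayed), normalize_host(target)
    return bool(wanted) and (shown == wanted or shown.endswith("." + wanted))
```

Whether subdomains should count as the same site depends on how rankings are tracked; the `endswith` clause can be dropped if only exact hosts should match.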

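Run results:


After the keywords are loaded from the database, the script first checks whether each site is indexed by Sogou, then queries keyword rankings only for the sites that are indexed.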

