A Beginner's Baidu Keyword Filter in Python

When doing SEO, almost everyone ends up hunting for keywords. If you run a legitimate business, the honest route is to check each keyword's search index, since that pays off in the later optimization work. In grey-market niches, though, keywords routinely get banned outright, so I put this script together. It suits people running site clusters or directory sites who need to screen huge keyword lists.

The script's principle: when a search engine bans a term, it stops displaying results for it. Search Baidu for such a phrase and you will see something like:

新手用py写百度关键词筛选
相关结果2个 ("about 2 results"), with none of the indexed pages shown!

That behaviour is the whole trick. The script pulls keywords from a txt file, has Baidu search each one, and compares the reported result count against a number you set. The method is not airtight, but it will at least weed out most of the banned terms for you.
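
Before the full script, here is that check in isolation: pull the number out of Baidu's result-count string and compare it to the threshold. This is my own minimal sketch (it uses a regex, while the full script below filters the digits out of the same string character by character):

[Python]
import re

def result_count(nums_text):
    # Pull the first run of digits (commas allowed) out of strings like
    # "百度为您找到相关结果约1,230,000个" or "相关结果2个".
    m = re.search(r"[\d,]+", nums_text)
    return int(m.group(0).replace(",", "")) if m else 0

print(result_count("百度为您找到相关结果约1,230,000个"))  # 1230000
print(result_count("相关结果2个") >= 20000)              # False: likely a banned term

The full multi-threaded script builds on exactly this comparison.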

[Python]
import requests
from lxml import etree
from queue import Queue
import threading
import time
import urllib.parse


class BaiduSpider:
    def __init__(self):
        self.baseurl = "https://www.baidu.com/s?wd="
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/73.0.3683.103 Safari/537.36"
        }
        # queue of search URLs waiting to be fetched
        self.urlQueue = Queue()
        # queue of response HTML waiting to be parsed
        self.resQueue = Queue()

    # Build the URL queue from the keyword file
    def getUrl(self):
        # jieguociku.txt: the keyword list, one keyword per line
        with open("jieguociku.txt", "r", encoding="utf-8") as f:
            for word in f:
                url = self.baseurl + urllib.parse.quote(word.strip()) + "&pn=0"
                self.urlQueue.put((url, word))

    # Fetch each URL and hand the HTML to the parse queue
    def getHtml(self):
        while True:
            (url, word) = self.urlQueue.get()
            try:
                # 30-second timeout per request
                res = requests.get(url, headers=self.headers, timeout=30)
                res.encoding = "utf-8"
                self.resQueue.put((res.text, word))
            except Exception:
                # request failed or timed out: log the keyword for a retry
                with open("chaoshi.txt", "a", encoding="utf-8") as fail:
                    fail.write(word)
            self.urlQueue.task_done()

    # Parse each result page and sort keywords by result count
    def getText(self):
        while True:
            (html, word) = self.resQueue.get()
            parseHtml = etree.HTML(html)
            r_list = parseHtml.xpath('//div[@id="wrapper_wrapper"]//div[@class="nums"]/span/text()')
            # the counter reads like "百度为您找到相关结果约1,230,000个";
            # keep only the digits so variant wordings don't crash the thread
            digits = "".join(c for c in r_list[0] if c.isdigit()) if r_list else ""
            if digits and int(digits) >= 20000:
                # enough results shown: the keyword passes the filter
                with open("guanjianzi.txt", "a", encoding="utf-8") as f:
                    f.write(word)
                print((word.strip(), r_list[0]))
            else:
                # few or no results shown: probably a banned keyword
                with open("guanjianzixiaoyu.txt", "a", encoding="utf-8") as f:
                    f.write(word)
            self.resQueue.task_done()

    def run(self):
        # 1. build the URL queue
        self.getUrl()
        thList = []
        # 2. 50 fetch threads
        for i in range(50):
            thList.append(threading.Thread(target=self.getHtml))
        # 3. 50 parse threads
        for i in range(50):
            thList.append(threading.Thread(target=self.getText))
        # 4. daemon threads exit with the main thread once the queues drain
        for th in thList:
            th.daemon = True
            th.start()
        # 5. block until every URL is fetched and every page is parsed
        self.urlQueue.join()
        self.resQueue.join()


if __name__ == "__main__":
    begin = time.time()
    spider = BaiduSpider()
    spider.run()
    print(time.time() - begin)
    input()  # keep the console window open
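
Usage note: drop your keywords into jieguociku.txt (UTF-8, one per line) next to the script and run it. Keywords whose reported result count is at least 20,000 are appended to guanjianzi.txt; the rest, the likely banned ones, go to guanjianzixiaoyu.txt; anything whose request failed is logged to chaoshi.txt so you can retry it later. One caveat about the design: with 50 fetch threads hammering Baidu from a single IP, you may start getting captchas or empty pages fairly quickly, so lowering the thread count in run() is a reasonable first tweak.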
