
Scraping CNKI paper titles with Python


海派小小甜心 2023-12-10 14:43:36

3 answers, 353 views

十四不是四

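Posted 1 hour ago
- Method 1 (BeautifulSoup version): I threw this together quickly. It only scrapes the links; as soon as I add the titles it keeps raising an error, and I haven't worked out why yet, so I'm pasting it as it stands for now (Method 2 does not have this problem).

      # Python 2
      from BeautifulSoup import BeautifulSoup
      import urllib2
      import re

      def grabHref(url, localfile):
          html = urllib2.urlopen(url).read()
          # the page is GB2312-encoded; re-encode it as UTF-8
          html = unicode(html, 'gb2312', 'ignore').encode('utf-8', 'ignore')
          content = BeautifulSoup(html).findAll('a')
          myfile = open(localfile, 'w')
          pat = re.compile(r'href="([^"]*)"')
          pat2 = re.compile(r'/tools/')
          for item in content:
              h = pat.search(str(item))
              if h is None:          # <a> tag without an href
                  continue
              href = h.group(1)
              if pat2.search(href):  # keep only the /tools/ links
                  # s = BeautifulSoup(item)
                  # myfile.write(...)
                  # myfile.write('\r\n')
                  myfile.write(href)
                  myfile.write('\r\n')
          myfile.close()

      def main():
          url = ""
          localfile = ''
          grabHref(url, localfile)

      if __name__ == "__main__":
          main()

  Method 2 (regex version): Because Method 1 has that problem and only gets the download-page links, I switched to re instead. The code is as follows:

      # Python 2
      import urllib2
      import re

      url = ''
      # group 1 is the link, group 2 is the link text (the title)
      find_re = re.compile(r'href="([^"]*)".+?>(.+?)</a>')
      pat2 = re.compile(r'/tools/')
      html = urllib2.urlopen(url).read()
      html = unicode(html, 'utf-8', 'ignore').encode('gb2312', 'ignore')
      myfile = open('', 'w')
      for x in find_re.findall(html):
          if pat2.search(str(x)):
              print >>myfile, x[0], x[1]
      myfile.close()
      print 'Done!'
246 comments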
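For readers on Python 3, here is a minimal sketch of what Method 1 was aiming for: collecting both the link and its title. It swaps BeautifulSoup for the standard library's `html.parser`, so it is not the answerer's code. The names `decode_page`, `LinkCollector` and `grab_links` are made up for this illustration, and the `/tools/` filter and GB2312 encoding are simply carried over from the answer above.

```python
from html.parser import HTMLParser


def decode_page(raw: bytes, encoding: str = "gb2312") -> str:
    """Decode downloaded bytes, dropping bytes that are invalid in `encoding`."""
    return raw.decode(encoding, errors="ignore")


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> whose href contains `needle`."""

    def __init__(self, needle: str = "/tools/"):
        super().__init__()
        self.needle = needle
        self.links = []      # finished (href, title) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text nested in child tags (e.g. <b>) still arrives here.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.needle in self._href:
                title = "".join(self._text).strip()
                self.links.append((self._href, title))
            self._href = None


def grab_links(html: str, needle: str = "/tools/"):
    """Return a list of (href, title) pairs found in `html`."""
    parser = LinkCollector(needle)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical snippet standing in for a downloaded page.
    sample = '<a href="/tools/a.pdf">Paper <b>A</b></a> <a href="/about">About</a>'
    for href, title in grab_links(sample):
        print(href, title)
```

To use it on a real page you would download the bytes yourself (for example with `urllib.request.urlopen(url).read()`), pass them through `decode_page`, and then call `grab_links` on the result.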
小倩TINA

Posted 4 hours ago
- This can be done with Python automation. Paid service available.
191 comments
dp786639854

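Posted 4 hours ago
- To extract all the links you should use a loop (driver here is your WebDriver instance):

      urls = driver.find_elements("xpath", "//a")
      for url in urls:
          print(url.get_attribute("href"))

  If the get_attribute call raises an error, the a-tag objects were most likely not found. If you are sure they exist, the page may simply be loading slowly and they have not appeared yet. Selenium does not wait for elements to appear by default, so you need to add some waiting time before looking them up. Also, if the page contains an iframe, you have to switch into it first before you can find the elements inside it.
117 comments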
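The waiting and iframe advice above can be sketched roughly as follows. This is a hand-rolled polling loop rather than Selenium's own `WebDriverWait`, chosen so the logic is visible and can be tried without a browser. The function name `collect_hrefs` and its parameters are hypothetical; the only Selenium calls it assumes are `find_elements`, `get_attribute`, `switch_to.frame` and `switch_to.default_content`.

```python
import time


def collect_hrefs(driver, timeout=10.0, poll=0.5, frame=None):
    """Return the href of every <a> element, waiting up to `timeout` seconds.

    `driver` is assumed to be a Selenium WebDriver, or any object with the
    same methods. `frame` is an optional iframe reference to switch into.
    """
    if frame is not None:
        # Elements inside an iframe cannot be found from the outer page.
        driver.switch_to.frame(frame)
    try:
        deadline = time.monotonic() + timeout
        while True:
            links = driver.find_elements("xpath", "//a")
            if links:
                return [link.get_attribute("href") for link in links]
            if time.monotonic() >= deadline:
                return []  # nothing showed up before the deadline
            time.sleep(poll)  # give the page more time to load
    finally:
        if frame is not None:
            # Always return to the top-level document afterwards.
            driver.switch_to.default_content()
```

In real Selenium code the loop is usually replaced by an explicit wait such as `WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//a")))`, which polls in much the same way and raises a timeout error instead of returning an empty list.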
