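Method 1: BeautifulSoup version

I wrote a quick one that only scrapes the links. Adding the titles keeps raising an error and I haven't worked out why yet, so I'm pasting it as it stands for now (Method 2 has no such problem).

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import re

    def grabHref(url, localfile):
        # Download the page and convert it from gb2312 to utf-8.
        html = urllib2.urlopen(url).read()
        html = unicode(html, 'gb2312', 'ignore').encode('utf-8', 'ignore')
        content = BeautifulSoup(html).findAll('a')
        myfile = open(localfile, 'w')
        pat = re.compile(r'href="([^"]*)"')
        pat2 = re.compile(r'/tools/')
        for item in content:
            h = pat.search(str(item))
            if h is None:
                # <a> tag without an href attribute; nothing to record.
                continue
            href = h.group(1)
            # Keep only links that point into /tools/.
            if pat2.search(href):
                # Title output is commented out because it kept raising errors:
                # s = BeautifulSoup(item)
                # myfile.write(...)
                # myfile.write('\r\n')
                myfile.write(href)
                myfile.write('\r\n')
        myfile.close()

    def main():
        url = ""        # page to scrape
        localfile = ''  # output file
        grabHref(url, localfile)

    if __name__ == "__main__":
        main()

Method 2: re version

Because Method 1 has that problem and can only get the download-page links, I switched to re instead. The code is as follows:

    import urllib2
    import re

    url = ''  # page to scrape
    # Capture the href and the link text of each anchor.
    find_re = re.compile(r'href="([^"]*)".+?>(.+?)</a>')
    pat2 = re.compile(r'/tools/')
    html = urllib2.urlopen(url).read()
    html = unicode(html, 'utf-8', 'ignore').encode('gb2312', 'ignore')
    myfile = open('', 'w')  # output file
    for x in find_re.findall(html):
        # x is a (href, title) tuple; keep only the /tools/ links.
        if pat2.search(str(x)):
            print >> myfile, x[0], x[1]
    myfile.close()
    print 'Done!'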
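Both methods above are Python 2 (urllib2, the print statement). In case it helps, here is a rough Python 3 sketch of the same "href plus title, filtered on /tools/" idea. It is not either of the methods above: it swaps in the standard library's html.parser instead of BeautifulSoup or re, and it works on an HTML string rather than downloading anything. The names LinkCollector and extract_links and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, title) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Gather text even when it sits inside nested tags such as <b>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, must_contain="/tools/"):
    """Return (href, title) pairs whose href contains `must_contain`."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(href, title) for href, title in parser.links if must_contain in href]


if __name__ == "__main__":
    # Made-up sample markup standing in for the downloaded page.
    sample = (
        '<a href="/tools/a.html"><b>Tool</b> A</a>'
        '<a href="/news/1.html">News</a>'
        '<a href="/tools/b.html">Tool B</a>'
    )
    for href, title in extract_links(sample):
        print(href, title)
```

To use it on a real page you would still need to fetch the HTML and decode it with the page's encoding before passing it to extract_links; that part is left out here on purpose.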