Thanks to Xiaojiayu (小甲鱼) and the Fish C forum. I just started learning web scraping, found a site to try it on, and... it worked! Very happy.
import re
import os
import requests

temp_url = input('Enter a search keyword: ')  # e.g. "fate"
url = 'https://www.duitang.com/search/?kw=%s&type=feed' % temp_url
web = requests.get(url)
html = web.text

# Sample match: https://b-ssl.duitang.com/uploads/item/201705/13/20170513225951_SFxhj.thumb.224_0.jpeg
# Group 1 is the whole link, group 2 is the thumbnail marker (.thumb.224_0)
temp_link = re.findall(r'src="(https://b-ssl.duitang.com.+?(\.thumb\.224_0)\..+?g)" height', html)

dir_save = 'D:/duitang/'
if not os.path.exists(dir_save):  # create the save directory once, before the loop
    os.mkdir(dir_save)

for i in temp_link:
    my_link = i[0].replace(i[1], '')  # strip the thumbnail marker to get the full-size image URL
    for_web = requests.get(my_link)
    file_name = my_link.split('/')[-1]  # e.g. 20170513225951_SFxhj.jpeg
    file_path = dir_save + file_name
    with open(file_path, 'wb') as f:
        f.write(for_web.content)
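To see how the two regex groups work together, here is a minimal offline sketch: the HTML fragment below is made up to mirror the `<img>` markup on the search page, so it runs without hitting the site. Group 1 captures the whole thumbnail URL, group 2 the `.thumb.224_0` marker, and replacing the marker with an empty string yields the original-size image URL.

```python
import re

# Made-up HTML fragment mirroring the duitang search page markup (assumption)
html = ('<img src="https://b-ssl.duitang.com/uploads/item/201705/13/'
        '20170513225951_SFxhj.thumb.224_0.jpeg" height="224">')

pattern = r'src="(https://b-ssl.duitang.com.+?(\.thumb\.224_0)\..+?g)" height'
full, marker = re.findall(pattern, html)[0]

original = full.replace(marker, '')  # drop the thumbnail marker
print(original)
# https://b-ssl.duitang.com/uploads/item/201705/13/20170513225951_SFxhj.jpeg
```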