Python Web Scraping Study Log, Day 4 | Scraping xiaoqingxin (fresh-style) images from 优美图库 with bs4

2026/6/3 3:55:33 Source: https://blog.csdn.net/weixin_52687711/article/details/141334951

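Parsing with bs4 is fairly simple: you extract values by HTML tag and attribute, using find(tag, attribute="value").

You do need some knowledge of HTML syntax first. With that in place, the extraction logic in bs4 is clear and the code is easy to write.

How is bs4 used? Suppose we have the following HTML: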

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Simple HTML Table</title>
</head>
<body>
<table border="1">
<tr><td>Row 1, Cell 1</td><td>Row 1, Cell 2</td></tr>
<tr><td>Row 2, Cell 1</td><td>Row 2, Cell 2</td></tr>
<tr><td>Row 3, Cell 1</td><td>Row 3, Cell 2</td></tr>
</table>
</body>
</html>

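bs4 extracts values by tag and attribute. In the HTML above, table is the table tag, tr is a row, and td is a cell. So this HTML table contains three rows (tr) and six cells (td).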
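To make the find(tag, attribute="value") pattern concrete, here is a minimal sketch that parses just the table portion of the HTML above and counts its rows and cells. The variable names (soup, rows, cells) are chosen for illustration only.

```python
from bs4 import BeautifulSoup

html = """
<table border="1">
<tr><td>Row 1, Cell 1</td><td>Row 1, Cell 2</td></tr>
<tr><td>Row 2, Cell 1</td><td>Row 2, Cell 2</td></tr>
<tr><td>Row 3, Cell 1</td><td>Row 3, Cell 2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find(tag, attribute="value") returns the first matching element
table = soup.find("table", border="1")

# find_all(tag) returns every matching element below the current one
rows = table.find_all("tr")
cells = [[td.get_text() for td in tr.find_all("td")] for tr in rows]

print(len(rows))                      # 3
print(sum(len(r) for r in cells))     # 6
print(cells[0])                       # ['Row 1, Cell 1', 'Row 1, Cell 2']
```

find returns a single element (or None if nothing matches), while find_all returns a list, which is why the rows can be looped over directly.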

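Goal: scrape the xiaoqingxin (fresh-style) images from 优美图库 (umeituku.com).

Approach: read the source of the xiaoqingxin listing page to get the links to its child pages, treat each link as a new url, and loop over the child pages to fetch the image on each one.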

1. Install bs4 (run in the PyCharm terminal)

pip install beautifulsoup4
# Or install from the Tsinghua mirror
pip install beautifulsoup4 -i https://pypi.tuna.tsinghua.edu.cn/simple

2. Find the page url

3. View the page source

4. Get all the child-page links

import requests
from bs4 import BeautifulSoup

url = 'https://www.umeituku.com/weimeitupian/xiaoqingxintupian/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}
resp = requests.get(url, headers=headers)
resp.encoding = 'utf-8'
# print(resp.text)
page = BeautifulSoup(resp.text, 'html.parser')
div = page.find("div", class_="TypeList")
alist = div.find_all("a")
for i in alist:
    href = i.get("href")  # get() returns the attribute value, i.e. the child-page link
    print(href)
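One fragile point in the snippet above is that find returns None when nothing matches, so div.find_all would raise an AttributeError if the page layout changed. The sketch below wraps the same extraction in a helper that tolerates a missing container and also resolves relative links. The helper name extract_child_links and the sample HTML are hypothetical, made up so the example runs without network access.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def extract_child_links(html, base_url):
    """Return the href of every <a> inside <div class="TypeList">, in page order."""
    page = BeautifulSoup(html, "html.parser")
    div = page.find("div", class_="TypeList")
    if div is None:  # find() returns None when nothing matches
        return []
    links = []
    for a in div.find_all("a", href=True):  # skip <a> tags that have no href
        full = urljoin(base_url, a["href"])  # relative links are resolved, absolute ones pass through
        if full not in links:  # drop duplicates while keeping order
            links.append(full)
    return links


# Hypothetical sample markup, not copied from the real site
sample = """
<div class="TypeList">
  <ul>
    <li><a href="https://example.com/a.htm">A</a></li>
    <li><a href="/b.htm">B</a></li>
    <li><a>no link</a></li>
  </ul>
</div>
"""

print(extract_child_links(sample, "https://example.com/list/"))
# ['https://example.com/a.htm', 'https://example.com/b.htm']
```

With the real page, the call would be extract_child_links(resp.text, url).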

5. Request each child-page link as a new url

6. Extract the content you want based on the features of the child page's source
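The complete code at the end of this post locates the image with a chained call, find("p", align="center").find("img").get("src"). As a hedged illustration of that step, the sketch below unrolls the chain and checks each result, so a child page without the expected structure yields None instead of an exception. The helper name extract_image_src and the sample markup are assumptions for the example, not the real page source.

```python
from bs4 import BeautifulSoup


def extract_image_src(html):
    """Return the src of the first <img> inside <p align="center">, or None."""
    page = BeautifulSoup(html, "html.parser")
    p = page.find("p", align="center")
    if p is None:  # no centered paragraph on this page
        return None
    img = p.find("img")  # searches all descendants, not only direct children
    if img is None:
        return None
    return img.get("src")  # get() returns None if the attribute is absent


# Hypothetical child-page fragment
sample = (
    '<div><p align="center">'
    '<a href="#"><img src="https://example.com/pic/01.jpg" alt="demo"></a>'
    '</p></div>'
)

print(extract_image_src(sample))            # https://example.com/pic/01.jpg
print(extract_image_src("<p>no image</p>")) # None
```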

The image download address is retrieved successfully:

7. Download and save the image
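Saving comes down to two things: deriving a file name from the image url and writing the response bytes in binary mode. The sketch below shows both without touching the network. The helper names image_filename and save_image are hypothetical, and the bytes are a stand-in for what img_resp.content would hold. It also creates the target folder first, since open() fails if the folder does not exist.

```python
import os
import tempfile
from urllib.parse import urlsplit


def image_filename(src):
    """Take the last path segment of an image url, ignoring any ?query part."""
    return os.path.basename(urlsplit(src).path)


def save_image(data, src, folder="img"):
    """Write the bytes in data to folder/<file name from src> and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(folder, image_filename(src))
    with open(path, "wb") as f:  # "wb": images are binary data
        f.write(data)
    return path  # the with block has already closed the file


# Demo in a temporary directory, with placeholder bytes instead of a real download
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(
        b"fake image bytes",
        "https://example.com/pic/01.jpg?size=large",
        folder=os.path.join(tmp, "img"),
    )
    saved_name = os.path.basename(saved)
    saved_size = os.path.getsize(saved)

print(saved_name)  # 01.jpg
```

In the scraper itself, the call would look like save_image(img_resp.content, src).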

Complete code:

import os

import requests
from bs4 import BeautifulSoup

url = 'https://www.umeituku.com/weimeitupian/xiaoqingxintupian/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}
resp = requests.get(url, headers=headers)
resp.encoding = 'utf-8'
# print(resp.text)
page = BeautifulSoup(resp.text, 'html.parser')
div = page.find("div", class_="TypeList")
alist = div.find_all("a")
os.makedirs("./img", exist_ok=True)  # open() below fails if the folder does not exist
for i in alist:
    href = i.get("href")  # get() returns the attribute value, i.e. the child-page link
    z_resp = requests.get(href)  # href is the url of the child page
    z_resp.encoding = 'utf-8'
    z_page = BeautifulSoup(z_resp.text, 'html.parser')
    # p = z_page.find("p", align="center")
    # img = p.find("img")
    # src = img.get("src")
    src = z_page.find("p", align="center").find("img").get("src")
    # Download the image; img_resp.content holds its binary data
    img_resp = requests.get(src)
    with open("./img/" + src.split("/")[-1], "wb") as f:  # the with block closes the file
        f.write(img_resp.content)
    print("Download succeeded")
    img_resp.close()
    z_resp.close()
resp.close()
