In-Depth Guide to python-pptx: Parsing PPT Content Layer by Layer, with Practical Tips

Mind-Map Overview

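PPTX file structure (using `prs.slides[0]` as the example)
├── Presentation (prs)
│   ├── slides → list of all slides
│   │   └── Slide (a single slide)
│   │       ├── shapes → all shapes on the slide (text boxes, pictures, tables, etc.)
│   │       │   └── Shape (a single shape object)
│   │       │       ├── text_frame → the text frame (only when has_text_frame is True)
│   │       │       │   └── TextFrame → holds the paragraphs and their formatting
│   │       │       │       ├── paragraphs → list of paragraphs
│   │       │       │       │   └── Paragraph → paragraph content and formatting
│   │       │       │       │       ├── text → text content
│   │       │       │       │       └── font → font properties (color, size, etc.)
│   │       │       ├── table → the table (only when has_table is True)
│   │       │       │   └── Table → table row and column data
│   │       │       │       ├── rows → table rows
│   │       │       │       └── columns → table columns
│   │       │       ├── image → the image (only on picture shapes)
│   │       │       │   └── Image → binary image data, format, etc.
│   │       │       └── placeholder_format → placeholder info (only when is_placeholder is True)
│   │       │           └── PlaceholderFormat → placeholder idx and type (title, body, etc.)
│   │       ├── slide_layout → the layout this slide is based on
│   │       └── notes_slide → the notes page
│   │           └── notes_text_frame → TextFrame holding the notes text
│   └── slide_layouts → all available slide layouts
│       └── SlideLayout → layout name and placeholders
│           └── placeholders → the placeholders the layout defines
└── Other advanced objects (charts, group shapes, etc.)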

Layer-by-Layer Breakdown of Properties and Methods

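1. The Presentation object

Purpose: represents the entire PPT file.
Key methods/properties:
- prs.slides: the collection of all slides.
- prs.slide_layouts: all available slide layouts (such as "Title and Content" or "Title Only").
- prs.save(file): saves the presentation to a file path or a file-like object.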

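2. The Slide object (one slide)

Purpose: represents a single slide.
Key methods/properties:
- slide.shapes: all shapes on the slide (text boxes, pictures, tables, etc.).
- slide.placeholders: the slide's placeholders (such as the title and content areas).
- slide.notes_slide: the slide's notes page. If the slide has none, accessing this property creates one, so check slide.has_notes_slide first when you only want to read.
- slide.slide_layout: the layout the slide is based on.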

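3. The Shape object (text boxes, pictures, etc.)

Purpose: represents a single element on a slide (text box, picture, table, etc.).
Key methods/properties:
- Text boxes:
  - shape.has_text_frame: whether the shape contains a text frame.
  - shape.text_frame: the text frame object.
  - shape.text: the shape's text content directly (a shortcut).
  - text_frame.paragraphs: the list of paragraphs.
  - paragraph.text: the paragraph's text.
  - paragraph.font: the paragraph's default font properties (color, size, etc.). Formatting applied directly to text is usually stored on each run instead, in paragraph.runs[i].font.
- Tables:
  - shape.has_table: whether the shape contains a table.
  - shape.table: the table object.
  - table.rows: the table rows.
  - table.columns: the table columns.
  - table.cell(row, col): the cell object at the given position (read its content via cell.text).
- Pictures:
  - shape.shape_type == MSO_SHAPE_TYPE.PICTURE: whether the shape is a picture (python-pptx has no has_image property).
  - shape.image: the image object.
  - image.blob: the image's binary data.
  - image.filename: the image's file name (if available).
- Placeholders:
  - shape.is_placeholder: whether the shape is a placeholder.
  - shape.placeholder_format: the placeholder's information (idx and type).
  - placeholder_format.type: the placeholder type (e.g. PP_PLACEHOLDER.TITLE).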

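4. The TextFrame object (text box content)

Purpose: manages the text content and formatting of a text box.
Key methods/properties:
- text_frame.clear(): removes all paragraphs except one empty one.
- text_frame.paragraphs: the list of paragraphs.
- text_frame.word_wrap: whether text wraps automatically (True/False, or None to inherit the setting).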

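5. The Table object (table content)

Purpose: manages a table's rows, columns, and cells.
Key methods/properties:
- table.cell(row, col): the cell object at the given position.
- cell.text: the cell's text.
- cell.merge(other_cell): merges the rectangular range from this cell to other_cell into one cell.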

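6. The Image object (picture content)

Purpose: gives access to a picture's binary data and properties.
Key methods/properties:
- image.blob: the image's binary data.
- image.content_type: the image's MIME type (e.g. image/png).
- image.ext: the file extension matching the format (e.g. png, jpg).
- image.size: the image's pixel dimensions as (width, height).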

Examples of Extractable Content

Extracting content from a single slide

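```python
from pptx import Presentation
from pptx.enum.dml import MSO_COLOR_TYPE
from pptx.enum.shapes import MSO_SHAPE_TYPE

prs = Presentation("your_presentation.pptx")
slide = prs.slides[0]  # the first slide

# Extract text content
for shape in slide.shapes:
    if shape.has_text_frame:
        for paragraph in shape.text_frame.paragraphs:
            print("Text:", paragraph.text)
            # paragraph.font holds the paragraph defaults; direct formatting
            # usually lives on paragraph.runs[i].font instead
            font = paragraph.font
            # Color and size are often inherited (unset), so check before
            # reading .rgb or .pt, otherwise an exception is raised
            if font.color.type == MSO_COLOR_TYPE.RGB:
                print("Font color:", font.color.rgb)
            if font.size is not None:
                print("Font size:", font.size.pt)

# Extract table content
for shape in slide.shapes:
    if shape.has_table:
        for row in shape.table.rows:
            row_data = [cell.text for cell in row.cells]
            print("Table row:", row_data)

# Extract pictures (there is no has_image; check the shape type instead)
for index, shape in enumerate(slide.shapes):
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        image = shape.image
        print("Image format:", image.content_type)
        # Use the real extension, and a unique name per picture
        with open(f"extracted_image_{index}.{image.ext}", "wb") as f:
            f.write(image.blob)  # write the raw image bytes
```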

Summary of Key Operations

Iterating over all shapes:

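```python
from pptx.enum.shapes import MSO_SHAPE_TYPE

for shape in slide.shapes:
    if shape.has_text_frame:
        pass  # handle a text box
    elif shape.has_table:
        pass  # handle a table
    elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        pass  # handle a picture
```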

Reading placeholder content:

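```python
from pptx.enum.shapes import PP_PLACEHOLDER

for placeholder in slide.placeholders:
    ph_type = placeholder.placeholder_format.type
    # TITLE has the value 1; title slides use CENTER_TITLE instead
    if ph_type in (PP_PLACEHOLDER.TITLE, PP_PLACEHOLDER.CENTER_TITLE):
        print("Title:", placeholder.text)
    elif placeholder.has_text_frame:  # e.g. picture placeholders have no text
        print("Content placeholder:", placeholder.text)
```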

Extracting notes:

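```python
# has_notes_slide avoids creating an empty notes page just by checking
if slide.has_notes_slide:
    notes_text_frame = slide.notes_slide.notes_text_frame
    if notes_text_frame is not None:  # None if the notes page has no body placeholder
        print("Notes:", notes_text_frame.text)
```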

Handling Common Errors

Checking the shape type:

```python
if shape.has_text_frame:
    text = shape.text
else:
    print("This shape contains no text")
```

Matching the placeholder type:

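```python
# The enum is PP_PLACEHOLDER; there is no MSO_PLACEHOLDER in pptx.enum.shapes
from pptx.enum.shapes import PP_PLACEHOLDER

if shape.is_placeholder and shape.placeholder_format.type == PP_PLACEHOLDER.TITLE:
    print("This is a title placeholder:", shape.text)
```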