欢迎来到尧图网

客户服务 关于我们

您的位置:首页 > 财经 > 金融 > 亚马逊 SP-API 接入指南:商品详情页实时数据采集开发教程

亚马逊 SP-API 接入指南:商品详情页实时数据采集开发教程

2025/6/19 9:59:28 来源:https://blog.csdn.net/2301_78671173/article/details/148744800  浏览:    关键词:亚马逊 SP-API 接入指南:商品详情页实时数据采集开发教程

在电商数据分析领域,实时获取亚马逊商品详情页数据是企业制定营销策略、监控竞品动态的核心需求。本教程将深入讲解如何通过亚马逊 SP-API(Selling Partner API)实现商品详情页数据的自动化采集,包含完整的接入流程、代码实现及最佳实践。

一、SP-API 基础认知

1. API 架构概述

亚马逊 SP-API 是基于 RESTful 架构的新一代卖家接口,相比旧版 MWS 具有以下优势:

  • OAuth2.0 认证:更安全的第三方授权机制
  • 模块化接口:按业务领域划分(Products、Orders、Pricing 等)
  • 全球端点支持:多区域部署优化响应速度
  • 标准化响应:统一的 JSON 格式与错误码体系

2. 接入权限申请

  • 账户注册:完成注册
  • API 权限申请:提交产品权限请求,通过安全审核
  • LWA 应用配置:创建 Login with Amazon 应用获取 Client ID/Secret

二、开发环境搭建

1. 核心依赖安装

 

# 推荐使用Python 3.8+环境
pip install requests python-dotenv pycryptodome

2. 项目结构设计

amazon-sp-api/
├── config/
│   ├── config.ini          # 配置文件
│   └── logging.conf        # 日志配置
├── src/
│   ├── auth.py             # 认证模块
│   ├── product_api.py      # 商品API模块
│   ├── utils.py            # 工具函数
│   └── main.py             # 主程序
├── .env                    # 环境变量
└── requirements.txt        # 依赖清单

 

三、认证流程实现

 

1. OAuth2.0 授权流程

import os
import requests
import json
from dotenv import load_dotenvclass SPAPIAuth:def __init__(self):load_dotenv()self.client_id = os.environ.get('CLIENT_ID')self.client_secret = os.environ.get('CLIENT_SECRET')self.refresh_token = os.environ.get('REFRESH_TOKEN')self.lwa_endpoint = 'https://api.amazon.com/auth/o2/token'def get_access_token(self):"""获取临时访问令牌"""payload = {'grant_type': 'refresh_token','refresh_token': self.refresh_token,'client_id': self.client_id,'client_secret': self.client_secret}response = requests.post(self.lwa_endpoint, data=payload)if response.status_code == 200:return response.json()['access_token']else:raise Exception(f"获取访问令牌失败: {response.text}")

 2. AWS 签名生成

import hmac
import hashlib
import time
import datetime
import urllib.parseclass AWSSigner:def __init__(self, access_key, secret_key, region):self.access_key = access_keyself.secret_key = secret_keyself.region = regionself.algorithm = 'AWS4-HMAC-SHA256'self.service = 'execute-api'def generate_signature(self, method, path, host, query_params, headers, payload):"""生成AWS4-HMAC-SHA256签名"""# 1. 生成时间戳t = datetime.datetime.utcnow()amz_date = t.strftime('%Y%m%dT%H%M%SZ')date_stamp = t.strftime('%Y%m%d')# 2. 构建规范请求canonical_uri = urllib.parse.quote(path, safe='/~')canonical_querystring = '&'.join([f"{k}={urllib.parse.quote_plus(str(v))}" for k, v in sorted(query_params.items())])canonical_headers = '\n'.join([f"{k.lower()}:{v.strip()}" for k, v in headers.items()]) + '\n'signed_headers = ';'.join([k.lower() for k in headers.keys()])payload_hash = hashlib.sha256(payload.encode('utf-8')).hexdigest()canonical_request = (f"{method}\n{canonical_uri}\n{canonical_querystring}\n"f"{canonical_headers}\n{signed_headers}\n{payload_hash}")# 3. 构建待签字符串credential_scope = f"{date_stamp}/{self.region}/{self.service}/aws4_request"string_to_sign = (f"{self.algorithm}\n{amz_date}\n{credential_scope}\n"f"{hashlib.sha256(canonical_request.encode('utf-8')).hexdigest()}")# 4. 生成签名密钥k_date = self._sign(('AWS4' + self.secret_key).encode('utf-8'), date_stamp)k_region = self._sign(k_date, self.region)k_service = self._sign(k_region, self.service)k_signing = self._sign(k_service, 'aws4_request')# 5. 计算签名signature = hmac.new(k_signing, string_to_sign.encode('utf-8'), hashlib.sha256).hexdigest()return amz_date, credential_scope, signaturedef _sign(self, key, msg):return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

 

四、商品详情数据采集实现

1. 商品 API 客户端

class ProductAPIClient:def __init__(self, auth, aws_signer, endpoint):self.auth = authself.aws_signer = aws_signerself.endpoint = endpointself.host = endpoint.split('//')[1]def get_product_details(self, asin, marketplace_id='ATVPDKIKX0DER'):"""获取商品详情"""access_token = self.auth.get_access_token()method = 'GET'path = '/products/2020-09-01/products'query_params = {'MarketplaceIds': marketplace_id,'Asins': asin}headers = {'host': self.host,'x-amz-date': '',  # 将在签名过程中设置'Authorization': '',  # 将在签名过程中设置'x-amz-access-token': access_token,'Content-Type': 'application/json'}# 生成签名amz_date, credential_scope, signature = self.aws_signer.generate_signature(method, path, self.host, query_params, headers, '')# 更新请求头headers['x-amz-date'] = amz_dateheaders['Authorization'] = (f"AWS4-HMAC-SHA256 Credential={self.aws_signer.access_key}/{credential_scope}, "f"SignedHeaders=host;x-amz-date, Signature={signature}")# 构建完整URLquery_string = '&'.join([f"{k}={urllib.parse.quote_plus(str(v))}" for k, v in query_params.items()])url = f"{self.endpoint}{path}?{query_string}"# 发送请求response = requests.get(url, headers=headers)if response.status_code == 200:return response.json()else:raise Exception(f"请求失败: {response.status_code} - {response.text}")

 2. 批量采集实现

def batch_fetch_products(asin_list, max_retries=3, delay=1):"""批量获取商品数据"""# 初始化认证和API客户端auth = SPAPIAuth()aws_signer = AWSSigner(access_key=os.environ.get('AWS_ACCESS_KEY'),secret_key=os.environ.get('AWS_SECRET_KEY'),region='us-east-1')client = ProductAPIClient(auth=auth,aws_signer=aws_signer,endpoint='https://sellingpartnerapi-na.amazon.com')results = {}for asin in asin_list:retries = 0while retries < max_retries:try:data = client.get_product_details(asin)results[asin] = dataprint(f"成功获取ASIN: {asin} 的数据")breakexcept Exception as e:print(f"获取ASIN: {asin} 失败: {str(e)}")retries += 1if retries < max_retries:print(f"重试 ({retries}/{max_retries})")time.sleep(delay * (2 ** retries))  # 指数退避else:results[asin] = None# 控制请求频率,避免触发限流time.sleep(delay)return results

 

五、数据解析与持久化

1. 数据解析示例

def parse_product_data(raw_data):"""解析商品数据并提取关键信息"""if not raw_data or 'payload' not in raw_data:return Noneproduct = raw_data['payload'][0]# 提取基本信息parsed_data = {'asin': product.get('asin'),'title': product.get('attributes', {}).get('title'),'brand': product.get('attributes', {}).get('brand'),'model': product.get('attributes', {}).get('model'),'color': product.get('attributes', {}).get('color'),'size': product.get('attributes', {}).get('size')}# 提取价格信息price_info = product.get('summary', {})parsed_data['price'] = {'lowest_new': price_info.get('lowestPrice', {}).get('value'),'lowest_used': price_info.get('lowestUsedPrice', {}).get('value'),'currency': price_info.get('lowestPrice', {}).get('currency'),'condition': price_info.get('condition')}# 提取库存信息availability = product.get('fulfillmentAvailability', {})parsed_data['availability'] = {'quantity': availability.get('quantity'),'fulfillment_channel': availability.get('fulfillmentChannel'),'is_supply_limited': availability.get('isSupplyLimited')}# 提取评分信息reviews = product.get('customerReviews', {})parsed_data['reviews'] = {'average_rating': reviews.get('averageRating'),'total_reviews': reviews.get('totalReviews'),'rating_histogram': reviews.get('ratingHistogram')}return parsed_data

 2. 数据存储方案

import pandas as pd
from sqlalchemy import create_enginedef save_to_database(products_data, db_config):"""将商品数据存入PostgreSQL数据库"""# 转换为DataFramedf = pd.DataFrame([parse_product_data(p) for p in products_data.values() if p])# 连接数据库engine = create_engine(f"postgresql://{db_config['user']}:{db_config['password']}@"f"{db_config['host']}:{db_config['port']}/{db_config['database']}")# 将数据写入表中df.to_sql(name='amazon_products',con=engine,if_exists='append',index=False,dtype={'asin': String(10),'title': Text,'brand': String(100),'model': String(100),'color': String(50),'size': String(50),'price': JSON,'availability': JSON,'reviews': JSON})print(f"成功存储 {len(df)} 条商品数据")

 

六、性能优化与错误处理

1. 并发请求实现

import asyncio
from concurrent.futures import ThreadPoolExecutorasync def fetch_product_async(client, asin):"""异步获取单个商品数据"""loop = asyncio.get_event_loop()try:# 在单独的线程中执行同步请求data = await loop.run_in_executor(None, client.get_product_details, asin)return asin, dataexcept Exception as e:print(f"异步获取ASIN: {asin} 失败: {str(e)}")return asin, Noneasync def batch_fetch_async(asin_list, max_workers=10):"""异步批量获取商品数据"""# 初始化API客户端auth = SPAPIAuth()aws_signer = AWSSigner(access_key=os.environ.get('AWS_ACCESS_KEY'),secret_key=os.environ.get('AWS_SECRET_KEY'),region='us-east-1')client = ProductAPIClient(auth=auth,aws_signer=aws_signer,endpoint='https://sellingpartnerapi-na.amazon.com')# 创建线程池执行器with ThreadPoolExecutor(max_workers=max_workers) as executor:loop = asyncio.get_event_loop()tasks = [fetch_product_async(client, asin) for asin in asin_list]results = await asyncio.gather(*tasks)return {asin: data for asin, data in results}

 2. 限流与重试机制

from ratelimit import limits, sleep_and_retryclass RateLimitedProductAPIClient(ProductAPIClient):"""带速率限制的商品API客户端"""CALLS = 10  # 每分钟最多10次调用PERIOD = 60  # 时间窗口(秒)@sleep_and_retry@limits(calls=CALLS, period=PERIOD)def get_product_details(self, asin, marketplace_id='ATVPDKIKX0DER'):"""带速率限制的商品详情获取方法"""return super().get_product_details(asin, marketplace_id)

 

七、部署与监控

1. 环境配置示例

# config.ini
[api]
region = us-east-1
endpoint = https://sellingpartnerapi-na.amazon.com[database]
host = localhost
port = 5432
database = amazon_data
user = postgres
password = your_password[logging]
config_file = logging.conf

 

2. 监控指标建议

  • API 请求成功率
  • 平均响应时间
  • 数据更新频率
  • 限流触发次数
  • 数据库写入性能

八、合规与安全注意事项

  1. 数据使用限制

    • 仅用于内部业务分析,禁止公开数据
    • 遵守亚马逊 API 服务条款
    • 不得进行数据爬虫或过度请求
  2. 安全最佳实践

    • 使用环境变量存储敏感信息
    • 定期轮换 API 密钥
    • 实施最小权限原则
    • 启用数据传输加密
  3. 防封禁策略

    • 严格遵守 API 调用频率限制
    • 实现智能限流和请求调度
    • 监控异常请求模式并及时调整

通过本教程,你可以完整实现亚马逊 SP-API 的接入与商品详情页数据采集系统。在实际应用中,建议根据业务需求扩展功能,如添加定时任务、数据可视化或机器学习分析模块。

(注:实际开发中请替换示例中的环境变量和配置参数,并遵守亚马逊 API 使用条款。)

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com

热搜词