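Requests: From Beginner to Mastery
Background
HTTP is one of the core application-layer protocols of the Internet. Requests wraps the details that come up on nearly every call, such as headers, cookies, authentication, and timeouts, behind a compact API, so developers can focus on business logic.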
Chapter 1: HTTP Methods at a Glance
import requests
# GET: retrieve a resource
r = requests.get("https://httpbin.org/get")
# POST: create a resource
r = requests.post("https://httpbin.org/post", json={"name": "Alice"})  # JSON body
r = requests.post("https://httpbin.org/post", data={"name": "Alice"})  # form-encoded body
# PUT: replace or update a resource
r = requests.put("https://httpbin.org/put", data={"key": "value"})
# DELETE: remove a resource
r = requests.delete("https://httpbin.org/delete")
# HEAD: fetch only the response headers
r = requests.head("https://httpbin.org/get")
# PATCH: apply a partial update
r = requests.patch("https://httpbin.org/patch", data={"key": "value"})
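To see how `json=` and `data=` differ on the wire, a request can be prepared without being sent. This is a minimal sketch: the helper name `preview` is hypothetical and not part of Requests, and nothing here touches the network.

```python
import json

import requests


def preview(method, url, **kwargs):
    """Build a request without sending it and return (Content-Type, body)."""
    prepared = requests.Request(method, url, **kwargs).prepare()
    return prepared.headers.get("Content-Type"), prepared.body


# json= serializes the dict to a JSON document.
json_type, json_body = preview("POST", "https://httpbin.org/post", json={"name": "Alice"})
decoded = json.loads(json_body)

# data= with a dict produces a URL-encoded form body.
form_type, form_body = preview("POST", "https://httpbin.org/post", data={"name": "Alice"})

print(json_type, json_body)
print(form_type, form_body)
```

Preparing a request this way is also a convenient way to check headers and bodies in unit tests before any traffic is generated.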
Chapter 2: Sessions and Persistent Connections
# A Session reuses connections and persists cookies across requests
s = requests.Session()
s.headers.update({"User-Agent": "MyApp/1.0"})
s.auth = ("username", "password")
# Requests made through the session share its connection pool and cookie jar
login_resp = s.post("https://example.com/login", json={"user": "a", "pass": "b"})
dashboard = s.get("https://example.com/dashboard")
# Session-level retry policy
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Up to 3 retries with exponential backoff when the server answers 500, 502, or 503
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503])
s.mount("https://", HTTPAdapter(max_retries=retry))
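A Session has no session-wide timeout setting, so a request made without `timeout=` can wait indefinitely. One common workaround, sketched here under assumptions, is to subclass `HTTPAdapter` and fill in a default. The names `TimeoutHTTPAdapter` and `build_session` are hypothetical, and the default of `(3.0, 10.0)` seconds is an illustrative choice.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller gives none."""

    def __init__(self, *args, timeout=(3.0, 10.0), **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # timeout is None (or absent) unless the caller set one explicitly.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session(user_agent="MyApp/1.0"):
    """Return a Session with retries and a default timeout on both schemes."""
    retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503])
    adapter = TimeoutHTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = build_session()
adapter = session.get_adapter("https://example.com/dashboard")
print(type(adapter).__name__, adapter.timeout, adapter.max_retries.total)
```

A per-request `timeout=` still takes precedence, because the adapter only fills in the value when none was supplied.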
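Chapter 3: File Uploads and Streaming Downloads
# Upload a file; the with block closes the handle once the request completes
with open("report.pdf", "rb") as fh:
    r = requests.post("https://httpbin.org/post", files={"file": fh})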
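The `files` mapping also accepts a `(filename, content, content type)` tuple, which gives control over the multipart part headers. The sketch below prepares such a request without sending it. The payload bytes are a hypothetical stand-in for the contents of `report.pdf`.

```python
import requests

# Hypothetical in-memory payload standing in for the contents of report.pdf.
payload = b"%PDF-1.4 example bytes"

# (filename, content, content type); content may be bytes or an open binary file.
files = {"file": ("report.pdf", payload, "application/pdf")}
prepared = requests.Request("POST", "https://httpbin.org/post", files=files).prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)      # multipart/form-data with a generated boundary
print(len(body), "bytes in the encoded body")
```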
# Stream a large file to disk without loading it into memory
with requests.get("https://example.com/large_file.zip", stream=True) as r:
    r.raise_for_status()
    with open("large_file.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
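The streaming loop can be factored into a reusable helper that also reports how much was written. This is a sketch under assumptions: `save_stream` and `download` are hypothetical names, and computing a SHA-256 digest is an illustrative extra rather than something Requests does itself.

```python
import hashlib

import requests


def save_stream(response, dest, chunk_size=8192):
    """Write a streamed response body to dest; return (bytes_written, sha256_hex)."""
    response.raise_for_status()
    digest = hashlib.sha256()
    written = 0
    with open(dest, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fh.write(chunk)
            digest.update(chunk)
            written += len(chunk)
    return written, digest.hexdigest()


def download(url, dest, timeout=(3.0, 30.0)):
    """Stream url to dest; the with block releases the connection afterwards."""
    with requests.get(url, stream=True, timeout=timeout) as r:
        return save_stream(r, dest)
```

Calling `download("https://example.com/large_file.zip", "large_file.zip")` would return the byte count and digest, which can then be compared against a published checksum if one exists.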
Chapter 4: Authentication
# Basic Auth
requests.get("https://api.example.com", auth=("user", "pass"))
# Bearer token
headers = {"Authorization": "Bearer eyJhbGciOi..."}
requests.get("https://api.example.com", headers=headers)
# OAuth2 client credentials grant
from requests.auth import HTTPBasicAuth
auth = HTTPBasicAuth("client_id", "client_secret")
data = {"grant_type": "client_credentials"}
r = requests.post("https://auth.example.com/token", auth=auth, data=data)
r.raise_for_status()  # fail early if the token endpoint rejects the request
token = r.json()["access_token"]
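Instead of building the `Authorization` header by hand on every call, the token can be wrapped in a custom auth class. Requests supports this through `requests.auth.AuthBase`. The class name `BearerAuth` and the token value are hypothetical.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Attach a bearer token to every request this object is used with."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # Requests calls this hook while preparing the request.
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


# Prepare without sending, to inspect the resulting header.
prepared = requests.Request(
    "GET", "https://api.example.com", auth=BearerAuth("example-token")
).prepare()
print(prepared.headers["Authorization"])
```

The same object can be assigned to `Session.auth`, so every request made through that session carries the token.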
Chapter 5: Exception Handling and Timeouts
try:
    # timeout=(connect timeout, read timeout), in seconds
    r = requests.get("https://api.example.com", timeout=(3.0, 10.0))
    r.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
except requests.exceptions.Timeout:
    print("Request timed out")
except requests.exceptions.ConnectionError:
    print("Connection failed; check the network or the URL")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
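The order of the `except` clauses matters because the exception classes overlap: `ConnectTimeout` inherits from both `ConnectionError` and `Timeout`. The sketch below makes that ordering explicit. The function name `describe_failure` and its labels are hypothetical.

```python
from requests import exceptions as rex


def describe_failure(exc):
    """Map an exception to a coarse label, checking the most specific class first."""
    if isinstance(exc, rex.Timeout):
        # Covers ConnectTimeout and ReadTimeout.
        return "timeout"
    if isinstance(exc, rex.ConnectionError):
        return "connection"
    if isinstance(exc, rex.HTTPError):
        return "http"
    if isinstance(exc, rex.RequestException):
        # Base class of everything Requests raises itself.
        return "request"
    return "unexpected"


print(describe_failure(rex.ConnectTimeout()))   # checked as a Timeout first
print(describe_failure(rex.ConnectionError()))
print(describe_failure(rex.TooManyRedirects()))
```

If the `ConnectionError` check came first, a connect timeout would be reported as a connection failure instead of a timeout.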
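Questions for Reflection
- How does a Session differ from calling requests.get() directly? In which scenarios is a Session required?
- When downloading with stream=True, which scenarios suit iter_content and which suit iter_lines?
- How can custom DNS resolution (for example, DoH) be added to Requests?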