HuggingFace Transformers

Overview

HuggingFace Transformers

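HuggingFace Transformers is one of the most widely used libraries for pretrained models. Through the Hugging Face Hub it gives access to more than 100,000 pretrained models. It offers a consistent interface across PyTorch, TensorFlow, and JAX, and 3-5 lines of code are enough to run SOTA models such as GPT, BERT, and LLaMA. Diffusion models such as Stable Diffusion are served by the companion diffusers library rather than by Transformers itself.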

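Core strengths:

Unified API: AutoModel / AutoTokenizer / pipeline() work the same way across model architectures
Model Hub: 100,000+ community models covering NLP / CV / Audio / multimodal
PEFT / LoRA: parameter-efficient fine-tuning of large models (through the separate peft library), feasible on consumer GPUs
safetensors: a weight format that stores only tensor data, so loading it cannot execute arbitrary code the way unpickling can
Enterprise ecosystem: Inference Endpoints, Text Generation Inference (TGI)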

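Use cases: text classification, question answering, translation, text-to-image (through diffusers), speech recognition, and LLM fine-tuning and deployment.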
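
To make the safetensors point concrete, here is a minimal sketch that uses only the Python standard library. The first part shows that unpickling can invoke an arbitrary callable. The second part is a toy data-only layout (an 8-byte length prefix, a JSON header, then raw bytes) modeled on how safetensors files are organized. The pack and unpack helpers are hypothetical illustrations, not the safetensors library API.

```python
import json
import pickle
import struct


class Payload:
    # pickle calls __reduce__ to learn how to rebuild the object;
    # the callable it returns is invoked at load time.
    def __reduce__(self):
        return (eval, ("6 * 7",))


loaded = pickle.loads(pickle.dumps(Payload()))
print(loaded)  # 42: eval ran inside pickle.loads


def pack(tensors):
    """Toy data-only layout: 8-byte header length, JSON header, raw float32 bytes."""
    header, body, offset = {}, b"", 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [offset, offset + len(raw)],
        }
        body += raw
        offset += len(raw)
    head = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(head)) + head + body


def unpack(blob):
    """Read the toy layout back; only JSON parsing and byte slicing are involved."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    body = blob[8 + header_len:]
    out = {}
    for name, meta in header.items():
        start, end = meta["data_offsets"]
        out[name] = list(struct.unpack(f"<{(end - start) // 4}f", body[start:end]))
    return out


weights = unpack(pack({"w": [0.5, -1.0, 2.0]}))
print(weights)
```

Reading the second format never calls code stored in the file, which is the property that makes a data-only weight format safer to load from untrusted sources.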

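Installation

Prerequisites

Python: >= 3.8 (3.10 recommended)
PyTorch: >= 1.10 (TensorFlow or JAX are optional alternatives)
GPU: NVIDIA with CUDA 11.8+ (needed in practice for running large models)
Disk: at least 10 GB (model cache)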
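
The Python and disk requirements above can be checked before installing anything. The following is a small sketch using only the standard library. The helper names python_ok and free_gb are made up for this example, and it assumes the model cache lives under the home directory.

```python
import shutil
import sys
from pathlib import Path


def python_ok(minimum=(3, 8), current=None):
    """Return True if `current` (default: the running interpreter) meets `minimum`."""
    current = tuple(current) if current is not None else sys.version_info[:2]
    return current >= tuple(minimum)


def free_gb(path):
    """Free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 1024**3


# Assumption: the Hugging Face cache sits under the home directory.
target = Path.home() if Path.home().exists() else Path(".")
print("Python >= 3.8:", python_ok())
print(f"Free space at {target}: {free_gb(target):.1f} GiB (this guide suggests at least 10)")
```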

Install commands

Minimal install

pip install transformers

Framework-specific

# PyTorch (default)
pip install transformers torch

# TensorFlow
pip install transformers tensorflow

# JAX / Flax
pip install transformers flax jax

Verify the installation

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I love HuggingFace!"))
# [{'label': 'POSITIVE', 'score': 0.9998}]

Common installation problems

Q1: ImportError: Using the `Trainer` with `PyTorch` requires `accelerate`

Run pip install accelerate. The Transformers Trainer has a hard dependency on accelerate.

Q2: Model downloads are slow or blocked

Point to a mirror with export HF_ENDPOINT=https://hf-mirror.com, or download the files ahead of time with huggingface-cli download gpt2 --local-dir ./model
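
The same mirror setting can be applied from inside a Python script instead of the shell. This sketch assumes the variable is read when huggingface_hub is first imported, so it has to be set before any transformers or huggingface_hub import.

```python
import os

# Assumption: HF_ENDPOINT is read at import time,
# so set it before importing transformers or huggingface_hub.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

print(os.environ["HF_ENDPOINT"])
```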

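Q3: Out of memory (OOM) when loading a large model

Use device_map="auto" together with 8-bit quantization (this needs the accelerate and bitsandbytes packages). Recent releases expect the quantization settings in a BitsAndBytesConfig rather than a bare load_in_8bit=True argument. The Llama 2 repositories are gated, so access has to be requested on the Hub first:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
    device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))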
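
To see why 8-bit loading helps, it is enough to multiply the parameter count by the bytes each parameter occupies. The sketch below is back-of-the-envelope arithmetic for the weights only. It ignores activations, the KV cache, and quantization overhead, and the 7-billion figure is a rounded assumption for a "7B" model.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 7_000_000_000  # rounded size of a "7B" model
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(num_params, dtype):.0f} GB")
```

Under these assumptions the weights shrink from about 28 GB in float32 to about 7 GB in int8, which is what brings a 7B model within reach of a single consumer GPU.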

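Example

HuggingFace pipeline: six tasks, one line of code each

Goal

Show how versatile pipeline() is: a single API covers sentiment analysis, named entity recognition, text generation, translation, summarization, and zero-shot classification.

Full code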

from transformers import pipeline

# ─── 1. Sentiment analysis ───
sentiment = pipeline("sentiment-analysis")
print(sentiment("This product is amazing!"))
# [{'label': 'POSITIVE', 'score': 0.999...}]

# ─── 2. Named entity recognition ───
# aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True)
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Elon Musk founded SpaceX in Hawthorne, California."))
# [{'entity_group': 'PER', 'word': 'Elon Musk', 'score': 0.99}, ...]

# ─── 3. Text generation ───
generator = pipeline("text-generation", model="gpt2")
print(generator("The future of AI is", max_length=30, num_return_sequences=1)[0]["generated_text"])

# ─── 4. Translation (English → Chinese) ───
translator = pipeline("translation_en_to_zh", model="Helsinki-NLP/opus-mt-en-zh")
print(translator("Hello, how are you today?"))
# [{'translation_text': '你好，你今天好吗？'}]

# ─── 5. Summarization ───
summarizer = pipeline("summarization")
text = """The Apollo program, also known as Project Apollo, was the third United States 
human spaceflight program carried out by NASA, which succeeded in landing the first 
humans on the Moon from 1969 to 1972."""
print(summarizer(text, max_length=30, min_length=10))

# ─── 6. Zero-shot classification ───
classifier = pipeline("zero-shot-classification")
print(classifier(
    "I need to renew my passport and apply for a visa",
    candidate_labels=["travel", "cooking", "education", "finance"],
))

How to run

pip install transformers torch sentencepiece sacremoses pillow
python pipeline_demo.py

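Expected output

Each of the six tasks prints its inference result. On the first run, the models are downloaded automatically from the HF Hub into ~/.cache/huggingface/.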

Tutorial

Getting started with HuggingFace Transformers

1. Tokenizer: text → numbers

Models do not understand text, only token IDs. The tokenizer acts as the translator:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer("Hello, world!")
print(tokens)  # {'input_ids': [101, 7592, 1010, 2088, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1]}

# Decode back to text
print(tokenizer.decode(tokens['input_ids']))  # [CLS] hello, world! [SEP]
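
BERT-style tokenizers split a word that is not in the vocabulary into smaller known pieces. The sketch below shows the greedy longest-match-first idea behind that splitting. It is a simplified illustration with a made-up five-entry vocabulary, not the tokenizer's actual implementation.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, in the style of BERT's WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"token", "##ization", "##ize", "hello", "world"}  # hypothetical vocabulary
print(wordpiece("tokenization", toy_vocab))  # ['token', '##ization']
print(wordpiece("hello", toy_vocab))         # ['hello']
print(wordpiece("xyz", toy_vocab))           # ['[UNK]']
```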

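Key concepts:

Special tokens: [CLS] marks the start of the sequence, [SEP] separates segments, [PAD] fills padding positions
attention_mask: 0 means the position is ignored (padding)
Subword tokenization: out-of-vocabulary words are split into subwords, e.g. tokenization → token + ##ization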
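
The padding and attention_mask concepts can be sketched without any library. The example pads two token ID lists to the same length, builds the 0/1 mask, and then applies the mask inside a softmax by setting masked scores to negative infinity. That is the usual way a 0 in the mask ends up as zero attention weight, though real implementations typically add a large negative number to the scores instead. The helper names and the attention scores are made up for illustration.

```python
import math

PAD_ID = 0  # [PAD] has ID 0 in bert-base-uncased's vocabulary


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token ID lists to equal length and build the attention mask."""
    width = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in sequences]
    return input_ids, attention_mask


def masked_softmax(scores, mask):
    """Softmax over `scores` in which positions with mask 0 receive zero weight."""
    masked = [s if m == 1 else -math.inf for s, m in zip(scores, mask)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


batch = [[101, 7592, 1010, 2088, 999, 102], [101, 7592, 102]]
ids, mask = pad_batch(batch)
print(ids[1])   # [101, 7592, 102, 0, 0, 0]
print(mask[1])  # [1, 1, 1, 0, 0, 0]

# Made-up attention scores: the padded positions score high but still get weight 0.
weights = masked_softmax([2.0, 1.0, 0.5, 9.0, 9.0, 9.0], mask[1])
print(weights)
```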

2. Model: loading pretrained weights

from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
)

# BERT family (encoder)
bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# GPT family (decoder)
gpt = AutoModelForCausalLM.from_pretrained("gpt2")

# T5 family (encoder-decoder)
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

from_pretrained() automatically downloads config.json and the weights into the local cache.
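
To check what has already been downloaded, it helps to know how the cache is laid out. The sketch below assumes the default layout, in which each model repository is stored under ~/.cache/huggingface/hub in a folder named models-- followed by the repo ID with slashes replaced by double dashes. The helper hub_cache_dir is hypothetical, and a custom HF_HOME would change the root.

```python
from pathlib import Path


def hub_cache_dir(repo_id, cache_root=None):
    """Folder where a model repo is expected to be cached (assumed default layout)."""
    root = Path(cache_root) if cache_root else Path.home() / ".cache" / "huggingface" / "hub"
    return root / ("models--" + repo_id.replace("/", "--"))


for repo in ["bert-base-uncased", "gpt2", "Helsinki-NLP/opus-mt-en-zh"]:
    path = hub_cache_dir(repo)
    print(path.name, "(cached)" if path.exists() else "(not downloaded yet)")
```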

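3. Trainer: training in one call

from transformers import Trainer, TrainingArguments

# model, tokenizer, train_dataset and eval_dataset are assumed to be defined already
args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    eval_strategy="epoch",   # named evaluation_strategy in older releases
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,     # newer releases call this argument processing_class
)
trainer.train()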
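
Before launching a run it is useful to estimate how many optimizer updates the settings imply. The sketch below does that arithmetic for a batch size of 16 and 3 epochs. The dataset size of 10,000 examples is made up, the helper name is hypothetical, and the count Trainer itself reports may differ in edge cases such as dropped last batches.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                          num_devices=1, grad_accum_steps=1):
    """Estimate the number of optimizer updates for a training run."""
    global_batch = per_device_batch_size * num_devices
    batches_per_epoch = math.ceil(num_examples / global_batch)
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * num_epochs


# 10,000 examples is a made-up dataset size; batch size 16 and 3 epochs as in the text.
print(total_optimizer_steps(num_examples=10_000, per_device_batch_size=16, num_epochs=3))  # 1875
```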

4. LoRA fine-tuning: fine-tuning LLaMA on a consumer GPU

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                     # rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # adapt only the Q and V projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.0622%

Only about 0.06% of the parameters are trained, yet the results are often close to those of full fine-tuning.
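
The trainable-parameter figure can be reproduced by hand. For each targeted projection of size d x d, LoRA adds two small matrices, A (r x d) and B (d x r). The sketch below assumes a Llama-2-7B shape of 32 layers with hidden size 4096. That shape is an assumption rather than something stated in this tutorial, though it matches the printed numbers. The helper name is made up.

```python
def lora_param_count(hidden_size, rank, num_layers, modules_per_layer):
    """Parameters added by LoRA when every target is a square hidden x hidden projection."""
    per_module = rank * hidden_size + hidden_size * rank  # A is r x d, B is d x r
    return per_module * modules_per_layer * num_layers


# Assumed Llama-2-7B shape: 32 layers, hidden size 4096; q_proj and v_proj are targeted.
trainable = lora_param_count(hidden_size=4096, rank=8, num_layers=32, modules_per_layer=2)
total = 6_742_609_920  # the "all params" value from the printout

print(f"trainable params: {trainable:,}")              # 4,194,304
print(f"trainable%: {100 * trainable / total:.4f}")    # 0.0622
print(f"LoRA scaling (lora_alpha / r): {32 / 8}")      # 4.0
```

Doubling r doubles the number of trainable parameters, so the rank is a direct trade-off between adapter capacity and memory.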

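5. Deployment: Text Generation Inference

# Deploy a Llama-3 inference server with Docker
# (the model is gated: accept the license on the Hub and pass a token, e.g. -e HF_TOKEN=$HF_TOKEN)
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/models:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct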
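
Once the container is up, the server can be queried over HTTP. The sketch below builds a request for TGI's /generate route using only the standard library. It assumes the server listens on localhost:8080 as mapped in the docker command, and the helper names are made up for this example. Only the request construction runs here; generate() needs a live server.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, base_url="http://localhost:8080"):
    """Build a POST request for TGI's /generate route (nothing is sent here)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text; requires a running server."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]


req = build_generate_request("What is deep learning?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```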

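6. Architecture family cheat sheet

Architecture	Type	Representative models	Strengths
BERT	Encoder	bert-base, roberta	NLU: classification / NER / QA
GPT	Decoder	gpt2, llama, mistral	NLG: dialogue / generation
T5	Encoder-decoder	t5, bart, flan-t5	Translation / summarization / instruction following
ViT	Vision	vit, swin, dinov2	Image classification
CLIP	Multimodal	clip, siglip	Image-text retrieval
Whisper	Speech	whisper	Automatic speech recognition (ASR)
Stable Diffusion	Generative	stable-diffusion-xl	Text-to-image (through diffusers)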

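Questions to consider

Why is attention_mask set to 0 at padding positions? Inside the attention computation in PyTorch, is the value applied at masked positions 0 or -inf?
Why does this LoRA setup train only the Q/V projections and not the FFN? How does a rank of r=8 affect the results?
What is the essential difference between the BERT and GPT tokenizers? Why can GPT not simply use the BERT tokenizer?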
