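spaCy in 10 Lines of Code: NER + Dependency Parsing + Visualization
Goal
Use a single spaCy pipeline to perform tokenization, part-of-speech tagging, named entity recognition (NER), and dependency parsing, then visualize the results with displaCy.
Full code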
import spacy
from spacy import displacy

# ─── 1. Load the model ───
nlp = spacy.load("en_core_web_sm")

# ─── 2. Process the text ───
text = "Elon Musk announced that Tesla will build a new factory in Shanghai next year, investing $2 billion."
doc = nlp(text)

# ─── 3. Tokens + part-of-speech tags + dependencies ───
print("=" * 70)
print(f"{'Token':<12} {'POS':<10} {'Dep':<16} {'Head':<12}")
print("=" * 70)
for token in doc:
    print(f"{token.text:<12} {token.pos_:<10} {token.dep_:<16} {token.head.text:<12}")

# ─── 4. Named entity recognition ───
print("\n" + "=" * 40)
print("Named entities (NER):")
print("=" * 40)
for ent in doc.ents:
    print(f" {ent.text:<25} | {ent.label_:<10} | {spacy.explain(ent.label_)}")

# ─── 5. Noun chunks ───
print("\nNoun chunks:")
for chunk in doc.noun_chunks:
    print(f" → {chunk.text}")

# ─── 6. Dependency visualization ───
# With jupyter=False, render() returns the markup as a string instead of displaying it.
dep_svg = displacy.render(doc, style="dep", jupyter=False, options={"compact": True})
# Save it to a file (the file name is arbitrary):
with open("dep.svg", "w", encoding="utf-8") as f:
    f.write(dep_svg)
# Or start a local web server instead:
# displacy.serve(doc, style="dep")

# ─── 7. NER visualization ───
ent_html = displacy.render(doc, style="ent", jupyter=False)
with open("ent.html", "w", encoding="utf-8") as f:  # arbitrary file name
    f.write(ent_html)
# displacy.serve(doc, style="ent")
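The Head column printed in step 3 links each token to its syntactic parent, so following those links upward ends at the sentence root. Below is a minimal sketch of that walk. It assumes only the `dep_` and `head` attributes already used in the script, and that the root carries the label "ROOT", as it does in the English model loaded here. `path_to_root` is a hypothetical helper name, not a spaCy function; spaCy itself exposes `Token.ancestors` for a similar purpose.

```python
def path_to_root(token):
    """Return the tokens from `token` up to the sentence root, inclusive.

    Works on any object exposing `dep_` and `head` the way spaCy tokens do.
    `path_to_root` is an illustrative helper, not part of spaCy.
    """
    path = [token]
    # Assumption: the root of the sentence has the dependency label "ROOT".
    while token.dep_ != "ROOT":
        token = token.head
        path.append(token)
    return path


# Usage with the `doc` from the script above (first token, "Elon"):
# print(" -> ".join(t.text for t in path_to_root(doc[0])))
```

The helper stops on the label rather than comparing token objects, which keeps it independent of how token identity is implemented.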
How to run
pip install spacy
python -m spacy download en_core_web_sm
python spacy_demo.py
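`spacy.load("en_core_web_sm")` raises an `OSError` if the download step was skipped. Because the model is installed as a regular Python package, one way to check for it up front is the standard library's `importlib.util.find_spec`. This is a small sketch, and `model_installed` is a hypothetical helper name.

```python
import importlib.util


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Print a hint instead of failing later inside spacy.load().
if not model_installed("en_core_web_sm"):
    print("Model missing; run: python -m spacy download en_core_web_sm")
```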
Expected output (abridged; exact labels may vary slightly with the model version)
======================================================================
Token        POS        Dep              Head
======================================================================
Elon         PROPN      compound         Musk
Musk         PROPN      nsubj            announced
announced    VERB       ROOT             announced
that         SCONJ      mark             build
...
Shanghai     PROPN      pobj             in
...
$            SYM        quantmod         billion
2            NUM        compound         billion
billion      NUM        dobj             investing
.            PUNCT      punct            announced

========================================
Named entities (NER):
========================================
 Elon Musk                 | PERSON     | People, including fictional
 Tesla                     | ORG        | Companies, agencies, institutions, etc.
 Shanghai                  | GPE        | Countries, cities, states
 next year                 | DATE       | Absolute or relative dates or periods
 $2 billion                | MONEY      | Monetary values, including unit
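A flat entity list like the one above is often easier to work with when grouped by label. The sketch below does that for plain `(text, label)` pairs copied from the expected output. `group_by_label` is a hypothetical helper name; with a real `doc`, the pairs would come from `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (text, label) pairs into a dict mapping label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Entity pairs taken from the expected output above.
pairs = [
    ("Elon Musk", "PERSON"),
    ("Tesla", "ORG"),
    ("Shanghai", "GPE"),
    ("next year", "DATE"),
    ("$2 billion", "MONEY"),
]

for label, texts in group_by_label(pairs).items():
    print(f"{label}: {', '.join(texts)}")
```

In this one-sentence example every label holds a single entity; the grouping becomes more useful on longer texts where a label such as ORG appears several times.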