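nano-graphrag Project: Performance Comparison of GraphRAG and Naive RAG on Multi-Hop Question Answering

2025-07-10 03:33:34  Author: 傅爽业Veleda

Overview

This article compares the GraphRAG algorithm in the nano-graphrag project with naive RAG (Retrieval-Augmented Generation) on a multi-hop question answering task. Multi-hop QA covers questions whose answers require collecting information from several documents and reasoning over it. This is hard for conventional RAG systems, because a single similarity search over text chunks may not surface every piece of evidence the answer depends on.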

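Evaluation Environment Setup

First, install the required Python dependencies:

!pip install ragas nest_asyncio datasets

Then import the required libraries and configure the environment: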

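import os  # e.g. for OPENAI_API_KEY; nano-graphrag and ragas call OpenAI models by default
import json
import sys
sys.path.append("../..")  # make a local checkout of nano_graphrag importable from this notebook

import nest_asyncio
nest_asyncio.apply()  # allow nano-graphrag's asyncio calls inside the notebook's running event loop
import logging

logging.basicConfig(level=logging.WARNING)
logging.getLogger("nano-graphrag").setLevel(logging.INFO)  # show nano-graphrag's progress logs
from nano_graphrag import GraphRAG, QueryParam
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_similarity,
)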

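Dataset Preparation

We use a public multi-hop QA dataset consisting of:

  • Question-answer pairs (MultiHopRAG.json)
  • A corpus of related documents (corpus.json)

The preprocessing code loads both files and indexes the corpus documents by URL: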

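# Paths to the dataset files (adjust to where you saved them)
multi_hop_rag_file = "MultiHopRAG.json"
multi_hop_corpus_file = "corpus.json"

with open(multi_hop_rag_file) as f:
    multi_hop_rag_dataset = json.load(f)  # list of question records
with open(multi_hop_corpus_file) as f:
    multi_hop_corpus = json.load(f)  # list of corpus documents

# Map each document URL to its corpus record
corpus_url_references = {}
for cor in multi_hop_corpus:
    corpus_url_references[cor['url']] = cor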
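The URL index is what lets each question's evidence be traced back to full documents, which then form the list of texts handed to `graphrag_func.insert`. One plausible way to build that list is sketched below; it assumes each question record has an `evidence_list` of dicts with a `url` key and each corpus document has `title` and `body` keys (check these against your copy of the data). The helper `build_corpus` is hypothetical, not part of nano-graphrag, and the records are made up.

```python
from typing import Dict, List


def build_corpus(dataset: List[dict], corpus_by_url: Dict[str, dict]) -> List[str]:
    """Hypothetical helper: collect the distinct corpus documents cited as
    evidence by the given questions, formatted as plain-text strings.

    Assumes questions carry an "evidence_list" of dicts with a "url" key and
    corpus documents carry "title" and "body" keys.
    """
    seen_urls = set()
    documents = []
    for question in dataset:
        for evidence in question.get("evidence_list", []):
            url = evidence.get("url")
            # Skip evidence missing from the corpus or already collected
            if url not in corpus_by_url or url in seen_urls:
                continue
            seen_urls.add(url)
            doc = corpus_by_url[url]
            documents.append(f"Title: {doc['title']}\n{doc['body']}")
    return documents


# Tiny in-memory example with made-up records
corpus_by_url = {
    "https://example.com/a": {"title": "Doc A", "body": "Body A"},
    "https://example.com/b": {"title": "Doc B", "body": "Body B"},
}
dataset = [
    {"query": "q1", "evidence_list": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]},
    {"query": "q2", "evidence_list": [{"url": "https://example.com/a"}, {"url": "https://example.com/missing"}]},
]
documents = build_corpus(dataset, corpus_by_url)
print(len(documents))  # 2
```

In the notebook it would be called with the truncated question list and the URL index built above. Indexing only evidence documents keeps the graph small; indexing every document in corpus.json is the alternative.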

To keep the evaluation size manageable, we use only the first 100 queries and print the question types they cover:

multi_hop_rag_dataset = multi_hop_rag_dataset[:100]
print("Queries have types:", set([q['question_type'] for q in multi_hop_rag_dataset]))
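The set shows which question types occur in the subset but not how many of each, which matters when taking the first 100 queries rather than a random sample. A `collections.Counter` gives per-type counts; the records below are made-up stand-ins that use only the `question_type` field from the code above.

```python
from collections import Counter

# Made-up stand-in records; only the "question_type" field is used
sample_queries = [
    {"question_type": "inference_query"},
    {"question_type": "comparison_query"},
    {"question_type": "inference_query"},
    {"question_type": "temporal_query"},
]

# Count how many queries of each type the subset contains
type_counts = Counter(q["question_type"] for q in sample_queries)
for question_type, count in type_counts.most_common():
    print(f"{question_type}: {count}")
```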

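System Initialization

We initialize the GraphRAG system and also enable naive RAG, so both methods query the same indexed corpus and can be compared directly: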

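graphrag_func = GraphRAG(working_dir="nano_graphrag_cache_multi_hop_rag_test",  # cache directory for the index
                         enable_naive_rag=True,       # also build the chunk vector store used by naive mode
                         embedding_func_max_async=4)  # cap concurrent embedding requests
graphrag_func.insert(total_corpus)  # total_corpus: list of document strings built from the corpus

# Query modes: "naive" is plain chunk-retrieval RAG, "local" is GraphRAG local search
naive_rag_query_param = QueryParam(mode="naive")
local_graphrag_query_param = QueryParam(mode="local")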

Query Example

Let's look at a specific query:

query = multi_hop_rag_dataset[0]
print("Question:", query['query'])
print("GroundTruth Answer:", query['answer'])

Output:

Question: Who is the individual associated with the cryptocurrency industry facing a criminal trial on fraud and other charges, as reported by The Verge and TechCrunch, and is accused by prosecutors of committing fraud for personal gain?
GroundTruth Answer: Sam Bankman-Fried

Naive RAG Answer

print("NaiveRAG Answer:", graphrag_func.query(query['query'], param=naive_rag_query_param))

Output:

NaiveRAG Answer: Sam Bankman-Fried

GraphRAG Answer (Local Mode)

print("Local GraphRAG Answer:", graphrag_func.query(query['query'], param=local_graphrag_query_param))

Output:

Local GraphRAG Answer: Sam Bankman-Fried

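Large-Scale Evaluation

We answer all 100 selected queries with both methods and score the answers with the ragas evaluation framework: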

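from tqdm import tqdm

questions = [q['query'] for q in multi_hop_rag_dataset]
labels = [q['answer'] for q in multi_hop_rag_dataset]

# Answer every question with naive RAG
naive_rag_answers = [
    graphrag_func.query(q, param=naive_rag_query_param) for q in tqdm(questions)
]

# Answer every question with local GraphRAG
local_graphrag_answers = [
    graphrag_func.query(q, param=local_graphrag_query_param) for q in tqdm(questions)
]

# Score an answer list with ragas (assumes ragas 0.1-style column names)
def score_answers(answers):
    data = Dataset.from_dict({"question": questions, "answer": answers, "ground_truth": labels})
    return evaluate(data, metrics=[answer_correctness, answer_similarity])

naive_results = score_answers(naive_rag_answers)
local_graphrag_results = score_answers(local_graphrag_answers)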

Evaluation Results

The final scores for the two methods are:

print("Naive RAG results", naive_results)
print("Local GraphRAG results", local_graphrag_results)

Output:

Naive RAG results {'answer_correctness': 0.5896, 'answer_similarity': 0.8935}
Local GraphRAG results {'answer_correctness': 0.7380, 'answer_similarity': 0.8619}

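Analysis of Results

The results show:

  1. Answer Correctness

    • Naive RAG: 58.96%
    • GraphRAG: 73.80%

    On these 100 multi-hop queries, GraphRAG's answer correctness is 14.84 percentage points higher than naive RAG's, a relative improvement of about 25%.

  2. Answer Similarity

    • Naive RAG: 89.35%
    • GraphRAG: 86.19%

    GraphRAG scores 3.16 percentage points lower on similarity. This may be because its answers are more precise but worded differently from the reference answers.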
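The two metrics can move in opposite directions because they measure different things. In the ragas versions we are aware of (check the one you run), answer similarity is the cosine similarity between embeddings of the answer and the ground truth. Answer correctness is a weighted average of that similarity and a factual F1 score over statements an LLM classifies as true positives, false positives and false negatives, with default weights 0.75 and 0.25. The sketch below reproduces only this arithmetic; the statement counts and toy vectors are hand-picked stand-ins for the LLM and embedding model, and the function names are illustrative, not ragas APIs.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def factual_f1(tp: int, fp: int, fn: int) -> float:
    """F1 over statement counts: TP / (TP + 0.5 * (FP + FN)), or 0 with no TPs."""
    if tp == 0:
        return 0.0
    return tp / (tp + 0.5 * (fp + fn))


def answer_correctness_sketch(
    tp: int, fp: int, fn: int, similarity: float,
    weights: Tuple[float, float] = (0.75, 0.25),
) -> float:
    """Weighted average of factual F1 and semantic similarity."""
    f1 = factual_f1(tp, fp, fn)
    return (weights[0] * f1 + weights[1] * similarity) / (weights[0] + weights[1])


# Toy case: every stated fact is correct, but the wording differs
sim = cosine_similarity([1.0, 0.0], [0.6, 0.8])
score = answer_correctness_sketch(tp=1, fp=0, fn=0, similarity=sim)
print(round(sim, 2), round(score, 2))  # 0.6 0.9
```

Because the factual term carries most of the weight, an answer with fully correct facts keeps a high correctness score even when its embedding similarity to the reference is modest.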

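Conclusion

On this multi-hop QA evaluation, the GraphRAG algorithm in nano-graphrag shows a clear advantage over naive RAG, especially in answer correctness. This suggests that graph-based retrieval-augmented generation handles complex questions requiring cross-document reasoning better. For applications that need accurate answers, GraphRAG is worth considering.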