The library has three design goals: provide a simple way to load, use, and switch between various reranking methods; minimize intrusive changes to the user's environment and codebase; and ensure no degradation in ranking performance compared with existing implementations.
# Whatever the model, a reranker can be initialized in a single line of code
ranker = Reranker('cross-encoder')
# The model type can be inferred automatically from the name, or specified explicitly via model_type
ranker = Reranker(MODEL_NAME_OR_PATH, model_type='cross-encoder')
ranker = Reranker(MODEL_NAME_OR_PATH, model_type='t5', dtype=torch.float32)
ranker = Reranker("jina", api_key=API_KEY)
# Rerank a set of documents, returning a RankedResults object that preserves metadata and document IDs
results = ranker.rank(query="I love you", docs=["I hate you", "I really like you"], doc_ids=[0, 1], metadata=[{'source': 'twitter'}, {'source': 'reddit'}])
# Retrieve only the single highest-ranked result
results.top_k(1)
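Whatever backend is loaded, the rank-then-top_k pattern above is the same: score every (query, document) pair, sort descending, and slice. A minimal self-contained sketch of that pattern, with a toy word-overlap scorer standing in for a real cross-encoder (the names `ToyRankedResults`, `ScoredDoc`, and `toy_rank` are hypothetical illustrations, not the library's actual classes):

```python
from dataclasses import dataclass, field

@dataclass
class ScoredDoc:
    doc_id: int
    text: str
    score: float
    metadata: dict = field(default_factory=dict)

@dataclass
class ToyRankedResults:
    results: list  # ScoredDoc objects, sorted by descending score

    def top_k(self, k: int):
        return self.results[:k]

def toy_rank(query, docs, doc_ids, metadata):
    """Score each (query, doc) pair and sort descending -- the core reranking loop.
    A real cross-encoder replaces the word-overlap scorer with a model forward pass."""
    q_words = set(query.lower().split())
    scored = [
        ScoredDoc(i, d, len(q_words & set(d.lower().split())), m)
        for d, i, m in zip(docs, doc_ids, metadata)
    ]
    scored.sort(key=lambda s: s.score, reverse=True)
    return ToyRankedResults(scored)

results = toy_rank(
    query="python reranking library",
    docs=["a library for reranking in python", "a cooking recipe"],
    doc_ids=[0, 1],
    metadata=[{"source": "docs"}, {"source": "blog"}],
)
print(results.top_k(1)[0].doc_id)  # → 0 (the document sharing the most query words)
```

The value of the library is that swapping the scorer (cross-encoder, T5, an API model) never changes this calling pattern.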
The rerankers library is fully integrated with the Hugging Face transformers ecosystem: any compatible model can be loaded directly from the Hugging Face Hub or from local storage, and through LangChain it integrates seamlessly into existing applications.
from rerankers import Reranker
from langchain.retrievers import ContextualCompressionRetriever
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# `documents`: a list of LangChain Document objects loaded earlier (loader not shown)
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
ranker = Reranker("mixedbread-ai/mxbai-rerank-base-v1", verbose=0)
# Wrap the reranker as a LangChain-compatible document compressor that keeps the top 3 documents
compressor = ranker.as_langchain_compressor(k=3)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.get_relevant_documents(
    "What did the president say about the minimum wage?"
)
pretty_print_docs(compressed_docs)  # user-defined helper that prints each retrieved document
Summary
References
[1] Python library: https://github.com/answerdotai/rerankers
[2] Evaluation: https://arxiv.org/abs/2408.17344