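AI / Large Models 088: LLM Development Frameworks, LlamaIndex 002: Using a Vector Store of Your Choice (Chroma/Qdrant), Multiple Retrievers, and a Custom Data Processing Pipeline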


The first step is to install the vector store integration package, llama-index-vector-stores-chroma.

Next, create a Chroma client, chroma_client.


With chroma_client created, use it to create a collection, chroma_collection, and give the collection a name.

Then pass chroma_collection to LlamaIndex's ChromaVectorStore. This tells LlamaIndex to use the vector database we chose, that is, Chroma, for vector storage, and gives us a vector_store object.

Next, pass vector_store to StorageContext, the storage container, and create the index with VectorStoreIndex, passing in storage_context.

Finally, call the index with similarity_top_k=2 to get vector_retriever, a vector retriever that returns the two most relevant results.


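With the vector retriever we can now run a query by passing in the query text:

results = vector_retriever.retrieve("llama2有多少参数")

The query text asks "How many parameters does Llama 2 have?". The results are the same as before.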
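The Chroma steps described above can be sketched as follows. This is a reconstruction from the description, not the author's exact code: `build_chroma_retriever` is a hypothetical helper name, `chromadb.EphemeralClient()` (an in-memory client) and the collection name "demo" are assumptions, and `nodes` is assumed to be the list of document nodes prepared earlier. An embedding model must also be configured (LlamaIndex defaults to OpenAI, which needs an API key).

```python
def build_chroma_retriever(nodes, collection_name="demo", top_k=2):
    """Index `nodes` in a Chroma collection and return a vector retriever."""
    # Imports live inside the function so this sketch can be loaded even when
    # chromadb / llama-index-vector-stores-chroma are not installed.
    import chromadb
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # 1. Client (assumption: in-memory; the original may have used another client type)
    chroma_client = chromadb.EphemeralClient()
    # 2. Named collection created through the client
    chroma_collection = chroma_client.create_collection(collection_name)
    # 3. LlamaIndex vector store backed by that collection
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    # 4. Storage context pointing at the vector store
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # 5. Index built on the storage context (this embeds and stores the nodes)
    index = VectorStoreIndex(nodes, storage_context=storage_context)
    # 6. Retriever returning the top_k most similar nodes
    return index.as_retriever(similarity_top_k=top_k)
```

Calling `build_chroma_retriever(nodes).retrieve("llama2有多少参数")` would then run the query shown above.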


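Now let's look at an example that uses the Qdrant vector database instead of Chroma.

As before, the vector store integration has to be installed first.

2. Using a custom Vector Store, with `Qdrant` as the example:
!pip install llama-index-vector-stores-qdrant

The code below then uses this vector database.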

from llama_index.core.indices.vector_store.base import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(location=":memory:")
collection_name = "demo"
collection = client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
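# Storage context: tells LlamaIndex where to store the vectors
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index; the storage context ties it to our custom vector store.
# `nodes` is assumed to be the list of document nodes prepared in the earlier steps.
index = VectorStoreIndex(nodes, storage_context=storage_context)

# Get a retriever that returns the two most similar nodes
vector_retriever = index.as_retriever(similarity_top_k=2)

# Retrieve (the query asks "How many parameters does Llama 2 have?")
results = vector_retriever.retrieve("Llama2有多少参数")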

print(results[0])
print(results[1])

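The usage is essentially the same as with Chroma:

- create a client: client
- create a collection through the client: collection
- wrap the client and collection in a vector store: vector_store
- hand vector_store to the storage context: storage_context
- build the index on top of the storage context: index
- get a retriever from the index: vector_retriever
- run the query with the retriever: vector_retriever.retrieve("xxx")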

Node ID: bd3ff0be-13b9-4f7c-bac3-71ef38978766
Text: an updated version of Llama 1, trained on a new mix of publicly
available data. We also increased the size of the pretraining corpus
by 40%, doubled the context length of the model, and adopted grouped-
query attention (Ainslie et al., 2023). We are releasing variants of
Llama 2 with 7B, 13B, and 70B parameters. We have also trained 34B
variants,...
Score:  0.790

Node ID: 18282125-5744-4da8-afe6-32b208304571
Text: Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana
Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar
Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi
Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith
Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina
Williams Jian Xiang...
Score:  0.789

Above are the two results that were returned, along with their relevance scores.
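To make the scores and `similarity_top_k=2` more concrete, here is a minimal pure-Python sketch of cosine-similarity top-k retrieval. It only illustrates the idea behind `Distance.COSINE`; it is not how Qdrant or LlamaIndex implement search, and the three-dimensional vectors and node names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (node_id, score) pairs most similar to query_vec."""
    scored = [(node_id, cosine_similarity(query_vec, vec))
              for node_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings (real ones would have 1536 dimensions here)
stored_vectors = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.6, 0.8, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.0, 0.0], stored_vectors, k=2)
for node_id, score in results:
    print(f"{node_id}: {score:.3f}")
```

The node whose vector points in the same direction as the query ranks first, and the node orthogonal to it is dropped because only the top two are kept.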

### 5.2 More Indexing and Retrieval Methods

LlamaIndex has a rich set of built-in retrieval mechanisms, for example:

- Keyword retrieval

  - [`BM25Retriever`](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/bm25/): the classic BM25 retrieval algorithm, implemented on top of a tokenizer
  - [`KeywordTableGPTRetriever`](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/keyword/#llama_index.core.indices.keyword_table.retrievers.KeywordTableGPTRetriever): uses GPT to extract retrieval keywords
  - [`KeywordTableSimpleRetriever`](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/keyword/#llama_index.core.indices.keyword_table.retrievers.KeywordTableSimpleRetriever): uses regular expressions to extract retrieval keywords
  - [`KeywordTableRAKERetriever`](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/keyword/#llama_index.core.indices.keyword_table.retrievers.KeywordTableRAKERetriever): uses the [`RAKE`](https://pypi.org/project/rake-nltk/) algorithm to extract retrieval keywords (with language restrictions)

- RAG-Fusion [`QueryFusionRetriever`](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/query_fusion/)

- Also supported: [KnowledgeGraph](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/knowledge_graph/), [SQL](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/sql/#llama_index.core.retrievers.SQLRetriever), [Text-to-SQL](https://docs.llamaindex.ai/en/stable/api_reference/retrievers/sql/#llama_index.core.retrievers.NLSQLRetriever), and more


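So LlamaIndex gives us a richer set of retrieval tools:

First, BM25Retriever implements the classic BM25 retrieval algorithm on top of a tokenizer.

Second, KeywordTableGPTRetriever uses GPT to extract the keywords used for retrieval.

Third, KeywordTableSimpleRetriever extracts retrieval keywords with regular expressions.

Fourth, KeywordTableRAKERetriever extracts retrieval keywords with the RAKE algorithm, which comes with language restrictions.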
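As a rough illustration of what BM25 scoring over tokenized text looks like, here is a minimal pure-Python sketch. It is the textbook Okapi BM25 formula with a smoothed IDF, not the code inside `BM25Retriever`; the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the sample sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF: rare terms weigh more, and the value stays positive
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores


docs = [
    "Llama 2 is released with 7B 13B and 70B parameters",
    "The author list includes many researchers",
    "Grouped query attention reduces memory use",
]
scores = bm25_scores("llama parameters", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Only the first sentence contains the query terms, so it is the only one with a non-zero score.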

There is also QueryFusionRetriever, a retriever that implements RAG-Fusion: this approach splits one search into several queries and then merges the results of those queries.
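One common way to merge several ranked result lists is reciprocal rank fusion (RRF). The sketch below shows that technique in pure Python; it is an illustration of the merging idea only, and whether `QueryFusionRetriever` uses exactly this scoring depends on how it is configured. The document ids and the constant `k=60` are assumptions for the example.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of ids; each hit contributes 1 / (k + rank)."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical results of three rewritten versions of the same question
rankings = [
    ["doc-a", "doc-b", "doc-c"],
    ["doc-b", "doc-a", "doc-d"],
    ["doc-b", "doc-c", "doc-a"],
]
fused_order = reciprocal_rank_fusion(rankings)
print(fused_order)
```

A document that ranks well in several lists rises to the top even if no single query put it first every time.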

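Knowledge graph retrieval is supported as well: KnowledgeGraph.

So is retrieval with SQL, and with Text-to-SQL, which converts natural-language text into SQL and then retrieves with it.

Links to the documentation for all of these retrieval methods are included in the list above for reference.

Now on to the next topic.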

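### 5.3 Ingestion Pipeline: A Custom Data Processing Flow

LlamaIndex uses `Transformations` to define a multi-step processing flow (a pipeline) for data (`Documents`).
A notable feature of this pipeline is that **each of its sub-steps can be cached**: if a sub-step's input and processing method are unchanged, repeated calls fetch the result straight from the cache instead of re-running the sub-step, which saves time and also saves tokens (if the sub-step involves LLM calls).

In other words, LlamaIndex lets us define our own data processing flow.

What does this flow cover? It is the process of transforming the data and then loading the result into the vector database.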

It corresponds to this part of the earlier code: the documents are first split into chunks, and the chunks are then loaded into the vector database through the QdrantVectorStore component. In that version the loading process was wrapped up for us.

What we get to customize here is exactly that part.


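LlamaIndex's IngestionPipeline processes data in multiple steps and caches each step together with its result. Once a piece of data has been processed, running the same processing on it again sends nothing to the LLM; the result comes straight from the cache. That saves tokens, which means it saves money, and it also cuts execution time.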
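The caching idea can be sketched in a few lines of pure Python. This is a toy model under stated assumptions, not LlamaIndex's actual cache: the class name `CachedPipeline`, the key scheme (a hash of the step name plus the step's input), and the two stand-in steps are all hypothetical.

```python
import hashlib


class CachedPipeline:
    """Runs steps in order, caching each step's output by (step name, input)."""

    def __init__(self, steps, cache=None):
        self.steps = steps                      # list of (name, function) pairs
        self.cache = {} if cache is None else cache
        self.executed = []                      # names of steps that really ran

    def run(self, chunks):
        for name, func in self.steps:
            payload = name + "\x00" + "\x00".join(chunks)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if key not in self.cache:           # cache miss: do the work
                self.cache[key] = func(chunks)
                self.executed.append(name)
            chunks = self.cache[key]            # cache hit: reuse stored output
        return chunks


def split_sentences(chunks):
    return [s.strip() for chunk in chunks for s in chunk.split(".") if s.strip()]


def add_titles(chunks):
    # Stand-in for an expensive LLM call
    return [f"[{c.split()[0]}] {c}" for c in chunks]


text = ["Llama 2 has three sizes. It doubles the context length."]
steps = [("split", split_sentences), ("title", add_titles)]

first = CachedPipeline(steps)
out1 = first.run(text)

# A second pipeline with only the first step, sharing the same cache
second = CachedPipeline(steps[:1], cache=first.cache)
out2 = second.run(text)
print(first.executed, second.executed)
```

The first run executes both steps; the second pipeline, which shares the cache and contains only the first step, executes nothing, because the same step on the same input is already stored.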

Next, let's see how to customize this data processing flow:

import time

class Timer:
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.end = time.time()
        self.interval = self.end - self.start
        print(f"Elapsed: {self.interval*1000} ms")

First, we define the helper above to measure how long a processing run takes.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.readers.file import PyMuPDFReader
import nest_asyncio

nest_asyncio.apply() # Only needed in a Jupyter notebook, where omitting it raises an error

# Create an in-memory Qdrant client.
client = QdrantClient(location=":memory:")
# Name of the collection.
collection_name = "ingestion_demo"

# Create the collection. size=1536 is the dimension of each stored vector
# (it must match the embedding model's output dimension), and Distance.COSINE
# selects cosine similarity as the metric for comparing vectors.
collection = client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Create the Vector Store from the client and the collection name.
vector_store = QdrantVectorStore(client=client, collection_name=collection_name)


# Define the custom data processing flow with IngestionPipeline.
# `transformations` is the list of processing steps, three in this example:
#   SentenceSplitter: splits the text into chunks along sentence boundaries.
#   TitleExtractor: a built-in extractor that uses an LLM to generate a title
#                   for the text; it is included here only as an example step.
#   OpenAIEmbedding: turns the text into vectors.
# After these steps the data has been split, titled, and embedded, and
# vector_store=vector_store writes the result into the vector database.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=300, chunk_overlap=100), # split by sentence
        TitleExtractor(), # generate a title for the text with an LLM
        OpenAIEmbedding(), # embed the text
    ],
    vector_store=vector_store,
)


# The remaining steps are the same as before. First, load the files.
documents = SimpleDirectoryReader(
    "./data", 
    required_exts=[".pdf"],
    file_extractor={".pdf": PyMuPDFReader()}
).load_data()

# Time the run
with Timer():
    # Ingest directly into a vector db:
    # run our custom pipeline on the documents.
    pipeline.run(documents=documents)

# Create the index on top of the populated vector store
index = VectorStoreIndex.from_vector_store(vector_store)

# Get a retriever
vector_retriever = index.as_retriever(similarity_top_k=1)

# Retrieve and print the result
# (the query asks "How many parameters does Llama 2 have?")
results = vector_retriever.retrieve("Llama2有多少参数")

print(results[0])

Now we can run it and look at the elapsed time.

The run took a little over 8 seconds.

# Persist the `IngestionPipeline` cache locally.
# This step matters: it saves the cache built during the run above
# to ./pipeline_storage under the current directory.
pipeline.persist("./pipeline_storage")


# Now define the data processing flow again.
new_pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=300, chunk_overlap=100),
        TitleExtractor(),
        OpenAIEmbedding(), # embed the text

    ],
)

# Load the cache saved by the previous pipeline before running the new one.
new_pipeline.load("./pipeline_storage")

# Time the run of the new pipeline.
with Timer():
    nodes = new_pipeline.run(documents=documents)


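When the new pipeline runs all three steps:

SentenceSplitter(chunk_size=300, chunk_overlap=100),
TitleExtractor(),
OpenAIEmbedding(),

the run takes a little over 70 ms. If we run only two steps:

SentenceSplitter(chunk_size=300, chunk_overlap=100),
TitleExtractor(),

it finishes in just over 3 ms.

This is because the new pipeline uses the earlier cache. We don't need to worry that a new pipeline will miss the cache: every step is cached, so even a pipeline that contains only some of the steps can reuse the earlier results.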

The `IngestionPipeline` cache can also be stored remotely, for example in Redis or MongoDB; see the official documentation: [Remote Cache Management](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/#remote-cache-management).

`IngestionPipeline` also supports asynchronous and parallel execution; see the official documentation: [Async Support](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/#async-support) and [Parallel Processing](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/#parallel-processing).