BaseDocumentTransformer
- class langchain_core.documents.transformers.BaseDocumentTransformer
Abstract base class for document transformation.
A document transformation takes a sequence of Documents and returns a sequence of transformed Documents.
Example
class EmbeddingsRedundantFilter(BaseDocumentTransformer, BaseModel):
    embeddings: Embeddings
    similarity_fn: Callable = cosine_similarity
    similarity_threshold: float = 0.95

    class Config:
        arbitrary_types_allowed = True

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = _get_embeddings_from_stateful_docs(
            self.embeddings, stateful_documents
        )
        included_idxs = _filter_similar_embeddings(
            embedded_documents, self.similarity_fn, self.similarity_threshold
        )
        return [stateful_documents[i] for i in sorted(included_idxs)]

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        raise NotImplementedError
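The example above relies on helper functions that are not defined on this page. As a smaller illustration of the same contract (a sequence of Documents in, a sequence of transformed Documents out), here is a minimal sketch. To stay runnable without the library installed, it uses stand-in `Document` and `BaseDocumentTransformer` classes that only mirror the interface described here; they are assumptions, not the langchain_core implementations. `MinLengthFilter` and its `min_chars` parameter are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class Document:
    """Stand-in for a document (assumption: text plus a metadata dict)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseDocumentTransformer(ABC):
    """Stand-in that mirrors only the interface described on this page."""

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Transform a list of documents."""


class MinLengthFilter(BaseDocumentTransformer):
    """Hypothetical transformer: drop documents shorter than min_chars."""

    def __init__(self, min_chars: int = 10) -> None:
        self.min_chars = min_chars

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Keep only documents whose text meets the length threshold.
        return [d for d in documents if len(d.page_content) >= self.min_chars]


docs = [Document("short"), Document("a considerably longer passage")]
kept = MinLengthFilter(min_chars=10).transform_documents(docs)
```

With langchain_core installed, the stand-ins would presumably be replaced by the real `Document` class and by the `BaseDocumentTransformer` documented here, leaving `MinLengthFilter` itself unchanged.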
Methods
- __init__()
- atransform_documents(documents, **kwargs): Asynchronously transform a list of documents.
- transform_documents(documents, **kwargs): Transform a list of documents.
- __init__()
- async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]
Asynchronously transform a list of documents.
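As a rough sketch of how the asynchronous method might be implemented and awaited, the block below defines a hypothetical `UppercaseTransformer` whose `atransform_documents` offloads the synchronous method to a worker thread with `asyncio.to_thread`. That delegation strategy is an assumption made for illustration, not a statement of what this class does internally, and `Document` is again a stand-in rather than the library class.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class Document:
    """Stand-in for a document (assumption: text plus a metadata dict)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class UppercaseTransformer:
    """Hypothetical transformer exposing both sync and async methods."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Return new documents rather than mutating the inputs.
        return [
            Document(d.page_content.upper(), dict(d.metadata)) for d in documents
        ]

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Assumption: run the sync method in a worker thread so the
        # event loop is not blocked while documents are processed.
        return await asyncio.to_thread(
            self.transform_documents, documents, **kwargs
        )


async def main() -> Sequence[Document]:
    transformer = UppercaseTransformer()
    return await transformer.atransform_documents(
        [Document("hello"), Document("world")]
    )


result = asyncio.run(main())
```

Offloading to a thread suits CPU-light or blocking work; a transformer that calls an async client could instead await that client directly inside `atransform_documents`.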