autofaiss.indices package
Submodules
autofaiss.indices.distributed module
Building the index with pyspark.
- autofaiss.indices.distributed.run(faiss_index, embedding_reader, memory_available_for_adding, num_cores_per_executor=None, temporary_indices_folder='hdfs://root/tmp/distributed_autofaiss_indices', embedding_ids_df_handler=None, nb_indices_to_keep=1, index_optimizer=None)
Create indices with PySpark.
- Parameters
faiss_index (faiss.Index) – Trained faiss index
embedding_reader (EmbeddingReader) – Embedding reader.
memory_available_for_adding (str) – Memory available for adding embeddings.
num_cores_per_executor (int) – Number of CPU cores per executor
temporary_indices_folder (str) – Folder to save the temporary small indices
embedding_ids_df_handler (Optional[Callable[[pd.DataFrame, int], Any]]) – The function that handles the embedding IDs when id_columns is given
nb_indices_to_keep (int) – Number of indices to keep at most after the merging step
index_optimizer (Optional[Callable]) – The function that optimizes the index
- Return type
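The build-then-merge strategy behind run (many small indices built in parallel, then merged down to at most nb_indices_to_keep) can be illustrated without Spark. The sketch below is a hypothetical, single-process illustration and not the autofaiss implementation: plain NumPy (vectors, ids) pairs stand in for the small faiss indices each executor would produce, and the helper names build_partial_indices and merge_partial_indices are invented for this example.

```python
import numpy as np


def build_partial_indices(embeddings, nb_chunks):
    """Hypothetical stand-in for the per-executor step: each chunk of embeddings
    becomes one 'small index', represented here as a (vectors, ids) pair."""
    partials = []
    start = 0
    for chunk in np.array_split(embeddings, nb_chunks):
        ids = np.arange(start, start + len(chunk))
        partials.append((chunk, ids))
        start += len(chunk)
    return partials


def merge_partial_indices(partials, nb_indices_to_keep):
    """Merge consecutive partial indices so that at most nb_indices_to_keep remain."""
    groups = np.array_split(np.arange(len(partials)), nb_indices_to_keep)
    merged = []
    for group in groups:
        if len(group) == 0:  # more groups requested than partials available
            continue
        vectors = np.concatenate([partials[i][0] for i in group])
        ids = np.concatenate([partials[i][1] for i in group])
        merged.append((vectors, ids))
    return merged


embeddings = np.random.default_rng(0).random((1000, 8), dtype=np.float32)
partials = build_partial_indices(embeddings, nb_chunks=10)
merged = merge_partial_indices(partials, nb_indices_to_keep=3)
print([len(ids) for _, ids in merged])  # [400, 300, 300]
```

Merging consecutive partials keeps the ids of each merged index contiguous, which makes the id bookkeeping simple; a real implementation would merge faiss indices rather than concatenate arrays.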
autofaiss.indices.faiss_index_wrapper module
This file contains a wrapper class to create Faiss-like indices.
- class autofaiss.indices.faiss_index_wrapper.FaissIndexWrapper(d, metric_type)
Bases: abc.ABC
This abstract class describes a Faiss-like index. Wrapping an index in it makes the benchmarking functions written for faiss in this library usable with that index.
- abstract add(x)
Function that adds vectors to the index
- Parameters
x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)
- abstract search(x, k)
Function that searches for the k nearest neighbours of a batch of vectors
- Parameters
x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)
k (int) – Number of neighbours to retrieve for every vector
- Returns
D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances computed by the index of the k nearest neighbours.
I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the vectors’ labels of the k nearest neighbours.
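To show what the add/search contract looks like in practice, here is a hypothetical brute-force implementation. So that it runs without autofaiss installed, the sketch declares its own abstract base FaissLikeIndex mirroring the documented FaissIndexWrapper(d, metric_type) interface instead of importing it. The names FaissLikeIndex and BruteForceL2Index, the string metric label, and the choice of squared L2 distance are assumptions made for illustration.

```python
from abc import ABC, abstractmethod

import numpy as np


class FaissLikeIndex(ABC):
    """Hypothetical mirror of the documented FaissIndexWrapper interface."""

    def __init__(self, d, metric_type):
        self.d = d
        self.metric_type = metric_type

    @abstractmethod
    def add(self, x):
        """Add a batch of vectors of shape (batch_size, d)."""

    @abstractmethod
    def search(self, x, k):
        """Return (D, I), both of shape (batch_size, k)."""


class BruteForceL2Index(FaissLikeIndex):
    """Exact search with squared L2 distances; labels follow insertion order."""

    def __init__(self, d):
        super().__init__(d, metric_type="L2")  # illustrative label, not a faiss constant
        self.vectors = np.empty((0, d), dtype=np.float32)

    def add(self, x):
        self.vectors = np.vstack([self.vectors, np.asarray(x, dtype=np.float32)])

    def search(self, x, k):
        x = np.asarray(x, dtype=np.float32)
        # (batch_size, n) matrix of squared distances to every stored vector
        dists = ((x[:, None, :] - self.vectors[None, :, :]) ** 2).sum(axis=-1)
        I = np.argsort(dists, axis=1, kind="stable")[:, :k]
        D = np.take_along_axis(dists, I, axis=1)
        return D, I


index = BruteForceL2Index(d=2)
index.add([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
D, I = index.search([[0.9, 0.0]], k=2)
print(I)  # [[1 0]]
```

Any object exposing these two methods with these shapes can be passed where a faiss index is expected by code that only calls add and search.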
autofaiss.indices.index_factory module
Functions that fix the faiss index_factory function.
autofaiss.indices.index_utils module
Useful functions to apply to an index.
- autofaiss.indices.index_utils.format_speed_ms_per_query(speed)
Format the speed (ms/query) into a human-readable string.
- Return type
str
- autofaiss.indices.index_utils.get_bytes_from_index(index)
Transforms a faiss index into a bytearray.
- Return type
bytearray
- autofaiss.indices.index_utils.get_index_from_bytes(index_bytes)
Transforms a bytearray containing a faiss index into the corresponding object.
- Return type
Index
- autofaiss.indices.index_utils.get_index_size(index)
Returns the size in RAM of a given index.
- Return type
int
- autofaiss.indices.index_utils.parallel_download_indices_from_remote(fs, indices_file_paths, dst_folder)
Download small indices in parallel.
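parallel_download_indices_from_remote takes a filesystem object, a list of remote paths and a destination folder. A minimal sketch of the same idea follows, assuming fs exposes a get(remote_path, local_path) method as fsspec filesystems do. The helper name download_in_parallel, the thread pool and the max_workers default are illustrative choices, not the autofaiss code; threads fit here because downloads are I/O-bound.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def download_in_parallel(fs, remote_paths, dst_folder, max_workers=8):
    """Hypothetical helper: copy each remote file into dst_folder concurrently.

    Assumes fs.get(rpath, lpath) downloads one file (fsspec-style API).
    Returns the local paths in the same order as remote_paths.
    """
    os.makedirs(dst_folder, exist_ok=True)
    local_paths = [
        os.path.join(dst_folder, os.path.basename(path)) for path in remote_paths
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every download and re-raises the first error, if any
        list(pool.map(fs.get, remote_paths, local_paths))
    return local_paths
```

With fsspec installed, fs could be fsspec.filesystem("file") for a local trial, or the filesystem matching the remote URLs of the small indices.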
- autofaiss.indices.index_utils.quantize_vec_without_modifying_index(index, vecs)
Quantize a batch of vectors.
- Return type
- autofaiss.indices.index_utils.search_speed_test(index, query=None, ksearch=40, timout_s=10.0)
Return the average and 99th-percentile search speed.
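The average and 99th-percentile figures reported by search_speed_test can be obtained by timing single-vector searches until a time budget runs out. The sketch below is an assumption about the approach, not the autofaiss code: it accepts any object with a faiss-style search(x, k) method, and the names measure_search_speed, queries and timeout_s are hypothetical (the documented function spells its parameter timout_s).

```python
import time

import numpy as np


def measure_search_speed(index, queries, ksearch=40, timeout_s=10.0):
    """Hypothetical sketch: time one search per query until the queries or the
    time budget run out; return (average_ms, p99_ms)."""
    timings_ms = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        index.search(query.reshape(1, -1), ksearch)  # one query per call
        timings_ms.append((time.perf_counter() - t0) * 1000.0)
        if time.perf_counter() - start > timeout_s:
            break
    if not timings_ms:
        raise ValueError("no queries were timed")
    timings = np.asarray(timings_ms)
    return float(timings.mean()), float(np.percentile(timings, 99))
```

Timing queries one at a time measures per-query latency; batching queries would report throughput instead, which is usually lower per query.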
autofaiss.indices.memory_efficient_flat_index module
This file contains a class describing a memory-efficient flat index.
- class autofaiss.indices.memory_efficient_flat_index.MemEfficientFlatIndex(d, metric_type)
Bases: autofaiss.indices.faiss_index_wrapper.FaissIndexWrapper
Faiss-like flat index that can hold any number of vectors without memory issues. Two search functions are available: one uses batches of smaller faiss flat indices, the other relies fully on numpy.
- add(x)
Function that adds vectors to the index
- Parameters
x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)
- add_all(filename, nb_items)
Function that adds vectors to the index from a memory-mapped array
- Parameters
filename (string) – path of the 2D numpy array of shape (nb_items, vector_dim) on the disk
nb_items (int) – number of vectors in the 2D array (the dim is already known)
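add_all reads vectors from a memory-mapped array on disk, so the whole file never has to be loaded at once. A minimal sketch of chunked reading follows, assuming the file is a .npy array that numpy.load(..., mmap_mode="r") can open (the on-disk format is not stated here); the helper name iter_memmap_chunks and the chunk_size default are hypothetical.

```python
import numpy as np


def iter_memmap_chunks(filename, nb_items, chunk_size=100_000):
    """Hypothetical helper: yield float32 chunks of the first nb_items rows of a
    memory-mapped .npy array, materializing at most chunk_size rows at a time."""
    data = np.load(filename, mmap_mode="r")  # maps the file; nothing is read yet
    for start in range(0, nb_items, chunk_size):
        stop = min(start + chunk_size, nb_items)
        # np.array copies only this slice from disk into RAM
        yield np.array(data[start:stop], dtype=np.float32)
```

Each yielded chunk has shape (chunk rows, vector_dim) and can be passed to add(x) in turn, so peak memory is bounded by chunk_size rather than nb_items.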
- search(x, k, batch_size=4000000)
Function that searches for the k nearest neighbours of a batch of vectors
- Parameters
- Returns
D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances computed by the index of the k nearest neighbours.
I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the vectors’ labels of the k nearest neighbours.
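Searching with batches of smaller flat indices keeps memory bounded: the stored vectors are scanned one batch at a time and the running top-k is merged with each batch's candidates. The sketch below illustrates that pattern under stated assumptions: NumPy brute force with squared L2 distance stands in for one temporary faiss flat index per batch, batch_size is treated as the number of stored vectors per batch, and batched_knn is a hypothetical name.

```python
import numpy as np


def batched_knn(xb, xq, k, batch_size=4_000_000):
    """Hypothetical sketch: exact k-NN by squared L2 distance over the database
    xb, scanned batch_size rows at a time. Returns (D, I), each of shape
    (len(xq), min(k, len(xb)))."""
    xq = np.asarray(xq, dtype=np.float32)
    q_norms = (xq**2).sum(axis=1)[:, None]
    best_d = np.empty((len(xq), 0), dtype=np.float32)
    best_i = np.empty((len(xq), 0), dtype=np.int64)
    for start in range(0, len(xb), batch_size):
        batch = np.asarray(xb[start : start + batch_size], dtype=np.float32)
        # ||q||^2 + ||b||^2 - 2<q, b>, clipped at 0 to absorb rounding error
        d = q_norms + (batch**2).sum(axis=1)[None, :] - 2.0 * xq @ batch.T
        d = np.maximum(d, 0.0)
        ids = np.broadcast_to(
            np.arange(start, start + len(batch), dtype=np.int64), d.shape
        )
        # merge this batch's candidates with the running top-k
        cand_d = np.hstack([best_d, d])
        cand_i = np.hstack([best_i, ids])
        order = np.argsort(cand_d, axis=1, kind="stable")[:, :k]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
    return best_d, best_i
```

Only the current batch's distances and the running top-k are held in memory, so the footprint scales with len(xq) * (k + batch_size) rather than with the size of the whole database.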
- search_numpy(xq, k, batch_size=4000000)
Function that searches for the k nearest neighbours of a batch of vectors. This implementation is based on vectorized numpy functions and is slower than the search function based on batches of faiss flat indices. We keep it because new functions can be built on this code. Moreover, its distance computation is more precise than the faiss implementation, which favors speed over precision.
- Parameters
- Returns
D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances computed by the index of the k nearest neighbours.
I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the vectors’ labels of the k nearest neighbours.
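The precision remark for search_numpy can be illustrated with squared L2 distances. A fast, BLAS-friendly way to compute ||x - y||^2 expands it as ||x||^2 + ||y||^2 - 2<x, y>; in float32 this form can lose the gap between nearby vectors that lie far from the origin, while summing squared differences in float64 keeps it. The sketch below contrasts the two forms and builds a hypothetical numpy_knn on the precise one; the function names and the squared-L2 metric are assumptions, and this is not the autofaiss search_numpy code.

```python
import numpy as np


def expanded_sq_l2(xq, xb):
    """Speed-oriented form: ||q||^2 + ||b||^2 - 2<q, b>, computed in float32."""
    xq = xq.astype(np.float32)
    xb = xb.astype(np.float32)
    return (xq**2).sum(1)[:, None] + (xb**2).sum(1)[None, :] - 2 * xq @ xb.T


def direct_sq_l2(xq, xb):
    """Precision-oriented form: sum of squared differences, in float64."""
    diff = xq.astype(np.float64)[:, None, :] - xb.astype(np.float64)[None, :, :]
    return (diff**2).sum(-1)


def numpy_knn(xq, xb, k):
    """Hypothetical exact k-NN built on the direct float64 distances."""
    d = direct_sq_l2(xq, xb)
    I = np.argsort(d, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(d, I, axis=1), I


# Two nearly identical vectors far from the origin: the true distance is 1e-6.
x = np.array([[1000.0, 0.0]])
y = np.array([[1000.0, 0.001]])
print(direct_sq_l2(x, y))  # close to 1e-6
print(expanded_sq_l2(x, y))  # 0.0: the gap is below float32 resolution near 1e6
```

The direct form allocates a (len(xq), len(xb), dim) temporary, which is one reason a numpy search like this is slower and needs batching for large inputs.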
autofaiss.indices.search module
Functions related to searching on indices.