autofaiss.indices package

Submodules

autofaiss.indices.distributed module

Build the index with PySpark.

autofaiss.indices.distributed.run(faiss_index, embedding_reader, memory_available_for_adding, num_cores_per_executor=None, temporary_indices_folder='hdfs://root/tmp/distributed_autofaiss_indices', embedding_ids_df_handler=None, nb_indices_to_keep=1, index_optimizer=None)[source]

Create indices with PySpark.

Parameters
  • faiss_index (faiss.Index) – Trained faiss index

  • embedding_reader (EmbeddingReader) – Embedding reader.

  • memory_available_for_adding (str) – Memory available for adding embeddings.

  • num_cores_per_executor (int) – Number of CPU cores per executor

  • temporary_indices_folder (str) – Folder to save the temporary small indices

  • embedding_ids_df_handler (Optional[Callable[[pd.DataFrame, int], Any]]) – The function that handles the embedding IDs when id_columns is given

  • nb_indices_to_keep (int) – Number of indices to keep at most after the merging step

  • index_optimizer (Optional[Callable]) – The function that optimizes the index

Return type

Tuple[Optional[Index], Optional[Dict[str, str]]]
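
The nb_indices_to_keep parameter caps how many indices remain after the merging step. The sketch below is only an illustration of that "at most" constraint, not autofaiss's merging logic: the function name group_for_merging, the contiguous grouping strategy, and the file names are all assumptions made for the example.

```python
def group_for_merging(index_paths, nb_indices_to_keep):
    """Hypothetical helper: split paths into at most nb_indices_to_keep groups.

    Each group stands for the small indices that would be merged into one
    final index. Groups are contiguous and differ in size by at most one.
    """
    if nb_indices_to_keep < 1:
        raise ValueError("nb_indices_to_keep must be at least 1")
    nb_groups = min(nb_indices_to_keep, len(index_paths))
    groups = []
    start = 0
    for group_id in range(nb_groups):
        # The first (len % nb_groups) groups take one extra path.
        extra = 1 if group_id < len(index_paths) % nb_groups else 0
        size = len(index_paths) // nb_groups + extra
        groups.append(index_paths[start:start + size])
        start += size
    return groups


# Hypothetical file names for seven small indices.
paths = [f"part_{i}.index" for i in range(7)]
groups = group_for_merging(paths, nb_indices_to_keep=3)
```

With seven small indices and nb_indices_to_keep=3, this grouping yields three groups, so at most three indices would remain after merging.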

autofaiss.indices.faiss_index_wrapper module

This file contains a wrapper class to create Faiss-like indices

class autofaiss.indices.faiss_index_wrapper.FaissIndexWrapper(d, metric_type)[source]

Bases: abc.ABC

This abstract class describes a Faiss-like index. It lets the benchmarking functions written for faiss in this library work with any index that implements it.

abstract add(x)[source]

Function that adds vectors to the index

Parameters

x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)

abstract search(x, k)[source]

Function that searches for the k nearest neighbours of a batch of vectors

Parameters
  • x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)

  • k (int) – Number of neighbours to retrieve for every vector

Returns

  • D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances, as computed by the index, to the k nearest neighbours.

  • I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the labels of the k nearest neighbours.
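
The add/search contract described above can be sketched with a small stand-alone example. This is a minimal illustration rather than the library's code: ToyIndexWrapper and BruteForceL2Index are hypothetical names, plain Python lists replace numpy arrays, and the distances are true Euclidean distances (faiss itself reports squared L2 distances).

```python
import math
from abc import ABC, abstractmethod


class ToyIndexWrapper(ABC):
    """Hypothetical stand-in mirroring the add/search interface described above."""

    def __init__(self, d, metric_type):
        self.d = d                      # vector dimension
        self.metric_type = metric_type  # kept opaque in this sketch

    @abstractmethod
    def add(self, x):
        """Add a batch of vectors of shape (batch_size, d)."""

    @abstractmethod
    def search(self, x, k):
        """Return (D, I), each of shape (batch_size, k)."""


class BruteForceL2Index(ToyIndexWrapper):
    """Exhaustive Euclidean search over Python lists (illustration only)."""

    def __init__(self, d):
        super().__init__(d, metric_type="l2")
        self.vectors = []

    def add(self, x):
        for vec in x:
            if len(vec) != self.d:
                raise ValueError(f"expected dimension {self.d}, got {len(vec)}")
            self.vectors.append(list(vec))

    def search(self, x, k):
        distances, labels = [], []
        for query in x:
            # Score every stored vector, then keep the k closest.
            scored = sorted(
                (math.dist(query, vec), label)
                for label, vec in enumerate(self.vectors)
            )[:k]
            distances.append([dist for dist, _ in scored])
            labels.append([label for _, label in scored])
        return distances, labels


index = BruteForceL2Index(d=2)
index.add([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
D, I = index.search([[0.9, 0.1]], k=2)
```

Any code written against add and search alone, such as a benchmarking loop, could then accept either implementation.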

autofaiss.indices.index_factory module

Functions that fix the faiss index_factory function

autofaiss.indices.index_factory.index_factory(d, index_key, metric_type, ef_construction=None)[source]

Custom index_factory that fixes some issues of faiss.index_factory with inner product metrics.

autofaiss.indices.index_utils module

Useful functions to apply to an index

autofaiss.indices.index_utils.format_speed_ms_per_query(speed)[source]

Format the speed (ms/query) into a readable string

Return type

str

autofaiss.indices.index_utils.get_bytes_from_index(index)[source]

Transforms a faiss index into a bytearray.

Return type

bytearray

autofaiss.indices.index_utils.get_index_from_bytes(index_bytes)[source]

Transforms a bytearray containing a faiss index into the corresponding object.

Return type

Index

autofaiss.indices.index_utils.get_index_size(index)[source]

Returns the size in RAM of a given index

Return type

int

autofaiss.indices.index_utils.parallel_download_indices_from_remote(fs, indices_file_paths, dst_folder)[source]

Download small indices in parallel.

autofaiss.indices.index_utils.quantize_vec_without_modifying_index(index, vecs)[source]

Quantize a batch of vectors

Return type

ndarray

autofaiss.indices.index_utils.search_speed_test(index, query=None, ksearch=40, timout_s=10.0)[source]

Return the average and 99th-percentile search speed

Return type

Dict[str, float]
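
As a rough sketch of how an average and a 99th-percentile latency can be measured under a time budget, the example below times one query at a time. It is not the library's implementation: the helper name timed_search_stats, its timeout_s parameter, the nearest-rank percentile rule, and the dictionary keys are all assumptions for illustration.

```python
import math
import time


def timed_search_stats(search_fn, queries, timeout_s=10.0):
    """Hypothetical helper: time single queries until queries or time run out."""
    latencies_ms = []
    deadline = time.perf_counter() + timeout_s
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        end = time.perf_counter()
        latencies_ms.append((end - start) * 1000.0)
        if end > deadline:
            break  # stop early once the time budget is spent
    if not latencies_ms:
        raise ValueError("at least one query is required")
    latencies_ms.sort()
    # Nearest-rank 99th percentile of the sorted latencies.
    rank = math.ceil(0.99 * len(latencies_ms)) - 1
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p99_ms": latencies_ms[rank],
    }


# Stand-in workload: sorting a list plays the role of an index search.
data = list(range(1000, 0, -1))
stats = timed_search_stats(lambda q: sorted(data), range(200), timeout_s=1.0)
```

The timeout bounds the total measuring time, so fewer queries may be timed than were supplied.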

autofaiss.indices.index_utils.set_search_hyperparameters(index, param_str, use_gpu=False)[source]

Set the search hyperparameters of an index

Return type

None

autofaiss.indices.index_utils.speed_test_ms_per_query(index, query=None, ksearch=40, timout_s=5.0)[source]

Evaluate the average search speed of the index, in milliseconds per query, without batching

Return type

float

autofaiss.indices.memory_efficient_flat_index module

This file contains a class describing a memory-efficient flat index

class autofaiss.indices.memory_efficient_flat_index.MemEfficientFlatIndex(d, metric_type)[source]

Bases: autofaiss.indices.faiss_index_wrapper.FaissIndexWrapper

Faiss-like Flat index that can handle vector sets of any size without memory issues. Two search functions are available: one uses batches of smaller faiss flat indices, the other relies entirely on numpy.

add(x)[source]

Function that adds vectors to the index

Parameters

x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)

add_all(filename, nb_items)[source]

Function that adds vectors to the index from a memory-mapped array

Parameters
  • filename (string) – path of the 2D numpy array of shape (nb_items, vector_dim) on the disk

  • nb_items (int) – number of vectors in the 2D array (the dim is already known)

add_files(embedding_reader)[source]

delete_vectors()[source]

Delete the vectors of the index

search(x, k, batch_size=4000000)[source]

Function that searches for the k nearest neighbours of a batch of vectors

Parameters
  • x (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)

  • k (int) – Number of neighbours to retrieve for every vector

  • batch_size (int) – Size of the batch of vectors that are explored. A larger value is preferred to avoid loading the vectors from disk multiple times.

Returns

  • D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances, as computed by the index, to the k nearest neighbours.

  • I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the labels of the k nearest neighbours.
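
The role of batch_size can be illustrated with a small sketch of batched nearest-neighbour search: the stored vectors are scanned in chunks and only the k best hits seen so far are kept. This is an assumption-laden illustration, not the class's code: batched_knn is a hypothetical function, it handles a single query, and it uses Python lists with Euclidean distance in place of numpy arrays or faiss flat indices.

```python
import heapq
import math


def batched_knn(stored, query, k, batch_size):
    """Scan `stored` in chunks of `batch_size`, keeping the k best hits so far."""
    best = []  # (distance, label) pairs, at most k of them
    for start in range(0, len(stored), batch_size):
        batch = stored[start:start + batch_size]
        candidates = [
            (math.dist(query, vec), start + offset)
            for offset, vec in enumerate(batch)
        ]
        # Merge this batch's candidates with the running top-k.
        best = heapq.nsmallest(k, best + candidates)
    distances = [dist for dist, _ in best]
    labels = [label for _, label in best]
    return distances, labels


stored = [[float(i), 0.0] for i in range(10)]
small_D, small_I = batched_knn(stored, [3.2, 0.0], k=3, batch_size=4)
large_D, large_I = batched_knn(stored, [3.2, 0.0], k=3, batch_size=100)
```

Both calls return the same neighbours. The batch size only changes how many chunks are scanned, which is why a larger value means fewer passes over data that would otherwise be reloaded from disk.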

search_files(x, k, batch_size)[source]

search_numpy(xq, k, batch_size=4000000)[source]

Function that searches for the k nearest neighbours of a batch of vectors. This implementation relies on vectorized numpy functions and is slower than the search function based on batches of faiss flat indices. It is kept because new functions can be built on top of this code. Moreover, the distance computation is more precise in numpy than in the faiss implementation, which favours speed over precision.

Parameters
  • xq (2D numpy.array of floats) – Batch of vectors of shape (batch_size, vector_dim)

  • k (int) – Number of neighbours to retrieve for every vector

  • batch_size (int) – Size of the batch of vectors that are explored. A larger value is preferred to avoid loading the vectors from disk multiple times.

Returns

  • D (2D numpy.array of floats) – Distances numpy array of shape (batch_size, k). Contains the distances, as computed by the index, to the k nearest neighbours.

  • I (2D numpy.array of ints) – Labels numpy array of shape (batch_size, k). Contains the labels of the k nearest neighbours.

autofaiss.indices.search module

Functions related to searching indices

autofaiss.indices.search.knn_query(index, query, ksearch)[source]

Run a k-nearest-neighbour search and return the closest items with their associated distances

Return type

Iterable[Tuple[Tuple[int, int], float]]

Module contents