GSI’s Elasticsearch K-NN Plugin Application Brief

Upload by : Julia Hutchens

Introduction

The GSI Elasticsearch k-NN plugin allows you to perform nearest neighbor vector similarity search using Elasticsearch's dense_vector type. The plugin provides a high-performance, low-latency, low-power, billion-scale vector similarity search solution that lets users combine traditional Elasticsearch text filters with vector search queries for a more advanced search.

Elasticsearch was originally designed as a text and document search engine. The GSI Elasticsearch k-NN plugin expands Elasticsearch's ability to search beyond just text. The plugin opens the door to other data types such as images, video, and audio: any data type that can be represented as a compact, semantically rich numeric vector. Vectors can be used to search for the items most similar to a query (its nearest neighbors) and can accelerate applications such as visual search, face recognition, natural language processing (NLP), and recommendation systems.

Scales to Billions of Documents

Core Elasticsearch uses a computationally expensive exhaustive match_all search, which makes it too slow to handle the large-scale initial retrieval step in a vector similarity search pipeline. This limits core Elasticsearch to scoring documents on a small, filtered set of vectors.

Instead of using an exhaustive match_all search, the GSI plugin performs an approximate nearest neighbor vector similarity search. This allows the GSI plugin to scale to billions of documents and handle the important initial retrieval step in a search pipeline.

© 2021, GSI Technology, Inc. Proprietary
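To make the cost of exhaustive scoring concrete, the following minimal Python sketch compares a query against every stored vector and keeps the k closest. It is an illustration only, not Elasticsearch's or GSI's implementation; the function name `exhaustive_knn` and the toy data are hypothetical.

```python
import heapq
import math


def exhaustive_knn(query, vectors, k):
    """Return the k (distance, doc_id) pairs closest to `query`.

    Every stored vector is scored, so the work grows linearly with the
    number of documents. This is the behavior that approximate nearest
    neighbor search is meant to avoid at large scale.
    """
    scored = ((math.dist(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Hypothetical toy corpus of 2-dimensional vectors.
docs = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
nearest = exhaustive_knn([0.9, 0.9], docs, k=2)
nearest_ids = [doc_id for _, doc_id in nearest]
```

With billions of documents, each query in this style touches every vector, which is why an exact scan is normally restricted to a small, pre-filtered candidate set.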

Supports Multimodal Search

Multimodal search, where images and text are combined to form a powerful search, is a rapidly emerging trend. Retail segments such as fashion and home design are one particular driver of multimodal search: they rely heavily on visual search because style is often difficult to describe using text.

In addition to visual search, text search is also a required part of the solution, because product information such as item description, item title, category, and brand is generally used to filter the results returned by the visual search. What is needed, then, is a solution that allows for multimodal search.

Open-source search libraries do not handle multimodal search well. The popular open-source vector search libraries, such as FAISS and NMSLIB, are good at nearest neighbor vector search but lack support for efficient filtering of data. For example, it would be difficult for FAISS to use textual filters to filter the results previously returned from a nearest neighbor vector search.

The GSI Elasticsearch plugin allows users to perform multimodal searches. For example, as seen in Figure 1 below, a user could first perform a vector similarity search to find images similar to a query image (vector) and then filter those results using a color (text) filter.

    {
      "query": {
        "bool": {
          "must": [
            {
              "gsi_similarity": {
                "field": "data",
                "vector": [1.21439, 3.10212, -1.16291],
                "topk": 20
              }
            }
          ],
          "filter": [
            {
              "term": {
                "color": "red"
              }
            }
          ]
        }
      }
    }

Figure 1: Multimodal Query Example. A user first finds the closest 20 neighbors (topk: 20) to the query "vector" and then applies a text filter using product information to find the subset of those items that are the color "red".
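The query in Figure 1 can be assembled programmatically. The sketch below is one possible way to do that in Python, assuming the plugin's clause is spelled `gsi_similarity` and takes the `field`, `vector`, and `topk` keys shown in the figure. The helper name `build_multimodal_query` and its parameters are hypothetical and are not part of the plugin.

```python
import json


def build_multimodal_query(vector, topk, term_filters, field="data"):
    """Build a bool query combining a vector clause with term filters.

    The "gsi_similarity" clause name and its keys follow Figure 1; the
    spelling with an underscore is an assumption.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "gsi_similarity": {
                            "field": field,
                            "vector": list(vector),
                            "topk": topk,
                        }
                    }
                ],
                # One exact-match term filter per (field, value) pair.
                "filter": [
                    {"term": {name: value}} for name, value in term_filters.items()
                ],
            }
        }
    }


body = build_multimodal_query([1.21439, 3.10212, -1.16291], 20, {"color": "red"})
print(json.dumps(body, indent=2))
```

Keeping the vector clause under `must` and the text constraints under `filter` mirrors the figure: the filter narrows the result set without contributing to the score.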

No Need to Reindex Documents

Other plugins require custom field types and codecs to support vector search, so a user who already has indices using the dense_vector field type must change their index mapping and reindex their documents to use those plugins. This reindexing is costly because most other vector plugins use graph-based approaches for search, which suffer from slow, memory-intensive indexing.

As seen in Figure 2 below, the GSI Elasticsearch k-NN plugin uses the core Elasticsearch dense_vector field type and index mapping, so there is no need to reindex.

    {
      "mappings": {
        "properties": {
          "my_vector1": {
            "type": "dense_vector",
            "dims": 2
          }
        }
      }
    }

Figure 2: GSI Elasticsearch k-NN Plugin Example. The GSI Elasticsearch plugin requires no reindexing because the mapping for the dense_vector field type is the same as core Elasticsearch's.

Provides a Simple, Ready-for-Production Vector Search Architecture

Popular open-source vector search libraries, such as FAISS and NMSLIB, provide good results on nearest neighbor benchmarks, but they are not easily integrated into a production search system. For example, a user would still need to address the difficult tasks of building a distributed system that scales and managing the complex indexing of that distributed system.

With Elasticsearch, that is already taken care of, because two of its key strengths are horizontal scaling and index management. The GSI Elasticsearch k-NN plugin leverages those strengths and extends the power of Elasticsearch even further by integrating nearest neighbor vector similarity search directly into Elasticsearch. The plugin uses the existing Elasticsearch query interface, resulting in a simple-to-use, ready-for-production vector similarity search solution.
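As a small illustration of the mapping in Figure 2, the Python sketch below builds the same index body using the core Elasticsearch `dense_vector` field type. The helper name `dense_vector_mapping` is hypothetical, and the field name is written here as `my_vector1`, which is an assumption about the spelling in the figure.

```python
def dense_vector_mapping(field_name, dims):
    """Return an index body that maps `field_name` as a dense_vector field."""
    if dims < 1:
        raise ValueError("dims must be a positive integer")
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "dense_vector", "dims": dims}
            }
        }
    }


# Same shape as Figure 2: a 2-dimensional vector field.
mapping = dense_vector_mapping("my_vector1", 2)
```

Because nothing in this body is plugin-specific, an index created this way for core Elasticsearch would, per the text above, already be usable by the plugin.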

High Performance with Low Power

Benchmark #1

Based on: ...erformance

GIST-960-Euclidean Results

    Engine           QPS    Average Latency (ms)   P95 Latency (ms)   Recall@10
    Elastic 7.6      0.57   1752.74                1850.74            1
    Vespa 7.190.14   1.32   756.61                 955.63             1
    GSI Plugin       8.1    123                    135                1

GSI Plugin - APU Power: 16.1W, CPU Power: 225W, Total: 241.1W
CPU Power: 325W (1 x Intel Xeon E5-2680 v3 2.50GHz Haswell)

SIFT-128-Euclidean Results

    Engine           QPS    Average Latency (ms)   P95 Latency (ms)   Recall@10
    Elastic 7.6      3.29   303.96                 337.89             1
    Vespa 7.190.14   9.14   109.33                 148.9              1
    GSI Plugin       15.8   63.3                   64.7               1

GSI Plugin - APU Power: 15.85W, CPU Power: 225W, Total: 240.85W
CPU Power: 325W (1 x Intel Xeon E5-2680 v3 2.50GHz Haswell)
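The brief does not spell out how the Recall@10 column is computed. A common definition is the fraction of the true 10 nearest neighbors that the engine returns in its own top 10. The Python sketch below assumes that definition; the function name `recall_at_k` and the example ID lists are hypothetical.

```python
def recall_at_k(returned_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors found in the returned top-k."""
    exact_top = set(exact_ids[:k])
    # Deduplicate the returned IDs so a repeated hit is counted once.
    hits = len(exact_top.intersection(returned_ids[:k]))
    return hits / k


# Hypothetical example: ground truth is documents 0..9.
exact = list(range(10))
perfect = recall_at_k(list(range(10)), exact)
partial = recall_at_k([0, 1, 2, 3, 4, 10, 11, 12, 13, 14], exact)
```

Under this definition, a Recall@10 of 1 means the engine's top 10 matched the exact top 10 for the benchmark queries.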

Benchmark #2: Demo

DB Fashion (700K)

    Dataset   Size   Engine          Platform            QPS    Average Latency (sec) Per Query
    Fashion   700K   Elastic 7.8.0   Xeon Gold 5115      0.8    3.31
    Fashion   700K   GSI Plugin      APU: One Query      20     0.05
    Fashion   700K   GSI Plugin      APU: Batch of ...   ...    ...

DB ImageNet (14M)

    Dataset    Size   Engine          Platform                   QPS    Average Latency (sec) Per Query
    ImageNet   14M    Elastic 7.8.0   Xeon Gold 5115             0.30   3.31
    ImageNet   14M    GSI Plugin      APU: One Query             10     0.1
    ImageNet   14M    GSI Plugin      APU: Batch of 10 Queries   54     0.0185

Contact us at elasticsearch@gsitechnology.com for a customized demo.
