An In-Depth Exploration of GroundX Search
Introduction
In this tutorial, we’ll explore the GroundX search API.
What makes GroundX search results stand out?
Under the hood, the GroundX Search API goes through the following process:
- First, it analyzes your search query, and improves it if necessary, to carry out a semantic search that helps avoid the hallucination trap most RAG apps fall into with their vectorized searches.
- Next, it searches through GroundX buckets where your content and its extra search data are stored. GroundX’s unique extra search data approach helps maintain your content within its original context.
- After finding relevant content, the Search API returns much more than the simple, unformatted chunks of raw text most RAG systems produce. Instead, GroundX returns a search response bundled with intelligible text chunks along with extra search data, automatically generated document and section summaries, the source URL, and a new, more readable, contextualized, and performant version of the text chunk ready for LLM use.
- Then, GroundX uses its proprietary re-ranker, which scores every chunk for how well it answers the original question, to ensure that the most trustworthy results are always on top.
- And finally, it merges all of this into a simple text block that you can send to the LLM of your choice, so you get accurate, contextualized, and hallucination-free responses from your LLMs when working with your content.
LLM integration
Sounds complex? Luckily, GroundX does all the hard work in the background; you just have to follow these simple steps:
- Make an API search request.
Example:
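A minimal sketch in Python. The client class and method names (GroundX, search.content) are assumptions based on the GroundX Python SDK and may differ in the version you install:

```python
def build_search_params(bucket_id, query):
    """Assemble the parameters for a GroundX search request."""
    return {"id": bucket_id, "query": query}

def run_search(api_key, bucket_id, query):
    # Imported lazily so the helper above works without the SDK installed.
    from groundx import GroundX  # assumed SDK entry point
    client = GroundX(api_key=api_key)
    return client.search.content(**build_search_params(bucket_id, query))
```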
API Key
- Go to the GroundX dashboard to get your API key.
- GroundX can be installed for Python via pip install groundx
- GroundX can be installed for Node.js via npm i -s groundx
- Retrieve the search.text property and pass it on to the LLM of your choice.
Example:
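For instance, assuming the response exposes the compiled search text as search.text, you can fold it into a chat-style prompt for your LLM. The message format below is hypothetical and should be adapted to your provider:

```python
def build_llm_messages(search_text, question):
    """Combine GroundX's compiled search text with the user's question
    into a chat-style message list."""
    system = (
        "Answer the question using only the content below.\n"
        "===\n" + search_text + "\n==="
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```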
LLM API
Make sure to get the API key, endpoints, and SDK from your LLM provider. For example, if using ChatGPT, refer to the OpenAI documentation.
- Get a response from the LLM using your retrieved data.
Example:
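A sketch using OpenAI's chat completions API (the provider mentioned above); the model name is illustrative, and the openai package's interface may vary between versions:

```python
def grounded_prompt(search_text):
    """System prompt that pins the model to the retrieved content."""
    return "Answer using only this content:\n" + search_text

def answer_with_openai(api_key, search_text, question):
    # Imported here so the helper above runs without the package installed.
    from openai import OpenAI  # requires: pip install openai
    client = OpenAI(api_key=api_key)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": grounded_prompt(search_text)},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content
```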
The end result: retrievals that outperform traditional vector systems and boost the accuracy of LLM completions.
Getting started
Let’s go into the details.
To make an API request you’ll need to do the following:
Ingest content
First, make sure you’ve already uploaded the content you want to search through to a GroundX bucket. See the documentation on content ingestion for more information.
Set up environment
Next, set up your environment with the GroundX SDK.
Bucket or Group ID
Get the ID of the group or bucket you want to search through.
Example:
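If you don't have the ID at hand, you can look it up programmatically. This sketch assumes the SDK exposes a bucket-listing call; the client and method names are assumptions:

```python
def find_bucket_id(buckets, name):
    """Pick a bucket ID by name from a list of bucket records (dicts)."""
    for bucket in buckets:
        if bucket.get("name") == name:
            return bucket.get("bucketId")
    return None

def list_buckets(api_key):
    from groundx import GroundX  # assumed SDK entry point
    client = GroundX(api_key=api_key)
    return client.buckets.list()  # assumed method name
```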
See the documentation on content ingestion to learn about bucket creation.
Search query string
Create a search query string.
The string can be a question or the keywords you want to search for.
If a query string is more than 30 words long, GroundX automatically rewrites it with keywords using its internal LLM.
GroundX processes the string to search through your content and its extra search data to retrieve the most relevant results.
Example:
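Either form works; both query strings below are hypothetical:

```python
# A natural-language question...
query = "What were the action items from the March planning meeting?"

# ...or plain keywords work too.
keyword_query = "March planning meeting action items"
```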
Number of results
Optionally, you can also set the number of results that are returned by the search request. By default, search queries return up to 20 results.
Example:
API request
You’re now ready to make the API search request. Include the bucket or group ID and the search query string in the request body.
Example:
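Putting it together, a sketch that assumes the SDK's search call takes the bucket or group ID, the query string, and an optional result count named n (parameter names may differ in your SDK version):

```python
def build_request_body(bucket_id, query, n=None):
    """Assemble the search request parameters."""
    body = {"id": bucket_id, "query": query}
    if n is not None:
        body["n"] = n  # assumed name of the result-count parameter
    return body

def search_groundx(api_key, bucket_id, query, n=None):
    from groundx import GroundX  # assumed SDK entry point
    client = GroundX(api_key=api_key)
    return client.search.content(**build_request_body(bucket_id, query, n))
```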
API response
After making the request, you will receive a Search object as the response.
Let’s go over some of the details of the Search schema.
Response sample:
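The authoritative sample lives in the API reference; the dict below only illustrates the shape implied by the fields discussed in this section, and every field name and value in it is an assumption:

```python
# Hypothetical response shape — not the official schema.
sample_response = {
    "search": {
        "query": "what is X?",  # hypothetical query
        "count": 1,
        "text": "...compiled suggested text plus extra search data...",
        "results": [
            {
                "text": "...original text chunk...",
                "suggestedText": "...rewritten, contextualized chunk...",
                "score": 152.7,  # illustrative relevance score
                "sourceUrl": "https://example.com/source.pdf",
                "documentId": "doc-id-placeholder",
                "bucketId": 12345,
            }
        ],
    }
}
```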
Recommended data
The most significant property returned by the search response is the text property, which contains a compilation of suggested texts with their corresponding extra search data.
In other words, it’s a string that includes all the automatically rewritten text chunks and the extra search data that was manually added to the original content.
As already mentioned, this content, automatically generated by GroundX’s internal LLM, gives you intelligible, contextualized, and machine-understandable text that you can pass straight to your LLM.
Advanced data handling
Although we recommend passing the search.text property directly to the LLM you’re working with for response generation, you can also add your own logic to handle the search.results property.
Let’s take a closer look:
The results property contains a list of all the original text chunks that matched the search query. Chunks are ordered by a score based on semantic search and re-ranking methodologies.
Here, you can also find the suggested text we’ve been mentioning, which is the more intelligible and performant version of the text chunk.
Furthermore, you can find automatically generated extra search data, like summaries of the document and section the text chunks are found in.
The extra search data added to your content is available as well. And to help keep source information available, the GroundX URL, document ID, and bucket ID of the content from which the text chunks were extracted are also provided.
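For example, treating the response as a dict, you could keep only high-scoring chunks along with their source information. The field names used here (results, suggestedText, sourceUrl, and so on) are assumptions and should be checked against the actual schema:

```python
def collect_sources(search, min_score=0.0):
    """Gather suggested text and source info from search results,
    skipping chunks that score below min_score."""
    rows = []
    for result in search.get("results", []):
        if result.get("score", 0.0) < min_score:
            continue
        rows.append({
            "suggested": result.get("suggestedText"),
            "source": result.get("sourceUrl"),
            "document": result.get("documentId"),
            "bucket": result.get("bucketId"),
        })
    return rows
```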
Final details
You’re now ready to integrate GroundX with your LLM to generate accurate, hallucination-free responses using data from your own content.