An In-Depth Exploration of GroundX Document Ingest — Documentation

Introduction

In this tutorial, we’ll cover how to add or ingest your files to GroundX.

GroundX’s value starts with file ingestion, one of its key advantages over other RAG solutions.

With our proprietary ingest pipeline, your files undergo three critical processes in which GroundX:

formats your content for LLM use,
parses content into intelligible text chunks,
and generates contextual metadata.

Unlike other RAG solutions that require you to convert your files to plain text first, GroundX accepts a wide variety of file formats, detects document structures such as tables and page numbers, removes clutter, and rewrites content so that an LLM can clearly understand it.

Getting started

API Key

Go to the GroundX dashboard to get your API key.
Install the GroundX SDK for Python with pip install groundx
Install the GroundX SDK for Node.js with npm i -s groundx

Required information

Before we begin, make sure you have the following information:

The ID of the GroundX bucket in which you wish to store your file.
The local path or URL of the file you want to upload.

You may also want to prepare the following optional values:

The file name you want the file to have once it’s in the GroundX bucket.
The file type, so that the file is processed correctly.

Example:

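bucket_id = 6830  # ID of the destination GroundX bucket
file_name = "aristotle-rhetoric.pdf"  # name the file will have in the bucket
file_type = "pdf"
upload_path = "documents/Aristotle-rhetoric.pdf"  # local path of the file to upload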
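
If you do not want to type the optional values by hand, they can usually be derived from the upload path. The sketch below is only an illustration: describe_upload is a hypothetical helper, not part of the GroundX SDK, and it assumes a plain local path (no URL query string) whose extension matches the file type.

```python
from pathlib import PurePosixPath


def describe_upload(upload_path: str) -> dict:
    """Derive a default file name and file type from a local path.

    Hypothetical helper for illustration; not part of the GroundX SDK.
    """
    path = PurePosixPath(upload_path)
    return {
        # Last path component, e.g. "Aristotle-rhetoric.pdf"
        "file_name": path.name,
        # Extension without the dot, lowercased, e.g. "pdf"
        "file_type": path.suffix.lstrip(".").lower(),
    }


info = describe_upload("documents/Aristotle-rhetoric.pdf")
print(info["file_name"], info["file_type"])
```

You can still override either value, for example to store the file under a different name than it has on disk.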

Adding extra search data

GroundX automatically generates contextual search data for your files, so this step is optional. You can still add extra search data to get the most out of GroundX’s search capabilities, help preserve document context in search responses, and attach tags or notes with instructions on how to handle the search results.

Example:

search_data = {
    "title": "rhetoric",
    "author": "Aristotle",
    "keywords": ["Ethos", "Pathos", "Logos", "Rhetorical Triangle", "Persuasion"]
}
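
Because the search data travels with the request, it is worth checking that it contains only simple values before you send it. The sketch below assumes the data is serialized as JSON (the tutorial does not state the wire format), and is_json_serializable is a hypothetical helper, not a GroundX function.

```python
import json

search_data = {
    "title": "rhetoric",
    "author": "Aristotle",
    "keywords": ["Ethos", "Pathos", "Logos", "Rhetorical Triangle", "Persuasion"],
}


def is_json_serializable(value) -> bool:
    """Return True if value can be encoded as JSON (assumed wire format)."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


print(is_json_serializable(search_data))
```

Values such as sets, datetime objects, or custom classes would fail this check and would need converting to strings or lists first.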

Set up environment

Import the SDK and create a client with your API key.

Example:

from groundx import Document, GroundX

client = GroundX(
    api_key="YOUR_API_KEY",
)

Security Note

The “YOUR_API_KEY” placeholder represents your API key. We recommend storing your API key in an environment variable (for example, GROUNDX_API_KEY) and reading it from there. For this purpose, you can use a library such as dotenv in Node.js or the os module in Python.
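
As a minimal sketch of that recommendation in Python, the helper below reads the key from the environment and fails early if it is missing. The function name load_api_key is hypothetical, and the variable name GROUNDX_API_KEY is simply the one this note suggests; use whatever name your deployment defines.

```python
import os


def load_api_key(env_var: str = "GROUNDX_API_KEY") -> str:
    """Read the API key from an environment variable, failing early if unset."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return api_key
```

You would then create the client with GroundX(api_key=load_api_key()) instead of hard-coding the key in your source.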

API request

Make the API request to ingest local documents, passing the variables defined above in the request body.

response = client.ingest(
    documents=[
        Document(
            bucket_id=bucket_id,
            file_name=file_name,
            file_path=upload_path,
            file_type=file_type,
            search_data=search_data
        )
    ]
)

API response

After making the request, you should receive a response containing processId and status. This response indicates that GroundX is uploading or ingesting your file into the specified bucket.

{
    "ingest": {
        "processId": "23e782ac-3829-4833-965d-e77b4e289885",
        "status": "queued"
    }
}
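
A status of "queued" means ingestion has not finished yet, so a script usually needs to check again later. The sketch below shows one way to read the two fields from a payload shaped like the one above and to poll until processing ends. It rests on several assumptions: fetch_status is a hypothetical callable you supply that returns the same JSON shape for a given processId, and the terminal status names are guesses that you should check against the GroundX API reference.

```python
import time
from typing import Callable, Tuple

# Assumed terminal status names; verify against the GroundX API reference.
TERMINAL_STATUSES = {"complete", "error", "cancelled"}


def extract_ingest_status(payload: dict) -> Tuple[str, str]:
    """Return (processId, status) from a payload shaped like the example response."""
    ingest = payload["ingest"]
    return ingest["processId"], ingest["status"]


def wait_for_ingest(
    fetch_status: Callable[[str], dict],
    process_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Poll fetch_status (hypothetical, caller-supplied) until a terminal status."""
    for _ in range(max_attempts):
        _, status = extract_ingest_status(fetch_status(process_id))
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError(f"Ingest {process_id} did not finish in {max_attempts} checks")
```

Passing the status lookup in as a callable keeps the loop independent of any particular SDK method and makes it easy to test with a stub.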

Final details

Processing time depends on the size of your files. File size can be up to ten megabytes.
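
Given that limit, you can check a local file before uploading it rather than waiting for the request to fail. This is a rough sketch: fits_size_limit is a hypothetical helper, and it assumes "ten megabytes" means 10,000,000 bytes, which is the more conservative reading.

```python
import os

# Assumption: ten megabytes interpreted as 10,000,000 bytes.
MAX_UPLOAD_BYTES = 10 * 1000 * 1000


def fits_size_limit(path: str, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes
```

Calling fits_size_limit(upload_path) before client.ingest lets you skip or split oversized files up front.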

Once ingestion completes, GroundX has prepared your content for search and for automated response generation, without the manual preprocessing that other RAG solutions typically require.
