Using GroundX’s Parser, X-Ray, for Document Understanding

Comprehensive Document Parsing for Modern Applications

In this guide, we’ll introduce EyeLevel’s X-Ray, a modern parser designed to extract high-quality data from complicated real-world documents. X-Ray employs cutting-edge parsing techniques designed specifically to support modern workflows like RAG, agents, and document summarization, allowing developers to connect data from human-centric documents to LLM-powered applications.

X-Ray in a Nutshell

You can think of X-Ray as a cocktail of document understanding and advanced parsing approaches packaged together under a single API. To give you an idea, these are some of the components which X-Ray employs to understand human-centric documents:

  • Bespoke document understanding models to detect key elements within documents.
  • Advanced OCR processes which facilitate textual extraction from a variety of document representations.
  • A repairing and reformatting pipeline that improves parse interpretability.
  • A re-contextualization system that produces fully contextualized summaries of parsed results.

The upshot is a system that can extract complete ideas from complex documents and represent those ideas in a way that is easy for both developers and LLMs to understand.

See it for yourself

X-Ray’s fine-tuned vision model is one of the most critical components of the system. Over the last four years, EyeLevel has collected a comprehensive set of documents from a variety of domains, which we have used to train what is, in our opinion, the highest-quality vision model to date for understanding complex real-world documents. You can use this demo to get an idea of how X-Ray works with your documents.

[Image: Medical billing receipt. An example of X-Ray identifying and extracting key elements from a real-world document.]

Or you can get started with our APIs by following these simple steps:

How to use X-Ray

1) Getting an API Key

You’ll need a GroundX API key to use X-Ray. Note that X-Ray was added in version 1.3.19 of the Python SDK and version 1.3.24 of the TypeScript SDK; older versions of the SDKs do not include X-Ray support.
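If you’re on an older SDK, upgrade before continuing. A minimal sketch, assuming the Python SDK is published on PyPI under the name groundx:

pip install --upgrade "groundx>=1.3.19"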

2) Creating a Bucket

Once you have a GroundX API key, you may wish to create a bucket. Buckets organize documents into groupings, which can be useful for certain applications. You can list all available buckets via List Buckets and create a new bucket via Create Bucket.

from groundx import GroundX, Document
import urllib.request, json, time

# authenticating
client = GroundX(
    api_key="YOUR_API_KEY",
)

# creating a new bucket
bucket_response = client.buckets.create(
    name="parsed_documents_bucket"
)

# storing the bucket_id
bucket_id = bucket_response.bucket.bucket_id
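If you already have buckets, you can enumerate them instead. A minimal sketch, assuming the List Buckets endpoint is exposed as client.buckets.list() and that the response carries bucket objects with bucket_id and name fields:

# listing all available buckets
bucket_list_response = client.buckets.list()
for bucket in bucket_list_response.buckets:
    print(bucket.bucket_id, bucket.name)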

3) Uploading Documents

Uploading documents to a GroundX bucket automatically triggers X-Ray. There are several upload options suited to different use cases. In this example we upload a locally stored document using Ingest Local.

ingest_response = client.ingest(
    documents=[
        Document(
            bucket_id=bucket_id,
            file_name="sample",
            file_path="sample.pdf",
            file_type="pdf"
        )
    ]
)
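The documents parameter is a list, so several files can be ingested in one request. A sketch of a batch upload, with hypothetical file names:

# uploading multiple local documents in a single ingest request
batch_ingest_response = client.ingest(
    documents=[
        Document(
            bucket_id=bucket_id,
            file_name="report_q1",       # hypothetical file
            file_path="report_q1.pdf",
            file_type="pdf",
        ),
        Document(
            bucket_id=bucket_id,
            file_name="report_q2",       # hypothetical file
            file_path="report_q2.pdf",
            file_type="pdf",
        ),
    ]
)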

4) Querying Upload Status

Ingesting returns a process_id, which can be used with Get Processing Status to query the progress of the upload. This code checks the status of the process every 3 seconds until ingestion is complete.

while True:
    ingest_response = client.documents.get_processing_status_by_id(
        process_id=ingest_response.ingest.process_id,
    )
    if ingest_response.ingest.status in ("complete", "cancelled"):
        break
    if ingest_response.ingest.status == "error":
        raise ValueError("Error Ingesting Document")
    print(ingest_response.ingest.status)
    time.sleep(3)
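In production you may not want to poll forever. Here is the same loop with a simple timeout added; the five-minute bound is an arbitrary choice, not an API requirement:

# same polling loop, bounded to roughly five minutes
timeout_seconds = 300
start = time.time()
while True:
    status_response = client.documents.get_processing_status_by_id(
        process_id=ingest_response.ingest.process_id,
    )
    status = status_response.ingest.status
    if status in ("complete", "cancelled"):
        break
    if status == "error":
        raise ValueError("Error Ingesting Document")
    if time.time() - start > timeout_seconds:
        raise TimeoutError("Ingestion did not finish within the timeout")
    print(status)
    time.sleep(3)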

5) Getting X-Ray Results

Now that our document has been fully ingested, we can retrieve all the documents in our bucket via Get Document. We only uploaded a single document, so we can take the one and only document at index 0, and then get the URL where the X-Ray output is stored.

# Getting parsed documents from the bucket
document_response = client.documents.lookup(
    id=bucket_id
)

# Getting the X-Ray parsing results for one of the documents
xray_url = document_response.documents[0].xray_url
with urllib.request.urlopen(xray_url) as response:
    data = json.loads(response.read().decode())
    print(data)
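With the JSON loaded, you can read the document-level fields directly. A small example, using field names from the structure shown in the next section:

# document-level metadata from the X-Ray output
print(data["fileName"])
print(data["fileSummary"])
print(data["fileKeywords"])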

6) Interpreting X-Ray Results

X-Ray provides a rich set of results which may be useful in a variety of use cases. Here are some noteworthy outputs of X-Ray:

  • fileKeywords: A list of keywords that describe the document.
  • fileSummary: A summary of the entire document.
  • boundingBoxes: Key regions within the document that contain meaningful content.
  • contentType: The type of content a chunk contains: textual paragraphs, graphical figures, or tables.
  • json: A representation of graphs and figures reformatted as JSON, useful for both LLM and programmatic workflows.
  • narrative: A representation of graphs and figures reformatted as narrative text, often useful in LLM applications.
  • sectionSummary: A contextual summary of the section of the document containing the chunk.

This is the full structure of an X-Ray parse:

// Successful lookup response
{
  "fileType": "string",       // One of the supported file types
  "language": "string",       // Language detected on the first page of your document during processing
  "fileKeywords": "string",   // Auto-generated comma-delimited list of keywords describing your document
  "fileName": "string",       // Name you gave the document when it was uploaded
  "fileSummary": "string",    // Auto-generated document summary
  "documentPages": [          // Pages and metadata within the document
    {
      "chunks": [             // Semantic objects found on the page; some semantic objects span multiple pages
        {
          "boundingBoxes": [  // Boxes containing semantic object elements
            {
              "bottomRightX": number,  // X coordinate of the lower right corner of the element
              "bottomRightY": number,  // Y coordinate of the lower right corner of the element
              "pageNumber": number,    // Page number of the element, starting at 1
              "topLeftX": number,      // X coordinate of the upper left corner of the element
              "topLeftY": number       // Y coordinate of the upper left corner of the element
            }
          ],
          "chunk": number,            // Unique integer ID for the semantic object
          "contentType": [            // Types of elements represented within the semantic object
            "string"                  // "table" | "figure" | "paragraph"
          ],
          "json": [                   // Element text reformatted into JSON, for "table" and "figure" elements only
            "object"                  // Auto-generated JSON object describing a section of the information within the "table" or "figure"
          ],
          "multimodalUrl": "string",  // Element image for multimodal processing, "table" and "figure" elements only
          "narrative": [              // Element text reformatted into narrative form, for "table" and "figure" elements only
            "string"                  // Auto-generated narrative description of a section of the information within the "table" or "figure"
          ],
          "pageNumbers": [            // Pages where the semantic object exists
            number
          ],
          "sectionSummary": "string", // Auto-generated summary of the document section containing the semantic object
          "suggestedText": "string",  // Element text, reformatted for LLM completion
          "text": "string"            // Element text, extracted and unprocessed
        }
      ],
      "height": number,       // Pixel height of the page image
      "pageNumber": number,   // Number of the page, starting at 1
      "pageUrl": "string",    // Hosted URL for the page image
      "width": number         // Pixel width of the page image
    }
  ],
  "sourceUrl": "string"       // Hosted URL for your document
}
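As a practical example, here is a minimal sketch that walks this structure and collects LLM-ready text for each chunk. The field handling follows the schema above; the fallback order and deduplication strategy are our choices, not part of the API:

# walking the X-Ray output and collecting LLM-ready text per chunk
def collect_chunk_text(xray_data):
    seen_chunk_ids = set()
    chunk_texts = []
    for page in xray_data.get("documentPages", []):
        for chunk in page.get("chunks", []):
            # chunks can span multiple pages, so dedupe on the unique chunk ID
            chunk_id = chunk.get("chunk")
            if chunk_id in seen_chunk_ids:
                continue
            seen_chunk_ids.add(chunk_id)
            # prefer the LLM-ready rewrite, falling back to raw extracted text
            text = chunk.get("suggestedText") or chunk.get("text", "")
            # tables and figures also carry narrative descriptions
            for narrative in chunk.get("narrative") or []:
                text += "\n" + narrative
            chunk_texts.append(text)
    return chunk_texts

# data is the X-Ray JSON loaded in the previous step
for chunk_text in collect_chunk_text(data):
    print(chunk_text)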