Use GroundX when you want a document to come back as JSON your application can
use. For example, a utility statement can return statement.account_number,
statement.total_amount_due, and service.service_address.
This guide uses the GroundX Python SDK to create the workflow, upload a document, and call get_extract to read the extracted JSON.
Create a YAML file. Names such as statement and service become top-level
objects in the returned JSON. Under each one, fields: is the list of values
GroundX should extract.
This YAML tells GroundX to return JSON like this:
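As a sketch of that shape, the Python snippet below builds the nested structure from the three field names mentioned at the start of this guide. The None values are placeholders, not real extraction output, and the value types GroundX actually returns are not shown here.

```python
import json

# Placeholder structure: group names are top-level keys and
# field names sit under them. None stands in for whatever
# GroundX extracts from a real document.
expected_shape = {
    "statement": {
        "account_number": None,
        "total_amount_due": None,
    },
    "service": {
        "service_address": None,
    },
}

print(json.dumps(expected_shape, indent=2))
```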
Keep this file focused on the JSON your application needs. Use names your
application will read, such as statement or service, not names that describe
how extraction runs.
Use prepare_extraction_yaml to check the YAML and produce the setting you pass
as extract when you create the workflow.
You do not need to inspect prepared.workflow_groups in most applications. Pass
it to GroundX as the workflow’s extract setting.
Create the workflow with the settings from the previous step. Then assign the workflow to the bucket where you will upload documents.
Use client.workflows.add_to_account(...) instead when the workflow should be
the account default.
Upload documents to the bucket that has the workflow assigned to it. Use
process_level="full" so GroundX runs the workflow during ingest.
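One way to keep the upload settings in a single place is a small helper that builds the keyword arguments for each document. This is only a sketch: build_document_kwargs is a hypothetical helper, and the keys other than process_level (bucket_id, file_name, file_path, file_type) are assumed to match what the SDK's document type accepts, so check them against the SDK reference before relying on them.

```python
from pathlib import Path


def build_document_kwargs(bucket_id: int, file_path: str) -> dict:
    """Build keyword arguments for one document upload (hypothetical helper)."""
    path = Path(file_path)
    return {
        "bucket_id": bucket_id,  # bucket that has the workflow assigned
        "file_name": path.name,
        "file_path": file_path,
        "file_type": path.suffix.lstrip(".").lower(),
        "process_level": "full",  # so GroundX runs the workflow during ingest
    }


# Placeholder bucket ID and path, for illustration only.
kwargs = build_document_kwargs(1234, "statements/march.pdf")
print(kwargs)
```

The resulting dictionary could then be unpacked into the SDK's document type when calling ingest, assuming the key names above match.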
After ingest completes, find the processed document and request its extracted JSON.
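Because the extracted JSON is only available after ingest completes, a polling loop is one way to wait. The sketch below is independent of the SDK: get_status is a hypothetical callable you supply that returns the current status string, and the status names "complete" and "error" are assumptions to adjust to whatever the SDK reports.

```python
import time
from typing import Callable


def wait_until_complete(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it reports a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("complete", "error"):  # assumed terminal status names
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ingest still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)


# Usage with a stand-in status source instead of a real GroundX call.
fake_statuses = iter(["queued", "processing", "complete"])
final_status = wait_until_complete(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.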
The result uses the same names from statement.yaml.
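Reading those names in application code can be sketched as follows. The result dictionary is a hand-written stand-in with placeholder values, not real GroundX output, and get_field is a hypothetical helper. It assumes the extracted JSON has already been parsed into nested Python dictionaries.

```python
from typing import Any


def get_field(result: dict, dotted_name: str, default: Any = None) -> Any:
    """Look up a value such as 'statement.account_number' in nested dicts."""
    current: Any = result
    for key in dotted_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Stand-in for the extracted JSON; values are placeholders.
result = {
    "statement": {"account_number": "PLACEHOLDER-ACCOUNT", "total_amount_due": None},
    "service": {"service_address": "PLACEHOLDER ADDRESS"},
}

account_number = get_field(result, "statement.account_number")
# A name that is not in the result falls back to the default.
absent = get_field(result, "statement.missing_field", default="not extracted")
print(account_number, absent)
```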
When a value is missing or wrong, change the smallest part of the YAML that explains the miss.
If a single field is wrong, adjust that field's description, identifiers, or instructions.
If statement values are wrong, improve the prompt under statement:.
Then prepare the YAML again, update the GroundX workflow, ingest another document, and read the result again.
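To decide which part of the YAML to revisit, it can help to list the fields that came back empty. This is a minimal sketch, assuming the result is parsed into nested dictionaries and that a missing value shows up as an absent key, None, or an empty string (an assumption about the output, not documented behavior). The find_missing helper and the expected mapping are hypothetical.

```python
from typing import Dict, List


def find_missing(expected: Dict[str, List[str]], result: dict) -> List[str]:
    """Return dotted names of expected fields that are absent or empty."""
    missing = []
    for group, fields in expected.items():
        group_values = result.get(group) or {}
        for field in fields:
            if group_values.get(field) in (None, ""):
                missing.append(f"{group}.{field}")
    return missing


# Names your application expects, mirroring the groups and fields in the YAML.
expected = {
    "statement": ["account_number", "total_amount_due"],
    "service": ["service_address"],
}
# Stand-in result with placeholder values; total_amount_due did not come back.
result = {
    "statement": {"account_number": "PLACEHOLDER-ACCOUNT"},
    "service": {"service_address": "PLACEHOLDER ADDRESS"},
}

print(find_missing(expected, result))  # → ['statement.total_amount_due']
```

Each name in the returned list is a candidate for the smallest change to make before preparing the YAML again.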