Debugging GroundX On-Prem
This page describes the general data-flow model of GroundX On-Prem and some key approaches to debugging your GroundX On-Prem deployment.
Observability
We recommend installing a metrics server, for example with the following command:
Alternatively, we recommend installing a monitoring tool such as Prometheus. Either option lets you monitor CPU and memory usage on a per-pod and per-node basis, so you can profile failures caused by inadequate resources.
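Once a metrics server is running, `kubectl top pods` prints a table with the columns NAME, CPU(cores), and MEMORY(bytes). As a minimal sketch of how that output could be screened for resource pressure, the helper below parses such a table and lists pods at or above thresholds you choose. The function names and thresholds are illustrative assumptions and are not part of GroundX. The sketch also assumes the default single-namespace output with a header row.

```python
# Binary suffixes that kubectl commonly uses for memory quantities.
_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_bytes(value: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain byte count with no suffix


def pods_over_threshold(top_output: str, cpu_millicores: int, memory_bytes: int) -> list[str]:
    """Return names of pods whose CPU or memory usage meets either threshold.

    Assumes `top_output` is the text of `kubectl top pods`, header included.
    """
    flagged = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, memory = line.split()[:3]
        if (parse_cpu_millicores(cpu) >= cpu_millicores
                or parse_memory_bytes(memory) >= memory_bytes):
            flagged.append(name)
    return flagged
```

The text passed in could come from running `kubectl top pods` in the namespace of your deployment. Comparing the flagged pods against the resource limits you configured is one way to spot pods that are likely to be throttled or evicted.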
Profiling Ingest Flow
When uploading a document to GroundX On-Prem, the document data flows through the following pods before being uploaded.
Pods communicate through Kafka topics, which are specified here. Below is the same flow between pods, annotated with the Kafka topics that connect them.
When debugging, it’s often best to start with a particular documentID. When calling the ingest endpoint, for instance, you will get a processId, which can be used to retrieve documentIDs with the get_processing_status_by_id endpoint. You can then read the logs along the chain of pods and Kafka topics in the ingest pipeline to isolate a processing issue to a particular point in the pipeline. Combined with resource metrics, this is enough to profile most ingestion issues. Typically, GroundX On-Prem fails because of insufficient resource allocation within the ingest pipeline.
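As a rough sketch of the log-reading step, the helper below takes log text you have already collected from each pod (for example with `kubectl logs`) and reports which pods mention a given documentID, in pipeline order. The function names are hypothetical, and the pod order is supplied by you. The sketch assumes only that the documentID appears verbatim in the relevant log lines, which you should confirm against your own logs.

```python
from typing import Optional


def trace_document(logs_by_pod: dict[str, str], document_id: str,
                   pod_order: list[str]) -> list[tuple[str, list[str]]]:
    """Return (pod, matching_lines) pairs, in pipeline order, for every pod
    whose logs mention `document_id`."""
    trace = []
    for pod in pod_order:
        lines = [line for line in logs_by_pod.get(pod, "").splitlines()
                 if document_id in line]
        if lines:
            trace.append((pod, lines))
    return trace


def last_seen_pod(logs_by_pod: dict[str, str], document_id: str,
                  pod_order: list[str]) -> Optional[str]:
    """Return the last pod in pipeline order that logged `document_id`,
    or None if no pod mentions it."""
    trace = trace_document(logs_by_pod, document_id, pod_order)
    return trace[-1][0] if trace else None
```

The pod returned by `last_seen_pod` is a reasonable place to start reading logs and checking resource metrics, since the document was handled there but was not seen by any later pod.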
Profiling Data
GroundX On-Prem includes a MySQL database, which can be accessed by running:
This database contains the processor_relationships table, which shows the processing status of a particular document. For instance:
The processor_id field is an auto-incremented value, so it may be inconsistent in certain edge cases, but the vast majority of the time:
results in:
- 3 is usually the layout pods. If it is complete, the file made it back to layout-webhook.
- 4 is usually the mapping step in the pre-process pod. If it is complete, the file made it to summary-client.
- 8 is usually the document re-writer (summary-client).
These values can be useful for profiling how a document traverses the various pods.
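The typical processor_id values listed above can be turned into a small lookup for interpreting query results. The sketch below is illustrative only. It assumes that each row has been reduced to a (processor_id, status) pair, that a finished step carries the status string "complete", and that a higher processor_id corresponds to a later step. None of these is confirmed by the table schema, so adjust them to match what your database returns.

```python
from typing import Iterable, Optional

# Typical meanings taken from the list above. processor_id is auto-incremented,
# so verify these values against your own deployment before relying on them.
TYPICAL_STAGES = {
    3: "layout",
    4: "pre-process (mapping)",
    8: "summary-client (document re-writer)",
}


def furthest_completed_stage(rows: Iterable[tuple[int, str]],
                             complete_status: str = "complete") -> Optional[str]:
    """Return the name of the latest known stage marked complete, or None.

    `rows` holds (processor_id, status) pairs for one document. Both the pair
    layout and the "complete" status string are assumptions of this sketch.
    """
    completed = [pid for pid, status in rows
                 if status == complete_status and pid in TYPICAL_STAGES]
    if not completed:
        return None
    return TYPICAL_STAGES[max(completed)]
```

For a document whose rows show steps 3 and 4 complete but step 8 still pending, the function returns the pre-process mapping stage, which suggests looking at summary-client next.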