Supported Document Types

The currently supported file types are:

  • pdf
  • jpg (or jpeg)
  • png
  • docx
  • pptx
  • xlsx
  • csv
  • tsv
  • json
  • txt
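
As a rough illustration, a client could check a file's extension against the list above before submitting it for ingestion. This is a minimal sketch and not part of GroundX itself: the names SUPPORTED_EXTENSIONS and is_supported are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the list above; "jpeg" is listed as an alias of "jpg".
SUPPORTED_EXTENSIONS = {
    "pdf", "jpg", "jpeg", "png", "docx", "pptx",
    "xlsx", "csv", "tsv", "json", "txt",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension appears in the supported list.

    Hypothetical helper: case-insensitive matching is an assumption,
    not documented GroundX behavior.
    """
    # Path.suffix keeps the leading dot (".pdf"), so strip it and lowercase.
    extension = Path(filename).suffix.lstrip(".").lower()
    return extension in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name, so it cannot tell whether the file's contents actually match its extension.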

Additional Data Sources

The GroundX ingestion pipeline can also crawl websites and ingest their content through the Crawl Website endpoint.

The crawler scrapes page content from the source HTML, so an unusual page structure can sometimes cause it to misread the content.
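
To show why page structure matters when text is scraped from source HTML, here is a small sketch built on Python's standard html.parser module. It is not the GroundX crawler, and its behavior is only an assumption about how a simple scraper might work: NaiveTextExtractor, extract_text, and SAMPLE_PAGE are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects every text node in document order, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the visible text fragments of an HTML string, in source order."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: the article is surrounded by navigation and a footer.
SAMPLE_PAGE = """
<html><body>
  <nav><a href="/">Home</a><a href="/pricing">Pricing</a></nav>
  <article><h1>Release notes</h1><p>Version 2 adds CSV support.</p></article>
  <footer>Copyright Example Corp</footer>
</body></html>
"""

chunks = extract_text(SAMPLE_PAGE)
```

In this sketch the navigation links and the footer end up in the same flat list as the article text, because the extractor has no notion of which elements hold the main content. Pages that wrap their content in unusual markup can confuse a scraper in a similar way.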