Supported Document Types and Ingestion Restrictions
Supported File Types
The currently supported document types are:
- pdf
- docx
- pptx
- xlsx
- csv
- tsv
- json
- txt
- hwp
The currently supported image types are:
- bmp
- gif (not animated)
- heif (or heic)
- ico
- jpg (or jpeg)
- png
- svg
- tiff (or tif)
- webp
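A pre-flight extension check can catch unsupported files before any upload is attempted. The sketch below is a minimal illustration, not part of any SDK: the names `DOCUMENT_TYPES`, `IMAGE_TYPES`, and `classify_file` are hypothetical, and `pdf` is included because it appears in the page-limit section of this page. It only looks at the extension, so it cannot tell whether a gif is animated.

```python
from pathlib import Path

# Extensions copied from the lists above, including the listed aliases
# (heic, jpeg, tif). "pdf" comes from the page-limit section of this page.
DOCUMENT_TYPES = {"pdf", "docx", "pptx", "xlsx", "csv", "tsv", "json", "txt", "hwp"}
IMAGE_TYPES = {
    "bmp", "gif", "heif", "heic", "ico", "jpg", "jpeg",
    "png", "svg", "tiff", "tif", "webp",
}


def classify_file(path: str) -> str:
    """Return "document", "image", or "unsupported" based on the extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in DOCUMENT_TYPES:
        return "document"
    if ext in IMAGE_TYPES:
        return "image"
    return "unsupported"
```

The comparison is case-insensitive, so `report.DOCX` and `report.docx` are treated the same way.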
Restrictions
Maximum File Size
If you are a free trial user, restrictions on document ingestion include:
- Using the Python SDK:
  - 25 MB (for ingest, ingest_directory, or ingest_remote)
  - 8 MB (for ingest_local)
- Using the TypeScript SDK:
  - 25 MB (for hosted files using either ingest or ingest_remote)
  - 8 MB (for local files using either ingest or ingest_local)
- Using APIs:
  - 25 MB (for hosted files using ingest_remote)
  - 8 MB (for local files using ingest_local)
If you are a subscription user, restrictions on document ingestion include:
- Using the Python SDK:
  - 50 MB (for ingest, ingest_directory, or ingest_remote)
  - 8 MB (for ingest_local)
- Using the TypeScript SDK:
  - 50 MB (for hosted files using either ingest or ingest_remote)
  - 8 MB (for local files using either ingest or ingest_local)
- Using APIs:
  - 50 MB (for hosted files using ingest_remote)
  - 8 MB (for local files using ingest_local)
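The size limits above reduce to two inputs: the plan, and whether the file goes through ingest_local. The following sketch encodes that lookup so a client can reject oversized files early. The helper names and plan keys are hypothetical, and it assumes 1 MB means 1,048,576 bytes, which this page does not specify.

```python
MB = 1024 * 1024  # assumption: the page does not say whether MB is 10^6 or 2^20 bytes

# Limit for hosted files and Python SDK uploads, keyed by plan (hypothetical keys).
HOSTED_LIMIT_MB = {"free_trial": 25, "subscription": 50}
# Limit for files sent through ingest_local, identical on both plans.
LOCAL_LIMIT_MB = 8


def max_upload_bytes(plan: str, via_ingest_local: bool) -> int:
    """Return the size limit in bytes for the given plan and ingestion path."""
    if plan not in HOSTED_LIMIT_MB:
        raise ValueError(f"unknown plan: {plan!r}")
    limit_mb = LOCAL_LIMIT_MB if via_ingest_local else HOSTED_LIMIT_MB[plan]
    return limit_mb * MB


def fits_limit(size_bytes: int, plan: str, via_ingest_local: bool) -> bool:
    """Check a file size (for example from os.path.getsize) against the limit."""
    return size_bytes <= max_upload_bytes(plan, via_ingest_local)
```

For example, a 30 MB hosted file would pass for a subscription user but fail for a free trial user, and it would fail on either plan if sent through ingest_local.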
Why the difference?
The Python SDK will automatically upload local files using the EyeLevel.ai file upload endpoint and temporary pre-signed URLs. The API and TypeScript SDK do not include this feature.
Maximum Concurrent Files
A maximum of 50 files can be ingested concurrently.
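One way to stay under the concurrency limit is to split a large file list into groups of at most 50 and submit the next group only after the previous one has finished processing. The helper below is a hypothetical sketch of the splitting step only; how each group is submitted and monitored depends on the SDK or API in use and is not shown.

```python
from typing import Iterator, List, Sequence

MAX_CONCURRENT_FILES = 50  # limit stated above


def batches(paths: Sequence[str], size: int = MAX_CONCURRENT_FILES) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` paths, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

With 120 files, this yields groups of 50, 50, and 20.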
Document Type Restrictions
PDF | PPTX | DOCX | HWP
Maximum Pages
For document types with pages (PDF, PPTX, DOCX, and HWP), the maximum is 750 pages.
CSV | TSV | XLSX
Maximum Words
For document types without pages (CSV, TSV, and XLSX), the maximum is 375,000 words.
Maximum Rows
For document types with rows (CSV, TSV, and XLSX), the maximum is 1,500 rows.
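For CSV and TSV files, both limits can be estimated locally with the standard library before ingesting. The sketch below is an approximation under stated assumptions: a "word" is taken to be a whitespace-separated token within a cell, the header counts as a row, and blank lines are ignored. The pipeline may count differently, and the function names are hypothetical. XLSX files would need a spreadsheet library and are not covered.

```python
import csv
import io
from typing import TextIO, Tuple

MAX_ROWS = 1_500     # row limit stated above
MAX_WORDS = 375_000  # word limit stated above


def count_rows_and_words(stream: TextIO, delimiter: str = ",") -> Tuple[int, int]:
    """Count non-empty rows and whitespace-separated words in delimited text."""
    rows = 0
    words = 0
    for record in csv.reader(stream, delimiter=delimiter):
        if not record:  # csv.reader yields [] for blank lines; skip them
            continue
        rows += 1
        words += sum(len(cell.split()) for cell in record)
    return rows, words


def within_tabular_limits(stream: TextIO, delimiter: str = ",") -> bool:
    """Return True if the stream stays within both the row and word limits."""
    rows, words = count_rows_and_words(stream, delimiter)
    return rows <= MAX_ROWS and words <= MAX_WORDS


sample = io.StringIO("name,note\nada,first programmer\nalan,turing machine\n")
print(count_rows_and_words(sample))  # prints (3, 8)
```

For a file on disk, open it with `open(path, newline="")` and pass `delimiter="\t"` for TSV.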
TXT
Maximum Words
For raw text files, the maximum is 375,000 words.
JSON
Maximum File Size
For JSON files, the maximum file size is 5 MB. This restriction is specific to JSON files and supersedes the file size restrictions described above.
Maximum Levels
For JSON files, any JSON object can have at most 20 levels of nesting. Nesting refers to dictionaries or arrays that contain other dictionaries or arrays.
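Both JSON restrictions can be checked locally before ingestion. The sketch below is one possible reading of the limits, not the pipeline's own logic: it assumes the outermost dictionary or array counts as level 1, that scalars add no level, and that 5 MB means 5 × 1,048,576 bytes. The function names are hypothetical. The depth walk is iterative so it does not depend on Python's recursion limit.

```python
import json
from typing import Any

MAX_JSON_BYTES = 5 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
MAX_JSON_LEVELS = 20              # nesting limit stated above


def nesting_depth(value: Any) -> int:
    """Return the deepest level of nested dicts/lists; a bare scalar has depth 0."""
    deepest = 0
    stack = [(value, 0)]  # (node, depth of its parent container)
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars do not add a level
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


def json_within_limits(raw: bytes) -> bool:
    """Check raw JSON bytes against the size limit and the nesting limit."""
    if len(raw) > MAX_JSON_BYTES:
        return False
    return nesting_depth(json.loads(raw)) <= MAX_JSON_LEVELS
```

Under these assumptions, `{"a": [1, {"b": 2}]}` has a depth of 3: the outer dictionary, the array, and the inner dictionary.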
Visual X-Ray Support
Processed documents of every supported document type include an x-ray analysis that is accessible via API and can be downloaded in the dashboard.
Some document types do not go through the visual layout analysis pipeline and, therefore, are not viewable in the visual x-ray viewer in the dashboard. The document types that ARE NOT viewable in the x-ray viewer in the dashboard are:
- csv
- json
- tsv
- txt
- xlsx
Additional data sources
The GroundX ingestion pipeline can also crawl websites and ingest their content using the Crawl Website endpoint.