crawl_website | Documentation


Upload the content of a publicly accessible website for ingestion into a GroundX bucket. The crawler starts at the specified URL and follows links recursively until it reaches the specified depth or number of pages.

Note 1: This endpoint is currently not supported for on-prem deployments.

Note 2: The `source_url` must include the protocol, `http://` or `https://`.

[Supported Document Types and Ingest Capacities](https://docs.eyelevel.ai/documentation/fundamentals/document-types-and-ingest-capacities)
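Because `source_url` must carry its protocol, it can help to check URLs on the client side before queuing a crawl. The following is a minimal sketch using only the Python standard library; the helper name `has_required_protocol` is hypothetical and not part of the GroundX SDK, and the server may apply stricter validation than this check does.

```python
from urllib.parse import urlparse


def has_required_protocol(source_url: str) -> bool:
    """Return True if the URL uses http:// or https:// and names a host.

    Hypothetical client-side check; not part of the GroundX SDK.
    """
    parsed = urlparse(source_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://my.website.com", "my.website.com"):
    status = "ok" if has_required_protocol(candidate) else "missing http:// or https://"
    print(f"{candidate}: {status}")
```

A URL such as `my.website.com` fails this check because it has no scheme, which is the case Note 2 warns about.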

Authentication

X-API-Key string

API Key authentication via header

Request

This endpoint expects an object.

websites list of objects Required

callbackUrl string Optional format: "uri"

The URL that will receive processing event updates.

callbackData string Optional

A string that is returned, along with processing event updates, to the callback URL.
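For readers calling the endpoint without the SDK, the sketch below shows how the `X-API-Key` header and the `websites`, `callbackUrl`, and `callbackData` fields described above might be assembled into a request. Several things here are assumptions: the endpoint URL is a placeholder (this section does not state the real one), the helper `build_crawl_request` is hypothetical, and the `bucketId` and `sourceUrl` keys inside each website object are guessed by analogy with the camelCase `callbackUrl`. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not given in this section.
CRAWL_ENDPOINT = "https://api.example.com/crawl-website"


def build_crawl_request(api_key, websites, callback_url=None, callback_data=None):
    """Build (but do not send) a POST request for the crawl endpoint."""
    body = {"websites": websites}
    # Both callback fields are optional, so include them only when provided.
    if callback_url is not None:
        body["callbackUrl"] = callback_url
    if callback_data is not None:
        body["callbackData"] = callback_data
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_crawl_request(
    api_key="YOUR_API_KEY_HERE",
    # Assumed key names for the website object; check the API schema.
    websites=[{"bucketId": 1234, "sourceUrl": "https://my.website.com"}],
    callback_url="https://my.website.com/hooks/crawl",
    callback_data="crawl-job-1",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing a short identifier as `callbackData` is one way to match incoming processing updates to the crawl that triggered them, since the same string is returned to the callback URL.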

Response

Website successfully queued

ingest object

Errors

400

Bad Request Error

401

Unauthorized Error
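A caller working with raw HTTP status codes might translate the two documented errors into actionable messages along these lines. This is a sketch under stated assumptions: the helper `explain_crawl_status` is hypothetical, the suggested causes are plausible guesses rather than documented behavior, and treating any 2xx code as "queued" is an assumption because this section does not state the success code.

```python
def explain_crawl_status(status_code: int) -> str:
    """Map an HTTP status code from the crawl endpoint to a short hint.

    Hypothetical helper; the listed causes are guesses, not documented behavior.
    """
    if 200 <= status_code < 300:
        # Assumption: any 2xx response means the website was queued.
        return "Website successfully queued"
    if status_code == 400:
        return "Bad Request Error: check the request body, for example that source_url includes http:// or https://"
    if status_code == 401:
        return "Unauthorized Error: check the X-API-Key header"
    return f"Unexpected status {status_code}"


for code in (200, 400, 401):
    print(code, explain_crawl_status(code))
```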

from groundx import GroundX, WebsiteSource

# Create a client authenticated with your API key.
client = GroundX(
    api_key="YOUR_API_KEY_HERE",
)

# Queue a crawl of the website for ingestion into the target bucket.
client.documents.crawl_website(
    websites=[
        WebsiteSource(
            bucket_id=1234,
            source_url="https://my.website.com",  # must include http:// or https://
            cap=10,
            depth=2,
            search_data={
                "key": "value"
            },
        )
    ],
)