Connect a Data Source
This guide details how to add a data source, deploy it, and monitor its status within a project's dashboard in CrawlDesk, accessible at https://app.crawldesk.com/dashboard/. After selecting a project, you can configure data sources to define what content the AI will process. Ensure you have admin access to the project before proceeding.
Prerequisites:
- Active CrawlDesk account with access to a project.
- Browser access to the project dashboard.
- Valid data source details (e.g., website URL or file access permissions).
While we provide self-serve options to connect data sources, these are best suited for testing and evaluation.
For production use—especially when connecting developer documentation to ASK AI search—we strongly recommend submitting a support request to the CrawlDesk team.
Our team will handle the crawling process to ensure your documentation is indexed accurately, prevent crawling issues, and guarantee proper content injection with fine-tuning.
Data Source Management
- Add a Data Source
- Monitor Data Source
Follow these steps to add a new data source to a CrawlDesk project. This process configures the AI to crawl and index your content.
1. Access the Project Dashboard
Log in to CrawlDesk and navigate to https://app.crawldesk.com/dashboard/, then click the desired project to open its dashboard.
2. Initiate Data Source Creation
In the project dashboard, locate and click the Add Data Source button (typically found in the data sources section).
3. Complete the Setup Wizard
The setup wizard, titled "Connect Your Data Source," guides you through the following steps:
- Choose Data Source Type: Select the type of data source from the available options:
- Website
- Google Doc
- Confluence
- Notion
- Google Drive
Click Next to proceed.
- Enter Details: Provide the required information:
- Name*: Enter a descriptive name for the data source (e.g., "Developer Docs").
- Website URL* (for Website type): Specify the URL to crawl (e.g., https://example.com).
- Max Pages (for Website type): Set a limit for the number of pages to crawl (e.g., 10).
Use the Back button to revise or Next to continue.
- Review & Deploy: The wizard validates your input with the following checks:
- Validating Source
- Checking Crawling Service Health
- Adding Data to Queue
Once validated, the system confirms with "Deployment Started" and a message: "Your data source is now queued. We will notify you when deployment is complete." Click View Data Sources to return to the list.
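Conceptually, the Review & Deploy step behaves like a short pipeline of gate checks run in order; a hypothetical sketch (the function and field names are illustrative assumptions, not CrawlDesk's actual API):

```python
def deploy_data_source(source: dict) -> str:
    """Illustrative model of the wizard's Review & Deploy checks."""
    checks = [
        # Each entry mirrors one of the wizard's validation stages.
        ("Validating Source", lambda s: bool(s.get("name")) and bool(s.get("url"))),
        ("Checking Crawling Service Health", lambda s: True),  # assumed healthy here
        ("Adding Data to Queue", lambda s: True),              # assumed to succeed
    ]
    for label, check in checks:
        if not check(source):
            return f"Failed: {label}"
    return "Deployment Started"

print(deploy_data_source({"name": "Developer Docs", "url": "https://example.com"}))
# → Deployment Started
```

The point of the model: a failure at any stage stops the pipeline, which is why a single "Failed" status in the list maps to one of these named checks.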
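The Validating Source stage presumably rejects malformed Website URLs; you can run a rough pre-check yourself with Python's standard library (the rules below are an assumption, not CrawlDesk's validation logic):

```python
from urllib.parse import urlparse

def looks_crawlable(url: str) -> bool:
    """Rough pre-check for a Website data source URL (illustrative only)."""
    parsed = urlparse(url)
    # Require an absolute http(s) URL with a hostname.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(looks_crawlable("https://example.com"))  # True
print(looks_crawlable("example.com"))          # False: no scheme, so not absolute
```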
Once a data source is added, CrawlDesk initiates crawling and indexing. You can monitor its status and progress in the Data Source List.
1. Access the Data Source List
From the project dashboard at https://app.crawldesk.com/dashboard/, navigate to the Data Sources section. This displays a table titled "Data Sources: Manage and monitor your connected data sources."
2. Review Summary Metrics
The list provides an overview of all data sources:
- Total Sources: Number of data sources (e.g., 1).
- Active Sources: Number of fully deployed sources (e.g., 0).
- Deploying: Number of sources in progress (e.g., 1).
- Failed: Number of sources with errors (e.g., 0).
Use the search bar to filter by name, URL, or namespace, and apply filters (e.g., "All").
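The summary counts are just an aggregation over per-source statuses; a minimal sketch of that roll-up (the record shape is an assumption):

```python
from collections import Counter

# Hypothetical snapshot of the Data Source List.
sources = [
    {"name": "Developer Docs", "status": "Deploying"},
]

by_status = Counter(s["status"] for s in sources)
summary = {
    "Total Sources": len(sources),
    "Active Sources": by_status.get("Active", 0),
    "Deploying": by_status.get("Deploying", 0),
    "Failed": by_status.get("Failed", 0),
}
print(summary)
# → {'Total Sources': 1, 'Active Sources': 0, 'Deploying': 1, 'Failed': 0}
```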
3. Check Individual Data Source Status
Each data source entry includes:
- Name: E.g., "Developer Docs."
- Status: E.g., Inactive, Deploying, or Active.
- URL: E.g., https://docusaurus.io/docs/next/markdown-features/admonitions.
- Deployment Progress: Percentage complete (e.g., 0%).
- Date: Creation or last-updated timestamp (e.g., Sep 27, 2025).
4. View Detailed Progress
Click a data source (e.g., "Developer Docs") to open its detailed view, which includes:
- Overview:
  - Processing Overview: Progress bar (e.g., 0% Complete).
  - Pages: Number processed (e.g., 0/0).
  - Sections: Structured content count (e.g., 0).
  - Chunks: Search-ready content units (e.g., 0).
- Logs: View crawling logs for troubleshooting.
- URLs: List of crawled URLs (e.g., 6).
- Settings: Adjust the data source's configuration.
- Metadata: Unique ID (e.g., feb8e785-6241-4b19-92f3-4f33bcd04db0), plus creation and last-updated dates (e.g., 27/09/2025, 17:52:52).
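The Processing Overview percentage is effectively pages processed over pages discovered; a small sketch of that arithmetic (the zero-page guard for the 0/0 case is an assumption):

```python
def percent_complete(pages_done: int, pages_total: int) -> int:
    """Progress as a whole-number percentage; 0% while no pages are discovered."""
    if pages_total <= 0:
        return 0  # matches the "0/0 pages, 0% Complete" state shown in the UI
    return round(100 * pages_done / pages_total)

print(percent_complete(0, 0))   # → 0
print(percent_complete(3, 10))  # → 30
```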
Tips:
- Ensure the URL is publicly accessible or properly authenticated for crawling.
- After initiating a deployment, monitor its progress in the Data Source List.
- Check the "Failed" count for any issues and review the logs for details.
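To verify that a URL is publicly reachable without credentials, you can issue a plain GET from Python's standard library (a sketch only; CrawlDesk's crawler will also honor things like robots.txt):

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def is_publicly_accessible(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers a credential-less GET with a 2xx/3xx status."""
    req = Request(url, headers={"User-Agent": "accessibility-check/1.0"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (HTTPError, URLError, TimeoutError):
        return False

print(is_publicly_accessible("http://nonexistent.invalid"))  # False: DNS cannot resolve
```

If this returns False for a page you expect to be crawled, fix access (or submit a support request) before re-deploying the data source.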