Connect a Data Source
This guide details how to add a data source, deploy it, and monitor its status within a project's dashboard in CrawlDesk, accessible at https://app.crawldesk.com/dashboard/. After selecting a project, you can configure data sources to define what content the AI will process. Ensure you have admin access to the project before proceeding.
Prerequisites:
- Active CrawlDesk account with access to a project.
- Browser access to the project dashboard.
- Valid data source details (e.g., website URL or file access permissions).
While we provide self-serve options to connect data sources, these are best suited for testing and evaluation.
For production use—especially when connecting developer documentation to ASK AI search—we strongly recommend submitting a support request to the CrawlDesk team.
Our team will handle the crawling process to ensure your documentation is indexed accurately, prevent crawling issues, and guarantee proper content injection with fine-tuning.
Data Source Management
- Add a Data Source
- Monitor Data Source
Follow these steps to add a new data source to a CrawlDesk project. This process configures the AI to crawl and index your content.
1. Access the Project Dashboard
Log in to CrawlDesk and navigate to https://app.crawldesk.com/dashboard/, then click the desired project to open its dashboard.
2. Initiate Data Source Creation
In the project dashboard, locate and click the Add Data Source button (typically found in the data sources section).
3. Complete the Setup Wizard
The setup wizard, titled "Connect Your Data Source," guides you through the following steps:
- Choose Data Source Type: Select the type of data source from the available options:
- Website
- Google Doc
- Confluence
- Notion
- Google Drive
Click Next to proceed.
- Enter Details: Provide the required information:
- Name*: Enter a descriptive name for the data source (e.g., "Developer Docs").
- Website URL* (for Website type): Specify the URL to crawl (e.g., https://example.com).
- Max Pages (for Website type): Set a limit for the number of pages to crawl (e.g., 10).
Use the Back button to revise or Next to continue.
- Review & Deploy: The wizard validates your input with the following checks:
- Validating Source
- Checking Crawling Service Health
- Adding Data to Queue
Once validated, the system confirms with "Deployment Started" and a message: "Your data source is now queued. We will notify you when deployment is complete." Click View Data Sources to return to the list.
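Conceptually, the Review & Deploy step behaves like a short pipeline of gate checks run in order; a hypothetical sketch (the function and field names are illustrative assumptions, not CrawlDesk's actual API):

```python
def deploy_data_source(source: dict) -> str:
    """Illustrative model of the wizard's Review & Deploy checks."""
    checks = [
        # Each entry mirrors one of the wizard's validation stages.
        ("Validating Source", lambda s: bool(s.get("name")) and bool(s.get("url"))),
        ("Checking Crawling Service Health", lambda s: True),  # assumed healthy here
        ("Adding Data to Queue", lambda s: True),              # assumed to succeed
    ]
    for label, check in checks:
        if not check(source):
            return f"Failed: {label}"
    return "Deployment Started"

print(deploy_data_source({"name": "Developer Docs", "url": "https://example.com"}))
# → Deployment Started
```

The point of the model: a failure at any stage stops the pipeline, which is why a single "Failed" status in the list maps to one of these named checks.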
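The Validating Source stage presumably rejects malformed Website URLs; you can run a rough pre-check yourself with Python's standard library (the rules below are an assumption, not CrawlDesk's validation logic):

```python
from urllib.parse import urlparse

def looks_crawlable(url: str) -> bool:
    """Rough pre-check for a Website data source URL (illustrative only)."""
    parsed = urlparse(url)
    # Require an absolute http(s) URL with a hostname.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(looks_crawlable("https://example.com"))  # True
print(looks_crawlable("example.com"))          # False: no scheme, so not absolute
```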
Once a data source is added, CrawlDesk initiates crawling and indexing. You can monitor its status and progress in the Data Source List.
1. Access the Data Source List
From the project dashboard at https://app.crawldesk.com/dashboard/, navigate to the Data Sources section. This displays a table titled "Data Sources: Manage and monitor your connected data sources."
2. Review Summary Metrics
The list provides an overview of all data sources:
- Total Sources: Number of data sources (e.g., 1).
- Active Sources: Number of fully deployed sources (e.g., 0).
- Deploying: Number of sources in progress (e.g., 1).
- Failed: Number of sources with errors (e.g., 0).
Use the search bar to filter by name, URL, or namespace, and apply filters (e.g., "All").
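The summary counts are just an aggregation over per-source statuses; a minimal sketch of that roll-up (the record shape is an assumption):

```python
from collections import Counter

# Hypothetical snapshot of the Data Source List.
sources = [
    {"name": "Developer Docs", "status": "Deploying"},
]

by_status = Counter(s["status"] for s in sources)
summary = {
    "Total Sources": len(sources),
    "Active Sources": by_status.get("Active", 0),
    "Deploying": by_status.get("Deploying", 0),
    "Failed": by_status.get("Failed", 0),
}
print(summary)
# → {'Total Sources': 1, 'Active Sources': 0, 'Deploying': 1, 'Failed': 0}
```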
3. Check Individual Data Source Status
Each data source entry includes:
- Name: E.g., "Developer Docs."
- Status: E.g., Inactive, Deploying, or Active.
- URL: E.g., https://docusaurus.io/docs/next/markdown-features/admonitions.
- Deployment Progress: Percentage complete (e.g., 0%).
- Date: Creation or last-updated timestamp (e.g., Sep 27, 2025).
4. View Detailed Progress
Click a data source (e.g., "Developer Docs") to open its detailed view, which includes:
- Overview:
  - Processing Overview: Progress bar (e.g., 0% Complete).
  - Pages: Number processed (e.g., 0/0).
  - Sections: Structured content count (e.g., 0).
  - Chunks: Search-ready content units (e.g., 0).
- Logs: View crawling logs for troubleshooting.
- URLs: List of crawled URLs (e.g., 6).
- Settings: Adjust the data source's configuration.
- Metadata: Unique ID (e.g., feb8e785-6241-4b19-92f3-4f33bcd04db0), plus creation and last-updated dates (e.g., 27/09/2025, 17:52:52).
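The Processing Overview percentage is effectively pages processed over pages discovered; a small sketch of that arithmetic (the zero-page guard for the 0/0 case is an assumption):

```python
def percent_complete(pages_done: int, pages_total: int) -> int:
    """Progress as a whole-number percentage; 0% while no pages are discovered."""
    if pages_total <= 0:
        return 0  # matches the "0/0 pages, 0% Complete" state shown in the UI
    return round(100 * pages_done / pages_total)

print(percent_complete(0, 0))   # → 0
print(percent_complete(3, 10))  # → 30
```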
Tips:
- Ensure the URL is publicly accessible or properly authenticated for crawling.
- After initiating a deployment, monitor its progress in the Data Source List.
- Check the "Failed" count for any issues and review the logs for details.
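To verify that a URL is publicly reachable without credentials, you can issue a plain GET from Python's standard library (a sketch only; CrawlDesk's crawler will also honor things like robots.txt):

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def is_publicly_accessible(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers a credential-less GET with a 2xx/3xx status."""
    req = Request(url, headers={"User-Agent": "accessibility-check/1.0"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (HTTPError, URLError, TimeoutError):
        return False

print(is_publicly_accessible("http://nonexistent.invalid"))  # False: DNS cannot resolve
```

If this returns False for a page you expect to be crawled, fix access (or submit a support request) before re-deploying the data source.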