
Connect DataSource

This guide explains how to add a data source, deploy it, and monitor its status from a project's dashboard in CrawlDesk. After selecting a project, you can configure data sources to define which content the AI will process. Ensure you have admin access to the project before proceeding.

Prerequisites:

  • Active CrawlDesk account with access to a project.
  • Browser access to the project dashboard.
  • Valid data source details (e.g., website URL or file access permissions).
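Before adding a website as a data source, it helps to confirm the URL is actually reachable over the public internet. The sketch below is ours, not part of CrawlDesk; it sends an HTTP `HEAD` request and reports whether the page looks crawlable (the function names and status mapping are assumptions for illustration):

```python
# Pre-flight check: is a docs URL publicly reachable before you
# add it as a Website data source? (Illustrative helper, not a
# CrawlDesk tool.)
import urllib.error
import urllib.request


def classify_status(code: int) -> str:
    """Map an HTTP status code to a rough crawl-readiness verdict."""
    if 200 <= code < 300:
        return "publicly reachable"
    if code in (401, 403):
        return "requires authentication"
    return "not crawlable as-is"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # HTTP errors (4xx/5xx) still carry a status code we can classify.
        return classify_status(err.code)


if __name__ == "__main__":
    print(check_url("https://example.com"))
```

A `401` or `403` verdict suggests the crawler will need authentication to reach the content, which is worth resolving before deployment.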
info

While we provide self-serve options to connect data sources, these are best suited for testing and evaluation.

For production use—especially when connecting developer documentation to ASK AI search—we strongly recommend submitting a support request to the CrawlDesk team.

Our team will handle the crawling process to ensure your documentation is indexed accurately, prevent crawling issues, and make sure content is injected correctly and fine-tuned.

Data Source Management

Follow these steps to add a new data source to a CrawlDesk project. This process configures the AI to crawl and index your content.

  1. Access the Project Dashboard
    Log in to CrawlDesk and navigate to https://app.crawldesk.com/dashboard/. Click on the desired project to open its dashboard.

  2. Initiate Data Source Creation
    In the project dashboard, locate and click the Add Data Source button (typically found in the data sources section).

  3. Complete the Setup Wizard
    The setup wizard, titled "Connect Your Data Source," will guide you through the following steps:

    • Choose Data Source Type: Select the type of data source from the available options:
      • Website
      • PDF
      • Google Doc
      • Confluence
      • Notion
      • Google Drive
        Click Next to proceed.
    • Enter Details: Provide the required information:
      • Name*: Enter a descriptive name for the data source (e.g., "Developer Docs").
      • Website URL* (for Website type): Specify the URL to crawl (e.g., https://example.com).
      • Max Pages (for Website type): Set a limit for the number of pages to crawl (e.g., 10).
        Use the Back button to revise or Next to continue.
    • Review & Deploy: The wizard validates your input with the following checks:
      • Validating Source
      • Checking Crawling Service Health
      • Adding Data to Queue
        Once validated, the system confirms with "Deployment Started" and a message: "Your data source is now queued. We will notify you when deployment is complete." Click View Data Sources to return to the list.
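The wizard's "Validating Source" step expects a well-formed, crawlable URL. You can approximate part of that check locally before submitting. In this sketch (our own helpers, not CrawlDesk's validation logic), one function verifies the URL is an absolute `http(s)` address, and another consults the site's `robots.txt` to confirm crawling is permitted:

```python
# Local sanity checks for a Website data source URL.
# validate_source_url and robots_allows are illustrative names,
# not part of the CrawlDesk product.
from urllib import robotparser
from urllib.parse import urlparse


def validate_source_url(url: str) -> bool:
    """The wizard expects an absolute http(s) URL with a hostname;
    reject anything else before filling in the form."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def robots_allows(url: str, user_agent: str = "*") -> bool:
    """Fetch the site's robots.txt (network call) and check whether
    the given URL may be crawled by the given user agent."""
    parts = urlparse(url)
    robots = robotparser.RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com"
    if validate_source_url(url):
        print("crawl allowed:", robots_allows(url))
```

If `robots.txt` disallows the page, the crawl may skip it or fail; fixing the site's robots rules (or using a supported authenticated setup) before deployment avoids a wasted run.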
tip
  • Ensure the URL is publicly accessible or properly authenticated for crawling.
  • Monitor deployment progress in the Data Source List after initiating.
  • Check the "Failed" count for any issues and review logs for details.