Web Scrape

Last updated 4 days ago

Using Web Scrape in Knowledge Bases

Web Scrape imports content directly from websites into your Knowledge Base, so you can reuse existing pages on your website for use cases like internal linking. You can scrape a single page or multiple pages via a sitemap, depending on how much content you need.

Single Page

To scrape content from a single webpage:

  • Click "Add Data" and select "Web Scrape" from the dropdown menu

  • Choose the "Single Page" option

  • Click "Start Scraping" to begin the process

  • AirOps will extract and import the content into your Knowledge Base, making it searchable

Multiple Pages via Sitemap

To scrape content from multiple webpages using a sitemap:

  • Click "Add Data" and select "Web Scrape" from the dropdown menu

  • Choose the "Multiple Pages (via Sitemap)" option

  • Enter the sitemap URL (typically found at domain.com/sitemap.xml)

  • Optionally, specify URL patterns to include or exclude specific pages (e.g., include "/blog/" or exclude "/support/")

  • Toggle "Scrape Metadata Only" if you only need page titles and descriptions rather than full content

  • Click "Start Scraping" to begin the process and import all matching pages
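To preview which pages a set of include and exclude patterns would select, the filtering step can be sketched in Python. This is only an illustration: it assumes patterns are matched as plain substrings of each URL, which may differ from how AirOps evaluates them, and the function names and sample sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def filter_urls(urls, include=(), exclude=()):
    """Keep URLs that contain any include pattern (or all URLs if none is
    given) and that contain no exclude pattern.

    Assumption: simple substring matching, chosen for illustration only.
    """
    kept = []
    for url in urls:
        if include and not any(pattern in url for pattern in include):
            continue
        if any(pattern in url for pattern in exclude):
            continue
        kept.append(url)
    return kept


# Hypothetical sitemap content, inlined so the example needs no network access.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.domain.com/blog/article-title</loc></url>
  <url><loc>https://www.domain.com/support/reset-password</loc></url>
  <url><loc>https://www.domain.com/blog/second-post</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in filter_urls(sitemap_urls(SAMPLE), include=["/blog/"]):
        print(url)
```

Running a check like this against your own sitemap before scraping can help confirm that the patterns match the pages you expect.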

Once a sitemap has been added to your Knowledge Base, you can keep the content up to date by clicking the gear icon next to "1 Sync Scheduled" and selecting either:

  • "Sync Data" to manually trigger an immediate sync and fetch the latest content

  • "Schedule Data Sync" to set up an automated recurring sync on your preferred schedule (e.g., weekly at 9:30 AM on Mondays), ensuring your Knowledge Base stays current without manual intervention
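As an illustration of what a weekly schedule such as "9:30 AM on Mondays" means in practice, the sketch below computes the next run time from a given moment. It is not how AirOps schedules syncs internally; the function name and parameters are hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday=0, hour=9, minute=30):
    """Return the first datetime strictly after `now` that falls on
    `weekday` at hour:minute.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested day of the week (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed, the next run is one week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_weekly_run(datetime.now()))
```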

For the "Single Page" option, enter the complete URL of the webpage you want to scrape (e.g., www.domain.com/blog/article-title) before clicking "Start Scraping".
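If you collect page URLs programmatically before entering them, a small check can catch incomplete values. The sketch below is a hypothetical helper, not part of AirOps: it assumes a missing scheme should default to https, and this page does not state whether AirOps requires the scheme.

```python
from urllib.parse import urlsplit


def complete_url(raw, default_scheme="https"):
    """Return `raw` as a full http(s) URL, or raise ValueError.

    Assumption: a value without a scheme (e.g. "www.domain.com/blog/...")
    is meant to be reached over `default_scheme`.
    """
    raw = raw.strip()
    if "://" not in raw:
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    # A usable web URL needs an http(s) scheme and a host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a complete web URL: {raw!r}")
    return parts.geturl()


if __name__ == "__main__":
    print(complete_url("www.domain.com/blog/article-title"))
```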