Web Scrape

Using Web Scrape in Knowledge Bases

Web Scrape imports content directly from websites into your Knowledge Base, so you can reuse existing pages on your site for use cases such as Internal Linking. You can scrape a single page or multiple pages at once via a sitemap.

Single Page

To scrape content from a single webpage:

  • Click "Add Data" and select "Web Scrape" from the dropdown menu

  • Choose the "Single Page" option

  • Enter the complete URL of the webpage you want to scrape, including the scheme (e.g., https://www.domain.com/blog/article-title)

  • Click "Start Scraping" to begin the process

  • AirOps will extract and import the content into your Knowledge Base, making it searchable
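If you collect page URLs in a script before pasting them into the "Single Page" form, a small check can catch incomplete URLs early. The following is a minimal sketch, not part of AirOps: the helper name normalize_url and the choice to default to https are assumptions made for illustration, and whether AirOps itself accepts a URL without a scheme is not stated here.

```python
from urllib.parse import urlparse


def normalize_url(raw, default_scheme="https"):
    """Return a complete http(s) URL, or raise ValueError.

    Hypothetical helper: adds a scheme when one is missing, then
    checks that the result has both a web scheme and a host.
    """
    raw = raw.strip()
    if "://" not in raw:
        # Assumption: a URL pasted without a scheme is meant to be https.
        raw = f"{default_scheme}://{raw}"
    parts = urlparse(raw)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a complete web URL: {raw!r}")
    return parts.geturl()


print(normalize_url("www.domain.com/blog/article-title"))
```

The function only validates the shape of the URL; it does not check that the page exists or can be scraped.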

Multiple Pages via Sitemap

To scrape content from multiple webpages using a sitemap:

  • Click "Add Data" and select "Web Scrape" from the dropdown menu

  • Choose the "Multiple Pages (via Sitemap)" option

  • Enter the sitemap URL (typically found at domain.com/sitemap.xml)

  • Optionally, specify URL patterns to include or exclude specific pages (e.g., include "/blog/" or exclude "/support/")

  • Toggle "Scrape Metadata Only" if you only need page titles and descriptions rather than full content

  • Click "Start Scraping" to begin the process and import all matching pages
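Before starting a large scrape, it can help to preview which sitemap entries your include and exclude patterns would select. The sketch below parses a standard sitemap and filters its URLs locally. It assumes simple substring matching, which may differ from how AirOps evaluates patterns; the function names and the sample sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
            if loc.text]


def filter_urls(urls, include=(), exclude=()):
    """Keep URLs containing any include pattern (when given) and no exclude pattern.

    Assumption: patterns are plain substrings such as "/blog/".
    """
    kept = []
    for url in urls:
        if include and not any(p in url for p in include):
            continue
        if any(p in url for p in exclude):
            continue
        kept.append(url)
    return kept


# Hypothetical sitemap content, inlined so the example runs offline.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.domain.com/blog/article-title</loc></url>
  <url><loc>https://www.domain.com/support/faq</loc></url>
  <url><loc>https://www.domain.com/pricing</loc></url>
</urlset>"""

for url in filter_urls(sitemap_urls(SAMPLE), include=["/blog/"]):
    print(url)
```

To run this against a real site, replace SAMPLE with the downloaded contents of your sitemap file. Sitemap index files, which list other sitemaps rather than pages, are not handled separately in this sketch.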

Once a sitemap has been added to your Knowledge Base, you can keep its content up to date by clicking the gear icon next to "1 Sync Scheduled" and selecting either:

  • "Sync Data" to manually trigger an immediate sync and fetch the latest content

  • "Schedule Data Sync" to set up an automated recurring sync on your preferred schedule (e.g., weekly at 9:30 AM on Mondays), ensuring your Knowledge Base stays current without manual intervention
