# Web Scrape

## Using Web Scrape in Knowledge Bases

Web Scrape allows you to import content directly from websites into your Knowledge Base, making it easy to leverage existing pages on your website for use cases like Internal Linking. Choose between scraping individual pages or multiple pages via a sitemap based on your specific needs.

<figure><img src="/files/jNptTEzOpFMeLniWPU2w" alt=""><figcaption></figcaption></figure>

### Single Page

To scrape content from a single webpage:

* Click "Add Data" and select "Web Scrape" from the dropdown menu
* Choose the "Single Page" option
* Enter the complete URL of the webpage you want to scrape (e.g., `https://www.domain.com/blog/article-title`)
* Click "Start Scraping" to begin the process
* AirOps will extract and import the content into your Knowledge Base, making it searchable

<figure><img src="/files/r4KeU6Go3BQ0qEUOpWIo" alt=""><figcaption></figcaption></figure>
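
Because the field expects a complete URL, it can help to check a link before pasting it. The sketch below is a hypothetical helper (`normalize_page_url` is not part of AirOps) built on Python's standard `urllib.parse`. It assumes that pages without an explicit scheme are served over HTTPS, and it rejects anything that is not an `http` or `https` URL with a host.

```python
from urllib.parse import urlparse


def normalize_page_url(raw: str) -> str:
    """Return a complete http(s) URL, or raise ValueError.

    Hypothetical helper for illustration only; AirOps does not expose it.
    """
    candidate = raw.strip()
    if "://" not in candidate:
        # Assumption: a URL pasted without a scheme is served over HTTPS.
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    # A complete page URL needs a web scheme and a host (netloc).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a complete web page URL: {raw!r}")
    return candidate


print(normalize_page_url("www.domain.com/blog/article-title"))
# prints https://www.domain.com/blog/article-title
```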

### Multiple Pages via Sitemap

To scrape content from multiple webpages using a Sitemap:

* Click "Add Data" and select "Web Scrape" from the dropdown menu
* Choose the "Multiple Pages (via Sitemap)" option
* Enter the sitemap URL (typically found at domain.com/sitemap.xml)
* Optionally, specify URL patterns to include or exclude specific pages (e.g., include "/blog/" or exclude "/support/")
* Toggle "Scrape Metadata Only" if you only need page titles and descriptions rather than full content
* Click "Start Scraping" to begin the process and import all matching pages

<figure><img src="/files/4Gef5mOrjbTp3osRkydB" alt=""><figcaption></figcaption></figure>
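
If you want to preview which pages an include or exclude pattern would keep before you start a scrape, you can run the sitemap through a small script. This is a minimal sketch, not AirOps's matching logic. It assumes that patterns behave as plain substring matches, that an exclude match overrides an include match, and that the sitemap is a single `<urlset>` file (a `<sitemapindex>` that points to other sitemaps is not expanded). `preview_matching_urls` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def preview_matching_urls(sitemap_xml, include=None, exclude=None):
    """List <loc> URLs from a <urlset> sitemap that pass the filters.

    Hypothetical illustration: patterns are plain substrings, and an
    exclude match always wins over an include match.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    if include:
        # Keep only URLs that contain at least one include pattern.
        urls = [u for u in urls if any(p in u for p in include)]
    if exclude:
        # Drop any URL that contains an exclude pattern.
        urls = [u for u in urls if not any(p in u for p in exclude)]
    return urls


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/blog/first-post</loc></url>
  <url><loc>https://domain.com/support/faq</loc></url>
  <url><loc>https://domain.com/pricing</loc></url>
</urlset>"""

print(preview_matching_urls(SAMPLE_SITEMAP, include=["/blog/"], exclude=["/support/"]))
# prints ['https://domain.com/blog/first-post']
```

To try it against a live site, pass the bytes returned by `urllib.request.urlopen("https://domain.com/sitemap.xml").read()`; `ET.fromstring` accepts bytes as well as strings.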

Once a sitemap has been added to your Knowledge Base, you can keep its content up to date by clicking the gear icon next to "1 Sync Scheduled" and selecting either:

* "Sync Data" to manually trigger an immediate sync and fetch the latest content
* "Schedule Data Sync" to set up an automated recurring sync on your preferred cadence. Sitemap connectors support **Hour**, **Day**, **Week**, and **Month** schedules. Choose Month for large or slowly changing sites where weekly resyncs would be more frequent than the content requires.

{% @arcade/embed url="<https://app.arcade.software/share/sqgFTsbvoWnJJgwiRYIU>" flowId="sqgFTsbvoWnJJgwiRYIU" %}
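
Choosing between **Week** and **Month** is easier if you know how often the site actually changes. One rough signal is the optional `<lastmod>` date in each sitemap entry. The sketch below counts entries modified within a recent window. `count_recent_changes` is a hypothetical helper, and its result is only as reliable as the site's `<lastmod>` values, which many sites omit or do not keep current.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_recent_changes(sitemap_xml, today, window_days):
    """Return (recent, unknown) counts for a <urlset> sitemap.

    recent:  entries whose <lastmod> date is no more than window_days before today.
    unknown: entries with a missing or unparseable <lastmod>, which is
             optional in the sitemap protocol.
    """
    root = ET.fromstring(sitemap_xml)
    recent = unknown = 0
    for url in root.findall("sm:url", SITEMAP_NS):
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        try:
            # W3C datetimes start with YYYY-MM-DD; ignore any time part.
            changed = date.fromisoformat(lastmod[:10])
        except ValueError:
            unknown += 1
            continue
        if (today - changed).days <= window_days:
            recent += 1
    return recent, unknown


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/blog/new</loc><lastmod>2024-06-25</lastmod></url>
  <url><loc>https://domain.com/about</loc><lastmod>2023-01-05T09:00:00+00:00</lastmod></url>
  <url><loc>https://domain.com/pricing</loc></url>
</urlset>"""

print(count_recent_changes(SAMPLE_SITEMAP, today=date(2024, 6, 30), window_days=7))
# prints (1, 1)
```

If only a handful of pages change in a typical week, a monthly sync may be enough; if many do, a daily or weekly cadence keeps the Knowledge Base closer to the live site.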


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.airops.com/context/memory-stores/add-data/web-scrape.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
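
Because the question is free-form natural language, it has to be percent-encoded before it goes into the `ask` parameter. A minimal sketch using Python's standard `urllib.parse.urlencode` (with `quote` so that spaces become `%20`); `build_ask_url` is a hypothetical helper, and the page URL is the one from the GET example above.

```python
from urllib.parse import quote, urlencode

# Page URL taken from the GET example above.
PAGE_URL = "https://docs.airops.com/context/memory-stores/add-data/web-scrape.md"


def build_ask_url(question: str) -> str:
    """Return the GET URL with the question percent-encoded into `ask`."""
    # quote_via=quote encodes spaces as %20 and "?" as %3F.
    return f"{PAGE_URL}?{urlencode({'ask': question}, quote_via=quote)}"


print(build_ask_url("How often can a sitemap sync run?"))
# prints https://docs.airops.com/context/memory-stores/add-data/web-scrape.md?ask=How%20often%20can%20a%20sitemap%20sync%20run%3F
```

Sending it is then an ordinary GET, for example `urllib.request.urlopen(build_ask_url(question)).read().decode("utf-8")`. This page describes what the response contains but not its exact format, so treat the body as plain text.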
