# CloudFront Agent Analytics

This guide walks you through configuring Amazon CloudFront **real-time logs** to stream into AirOps via **Amazon Kinesis Data Firehose**. Once connected, AirOps will automatically analyze your traffic, classify AI bot visits (ChatGPT, Claude, Perplexity, Google, and more), and surface insights in your Agent Analytics dashboard.

***

### Prerequisites

* An active **AirOps** account with Agent Analytics enabled
* An **AWS account** with access to CloudFront, Kinesis Data Firehose, and IAM
* A **CloudFront distribution** serving your website
* Your **Workspace API key:** available in Settings (left nav bar) > Workspace
* Your **Brand Kit ID:** the last path segment of the URL shown when you navigate to Context (left nav bar) > BrandKits, which has the form `https://app.airops.com/<YOUR_WORKSPACE_SLUG>/data/brand_kits/<YOUR_BRAND_KIT_ID>`
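
If you prefer to copy the Brand Kit ID programmatically rather than by eye, a minimal sketch like the following pulls it out of the URL. The function name `extract_brand_kit_id` and the example slug and ID are hypothetical; the only thing taken from this guide is the URL shape shown above.

```python
from urllib.parse import urlparse


def extract_brand_kit_id(brand_kit_url: str) -> str:
    """Return the path segment that follows 'brand_kits' in a Brand Kit URL."""
    segments = [s for s in urlparse(brand_kit_url).path.split("/") if s]
    try:
        return segments[segments.index("brand_kits") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No Brand Kit ID found in URL: {brand_kit_url}") from None


# Hypothetical example: the workspace slug "acme" and ID "12345" are made up.
print(extract_brand_kit_id("https://app.airops.com/acme/data/brand_kits/12345"))
```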

***

### Step 1: Create a Kinesis Data Firehose Delivery Stream

1. Open the **Amazon Kinesis** console and navigate to **Data Firehose**.
2. Click **Create Firehose stream**.
3. Configure the source and destination:
   * **Source:** `Direct PUT`
   * **Destination:** `HTTP Endpoint`
4. Name your stream (e.g., `airops-cloudfront-analytics`).

#### Configure the HTTP Endpoint Destination

| Setting               | Value                                                                         |
| --------------------- | ----------------------------------------------------------------------------- |
| **HTTP endpoint URL** | `https://xyz-staging.airops.com/api/v1/cloudfront/<YOUR_BRAND_KIT_ID>/ingest` |
| **Access key**        | Your AirOps Workspace API key                                                 |
| **Content encoding**  | `GZIP`                                                                        |
| **Retry duration**    | `300` seconds (recommended)                                                   |

<figure><img src="https://3762890407-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FX2n5yPRPynbnWuO4SH0M%2Fuploads%2FVj41I6mzmUrWTy5eAlr2%2FScreenshot%202026-04-02%20at%206.52.32%E2%80%AFPM.png?alt=media&#x26;token=33c45fe0-7d89-4008-9a49-c85f7a823972" alt=""><figcaption></figcaption></figure>

#### Configure Buffering Hints

| Setting             | Recommended Value |
| ------------------- | ----------------- |
| **Buffer size**     | `5 MiB`           |
| **Buffer interval** | `60` seconds      |

These settings control how frequently Firehose delivers batches to AirOps. The recommended values keep the delivery frequency low enough to avoid 429 (rate limit) errors from AirOps servers.

#### Configure Backup Settings

* **S3 backup bucket:** Select or create an S3 bucket for failed delivery backups.
* **Backup mode:** `Failed data only` (recommended).

5. Click **Create Firehose stream**.
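
The console steps above could also be scripted. The sketch below builds the equivalent request for boto3's `create_delivery_stream` call, using the values from the tables in this step. It is an illustration under assumptions, not an official AirOps script: the helper names, the endpoint display name `"AirOps"`, and the single IAM role reused for both the stream and the S3 backup are all hypothetical choices, and you must supply your own role and bucket ARNs.

```python
def build_firehose_request(stream_name, endpoint_url, api_key, role_arn, backup_bucket_arn):
    """Build keyword arguments for firehose.create_delivery_stream (Direct PUT -> HTTP endpoint)."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": endpoint_url,   # the HTTP endpoint URL from the table above
                "Name": "AirOps",      # display name only (assumed)
                "AccessKey": api_key,  # your AirOps Workspace API key
            },
            "RequestConfiguration": {"ContentEncoding": "GZIP"},
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
            "RetryOptions": {"DurationInSeconds": 300},
            "RoleARN": role_arn,
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
        },
    }


def create_stream(**kwargs):
    """Create the stream in AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    return boto3.client("firehose").create_delivery_stream(**build_firehose_request(**kwargs))
```

Keeping the request builder separate from the AWS call makes it easy to review the exact settings before anything is created.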

***

### Step 2: Create a CloudFront Real-Time Log Configuration

1. Open the **CloudFront** console.
2. In the left navigation, go to **Logging > Add > Amazon CloudWatch Logs**.
3. Select **Deliver to Amazon Kinesis Firehose**.
4. For **Destination Stream**, select the Firehose stream you created in Step 1.
5. Configure the following additional settings:

| Setting           | Value  |
| ----------------- | ------ |
| **Fields**        | `all`  |
| **Output format** | `JSON` |

#### Select Log Fields

Select **all** of the following fields — AirOps requires these for full analytics:

| Field                 | Required |
| --------------------- | -------- |
| `timestamp(date)`     | Yes      |
| `timestamp(time)`     | Yes      |
| `c-ip`                | Yes      |
| `cs-method`           | Yes      |
| `cs-uri-stem`         | Yes      |
| `cs-uri-query`        | Yes      |
| `cs(User-Agent)`      | Yes      |
| `cs(Referer)`         | Yes      |
| `cs-protocol-version` | Yes      |
| `cs-protocol`         | Yes      |
| `cs-bytes`            | Yes      |
| `sc-status`           | Yes      |
| `sc-bytes`            | Yes      |
| `sc-content-type`     | Yes      |
| `time-taken`          | Yes      |
| `time-to-first-byte`  | Yes      |
| `ssl-protocol`        | Yes      |
| `ssl-cipher`          | Yes      |
| `x-edge-location`     | Yes      |
| `x-edge-result-type`  | Yes      |
| `x-host-header`       | Yes      |

6. Click **Create configuration**.
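
To double-check that no field was missed, you could inspect a sample log record (for example, one recovered from the S3 backup bucket) with a sketch like this. It assumes each record is a JSON object whose keys match the field names in the table above verbatim; that key format is an assumption, so adjust `REQUIRED_FIELDS` if your records use different names. The helper `missing_fields` is hypothetical.

```python
import json

# Field names copied from the table above.
REQUIRED_FIELDS = [
    "timestamp(date)", "timestamp(time)", "c-ip", "cs-method", "cs-uri-stem",
    "cs-uri-query", "cs(User-Agent)", "cs(Referer)", "cs-protocol-version",
    "cs-protocol", "cs-bytes", "sc-status", "sc-bytes", "sc-content-type",
    "time-taken", "time-to-first-byte", "ssl-protocol", "ssl-cipher",
    "x-edge-location", "x-edge-result-type", "x-host-header",
]


def missing_fields(record_json: str) -> list[str]:
    """Return the required fields that are absent from one JSON log record."""
    record = json.loads(record_json)
    return [field for field in REQUIRED_FIELDS if field not in record]


# Hypothetical, deliberately incomplete record.
sample = json.dumps({"c-ip": "203.0.113.7", "cs-method": "GET", "sc-status": "200"})
print(missing_fields(sample))
```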

***

### Step 3: Attach the Log Configuration to Your Distribution

1. Go to **CloudFront > Distributions** and select your distribution.
2. Navigate to the **Behaviors** tab.
3. Select the behavior you want to track (typically `Default (*)`) and click **Edit**.
4. Under **Real-time log configuration**, select the configuration you created in Step 2 (for example, `airops-analytics`).
5. Click **Save changes**.

Repeat for any additional cache behaviors you want to monitor.
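
If you manage many behaviors, the same change can be sketched with boto3. The example below assumes the configuration is attached through the `RealtimeLogConfigArn` field of each cache behavior in the distribution config; the helper names and the `path_patterns` parameter are hypothetical, and the ARN is whatever AWS assigned to the configuration from Step 2.

```python
import copy


def attach_realtime_log_config(distribution_config, log_config_arn, path_patterns=()):
    """Return a copy of the config with the log config set on the default
    behavior and on any additional behaviors whose PathPattern is listed."""
    config = copy.deepcopy(distribution_config)
    config["DefaultCacheBehavior"]["RealtimeLogConfigArn"] = log_config_arn
    for behavior in config.get("CacheBehaviors", {}).get("Items") or []:
        if behavior["PathPattern"] in path_patterns:
            behavior["RealtimeLogConfigArn"] = log_config_arn
    return config


def apply_to_distribution(distribution_id, log_config_arn, path_patterns=()):
    """Fetch, modify, and save the distribution. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure helper above has no dependencies

    cloudfront = boto3.client("cloudfront")
    current = cloudfront.get_distribution_config(Id=distribution_id)
    updated = attach_realtime_log_config(
        current["DistributionConfig"], log_config_arn, path_patterns
    )
    # IfMatch must carry the ETag returned with the config that was fetched.
    cloudfront.update_distribution(
        Id=distribution_id, IfMatch=current["ETag"], DistributionConfig=updated
    )
```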

***

### Step 4: Verify the Connection

After attaching the configuration, data should begin flowing within **2-5 minutes**.

1. Generate some traffic to your CloudFront distribution (visit your website).
2. Check the Firehose **Monitoring** tab in the AWS console to confirm records are being delivered.
3. In your **AirOps dashboard**, navigate to **Agent Analytics** — you should see traffic data appearing shortly.
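
The Firehose check in step 2 can also be done from a script. This sketch queries the CloudWatch metric `DeliveryToHttpEndpoint.Success` in the `AWS/Firehose` namespace for your stream; a value near 1 suggests deliveries are succeeding. The helper names and the 15-minute default window are illustrative choices, not AirOps requirements.

```python
from datetime import datetime, timedelta, timezone


def delivery_success_query(stream_name, minutes=15, now=None):
    """Build parameters for cloudwatch.get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToHttpEndpoint.Success",
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }


def fetch_delivery_success(stream_name):
    """Return datapoints sorted by time. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**delivery_success_query(stream_name))
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

An empty result usually means no delivery attempts were recorded in the window, which points back to the log configuration rather than to the endpoint.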

#### Troubleshooting

| Symptom                              | Likely Cause            | Fix                                                                                |
| ------------------------------------ | ----------------------- | ---------------------------------------------------------------------------------- |
| Firehose shows delivery errors (401) | Invalid API key         | Verify your Workspace API key in AirOps Settings > Workspace                       |
| Firehose shows delivery errors (429) | Rate limit exceeded     | Increase buffer interval or buffer size to reduce delivery frequency               |
| No data in Firehose monitoring       | Log config not attached | Confirm the real-time log config is attached to your distribution behavior         |
| Data in Firehose but not in AirOps   | Processing delay        | Wait up to 10 minutes; check that your Brand Kit ID is correct in the endpoint URL |

***

### Rate Limits

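The AirOps ingestor enforces a rate limit of **5 requests per minute per API key**. Firehose delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first, so with the recommended buffer settings (5 MiB / 60 seconds) typical sites stay well within this limit. For very high-traffic distributions, increase the buffer size or interval to keep deliveries at or below 5 per minute.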
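
To judge whether your traffic needs larger buffers, a rough estimate like the one below can help. It assumes Firehose flushes on whichever buffer threshold is hit first and that the buffer size is measured on incoming log data; real delivery timing can vary, so treat the result as an approximation. The function names and the sample volumes are hypothetical.

```python
RATE_LIMIT_PER_MINUTE = 5  # AirOps ingestor limit stated above


def deliveries_per_minute(log_mib_per_minute, buffer_mib=5, buffer_interval_s=60):
    """Estimate Firehose deliveries per minute under steady traffic."""
    by_interval = 60 / buffer_interval_s         # flushes triggered by the timer
    by_size = log_mib_per_minute / buffer_mib    # flushes triggered by a full buffer
    return max(by_interval, by_size)


def within_rate_limit(log_mib_per_minute, buffer_mib=5, buffer_interval_s=60):
    """True if the estimated delivery rate does not exceed the AirOps limit."""
    rate = deliveries_per_minute(log_mib_per_minute, buffer_mib, buffer_interval_s)
    return rate <= RATE_LIMIT_PER_MINUTE


print(deliveries_per_minute(2))                # low volume: the 60-second timer dominates
print(within_rate_limit(50))                   # 50 MiB/min fills a 5 MiB buffer 10 times per minute
print(within_rate_limit(50, buffer_mib=10))    # doubling the buffer halves the delivery rate
```

With the recommended settings, the estimate stays at or below the limit up to about 25 MiB of log data per minute (5 deliveries of 5 MiB each).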

***

### What AirOps Tracks

Once connected, AirOps automatically:

* **Ingests all CloudFront requests** — page views, assets, API calls
* **Classifies AI bot traffic** — identifies visits from ChatGPT, Claude, Google, Perplexity, Meta, Apple, Bing, Bytedance, Common Crawl, and others
* **Categorizes crawl behavior** — distinguishes between AI training crawlers, AI search fetches, AI indexing, and user-initiated AI browsing
* **Surfaces analytics** — traffic breakdowns, bot-vs-human ratios, and crawl trends in your dashboard
