Web Page Scrape

Scrape text, markdown or HTML from a website

The "Web Page Scrape" Step allows you to automate a text/markdown/HTML scrape from a specific URL. You can combine this with an Iteration Step to scrape through multiple websites, and parse the output separately.

Configuring the "Web Page Scrape" Step

Configuring the step requires setting the parameters shown below:


URL

Add the specific URL you want the step to scrape.

Maximum Length

Optionally, you may limit the number of characters returned by the step.

This parameter can be helpful to limit the amount of text passed to a subsequent LLM step, which has a limited context window.

1 token is approximately 4 characters in English. To estimate the number of characters you should pass to an LLM step, multiply the number of tokens you want to pass by 4.
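The rule of thumb above can be sketched as a small helper. This is only an illustration of the arithmetic: the function name `max_length_for_tokens` and the example budget of 2,000 tokens are hypothetical, and the 4-characters-per-token ratio is an approximation for English text rather than an exact figure.

```python
# Rough average for English text, taken from the rule of thumb above.
CHARS_PER_TOKEN = 4


def max_length_for_tokens(token_budget: int) -> int:
    """Estimate a Maximum Length value (in characters) for a token budget.

    Hypothetical helper: it only multiplies the budget by 4.
    """
    if token_budget < 0:
        raise ValueError("token_budget must be non-negative")
    return token_budget * CHARS_PER_TOKEN


# Example: leave roughly 2,000 tokens of room for scraped text.
print(max_length_for_tokens(2000))  # 8000
```

Since the ratio is an estimate, it may be safer to pick a value somewhat below the result when the LLM step's context window is tight.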

How to continue if the Web Scrape step fails

By default, the Web Page Scrape step will terminate the workflow if it fails. To continue the workflow when the step fails, click Continue at the bottom of the step.

The step will return the following keys:

  • output : this will be null

  • error :

    • message: the message returned from the step

    • code : the error code representing the error
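If a later step in the workflow inspects this result in code, the check might look like the sketch below. It assumes the result is available as a dictionary with the `output` and `error` keys listed above, and that a successful run carries the scraped content in `output` with no `error`. The function name `scraped_text_or_none` and the sample message and code values are hypothetical.

```python
from typing import Any, Dict, Optional


def scraped_text_or_none(step_result: Dict[str, Any]) -> Optional[str]:
    """Return the scraped content, or None if the step reported an error."""
    error = step_result.get("error")
    if error:
        # On failure, `output` is null and `error` holds a message and a code.
        print(f"Scrape failed ({error.get('code')}): {error.get('message')}")
        return None
    return step_result.get("output")


# Hypothetical failed result, shaped like the keys described above.
failed = {"output": None, "error": {"message": "Request timed out", "code": "TIMEOUT"}}
# Hypothetical successful result (assumed shape).
succeeded = {"output": "Example page text", "error": None}

print(scraped_text_or_none(failed))
print(scraped_text_or_none(succeeded))
```

Checking `error` first, rather than testing whether `output` is empty, keeps a failed scrape distinct from a page that simply returned no text.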

Enable JavaScript rendering?

By default, the Web Page Scrape Step does not render JavaScript on the pages it scrapes.

Check this box to scrape websites that rely on JavaScript to render dynamic content (examples include Facebook, Airbnb, and more).


Timeout

The maximum time, in milliseconds, to wait for the scrape results to return.

Type of Proxy:

Use the residential proxy for sites that require more reliability and higher success rates. On the other hand, use the datacenter proxy where reliability and success rates are not a concern.


Datacenter proxy

  • Private IP addresses that are housed in data centers

  • Offers higher speed but is less reliable in terms of anonymity

  • More likely to be detected and blocked by websites and internet services


Residential proxy

  • A real IP address attached to a physical location

  • Web scraping will appear as if it's coming from a residential home in a certain location

  • Considered more legitimate and less likely to be blocked by websites

Note: Using a residential proxy is more expensive than using the datacenter proxy, so weigh the added cost against your use case when deciding which proxy to use.

Output Type

  • Text: No formatting included

  • HTML: Extract headers and formatting from a website in HTML

  • Markdown: Extract the headers from a website in markdown
