Convert PDF URL to Text
The "Convert PDF URL to Text" step extracts the text content of a PDF document located at a specified URL. Later steps in the workflow can then analyze or process that text without any manual copying or extraction.
How to Configure the Convert PDF URL to Text Step
Configuring the step requires setting the parameters shown below:
[IMAGE: Screenshot of the Convert PDF URL to Text step configuration panel]
PDF URL
Enter the full URL of the PDF document you want to convert to text. This should be a direct link to a PDF file accessible via the internet.
You can use Liquid syntax to dynamically reference a PDF URL from a previous step.
Important: The URL must point directly to a PDF file that is publicly accessible. Password-protected PDFs cannot be processed.
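Before passing a URL to the step, it can help to run a quick sanity check on it. The following Python sketch is only a heuristic and is not part of the step itself: the function name `looks_like_direct_pdf_url` is hypothetical, and the check assumes that a direct link uses `http` or `https` and has a path ending in `.pdf`. Some valid direct links do not end in `.pdf`, and the check cannot tell whether a file is publicly accessible or password-protected.

```python
from urllib.parse import urlparse


def looks_like_direct_pdf_url(url: str) -> bool:
    """Heuristic check (an assumption, not the step's own validation):
    http(s) scheme, a host, and a path that ends in .pdf."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Compare case-insensitively; query strings are not part of parsed.path.
    return parsed.path.lower().endswith(".pdf")
```

A URL that fails this check may still work, so treat the result as a hint for catching obvious mistakes such as links to a viewer page rather than to the file.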
Page Range (Optional)
Specify which pages of the PDF document you want to extract:
All Pages (default): Extract text from the entire document
Specific Pages: Extract text from selected pages only
When selecting "Specific Pages", you can specify:
Individual pages (e.g., "1, 3, 5")
Ranges (e.g., "1-5")
A combination of both (e.g., "1-3, 5, 7-9")
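To illustrate how a page specification such as "1-3, 5, 7-9" can be read, here is a minimal Python sketch that expands it into individual page numbers. The function `parse_page_range` is hypothetical and shows only one reasonable interpretation of the syntax above (1-based pages, inclusive ranges); it is not the parser the step uses.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5, 7-9" into a sorted list of
    unique 1-based page numbers (ranges are treated as inclusive)."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1, , 3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Under this interpretation, `parse_page_range("1-3, 5, 7-9")` selects pages 1, 2, 3, 5, 7, 8, and 9.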
Output Format (Optional)
Choose how you want the extracted text to be formatted:
Plain Text (default): Simple text extraction without preserving layout
Markdown: Attempts to preserve headings, lists, and basic formatting as markdown
Structured: Returns JSON with page-by-page text extraction
Output Format
The Convert PDF URL to Text step returns the extracted text in your chosen format:
Plain Text Format (default)
Markdown Format
Structured Format
[IMAGE: Screenshot showing an example of the Convert PDF URL to Text results in the AirOps interface]
How to Use the Results
You can access the extracted text in subsequent workflow steps using Liquid syntax.
If you selected the "Structured" format, you can also access the text of specific pages.
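If you post-process the structured output in a script, page-level access could look like the following Python sketch. The shape of the data is an assumption made for illustration: this page only states that the Structured format returns JSON with page-by-page text, so the `pages`, `page`, and `text` field names and the `text_for_page` helper are hypothetical.

```python
import json
from typing import Optional

# Hypothetical structured output; the real field names may differ.
structured_output = json.loads("""
{"pages": [{"page": 1, "text": "Introduction"},
           {"page": 2, "text": "Methods"}]}
""")


def text_for_page(result: dict, page_number: int) -> Optional[str]:
    """Return the text of the given 1-based page, or None if it is absent."""
    for entry in result.get("pages", []):
        if entry.get("page") == page_number:
            return entry.get("text")
    return None
```

Returning `None` for a missing page keeps later processing from failing when a requested page was not part of the selected page range.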
Common Use Cases
Extracting content from PDF reports for analysis
Converting PDF documentation into searchable text
Feeding PDF content into LLMs for summarization or question-answering
Creating searchable knowledge bases from PDF collections
Automating data extraction from PDF forms or invoices
Analyzing PDF research papers or articles
How to Continue if the Conversion Fails
By default, the Convert PDF URL to Text step will terminate the workflow if it fails. To continue the workflow if the step fails, click "Continue" at the bottom of the settings panel.
[IMAGE: Screenshot showing the "Continue" option in the step settings]
The step will return the following keys when it fails:
output: this will be null
error: an object with the following keys
  message: the message returned from the step
  code: the error code representing the error
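When "Continue" is enabled, downstream logic needs to tell a successful result from a failed one. The Python sketch below shows one way to do this, assuming the step result is available as a dictionary with the `output` and `error` keys described above. The helper name `text_or_error`, the example error code `invalid_url`, and the shape of a successful result are assumptions for illustration.

```python
from typing import Optional, Tuple


def text_or_error(step_result: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success or (None, description) on failure."""
    output = step_result.get("output")
    if output is not None:
        return output, None
    # On failure, output is null and error carries message and code.
    error = step_result.get("error") or {}
    code = error.get("code", "unknown")
    message = error.get("message", "no message provided")
    return None, f"{code}: {message}"


# Hypothetical failed result; the code value is made up for this example.
failed = {
    "output": None,
    "error": {"message": "Invalid PDF URL", "code": "invalid_url"},
}
```

A later step can then branch on whether the first element is `None`, for example to skip summarization or to log the error description.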
Common error causes include:
Invalid PDF URL
Inaccessible PDF (requiring authentication or behind a firewall)
Corrupted PDF file
Scanned PDFs without OCR
PDFs with content as images rather than text
Example Workflow
Here's a common pattern using the Convert PDF URL to Text step:
Start with a URL to a PDF document (research paper, report, etc.)
Use the "Convert PDF URL to Text" step to extract the textual content
Use a "Knowledge Base Search" step to find related information in your knowledge base
Use an LLM step to analyze or summarize the PDF content with additional context
Generate insights, summaries, or answers to specific questions about the document
This workflow automates the processing of PDF content: you can extract insights from documents without manual review and combine them with the information already in your knowledge base.