Scan Content for Plagiarism
The "Scan Content for Plagiarism" step checks text for potential plagiarism by comparing it against billions of web pages, academic papers, and other published sources. It uses Originality.ai's plagiarism detection technology to help you ensure content originality and avoid copyright issues.
How to Configure the Scan Content for Plagiarism Step
Configuring the step requires setting the parameters shown below:
[IMAGE: Screenshot of the Scan Content for Plagiarism step configuration panel]
Text
Enter the text content you want to scan for plagiarism. This can be a paragraph, article, or any written content you need to verify for originality.
You can use Liquid syntax to dynamically reference text from a previous step.
Minimum Length (Optional)
The minimum text length required for reliable plagiarism detection. If your text is shorter than this value, the step returns a warning that accuracy may be reduced.
Default: 100 characters
Match Sensitivity (Optional)
Adjust how sensitive the plagiarism detection should be. Higher sensitivity will detect smaller matches and paraphrased content, while lower sensitivity will only flag more substantial, direct matches.
Range: 1-5 (1 = low sensitivity, 5 = high sensitivity)
Default: 3
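If you prepare these inputs in code before they reach the step, the defaults and range above can be checked up front. The following is a minimal Python sketch, not part of the step itself: `ScanSettings` and its `problems` method are hypothetical names introduced for illustration, and the length check assumes the minimum is measured in characters, as the default suggests.

```python
from __future__ import annotations

from dataclasses import dataclass

# Defaults and range taken from the parameter descriptions above.
DEFAULT_MINIMUM_LENGTH = 100
DEFAULT_MATCH_SENSITIVITY = 3
SENSITIVITY_RANGE = range(1, 6)  # 1 (low) to 5 (high), inclusive


@dataclass
class ScanSettings:
    """Hypothetical container for the step's three inputs."""

    text: str
    minimum_length: int = DEFAULT_MINIMUM_LENGTH
    match_sensitivity: int = DEFAULT_MATCH_SENSITIVITY

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the inputs look fine."""
        issues = []
        if self.match_sensitivity not in SENSITIVITY_RANGE:
            issues.append("match_sensitivity must be a whole number between 1 and 5")
        if len(self.text) < self.minimum_length:
            issues.append(
                f"text has {len(self.text)} characters, below the minimum of "
                f"{self.minimum_length}; accuracy may be reduced"
            )
        return issues
```

Calling `ScanSettings(text=my_text).problems()` before running the scan gives you a chance to skip or flag very short inputs instead of relying on the warning afterwards.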
Output Format
The Scan Content for Plagiarism step returns a JSON object containing detailed analysis results:
[IMAGE: Screenshot showing an example of the Scan Content for Plagiarism results in the AirOps interface]
Understanding the Output Metrics
original_text: The first portion of the analyzed text (truncated for display purposes)
originality_score: A score from 0 to 1 indicating the overall originality of the content (higher = more original)
is_original: A boolean value indicating if the system considers the content adequately original (true/false)
plagiarism_score: A score from 0 to 1 indicating the overall level of potential plagiarism (higher = more potential plagiarism)
analysis: Detailed information about the plagiarism detection
originality_percentage: Originality score expressed as a percentage
plagiarism_percentage: Plagiarism score expressed as a percentage
matches: Array of specific matched text segments found during the analysis. Each match contains:
  matched_text: The text segment that matches an existing source
  similarity: How similar the match is to the source (0-1 scale)
  source_url: The URL of the matching source
  source_title: The title of the matching source
  source_date: The publication date of the matching source
word_count: Total number of words in the analyzed text
character_count: Total number of characters in the analyzed text
language: Detected language of the content
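To make the field list above concrete, here is an illustrative payload parsed in Python. Every value is made up, `example.com` is a placeholder, and the placement of the percentages, `matches`, counts, and `language` under `analysis` is an assumption about the nesting, so check it against the output you actually receive. The helper `as_percentage` is a hypothetical name.

```python
import json

# Illustrative payload only: all values are invented, and the nesting of
# fields under "analysis" is an assumption, not confirmed output.
SAMPLE_RESULT_JSON = """
{
  "original_text": "Content marketing is the practice of...",
  "originality_score": 0.85,
  "is_original": true,
  "plagiarism_score": 0.15,
  "analysis": {
    "originality_percentage": 85,
    "plagiarism_percentage": 15,
    "matches": [
      {
        "matched_text": "the practice of creating and sharing material",
        "similarity": 0.92,
        "source_url": "https://example.com/article",
        "source_title": "Example Article",
        "source_date": "2023-01-15"
      }
    ],
    "word_count": 250,
    "character_count": 1480,
    "language": "en"
  }
}
"""

result = json.loads(SAMPLE_RESULT_JSON)


def as_percentage(score: float) -> int:
    """Convert a 0-1 score to a whole-number percentage."""
    return round(score * 100)


# The percentage fields mirror the 0-1 scores in this sample.
print(as_percentage(result["originality_score"]))
print(as_percentage(result["plagiarism_score"]))
```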
How to Use the Results
You can access the plagiarism scan results in subsequent workflow steps using Liquid syntax, for example to read the originality score, to check whether the content is considered original, or to loop through matched sources in an LLM prompt.
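If you handle the result in code rather than in Liquid, the same three operations look roughly like this in Python. This is a sketch under assumptions: the result is passed in as a dictionary shaped like the output described above, the function names are hypothetical, and because the exact nesting of `matches` is not certain, `find_matches` looks at the top level first and then under `analysis`. The sample values are invented.

```python
def find_matches(result: dict) -> list:
    """Return the matches list from the top level or from under "analysis"."""
    if "matches" in result:
        return result["matches"]
    return result.get("analysis", {}).get("matches", [])


def format_matches_for_prompt(result: dict) -> str:
    """Build a plain-text list of matched sources, one per line, for an LLM prompt."""
    lines = []
    for number, match in enumerate(find_matches(result), start=1):
        lines.append(
            f'{number}. "{match["matched_text"]}" '
            f'(similarity {match["similarity"]:.0%}) '
            f'from {match["source_title"]}: {match["source_url"]}'
        )
    return "\n".join(lines) if lines else "No matching sources were found."


# Made-up sample shaped like the documented output.
sample = {
    "originality_score": 0.85,
    "is_original": True,
    "analysis": {
        "matches": [
            {
                "matched_text": "the practice of creating and sharing material",
                "similarity": 0.92,
                "source_url": "https://example.com/article",
                "source_title": "Example Article",
                "source_date": "2023-01-15",
            }
        ]
    },
}

print(sample["originality_score"])        # read the originality score
print(sample["is_original"])              # check whether content is original
print(format_matches_for_prompt(sample))  # matched sources as prompt text
```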
Common Use Cases
Verifying the originality of submitted content
Quality control for content creation workflows
Checking AI-generated content for potential plagiarism
Ensuring academic integrity of papers and articles
Pre-publication review to avoid copyright issues
Validating guest blog submissions
Reviewing website content before launch
How to Continue if the Scan Fails
By default, the Scan Content for Plagiarism step will terminate the workflow if it fails. To continue the workflow if the step fails, click "Continue" at the bottom of the settings panel.
[IMAGE: Screenshot showing the "Continue" option in the step settings]
The step will return the following keys when it fails:
output: this will be null
error:
  message: the message returned from the step
  code: the error code representing the error
Common error causes include:
Text too short for reliable analysis
Unsupported language
Service unavailability
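When the workflow is set to continue after a failure, a later step has to tell a failed scan from a successful one. A minimal Python sketch of that check follows, assuming the step result arrives as a dictionary with the `output` and `error` keys described above. The function name and the `TEXT_TOO_SHORT` code in the usage example are hypothetical; the real error codes are not listed here.

```python
def summarize_step_result(step_result: dict) -> str:
    """Describe whether the scan succeeded, using the documented failure keys."""
    if step_result.get("output") is not None:
        return "Scan completed."
    # On failure, "output" is null and "error" carries a message and a code.
    error = step_result.get("error") or {}
    message = error.get("message", "no message provided")
    code = error.get("code", "unknown")
    return f"Scan failed ({code}): {message}"


# Hypothetical failure payload; the code value is invented for illustration.
failed = {
    "output": None,
    "error": {"message": "Text too short for reliable analysis", "code": "TEXT_TOO_SHORT"},
}
print(summarize_step_result(failed))
```

Returning a short description instead of raising keeps the workflow moving, which matches the intent of the "Continue" option; raise an exception instead if a failed scan should stop downstream processing.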
Example Workflow
Here's a common pattern using the Scan Content for Plagiarism step:
Start with content submitted for publication or review
Use the "Scan Content for Plagiarism" step to analyze the content for originality
Use a Conditional step to branch the workflow based on the plagiarism score:
If plagiarism score is high (>0.3): Reject and return for revision
If plagiarism score is moderate (0.1-0.3): Route to human review
If plagiarism score is low (<0.1): Proceed with normal publication
For content requiring revision or review, use an LLM step to:
Identify specific problematic sections
Suggest potential rewrites
Generate a detailed feedback report with sources of potential plagiarism
Output the appropriate response based on the plagiarism analysis
This workflow enables you to implement a robust content originality policy, catching potential copyright issues before publication while providing constructive feedback for content improvement.
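The branching rule in the example workflow can be sketched in Python as follows. The thresholds come from the steps above; the function name and the route labels are hypothetical, and scores exactly at 0.1 or 0.3 are sent to human review because the moderate band is written as 0.1-0.3.

```python
HIGH_THRESHOLD = 0.3  # above this: reject and return for revision
LOW_THRESHOLD = 0.1   # below this: proceed with publication


def route_by_plagiarism_score(plagiarism_score: float) -> str:
    """Map a 0-1 plagiarism score to one of three workflow branches."""
    if not 0 <= plagiarism_score <= 1:
        raise ValueError("plagiarism_score must be between 0 and 1")
    if plagiarism_score > HIGH_THRESHOLD:
        return "reject"
    if plagiarism_score < LOW_THRESHOLD:
        return "publish"
    # Everything from 0.1 to 0.3 inclusive goes to a person.
    return "human_review"


for score in (0.05, 0.2, 0.45):
    print(score, route_by_plagiarism_score(score))
```

Keeping the thresholds as named constants makes it easy to tighten or relax the policy without touching the branching logic.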