PDF compare
Key features
step’s PDF comparison plugin lets users check that a rendered PDF file (called the actual document) exactly matches a model (called the expected document).
The default type of comparison is an image-based comparison of every page of the actual document against the corresponding page of the expected document. However, one of the key features of the plugin is the ability for users to define conditions (currently called Anchors) under which parts of the document (called Excludes) will be ignored during comparisons. This mechanism allows users to focus their tests on the important areas of the document and to ignore areas which may vary for reasons which are not relevant to the test strategy.
Another important point of focus is the ability to extract text strings and match them against other pre-defined expected strings, allowing users to not base their test cases solely on image comparisons but also on the thorough inspection of the business information which is actually contained within the documents.
As you can see on the screenshot below, the PDF Comparison plugin comes with a comprehensive design environment which is directly embedded into step’s web application. Drag-and-drop functionality allows for highly intuitive interactions with the document and the effortless selection of excluded zones.

Main use cases
Two standard use cases will be described in this section. For further information, check out the following document which contains diagrams covering additional comparison use cases: PDFPluginDiagrams.
Basic comparison use case
A basic comparison scenario is a scenario in which two PDF files simply need to be compared to each other. Each page of the actual file will be matched against the corresponding page of the expected file based on page numbers.
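The page-by-page matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's implementation: pages are represented here as small grids of grayscale pixel values, and the `Page` type and `compare_documents` function are hypothetical names.

```python
from typing import List

# Hypothetical representation: one page is a list of pixel rows.
Page = List[List[int]]


def compare_documents(actual: List[Page], expected: List[Page]) -> List[str]:
    """Return human-readable differences; an empty list means the documents match."""
    problems = []
    if len(actual) != len(expected):
        problems.append(
            f"page count differs: actual={len(actual)}, expected={len(expected)}"
        )
    # Pages are paired by position, i.e. by page number.
    for number, (a_page, e_page) in enumerate(zip(actual, expected), start=1):
        if a_page != e_page:
            problems.append(f"page {number} differs")
    return problems


page_a = [[0, 0], [255, 255]]
page_b = [[0, 0], [255, 0]]
print(compare_documents([page_a, page_a], [page_a, page_a]))  # []
print(compare_documents([page_a, page_b], [page_a, page_a, page_a]))
```

The second call reports both a page count mismatch and a difference on page 2, which mirrors the two kinds of differences a basic comparison can surface.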

Running a simple web-based test
Here’s how to create a basic comparison test from step’s web application.
First, go to the Keywords view and create a new keyword of type “PDF Test”.

Click the “Save and edit” button in order to land directly in your PDF Test lab.

Scroll down until you see the Test box and enter the paths to your actual and expected documents. For the sake of testing, we can use the same document as both the actual and expected to make sure the comparison is successful (since we know we are using an actual document that is identical to the expected).

As a result of the comparison, a success result text box pops up.

Using documents which don’t exactly match would result in a different outcome. In this case, “diff” information is provided to help the user figure out why and where the documents differ. A different number of pages is reported through messages, and differences within two compared pages are shown as colors in a red/blue diff image:

Automatic headless execution
The same can be done by invoking this newly created keyword from a plan and passing the actual and expected paths as inputs.
Here’s how a very simple test plan would look:

And what the result of the execution is, assuming the actual and expected documents successfully match:

In the event of an unsuccessful comparison (i.e. when the actual differs from the expected), error messages will be returned as an output and the diff information will be provided in the form of attachments (attached images):

Mask-based use cases
Important concepts
Conditional exclusion concept
As mentioned above in the Key features section, one of the strengths of step’s PDF Comparison plugin is the ability to define a series of conditions that focus test cases on specific patterns (for instance, page numbers) and areas of the document. Anchors and Excludes are the two features which make this possible. The combination of one or more Anchors with one or more Excludes is called a Mask.
Excludes 
Excludes are rectangular zones within a page which are ignored during comparison. This is especially useful when documents contain dynamic data such as document identifiers, which would otherwise prevent a document from ever matching its model exactly.
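As a rough sketch of how rectangular Excludes affect a comparison, the Python below skips every pixel that falls inside an excluded zone. It assumes pages are grids of pixel values; the `Exclude` tuple and `pages_match` function are hypothetical names for illustration and do not reflect the plugin's internals.

```python
from typing import List, NamedTuple

Page = List[List[int]]  # hypothetical: one page as rows of pixel values


class Exclude(NamedTuple):
    x: int       # left column, inclusive
    y: int       # top row, inclusive
    width: int
    height: int

    def contains(self, col: int, row: int) -> bool:
        return (self.x <= col < self.x + self.width
                and self.y <= row < self.y + self.height)


def pages_match(actual: Page, expected: Page, excludes: List[Exclude]) -> bool:
    """Compare two pages pixel by pixel, ignoring pixels inside any Exclude."""
    if len(actual) != len(expected):
        return False
    for row, (a_row, e_row) in enumerate(zip(actual, expected)):
        if len(a_row) != len(e_row):
            return False
        for col, (a_px, e_px) in enumerate(zip(a_row, e_row)):
            if a_px != e_px and not any(ex.contains(col, row) for ex in excludes):
                return False
    return True


expected_page = [[0, 0, 0], [0, 7, 0]]
actual_page = [[0, 0, 0], [0, 9, 0]]  # the middle pixel plays the role of dynamic data
print(pages_match(actual_page, expected_page, []))                     # False
print(pages_match(actual_page, expected_page, [Exclude(1, 1, 1, 1)]))  # True
```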
Anchors 
Anchors serve a conditional purpose: they are used to decide whether or not Excludes have to be applied.
Currently 3 different anchor types are supported:
- Page number based: when this anchor type is selected, the Excludes defined in the mask apply at execution time only to pages with the same page number as the mask page on which they are defined. In this case the page number serves as the Anchor. For instance, if a set of Excludes is defined on page 2, these Excludes will only apply to page 2 of the actual document.
- Applies to all pages: when this anchor type is selected, all the Excludes defined in the mask apply to all pages, independently of page content and page numbers. This can be useful to exclude a header that appears on every page. In this case the Anchor is a kind of wildcard.
- Image based: this is an advanced anchor type. With it, users can select specific rectangular areas, called Anchors, whose image patterns determine whether or not Excludes are applied, independently of the page number. This can be very useful for creating modular Exclude sets for a specific page type which can occur on different pages:
  - If Anchors are defined on a specific page of the mask, at execution time the corresponding Excludes (the Excludes defined on the same page of the mask) will only apply to a page if the Anchors are found within that page.
  - If no Anchor is defined on a specific page of the mask, the entire page of the mask will be used as the Anchor. At execution time the corresponding Excludes will only apply to a page if that page has exactly the same layout as the page of the mask where these Excludes are defined.
  - A list of Anchors on a specific page and the corresponding Excludes define an Anchor-Excludes combination. It is possible to define one Anchor-Excludes combination per page. For instance, you could define an Anchor-Excludes combination on page 2 using a list of 2 Anchors and 4 Excludes, and an Anchor-Excludes combination on page 4 with an empty list of Anchors and 2 Excludes, thus using the whole of page 4 as the Anchor for the corresponding Excludes on page 4.
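The decision logic of the three anchor types can be summarized with the following Python sketch. It is a simplified model under stated assumptions: image matching is replaced by two caller-supplied callbacks (`anchor_found` and `same_layout`), all Anchors on a mask page are assumed to be required, and every name in the code is hypothetical rather than part of the plugin.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class AnchorType(Enum):
    PAGE_NUMBER = "page number based"
    ALL_PAGES = "applies to all pages"
    IMAGE = "image based"


@dataclass
class MaskPage:
    excludes: List[str]                               # names of Excludes on this mask page
    anchors: List[str] = field(default_factory=list)  # names of image Anchors, may be empty


def excludes_for_page(
    anchor_type: AnchorType,
    mask_pages: Dict[int, MaskPage],
    page_number: int,
    anchor_found: Callable[[str], bool],  # stand-in: is this Anchor found in the page?
    same_layout: Callable[[int], bool],   # stand-in: same layout as this mask page?
) -> List[str]:
    """Return the names of the Excludes that apply to one page of the actual document."""
    if anchor_type is AnchorType.PAGE_NUMBER:
        mask_page = mask_pages.get(page_number)
        return list(mask_page.excludes) if mask_page else []
    if anchor_type is AnchorType.ALL_PAGES:
        return [name for mp in mask_pages.values() for name in mp.excludes]
    applied = []
    for mask_number, mp in mask_pages.items():
        if mp.anchors:
            # Assumption: every Anchor of the mask page must be found.
            matches = all(anchor_found(a) for a in mp.anchors)
        else:
            # No Anchor defined: the entire mask page acts as the Anchor.
            matches = same_layout(mask_number)
        if matches:
            applied.extend(mp.excludes)
    return applied


mask = {
    2: MaskPage(excludes=["e1", "e2", "e3", "e4"], anchors=["a1", "a2"]),
    4: MaskPage(excludes=["e5", "e6"]),
}
# Page 7 of the actual document contains both Anchors but does not look like mask page 4.
print(excludes_for_page(AnchorType.IMAGE, mask, 7, lambda a: True, lambda n: False))
# ['e1', 'e2', 'e3', 'e4']
```

The `mask` dictionary reproduces the example given in the last bullet: 2 Anchors and 4 Excludes on page 2, and 2 Excludes with no Anchor on page 4.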
Masks
Masks are the container for anchors and excludes. Multiple masks can be passed as an input of the PDF Compare Keyword (as a list).
Text Extraction
The text located within the region of an Exclude can be extracted and made available automatically as Keyword output named after the region itself.
Mask design
This section contains instructions regarding the design of masks and manual testing from step’s GUI.
Creating a new test
Currently, the PDF Comparison editor can only be accessed by creating or editing a Keyword, as already described in the section “Running a simple web-based test” of the “Basic comparison use case”. This is subject to change in the future, as masks and mask-related resources will eventually be isolated in their own separate view.
For the time being, however, users have to go to the Keywords view, create a new keyword, set its type to “PDF Test” and click “Save and edit”. This will automatically take them to step’s PDF Comparison editor, where design can begin.
Loading a document
In order to load a document, enter its path in the text box, as seen from the controller (the controller must be able to access the file at that path), and click “Load”.

Excludes & Anchors
You can toggle between the Anchor and Exclude views respectively by clicking the pin and scissors icons. New regions will then be automatically added by drawing rectangles directly onto the document.

A drag-and-drop action on the document will automatically create a new region (i.e. an Anchor or an Exclude, depending on which view is currently selected) and add it to the list. The text found within the drawn rectangle will also be extracted.
Here’s an example in the case of an Exclude:

On the Exclude panel, you can define whether the text content of an Exclude region should be extracted during comparison. If text extraction is enabled, the content of the region will be extracted and returned as keyword output.

And another in the event of an Anchor creation:

Running a simple web-based test
Once you’ve successfully designed a Mask (i.e. selected a list of Anchors and Excludes), you can test your design using the “Test” box. All you need to do is pick the page or pages to be tested against and the actual document from which these pages originate. Then click the “Compare” button.

Test results
Successful matches (i.e. exact comparisons) will display the message “Document match. No difference found.”

Failed comparisons will display a diff image, providing you with diagnostic information in case the result of the test differs from what you expected.

Automatic headless execution
Mask list
The PDF comparison keyword can take a list of masks as inputs. As previously mentioned, as of right now, keywords themselves contain the masks to be applied during comparison.
This means that the mask list is a series of keywords containing the masks to apply. For instance, assuming I’ve designed two masks inside two keywords named “Mask1” and “Mask2”, I can pass these masks as an input of the PDF_compare keyword in a plan.
It will be a comma-separated list:
In addition, the page order option can be set via the “unsortedPageTolerance” input box.
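As an illustration of the comma-separated mask list, the Python sketch below joins mask keyword names into a single string and places it in a set of keyword inputs. The input names `actual`, `expected` and `masks`, the file paths, and the exact separator format (no spaces) are assumptions made for this example; check the PDF_compare keyword in your installation for the real input names.

```python
import json
from typing import Iterable


def mask_list(names: Iterable[str]) -> str:
    """Join mask keyword names into one comma-separated string."""
    cleaned = [name.strip() for name in names]
    for name in cleaned:
        # A comma inside a name would make the list ambiguous.
        if not name or "," in name:
            raise ValueError(f"invalid mask keyword name: {name!r}")
    return ",".join(cleaned)


inputs = {
    "actual": "/path/to/actual.pdf",         # hypothetical input name and path
    "expected": "/path/to/expected.pdf",     # hypothetical input name and path
    "masks": mask_list(["Mask1", "Mask2"]),  # hypothetical input name
}
print(json.dumps(inputs, indent=2))
```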
Extracted text
If you’ve checked the text extraction box in either an Anchor or an Exclude region’s definition, the corresponding content will be exposed in the PDF_compare output with the name of the region as a key and the extracted text as a value:
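Once the output is available as key/value pairs, the extracted text can be matched against pre-defined expected strings. The Python sketch below assumes the keyword output has already been parsed into a dictionary; the region names, sample values and the `check_extracted_text` helper are hypothetical.

```python
from typing import Dict, List


def check_extracted_text(output: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Compare extracted region text with expected strings; return error messages."""
    errors = []
    for region, want in expected.items():
        if region not in output:
            errors.append(f"region {region!r} missing from output")
            continue
        got = output[region].strip()  # ignore surrounding whitespace
        if got != want:
            errors.append(f"region {region!r}: expected {want!r}, got {got!r}")
    return errors


# Hypothetical output: region name as key, extracted text as value.
output = {"invoiceNumber": " INV-001 ", "customer": "ACME"}
print(check_extracted_text(output, {"invoiceNumber": "INV-001"}))  # []
print(check_extracted_text(output, {"customer": "Globex", "total": "42"}))
```

Returning a list of messages rather than raising on the first mismatch makes it possible to report every failing region in a single test run.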
Advanced scenarios summary
The following diagram taken out of the attached slides summarizes the most advanced comparison use cases:

Plugin installation
Assuming you’re running a properly licensed installation of step Enterprise Edition, there’s no need to install the plugin itself, as it comes prepackaged as part of the distribution.
We currently recommend installing Ghostscript (GS) version 9.22 on your controller and agents. GS is available here for download.
Environment properties
Two properties need to be defined within step’s configuration file called step.properties and located in the conf folder of the controller:
plugins.pdftest.gsexe=/path/to/ghost/script
plugins.pdftest.scenariodir=/path/to/pdf/folder
In order to scale executions on agents, GS will also have to be installed on each agent host, and the same properties will have to be set within the corresponding AgentConf.json files, under their respective “conf/” folders.
"properties":{
"plugins.pdftest.gsexe" : "/path/to/ghost/script",
"plugins.pdftest.scenariodir" : "/path/to/pdf/folder"
}
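Before starting the controller, it can be handy to verify that both properties are present. The Python sketch below is a simplified check, not a full properties parser: it only understands `key=value` lines and comment lines, and the helper names are hypothetical.

```python
from typing import Dict, List

REQUIRED = ("plugins.pdftest.gsexe", "plugins.pdftest.scenariodir")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_properties(text: str) -> List[str]:
    """Return the required plugin properties that are absent or empty."""
    props = parse_properties(text)
    return [key for key in REQUIRED if not props.get(key)]


sample = """
# PDF comparison plugin
plugins.pdftest.gsexe=/path/to/ghost/script
"""
print(missing_properties(sample))  # ['plugins.pdftest.scenariodir']
```

To check a real installation, the content of conf/step.properties can be read from disk and passed to `missing_properties`.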
Special keyword “PDF Compare”
A special pre-created keyword is available starting with version 3.8.0 of step. In the event that the keyword is not present, you can create a new one using the keyword type PDF Test.
The keyword PDF Compare is a generic (i.e. reusable) keyword in the sense that PDF files can be passed dynamically as expected and actual document inputs. In addition, the list of masks to be applied during comparison is also a dynamic input. This means that the same keyword can be used for all PDF comparisons in your projects.