Barcode processing PDF files

Use open source applications combined with PowerShell to process barcodes from PDF files.

Reading barcodes from PDF files is a common feature of file handling and scanning software.  But what can you do if you are working on a third party's Windows 10 machine without admin rights, and so cannot “install” anything?

Tools & research sources

Overview

  1. Scan hardcopy to PDF or obtain PDF file containing barcode
  2. Convert the PDF to a PNG (via Ghostscript)
  3. Process the PNG to extract information from the barcode (via zBar)
  4. Write the barcode information out to a CSV and rename the original PDF
  5. Clean up working and temporary files before exit

Grab the zip of project files and support folders from above to follow along.

Detailed Process

Step 1
Scan the file to PDF at a minimum of 150dpi.  You may have to go up to 300dpi to get a file of high enough quality to work with during the conversion to PNG.

Step 2
Use the “Start-Process” PowerShell command to run the Ghostscript executable “gswin32c.exe”.

Using basic PowerShell file handling, get all the PDF files and process them in a loop until every one has been converted to PNG.  I used 300dpi here; any lower and the next step yielded no barcode information.
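As a rough illustration of the loop described above, here is a minimal sketch rather than the script from the project zip.  The function names and parameters are my own placeholders, the Ghostscript switches shown are the standard batch-conversion ones, and it assumes single-page PDFs (a multi-page file would need a page counter such as %03d in the output name).

```powershell
# Hypothetical helper: build the Ghostscript arguments for one PDF.
function Get-GhostscriptArguments {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        [Parameter(Mandatory)] [string] $PngPath,
        [int] $Dpi = 300
    )
    # Paths are wrapped in quotes so names containing spaces survive Start-Process.
    @(
        '-dNOPAUSE', '-dBATCH', '-dSAFER', '-dQUIET',
        '-sDEVICE=png16m',
        "-r$Dpi",
        "-sOutputFile=`"$PngPath`"",
        "`"$PdfPath`""
    )
}

# Hypothetical helper: convert every PDF in one folder to a PNG in another.
function Convert-PdfToPng {
    param(
        [Parameter(Mandatory)] [string] $GhostscriptExe,   # assumed path to gswin32c.exe
        [Parameter(Mandatory)] [string] $ScannedFolder,
        [Parameter(Mandatory)] [string] $ProcessedFolder,
        [int] $Dpi = 300
    )
    foreach ($pdf in Get-ChildItem -Path $ScannedFolder -Filter '*.pdf' -File) {
        $png = Join-Path $ProcessedFolder ($pdf.BaseName + '.png')
        $gsArgs = Get-GhostscriptArguments -PdfPath $pdf.FullName -PngPath $png -Dpi $Dpi
        # -Wait makes the loop block until Ghostscript has finished this file.
        Start-Process -FilePath $GhostscriptExe -ArgumentList $gsArgs -NoNewWindow -Wait
    }
}
```

Keeping the argument list in its own function makes it easy to experiment with the resolution or output device without touching the loop.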

Step 3
Again, using basic PowerShell, read all the files in the “/_Processed” folder and loop over them.

Process each PNG file with zBar in turn, and use the defined function “Get-ProcessOutput” to capture the output of zBar.  (Thanks to jackgruber for the work on this function.)

It is this output that contains the value of the barcode.
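The real “Get-ProcessOutput” function is in the project zip.  For readers who want to see the idea, the following is only a sketch of how such a function could be written with the .NET Process class; the parameter names and the extra StandardError and ExitCode properties are my assumptions, not necessarily what the original returns.

```powershell
# Sketch of a function that runs a console program and captures what it prints.
function Get-ProcessOutput {
    param(
        [Parameter(Mandatory)] [string] $FileName,
        [string] $Arguments = ''
    )
    $startInfo = New-Object System.Diagnostics.ProcessStartInfo
    $startInfo.FileName = $FileName
    $startInfo.Arguments = $Arguments
    $startInfo.UseShellExecute = $false          # required for stream redirection
    $startInfo.RedirectStandardOutput = $true
    $startInfo.RedirectStandardError = $true
    $startInfo.CreateNoWindow = $true

    $process = New-Object System.Diagnostics.Process
    $process.StartInfo = $startInfo
    [void] $process.Start()

    # Read stderr asynchronously while reading stdout, so neither pipe can fill up and block.
    $stderrTask = $process.StandardError.ReadToEndAsync()
    $stdout = $process.StandardOutput.ReadToEnd()
    $process.WaitForExit()

    [PSCustomObject]@{
        StandardOutput = $stdout
        StandardError  = $stderrTask.Result
        ExitCode       = $process.ExitCode
    }
}
```

Assuming the zBar command-line executable is zbarimg.exe and $zbarExe holds its path, a call might look like: Get-ProcessOutput -FileName $zbarExe -Arguments "`"$($png.FullName)`""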

Step 4
Using the return value from the function “Get-ProcessOutput”, access the StandardOutput property and perform string manipulation to get the clean value of the barcode.
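The exact string handling depends on your barcodes.  As a sketch, assuming zBar's default output of one “TYPE:value” line per barcode found (for example “CODE-128:12345”), a hypothetical helper could split each line on the first colon only, so that values which themselves contain colons are left intact.

```powershell
# Hypothetical helper: turn raw zBar output into objects with Type and Value.
function ConvertFrom-ZBarOutput {
    param([string] $StandardOutput)
    foreach ($line in ($StandardOutput -split "\r?\n")) {
        $line = $line.Trim()
        if ($line -eq '') { continue }          # skip blank lines
        # Split into at most two parts: symbology type and barcode value.
        $type, $value = $line -split ':', 2
        if ($null -ne $value) {
            [PSCustomObject]@{
                Type  = $type
                Value = $value.Trim()
            }
        }
    }
}
```

Lines without a colon are ignored, which keeps any stray status text out of the results.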

In my example, I created a simple object to hold all the information I wished to save to a CSV file.  This object is created using “PSCustomObject” and written to CSV with “Export-Csv”.

Having all the working values for the file being processed in the custom object makes it easy to copy the current file to the “/_Converted” folder and apply the correct name using the data taken from the previously read barcode.

This replaces the automatic file name that scanners like to assign with a piece of information that is useful to an end user.
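A minimal sketch of this step might look like the following.  The function name, the property names on the object and the CSV layout are all placeholders of mine; the script in the project zip may record different fields.

```powershell
# Hypothetical helper: record one result and copy the PDF under its barcode name.
function Save-BarcodeResult {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        [Parameter(Mandatory)] [string] $Barcode,
        [Parameter(Mandatory)] [string] $ConvertedFolder,
        [Parameter(Mandatory)] [string] $CsvPath
    )
    # Swap any character that is not allowed in a file name for an underscore.
    $safeName = $Barcode
    foreach ($c in [System.IO.Path]::GetInvalidFileNameChars()) {
        $safeName = $safeName.Replace($c, [char]'_')
    }

    $record = [PSCustomObject]@{
        OriginalName = Split-Path -Path $PdfPath -Leaf
        Barcode      = $Barcode
        NewName      = "$safeName.pdf"
        ProcessedAt  = (Get-Date).ToString('s')
    }

    # Copy (not move) so the original scan is still there if something goes wrong.
    Copy-Item -LiteralPath $PdfPath -Destination (Join-Path $ConvertedFolder $record.NewName)
    # -Append adds one row per file and creates the CSV on the first call.
    $record | Export-Csv -Path $CsvPath -Append -NoTypeInformation
    $record
}
```

Note that two scans carrying the same barcode would produce the same file name, so the second copy would overwrite the first; add a suffix if that can happen in your batch.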

Step 5
All that is left to do is remove the temporary PNG files from “/_Processed” and the original scan files from “/_Scanned”.

You may wish to disable the deletion of the original scans, or hold a copy elsewhere, in case there are unforeseen processing errors in the output.  There is nothing worse than taking hard copy back to the scanner!
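One way to make the deletion of the originals optional is a switch parameter.  This is a sketch only; the function name and the -KeepScans switch are my own invention.

```powershell
# Hypothetical helper: remove temporary PNGs and, unless told otherwise, the original scans.
function Remove-WorkingFiles {
    param(
        [Parameter(Mandatory)] [string] $ProcessedFolder,
        [Parameter(Mandatory)] [string] $ScannedFolder,
        [switch] $KeepScans
    )
    # The PNG files are only intermediate output, so they always go.
    Get-ChildItem -Path $ProcessedFolder -Filter '*.png' -File | Remove-Item

    if (-not $KeepScans) {
        Get-ChildItem -Path $ScannedFolder -Filter '*.pdf' -File | Remove-Item
    }
}
```

Running it with -KeepScans while you are still testing leaves the scanned PDFs in place until you trust the output.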

Conclusion

Please use the project zip to help with your own barcode processing of PDF files.  There are options to tweak in Ghostscript and zBar, along with the string handling of the output, to suit your needs.  You might also need to tune performance if you are running thousands of PDF files at once.

Ideally, I like to run all this from a subfolder within my user's “Downloads” folder.  It's usually the one place that is free from backup or replication processes and, crucially, where I have read, write and execute permissions as a basic user.
