Monday, May 18, 2009

PDF Decoding Bugfix and Open Source from Adobe

I fixed a bug today that was causing some of the scripts to fail decoding.
Basically, the JavaScript contained within a PDF file can be stored in a special tag that escapes special characters such as (, ), &, \, and so on. As a result, some of the regular expressions in the decoded script would come out with doubled backslashes, like this:
(hOPz).replace(/\\&/g,BiY+(13+20-8))
p=p.replace(new RegExp('\\\\b'+e(c)+'\\\\b','g'),k[c])

In those cases, replacing each '\\' with a single '\' fixes the problem:
(hOPz).replace(/\&/g,BiY+(13+20-8))
p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])
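
The substitution described above can be sketched in a few lines of JavaScript. This is only an illustration, not the decoder's actual code: the function name collapseEscapedBackslashes is hypothetical, and it assumes the only leftover escaping in the extracted script is the doubled backslash.

```javascript
// Hypothetical helper (not the decoder's real code): collapse every
// doubled backslash in an extracted script into a single backslash.
// Assumption: doubled backslashes are the only escaping left to undo.
function collapseEscapedBackslashes(script) {
  // /\\\\/g matches two literal backslash characters;
  // the replacement string '\\' is one literal backslash.
  return script.replace(/\\\\/g, '\\');
}

// String.raw keeps the backslashes exactly as typed.
const broken = String.raw`p=p.replace(new RegExp('\\\\b'+e(c)+'\\\\b','g'),k[c])`;
console.log(collapseEscapedBackslashes(broken));
```

Because the match is global and non-overlapping, a run of four backslashes becomes two, which is what the second example needs. A script that legitimately contains a doubled backslash after unescaping would be altered too, so a real decoder would be better off undoing the escaping once, at the point where the tag is parsed.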


This bug has been corrected for all future decodings; see examples here and here.

I originally thought this might be related to Adobe Reader's custom JavaScript engine, since Adobe uses a modified version of SpiderMonkey. I would still like to integrate Adobe Reader's custom JavaScript engine when processing PDF files, and their website says:
In some Adobe products, Adobe uses a modified version of the open source SpiderMonkey code. Use of that source is subject to the Mozilla Public License Version 1.1 (the License). You may obtain a copy of the License on the Mozilla website or in the download files.

However, read on and you will see "Download files - Coming soon!". There goes that plan! Anyone at Adobe care to comment on when they plan to release this code?
