Monday, April 22, 2013

Update to Jsunpack PDF parsing

Hey guys, I just added a patch from David Dorsey of Visiblerisk, Inc. (Thanks David, you are a boss!).

Below is a sample PDF you can test with just to see how awesome it is:

David described many of the improvements, and the analysis he performed, in a series of blog posts entitled "Analyzing Malicious PDFs or: How I Learned to Stop Worrying and Love Adobe Reader":
Part 1:
Part 2:

In brief, this update improves Jsunpack's XFA parsing and its handling of PDF encryption tags. In general, it will help you decode some malicious PDFs that Jsunpack had trouble decoding before.

Thanks again to David. If you see any bugs related to this update, please report them and I'll fix them.



  1. I have a question about Jsunpack: does it handle obfuscated feature names? For instance, /Page can be obfuscated with hex values like /P#61ge. A normal string search for /Page won't catch this for obvious reasons, but I'm wondering whether Jsunpack does.

    1. Yes, Jsunpack has a function to handle #XX values. It is defined as:
      def fixPound(i):
          # Returns '#3a' substituted with ':', etc.
          # Strips newlines, '[', and ']' characters;
          # this allows indexing in arrays.
          i = re.sub('[\[\]\n]', '', i)
          i = re.sub('<<$', '', i)
          return re.sub('#([a-fA-F0-9]{2})', lambda mo: chr(int('0x' + mo.group(1), 0)), i)

    2. Hey and thanks for the reply.

      I was wondering whether handling name obfuscation is something one should only do after parsing the document. That is, a brute-force method of scanning for "#", then checking the next two bytes and converting them, can cause trouble if the "match" you found wasn't part of a feature name (/Page, /OpenAction, etc.).

  2. Jsunpack handles name obfuscation after parsing the document. The only reason is that if decoding happened first, the decoded tags might contain ending tags and mess up parsing of the document later.
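
To make the ordering issue concrete, here is a minimal sketch in Python 3. It is not Jsunpack's code: `decode_name`, `dict_body`, and `names_in` are hypothetical helpers, and the "parser" is a toy that simply cuts a dictionary at the first `>>`. The sample dictionary hides an ending tag inside a name as `/K#3E#3E`.

```python
import re

HEX_ESCAPE = re.compile(r'#([0-9a-fA-F]{2})')

def decode_name(text):
    # Hypothetical helper: replace each #XX escape with the character it encodes.
    return HEX_ESCAPE.sub(lambda mo: chr(int(mo.group(1), 16)), text)

def dict_body(text):
    # Toy parser (assumption): the dictionary ends at the first '>>' after '<<'.
    start = text.index('<<') + 2
    return text[start:text.index('>>', start)]

def names_in(body):
    # Extract name tokens: '/' followed by characters that are not delimiters.
    return re.findall(r'/[^\s/<>\[\]()]+', body)

raw = '<< /Type /P#61ge /K#3E#3E /OpenAction 5 0 R >>'

# Parse first, then decode each name: every key survives.
parse_then_decode = [decode_name(n) for n in names_in(dict_body(raw))]

# Decode first, then parse: the decoded '>>' ends the dictionary early,
# so /OpenAction is never seen.
decode_then_parse = names_in(dict_body(decode_name(raw)))

print(parse_then_decode)
print(decode_then_parse)
```

Decoding only the tokens the parser has already identified as names also sidesteps the brute-force problem raised above, because a "#" that appears in a string or stream is never touched.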