Monday, April 22, 2013

Update to Jsunpack PDF parsing


Hey guys, I just added a patch from David Dorsey of Visiblerisk, Inc. (Thanks David, you are a boss!).

Below is a sample PDF you can test with just to see how awesome it is:
http://jsunpack.jeek.org/?report=2afae1f7a9b2552f2e38713e47c3371cc8a2d23c

David described a lot of the improvements and the analysis he performed in the following blog posts, entitled "Analyzing Malicious PDFs or: How I Learned to Stop Worrying and Love Adobe Reader":
Part 1: http://visiblerisk.com/blog/2013/4/8/analyzing-malicious-pdfs-or-how-i-learned-to-stop-worrying-a.html
Part 2: http://visiblerisk.com/blog/2013/4/15/analyzing-malicious-pdfs-or-how-i-learned-to-stop-worrying-a.html

In brief, this update improves pdf.py's XFA parsing and its handling of PDF encryption tags. In general, it should help you decode some malicious PDFs that jsunpackn.py had trouble decoding before.

Thanks to David! If you see any bugs related to this update, please report them at https://code.google.com/p/jsunpack-n/issues/list and I'll fix them.

Blake

4 comments:

  1. I have a question about pdf.py. Does pdf.py handle obfuscated feature names? For instance, /Page can be obfuscated with hex values, like /P#61ge. A normal string search for /Page won't catch this, because the literal text never appears in the file, but I'm wondering if pdf.py does?

    1. Yes, pdf.py has a function to handle #XX values. It is defined as:
      @staticmethod
      def fixPound(i):
          # returns '#3a' substituted with ':', etc
          # strips newlines, '[', and ']' characters
          # this allows indexing in arrays
          i = re.sub('[\[\]\n]', '', i)
          i = re.sub('<<$', '', i)
          return re.sub('#([a-fA-F0-9]{2})', lambda mo: chr(int('0x' + mo.group(1), 0)), i)
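
To see the #XX substitution on its own, here is a minimal standalone sketch of just the last step of that function. The name decode_pdf_name is made up for this illustration and is not part of pdf.py, and the sketch skips the bracket and newline stripping that fixPound also does.

```python
import re

def decode_pdf_name(name):
    """Replace each #XX hex escape in a PDF name with the character it encodes.

    Hypothetical helper for illustration only; not part of pdf.py.
    """
    return re.sub(r'#([0-9a-fA-F]{2})',
                  lambda mo: chr(int(mo.group(1), 16)),
                  name)

print(decode_pdf_name('/P#61ge'))        # /Page
print(decode_pdf_name('/Open#41ction'))  # /OpenAction
```

After decoding, a plain comparison against /Page or /OpenAction works again.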

    2. Hey and thanks for the reply.

      I was wondering whether handling name obfuscation is something you should only do after you have parsed the document? I.e., a brute-force method of scanning for "#", then checking the next two bytes and converting them, can cause trouble if the "match" you found wasn't part of a feature name (/Page, /OpenAction, etc.).

  2. pdf.py handles name obfuscation after parsing the document. The only reason is that if the decoding happened first, the decoded tags might contain ending tags and mess up the parsing of the document later.
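
A rough sketch of that ordering problem, assuming a deliberately naive dictionary reader. The helpers decode_hex and dict_body are invented for this example and are not how pdf.py actually parses; the point is only that a name hiding '>>' as #3E#3E is harmless when decoded after parsing, and cuts the dictionary short when decoded before.

```python
import re

HEX_ESCAPE = re.compile(r'#([0-9a-fA-F]{2})')

def decode_hex(text):
    """Replace #XX escapes with the characters they encode (hypothetical helper)."""
    return HEX_ESCAPE.sub(lambda mo: chr(int(mo.group(1), 16)), text)

def dict_body(text):
    """Naively return the text between '<<' and the first '>>' that follows it."""
    start = text.index('<<') + 2
    return text[start:text.index('>>', start)]

# 0x3E is '>', so this name hides a '>>' inside it.
raw = '<< /Key#3E#3E /Value >>'

# Parse first, then decode each token: both entries survive.
parsed_then_decoded = [decode_hex(tok) for tok in dict_body(raw).split()]

# Decode first, then parse: the decoded '>>' ends the dictionary early.
decoded_then_parsed = dict_body(decode_hex(raw)).split()

print(parsed_then_decoded)  # ['/Key>>', '/Value']
print(decoded_then_parsed)  # ['/Key']
```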
