Combining PDF files

I frequently need to combine several PDF files into one large PDF so that I don’t have to send a mess of small files via email. Until now I have done this without much trouble using Ghostscript, but I decided that my usual command is inefficient:

gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=combinedpdf.pdf -dBATCH 1.pdf 2.pdf 3.pdf
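
If you stay with Ghostscript, a small shell function can make the command easier to remember. The following is only a sketch: the name pdfmerge and the DRY_RUN variable are my own invention, not part of Ghostscript, and it assumes that gs is on your PATH. It uses the -sOutputFile spelling given in the Ghostscript documentation.

```shell
# Hypothetical helper (not part of Ghostscript): merge PDFs into one file.
# Usage: pdfmerge OUTPUT.pdf INPUT1.pdf INPUT2.pdf ...
# Set DRY_RUN=1 to print the gs command instead of running it.
pdfmerge() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pdfmerge OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]" >&2
        return 1
    fi
    pdfmerge_output=$1
    shift
    # Rebuild the argument list as the full gs command line;
    # the remaining arguments are the input files, kept in order.
    set -- gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite "-sOutputFile=$pdfmerge_output" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: show the command that would be executed.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `pdfmerge combinedpdf.pdf 1.pdf 2.pdf 3.pdf` would run the same merge as the command above, and prefixing it with DRY_RUN=1 only prints the command without touching any files.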

Later on I found a terminal-based application called PDFtk that allowed for a more easily remembered command:

pdftk PART1.pdf PART2.pdf PART3.pdf cat output COMBINED.pdf

where one simply replaces the capitalised portions with the names of the input and output PDF files. However, to use that utility on Gentoo, one has to compile sys-devel/gcc with the gcj USE flag enabled, which builds GCC with support for the Java programming language. While this was not necessarily a big dependency, I didn’t feel like recompiling GCC with Java support just to use a terminal utility. Instead, I wanted a lightweight GTK GUI application for basic PDF tasks, and I found one in PDF-Shuffler.

This application is minimalistic and easy to use, and it handles a few PDF tasks very nicely. To quote the project’s SourceForge page, “PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.” I have not used it to rotate any documents, but it takes care of the other tasks effectively and efficiently. Even better, it has few dependencies that were not already on my system.

I have filed a stabilisation request (STABLEREQ) for this application as well as its two explicitly listed dependencies (python-poppler and pyPdf) in the Gentoo Bugzilla. If you use this application or either of its dependencies, please comment on your experiences, especially regarding runtime stability.

I hope that some of you find this application to be as helpful as I have. 🙂

Take care for now,
Zach

13 comments

    • Tomas on Thursday, 3 December 2009 at 11:42

    There is also pspdftool on sourceforge. It should have minimal dependencies as well.

    • Zach on Wednesday, 2 December 2009 at 21:25

    @Michael,

    I have not, but I would be interested in seeing it in action. I am trying to find a list of the dependencies to see whether or not it will pull in many components of GNOME. I like my lightweight Openbox setup. 😉

    Thank you for the recommendation!

    • Michael on Wednesday, 2 December 2009 at 20:29

    Have you tried pdfmod (http://live.gnome.org/PdfMod)? It is available in the suka overlay.

    • Zach on Wednesday, 2 December 2009 at 16:12

    @Andre,

    PDFjam is another good choice, and I believe it uses LaTeX. Thanks for mentioning it.

    @Toralf,

    I mentioned a strange error in the bug report, but I didn’t see those warnings in particular. You may want to add them to the report:

    http://bugs.gentoo.org/show_bug.cgi?id=295393

    @jkt,

    Yup, pdfjoin (as part of the pdfjam package) will do the trick as well. Thanks for bringing it to my attention.

    @Karl,

    PDFsam is also written in Java, and that doesn’t work for my particular needs, but thank you for mentioning it here so that others may readily find it.

    @luke123,

    Thanks for the shorter, more efficient command using pdftk. That will work nicely if the PDF files are all numbered accordingly.

    • luke123 on Wednesday, 2 December 2009 at 15:21

    pdftk PART[1-3].pdf cat output COMBINED.pdf

    • Karl E. Brunk on Wednesday, 2 December 2009 at 13:22

    Have a look at pdfsam. It works great.

    • jkt on Wednesday, 2 December 2009 at 12:03

    What about pdfjoin from app-text/pdfjam?

    • Toralf Förster on Wednesday, 2 December 2009 at 10:34

    Nice tool. It works fine here too on x86, although some more warnings were shown:

    tfoerste@n22 /mnt/E/my/kochen $ pdfshuffler
    /usr/lib/python2.6/site-packages/pyPdf/pdf.py:52: DeprecationWarning: the sets module is deprecated
    from sets import ImmutableSet
    /usr/lib/python2.6/site-packages/pyPdf/generic.py:406: DeprecationWarning: object.__init__() takes no parameters
    str.__init__(self, data)
    /usr/lib/python2.6/site-packages/pyPdf/generic.py:216: DeprecationWarning: object.__init__() takes no parameters
    int.__init__(self, value)
    (u'exporting to:', '/mnt/E/my/kochen/sdf.pdf')
    /usr/lib/python2.6/site-packages/pyPdf/pdf.py:163: DeprecationWarning: the md5 module is deprecated; use hashlib instead
    import struct, md5

    • Andre on Wednesday, 2 December 2009 at 09:45

    There’s also pdfjam for the task,
    and I am very happy with it.

    • Zach on Wednesday, 2 December 2009 at 07:32

    Excellent! While PDFtk didn’t work for my needs, I hope it meets yours. Take care.

    –Zach

  1. Thanks for the report. I will have a look at pdftk.

    • Zach on Wednesday, 2 December 2009 at 04:51

    No problem at all. I hope you find the application useful! 🙂

    –Zach

  2. Thanks for the suggestion.

    I found the gcj dependency for PDFtk cumbersome too.

    I will definitely check it out.

    Kamil.
