Somebody please package the html into a pdf!

etiam · on April 7, 2016

Hacky solution: On a suitably equipped unix-like system one might...

  mkdir dlbook;cd dlbook;wget --recursive --level=1 http://www.deeplearningbook.org/

  cd www.deeplearningbook.org/contents

  python
  import pdfkit
  pdfkit.from_file(
  ["TOC.html","acknowledgements.html","notation.html","intro.html","part_basics.html","linear_algebra.html","prob.html","numerical.html","ml.html","part_practical.html","mlp.html","regularization.html","optimization.html","convnets.html","rnn.html","guidelines.html","applications.html","part_research.html","linear_factors.html","autoencoders.html","representation.html","graphical_models.html","monte_carlo.html","partition.html","inference.html","generative_models.html","bib.html","index-.html"], 
  "Goodfellow-et-al-2016-Book.pdf")

Better solutions?

edit: changed ".pdf" from a slightly longer approach to ".html" which actually exists in this workflow. Thanks @TheCabin!

edit(2): ... and gave a valid path to this listdir. Check before I post... check before I post...

edit(3): ... and removed the os.listdir line no longer needed in this approach. Gosh. Just ignore what I was saying and build your own approach. That'll probably be faster at this rate.

edit(4): Don't import os without using it.

yorwba · on April 7, 2016

Just found this gist: https://gist.github.com/luoyetx/a44eea84272123f608dcf737588c...

Avoids the awkward file system structure by using pdfkit.from_url, but creates one .pdf for each chapter. I tried using a list of urls, but pdfkit failed because my version of wkhtmltopdf did not accept multiple input files.

Edit: pdfkit.from_file also fails on my system when passing multiple files. If that works for you, multiple urls are probably fine, too.
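
For anyone who wants to try the one-PDF-per-chapter route without the gist, here is a minimal sketch. It is not the gist's code. The base URL is an assumption inferred from the directory layout that the wget command above produces, and chapter_jobs and convert_chapters are hypothetical helper names.

```python
# Assumed URL prefix, inferred from the www.deeplearningbook.org/contents
# directory that the wget command in this thread creates.
BASE_URL = "http://www.deeplearningbook.org/contents/"


def chapter_jobs(chapters, base_url=BASE_URL):
    """Return one (url, output_pdf) pair per chapter base name."""
    return [(base_url + name + ".html", name + ".pdf") for name in chapters]


def convert_chapters(chapters):
    """Fetch and convert each chapter separately, one wkhtmltopdf run per URL."""
    import pdfkit  # needs the pdfkit package and a wkhtmltopdf binary

    for url, pdf in chapter_jobs(chapters):
        pdfkit.from_url(url, pdf)
```

Calling convert_chapters(["intro", "linear_algebra"]) should leave intro.pdf and linear_algebra.pdf in the current directory, provided your wkhtmltopdf renders the pages at all.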

TheCabin · on April 7, 2016

This isn't working, is it? Where is the script supposed to get the files ..., linear_algebra.pdf,... from?

etiam · on April 7, 2016

Absolutely right. That's erroneously copied from converting each HTML file separately with the intention to merge afterwards. Thank you for pointing that out!

ninov · on April 7, 2016

Only returns a blank pdf page for me...

TheCabin · on April 7, 2016

In my case too. I guess it is related to the wkhtmltopdf version and the woff fonts.

etiam · on April 7, 2016

For what it's worth, I got pretty decent results converting one file at a time with the Ubuntu repository version (0.9.9). But that version doesn't work for collecting everything into a single output file, no.

I think it was a mistake to try shaving off a couple of lines and a step at the price of a more convoluted install and a brittle process. Most people would probably be best off just converting one HTML file at a time to PDF (e.g. pdfkit.from_file(filename_in_html, filename_in_html[:-5] + ".pdf") in some sort of iteration over the file names) and then concatenating the resulting PDFs, for instance from the command line with

  pdftk TOC.pdf acknowledgements.pdf notation.pdf intro.pdf part_basics.pdf linear_algebra.pdf prob.pdf numerical.pdf ml.pdf part_practical.pdf mlp.pdf regularization.pdf optimization.pdf convnets.pdf rnn.pdf guidelines.pdf applications.pdf part_research.pdf linear_factors.pdf autoencoders.pdf representation.pdf graphical_models.pdf monte_carlo.pdf partition.pdf inference.pdf generative_models.pdf bib.pdf index-.pdf cat output Goodfellow-et-al-2016-Book.pdf

Sorry if I contributed to wasting your time.
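
A rough sketch of that two-step approach, for whoever wants to script it. It assumes pdfkit, wkhtmltopdf and pdftk are all installed and that it is run from the contents directory. pdf_name, pdftk_command and build_book are hypothetical names, not part of any library.

```python
import subprocess

# Chapter base names in reading order, as listed earlier in this thread.
CHAPTERS = [
    "TOC", "acknowledgements", "notation", "intro", "part_basics",
    "linear_algebra", "prob", "numerical", "ml", "part_practical", "mlp",
    "regularization", "optimization", "convnets", "rnn", "guidelines",
    "applications", "part_research", "linear_factors", "autoencoders",
    "representation", "graphical_models", "monte_carlo", "partition",
    "inference", "generative_models", "bib", "index-",
]


def pdf_name(html_name):
    """Map 'intro.html' to 'intro.pdf' by swapping the final extension."""
    stem, _, _ = html_name.rpartition(".")
    return stem + ".pdf"


def pdftk_command(pdf_files, output):
    """Build the pdftk argument list that concatenates pdf_files into output."""
    return ["pdftk", *pdf_files, "cat", "output", output]


def build_book(output="Goodfellow-et-al-2016-Book.pdf"):
    """Convert every chapter on its own, then merge the PDFs with pdftk."""
    import pdfkit  # imported here so the helpers above work without pdfkit

    pdf_files = []
    for chapter in CHAPTERS:
        html = chapter + ".html"
        pdf = pdf_name(html)
        pdfkit.from_file(html, pdf)  # one wkhtmltopdf run per input file
        pdf_files.append(pdf)
    # check=True raises if pdftk exits with a non-zero status.
    subprocess.run(pdftk_command(pdf_files, output), check=True)
```

Run build_book() from inside www.deeplearningbook.org/contents. Keeping the command builder separate makes it easy to print the pdftk line and check it before anything is executed.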

bootload · on April 7, 2016

"Can I get a PDF of this book? No, our contract with MIT Press forbids distribution of too easily copied electronic formats of the book."

dandermotj · on April 7, 2016

Look, I'm not planning on printing the whole thing, binding it and sticking it on my shelf. I want the pdf because then I can use it when I'm offline, across devices and search it easily.

If I want this book in hard copy, then I will purchase it (I've done this regularly with free digital books). But when it is offered free digitally, then in my opinion restricting it to certain file formats is futile (as evidenced here), and such constraints are ineffective attempts to encourage people to buy the hard copy through inconvenience.

And I must add that this is no slight to the authors, who have my greatest appreciation for compiling their vast knowledge into a book and offering it for free. These guys are legends.

bootload · on April 8, 2016

@dandermotj I understand that, just included the reason why you cannot get it. The online book really sucks. I turned the styling off.

cced · on April 7, 2016

This type of thing is what gets people thinking about a way around the no-pdf solution.

thinkMOAR · on April 7, 2016

What does the contract say about 3rd parties wrapping it in PDF?

whatok · on April 7, 2016

I was able to print to PDF in Chrome and combined the PDFs in Acrobat.

TheCabin · on April 7, 2016

I guess this could be automated. For instance, you could download all html files using a plugin like "Download Them All" with a renaming mask like "inum-nameinum.ext" and then try:

find . -iname '*.html' -exec wkhtmltopdf {} {}.pdf \;

There are also tools to merge the resulting files into a single pdf. The only problem I have is that the woff fonts are not rendered by "wkhtmltopdf" :-/ Ideas?
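
If find is not available, roughly the same loop can be sketched in Python. This assumes wkhtmltopdf is on the PATH. html_files, wkhtmltopdf_command and convert_all are hypothetical helper names, and it does nothing about the woff font problem.

```python
import subprocess
from pathlib import Path


def html_files(root="."):
    """All files under root with a .html extension, ignoring case (like -iname)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".html"
    )


def wkhtmltopdf_command(html_path):
    """Mirror `wkhtmltopdf {} {}.pdf`: the output name keeps the .html part."""
    return ["wkhtmltopdf", str(html_path), str(html_path) + ".pdf"]


def convert_all(root="."):
    """Convert every HTML file found below root, one wkhtmltopdf run each."""
    for path in html_files(root):
        # check=True stops at the first failed conversion; find would carry on.
        subprocess.run(wkhtmltopdf_command(path), check=True)
```

As with the find one-liner, TOC.html ends up as TOC.html.pdf.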

make3 · on April 7, 2016

you mean, like people have proposed multiple times already in this thread (with actual working scripts)

TheCabin · on April 8, 2016

Who did before? Also, this solution would work with a newer release of wkhtmltopdf.

ing33k · on April 7, 2016

In Google Chrome you can press Ctrl+P (print) and save as PDF.

plingamp · on April 7, 2016

Saving as PDF was not working in Chrome on OS X. However, using Safari with the "show no footers" option generated a perfect PDF.

newman314 · on April 7, 2016

Literally, the first link is a link to a pdf version (granted it's probably not quite intuitive).

For the lazy: http://www.deeplearningbook.org/front_matter.pdf

Personally, I was looking for an ePub version but no biggie.

heinrichf · on April 7, 2016

A pdf of the front matter.

jjawssd · on April 7, 2016