[h5md-user] [Provenance] Attaching metadata to PDF files


From: Peter Colberg
Subject: [h5md-user] [Provenance] Attaching metadata to PDF files
Date: Thu, 23 May 2013 17:19:40 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi all,

Sorry for the following off-topic post, but I have to share this tidbit.

Recently, I was looking for a way to attach metadata to a plot file in
PDF format, such as a list of files used to generate the plot, a set
of plot parameters, or even the plot script itself.

PDF has two ways of storing metadata: an info dictionary with a
predefined set of attributes, and additional streams in XMP format.
The former can be modified using the pdf backend of matplotlib [1],
but does not allow arbitrary fields. The latter, free-form metadata,
is inaccessible from matplotlib.

But it turns out that PDF supports file attachments [2].

[1] http://matplotlib.org/api/backend_pdf_api.html#matplotlib.backends.backend_pdf.PdfPages.infodict
[2] http://blogs.adobe.com/insidepdf/2010/11/pdf-file-attachments.html

So I appended this small snippet to my plot script:

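  import os
  import shutil
  import subprocess
  import tempfile
  import h5py as h5

  # args, fig, mean, var, T, bins and hist are defined earlier in the plot script
  path = tempfile.mkdtemp()
  fn = os.path.join(path, os.path.basename(args.output))
  fig.savefig(fn + ".pdf")
  # store the plot parameters and numerical data in a temporary HDF5 file
  f = h5.File(fn + ".h5", "w")
  f.attrs["input"] = args.input
  f.attrs["nbin"] = args.nbin
  f.attrs["range"] = args.range
  f.attrs["moment"] = (mean, var)
  f.attrs["temperature"] = T
  f.create_dataset("bins", data=bins)
  f.create_dataset("hist", data=hist)
  f.close()
  # keep a copy of the plot script itself
  shutil.copy(__file__, fn + ".py")
  # attach both files to the plot and write the result to args.output
  subprocess.check_call(["pdftk", fn + ".pdf", "attach_files",
                         fn + ".h5", fn + ".py", "output", args.output])
  shutil.rmtree(path)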

Using the pdftk tool, the resulting PDF file contains not only the
plot itself, but also the numerical data, the input filenames, plot
parameters, and the source code. When needed, the metadata is later
extracted using `pdftk <filename> unpack_files'.
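
For completeness, the extraction step could also be scripted. The following is only a rough sketch, assuming pdftk is on the PATH; the function names build_unpack_command and unpack_attachments are made up for illustration and are not part of my plot script.

```python
import os
import subprocess
import tempfile


def build_unpack_command(pdf_path, out_dir):
    """Return the pdftk argument list that unpacks attachments into out_dir."""
    # "unpack_files output <dir>" writes every attachment into <dir>
    return ["pdftk", pdf_path, "unpack_files", "output", out_dir]


def unpack_attachments(pdf_path):
    """Unpack all attachments of pdf_path into a fresh temporary directory.

    Returns the directory and the sorted names of the extracted files.
    Requires the pdftk executable (assumed to be installed).
    """
    out_dir = tempfile.mkdtemp()
    subprocess.check_call(build_unpack_command(pdf_path, out_dir))
    return out_dir, sorted(os.listdir(out_dir))
```

The unpacked .h5 file can then be opened read-only with h5py to get the stored attributes and datasets back. The caller is responsible for removing the temporary directory afterwards.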

Regards,
Peter


