From mboxrd@z Thu Jan 1 00:00:00 1970
From: Danny Milosavljevic
Subject: bug#27563: [PATCH v3 2/2] gnu: ghostscript: Write document ID only when encrypting.
Date: Fri, 7 Jul 2017 15:21:49 +0200
Message-ID: <20170707152149.3235f3aa@scratchpost.org>
References: <20170703200844.3f6d9e19@scratchpost.org> <20170706103216.25939-1-dannym@scratchpost.org> <20170706103216.25939-3-dannym@scratchpost.org> <87podca20z.fsf@gnu.org>
In-Reply-To: <87podca20z.fsf@gnu.org>
To: Ludovic Courtès
Cc: 27563@debbugs.gnu.org

Hi Ludo,

On Fri, 07 Jul 2017 14:02:04 +0200 ludo@gnu.org (Ludovic Courtès) wrote:

> Also, do you know whether the PDF specs are OK with that?

Yeah, at the upstream bug link we discussed that (somewhat).
While they don't want to carry the patches (because they don't want to lose functionality) they explained that it might well be that *future* versions of the spec could make ID and UUID mandatory.

Right now there's a stringent spec, called PDF/A (for "archiving"; it is intended for governing bodies where you don't want existing documents that dynamically alter their contents after some time - like with JavaScript or something), which already sets the instance UUID to "". So I just set it to "" always rather than just for PDF/A.

Also, as far as I understand, the "/ID" is currently only mandatory when encrypting, although in the future that might change.

That leaves the document UUID - and upstream, in some of the other bug reports, explained that they want UNIQUE document UUIDs. So I figured that we should just leave it off - so it's not the same over multiple documents. They are definitely not fine with non-unique UUIDs.

This RDF metadata stuff (the instance UUID and document UUID) is quite new. In a former life I wrote PDF parsers and I didn't handle the RDF back then at all. So I guess it would even work to leave the entire RDF metadata off - after all, it worked back then.

If someone is well-versed in XMP RDF metadata for PDF, I wonder what is better: leaving the entire RDF off or just leaving the element containing the document ID (as an attribute) off. Currently, the patch does the latter. The specification by Adobe (XMP Specification Part 1, ISO 16684-1:2011(E) Annex A) says "The use of robust GUIDs is encouraged; having globally unique values is important" but as far as I can see doesn't say whether they are mandatory.

I also thought of patching groff instead. But it seems that groff is now searching for a maintainer - I'm not sure anyone would integrate it there. Also, I'm not well-versed in Perl.
Also, patching finished PDFs (using regexps or something) is kinda dangerous because nobody *forces* you to encode the streams (think: attachments) in PDFs. So it could be that some other non-PDF thing is embedded in the PDF as a stream, and the regexp substituter would just substitute in there as well.

There's a program "pdfmark" which is supposed to be for changing the metadata of PDFs, but upstream said that it can't change those fields. It could change the CreationDate, ModDate etc.

In short, I think the lowest risk is patching Ghostscript as we did here.
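
To make the regexp concern above concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the byte string is a heavily simplified stand-in and not a valid PDF, and the pattern is just one naive way somebody might try to blank the "/ID" entry. It only shows that a blind substitution cannot tell the trailer entry apart from look-alike bytes inside an unencoded stream.

```python
import re

# Hypothetical, heavily simplified PDF-like bytes (NOT a valid PDF).
# The stream stands in for an attachment stored without any encoding,
# so its contents appear verbatim in the file.
FAKE_PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Length 44 >>\nstream\n"
    b"notes.txt: the trailer holds /ID [<AA><BB>]\n"
    b"endstream\nendobj\n"
    b"trailer\n<< /Root 2 0 R /ID [<0123><4567>] >>\n"
    b"%%EOF\n"
)

# Naive pattern (an assumption): any "/ID [<hex><hex>]" anywhere in the file.
ID_PATTERN = re.compile(rb"/ID \[<[0-9A-Fa-f]*><[0-9A-Fa-f]*>\]")


def strip_ids(pdf_bytes):
    """Blindly blank every match; return (new_bytes, number_of_matches)."""
    return ID_PATTERN.subn(b"/ID [<><>]", pdf_bytes)


patched, count = strip_ids(FAKE_PDF)
# Both the trailer entry and the bytes inside the attachment stream match.
print(count)
```

The attachment contents get rewritten along with the trailer, which is exactly the kind of silent corruption that patching the generator avoids.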