From: Maxim Cournoyer
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Thu, 25 Jan 2018 00:11:26 -0500
Message-ID: <87po5y8qv5.fsf@gmail.com>
References: <87r2qrc3mq.fsf@gmail.com> <87o9lu6o9m.fsf@gnu.org> <87607vu9dp.fsf@gmail.com> <877esb84ae.fsf@netris.org> <87o9lmp3c2.fsf@gnu.org> <87k1w9ryhz.fsf@gmail.com> <87o9lk64xz.fsf@gnu.org>
In-Reply-To: <87o9lk64xz.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 23 Jan 2018 15:11:04 +0100")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Bug reports for GNU Guix
To: Ludovic Courtès
Cc: 30116@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> Maxim Cournoyer skribis:
>
>> In the `patch-el-files' phase of the emacs-build-system, we find the
>> following snippet:
>>
>>     (with-directory-excursion el-dir
>>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>>       ;; with the "ISO-8859-1" locale.
>>       (unless (false-if-exception (substitute-cmd))
>>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>>           (substitute-cmd))))
>>
>> In case an exception is returned while processing the file, it is
>> retried being opened with the "ISO-8859-1" encoding. Or, this resolves
>> to a call to `open-file', which documentation says:
>>
>>     ‘b’
>>          Use binary mode, ensuring that each byte in the file will be
>>          read as one Scheme character.
>>
>>          To provide this property, the file will be opened with the
>>          8-bit character encoding "ISO-8859-1", ignoring the default
>>          port encoding.  *Note Ports::, for more information on port
>>          encodings.
>>
>> So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
>> we are doing the same as if we had passed the 'binary option. Could this
>> explain why we end up with NUL characters where we were expecting text?
>
> That could be the reason.  Guile provides a way to honor Emacs-style
> ‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
> #:guess-encoding #t (info "(guile) Character Encoding of Source Files").
>
> Did the faulty file have such a declaration?

Sadly, it doesn't. Although even if it did, I don't think it would be
very robust to expect every misbehaving file we might encounter to
include one!

So I think we should apply my v2 patch to core-updates for now (see my
previous reply on this thread), until we have our substitute routine
implemented using srfi-115!

Thanks,

Maxim