From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Bernd Paysan via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#37633: Column part interpreted wrong in compilation mode Date: Sun, 06 Oct 2019 21:02:14 +0200 Message-ID: <7240153.3ZlepMpCQE@daiyu> References: <2282407.NbK4RY0fEn@daiyu> <20191006123112.ej2heyy2qudfcvep@a4.complang.tuwien.ac.at> <831rvp3glu.fsf@gnu.org> Reply-To: Bernd Paysan Mime-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart21515511.aIRQMYrt1L"; micalg="pgp-sha256"; protocol="application/pgp-signature" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="88916"; mail-complaints-to="usenet@blaine.gmane.org" Cc: anton@mips.complang.tuwien.ac.at, 37633@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sun Oct 06 21:04:55 2019 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iHBpu-000Mzx-CT for geb-bug-gnu-emacs@m.gmane.org; Sun, 06 Oct 2019 21:04:54 +0200 Original-Received: from localhost ([::1]:36428 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iHBps-0000bo-Bp for geb-bug-gnu-emacs@m.gmane.org; Sun, 06 Oct 2019 15:04:52 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:50763) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iHBo7-0007ay-O1 for bug-gnu-emacs@gnu.org; Sun, 06 Oct 2019 15:03:04 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iHBo6-0001sa-Bo for bug-gnu-emacs@gnu.org; Sun, 06 Oct 2019 15:03:03 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:38059) by eggs.gnu.org with esmtps 
(TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1iHBo6-0001sT-7y for bug-gnu-emacs@gnu.org; Sun, 06 Oct 2019 15:03:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1iHBo6-0002CA-3V for bug-gnu-emacs@gnu.org; Sun, 06 Oct 2019 15:03:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Bernd Paysan Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 06 Oct 2019 19:03:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 37633 X-GNU-PR-Package: emacs Original-Received: via spool by 37633-submit@debbugs.gnu.org id=B37633.15703885428378 (code B ref 37633); Sun, 06 Oct 2019 19:03:02 +0000 Original-Received: (at 37633) by debbugs.gnu.org; 6 Oct 2019 19:02:22 +0000 Original-Received: from localhost ([127.0.0.1]:46880 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iHBnS-0002B3-GT for submit@debbugs.gnu.org; Sun, 06 Oct 2019 15:02:22 -0400 Original-Received: from mail.net2o.de ([185.183.156.191]:50400) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iHBnQ-0002At-56 for 37633@debbugs.gnu.org; Sun, 06 Oct 2019 15:02:21 -0400 Original-Received: from daiyu.localnet (200116b826959f009a939674d530470e.dip.versatel-1u1.de [IPv6:2001:16b8:2695:9f00:9a93:9674:d530:470e]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) by mail.net2o.de (Postfix) with ESMTPSA id B8AE3400A4; Sun, 6 Oct 2019 21:02:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=net2o.de; s=mail; t=1570388538; bh=vVGEoOrkvRsAGdG+krJmx2C/NCPvh3n/f5oetfLR6Co=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RqiX2G91JQC8ZaZA0t2TnpfbB92tlZBal7MStvxXv00ioIxtqc+XXYz0USD38lN1a o+RR64uYRE9tyu1jmcNWo8MTede1OdUyxv/y/27Zsseaca04UqwDgh6TERvBS77aBe 
yxhSqykIFmaRHqKCqyBMg5E3YyFeuU9SxhOjPqZs= In-Reply-To: <831rvp3glu.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:168452 Archived-At: --nextPart21515511.aIRQMYrt1L Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="UTF-8" Am Sonntag, 6. Oktober 2019, 19:53:49 CEST schrieb Eli Zaretskii: > > Date: Sun, 6 Oct 2019 14:31:12 +0200 > > From: Anton Ertl > > Cc: bernd@net2o.de, 37633@debbugs.gnu.org, > > anton@mips.complang.tuwien.ac.at > >=20 > > On Sat, Oct 05, 2019 at 07:16:53PM +0300, Eli Zaretskii wrote: > > > For byte offsets in external text we have bufferpos-to-filepos, but > > > that requires us to know the encoding of the external text. We need > > > to find a reasonable way of getting that. Suggestions and patches > > > welcome. > >=20 > > It's the encoding that you assumed for the text when you loaded the > > file into the buffer. >=20 > I'm not sure this is correct. You are saying that the compiler counts > bytes in the original file, not in its output (which might be encoded > differently). Do we have conclusive evidence that this is always > true? Almost always. gcc has a gazillion of options almost nobody uses. E.g., you can use -finput-encoding=3D to transcode input files on= =20 reading. 
It's a not well tested option, as the output (still iso8859-1)=20 shows: % gcc -finput-charset=3Diso8859-1 test-iso.c test-iso.c: In function =E2=80=98foo=E2=80=99: test-iso.c:2:2: warning: implicit declaration of function =E2=80=98printf= =E2=80=99 [- Wimplicit-function-declaration] 2 | printf("test %i", b); | ^~~~~~ test-iso.c:2:2: warning: incompatible implicit declaration of built-in=20 function =E2=80=98printf=E2=80=99 test-iso.c:1:1: note: include =E2=80=98=E2=80=99 or provide a decl= aration of =E2=80=98printf=E2=80=99 +++ |+#include 1 | void foo() { test-iso.c:2:20: error: =E2=80=98b=E2=80=99 undeclared (first use in this f= unction) 2 | printf("test %i", b); | ^ test-iso.c:2:20: note: each undeclared identifier is reported only once for= =20 each function it appears in test-iso.c:3:26: error: =E2=80=98c=E2=80=99 undeclared (first use in this f= unction) 3 | printf("test=EF=BF=BD=EF=BF=BD=EF=BF=BD %i", c); | ^ Here, due to the conversion on read in, the position reported is different = (it=20 was 3:23 before). This transparent conversion on reading is used rarely. Or rather: There is= no=20 search result in the entire github database. > > the byte position does not depend on the encoding (unlike the > > character position). >=20 > ??? The same Latin-1 characters encoded in ISO-8859-1 and in UTF-8 > will yield a different number of bytes. So I don't think I understand > how can you say the above. What I'm trying to tell: The compiler (unless instructed to convert the fil= e=20 on reading) reports the byte position it found in the file. That's the sam= e=20 byte position the editor calculates for that file =E2=80=94 and that is reg= ardless of=20 what the editor assumed as encoding. I.e. if the editor mistook a UTF-8 fi= le=20 for an iso8859-1, it will see an UTF-8 string "=C3=A4=C3=B6=C3=BC" (6 bytes= UTF-8) as=20 "=C3=83=C2=A4=C3=83=C2=B6=C3=83=C2=BC" (6 bytes iso8859-1). But it's still= 6 bytes. 
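
The byte-count argument above can be checked with a short Python sketch.
This is only an illustration under stated assumptions: the helper
byte_column and all variable names are made up here, and the sample
source line assumes that line 3 of test-iso.c contains "äöü" after
"test", which the replacement characters in the gcc output suggest but
do not prove.

```python
# Hypothetical illustration; byte_column and the sample line are assumptions.
text = "\u00e4\u00f6\u00fc"                    # the string "äöü"

utf8_bytes = text.encode("utf-8")              # bytes as stored in a UTF-8 file
misread = utf8_bytes.decode("iso-8859-1")      # editor wrongly assumes iso8859-1

print(len(text))        # characters when decoded correctly
print(len(utf8_bytes))  # bytes in the file
print(len(misread))     # characters seen when misread as iso8859-1
# Re-encoding the misread text gives back exactly the bytes in the file.
print(misread.encode("iso-8859-1") == utf8_bytes)


def byte_column(line, char_index, encoding):
    """Return the 1-based byte column of line[char_index] in the given encoding."""
    return len(line[:char_index].encode(encoding)) + 1


# Assumed content of line 3 of test-iso.c (one leading space, as in line 2).
line = ' printf("test\u00e4\u00f6\u00fc %i", c);'
pos = line.index("c")

print(byte_column(line, pos, "iso-8859-1"))  # column if the file is iso8859-1
print(byte_column(line, pos, "utf-8"))       # column after conversion to UTF-8
```

Under these assumptions the two byte columns come out as 23 and 26,
matching the 3:23 and 3:26 positions mentioned above, while the byte
length of the misread string is unchanged.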
-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy@X}1GWr!+0qqp_Cn176t4(dQ*
https://net2o.de/

--nextPart21515511.aIRQMYrt1L
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part.
Content-Transfer-Encoding: 7Bit

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEERJ1NDqPQRwYnwBjr9y2Uk5MtoGcFAl2aOjYACgkQ9y2Uk5Mt
oGdS0BAAjNc3rQJMAUAILGZDebL4QBrNjAJaOUHEiKuBmBC5M3a5Jg6YdJKK/1/T
3gO+frn+eaU9m7w8dlsRwBILhUyEm92zUfasgsiC/JvjlOY3aT48GmR4munbWk0T
1uTzTxSa/8EeG++3HweBJ8NYIuvNvDxxKgtpxlZXwSnBfqxP8SR2X7f7nkA1/JA+
NgMKqQphy7+YuIrri2zXwx9RIy9UJMxT3r7jFHz2inz23WRy6Ol/svMObe816CZo
UVLtP56YYobZ32iLFCWOjlHpS2iM/hMZ0dsteUO59XFJ/eE0w/5XloZqELOHhiPy
M9ucuPNDwpU8Jh8ZQAmLXPnPz++5fPaU9DuTRQMT311fsBVBYCHhyrZXCx9iORKv
tr1RuLlJbOGHWkzDEKYjQaJOwPVT9pvzL4u++1oDIJhZzUy3cd0+MrczKoghQHX0
7lKGqPOjxFno7ABUShq/5DA/h5shm8kxJpG9/0GsxQiD7YHbF9ep2Usphbtu1bjw
sik4cgEQeU1YGVtM5n1WG2RCx+YSCWV7kWE/7gWgtw6jGLbq7UbyaGsXFszZML2E
N5lLvzdxophhX5YMjeO9f7XaIyqcpUG+ljmjSGXxJsoK/d6sjgmbw0cGSg+ptZ7p
Q6L+DItB049QvVGi9J7BOScchAvUeKlKbOKMhYhygY2hDZXXShI=
=DtR9
-----END PGP SIGNATURE-----

--nextPart21515511.aIRQMYrt1L--