unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Bernd Paysan via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: anton@mips.complang.tuwien.ac.at, 37633@debbugs.gnu.org
Subject: bug#37633: Column part interpreted wrong in compilation mode
Date: Sun, 06 Oct 2019 21:02:14 +0200	[thread overview]
Message-ID: <7240153.3ZlepMpCQE@daiyu> (raw)
In-Reply-To: <831rvp3glu.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]

Am Sonntag, 6. Oktober 2019, 19:53:49 CEST schrieb Eli Zaretskii:
> > Date: Sun, 6 Oct 2019 14:31:12 +0200
> > From: Anton Ertl <anton@mips.complang.tuwien.ac.at>
> > Cc: bernd@net2o.de, 37633@debbugs.gnu.org,
> > anton@mips.complang.tuwien.ac.at
> > 
> > On Sat, Oct 05, 2019 at 07:16:53PM +0300, Eli Zaretskii wrote:
> > > For byte offsets in external text we have bufferpos-to-filepos, but
> > > that requires us to know the encoding of the external text.  We need
> > > to find a reasonable way of getting that.  Suggestions and patches
> > > welcome.
> > 
> > It's the encoding that you assumed for the text when you loaded the
> > file into the buffer.
> 
> I'm not sure this is correct.  You are saying that the compiler counts
> bytes in the original file, not in its output (which might be encoded
> differently).  Do we have conclusive evidence that this is always
> true?

Almost always.  gcc has a gazillion of options almost nobody uses.

E.g., you can use -finput-encoding=<endoding> to transcode input files on 
reading.  It's a not well tested option, as the output (still iso8859-1) 
shows:

% gcc -finput-charset=iso8859-1 test-iso.c
test-iso.c: In function ‘foo’:
test-iso.c:2:2: warning: implicit declaration of function ‘printf’ [-
Wimplicit-function-declaration]
    2 |  printf("test %i", b);
      |  ^~~~~~
test-iso.c:2:2: warning: incompatible implicit declaration of built-in 
function ‘printf’
test-iso.c:1:1: note: include ‘<stdio.h>’ or provide a declaration of ‘printf’
  +++ |+#include <stdio.h>
    1 | void foo() {
test-iso.c:2:20: error: ‘b’ undeclared (first use in this function)
    2 |  printf("test %i", b);
      |                    ^
test-iso.c:2:20: note: each undeclared identifier is reported only once for 
each function it appears in
test-iso.c:3:26: error: ‘c’ undeclared (first use in this function)
    3 |  printf("test��� %i", c);
      |                          ^

Here, due to the conversion on read in, the position reported is different (it 
was 3:23 before).

This transparent conversion on reading is used rarely.  Or rather: There is no 
search result in the entire github database.

> > the byte position does not depend on the encoding (unlike the
> > character position).
> 
> ??? The same Latin-1 characters encoded in ISO-8859-1 and in UTF-8
> will yield a different number of bytes.  So I don't think I understand
> how can you say the above.

What I'm trying to tell: The compiler (unless instructed to convert the file 
on reading) reports the byte position it found in the file.  That's the same 
byte position the editor calculates for that file — and that is regardless of 
what the editor assumed as encoding.  I.e. if the editor mistook a UTF-8 file 
for an iso8859-1, it will see an UTF-8 string "äöü" (6 bytes UTF-8) as 
"äöü" (6 bytes iso8859-1).  But it's still 6 bytes.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy@X}1GWr!+0qqp_Cn176t4(dQ*
https://net2o.de/

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2019-10-06 19:02 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-05 11:12 bug#37633: Column part interpreted wrong in compilation mode Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 16:08 ` Eli Zaretskii
2019-10-05 16:16   ` Eli Zaretskii
2019-10-05 17:05     ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 18:53       ` Eli Zaretskii
2019-10-05 18:54         ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 19:14           ` Eli Zaretskii
2019-10-05 19:24             ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 17:16               ` Eli Zaretskii
2019-10-06 17:35                 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 18:54                   ` Eli Zaretskii
2019-10-06 19:16                     ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 17:34     ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 12:31     ` Anton Ertl
2019-10-06 17:53       ` Eli Zaretskii
2019-10-06 19:02         ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2019-10-06 19:16           ` Eli Zaretskii
2019-10-06 19:22             ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 19:34               ` Eli Zaretskii
2019-10-06 19:35                 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-07  7:09         ` Anton Ertl
2019-10-05 16:58   ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-04-23 13:36 ` Lars Ingebrigtsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7240153.3ZlepMpCQE@daiyu \
    --to=bug-gnu-emacs@gnu.org \
    --cc=37633@debbugs.gnu.org \
    --cc=anton@mips.complang.tuwien.ac.at \
    --cc=bernd@net2o.de \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).