From: Bernd Paysan via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: anton@mips.complang.tuwien.ac.at, 37633@debbugs.gnu.org
Subject: bug#37633: Column part interpreted wrong in compilation mode
Date: Sun, 06 Oct 2019 21:02:14 +0200
Message-ID: <7240153.3ZlepMpCQE@daiyu>
In-Reply-To: <831rvp3glu.fsf@gnu.org>
On Sunday, 6 October 2019, 19:53:49 CEST, Eli Zaretskii wrote:
> > Date: Sun, 6 Oct 2019 14:31:12 +0200
> > From: Anton Ertl <anton@mips.complang.tuwien.ac.at>
> > Cc: bernd@net2o.de, 37633@debbugs.gnu.org,
> > anton@mips.complang.tuwien.ac.at
> >
> > On Sat, Oct 05, 2019 at 07:16:53PM +0300, Eli Zaretskii wrote:
> > > For byte offsets in external text we have bufferpos-to-filepos, but
> > > that requires us to know the encoding of the external text. We need
> > > to find a reasonable way of getting that. Suggestions and patches
> > > welcome.
> >
> > It's the encoding that you assumed for the text when you loaded the
> > file into the buffer.
>
> I'm not sure this is correct. You are saying that the compiler counts
> bytes in the original file, not in its output (which might be encoded
> differently). Do we have conclusive evidence that this is always
> true?
Almost always. gcc has a great many options that almost nobody uses.
E.g., you can use -finput-charset=<charset> to transcode input files on
reading. It's not a well-tested option, as the output (still iso8859-1)
shows:
% gcc -finput-charset=iso8859-1 test-iso.c
test-iso.c: In function ‘foo’:
test-iso.c:2:2: warning: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration]
2 | printf("test %i", b);
| ^~~~~~
test-iso.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
test-iso.c:1:1: note: include ‘<stdio.h>’ or provide a declaration of ‘printf’
+++ |+#include <stdio.h>
1 | void foo() {
test-iso.c:2:20: error: ‘b’ undeclared (first use in this function)
2 | printf("test %i", b);
| ^
test-iso.c:2:20: note: each undeclared identifier is reported only once for each function it appears in
test-iso.c:3:26: error: ‘c’ undeclared (first use in this function)
3 | printf("test��� %i", c);
| ^
Here, because the file is converted when it is read in, the reported position
is different (it was 3:23 without the conversion).
This transparent conversion on reading is rarely used. Or rather: a search
across all of GitHub turns up no use of it at all.
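To illustrate the shift from 3:23 to 3:26, here is a minimal Python sketch. It assumes that line 3 of test-iso.c reads ` printf("testäöü %i", c);` with a single leading space (the source file itself is not shown above), and that the reported column is a 1-based byte offset in whatever encoding the compiler is working on: iso8859-1 as stored on disk, or UTF-8 after the -finput-charset conversion. The helper name `byte_column` is made up for this example.

```python
def byte_column(line: str, char_index: int, encoding: str) -> int:
    """Return the 1-based byte column of line[char_index] once line is encoded."""
    return len(line[:char_index].encode(encoding)) + 1

# Assumed content of line 3: one leading space, three non-ASCII characters.
LINE3 = ' printf("testäöü %i", c);'
C_INDEX = LINE3.index("c")  # character index of the undeclared identifier

col_latin1 = byte_column(LINE3, C_INDEX, "latin-1")  # bytes as stored on disk
col_utf8 = byte_column(LINE3, C_INDEX, "utf-8")      # bytes after transcoding

print(col_latin1, col_utf8)  # prints "23 26"
```

Each of the three non-ASCII characters takes one byte in iso8859-1 and two in UTF-8, which accounts for the difference of three columns.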
> > the byte position does not depend on the encoding (unlike the
> > character position).
>
> ??? The same Latin-1 characters encoded in ISO-8859-1 and in UTF-8
> will yield a different number of bytes. So I don't think I understand
> how can you say the above.
What I'm trying to say: the compiler (unless instructed to convert the file
on reading) reports the byte position it found in the file. That's the same
byte position the editor calculates for that file, regardless of what the
editor assumed as the encoding. I.e., if the editor mistook a UTF-8 file
for an iso8859-1 file, it will see the UTF-8 string "äöü" (6 bytes in UTF-8) as
"Ã¤Ã¶Ã¼" (6 bytes in iso8859-1). But it's still 6 bytes.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy@X}1GWr!+0qqp_Cn176t4(dQ*
https://net2o.de/
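As a rough illustration of the "still 6 bytes" argument in the message above, the following Python sketch decodes the same UTF-8 bytes once as UTF-8 and once (wrongly) as iso8859-1, then maps a character position back to a byte offset using the same assumed encoding. The example line and the helper name `byte_offset_of` are assumptions made for this sketch; it does not show how Emacs or bufferpos-to-filepos is implemented.

```python
# Assumed example line; the bytes on disk are UTF-8.
RAW = 'printf("äöü %i", c);'.encode("utf-8")

def byte_offset_of(raw: bytes, assumed_encoding: str, needle: str) -> int:
    """Decode raw the way an editor would, locate needle, and map its
    character position back to a 0-based byte offset in the file."""
    buffer_text = raw.decode(assumed_encoding)
    char_pos = buffer_text.index(needle)
    return len(buffer_text[:char_pos].encode(assumed_encoding))

with_utf8 = byte_offset_of(RAW, "utf-8", "c")      # correct guess
with_latin1 = byte_offset_of(RAW, "latin-1", "c")  # wrong guess, mojibake in the buffer

print(with_utf8, with_latin1)  # both offsets are 20
```

This works for iso8859-1 because every byte value decodes to some character and encodes back to the same byte. For an assumed encoding that cannot round-trip arbitrary bytes, mapping back to a byte offset would need more care.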
Thread overview: 23+ messages
2019-10-05 11:12 bug#37633: Column part interpreted wrong in compilation mode Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 16:08 ` Eli Zaretskii
2019-10-05 16:16 ` Eli Zaretskii
2019-10-05 17:05 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 18:53 ` Eli Zaretskii
2019-10-05 18:54 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 19:14 ` Eli Zaretskii
2019-10-05 19:24 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 17:16 ` Eli Zaretskii
2019-10-06 17:35 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 18:54 ` Eli Zaretskii
2019-10-06 19:16 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-05 17:34 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 12:31 ` Anton Ertl
2019-10-06 17:53 ` Eli Zaretskii
2019-10-06 19:02 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2019-10-06 19:16 ` Eli Zaretskii
2019-10-06 19:22 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-06 19:34 ` Eli Zaretskii
2019-10-06 19:35 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2019-10-07 7:09 ` Anton Ertl
2019-10-05 16:58 ` Bernd Paysan via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-04-23 13:36 ` Lars Ingebrigtsen