From: David Malcolm
Newsgroups: gmane.emacs.bugs
Subject: bug#25987: 25.2; support gcc fixit notes
Date: Sat, 14 Nov 2020 14:46:29 -0500
To: Eli Zaretskii
Cc: 25987@debbugs.gnu.org
In-Reply-To: <83tutsuihm.fsf@gnu.org>

On Sat, 2020-11-14 at 16:21 +0200, Eli Zaretskii wrote:
> > From: David Malcolm
> > Cc: 25987@debbugs.gnu.org
> > Date: Fri, 13 Nov 2020 11:47:18 -0500
> >
> > The names are identifiers from the user's program (names of
> > variables, types, macros, etc), where an error has been issued,
> > typically due to a misspelling of an identifier.  For example,
> > somewhere there's a declaration of a constant named "two_π", and
> > later the code erroneously references it as "two_pi"; we want to
> > emit a diagnostic saying:
> >   did you mean "two_π"?
> > and provide a machine-readable fix-it hint suggesting the
> > replacement of the pertinent source range with "two_π".
> >
> > GCC converts the source code from any encoding specified by
> > -finput-charset= to use UTF-8 internally...
> >
> >   https://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
>
> And then GCC outputs these identifiers in UTF-8?  Or does it convert
> back to the original input-charset?

It emits them as UTF-8 when emitting diagnostics.

> > ...however there's a bug in GCC in how we print the source code
> > itself, where we blithely emit the undecoded bytes directly to
> > stderr when quoting the lines of source.  This GCC bug is
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93067 (aka PR
> > other/93067).  We ought to encode the source code into UTF-8 when
> > printing it (which may be a no-op for the common case).
>
> I'm not sure you are right here: I think it is better for GCC to use
> the original bytestream, because the user's locale might not support
> UTF-8 well; it is better to show the source to the user in the
> encoding in which it was written.

This seems to me to lead to a bigger question: what should the encoding
of GCC's stderr be?  Right now I believe we emit a mix of UTF-8 and
other encodings, as noted in my earlier post.

> However, I'm not familiar with GCC internals, so it is not clear to
> me whether the bug report will indeed affect the way source fragments
> will be output: the bug report only talks about converting the input,
> and I don't know enough to understand how will that affect output.
>
> > The annotation lines we print under the source lines for fix-it
> > hints and labels are already printed in UTF-8, however.
>
> The annotations are in US English, though, right?  If not, when will
> they include non-ASCII characters?

Annotation lines can contain labels as of GCC 9, and these can contain
identifiers; for example in this C++ type mismatch error, where the
types of the pertinent expressions are labeled:

$ g++ t.cc
t.cc: In function 'int test(const shape&, const shape&)':
t.cc:15:4: error: no match for 'operator+' (operand types are 'boxed_value' and 'boxed_value')
   14 |   return (width(s1) * height(s1)
      |           ~~~~~~~~~~~~~~~~~~~~~~
      |                     |
      |                     boxed_value<[...]>
   15 |    + width(s2) * height(s2));
      |    ^ ~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                boxed_value<[...]>

where "boxed_value" is an identifier and in theory could have non-ASCII
characters in it.

> > That said, the above bug is orthogonal to the fix-it hint issue,
> > which prints the names in a different way (using UTF-8 encoded
> > strings in GCC's symbol table, rather than scraping them from the
> > filesystem, which is how the buggy source-quoting routines work).
>
> [...]
> > As far as I can tell GCC handles filenames as raw bytes, and
> > doesn't make any attempt to decode them, and emits them as bytes
> > again in diagnostic messages.
>
> This is okay, but since the other parts are in UTF-8, this will
> complicate things, as I mentioned in my previous message.
>
> > > > I tried creating file with the name "byte 0xff" .txt, and with
> > > > valid UTF-8 non-ascii names and emacs reported them as
> > > > \377.txt and with the UTF-8 names respectively, so perhaps I
> > > > should simply emit the bytes and pretend they are UTF-8?
> > >
> > > What do you mean by "pretend" in this context?
> >
> > By "pretend" I mean simply re-emitting the bytes of the filename to
> > stderr and ignoring encoding issues in them, despite the fact that
> > the rest of the stream is supposed to be UTF-8-encoded.
>
> As explained, it will be easier for Emacs to process GCC output if
> its encoding is consistent.

Indeed.  I'll raise this issue on the GCC mailing list.
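
To make the consistency problem concrete, here is a minimal Python
sketch of what a consumer of the stream runs into.  The diagnostic line
is made up for illustration (a filename consisting of the raw byte 0xff
plus ".txt", followed by UTF-8 message text), and the helper names are
hypothetical; this is not how GCC or Emacs actually process the output.

```python
# Hypothetical diagnostic line: the filename part is a raw byte (0xff),
# the message part is valid UTF-8 ("π" is the two bytes 0xcf 0x80).
line = b'\xff.txt:1:1: error: did you mean "two_\xcf\x80"?'


def decode_strict(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lossless(data: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte to a lone surrogate."""
    return data.decode("utf-8", errors="surrogateescape")


# A strict UTF-8 reader rejects the whole line because of the filename.
strict = decode_strict(line)

# A tolerant reader keeps the UTF-8 parts and can still recover the
# original bytes of the filename afterwards.
text = decode_lossless(line)
roundtrip = text.encode("utf-8", errors="surrogateescape")
```

So "pretending" only works if the reader is prepared to tolerate stray
bytes in an otherwise UTF-8 stream, which is the extra complexity being
discussed.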

> > Currently the parseable-fixits option uses IS_PRINT on each "char"
> > (i.e. byte) so that any non-printable bytes get octal-escaped.  Is
> > that acceptable for filenames?  The other approach, to "pretend
> > they're UTF-8", would mean to not escape such bytes, so that if
> > they are UTF-8 they are faithfully re-emitted.
> >
> > I think I like the approach where the filename part of the fixit
> > line is octal-escaped, and the replacement text is UTF-8, but I
> > don't know what's going to be best for you.
>
> Given your description, it sounds like it will not be simple whatever
> you do.
>
> I guess we should first try getting the plain-ASCII case to work, as
> that is the most frequent use case anyway.

I added some test cases and posted the patch to the gcc-patches mailing
list here:
  "[PATCH/RFC] Add GCC_EXTRA_DIAGNOSTIC_OUTPUT environment variable for
fix-it hints"
    https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559105.html

Thanks
Dave
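
For reference, the "octal-escape the filename, keep the replacement as
UTF-8" approach discussed above can be sketched roughly as follows.
This is an illustration in Python, not GCC's code: the function names
are hypothetical, "printable" is assumed to mean ASCII 0x20 through
0x7e, escaping the backslash itself is an extra assumption made so the
output stays unambiguous, and the line/column numbers are made up.  The
overall shape of the line follows the documented
-fdiagnostics-parseable-fixits format.

```python
def octal_escape(data: bytes) -> str:
    """Escape every byte outside printable ASCII as a 3-digit octal sequence."""
    out = []
    for b in data:
        if b == 0x5C:
            # Assumption of this sketch: double the backslash so that a
            # literal backslash cannot be confused with an escape.
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def fixit_line(filename: bytes, start, end, replacement: str) -> str:
    """Build a fix-it line: escaped filename, replacement left as text.

    start and end are (line, column) pairs.  Quotes inside the
    replacement are not handled here, to keep the sketch short.
    """
    return 'fix-it:"%s":{%d:%d-%d:%d}:"%s"' % (
        octal_escape(filename), start[0], start[1], end[0], end[1], replacement)


escaped_name = octal_escape(b"\xff.txt")
escaped_utf8 = octal_escape("two_π".encode("utf-8"))
example = fixit_line(b"\xff.txt", (3, 10), (3, 16), "two_π")
```

With this scheme the filename part is always plain ASCII, whatever bytes
the filesystem holds, while the replacement text stays readable as
UTF-8.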