From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: 44173@debbugs.gnu.org
Subject: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Fri, 23 Oct 2020 16:19:42 +0300
Message-ID: <834kmljd01.fsf@gnu.org>
In-Reply-To: <B013DC71-2966-47C8-8124-6D377C29736E@acm.org> (message from Mattias Engdegård on Fri, 23 Oct 2020 14:41:02 +0200)

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 23 Oct 2020 14:41:02 +0200
> Cc: 44173@debbugs.gnu.org
> 
> 23 okt. 2020 kl. 14.01 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > I'm okay with writing a GDB/MI parser, but I'm not sure I understand
> > how would that help to solve this particular conundrum.  AFAIR,
> > there's a genuine ambiguity there regarding non-ASCII characters
> > reported from GDB.
> 
> Would you mind explaining the ambiguity? Do you mean what coding system should be used for "\303\266" -- whether it should be interpreted as a string of those two bytes, the string "ö", the string "Ã¶", or something else?

My memory is imperfect, but luckily I was wise enough to summarize the
problems in a comment to gdb-mi-decode, which you mentioned.  Let me
now quote it:

  ;; FIXME: This is fragile: it relies on the assumption that all the
  ;; non-ASCII strings output by GDB, including names of the source
  ;; files, values of string variables in the inferior, etc., are all
  ;; encoded in the same encoding.  It also assumes that the \nnn
  ;; sequences are not split between chunks of output of the GDB process
  ;; due to buffering, and arrive together.  Finally, if some string
  ;; included literal \nnn strings (as opposed to non-ASCII characters
  ;; converted by GDB/MI to octal escapes), this decoding will mangle
  ;; those strings.  When/if GDB acquires the ability to not
  ;; escape-protect non-ASCII characters in its MI output, this kludge
  ;; should be removed.

The basic ambiguity, AFAIR, is what is described last here: a string
reported by GDB could include literal \nnn sequences, which are not
non-ASCII characters that GDB/MI converted to octal escapes.  The
information about which was which is lost once we receive the GDB/MI
output.
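
To illustrate the point, here is a minimal sketch in Python.  It is not
the gdb-mi.el code: the helper name decode_octal_escapes and the sample
strings are made up for illustration, and the decoder simply assumes
that every \nnn run it sees stands for bytes in a single encoding.

```python
import re

# Hypothetical helper, not the actual gdb-mi.el implementation.
# It treats every run of \nnn octal escapes (values 000-377) as raw
# bytes and decodes the whole run in one assumed encoding.
_OCTAL_RUN = re.compile(r'(?:\\[0-3][0-7]{2})+')
_ONE_ESCAPE = re.compile(r'\\([0-3][0-7]{2})')


def decode_octal_escapes(text, encoding="utf-8"):
    def _decode(match):
        octets = bytes(int(o, 8) for o in _ONE_ESCAPE.findall(match.group(0)))
        return octets.decode(encoding, errors="replace")

    return _OCTAL_RUN.sub(_decode, text)


# The same received text gives different results depending on the
# encoding we assume for it.
print(decode_octal_escapes(r'\303\266'))             # ö
print(decode_octal_escapes(r'\303\266', "latin-1"))  # Ã¶

# Text that was meant to contain a literal \nnn sequence is rewritten
# as well, because this decoder cannot tell the two cases apart.
print(decode_octal_escapes(r'use \101 for A'))       # use A for A
```

The last call shows the mangling: a decoder of this kind has only the
received text to work with, so it has to pick one interpretation and
apply it everywhere.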

> This bug is not about the encoding; it's about not interpreting the string as "303266".

AFAIU, this bug's root cause is the way we solved the ambiguity, which
basically assumes one of the possible interpretations should be
preferred to another, because it is more popular/useful.

Let me turn the tables and ask how you got the string you showed in
the original report.  What kind of application were you debugging,
and what did that string mean in that application?

> >  Could you tell how will this be solved by a
> > different parser?
> 
> Again, I'm not sure what you mean. The bug arises because we feed incorrectly translated data into a JSON parser. If we parse the string ourselves instead of going via JSON, that particular problem goes away.

And what will then happen to non-ASCII strings and file names reported
by GDB?  How will our parser solve that?

> > P.S. Btw: gdb-mi.el already has a BNF parser for GDB/MI.
> 
> It doesn't parse the lower parts of the grammar -- 'result', 'value' and so on. JSON is used for that.

Do you intend to extend the existing parser or write a new one from
scratch?




