unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Kenichi Handa <handa@gnu.org>
Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 15:20:13 +0200
Message-ID: <83siue58mq.fsf@gnu.org>
In-Reply-To: <83a9gn6v2f.fsf@gnu.org>

> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 15984@debbugs.gnu.org
> 
> > From: nisse@lysator.liu.se (Niels Möller)
> > Cc: 15984@debbugs.gnu.org
> > Date: Fri, 29 Nov 2013 13:41:01 +0100
> > 
> >   $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
> > 
> > where bug.el contains
> > 
> >   (setq gnus-init-file nil)
> >   (setq gnus-nntp-server nil)
> >   (gnus-no-server)
> > 
> > Then create the group with G d, pointing out the spool-like directory,
> > enter the group (RET), view the message (RET), try to write out the
> > attachment ("o" on the attachment button). Still crashes for me.
> 
> It crashes in the current development trunk as well, but only if the
> locale is set to Latin-1, like yours.
> 
> I'm looking at this.

There's something strange going on here; I'm CC'ing Handa-san, because
the problem is related to processing character compositions on a TTY.

The reason for the crash is simple: the following code from
indent.c:scan_for_column

      /* Check composition sequence.  */
      if (cmp_it.id >= 0
	  || (scan == cmp_it.stop_pos
	      && composition_reseat_it (&cmp_it, scan, scan_byte, end,
					w, NULL, Qnil)))
	composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
	{
	  scan += cmp_it.nchars;
	  scan_byte += cmp_it.nbytes;
	  if (scan <= end)
	    col += cmp_it.width;
	  if (cmp_it.to == cmp_it.nglyphs)
	    {
	      cmp_it.id = -1;
	      composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
					    Qnil);
	    }
	  else
	    cmp_it.from = cmp_it.to;
	  continue;
	}

incorrectly steps into the middle of a multibyte sequence #xCC #x88
for the character u+0308, the Combining Diaeresis, because
cmp_it.nbytes is computed as 1 instead of 2.  The question is why it
does so.
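
To make the byte arithmetic concrete, here is a minimal standalone
sketch, not Emacs code.  It assumes standard UTF-8, which matches the
#xCC #x88 encoding of u+0308 mentioned above.  The names utf8_len,
is_trailing_byte and sample are hypothetical and exist only for this
illustration.

```c
/* Number of bytes standard UTF-8 uses to encode code point C.
   Assumes C is a valid Unicode code point (at most 0x10FFFF).  */
int
utf8_len (unsigned long c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Nonzero if B is a UTF-8 trailing byte (bit pattern 10xxxxxx),
   i.e. a byte that cannot start a character.  */
int
is_trailing_byte (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* 'a' followed by u+0308: one ASCII byte, then the pair #xCC #x88.  */
const unsigned char sample[] = { 0x61, 0xCC, 0x88 };
```

With a byte position of 1 (the first byte of u+0308), advancing by
utf8_len (0x308) lands at the end of the sample, while advancing by
only 1 lands on sample[2], for which is_trailing_byte returns nonzero.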

From stepping through composition_reseat_it and composition_update_it,
it looks like the code contradicts itself: it thinks that 'a' and the
combining diaeresis should be composed, but then acts as if no
composition should happen.  As a result, this code in
composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
	{
	  c = XINT (LGSTRING_CHAR (gstring, i));
	  cmp_it->nbytes += CHAR_BYTES (c);
	  cmp_it->width += CHAR_WIDTH (c);
	}

always considers only 'a', never the diaeresis, and so cmp_it->nbytes
is always computed as 1.  So scan_for_column advances only 1 byte,
instead of 2, and finds itself in the middle of a multibyte sequence.
From there, it's a sure way to a crash.
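
The effect of that summation can be sketched outside Emacs.  The
following is a simplified stand-in, not the real composite.c code:
char_bytes is a hypothetical replacement for CHAR_BYTES limited to
standard UTF-8, and sum_bytes mimics only the shape of the loop quoted
above, walking a plain array of code points instead of an lgstring.

```c
/* Hypothetical stand-in for CHAR_BYTES, limited to standard UTF-8.  */
int
char_bytes (unsigned long c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the byte lengths of the first NCHARS entries of CHARS,
   counting down from NCHARS - 1 to 0 like the quoted loop.  */
int
sum_bytes (const unsigned long *chars, int nchars)
{
  int nbytes = 0;
  int i;

  for (i = nchars - 1; i >= 0; i--)
    nbytes += char_bytes (chars[i]);
  return nbytes;
}
```

For the array { 0x61, 0x308 }, sum_bytes with nchars 1 covers only 'a'
and yields 1, while nchars 2 yields 3.  Neither result equals the 2
bytes that the diaeresis alone occupies.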

I hope Handa-san will be able to find the problem.  The crash is 100%
reproducible with the steps described above and a mail message that
Niels can send you off-list.

TIA




