From: Eli Zaretskii <eliz@gnu.org>
To: Kenichi Handa <handa@gnu.org>
Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 15:20:13 +0200
Message-ID: <83siue58mq.fsf@gnu.org>
In-Reply-To: <83a9gn6v2f.fsf@gnu.org>
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 15984@debbugs.gnu.org
>
> > From: nisse@lysator.liu.se (Niels Möller)
> > Cc: 15984@debbugs.gnu.org
> > Date: Fri, 29 Nov 2013 13:41:01 +0100
> >
> > $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
> >
> > where bug.el contains
> >
> > (setq gnus-init-file nil)
> > (setq gnus-nntp-server nil)
> > (gnus-no-server)
> >
> > Then create the group with G d, pointing out the spool-like directory,
> > enter the group (RET), view the message (RET), try to write out the
> > attachment ("o" on the attachment button). Still crashes for me.
>
> It crashes in the current development trunk as well, but only if the
> locale is set to Latin-1, like yours.
>
> I'm looking at this.

There's something strange going on here; I'm CC'ing Handa-san, because
the problem is related to processing character compositions on a TTY.

The reason for the crash is simple: the following code from
indent.c:scan_for_column

      /* Check composition sequence. */
      if (cmp_it.id >= 0
          || (scan == cmp_it.stop_pos
              && composition_reseat_it (&cmp_it, scan, scan_byte, end,
                                        w, NULL, Qnil)))
        composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
        {
          scan += cmp_it.nchars;
          scan_byte += cmp_it.nbytes;
          if (scan <= end)
            col += cmp_it.width;
          if (cmp_it.to == cmp_it.nglyphs)
            {
              cmp_it.id = -1;
              composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
                                            Qnil);
            }
          else
            cmp_it.from = cmp_it.to;
          continue;
        }

incorrectly steps into the middle of a multibyte sequence #xCC #x88
for the character U+0308, the Combining Diaeresis, because
cmp_it.nbytes is computed as 1 instead of 2. The question is why it
does so.
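
To make the failure mode concrete, here is a minimal sketch of what "the middle of a multibyte sequence" means for the bytes in question. It assumes plain UTF-8 (which matches the #xCC #x88 encoding of U+0308 quoted above); the helper names utf8_seq_len and at_char_boundary and the sample array are made up for illustration and are not Emacs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of a well-formed UTF-8 sequence, judged from its
   lead byte.  Returns 0 for a continuation byte (10xxxxxx), which
   cannot start a character.  Hypothetical helper, not Emacs code.  */
static int
utf8_seq_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if ((b & 0xC0) == 0x80)
    return 0;
  if ((b & 0xE0) == 0xC0)
    return 2;
  if ((b & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Nonzero if byte offset POS in BUF (LEN bytes long) is a character
   boundary, i.e. either the end of the buffer or a lead byte.  */
static int
at_char_boundary (const unsigned char *buf, size_t len, size_t pos)
{
  assert (pos <= len);
  return pos == len || utf8_seq_len (buf[pos]) != 0;
}

/* 'a' followed by U+0308: the bytes 0x61 0xCC 0x88.  */
static const unsigned char sample[] = { 'a', 0xCC, 0x88 };
```

With the scan positioned at byte offset 1 (the #xCC lead byte), advancing by 2 bytes reaches a character boundary, while advancing by 1 byte lands on #x88, a continuation byte, so at_char_boundary (sample, 3, 2) is zero.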

From stepping through composition_reseat_it and composition_update_it,
it looks like the code contradicts itself: it thinks that 'a' and the
combining diaeresis should be composed, but then acts as if no
composition should happen. As a result, this code in
composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

always considers only 'a', never the diaeresis, and so cmp_it->nbytes
is always computed as 1. So scan_for_column advances only 1 byte,
instead of 2, and finds itself in the middle of a multibyte sequence.
From there, it's a sure way to a crash.
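
The arithmetic above can be modelled in a few lines. This is only a simplified sketch of the byte accounting, not a reproduction of Emacs's lgstring structures and not a claim about where the real fix belongs: char_bytes stands in for the role of CHAR_BYTES (assuming standard UTF-8 sizes), and sum_bytes and gstring_chars are hypothetical names.

```c
#include <assert.h>

/* Bytes needed to encode code point C in standard UTF-8 (up to
   U+10FFFF).  Stand-in for CHAR_BYTES in the quoted code; this is an
   assumption, not the Emacs macro itself.  */
static int
char_bytes (int c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the encoded sizes of NCHARS code points of CHARS, starting at
   index START.  */
static int
sum_bytes (const int *chars, int start, int nchars)
{
  int i, nbytes = 0;

  assert (start >= 0 && nchars >= 0);
  for (i = start; i < start + nchars; i++)
    nbytes += char_bytes (chars[i]);
  return nbytes;
}

/* The two characters under discussion: 'a' and U+0308.  */
static const int gstring_chars[] = { 'a', 0x0308 };
```

Summing one character starting at index 0 looks only at 'a' and yields 1 byte; summing one character starting at index 1 looks at the diaeresis and yields 2; treating both as a single composed unit yields 3. The first case is the one that matches the observed nbytes of 1.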

I hope Handa-san will be able to find the problem. The crash is 100%
reproducible with the steps described above and a mail message that
Niels can send you off-list.

TIA