unofficial mirror of bug-gnu-emacs@gnu.org 
From: nisse@lysator.liu.se (Niels Möller)
To: Eli Zaretskii <eliz@gnu.org>
Cc: 15984@debbugs.gnu.org
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 13:41:01 +0100
Message-ID: <nnsiufmlcy.fsf@bacon.lysator.liu.se>
In-Reply-To: <83r49z78jp.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Nov 2013 13:26:50 +0200")

Eli Zaretskii <eliz@gnu.org> writes:

> However, we do want to give the user a way to
> delete only one or more of the combining characters, so forcing the
> entire combination to be a single indivisible entity would not be TRT
> for users.

Good question: how should this be handled?

Today, to remove the dots from an "ä" character, I have to delete the
complete "ä" character and insert a new "a" character, and similarly for
the reverse edit. I think this "atomic" handling is the desired
behaviour in many cases. And I don't think it should behave differently
depending on the representation of "ä" in the original file. But if you
have a complex sequence of Unicode combining characters, I agree there's
some need to be able to edit it. Maybe one could put point on the
character and invoke edit-char to enter some special mode which explodes
the usually "atomic" character into smaller pieces.
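To make the point about representations concrete: Unicode can encode "ä" either precomposed (U+00E4) or decomposed, as "a" followed by U+0308 COMBINING DIAERESIS. The short Python sketch below is only an illustration, not Emacs code. It shows that the two forms differ in length when counted in code points, yet are canonically equivalent under NFC/NFD normalization.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS, one code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS, two code points

print(len(precomposed), len(decomposed))  # 1 2
# Canonical normalization maps each form onto the other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

An editor that counts code points would therefore treat the two spellings of the same visible character differently. This is the asymmetry the message argues against.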

And such a character edit mode might be useful for more things than
Unicode combining characters, e.g., manipulating the different sub-parts
of a Chinese character. Anyway, this user interface is not intimately
tied to the internal character representation; its overall effect on the
buffer will be the same as replacing any substring.

>> When reading text files, the character boundaries may be configurable.
>
> The important question is what to do by default,

I'm pretty sure the default should be that a sequence of one Unicode
base char and all following Unicode combining chars is interned as a
single "Emacs character". (I think the detailed rules for this are
spelled out in the Unicode standard.) There should be some arbitrary
limit, say at most 10 combining characters, to prevent a GByte file
consisting only of Unicode combining characters from being read as a
single Emacs character.
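The grouping rule proposed above can be sketched as follows. This is a hypothetical Python illustration, not Emacs's implementation. The names `split_clusters`, `is_combining` and `MAX_COMBINING` are introduced here. Treating every character of Unicode general category M as "combining" is a simplification of the full extended grapheme cluster rules in Unicode Standard Annex #29.

```python
import unicodedata

MAX_COMBINING = 10  # arbitrary cap, as proposed in the message


def is_combining(ch):
    # Simplification: treat any Unicode mark (categories Mn, Mc, Me) as a
    # combining character. UAX #29 grapheme clusters use more rules.
    return unicodedata.category(ch).startswith("M")


def split_clusters(text, max_combining=MAX_COMBINING):
    """Group each base character with up to max_combining following marks.

    A mark with no preceding base, or one beyond the cap, starts a new cluster.
    """
    clusters = []
    marks = 0  # number of marks attached to the last cluster
    for ch in text:
        if is_combining(ch) and clusters and marks < max_combining:
            clusters[-1] += ch
            marks += 1
        else:
            clusters.append(ch)
            marks = 0
    return clusters


text = "na\u0308ive"  # "naïve"-like word with a decomposed "ä"
print(len(text), len(split_clusters(text)))  # 6 5
```

With this grouping, the decomposed "a" + U+0308 occupies one unit, just like the precomposed U+00E4. The cap keeps a pathological run of marks from merging into a single unbounded unit.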

> You are mixing display issues with editing issues and with how
> characters are represented internally in an Emacs buffer.

I think it's confusing for users if the units of text which forward-char
skips over do not correspond to the units matched by "." in
isearch-forward-regexp.

My suggested internal representation seems to be a natural way to get
this correspondence right, at the cost of some memory (or a lot of
complexity to reduce memory usage). I'm sure there are other ways, and
maybe much better ones, to implement the same thing.

> Thanks, I will try that.

Now I've also reproduced it on the same machine, without my normal Gnus
setup getting in the way. I start Emacs with

  $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el

where bug.el contains

  (setq gnus-init-file nil)
  (setq gnus-nntp-server nil)
  (gnus-no-server)

Then I create the group with G d, pointing it at the spool-like
directory, enter the group (RET), view the message (RET), and try to
write out the attachment ("o" on the attachment button). It still
crashes for me.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.






Thread overview: 21+ messages
2013-11-28  8:08 bug#15984: 24.3; Problem with combining characters in attachment filename Niels Möller
2013-11-28 20:25 ` Eli Zaretskii
2013-11-28 22:17   ` Niels Möller
2013-11-28 22:46     ` Niels Möller
2013-11-29  7:16     ` Eli Zaretskii
2013-11-29  8:49       ` Niels Möller
2013-11-29  9:00         ` Eli Zaretskii
2013-11-29 10:43           ` Niels Möller
2013-11-29 11:26             ` Eli Zaretskii
2013-11-29 12:41               ` Niels Möller [this message]
2013-11-29 14:50                 ` Eli Zaretskii
2013-11-29 16:18                 ` Eli Zaretskii
2013-11-30 13:20                   ` Eli Zaretskii
2013-11-30 14:25                     ` Kenichi Handa
2013-11-30 16:09                       ` Eli Zaretskii
2013-11-30 15:50                     ` Niels Möller
2013-11-29 15:04             ` Stefan Monnier
2013-11-29 15:27               ` Eli Zaretskii
2013-11-30  8:53               ` Niels Möller
2013-11-29 13:11   ` Kenichi Handa
     [not found] ` <87eh574qmm.fsf@gnu.org>
2014-01-17 13:30   ` K. Handa

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
