From: Eli Zaretskii
To: nisse@lysator.liu.se (Niels Möller)
Cc: 15984@debbugs.gnu.org
Newsgroups: gmane.emacs.bugs
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 13:26:50 +0200
Message-ID: <83r49z78jp.fsf@gnu.org>
References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org>

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Fri, 29 Nov 2013 11:43:45 +0100
>
> Eli Zaretskii writes:
>
> >> Good! I thought emacs used a simpler mapping character <-> a single
> >> unicode value.
> >
> > Maybe I misunderstood you: what's the difference between those two
> > alternatives?
>
> What I think is the right thing, is to allow a sequence of unicode
> values, e.g., "A" + combining character, or "A" + any random sequence of
> combining characters, intern this string, and treat this as a single
> "character".

That's not how Emacs represents and treats characters. The composition
happens only at display time, and normalization, as it's currently
implemented, happens when text is read into a buffer. Thereafter, each
Unicode character is a single character, and there's no combining of
them for any purpose except display.

> The idea is that this character object should correspond to what the
> user thinks of as a single character. E.g, one glyph per character, and
> treated as a unit by forward-char, and regexp matching with "." and
> character sets.

What gets displayed as a single unit is a "grapheme cluster", not a
single glyph. Whether a grapheme cluster that corresponds to "A" + any
random sequence of combining characters maps to a single glyph depends
on the font being used, which is something the user should not need to
worry about.

However, we do want to give the user a way to delete only one or more
of the combining characters, so forcing the entire combination to be a
single indivisible entity would not be TRT for users. Cursor motion
does consider the entire thing as a single entity and moves across all
of it, but that requires special code.
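
To make the distinction between code points and grapheme clusters
concrete, here is a minimal sketch in Python (not Emacs Lisp, and not
a model of how Emacs implements any of this). The helper `clusters` is
a hypothetical name invented for the illustration, and it only
approximates cluster boundaries by attaching characters with a nonzero
canonical combining class to the preceding base character; the real
rules are more involved.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a rough approximation of grapheme clusters that
    only looks at the canonical combining class of each code point.
    """
    groups = []
    for ch in text:
        # A nonzero combining class means "ch" attaches to what precedes it.
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


decomposed = "A\u030a"        # "A" + COMBINING RING ABOVE: two code points
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00C5
stacked = "A\u030a\u0301b"    # "A" + two combining marks, then "b"

print(len(decomposed), len(composed))        # 2 1
print([len(c) for c in clusters(stacked)])   # [3, 1]

# The cluster is still made of separate characters, so dropping only the
# last combining mark leaves "A" + ring above intact.
print(clusters(stacked)[0][:-1] == decomposed)  # True
```

The last line is the point about editing: if the cluster were a single
indivisible object, removing just one of its combining marks would not
be expressible this way.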

IOW, things are not that simple, and I think the design you are
suggesting is problematic in that it will remove several important
features, or make them harder to implement.

> When reading text files, the character boundaries may be configurable.

The important question is what to do by default, as many users will
not be happy if asked too many questions or requested to specify too
many parameters for reading text. Compare this with the need to
specify the encoding in too many cases in the early days of
multilingual Emacs -- there was a user outcry about that.

> E.g, there could be a mode which makes each and every unicode value a
> single character, which will then be displayed as separate glyphs,
> separate characters for regexp matching, etc.

You are mixing display issues with editing issues and with how
characters are represented internally in an Emacs buffer. These all
are separate, and do not necessarily need to handle characters in the
same rigid way.

> Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*).
>
> Create a spool-like directory, e.g, "~/tmp/mail". Copy the file to
> "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer,
> press G d to create a directory group, enter ~/tmp/mail. You should now
> be able to enter that group, and select the message in the *Summary*
> buffer.
>
> To mimic my setup, do this in an xterm running in a latin-1 locale. (I
> have to send this off now, I'll try later to really see if this recipe
> reproduces the problem for me).

Thanks, I will try that.