* bug#15984: 24.3; Problem with combining characters in attachment filename
@ 2013-11-28 8:08 Niels Möller
  2013-11-28 20:25 ` Eli Zaretskii
  [not found] ` <87eh574qmm.fsf@gnu.org>
  0 siblings, 2 replies; 21+ messages in thread
From: Niels Möller @ 2013-11-28 8:08 UTC
To: 15984

I'm reading email with Gnus. I received an email with an attachment containing the headers

  Content-Type: application/pdf;
   name="Brev =?UTF-8?B?YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg==?="
  Content-Transfer-Encoding: base64
  Content-Disposition: attachment;
   filename*0*=UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31;
   filename*1*=%33%31%31%32%37%2E%70%64%66

Apparently sent by a Mac user,

  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.1.1

The attachment was displayed in the *Article* buffer as

  [2. application/pdf; Brev aktiea?gar 131127.pdf]...

I was running emacs-24.3 in a tty, in a latin-1 locale, on a sparc Solaris system. (In a latin-1 tty, emacs ought to display "ä" instead of "a?", but that's a less severe and possibly unrelated problem).

  SunOS bacon 5.10 Generic_147147-26 sun4u sparc SUNW,Sun-Fire-15000

When I tried to save the attachment by pressing "o" on that button (gnus-mime-save-part), emacs immediately crashed with a segmentation violation signal. Since emacs very rarely crashes, I was a bit surprised. I just restarted emacs and Gnus and tried again, and it crashed again. So at least for me, the problem is reproducible.

And a crash triggered by untrusted data in a received email is always scary. After fixing the bug, exploit possibilities ought to be analyzed.

The gdb backtrace, based on the generated core file, looks like this:

  (gdb) bt
  #0  0xfec4ebd4 in _lwp_kill () from /lib/libc.so.1
  #1  0xfebe7bb8 in raise () from /lib/libc.so.1
  #2  0x000e7f78 in terminate_due_to_signal ()
  #3  0x00103d04 in handle_fatal_signal ()
  #4  0x001037d0 in deliver_thread_signal ()
  #5  0xfec4b014 in __sighndlr () from /lib/libc.so.1
  #6  0xfec3f6c4 in call_user_handler () from /lib/libc.so.1
  #7  <signal handler called>
  #8  0x000b5748 in char_table_ref ()
  #9  0x001ad54c in composition_compute_stop_pos ()
  #10 0x001266ec in scan_for_column ()
  #11 0x00127328 in current_column ()
  #12 0x00114cec in read_minibuf ()
  #13 0x00115688 in Fread_from_minibuffer ()
  #14 0x0015c538 in Ffuncall ()
  #15 0x00190de0 in exec_byte_code ()
  #16 0x0015c368 in Ffuncall ()
  #17 0x001158a0 in Fcompleting_read ()
  #18 0x0015c4e4 in Ffuncall ()
  #19 0x00190de0 in exec_byte_code ()
  #20 0x0015c368 in Ffuncall ()
  #21 0x00190de0 in exec_byte_code ()
  #22 0x0015c368 in Ffuncall ()
  #23 0x00190de0 in exec_byte_code ()
  #24 0x0015bf18 in funcall_lambda ()
  #25 0x0015c368 in Ffuncall ()
  #26 0x00190de0 in exec_byte_code ()
  #27 0x0015bf18 in funcall_lambda ()
  #28 0x0015c368 in Ffuncall ()
  #29 0x0015cbf0 in apply1 ()
  #30 0x001573b4 in Fcall_interactively ()
  #31 0x0015c574 in Ffuncall ()
  #32 0x0015c77c in call3 ()
  #33 0x000f0ac0 in Fcommand_execute ()
  #34 0x000f829c in command_loop_1 ()
  #35 0x001591dc in internal_condition_case ()
  #36 0x000ea2a0 in command_loop_2 ()
  #37 0x001590c0 in internal_catch ()
  #38 0x000ea11c in recursive_edit_1 ()
  #39 0x000ea264 in Frecursive_edit ()
  #40 0x000e9b28 in main ()

The emacs binary I use appears to have been stripped, so bt full gives no additional information, and xbacktrace fails with

  No symbol "CHECK_LISP_OBJECT_TYPE" in current context.

If I decode the base-64 part of the Content-type "name" value, I get

  $ od -tx1c fname.txt
  0000000  61  6b  74  69  65  61  cc  88  67  61  72  20  31  33  31  31
            a   k   t   i   e   a 314 210   g   a   r       1   3   1   1
  0000020  32  37  2e  70  64  66
            2   7   .   p   d   f
  0000026

So it appears to contain the character "ä" (a with two dots), coded as "a" followed by a unicode combining character. All in utf-8. If I run cat fname.txt in xterm with a utf-8 locale, it displays the string as "aktieägar 131127.pdf", which seems correct.

I don't understand the meaning of the Content-disposition: header, but I guess it's possible that Content-type: ...; name= *is* processed correctly, and it's the code processing Content-disposition which crashes. But looking at the backtrace, it looks like the problem is related to handling of combining characters.

Below is the info generated by report-emacs-bug, except that I deleted recent input and recent messages, since the problem was in the emacs process which crashed, not in this one where I'm composing this message. Environment should otherwise be identical (same emacs, same Gnus, same machine, same tty).

Regards,
/Niels

In GNU Emacs 24.3.1 (sparc-sun-solaris2.10, X toolkit, Xaw scroll bars)
 of 2013-03-15 on stalhein
Configured using:
 `configure '--prefix=/pkg/emacs/sparc-sol10/24.3' '--with-gif=no' '--with-jpeg=no' '--with-tiff=no' '--with-png=no' '--with-dbus=no' '--with-gsettings=no' '--with-gnutls=no' 'CC=gcc' 'CFLAGS=-O2 -mcpu=v9' 'LDFLAGS=-L/usr/local/lib -R/usr/local/lib' 'CPPFLAGS=-I/usr/local/include''

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: sv_SE.ISO8859-1
  value of $LC_MESSAGES: C
  value of $LC_MONETARY: en_US.ISO8859-1
  value of $LC_NUMERIC: en_US.ISO8859-1
  value of $LC_TIME: en_US.ISO8859-1
  locale-coding-system: iso-latin-1-unix
  default enable-multibyte-characters: t

Major mode: Summary

Minor modes in effect:
  type-break-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent input: [omitted]

Recent messages: [omitted]

Features:
(shadow emacsbug help-mode sort ansi-color gnus-cite flow-fill mm-archive mail-extr gnus-async gnus-bcklg qp parse-time gnus-ml disp-table misearch multi-isearch gnus-topic byte-opt bytecomp byte-compile cconv nndraft nnmh nnml gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 epg-config mm-view mml-smime smime password-cache dig mailcap nntp gnus-cache gnus-sum nnoo gnus-group gnus-undo nnmail mail-source gnus-start gnus-spec gnus-int gnus-range message sendmail format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus gnus-ems nnheader gnus-util mail-utils mm-util mail-prsvr wid-edit bbdb-autoloads package cl-macs gv bookmark pp recurse cl time-date type-break uniquify advice help-fns cl-lib advice-preload info easymenu tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dynamic-setting x-toolkit x multi-tty emacs)

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

^ permalink raw reply [flat|nested] 21+ messages in thread
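For readers who want to check the header decoding described in the report above, here is a minimal Python sketch. It is not part of the original report and is not how Gnus or rfc2231.el process these headers; it only uses the standard library to decode the two parameter forms quoted above. The names `B64_PAYLOAD`, `SEGMENTS`, `describe` and `fits_latin1` are made up for illustration, and the RFC 2231 handling is simplified: it assumes exactly the two segments shown in the report, with the charset prefix on the first one.

```python
import base64
import unicodedata
from urllib.parse import unquote_to_bytes

# Payload of the RFC 2047 encoded word in the Content-Type "name" parameter.
B64_PAYLOAD = "YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg=="
name_tail = base64.b64decode(B64_PAYLOAD).decode("utf-8")

# The two RFC 2231 continuation segments of the Content-Disposition filename.
# Only the first segment carries the charset'language' prefix.
SEGMENTS = [
    "UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31",
    "%33%31%31%32%37%2E%70%64%66",
]
charset, _language, first = SEGMENTS[0].split("'", 2)
filename = unquote_to_bytes(first + SEGMENTS[1]).decode(charset)


def describe(text):
    """List the code points in text with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def fits_latin1(text):
    """Report whether every character of text exists in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Canonical composition turns "a" + U+0308 into the single code point U+00E4.
nfc = unicodedata.normalize("NFC", filename)

print(describe(filename[10:12]))           # the decomposed pair as received
print(fits_latin1(filename), fits_latin1(nfc))
```

Both headers decode to the same decomposed sequence. Only the composed (NFC) form can be represented in a latin-1 locale, which fits the "a?" seen in the *Article* buffer.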
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-28 8:08 bug#15984: 24.3; Problem with combining characters in attachment filename Niels Möller
@ 2013-11-28 20:25 ` Eli Zaretskii
  2013-11-28 22:17 ` Niels Möller
  2013-11-29 13:11 ` Kenichi Handa
  [not found] ` <87eh574qmm.fsf@gnu.org>
  1 sibling, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-28 20:25 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Date: Thu, 28 Nov 2013 09:08:54 +0100
>
> I'm reading email with Gnus. I received an email with an attachment
> containing the headers
>
>   Content-Type: application/pdf;
>    name="Brev =?UTF-8?B?YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg==?="
>   Content-Transfer-Encoding: base64
>   Content-Disposition: attachment;
>    filename*0*=UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31;
>    filename*1*=%33%31%31%32%37%2E%70%64%66
>
> Apparently sent by a Mac user,
>
>   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.1.1
>
> The attachment was displayed in the *Article* buffer as
>
>   [2. application/pdf; Brev aktiea?gar 131127.pdf]...
>
> I was running emacs-24.3 in a tty, in a latin-1 locale, on a sparc
> Solaris system. (In a latin-1 tty, emacs ought to display "ä" instead of
> "a?", but that's a less severe and possibly unrelated problem).

If ä was supposed to be produced by character compositions, then Emacs cannot do that on a TTY, because compositions require drawing one glyph over the other (with certain offsets).

If you expected Emacs to perform normalization in this case, then I don't think we do this automatically (or at all).

> When I tried to save the attachment by pressing "o" on that button
> (gnus-mime-save-part), emacs immediately crashed with a segmentation
> violation signal. Since emacs very rarely crashes, I was a bit
> surprised. I just restarted emacs and Gnus and tried again, and it
> crashed again. So at least for me, the problem is reproducible.

Can you send that message as a binary attachment?

> And a crash triggered by untrusted data in a received email is always
> scary. After fixing the bug, exploit possibilities ought to be analyzed.

I suggest trying a recent development trunk; several similar crashes were fixed a few months ago. If that doesn't help, please reproduce the problem in a non-optimized non-stripped build, and show the variables from char_table_ref that are involved in the crash. (I'm guessing char_table_ref got a bogus character code.)
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-28 20:25 ` Eli Zaretskii
@ 2013-11-28 22:17 ` Niels Möller
  2013-11-28 22:46 ` Niels Möller
  2013-11-29 7:16 ` Eli Zaretskii
  2013-11-29 13:11 ` Kenichi Handa
  1 sibling, 2 replies; 21+ messages in thread
From: Niels Möller @ 2013-11-28 22:17 UTC
To: Eli Zaretskii; +Cc: 15984

Eli Zaretskii <eliz@gnu.org> writes:

> If you expected Emacs to perform normalization in this case, then I
> don't think we do this automatically (or at all).

I think for display, normalizing is definitely the right thing to do (the unicode spec, as I understand it, requires that a "compliant" implementation treats different ways to code "ä" equivalently). But I understand if emacs currently doesn't do that.

(Digression: I think a text processor supporting unicode really ought to represent "characters" as interned strings of unicode (or utf-8) code points. These characters can have relations such as "normalized to", and glyphs should usually be associated only with the normalized form. One could also have configurable rules for character boundaries, as is described in the unicode book, or at least was in the version which was current when I tried to read up on this some years ago).

> Can you send that message as a binary attachment?

It's not very sensitive (it's about shares and options for a company I used to be employed by), but I'd prefer it not to be posted publicly on the bugtracker, or widely distributed among emacs hackers.

I'll try to send you a private mail with the bulk of the message with the body of the attachment replaced (the base64 text in the raw message; if the problem really is with the attachment headers, that shouldn't matter); if that's for some reason not usable, I'll send you the complete message.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-28 22:17 ` Niels Möller
@ 2013-11-28 22:46 ` Niels Möller
  2013-11-29 7:16 ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Niels Möller @ 2013-11-28 22:46 UTC
To: Eli Zaretskii; +Cc: 15984

[-- Attachment #1: Type: text/plain, Size: 935 bytes --]

nisse@lysator.liu.se (Niels Möller) writes:

>> Can you send that message as a binary attachment?
>
> It's not very sensitive (it's about shares and options for a company I
> used to be employed by), but I'd prefer it not to be posted publicly on
> the bugtracker, or widely distributed among emacs hackers.

I've now created a smaller, anonymized example. I tried to mail it to myself with sendmail -t, to confirm it still crashes emacs. Mailing for some reason didn't work, but the bounce I got back is a good enough example: It is displayed by Gnus with a button looking like

  [5. application/pdf; Brev aktieägar 131127.pdf]...

and pressing "o" on that makes emacs crash, just as with the original message.

Attached in gzip form. I hope emacs doesn't automagically unpack and display the buttons for the embedded attachment when you read this in emacs, but if it does, be careful.

Regards,
/Niels

[-- Attachment #2: Compressed problem message --]
[-- Type: application/octet-stream, Size: 2160 bytes --]

[-- Attachment #3: Type: text/plain, Size: 133 bytes --]

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-28 22:17 ` Niels Möller
  2013-11-28 22:46 ` Niels Möller
@ 2013-11-29 7:16 ` Eli Zaretskii
  2013-11-29 8:49 ` Niels Möller
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-29 7:16 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Thu, 28 Nov 2013 23:17:06 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > If you expected Emacs to perform normalization in this case, then I
> > don't think we do this automatically (or at all).
>
> I think for display, normalizing is definitely the right thing to do
> (the unicode spec, as I understand it, requires that a "compliant"
> implementation treats different ways to code "ä" equivalently).
> But I understand if emacs currently doesn't do that.

Someone(TM) should write the code to do that.

> (Digression: I think a text processor supporting unicode really ought to
> represent "characters" as interned strings of unicode (or utf-8) code
> points.

That's what Emacs does since v23.1 (except that we extend the range of Unicode codepoints to represent some non-unified characters and binary raw bytes).

> These characters can have relations such as "normalized to"

This part requires incorporation of tables and supporting code, which needs to be written.

> glyphs should usually be associated only with the normalized form.

Here I disagree. There are definitely situations where this is not TRT, and they aren't "unusual".

> I'll try to send you a private mail with the bulk of the message with
> the body of the attachment replaced (the base64 text in the raw message;
> if the problem really is with the attachment headers, that shouldn't
> matter); if that's for some reason not usable, I'll send you the
> complete message.

Thanks. I'd also need instructions to display that message in Gnus after saving it to a file, starting with "emacs -Q", as I don't have Gnus set up and don't use it in my day-to-day work.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 7:16 ` Eli Zaretskii
@ 2013-11-29 8:49 ` Niels Möller
  2013-11-29 9:00 ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Niels Möller @ 2013-11-29 8:49 UTC
To: Eli Zaretskii; +Cc: 15984

Eli Zaretskii <eliz@gnu.org> writes:

>> (Digression: I think text-processor supporting unicode really ought to
>> represent "characters" as interned strings of unicode (or utf-8) code
>> points.
>
> That's what Emacs does since v23.1 (except that we extend the range of
> Unicode codepoints to represent some non-unified characters and binary
> raw bytes).

Good! I thought emacs used a simpler mapping character <-> a single unicode value.

> > glyphs should usually be associated only with the normalized form.
>
> Here I disagree. There are definitely situations where this is not
> TRT, and they aren't "unusual".

Ok. What's the typical use case where you'd want to have different glyphs for "Å", "A" + ring above combining char, and Ångström unit sign?

> Thanks. I'd also need instructions to display that message in Gnus
> after saving it to a file, starting with "emacs -Q", as I don't have
> Gnus set up and don't use it in my day-to-day work.

I'm also not sure how to do that, but I'll try to figure it out.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 8:49 ` Niels Möller
@ 2013-11-29 9:00 ` Eli Zaretskii
  2013-11-29 10:43 ` Niels Möller
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-29 9:00 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Fri, 29 Nov 2013 09:49:15 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> (Digression: I think text-processor supporting unicode really ought to
> >> represent "characters" as interned strings of unicode (or utf-8) code
> >> points.
> >
> > That's what Emacs does since v23.1 (except that we extend the range of
> > Unicode codepoints to represent some non-unified characters and binary
> > raw bytes).
>
> Good! I thought emacs used a simpler mapping character <-> a single
> unicode value.

Maybe I misunderstood you: what's the difference between those two alternatives?

> > > glyphs should usually be associated only with the normalized form.
> >
> > Here I disagree. There are definitely situations where this is not
> > TRT, and they aren't "unusual".
>
> Ok. What's the typical use case where you'd want to have different
> glyphs for "Å", "A" + ring above combining char, and Ångström unit sign?

MacOS file names, I think. Also, display in "C-u C-x =", which is very important for understanding and debugging Emacs display features.

> > Thanks. I'd also need instructions to display that message in Gnus
> > after saving it to a file, starting with "emacs -Q", as I don't have
> > Gnus set up and don't use it in my day-to-day work.
>
> I'm also not sure how to do that, but I'll try to figure out.

Thanks.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 9:00 ` Eli Zaretskii
@ 2013-11-29 10:43 ` Niels Möller
  2013-11-29 11:26 ` Eli Zaretskii
  2013-11-29 15:04 ` Stefan Monnier
  0 siblings, 2 replies; 21+ messages in thread
From: Niels Möller @ 2013-11-29 10:43 UTC
To: Eli Zaretskii; +Cc: 15984

Eli Zaretskii <eliz@gnu.org> writes:

>> Good! I thought emacs used a simpler mapping character <-> a single
>> unicode value.
>
> Maybe I misunderstood you: what's the difference between those two
> alternatives?

What I think is the right thing, is to allow a sequence of unicode values, e.g., "A" + combining character, or "A" + any random sequence of combining characters, intern this string, and treat this as a single "character".

The idea is that this character object should correspond to what the user thinks of as a single character. E.g., one glyph per character, and treated as a unit by forward-char, and regexp matching with "." and character sets.

When reading text files, the character boundaries may be configurable. E.g., there could be a mode which makes each and every unicode value a single character, which will then be displayed as separate glyphs, separate characters for regexp matching, etc.

>> > Thanks. I'd also need instructions to display that message in Gnus
>> > after saving it to a file, starting with "emacs -Q", as I don't have
>> > Gnus set up and don't use it in my day-to-day work.

Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*).

Create a spool-like directory, e.g., "~/tmp/mail". Copy the file to "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer, press G d to create a directory group, enter ~/tmp/mail. You should now be able to enter that group, and select the message in the *Summary* buffer.

To mimic my setup, do this in an xterm running in a latin-1 locale. (I have to send this off now, I'll try later to really see if this recipe reproduces the problem for me).

I also tried to reproduce the problem on another machine, with debian gnu/linux and emacs-23.4. This version worked fine, no crash.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 10:43 ` Niels Möller
@ 2013-11-29 11:26 ` Eli Zaretskii
  2013-11-29 12:41 ` Niels Möller
  2013-11-29 15:04 ` Stefan Monnier
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-29 11:26 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Fri, 29 Nov 2013 11:43:45 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> Good! I thought emacs used a simpler mapping character <-> a single
> >> unicode value.
> >
> > Maybe I misunderstood you: what's the difference between those two
> > alternatives?
>
> What I think is the right thing, is to allow a sequence of unicode
> values, e.g., "A" + combining character, or "A" + any random sequence of
> combining characters, intern this string, and treat this as a single
> "character".

That's not how Emacs represents and treats characters. The composition happens only at display time, and normalization, as it's currently implemented, happens when text is read into a buffer. Thereafter, each Unicode character is a single character, and there's no combining of them for any purpose except display.

> The idea is that this character object should correspond to what the
> user thinks of as a single character. E.g., one glyph per character, and
> treated as a unit by forward-char, and regexp matching with "." and
> character sets.

What gets displayed as a single unit is a "grapheme cluster", not a single glyph. Whether a grapheme cluster that corresponds to "A" + any random sequence of combining characters maps to a single glyph depends on the font being used, which is something the user should not need to worry about. However, we do want to give the user a way to delete only one or more of the combining characters, so forcing the entire combination to be a single indivisible entity would not be TRT for users. Cursor motion does consider the entire thing as a single entity and moves across all of it, but that requires special code.

IOW, things are not that simple, and I think the design you are suggesting is problematic in that it will remove several important features, or make them harder to implement.

> When reading text files, the character boundaries may be configurable.

The important question is what to do by default, as many users will not be happy if asked too many questions or requested to specify too many parameters for reading text. Compare this with the need to specify the encoding in too many cases in the early days of multilingual Emacs -- there was a user outcry about that.

> E.g., there could be a mode which makes each and every unicode value a
> single character, which will then be displayed as separate glyphs,
> separate characters for regexp matching, etc.

You are mixing display issues with editing issues and with how characters are represented internally in an Emacs buffer. These all are separate, and do not necessarily need to handle characters in the same rigid way.

> Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*).
>
> Create a spool-like directory, e.g., "~/tmp/mail". Copy the file to
> "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer,
> press G d to create a directory group, enter ~/tmp/mail. You should now
> be able to enter that group, and select the message in the *Summary*
> buffer.
>
> To mimic my setup, do this in an xterm running in a latin-1 locale. (I
> have to send this off now, I'll try later to really see if this recipe
> reproduces the problem for me).

Thanks, I will try that.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 11:26 ` Eli Zaretskii
@ 2013-11-29 12:41 ` Niels Möller
  2013-11-29 14:50 ` Eli Zaretskii
  2013-11-29 16:18 ` Eli Zaretskii
  0 siblings, 2 replies; 21+ messages in thread
From: Niels Möller @ 2013-11-29 12:41 UTC
To: Eli Zaretskii; +Cc: 15984

Eli Zaretskii <eliz@gnu.org> writes:

> However, we do want to give the user a way to
> delete only one or more of the combining characters, so forcing the
> entire combination to be a single indivisible entity would not be TRT
> for users.

Good question, how to handle this.

Today, to remove the dots from an "ä" character, I'll have to delete the complete "ä" character and insert a new "a" character. Or similarly for the reverse edit. I think this "atomic" handling is the desired behaviour in many cases. And I don't think it should behave differently depending on the representation of "ä" in the original file.

But if you have a complex sequence of unicode combining characters, I agree there's some need to be able to edit it. Maybe put point on the character and invoke edit-char to go in some special mode which explodes the usually "atomic" character into smaller pieces. And such a character edit mode might be useful for more things than unicode composing characters, e.g., manipulating the different sub-parts of a Chinese character.

Anyway, this user interface is not intimately tied to the internal character representation; its overall effect on the buffer will be the same as replacing any substring.

>> When reading text files, the character boundaries may be configurable.
>
> The important question is what to do by default,

I'm pretty sure the default should be that a sequence of one unicode base char and all following unicode combining chars is interned as a single "emacs character". (I think the detailed rules for this are spelled out in the unicode book). With some arbitrary limit to prevent a GByte file with only unicode combining characters from being read as a single emacs character; say at most 10 combining characters.

> You are mixing display issues with editing issues and with how
> characters are represented internally in an Emacs buffer.

I think it's confusing for users if the units of text which forward-char skips over, do not correspond to the units matched by "." in isearch-forward-regexp.

My suggested internal representation seems to be a natural way to get this correspondence right, at the cost of some memory (or lots of complexity in reducing memory usage). I'm sure there are other ways, and maybe also a lot better ways, to implement the same thing.

> Thanks, I will try that.

Now I've also reproduced it on the same machine, without my normal Gnus setup getting in the way. I start emacs with

  $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el

where bug.el contains

  (setq gnus-init-file nil)
  (setq gnus-nntp-server nil)
  (gnus-no-server)

Then create the group with G d, pointing out the spool-like directory, enter the group (RET), view the message (RET), try to write out the attachment ("o" on the attachment button). Still crashes for me.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
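The default proposed in the message above (one base character plus the combining characters that follow it, capped at some small number) can be sketched roughly as follows in Python. This is only an illustration of the proposal, not how Emacs represents text, and it is much cruder than the Unicode grapheme cluster rules: here a "combining character" is assumed to be any code point with a non-zero canonical combining class. The function name `clusters` and the parameter `max_marks` are hypothetical.

```python
import unicodedata


def clusters(text, max_marks=10):
    """Split text into units of one base character plus following marks.

    A mark is approximated as a code point whose canonical combining class
    is non-zero. Once a unit already holds max_marks marks, the next mark
    starts a new unit, so a long run of marks cannot form one huge unit.
    """
    units = []
    marks_in_unit = 0
    for ch in text:
        is_mark = unicodedata.combining(ch) != 0
        if units and is_mark and marks_in_unit < max_marks:
            units[-1] += ch          # attach the mark to the current unit
            marks_in_unit += 1
        else:
            units.append(ch)         # start a new unit
            marks_in_unit = 0
    return units


word = "aktiea\u0308gar"
print(len(word), len(clusters(word)))
```

With this grouping the decomposed "ä" in the attachment name counts as one unit, so the word has the same number of units whether it arrives composed or decomposed.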
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 12:41 ` Niels Möller
@ 2013-11-29 14:50 ` Eli Zaretskii
  2013-11-29 16:18 ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-29 14:50 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Fri, 29 Nov 2013 13:41:01 +0100
>
> Today, to remove the dots from an "ä" character, I'll have to delete the
> complete "ä" character and insert a new "a" character.

Not if they were originally two or more characters which were composed into one. In that case, we let the user edit them separately.

> I think this "atomic" handling is the desired behaviour in many
> cases.

For "ä", this is arguable. For more complex scripts, this is definitely wrong: users want to be able to edit each component separately.

> But if you have a complex sequence of unicode combining characters,
> I agree there's some need to be able to edit it. Maybe put point on
> the character and invoke edit-char to go in some special mode which
> explodes the usually "atomic" character into smaller pieces.

We already do that, but if the characters were combined, and Emacs doesn't even know they were separate to begin with, it cannot do that, can it?

> > You are mixing display issues with editing issues and with how
> > characters are represented internally in an Emacs buffer.
>
> I think it's confusing for users if the units of text which forward-char
> skips over, do not correspond to the units matched by "." in
> isearch-forward-regexp.

What happens under the hood with matching and what is shown to the user doesn't have to be identical. In fact, it cannot be identical.

Again, please don't mix internal implementation and UI; they cannot possibly be identical anyway, because there are conflicting user requirements in different situations.

> My suggested internal representation seems to be a natural way to get
> this correspondence right, at the cost of some memory (or lots of
> complexity in reducing memory usage).

It only seems to be that. Real life is much more messy, and defeats such simplicity on many levels.

> Now I've also reproduced it on the same machine, without my normal Gnus
> setup getting in the way. I start emacs with
>
>   $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
>
> where bug.el contains
>
>   (setq gnus-init-file nil)
>   (setq gnus-nntp-server nil)
>   (gnus-no-server)
>
> Then create the group with G d, pointing out the spool-like directory,
> enter the group (RET), view the message (RET), try to write out the
> attachment ("o" on the attachment button). Still crashes for me.

Thanks.
* bug#15984: 24.3; Problem with combining characters in attachment filename
  2013-11-29 12:41 ` Niels Möller
  2013-11-29 14:50 ` Eli Zaretskii
@ 2013-11-29 16:18 ` Eli Zaretskii
  2013-11-30 13:20 ` Eli Zaretskii
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2013-11-29 16:18 UTC
To: Niels Möller; +Cc: 15984

> From: nisse@lysator.liu.se (Niels Möller)
> Cc: 15984@debbugs.gnu.org
> Date: Fri, 29 Nov 2013 13:41:01 +0100
>
>   $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
>
> where bug.el contains
>
>   (setq gnus-init-file nil)
>   (setq gnus-nntp-server nil)
>   (gnus-no-server)
>
> Then create the group with G d, pointing out the spool-like directory,
> enter the group (RET), view the message (RET), try to write out the
> attachment ("o" on the attachment button). Still crashes for me.

It crashes in the current development trunk as well, but only if the locale is set to Latin-1, like yours.

I'm looking at this.
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-29 16:18 ` Eli Zaretskii @ 2013-11-30 13:20 ` Eli Zaretskii 2013-11-30 14:25 ` Kenichi Handa 2013-11-30 15:50 ` Niels Möller 0 siblings, 2 replies; 21+ messages in thread From: Eli Zaretskii @ 2013-11-30 13:20 UTC (permalink / raw) To: Kenichi Handa; +Cc: 15984, nisse > From: Eli Zaretskii <eliz@gnu.org> > Cc: 15984@debbugs.gnu.org > > > From: nisse@lysator.liu.se (Niels Möller) > > Cc: 15984@debbugs.gnu.org > > Date: Fri, 29 Nov 2013 13:41:01 +0100 > > > > $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el > > > > where bug.el contains > > > > (setq gnus-init-file nil) > > (setq gnus-nntp-server nil) > > (gnus-no-server) > > > > Then create the group with G d, pointing out the spool-like directory, > > enter the group (RET), view the message (RET), try to write out the > > attachment ("o" on the attachment button). Still crashes for me. > > It crashes in the current development trunk as well, but only if the > locale is set to Latin-1, like yours. > > I'm looking at this. There's something strange going on here; I'm CC'ing Handa-san, because the problem is related to processing character compositions on a TTY. The reason for the crash is simple: the following code from indent.c:scan_for_column

      /* Check composition sequence.  */
      if (cmp_it.id >= 0
          || (scan == cmp_it.stop_pos
              && composition_reseat_it (&cmp_it, scan, scan_byte, end,
                                        w, NULL, Qnil)))
        composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
        {
          scan += cmp_it.nchars;
          scan_byte += cmp_it.nbytes;
          if (scan <= end)
            col += cmp_it.width;
          if (cmp_it.to == cmp_it.nglyphs)
            {
              cmp_it.id = -1;
              composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
                                            Qnil);
            }
          else
            cmp_it.from = cmp_it.to;
          continue;
        }

incorrectly steps into the middle of a multibyte sequence #xCC #x88 for the character U+0308, the Combining Diaeresis, because cmp_it.nbytes is computed as 1 instead of 2.
The question is why it does so. From stepping through composition_reseat_it and composition_update_it, it looks like the code contradicts itself: it thinks that 'a' and the combining diaeresis should be composed, but then acts as if no composition should happen. As a result, this code in composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

always considers only 'a', never the diaeresis, and so cmp_it->nbytes is always computed as 1. So scan_for_column advances only 1 byte, instead of 2, and finds itself in the middle of a multibyte sequence. From there, it's a sure way to a crash. I hope Handa-san will be able to find the problem. The crash is 100% reproducible with the steps described above and a mail message that Niels can send you off-list. TIA ^ permalink raw reply [flat|nested] 21+ messages in thread
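The byte arithmetic behind "advances only 1 byte, instead of 2" can be checked in isolation. The following is a minimal sketch in plain C, not Emacs code: the function names are hypothetical, it assumes standard UTF-8 rather than Emacs's internal encoding, and it does not reject overlong or out-of-range lead bytes. It only shows that, for the byte sequence 61 CC 88 67 ("a", U+0308, "g"), stepping one byte forward from the start of CC 88 lands on a continuation byte, while stepping two bytes lands on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte B,
   or 0 if B is a continuation byte (10xxxxxx) or otherwise cannot
   start a sequence.  Simplified: overlong and out-of-range lead
   bytes are not rejected.  */
int
utf8_sequence_length (unsigned char b)
{
  if (b < 0x80)
    return 1;                   /* 0xxxxxxx: ASCII */
  if ((b & 0xE0) == 0xC0)
    return 2;                   /* 110xxxxx */
  if ((b & 0xF0) == 0xE0)
    return 3;                   /* 1110xxxx */
  if ((b & 0xF8) == 0xF0)
    return 4;                   /* 11110xxx */
  return 0;                     /* 10xxxxxx or invalid */
}

/* True if byte offset POS in BUF (LEN bytes long) is a character
   boundary: either the end of the buffer or a lead byte.  */
int
utf8_is_boundary (const unsigned char *buf, size_t len, size_t pos)
{
  assert (pos <= len);
  return pos == len || utf8_sequence_length (buf[pos]) != 0;
}
```

With buf holding 61 CC 88 67, offset 1 is a boundary, offset 1 + 1 is not, and offset 1 + 2 is again a boundary. That mirrors the difference between an nbytes of 1 and an nbytes of 2 for the diaeresis.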
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-30 13:20 ` Eli Zaretskii @ 2013-11-30 14:25 ` Kenichi Handa 2013-11-30 16:09 ` Eli Zaretskii 2013-11-30 15:50 ` Niels Möller 1 sibling, 1 reply; 21+ messages in thread From: Kenichi Handa @ 2013-11-30 14:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15984, nisse In article <83siue58mq.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > There's something strange going on here; I'm CC'ing Handa-san, because > the problem is related to processing character compositions on a TTY. [...] > I hope Handa-san will be able to find the problem. The crash is 100% > reproducible with the steps described above and a mail message that > Niels can send you off-list. Thank you for tracking down the bug. I'll investigate the cause of the problem. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-30 14:25 ` Kenichi Handa @ 2013-11-30 16:09 ` Eli Zaretskii 0 siblings, 0 replies; 21+ messages in thread From: Eli Zaretskii @ 2013-11-30 16:09 UTC (permalink / raw) To: Kenichi Handa; +Cc: 15984, nisse > From: Kenichi Handa <handa@gnu.org> > Cc: nisse@lysator.liu.se, 15984@debbugs.gnu.org > Date: Sat, 30 Nov 2013 23:25:06 +0900 > > > I hope Handa-san will be able to find the problem. The crash is 100% > > reproducible with the steps described above and a mail message that > > Niels can send you off-list. > > Thank you for tracking down the bug. I'll investigate > the cause of the problem. Thanks. To save you some time, the problem only happens in a Latin-1 locale, so I used this command to invoke Emacs: HOME=$HOME/tmp LC_CTYPE=sv_SE.ISO8859-1 src/emacs -Q -l bug.el ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-30 13:20 ` Eli Zaretskii 2013-11-30 14:25 ` Kenichi Handa @ 2013-11-30 15:50 ` Niels Möller 1 sibling, 0 replies; 21+ messages in thread From: Niels Möller @ 2013-11-30 15:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15984 Eli Zaretskii <eliz@gnu.org> writes: > I hope Handa-san will be able to find the problem. The crash is 100% > reproducible with the steps described above and a mail message that > Niels can send you off-list. I ended up sending an anonymized example message to the list, see http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;filename=bounce.gz;att=1;bug=15984 Thanks for looking into this. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-29 10:43 ` Niels Möller 2013-11-29 11:26 ` Eli Zaretskii @ 2013-11-29 15:04 ` Stefan Monnier 2013-11-29 15:27 ` Eli Zaretskii 2013-11-30 8:53 ` Niels Möller 1 sibling, 2 replies; 21+ messages in thread From: Stefan Monnier @ 2013-11-29 15:04 UTC (permalink / raw) To: Niels Möller; +Cc: 15984 > What I think is the right thing, is to allow a sequence of unicode > values, e.g., "A" + combining character, or "A" + any random sequence of > combining characters, intern this string, and treat this as a single > "character". For the Lisp-level notion of "character", I think this would require too many deep changes. > The idea is that this character object should correspond to what the > user thinks of as a single character. E.g, one glyph per character, and > treated as a unit by forward-char, and regexp matching with "." and > character sets. For forward-char, we do try to fake that behavior (e.g. a `forward-char' command will skip over the whole A+ring combo) but not faithfully (e.g. `C-u 2 forward-char' will also just skip that combo, and not the subsequent char). It's not perfect, but it seems "close enough" that it hasn't proved problematic. Adjusting . in regexps would indeed help solve some unexpected behaviors. We would probably want to keep the ability to match a single "code point", so we'd need to introduce a new regexp operator. Maybe we could follow the lead of the POSIX collation thingy, IIRC, where [ß] in case-folding mode wants to be able to match SS in a German locale. So maybe [[:any:]] could match A+ring. > E.g, there could be a mode which makes each and every unicode value a > single character, which will then be displayed as separate glyphs, > separate characters for regexp matching, etc. I think we wouldn't want to use different modes (too coarse) but different commands instead.
In any case, a first step would be to find a name for that notion of "multi character character". "Grapheme cluster" doesn't sound too good if we want to expose the concept to the end user. Stefan ^ permalink raw reply [flat|nested] 21+ messages in thread
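To make the "base plus combining characters treated as one unit" idea concrete, here is a minimal sketch in plain C. It is not how Emacs implements forward-char, and it is far simpler than real grapheme cluster segmentation as defined by Unicode: as a loudly stated assumption, only the Combining Diacritical Marks block (U+0300..U+036F) is treated as combining, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: only code points in the Combining
   Diacritical Marks block (U+0300..U+036F) count as combining.  */
int
is_combining_mark (unsigned long c)
{
  return c >= 0x0300 && c <= 0x036F;
}

/* Number of code points in the cluster that starts at index POS of
   TEXT (LEN code points long): one base code point plus any
   combining marks that immediately follow it.  */
size_t
cluster_length (const unsigned long *text, size_t len, size_t pos)
{
  assert (pos < len);
  size_t n = 1;
  while (pos + n < len && is_combining_mark (text[pos + n]))
    n++;
  return n;
}

/* Move forward over COUNT clusters starting at index POS and return
   the new index, clamped to LEN.  */
size_t
forward_clusters (const unsigned long *text, size_t len,
                  size_t pos, size_t count)
{
  for (size_t i = 0; i < count && pos < len; i++)
    pos += cluster_length (text, len, pos);
  return pos;
}
```

For the sequence "A", U+030A (combining ring above), "b", "c", moving forward by one cluster from index 0 gives index 2, and moving forward by two clusters gives index 3. The second result is the behavior that the `C-u 2 forward-char' remark above describes as missing.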
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-29 15:04 ` Stefan Monnier @ 2013-11-29 15:27 ` Eli Zaretskii 2013-11-30 8:53 ` Niels Möller 1 sibling, 0 replies; 21+ messages in thread From: Eli Zaretskii @ 2013-11-29 15:27 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15984, nisse > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Eli Zaretskii <eliz@gnu.org>, 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 10:04:04 -0500 > > In any case, a first step would be to find a name for that notion of "multi > character character". "Grapheme cluster" doesn't sound too good if we > want to expose the concept to the end user. Why should we invent terminology where one already exists and is widely accepted and used? It sounds like a waste of energy. Explain the term well enough, and users will have no difficulty. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-29 15:04 ` Stefan Monnier 2013-11-29 15:27 ` Eli Zaretskii @ 2013-11-30 8:53 ` Niels Möller 1 sibling, 0 replies; 21+ messages in thread From: Niels Möller @ 2013-11-30 8:53 UTC (permalink / raw) To: Stefan Monnier; +Cc: 15984 Stefan Monnier <monnier@iro.umontreal.ca> writes: >> What I think is the right thing, is to allow a sequence of unicode >> values, e.g., "A" + combining character, or "A" + any random sequence of >> combining characters, intern this string, and treat this as a single >> "character". > > For the Lisp-level notion of "character", I think this would require too > many deep changes. I can understand that. I'm actually impressed by the move from MULE encodings to unicode, which to a user appeared to be very smooth. But I still think that type of "character" abstraction is the right thing for unicode text processing in general. > For forward-char, we do try to fake that behavior (e.g. a `forward-char' > command will skip over the whole A+ring combo) but not faithfully > (e.g. `C-u 2 forward-char' will also just skip that combo, and not the > subsequent char). It's not perfect, but it seems "close enough" that it > hasn't proved problematic. Didn't know, that's a bit weird. I just tried, as Eli suggested, editing text with "ä" represented as "a" followed by a combining character. In emacs-23.4, pressing DEL after the "ä" deletes the dots only. I now understand why, but it's not what I had expected, and I think deleting the entire A + dots would be preferable. Plain C-x = on the "a" shows just "Char: a (97, #o141, #x61) point=443 of 455 (97%) column=1", but C-u C-x = also shows the combining char. However, emacs-24.3 behaves differently, the 'a' and the '"' get displayed differently, and are not combined at all for display. The buffer shows 'a"', and according to C-u C-x = the '"' is a "COMBINING DIAERESIS".
These tests were done in an X11 frame, so maybe they're just picking up different fonts? >> E.g, there could be a mode which makes each and every unicode value a >> single character, which will then be displayed as separate glyphs, >> separate characters for regexp matching, etc. > > I think we wouldn't want to use different modes (too coarse) but > different commands instead. I didn't mean an emacs major or minor mode. It would be more like a special coding system, applied when reading the text from file. > In any case, a first step would be to find a name for that notion of "multi > character character". "Grapheme cluster" doesn't sound too good if we > want to expose the concept to the end user. I think "character" is the right word, the main source of confusion is that unicode code points are often referred to as "characters". Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#15984: 24.3; Problem with combining characters in attachment filename 2013-11-28 20:25 ` Eli Zaretskii 2013-11-28 22:17 ` Niels Möller @ 2013-11-29 13:11 ` Kenichi Handa 1 sibling, 0 replies; 21+ messages in thread From: Kenichi Handa @ 2013-11-29 13:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 15984, nisse In article <83iovc8eaq.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > If you expected Emacs to perform normalization in this case, then I > don't think we do this automatically (or at all). The library "ucs-normalize" (under lisp/international/) provides the coding system utf-8-hfs which may be appropriate for file-name-coding-system on Mac. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 21+ messages in thread
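As an illustration of what normalization would buy in this case, the sketch below composes a base letter and a following combining mark into one precomposed code point, so that "a" + U+0308 becomes U+00E4, which a Latin-1 locale can encode. This is plain C with a tiny hand-written table and hypothetical names. It is not the ucs-normalize library and not a real implementation of Unicode normalization, which needs the full composition data and reordering rules.

```c
#include <assert.h>
#include <stddef.h>

struct composition
{
  unsigned long base, mark, composed;
};

/* Hand-picked pairs for illustration only.  */
static const struct composition compositions[] = {
  { 0x61, 0x0308, 0xE4 },       /* a + diaeresis  -> ä */
  { 0x6F, 0x0308, 0xF6 },       /* o + diaeresis  -> ö */
  { 0x75, 0x0308, 0xFC },       /* u + diaeresis  -> ü */
  { 0x41, 0x0308, 0xC4 },       /* A + diaeresis  -> Ä */
  { 0x4F, 0x0308, 0xD6 },       /* O + diaeresis  -> Ö */
  { 0x55, 0x0308, 0xDC },       /* U + diaeresis  -> Ü */
  { 0x61, 0x030A, 0xE5 },       /* a + ring above -> å */
  { 0x41, 0x030A, 0xC5 },       /* A + ring above -> Å */
};

/* Return the precomposed code point for BASE followed by MARK, or 0
   if the table has no entry for that pair.  */
unsigned long
compose_pair (unsigned long base, unsigned long mark)
{
  size_t n = sizeof compositions / sizeof compositions[0];
  for (size_t i = 0; i < n; i++)
    if (compositions[i].base == base && compositions[i].mark == mark)
      return compositions[i].composed;
  return 0;
}

/* Compose known pairs in TEXT (LEN code points) in place and return
   the new length.  */
size_t
compose_in_place (unsigned long *text, size_t len)
{
  assert (text != NULL || len == 0);
  size_t out = 0;
  for (size_t in = 0; in < len; in++)
    {
      unsigned long c = out > 0 ? compose_pair (text[out - 1], text[in]) : 0;
      if (c != 0)
        text[out - 1] = c;      /* Replace the base with the composed form.  */
      else
        text[out++] = text[in];
    }
  return out;
}
```

Applied to the three code points "a", U+0308, "g" from the attachment name, this yields two code points, U+00E4 and "g".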
[parent not found: <87eh574qmm.fsf@gnu.org>]
* bug#15984: 24.3; Problem with combining characters in attachment filename [not found] ` <87eh574qmm.fsf@gnu.org> @ 2014-01-17 13:30 ` K. Handa 0 siblings, 0 replies; 21+ messages in thread From: K. Handa @ 2014-01-17 13:30 UTC (permalink / raw) To: K. Handa; +Cc: 15984, nisse In article <87eh574qmm.fsf@gnu.org>, handa@gnu.org (K. Handa) writes: > I'll keep trying to find why the trunk doesn't crash with > your recipe, and once I find the whole story, I'll install a > proper patch (which may be the same as what I sent) to the > trunk. I couldn't reproduce that bug with the trunk code. I rewound back to 2013-03-11, which is the day 24.3 was released, and I can reproduce the bug with 24.3. So, I am now very puzzled. Anyway, I installed that fix to the trunk because the previous code was apparently wrong. --- Kenichi Handa handa@gnu.org PS. I've just noticed that recent mails exchanged on this matter were not CC:ed to 15984@debbugs.gnu.org. So, to provide the context, I attach some key mails here. -1-------------------------------------------------------------------- From: nisse@lysator.liu.se (Niels Möller) To: handa@gnu.org (K. Handa) Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename handa@gnu.org (K. Handa) writes: > In article <83siue58mq.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > >> I hope Handa-san will be able to find the problem. The crash is 100% >> reproducible with the steps described above and a mail message that >> Niels can send you off-list. > > Could you please send me that mail message? I'll delete it > as soon as I can find a fix. I believe the smaller bounce message I posted in the bugtracker exhibits the problem. That's the same file Eli was using when reproducing the problem.
Described at http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;bug=15984 actual message (gzipped): http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;filename=bounce.gz;att=1;bug=15984 Steps to reproduce the problem (this info spread out in the bug thread): 1. Create a new directory, say mail-tmp. Copy the message (uncompressed) into that directory, with filename "1". 2. Start emacs in tty mode, with a latin-1 locale, like HOME=$HOME/tmp LC_CTYPE=sv_SE.ISO8859-1 src/emacs -Q -l bug.el with bug.el containing (setq gnus-init-file nil) (setq gnus-nntp-server nil) (gnus-no-server) 3. Then, in Gnus' *Group* buffer, create the group with G d, pointing out the mail-tmp directory, enter the group (RET), view the message (RET), try to write out the attachment ("o" on the attachment button). Still crashes for me. Let me know if you need any further info. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. -2-------------------------------------------------------------------- From: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename To: handa@gnu.org (K. Handa) Cc: nisse@lysator.liu.se, handa@gnu.org > From: handa@gnu.org (K. Handa) > Cc: eliz@gnu.org, handa@gnu.org > Date: Fri, 13 Dec 2013 23:15:00 +0900 > > In article <nn4n6dag53.fsf@bacon.lysator.liu.se>, nisse@lysator.liu.se (Niels Möller) writes: > > > And tty mode, no X frame (I used an xterm, started in a latin-1 locale). > > Yes. I surely add "-nw" argument, and I tried the recipe > with xterm and lxterminal. I cannot reproduce this either, with today's trunk. Perhaps you could try with the trunk as it was on Nov 30, or with Emacs 24.3? > By the way, I noticed that buffer-file-coding-system of > Gnus's message buffer (the buffer showing that bounce mail) > is raw-text-unix. Is it the same with you? Yes.
This might be part of the problem, or it could be the trigger for the crash. -3--------------------------------------------------------------------- From: handa@gnu.org (K. Handa) To: nisse@lysator.liu.se (Niels Möller) Cc: eliz@gnu.org, handa@gnu.org Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In article <nn4n6dag53.fsf@bacon.lysator.liu.se>, nisse@lysator.liu.se (Niels Möller) writes: > And tty mode, no X frame (I used an xterm, started in a latin-1 locale). Yes. I surely add "-nw" argument, and I tried the recipe with xterm and lxterminal. By the way, I noticed that buffer-file-coding-system of Gnus's message buffer (the buffer showing that bounce mail) is raw-text-unix. Is it the same with you? --- Kenichi Handa handa@gnu.org -4-------------------------------------------------------------------- From: nisse@lysator.liu.se (Niels Möller) To: handa@gnu.org (K. Handa) Cc: eliz@gnu.org Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename handa@gnu.org (K. Handa) writes: > By the way, I noticed that buffer-file-coding-system of > Gnus's message buffer (the buffer showing that bounce mail) > is raw-text-unix. Is it the same with you? Yes. Probably wasn't in the original mail (if you like, I can look into that further, but I don't want to crash the emacs I'm writing this in right now). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. -5-------------------------------------------------------------------- From: handa@gnu.org (K. Handa) To: Eli Zaretskii <eliz@gnu.org> Cc: nisse@lysator.liu.se, handa@gnu.org Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In article <838uvo6cjx.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > I cannot reproduce this either, with today's trunk. Perhaps you > could try with the trunk as it was on Nov 30, or with Emacs 24.3? 
With Emacs 24.3, I could reproduce the bug and the patch attached at the tail seems to fix it. Could you please try it? It is applicable to the latest code too. But, with the trunk, I have not yet succeeded in reproducing the bug. I tried from the revision of Nov 30 and went back to April one month at a time. > > By the way, I noticed that buffer-file-coding-system of > > Gnus's message buffer (the buffer showing that bounce mail) > > is raw-text-unix. Is it the same with you? > Yes. This might be part of the problem, or it could be the trigger > for the crash. With Emacs 24.3, the bug can be reproduced with a multibyte buffer. --- Kenichi Handa handa@gnu.org

=== modified file 'src/composite.c'
--- src/composite.c 2013-01-01 09:11:05 +0000
+++ src/composite.c 2013-12-19 13:49:53 +0000
@@ -1426,7 +1426,7 @@
       cmp_it->width = 0;
       for (i = cmp_it->nchars - 1; i >= 0; i--)
         {
-          c = XINT (LGSTRING_CHAR (gstring, i));
+          c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
           cmp_it->nbytes += CHAR_BYTES (c);
           cmp_it->width += CHAR_WIDTH (c);
         }

-6-------------------------------------------------------------------- From: nisse@lysator.liu.se (Niels Möller) To: handa@gnu.org (K. Handa) Cc: Eli Zaretskii <eliz@gnu.org> Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename handa@gnu.org (K. Handa) writes: > With Emacs 24.3, I could reproduce the bug and the patch > attached at the tail seems to fix it. Could you please try > it? It is applicable to the latest code too. I compiled 24.3.1 with the patch applied. It no longer crashes. Great! Behavior is that on saving the attachment, the default filename is displayed as "Brev aktiea?gar 131127.pdf", where the question mark really is a COMBINING DIAERESIS (according to C-u C-x =). When I press enter, the file is saved under the file name "Brev aktiea gar 131127.pdf", with the combining diaeresis replaced by a SPC character (checked with GNU ls -N | od -tx1c). Regards, /Niels -- Niels Möller.
PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. -7--------------------------------------------------------------------- From: handa@gnu.org (K. Handa) To: nisse@lysator.liu.se (Niels Möller) Cc: eliz@gnu.org, handa@gnu.org Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In article <nn4n64m18i.fsf@bacon.lysator.liu.se>, nisse@lysator.liu.se (Niels Möller) writes: > handa@gnu.org (K. Handa) writes: > > With Emacs 24.3, I could reproduce the bug and the patch > > attached at the tail seems to fix it. Could you please try > > it? It is applicable to the latest code too. > I compiled 24.3.1 with the patch applied. It no longer crashes. Great! Thank you for testing that. > Behavior is that on saving the attachment, the default filename is > displayed as "Brev aktiea?gar 131127.pdf", where the question mark > really is a COMBINING DIAERESIS (according to C-u C-x =). When I press > enter, the file is saved under the file name "Brev aktiea gar > 131127.pdf", with the combining diaeresis replaced by a SPC character > (checked with GNU ls -N | od -tx1c). This is just my guess, but as long as you are in an ISO-8859-1 locale, there's no way to encode that combining diaeresis, so Gnus uses SPC as a replacement character. Perhaps Gnus should warn you about that and ask you how to encode the file name. Anyway, that is a completely different matter from bug#15984. I'll keep trying to find why the trunk doesn't crash with your recipe, and once I find the whole story, I'll install a proper patch (which may be the same as what I sent) to the trunk. --- Kenichi Handa handa@gnu.org ^ permalink raw reply [flat|nested] 21+ messages in thread
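The effect of the one-line change in the patch (indexing with cmp_it->from + i instead of i) can be sketched outside Emacs. The code below is plain C with hypothetical names. The byte-length helper is a stand-in for CHAR_BYTES that assumes standard UTF-8 and ignores the extra cases of Emacs's internal encoding. That the crashing case corresponds to a starting index of 1 is an assumption made for the illustration and is not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to encode code point C in UTF-8.  Stand-in for the
   CHAR_BYTES macro used in the patch (assumption: plain UTF-8).  */
int
code_point_bytes (unsigned long c)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Patched behaviour: sum the byte lengths of NCHARS code points of
   CHARS starting at index FROM, i.e. index with FROM + I.  */
size_t
span_bytes (const unsigned long *chars, size_t from, size_t nchars)
{
  size_t nbytes = 0;
  for (size_t i = 0; i < nchars; i++)
    nbytes += code_point_bytes (chars[from + i]);
  return nbytes;
}

/* Pre-patch behaviour for comparison: FROM is ignored, so the loop
   always measures the first NCHARS code points.  */
size_t
span_bytes_unpatched (const unsigned long *chars, size_t from,
                      size_t nchars)
{
  (void) from;
  size_t nbytes = 0;
  for (size_t i = 0; i < nchars; i++)
    nbytes += code_point_bytes (chars[i]);
  return nbytes;
}
```

For the two code points "a", U+0308 and a span of one code point starting at index 1, span_bytes returns 2 while span_bytes_unpatched returns 1. That is the "1 instead of 2" discrepancy identified earlier in the thread as the cause of the misaligned byte position.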
Thread overview: 21+ messages
2013-11-28 8:08 bug#15984: 24.3; Problem with combining characters in attachment filename Niels Möller
2013-11-28 20:25 ` Eli Zaretskii
2013-11-28 22:17 ` Niels Möller
2013-11-28 22:46 ` Niels Möller
2013-11-29 7:16 ` Eli Zaretskii
2013-11-29 8:49 ` Niels Möller
2013-11-29 9:00 ` Eli Zaretskii
2013-11-29 10:43 ` Niels Möller
2013-11-29 11:26 ` Eli Zaretskii
2013-11-29 12:41 ` Niels Möller
2013-11-29 14:50 ` Eli Zaretskii
2013-11-29 16:18 ` Eli Zaretskii
2013-11-30 13:20 ` Eli Zaretskii
2013-11-30 14:25 ` Kenichi Handa
2013-11-30 16:09 ` Eli Zaretskii
2013-11-30 15:50 ` Niels Möller
2013-11-29 15:04 ` Stefan Monnier
2013-11-29 15:27 ` Eli Zaretskii
2013-11-30 8:53 ` Niels Möller
2013-11-29 13:11 ` Kenichi Handa
[not found] ` <87eh574qmm.fsf@gnu.org>
2014-01-17 13:30 ` K. Handa