From: Peter Dyballa
Newsgroups: gmane.emacs.bugs
Subject: bug#12803: 24.3.50; accented Thai Unicode characters are turned into decomposed ones on Mac OS X by replace-regexp
Date: Sun, 4 Nov 2012 23:35:58 +0100
To: 12803@debbugs.gnu.org

Hello!

I wanted to get the unique Thai characters from an email subject like this:

	FW:grcthai สร้างรายได้แบบไร้ขีดจำกัดกับการทำงานแบบไร้ขอบเขต..

So I marked the Thai text and invoked replace-regexp with "\(.\)" -> "\1 ", intending to do replace-string " " -> "C-qC-j" afterwards and then [g]sort -u the result. In the *Shell Command Output* buffer I got decomposed Thai Unicode characters…

But actually it is already the function replace-regexp which produces the decomposed characters (originally 41 characters; after replace-regexp not 82 but 89, according to column-number-mode).

Mac OS X 10.6.8; the fonts used are FreeSerif for the Thai characters, and George Williams' Monospace Regular for SPACE. The result is the same when I use GTK2, and it also makes no difference when I use a native 64-bit binary (and libs).

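The counts quoted above can be cross-checked outside Emacs. The following is only a sketch in Python, not Emacs code. It assumes that Python's `re` module treats `.` as one match per code point, and that column-number-mode gives nonspacing marks (Unicode category Mn) zero width; neither assumption is confirmed by the report. The helper names `split_like_report` and `display_columns` are made up for this illustration.

```python
import re
import unicodedata

# Thai part of the subject line, written as code-point escapes so the
# example does not depend on how this message itself is encoded.
SUBJECT = (
    "\u0e2a\u0e23\u0e49\u0e32\u0e07"          # สร้าง
    "\u0e23\u0e32\u0e22\u0e44\u0e14\u0e49"    # รายได้
    "\u0e41\u0e1a\u0e1a"                      # แบบ
    "\u0e44\u0e23\u0e49"                      # ไร้
    "\u0e02\u0e35\u0e14"                      # ขีด
    "\u0e08\u0e33\u0e01\u0e31\u0e14"          # จำกัด
    "\u0e01\u0e31\u0e1a"                      # กับ
    "\u0e01\u0e32\u0e23"                      # การ
    "\u0e17\u0e33\u0e07\u0e32\u0e19"          # ทำงาน
    "\u0e41\u0e1a\u0e1a"                      # แบบ
    "\u0e44\u0e23\u0e49"                      # ไร้
    "\u0e02\u0e2d\u0e1a\u0e40\u0e02\u0e15"    # ขอบเขต
)


def split_like_report(text):
    """Put a space after every code point (hypothetical stand-in for the
    replace-regexp call in the report). Returns (new_text, match_count)."""
    return re.subn(r"(.)", r"\1 ", text)


def display_columns(text):
    """Rough column count, assuming nonspacing marks occupy no column."""
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")


spaced, replacements = split_like_report(SUBJECT)
unique = sorted(set(SUBJECT))  # what the later sort -u step is after

print(replacements)              # 48
print(display_columns(SUBJECT))  # 41
print(display_columns(spaced))   # 89
print(" ".join(f"U+{ord(ch):04X}" for ch in unique))
```

Under these assumptions the figures agree with the report: 48 replacements, 41 columns before and 89 after (41 spacing characters plus 48 inserted spaces). That is consistent with each tone mark and upper vowel being matched as a character of its own, but it does not show what Emacs does internally.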
In GNU Emacs 24.3.50.1 (i386-apple-darwin10.8.0, X toolkit, Xaw3d scroll bars)
 of 2012-11-04 on Sumac.local
Bzr revision: 110798 eggert@cs.ucla.edu-20121104172952-vvhdy8gmbtgj0c3w
Windowing system distributor `The X.Org Foundation', version 11.0.11300000
Configured using:
 `configure '--build=x86_64-apple-darwin10.8.0'
 '--host=i386-apple-darwin10.8.0' '--target=i386-apple-darwin10.8.0'
 '--without-pop' '--without-sound' '--without-gpm' '--without-dbus'
 '--without-selinux' '--with-x-toolkit=athena'
 '--disable-ns-self-contained' '--without-xpm' '--without-jpeg'
 '--without-tiff' '--without-gif' '--without-png'
 '--x-libraries=/usr/X11/lib' '--x-includes=/usr/X11/include'
 '--enable-locallisppath=/Library/Application Support/Emacs/calendar24:/Library/Application Support/Emacs'
 'CFLAGS=-g3 -H -pipe -fPIC -fno-common -Os -march=core2 -mtune=core2 -m32 -fomit-frame-pointer -msse4.2'
 'LDFLAGS=-m32 -Wl,-dead_strip_dylibs -Wl,-bind_at_load -Wl,-t'
 'CPPFLAGS=-I/sw/include' 'CC=clang' 'CXX=clang++'
 'PKG_CONFIG_PATH=/sw/lib/xft2/lib/pkgconfig:/sw/share/pkgconfig:/sw/lib/pkgconfig:/usr/X11/lib/pkgconfig:/usr/X11/share/pkgconfig:/usr/lib/pkgconfig'
 'build_alias=x86_64-apple-darwin10.8.0'
 'host_alias=i386-apple-darwin10.8.0'
 'target_alias=i386-apple-darwin10.8.0''

Important settings:
  value of $LC_CTYPE: de_DE.UTF-8
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
C-a x r e p l r e g \ ( . \ ) \ 1 SPC M-x c o l C-a C-u C-x = C-u C-x = C-u C-x = C-u C-x =

Recent messages:
Replaced 48 occurrences
Column-Number mode enabled
Type C-x 1 to delete the help window, C-M-v to scroll help.
Char: ส (3626, #o7052, #xe2a, file ...) point=192 of 287 (67%) column=0
Char: SPC (32, #o40, #x20) point=193 of 287 (67%) column=1
Char: ร (3619, #o7043, #xe23, file ...) point=194 of 287 (67%) column=2
Char: ้ (3657, #o7111, #xe49, file ...) point=196 of 287 (68%) column=4

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils pp wid-edit descr-text help-mode easymenu
cus-start cus-load thai-util thai-word mule-util time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd
fontset image regexp-opt fringe tabulated-list newcomment lisp-mode
register page menu-bar rfn-eshadow timer select scroll-bar mouse
jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer loaddefs button faces cus-face macroexp files text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dynamic-setting
system-font-setting font-render-setting x-toolkit x multi-tty emacs)

--
Greetings

Pete

The problem with the French is that they don't have a word for « entrepreneur ».
				– Georges W. Bush