From: Georges Ko
Newsgroups: gmane.emacs.bugs
Subject: bug#24699: Some characters should have char-width of 2
Date: Sat, 15 Oct 2016 22:31:24 +0800
Organization: gko.net
References: <834m4d3ix5.fsf@gnu.org>
To: 24699@debbugs.gnu.org

Eli Zaretskii writes:

>> These characters should have a char-width of 2, for consistency?
>
> I'm not sure what consistency you have in mind.

By consistency, I meant consistency with the characters of the same
"type", and by that I mean in the same code point block.

> We use the data in the Unicode file EastAsianWidth.txt to determine
> which characters should have a char-width of 2.  The characters you
> mention are not East-Asian characters, and therefore do not get 2 as
> their width.  And at least some of them, e.g. U+1F5A0, are of the same
> width on my screen as Latin letters, so 2 would be wrong for them.

Oh OK, that explains the widths, but some are of width 2 (see output
at the end of this post).

I checked EastAsianWidth.txt at
http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt:
MAPLE LEAF (#x1F341) and BEAR FACE (#x1F43B) are both listed there:

1F337..1F37C;W   # So    [70] TULIP..BABY BOTTLE
1F400..1F43E;W   # So    [63] RAT..PAW PRINTS

So BEAR FACE and MAPLE LEAF should be of the same width, but they are
not in Emacs:

(char-width ?🍁) => 2
(char-width ?🐻) => 1

Could the EastAsianWidth.txt used to generate the widths (I could not
find the file in the Emacs sources) have been an old version?  Is the
generated output the part of characters.el starting from this line?

  ;; 2: East Asian Wide and Full-width characters.
  (let ((l '((#x1100 . #x115F)
  . . .

At the end of this post I list the discrepancies between the
EastAsianWidth.txt data (from the URL above) and the actual char-width,
just for this range: the 63 code points from RAT to PAW PRINTS.

> Can you tell what practical problem in Emacs are you trying to solve?

A side effect of inconsistent char-width is badly aligned org-mode
tables.  A table with a MAPLE LEAF (width=2) and a BEAR FACE (width=1)
is aligned like this by org-mode:

| Char | Name       |
|------+------------|
| 🍁   | MAPLE LEAF |
| 🐻    | BEAR FACE  |

There are 2 and 3 spaces after MAPLE LEAF and BEAR FACE respectively,
due to the char-width difference, so I have to add things like this to
mitigate that:

(set-char-table-range char-width-table '(?🐻 . ?🐻) 2)

Block Miscellaneous Symbols and Pictographs:

E = EastAsianWidth.txt (2=W, 1=N) and W = (char-width c)

E | W | Char  | Code Point       | Name
--+---+-------+------------------+-------------------------------
2 | 1 | 🐀    | #x1F400 (128000) | RAT
2 | 1 | 🐁    | #x1F401 (128001) | MOUSE
2 | 1 | 🐂    | #x1F402 (128002) | OX
2 | 1 | 🐃    | #x1F403 (128003) | WATER BUFFALO
2 | 1 | 🐄    | #x1F404 (128004) | COW
2 | 1 | 🐅    | #x1F405 (128005) | TIGER
2 | 1 | 🐆    | #x1F406 (128006) | LEOPARD
2 | 1 | 🐇    | #x1F407 (128007) | RABBIT
2 | 1 | 🐈    | #x1F408 (128008) | CAT
2 | 1 | 🐉    | #x1F409 (128009) | DRAGON
2 | 1 | 🐊    | #x1F40A (128010) | CROCODILE
2 | 1 | 🐋    | #x1F40B (128011) | WHALE
2 | 1 | 🐌    | #x1F40C (128012) | SNAIL
2 | 1 | 🐍    | #x1F40D (128013) | SNAKE
2 | 1 | 🐎    | #x1F40E (128014) | HORSE
2 | 1 | 🐏    | #x1F40F (128015) | RAM
2 | 1 | 🐐    | #x1F410 (128016) | GOAT
2 | 1 | 🐑    | #x1F411 (128017) | SHEEP
2 | 1 | 🐒    | #x1F412 (128018) | MONKEY
2 | 1 | 🐓    | #x1F413 (128019) | ROOSTER
2 | 1 | 🐔    | #x1F414 (128020) | CHICKEN
2 | 1 | 🐕    | #x1F415 (128021) | DOG
2 | 1 | 🐖    | #x1F416 (128022) | PIG
2 | 1 | 🐗    | #x1F417 (128023) | BOAR
2 | 1 | 🐘    | #x1F418 (128024) | ELEPHANT
2 | 1 | 🐙    | #x1F419 (128025) | OCTOPUS
2 | 1 | 🐚    | #x1F41A (128026) | SPIRAL SHELL
2 | 1 | 🐛    | #x1F41B (128027) | BUG
2 | 1 | 🐜    | #x1F41C (128028) | ANT
2 | 1 | 🐝    | #x1F41D (128029) | HONEYBEE
2 | 1 | 🐞    | #x1F41E (128030) | LADY BEETLE
2 | 1 | 🐟    | #x1F41F (128031) | FISH
2 | 1 | 🐠    | #x1F420 (128032) | TROPICAL FISH
2 | 1 | 🐡    | #x1F421 (128033) | BLOWFISH
2 | 1 | 🐢    | #x1F422 (128034) | TURTLE
2 | 1 | 🐣    | #x1F423 (128035) | HATCHING CHICK
2 | 1 | 🐤    | #x1F424 (128036) | BABY CHICK
2 | 1 | 🐥    | #x1F425 (128037) | FRONT-FACING BABY CHICK
2 | 1 | 🐦    | #x1F426 (128038) | BIRD
2 | 1 | 🐧    | #x1F427 (128039) | PENGUIN
2 | 1 | 🐨    | #x1F428 (128040) | KOALA
2 | 1 | 🐩    | #x1F429 (128041) | POODLE
2 | 1 | 🐪    | #x1F42A (128042) | DROMEDARY CAMEL
2 | 1 | 🐫    | #x1F42B (128043) | BACTRIAN CAMEL
2 | 1 | 🐬    | #x1F42C (128044) | DOLPHIN
2 | 1 | 🐭    | #x1F42D (128045) | MOUSE FACE
2 | 1 | 🐮    | #x1F42E (128046) | COW FACE
2 | 1 | 🐯    | #x1F42F (128047) | TIGER FACE
2 | 1 | 🐰    | #x1F430 (128048) | RABBIT FACE
2 | 1 | 🐱    | #x1F431 (128049) | CAT FACE
2 | 1 | 🐲    | #x1F432 (128050) | DRAGON FACE
2 | 1 | 🐳    | #x1F433 (128051) | SPOUTING WHALE
2 | 1 | 🐴    | #x1F434 (128052) | HORSE FACE
2 | 1 | 🐵    | #x1F435 (128053) | MONKEY FACE
2 | 1 | 🐶    | #x1F436 (128054) | DOG FACE
2 | 1 | 🐷    | #x1F437 (128055) | PIG FACE
2 | 1 | 🐸    | #x1F438 (128056) | FROG FACE
2 | 1 | 🐹    | #x1F439 (128057) | HAMSTER FACE
2 | 1 | 🐺    | #x1F43A (128058) | WOLF FACE
2 | 1 | 🐻    | #x1F43B (128059) | BEAR FACE
2 | 1 | 🐼    | #x1F43C (128060) | PANDA FACE
2 | 1 | 🐽    | #x1F43D (128061) | PIG NOSE
2 | 1 | 🐾    | #x1F43E (128062) | PAW PRINTS

Georges
-- 
 Georges Ko                     gko@gko.net                      2016-10-15
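
The single-character workaround shown earlier could probably be applied
to the whole RAT..PAW PRINTS range at once.  What follows is only a
minimal sketch under stated assumptions: `my-wide-range' and
`my-narrow-chars-in-range' are hypothetical names introduced here for
illustration (they are not part of Emacs), and the range is taken from
the EastAsianWidth.txt line quoted in this message.  It relies only on
`char-width', `char-width-table' and `set-char-table-range', which are
used elsewhere in this message.

```elisp
;; Sketch only: the two `my-' names are hypothetical, not Emacs APIs.
(defvar my-wide-range '(#x1F400 . #x1F43E)
  "RAT..PAW PRINTS, listed as W in EastAsianWidth.txt.")

(defun my-narrow-chars-in-range (range)
  "Return the code points in RANGE whose `char-width' is not 2.
RANGE is a cons cell (FROM . TO); both ends are inclusive."
  (let (result)
    (dotimes (i (1+ (- (cdr range) (car range))))
      (let ((c (+ (car range) i)))
        (unless (= (char-width c) 2)
          (push c result))))
    (nreverse result)))

;; Override the width for the whole range in one call, instead of
;; one character at a time.
(set-char-table-range char-width-table my-wide-range 2)
```

Calling (my-narrow-chars-in-range my-wide-range) before the override
gives the code points that still need fixing; calling it afterwards is
a quick way to check that none are left.  This changes only the local
session, so it is a mitigation, not a fix for the generated table.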