From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: unencodable-char-position [Re: Several serious problems]
Date: Sun, 11 Aug 2002 10:59:33 +0900 (JST)
Message-ID: <200208110159.KAA23419@etlken.m17n.org>
References: <200207221711.g6MHBZo02496@aztec.santafe.edu>
	<200207231335.WAA25692@etlken.m17n.org>
	<200207240325.g6O3PdX04898@aztec.santafe.edu>
	<200207240437.NAA26632@etlken.m17n.org>
	<200207250312.g6P3C9J06653@aztec.santafe.edu>
In-Reply-To: <200207250312.g6P3C9J06653@aztec.santafe.edu>
	(message from Richard Stallman on Wed, 24 Jul 2002 21:12:09 -0600 (MDT))
Original-To: rms@gnu.org
Cc: spiegel@gnu.org, emacs-devel@gnu.org, d.love@dl.ac.uk
Xref: main.gmane.org gmane.emacs.devel:6436

In article <200207250312.g6P3C9J06653@aztec.santafe.edu>, Richard Stallman writes:
>     If the specified coding system is totally inappropriate for
>     the buffer, highlighting them will results in huge amount of
>     overlays and also it takes long time to finish the job.

> That is true.

>     If
>     we limit the number of highlighting, it may give users
>     incorrect information (i.e. non-highlighted characters seems
>     to be encodable).

> It could highlight the first N runs of such characters, and display a
> message saying "Many more unencodable characters found--type WHATEVER
> to view them".  WHATEVER could be the same command with a prefix
> argument.

I implemented that and tried it on several files.  But it seems
that this kind of feature is not that helpful.

When the buffer contains many unencodable chars, the specified
coding system is usually wrong, and we must use a different coding
system.
So, it is not that interesting to know where the other unencodable
characters are.

When the buffer contains only a few unencodable chars, it's seldom
that more than one of them appears in one window, so highlighting
the other unencodable chars is not that useful.

By the way, I've just noticed that Dave has already installed the
function `unencodable-char-position' in mule-cmds.el and used it in
select-safe-coding-system.  That function resembles the
check-coding-system-region that we are currently discussing.  But,
as the docstring says, it's slow.

So, I committed these changes.

(1) Re-implementation of unencodable-char-position in C, adding two
    optional arguments.

----------------------------------------------------------------------
unencodable-char-position is a built-in function.
(unencodable-char-position START END CODING-SYSTEM &optional COUNT STRING)

Return position of first un-encodable character in a region.
START and END specify the region and CODING-SYSTEM specifies the
encoding to check.  Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how
many un-encodable characters to search.  In this case, the value is a
list of positions.

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters.  In that case, START and END are indexes
to the string.
----------------------------------------------------------------------

(2) New function `search-unencodable-char' for interactive use.  It
    utilizes `unencodable-char-position'.

----------------------------------------------------------------------
(search-unencodable-char CODING-SYSTEM)

Search forward from point for a character that is not encodable.
It asks which coding system to check.
If such a character is found, set point after that character.
Otherwise, don't move point.

When called from a program, the value is a position of the found character,
or nil if all characters are encodable.
----------------------------------------------------------------------

It may be good to bind C-x RET s to this command.  Could someone make
this command more user-friendly (e.g. by improving the messages)?  It
is also easy to modify this function to highlight a few more (or a
windowful of) unencodable characters if you think that is really
helpful.

(3) Make select-safe-coding-system show (at most 10) unencodable
    characters for each default coding system tried.

Now, if any unencodable chars are found, one can type C-g to cancel
further saving.  As C-g doesn't hide the *Warning* buffer, one can
click on the displayed unencodable chars to jump to the corresponding
position in the buffer.

---
Ken'ichi HANDA
handa@etl.go.jp
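
As a rough illustration of the COUNT and STRING arguments described
in (1), the following is a minimal sketch, not part of the committed
change.  It assumes an Emacs in which `unencodable-char-position' is
available with the signature shown above.  The helper name
`my-unencodable-indexes' and its default limit of 10 are hypothetical,
chosen only for this example.

```elisp
;; Hypothetical helper (not in Emacs): wraps the built-in
;; `unencodable-char-position' to check a string instead of a buffer.
(defun my-unencodable-indexes (string coding-system &optional count)
  "Return indexes in STRING of chars that CODING-SYSTEM cannot encode.
At most COUNT indexes (default 10) are returned; nil means that every
character is encodable."
  ;; Passing STRING as the 5th argument makes START and END indexes
  ;; into the string.  A non-nil COUNT makes the value a list.
  (unencodable-char-position 0 (length string) coding-system
                             (or count 10) string))

;; The sample text contains two hiragana characters, which Latin-1
;; cannot represent, so a list of their indexes is returned.
(my-unencodable-indexes "abc\u3042def\u3044" 'iso-latin-1)

;; With a coding system that covers every character, the value is nil.
(my-unencodable-indexes "abc\u3042def\u3044" 'utf-8)
```

Using a string keeps the example independent of any buffer contents;
for a buffer, the same call without the STRING argument takes buffer
positions as START and END.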