From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: Re: non-ASCII TAGS
Date: Wed, 23 Apr 2003 20:01:04 +0900 (JST)
Message-ID: <200304231101.UAA04310@etlken.m17n.org>
Original-To: d.love@dl.ac.uk
Original-cc: bug-gnu-emacs@gnu.org
In-reply-to: (message from Dave Love on 02 Apr 2003 18:34:53 +0100)

Sorry for the late response on this matter.

In article , Dave Love writes:
> Probably the worst problem with using non-ASCII programming
> identifiers is etags.  It isn't aware of encoding issues and fixing
> the issues is non-trivial, so this is mainly raising a flag and hoping
> someone can work on it.  I think sorting it out requires not only
> extending the TAGS format, but probably also generating it with Emacs.
> I don't have time to work on this, but here's the problem and some
> ideas.

What do you think of the following proposal, which I think works in
most cases without extending the TAGS format?  In cases where it
doesn't work, we can still ask people to type
`C-x RET c __CODING__ RET ESC .'.

Of course, as you wrote, recording coding systems in TAGS is ideal,
but it requires more work just to handle rare cases.

Kenichi Handa writes:
> In article , Karl Eichwalder writes:
>> Now the next one: `tags-query-replace' does not work properly when file
>> names are UTF-8 encoded.  First run `etags *' on the files and then
>> call `tags-query-replace'.

> This is the same type of bug (but more difficult) as the one I
> posted to emacs-devel under the subject "bad interaction with
> C-x RET c and vc-cvs-registered".

> A tag file contains file names plus parts of source code.
> The former must be decoded by file-name-coding-system, but
> the latter must be decoded by the coding system of each
> file.  It's very hard to decide a coding system for the
> latter without actually reading the file.

> Perhaps a tag file must be read as raw-text (thus in a
> unibyte buffer), and if one gives a non-ASCII TAGNAME to
> `find-tag', it must be encoded by the
> buffer-file-coding-system of the current buffer.

And the reply from Richard is as follows:

> That seems like a good approach.  Would someone like to implement it?

---
Ken'ichi HANDA
handa@m17n.org
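The raw-text approach quoted above can be illustrated outside Emacs.  The
following Python is only a sketch under stated assumptions, not an
implementation of etags.el: it assumes the usual etags layout (each file
section starts with a form feed and newline, followed by a `FILENAME,SIZE'
header line and tag lines of the form PATTERN DEL NAME SOH LINE,OFFSET,
where "NAME SOH" may be absent), and `find_tag_raw' is a hypothetical
helper.  It keeps TAGS as undecoded bytes, decodes only the file names with
a file-name coding, and encodes the requested tag name with the current
buffer's coding before comparing bytes.

```python
from typing import List, Optional, Tuple

# Sketch only.  ASSUMED TAGS layout: sections begin with b"\x0c\n", then a
# header b"FILENAME,SIZE\n", then lines b"PATTERN\x7fNAME\x01LINE,OFFSET\n"
# or b"PATTERN\x7fLINE,OFFSET\n" when the name is implicit in PATTERN.
# find_tag_raw is a hypothetical helper, not an Emacs function.

def find_tag_raw(tags: bytes, tagname: str, buffer_coding: str,
                 file_name_coding: str) -> List[Tuple[str, Optional[int]]]:
    """Look up TAGNAME in raw (undecoded) TAGS bytes.

    The tag name is encoded with buffer_coding (standing in for
    buffer-file-coding-system) and compared as bytes, so the source-code
    parts of TAGS are never decoded.  Only file names are decoded, with
    file_name_coding (standing in for file-name-coding-system).
    """
    key = tagname.encode(buffer_coding)
    hits: List[Tuple[str, Optional[int]]] = []
    for section in tags.split(b"\x0c\n"):
        if not section:
            continue
        header, _, body = section.partition(b"\n")
        # rpartition: the file name itself may contain commas.
        raw_name, _, _size = header.rpartition(b",")
        file_name = raw_name.decode(file_name_coding)
        for line in body.split(b"\n"):
            pattern, sep, rest = line.partition(b"\x7f")
            if not sep:
                continue  # not a tag line (e.g. the trailing empty string)
            if b"\x01" in rest:
                # Explicit tag name between DEL and SOH: exact byte match.
                name, _, pos = rest.partition(b"\x01")
                matched = name == key
            else:
                # Implicit name: simplified to a byte-substring test.
                pos = rest
                matched = key in pattern
            if matched:
                line_field = pos.split(b",")[0]
                hits.append((file_name,
                             int(line_field) if line_field.isdigit() else None))
    return hits


# Usage: a one-entry TAGS built in memory, source and file name in UTF-8.
src = "int für(void)\x7ffür\x011,0\n".encode("utf-8")
tags = b"\x0c\n" + "src/été.c,{}\n".format(len(src)).encode("utf-8") + src
print(find_tag_raw(tags, "für", "utf-8", "utf-8"))  # [('src/été.c', 1)]
```

With a Latin-1 coding for the current buffer, the same lookup returns an
empty list, because the encoded name no longer matches the UTF-8 bytes
stored in TAGS; that is the case where one would fall back to C-x RET c.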