From: Dave Love
Newsgroups: gmane.emacs.bugs
To: bug-gnu-emacs@gnu.org
Subject: non-ASCII TAGS
Date: 02 Apr 2003 18:34:53 +0100
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Probably the worst problem with using non-ASCII programming identifiers is etags. It isn't aware of encoding issues, and fixing that is non-trivial, so this is mainly raising a flag in the hope that someone can work on it. I think sorting it out requires not only extending the TAGS format, but probably also generating TAGS with Emacs. I don't have time to work on this, but here's the problem and some ideas.

In general, different files listed in TAGS could have different encodings -- as is actually the case for the Emacs sources, though there the tags are all ASCII -- and the file names could be encoded inappropriately for the locale in which TAGS is used. I think it's reasonable to assume that the locale in which TAGS is used is the same as the one in which it was generated, i.e. the file names are always in `file-name-coding-system', though it wouldn't harm to record the locale information and act on it.

However, the encoding of a file's contents may well differ from the locale coding system that determines the encoding of its name, e.g. utf-8 code processed in a Latin-N locale. Thus it's a question of labelling the section for each file with a coding system corresponding to how Emacs would read the source file (accounting for coding cookies &c). This can all be decoded appropriately with a bit of effort, and searches in the result should work.

I think the TAGS files have to be generated with Emacs, since making etags.c multilingual doesn't seem realistic. A Lisp version should be efficient enough, and it would have the advantage that tags, imenu and font-lock might work from the same set of patterns. It could obviously be used in makefiles by running Emacs in batch mode. It will be a significant amount of work, though, and I guess dropping etags.c isn't reasonable, so two programs would have to be maintained in parallel.
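
To make the labelling idea concrete, here is a minimal sketch of how a reader might decode such a file. It is written in Python purely for illustration; it is not the Lisp implementation suggested above, and nothing like it exists in etags. The form feed plus header line per file follows the existing TAGS layout as I understand it, but the third header field naming the coding system is a hypothetical extension, as are the names `decode_tags' and `file_name_coding'.

```python
# Sketch only: decode a TAGS blob whose per-file header carries a
# HYPOTHETICAL third field naming the coding system of that file's tags.
# Assumed header layout: NAME,SIZE,CODING  (real TAGS has only NAME,SIZE).

SECTION_START = b"\x0c\n"  # each file section begins with form feed + newline


def decode_tags(data: bytes, file_name_coding: str = "utf-8") -> dict:
    """Return {file name: [decoded tag lines]} for a labelled TAGS blob.

    File names are decoded with one coding system (the analogue of
    `file-name-coding-system'); tag lines are decoded per section.
    """
    result = {}
    for section in data.split(SECTION_START):
        if not section:
            continue  # empty chunk before the first form feed
        header, _, body = section.partition(b"\n")
        # Split from the right so commas in the file name survive.
        name, _size, coding = header.rsplit(b",", 2)
        file_name = name.decode(file_name_coding)
        codec = coding.decode("ascii")
        result[file_name] = [
            line.decode(codec) for line in body.split(b"\n") if line
        ]
    return result


# Two sections with different content encodings; file names are utf-8.
sample = (
    SECTION_START
    + "caf\u00e9.c,20,latin-1\n".encode("utf-8")
    + "int na\u00efve\x7fna\u00efve\x011,0\n".encode("latin-1")
    + SECTION_START
    + b"b.c,10,utf-8\n"
    + "int gr\u00f6\u00dfe\x7f1,0\n".encode("utf-8")
)
tags = decode_tags(sample)
```

The point of the sketch is only that the file name and the tag text are decoded independently, so a utf-8 source file can be indexed correctly from a Latin-N locale. A real format extension would also have to stay readable by existing TAGS consumers, which this layout does not attempt.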