unofficial mirror of bug-gnu-emacs@gnu.org 
* non-ASCII TAGS
@ 2003-04-02 17:34 Dave Love
  2003-04-23 11:01 ` Kenichi Handa
From: Dave Love @ 2003-04-02 17:34 UTC


Probably the worst problem with using non-ASCII programming
identifiers is etags.  It isn't aware of encoding issues and fixing
the issues is non-trivial, so this is mainly raising a flag and hoping
someone can work on it.  I think sorting it out requires not only
extending the TAGS format, but probably also generating it with Emacs.
I don't have time to work on this, but here's the problem and some
ideas.

In general, different files in TAGS could have different encodings --
as is actually the case for Emacs, though there the tags are all ASCII
-- and the file names could be encoded inappropriately for the locale
in which TAGS is used.
I think it's reasonable to assume that the locale in which it's used
is the same as the one in which it's generated, i.e. the file names
are always in `file-name-coding-system', though it wouldn't harm to
record the locale information and act on it.  However, the file
content encodings may well be different from the locale coding system
which determines the encoding of their names, e.g. utf-8 code
processed in a Latin-N locale.
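
To illustrate the mismatch described above, here is a minimal Python sketch. The identifier and the helper name `decode_tag` are made up for illustration; the point is only that bytes written as utf-8 but decoded with the locale's Latin-1 codec no longer match the identifier being searched for.

```python
def decode_tag(raw: bytes, coding: str) -> str:
    """Decode the raw bytes of a tag name using the given codec."""
    return raw.decode(coding)


# Hypothetical identifier as it would sit in a utf-8 encoded source file.
identifier = "größe"
raw = identifier.encode("utf-8")

# Decoded with the file's real encoding, the tag matches the identifier.
right = decode_tag(raw, "utf-8")

# Decoded with a Latin-1 locale encoding, each utf-8 byte becomes its own
# character, so a search for the identifier fails.
wrong = decode_tag(raw, "latin-1")

print(right == identifier, wrong == identifier)
```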

Thus it's a question of labelling the section for each file with a
coding system corresponding to how Emacs would read the source file
(accounting for coding cookies &c).  This can all be decoded
appropriately with a bit of effort, and searches in the result should
work.
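
The labelling idea might be sketched as follows, in Python rather than Lisp purely for illustration. Everything about the label is an assumption: the existing format starts each section with a form feed and a `NAME,SIZE` header line, and this sketch pretends a third `,CODING` field has been appended, holding a Python codec name. It also assumes ASCII-compatible encodings, so the section separator cannot occur inside the encoded text. `parse_labelled_tags` is a hypothetical name.

```python
def parse_labelled_tags(data: bytes, file_name_coding: str) -> dict:
    """Return {file name: [decoded tag lines]} for labelled TAGS bytes.

    The ",CODING" header field is a hypothetical extension; it is not
    part of the TAGS format that etags writes.
    """
    result = {}
    # Each section starts with a form feed followed by a newline.
    for section in data.split(b"\x0c\n")[1:]:
        header, _, body = section.partition(b"\n")
        # Assumed header layout: NAME,SIZE,CODING.  Splitting from the
        # right keeps commas inside the file name intact.
        name, _size, coding = header.rsplit(b",", 2)
        # File names follow the locale; contents follow the label.
        file_name = name.decode(file_name_coding)
        codec = coding.decode("ascii")
        result[file_name] = [line.decode(codec) for line in body.splitlines()]
    return result


# Two sections whose tag lines are stored in different encodings.
sample = (
    b"\x0c\nsrc/a.py,20,utf-8\n"
    + "def größe(\x7fgröße\x011,0\n".encode("utf-8")
    + b"\x0c\nsrc/b.py,19,latin-1\n"
    + "def café(\x7fcafé\x011,0\n".encode("latin-1")
)
parsed = parse_labelled_tags(sample, file_name_coding="utf-8")
```

Once every section is decoded with its own coding system, a search for a non-ASCII identifier can be done on the decoded text without regard to how each source file was stored.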

I think the TAGS files have to be generated with Emacs, since making
etags.c multilingual doesn't seem realistic.  A Lisp version should be
efficient enough and it would have the advantage that tags, imenu and
font-lock might work from the same set of patterns.  It could be used
in makefiles by running Emacs in batch, obviously.  It will be a
significant amount of work, though, and I guess dropping etags.c isn't
reasonable, so two programs would have to be maintained in parallel.
