non-ASCII TAGS

unofficial mirror of bug-gnu-emacs@gnu.org 

* non-ASCII TAGS
@ 2003-04-02 17:34 Dave Love
  2003-04-23 11:01 ` Kenichi Handa
  0 siblings, 1 reply; 2+ messages in thread
From: Dave Love @ 2003-04-02 17:34 UTC (permalink / raw)


Probably the worst problem with using non-ASCII programming
identifiers is etags.  It isn't aware of encoding issues and fixing
the issues is non-trivial, so this is mainly raising a flag and hoping
someone can work on it.  I think sorting it out requires not only
extending the TAGS format, but probably also generating it with Emacs.
I don't have time to work on this, but here's the problem and some
ideas.

In general, different files in TAGS could have different encodings --
as is actually the case for Emacs, though there the tags are all ASCII
-- and file names could be encoded inappropriately for the locale in
which the TAGS file is used.
I think it's reasonable to assume that the locale in which it's used
is the same as the one in which it's generated, i.e. the file names
are always in `file-name-coding-system', though it wouldn't harm to
record the locale information and act on it.  However, the file
content encodings may well be different from the locale coding system
which determines the encoding of their names, e.g. utf-8 code
processed in a Latin-N locale.

Thus it's a question of labelling the section for each file with a
coding system corresponding to how Emacs would read the source file
(accounting for coding cookies &c).  This can all be decoded
appropriately with a bit of effort, and searches in the result should
work.
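As a rough illustration of the per-file labelling idea, here is a minimal Python sketch (not Emacs Lisp, and not an existing etags feature). It assumes the usual TAGS layout, where each section starts with a form feed line followed by a `filename,size` header. The `content_codings` mapping is hypothetical: it stands in for the coding label that an extended TAGS format would record for each file, and the codec names are Python's, not Emacs coding systems.

```python
from typing import Dict, List, Tuple


def decode_tags(
    raw: bytes,
    file_name_coding: str,
    content_codings: Dict[bytes, str],
    default_coding: str = "latin-1",
) -> List[Tuple[str, List[str]]]:
    """Decode a raw TAGS buffer section by section.

    File names are decoded with `file_name_coding` (the locale's coding),
    while the tag lines of each section are decoded with the coding
    recorded for that file in the hypothetical `content_codings` table.
    """
    sections = []
    # Every section begins with "\f\n"; the piece before the first one is empty.
    for section in raw.split(b"\x0c\n")[1:]:
        header, _, body = section.partition(b"\n")
        # Header is "filename,size"; split on the last comma only.
        raw_name, _, _size = header.rpartition(b",")
        coding = content_codings.get(raw_name, default_coding)
        name = raw_name.decode(file_name_coding)
        lines = [line.decode(coding) for line in body.split(b"\n") if line]
        sections.append((name, lines))
    return sections
```

The point of the sketch is only that the file name and the tag text of one section go through two different decoders, which is what a single whole-file decode of TAGS cannot do.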

I think the TAGS files have to be generated with Emacs, since making
etags.c multilingual doesn't seem realistic.  A Lisp version should be
efficient enough and it would have the advantage that tags, imenu and
font-lock might work from the same set of patterns.  It could be used
in makefiles by running Emacs in batch, obviously.  It will be a
significant amount of work, though, and I guess dropping etags.c isn't
reasonable, so two programs would have to be maintained in parallel.


* Re: non-ASCII TAGS
  2003-04-02 17:34 non-ASCII TAGS Dave Love
@ 2003-04-23 11:01 ` Kenichi Handa
  0 siblings, 0 replies; 2+ messages in thread
From: Kenichi Handa @ 2003-04-23 11:01 UTC (permalink / raw)
  Cc: handa

Sorry for the late response on this matter.

In article <rzqu1dgiz0i.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Probably the worst problem with using non-ASCII programming
> identifiers is etags.  It isn't aware of encoding issues and fixing
> the issues is non-trivial, so this is mainly raising a flag and hoping
> someone can work on it.  I think sorting it out requires not only
> extending the TAGS format, but probably also generating it with Emacs.
> I don't have time to work on this, but here's the problem and some
> ideas.

What do you think about the following proposal which, I
think, works in most cases without extending the TAGS format.
In the case it doesn't work, we can still ask people to use
C-x RET c __CODING__ RET ESC . .

Of course, as you wrote, recording coding systems in TAGS is
ideal, but it requires more work just to cover rare cases.

Kenichi Handa <handa@m17n.org> writes:
> In article <shpto57wa4.fsf_-_@tux.gnu.franken.de>, Karl Eichwalder <keichwa@gmx.net> writes:
>>  Now the next one: `tags-query-replace' does not work properly when file
>>  names are UTF-8 encoded.  First run `etags *' on the files and then
>>  call `tags-query-replace'.

> This is the same type of bug (but more difficult) as what I
> posted to emacs-devel under the subject "bad interaction with
> C-x RET c and vc-cvs-registered".

> A tag file contains file names plus parts of source code.
> The former must be decoded by file-name-coding-system, but
> the latter must be decoded by the coding system of each
> file.  It's very hard to decide a coding system for the
> latter without actually reading the file.

> Perhaps, a tag file must be read as raw-text (thus in a
> unibyte buffer), and if one gives a non-ASCII TAGNAME to
> `find-tag', it must be encoded by the
> buffer-file-coding-system of the current buffer.

And the reply from Richard is as follows:

> That seems like a good approach.  Would someone like to implement it?
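The raw-text approach quoted above can be sketched in Python as follows (an illustration only, not the Emacs implementation). The TAGS data stays as undecoded bytes, and the tag name is encoded with the coding of the buffer the search starts from before matching. The function name `find_tag_raw` and its parameters are hypothetical; the tag line layout assumed is `pattern<DEL>[name<SOH>]line,offset`.

```python
from typing import List, Optional, Tuple


def find_tag_raw(
    raw_tags: bytes,
    tag_name: str,
    buffer_coding: str,
    file_name_coding: str,
) -> List[Tuple[str, Optional[int]]]:
    """Search undecoded TAGS bytes for `tag_name`.

    The tag name is encoded with `buffer_coding` (standing in for the
    current buffer's buffer-file-coding-system); only the file names of
    matching sections are decoded, using `file_name_coding`.
    """
    needle = tag_name.encode(buffer_coding)
    matches = []
    current_file = b""
    expect_header = False
    for line in raw_tags.split(b"\n"):
        if line == b"\x0c":
            # A form feed line introduces a new "filename,size" header.
            expect_header = True
        elif expect_header:
            current_file = line.rpartition(b",")[0]
            expect_header = False
        elif line:
            pattern, _, rest = line.partition(b"\x7f")
            # An explicit tag name, if present, precedes the SOH byte.
            name = rest.partition(b"\x01")[0] if b"\x01" in rest else b""
            if needle in pattern or needle == name:
                position = rest.rpartition(b"\x01")[2]
                line_no = position.partition(b",")[0]
                matches.append(
                    (
                        current_file.decode(file_name_coding),
                        int(line_no) if line_no.isdigit() else None,
                    )
                )
    return matches
```

With a TAGS buffer that covers one UTF-8 source file and one Latin-1 source file, a search started from a UTF-8 buffer matches only the entries from the UTF-8 file. That is the case the proposal does not cover, where forcing the coding system by hand is still needed.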

---
Ken'ichi HANDA
handa@m17n.org
