* coding systems vs. info files
@ 2006-02-09 1:58 Miles Bader
2006-02-09 14:54 ` Jesper Harder
2006-02-09 17:28 ` Juri Linkov
0 siblings, 2 replies; 13+ messages in thread
From: Miles Bader @ 2006-02-09 1:58 UTC (permalink / raw)
I noticed the info node "efaq" on my system has what are apparently
non-ASCII characters in it, but they just appear as gibberish (octal
escapes) in info mode, and even when visiting the underlying info file
outside info mode, emacs doesn't seem to recognize the coding.
However, if I (1) visit the info file (.../info/efaq) using
`find-file-literally', (2) cut the first page or two of it, (3) save
that in a temporary file, and then (4) visit the temporary file, emacs
recognizes the coding (it's UTF-8)!
So what I can gather from this is:
(a) Makeinfo apparently generated non-English strings when it made the
info file, and used utf-8 to encode it. My LANG environment
variable is "ja_JP.utf-8", which is probably why.
(b) It did not put any "coding:" tags in the file to reflect this encoding.
(c) Emacs did not recognize the coding, even though my LANG and
language-environment are set up to make it easy for it to do so --
I guess maybe that's because Emacs decoding gets confused by
various magic characters in the info file (like ^_ -- although the
temporary file that Emacs _did_ decode successfully includes at
least one ^_ character from the original info file).
So, my question is, should any of the above be happening?
(a) seems a bit silly -- the strings which get encoded in Japanese/UTF-8
are just a few random boilerplate things, and given that the actual
content of the file is in English, it's kind of inconsistent, and not
terribly useful for Japanese speakers. It seems like makeinfo should
ignore LANG for the most part, and just use the language the texinfo
file was written in as the language for any makeinfo-produced text
(though maybe it could look at LANG to fine-tune the final encoding).
As for (b), it seems like makeinfo should probably add a coding: tag to
reflect whatever decision it makes. If it does that, then it solves (c).
Thanks,
-Miles
--
97% of everything is grunge
* Re: coding systems vs. info files
2006-02-09 1:58 coding systems vs. info files Miles Bader
@ 2006-02-09 14:54 ` Jesper Harder
2006-02-09 19:14 ` Juri Linkov
2006-02-09 17:28 ` Juri Linkov
1 sibling, 1 reply; 13+ messages in thread
From: Jesper Harder @ 2006-02-09 14:54 UTC (permalink / raw)
Cc: emacs-devel
Miles Bader <miles.bader@necel.com> writes:
> I noticed the info node "efaq" on my system has what are apparently
>
> (b) It did not put any "coding:" tags in the file to reflect this encoding.
It does with this option:
`--enable-encoding'
Output accented and special characters in Info or plain text output
based on `@documentencoding'. *Note `documentencoding':
documentencoding, and *Note Inserting Accents::.
* Re: coding systems vs. info files
2006-02-09 1:58 coding systems vs. info files Miles Bader
2006-02-09 14:54 ` Jesper Harder
@ 2006-02-09 17:28 ` Juri Linkov
1 sibling, 0 replies; 13+ messages in thread
From: Juri Linkov @ 2006-02-09 17:28 UTC (permalink / raw)
Cc: help-texinfo, emacs-devel
> So, my question is, should any of the above be happening?
The problem is due to Latin-1 characters in the author's name in the node
(info "(efaq)Emacs for Atari ST"). The source file man/faq.texi
contains -*- coding: latin-1; -*- so Emacs handles the source file
correctly. But makeinfo doesn't copy this cookie to the info file,
so the Emacs Info reader detects its coding differently in different
language environments.
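
A minimal Python sketch of why the first page of the file can be detected as
UTF-8 while the whole file is not. It assumes (this is an inference from the
two reports above, not something stated in the thread) that the file mixes
UTF-8 boilerplate near the top with a Latin-1 encoded name further down. The
sample strings and the helper name `decodes_as` are hypothetical.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding` under strict decoding."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical stand-in for the Info file: a UTF-8 string near the top and
# a Latin-1 encoded name much further down.
head = "次: Top\n".encode("utf-8")
tail = "Roland Sch\u00e4uble\n".encode("latin-1")
info_bytes = head + b"plain ASCII body\n" * 100 + tail

# The whole file contains a byte sequence that is invalid in UTF-8 ...
whole_ok = decodes_as(info_bytes, "utf-8")
# ... but a prefix cut before the Latin-1 name is clean UTF-8.
prefix_ok = decodes_as(info_bytes[: len(head) + 200], "utf-8")
```

Here `whole_ok` is False and `prefix_ok` is True, which matches the
behaviour described for the temporary file.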
> (a) seems a bit silly -- the strings which get encoded in Japanese/UTF-8
> are just a few random boilerplate things, and given that the actual
> content of the file is in English, it's kind of inconsistent, and not
> terribly useful for Japanese speakers. It seems like makeinfo should
> ignore LANG for the most part, and just use the language the texinfo
> file was written in as the language for any makeinfo-produced text
> (though maybe it could look at LANG to fine-tune the final encoding).
>
> As for (b), it seems like makeinfo should probably add a coding: tag to
> reflect whatever decision it makes. If it does that, then it solves (c).
I believe this is exactly what makeinfo should do.
--
Juri Linkov
http://www.jurta.org/emacs/
* Re: coding systems vs. info files
2006-02-09 14:54 ` Jesper Harder
@ 2006-02-09 19:14 ` Juri Linkov
2006-02-11 0:13 ` [help-texinfo] " Karl Berry
0 siblings, 1 reply; 13+ messages in thread
From: Juri Linkov @ 2006-02-09 19:14 UTC (permalink / raw)
Cc: help-texinfo
>> I noticed the info node "efaq" on my system has what are apparently
>>
>> (b) It did not put any "coding:" tags in the file to reflect this encoding.
>
> It does with this option:
>
> `--enable-encoding'
> Output accented and special characters in Info or plain text output
> based on `@documentencoding'. *Note `documentencoding':
> documentencoding, and *Note Inserting Accents::.
This is a nice feature, but it seems it is not tested enough. It adds
the `Local Variables' section too far from the end of the Info file,
so Emacs can't find it.
--
Juri Linkov
http://www.jurta.org/emacs/
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-09 19:14 ` Juri Linkov
@ 2006-02-11 0:13 ` Karl Berry
2006-02-11 1:11 ` Juri Linkov
2006-02-11 22:03 ` Miles Bader
0 siblings, 2 replies; 13+ messages in thread
From: Karl Berry @ 2006-02-11 0:13 UTC (permalink / raw)
Cc: help-texinfo, emacs-devel
the `Local Variables' section too far from the end of the Info file,
so Emacs can't find it.
Because the tag table is huge, I would guess? Or is there some other
reason? Can someone point me to the problematic Texinfo file?
It seems like makeinfo should
ignore LANG for the most part,
Right, it should not use LANG for output (this is nothing new), but
neither I nor anyone has implemented it yet. (Bruno informed me it is
not so easy to switch languages in gettext, unfortunately, which is what
it boils down to.) Meanwhile, using LANG for output means that at least
it is possible to get those boilerplate fragments in (say) French for a
French document.
and just use the language the texinfo
BTW, Texinfo has @documentlanguage and @documentencoding commands. It
doesn't try to "guess" anything. The coding: tag reflects the
@documentencoding. If there is no @documentencoding, there will be no
coding: tag. I don't think it has any way of knowing what encoding the
translation for language XX used.
karl
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-11 0:13 ` [help-texinfo] " Karl Berry
@ 2006-02-11 1:11 ` Juri Linkov
2006-02-12 1:40 ` Karl Berry
2006-02-12 21:44 ` Kevin Ryde
2006-02-11 22:03 ` Miles Bader
1 sibling, 2 replies; 13+ messages in thread
From: Juri Linkov @ 2006-02-11 1:11 UTC (permalink / raw)
Cc: help-texinfo, emacs-devel
> the `Local Variables' section too far from the end of the Info file,
> so Emacs can't find it.
>
> Because the tag table is huge, I would guess? Or is there some
> other reason?
Yes, the reason is that makeinfo adds the `Local Variables' section
(with the `coding:' tag) before the Info tag table, and on file reading
Emacs looks for this section only within 3000 characters from the end of
the file.
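
The effect of that limit can be sketched in Python. This is a simplified
model, not Emacs's actual code: it only checks whether the `Local Variables:`
marker falls inside the last 3000 characters, and the sample Info layout
(separator characters, tag table entries) is an approximation made up for
illustration.

```python
SEARCH_WINDOW = 3000  # the limit cited in this thread


def local_variables_visible(text: str, window: int = SEARCH_WINDOW) -> bool:
    """True if 'Local Variables:' occurs within the last `window` characters."""
    return "Local Variables:" in text[-window:]


body = "File: efaq, Node: Top\n\nSome node text.\n"
local_vars = "\x1f\nLocal Variables:\ncoding: iso-8859-1\nEnd:\n"
# 200 entries of 20 characters each: larger than the search window.
tag_table = (
    "\x1f\nTag Table:\n"
    + "Node: Example\x7f12345\n" * 200
    + "\x1f\nEnd Tag Table\n"
)

# Layout described above: the section is written before the tag table.
before_fix = body + local_vars + tag_table
# Alternative layout: the section is written last.
after_fix = body + tag_table + local_vars
```

With these inputs, `local_variables_visible(before_fix)` is False and
`local_variables_visible(after_fix)` is True.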
> Can someone point me to the problematic Texinfo file?
The file in question is the Emacs FAQ from Emacs CVS. Its tag table
is not too huge, but still larger than the 3000 limit.
> BTW, Texinfo has @documentlanguage and @documentencoding commands. It
> doesn't try to "guess" anything. The coding: tag reflects the
> @documentencoding. If there is no @documentencoding, there will be no
> coding: tag. I don't think it has any way of knowing what encoding the
> translation for language XX used.
If there is no @documentencoding, makeinfo could look for the `coding:'
tag in the first lines or in the `Local Variables' section of the
source Texinfo file, and copy it to the Info output file. For instance,
the Emacs FAQ source file faq.texi has the `coding:' tag, but no
@documentencoding, and makeinfo doesn't try to read it.
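
A rough Python sketch of the lookup suggested here, assuming the cookie is on
one of the first two lines in the `-*- coding: NAME; -*-` form. The function
name and regular expression are illustrative only; the `Local Variables'
section is not handled, and neither is mapping an Emacs coding name such as
latin-1 to the name Texinfo would expect.

```python
import re
from typing import Optional

# Matches e.g. "-*- coding: latin-1; -*-" and captures the coding name.
COOKIE_RE = re.compile(r"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def find_coding_cookie(text: str, max_lines: int = 2) -> Optional[str]:
    """Return the coding name from a -*- cookie in the first lines, if any."""
    for line in text.splitlines()[:max_lines]:
        match = COOKIE_RE.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical first lines of a Texinfo source file.
source = "\\input texinfo   @c -*- coding: latin-1; -*-\n@setfilename efaq\n"
cookie = find_coding_cookie(source)
```

For this sample, `cookie` is "latin-1"; a file without a cookie yields None.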
However, this is a minor problem because it is easy to add
@documentencoding to the source Texinfo file. The main problem is that
given the @documentencoding in the source file, makeinfo writes the
`coding:' tag before the tag table in the Info output file, and Emacs
can't find it.
--
Juri Linkov
http://www.jurta.org/emacs/
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-11 0:13 ` [help-texinfo] " Karl Berry
2006-02-11 1:11 ` Juri Linkov
@ 2006-02-11 22:03 ` Miles Bader
2006-02-12 4:28 ` Eli Zaretskii
1 sibling, 1 reply; 13+ messages in thread
From: Miles Bader @ 2006-02-11 22:03 UTC (permalink / raw)
Cc: help-texinfo, emacs-devel
karl@freefriends.org (Karl Berry) writes:
> Right, it should not use LANG for output (this is nothing new), but
> neither I nor anyone has implemented it yet. (Bruno informed me it is
> not so easy to switch languages in gettext, unfortunately, which is what
> it boils down to.)
Hmm, I thought there was a function "dgettext" to allow explicitly
specifying the language for a given lookup.
-Miles
--
"An atheist doesn't have to be someone who thinks he has a proof that there
can't be a god. He only has to be someone who believes that the evidence
on the God question is at a similar level to the evidence on the werewolf
question." [John McCarthy]
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-11 1:11 ` Juri Linkov
@ 2006-02-12 1:40 ` Karl Berry
2006-02-12 10:01 ` Andreas Schwab
2006-02-12 21:44 ` Kevin Ryde
1 sibling, 1 reply; 13+ messages in thread
From: Karl Berry @ 2006-02-12 1:40 UTC (permalink / raw)
Cc: help-texinfo, emacs-devel
The main problem is that
given the @documentencoding in the source file, makeinfo writes the
`coding:' tag before the tag table in the Info output file, and Emacs
can't find it.
I installed a patch to CVS Texinfo to write the coding: variable last.
Hope it doesn't mess anything else up.
Of course, faq.texi doesn't have any @documentencoding line so all this
work is not immediately relevant. It shouldn't need one, either. I
strongly suggest replacing the one non-ASCII character, as in:
Roland Sch@"auble
(I wonder if his name was really Roland Sch@"uble, Sch@"auble looks
unusual to me, but what do I know. Back in 18.58 for the Atari ST ...)
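
For a file with more than one such character, the replacement could be
scripted. The Python sketch below maps a few German characters to their
Texinfo accent commands; the table is deliberately tiny and the helper name
is made up, so treat it as an illustration rather than a complete converter.

```python
# Small, incomplete table of Texinfo accent commands.
TEXINFO_ACCENTS = {
    "ä": '@"a', "ö": '@"o', "ü": '@"u',
    "Ä": '@"A', "Ö": '@"O', "Ü": '@"U',
    "ß": "@ss{}",
}


def to_texinfo_ascii(text: str) -> str:
    """Replace known non-ASCII characters with Texinfo accent commands."""
    return "".join(TEXINFO_ACCENTS.get(ch, ch) for ch in text)


converted = to_texinfo_ascii("Roland Schäuble")
```

Here `converted` is the ASCII-only string `Roland Sch@"auble`.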
Miles' problem of Japanese ending up in the output due to his LANG
variable isn't immediately solvable.
Thanks,
karl
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-11 22:03 ` Miles Bader
@ 2006-02-12 4:28 ` Eli Zaretskii
0 siblings, 0 replies; 13+ messages in thread
From: Eli Zaretskii @ 2006-02-12 4:28 UTC (permalink / raw)
Cc: emacs-devel, help-texinfo, karl
> From: Miles Bader <miles@gnu.org>
> Date: Sun, 12 Feb 2006 07:03:10 +0900
> Cc: help-texinfo@gnu.org, emacs-devel@gnu.org
>
> karl@freefriends.org (Karl Berry) writes:
> > Right, it should not use LANG for output (this is nothing new), but
> > neither I nor anyone has implemented it yet. (Bruno informed me it is
> > not so easy to switch languages in gettext, unfortunately, which is what
> > it boils down to.)
>
> Hmm, I thought there was a function "dgettext" to allow explicitly
> specifying the language for a given lookup.
The problem is that makeinfo also has its messages to the user which
should be displayed in the current locale's language. So it needs to
change the language back and forth, depending on the output stream.
And that is something gettext doesn't like too much.
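
To illustrate the two-stream requirement only: Python's `gettext` module
hands out separate catalog objects, so one can be used for document text and
another for user messages. This is not the C gettext API that makeinfo uses,
and the domain name and language codes below are hypothetical. With
`fallback=True`, a missing catalog returns the message unchanged.

```python
import gettext


def load_catalog(language: str, domain: str = "example_makeinfo_sketch"):
    """Load a catalog for `language`; fall back to untranslated strings."""
    return gettext.translation(domain, languages=[language], fallback=True)


# One catalog per output stream, selected independently of each other.
document_catalog = load_catalog("fr")  # language of the document
message_catalog = load_catalog("ja")   # language of the user's locale

heading = document_catalog.gettext("Next")
warning = message_catalog.gettext("unknown command")
```

Since no catalogs exist for the made-up domain, both calls return their
input strings here; the point is only that the two lookups do not share a
global language setting.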
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-12 1:40 ` Karl Berry
@ 2006-02-12 10:01 ` Andreas Schwab
0 siblings, 0 replies; 13+ messages in thread
From: Andreas Schwab @ 2006-02-12 10:01 UTC (permalink / raw)
Cc: juri, help-texinfo, emacs-devel
karl@freefriends.org (Karl Berry) writes:
> (I wonder if his name was really Roland Sch@"uble, Sch@"auble looks
> unusual to me, but what do I know. Back in 18.58 for the Atari ST ...)
Roland Schäuble is correct, see for example
<http://www.stcarchiv.de/stc1993/12_gnushell.php>. Schäuble is a rather
well known name here in Germany, being the name of a prominent politician.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-11 1:11 ` Juri Linkov
2006-02-12 1:40 ` Karl Berry
@ 2006-02-12 21:44 ` Kevin Ryde
2006-02-13 17:48 ` Juri Linkov
1 sibling, 1 reply; 13+ messages in thread
From: Kevin Ryde @ 2006-02-12 21:44 UTC (permalink / raw)
Juri Linkov <juri@jurta.org> writes:
>
> it is easy to add @documentencoding to the source Texinfo file.
I suppose it'd be nice if emacs looked for that to know the file
coding. Another entry for `auto-coding-functions' I guess.
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-12 21:44 ` Kevin Ryde
@ 2006-02-13 17:48 ` Juri Linkov
2006-02-13 22:07 ` Kevin Ryde
0 siblings, 1 reply; 13+ messages in thread
From: Juri Linkov @ 2006-02-13 17:48 UTC (permalink / raw)
Cc: emacs-devel
>> it is easy to add @documentencoding to the source Texinfo file.
>
> I suppose it'd be nice if emacs looked for that to know the file
> coding. Another entry for `auto-coding-functions' I guess.
Adding a function that looks for @documentencoding to the default value
of `auto-coding-functions' would mean scanning every visited file for it,
which would be wasteful.
Other similar modes use the variable `file-coding-system-alist',
with an entry like
("\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'" . latexenc-find-file-coding-system)
But this is not ideal either. Ideally, the coding-guessing function
should be mode-dependent, not filename-dependent. But it seems currently
it's not possible to do this, and this should wait until the next release.
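
The filename-based dispatch can be modelled in Python as follows. This is
only an analogy to `file-coding-system-alist', not Emacs code: the table,
the Texinfo filename pattern and both function names are assumptions made
for the example.

```python
import re
from typing import Optional

DOCENC_RE = re.compile(r"^@documentencoding[ \t]+(\S+)", re.MULTILINE)


def find_documentencoding(head: str) -> Optional[str]:
    """Return the argument of the first @documentencoding line, if any."""
    match = DOCENC_RE.search(head)
    return match.group(1) if match else None


# Hypothetical table: filename pattern -> function that inspects the text.
CODING_FINDERS = [
    (re.compile(r"\.(texi|texinfo|txi)\Z"), find_documentencoding),
]


def guess_coding(filename: str, head: str) -> Optional[str]:
    """Run a finder only for files whose name matches its pattern."""
    for pattern, finder in CODING_FINDERS:
        if pattern.search(filename):
            return finder(head)
    return None


sample = "\\input texinfo\n@documentencoding ISO-8859-1\n"
texi_guess = guess_coding("faq.texi", sample)
other_guess = guess_coding("notes.txt", sample)
```

Only the file with a matching name is scanned: `texi_guess` is "ISO-8859-1"
and `other_guess` is None, which is the filename dependence objected to
above.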
--
Juri Linkov
http://www.jurta.org/emacs/
* Re: [help-texinfo] Re: coding systems vs. info files
2006-02-13 17:48 ` Juri Linkov
@ 2006-02-13 22:07 ` Kevin Ryde
0 siblings, 0 replies; 13+ messages in thread
From: Kevin Ryde @ 2006-02-13 22:07 UTC (permalink / raw)
Juri Linkov <juri@jurta.org> writes:
>
> Other similar modes use the variable `file-coding-system-alist',
> with an entry like
>
> ("\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'" . latexenc-find-file-coding-system)
This came up before ... I couldn't understand what auto-coding-functions
is for if not to determine coding from file contents.