* Input of XML documents (input of unicode)
From: Torsten Bronger @ 2002-10-11 15:51 UTC
Hello!
I've just compiled Emacs 21.2.1 on my Linux system. I've
used Emacs 20 for a rather long time now, and the most
important reason for me to migrate would be better
XML and Unicode support. But:
1. How can I input an arbitrary Unicode character?
I open a file in UTF-8 encoding, and input C-g
and an octal Unicode number, but always get the message:
"invalid character".
2. What is the most convenient way to install the XML
mode in Emacs? It isn't part of the standard
installation. (A pity.)
Bye,
Torsten.
* Re: Input of XML documents (input of unicode)
From: Kevin Rodgers @ 2002-10-11 18:28 UTC
Torsten Bronger wrote:
> I've just compiled Emacs 21.2.1 on my Linux system. I've
> used Emacs 20 for a rather long time now, and the most
> important reason for me to migrate would be better
> XML and unicode support. But:
...
> 2. What is the most convenient way to install the XML
> mode in Emacs? It isn't part of the standard
> installation. (A pity.)
Check out the XML Authoring Environment for Emacs:
http://xae.sunsite.dk/
--
Kevin Rodgers <kevinr@ihs.com>
* Re: Input of XML documents (input of unicode)
From: Jason Rumney @ 2002-10-11 20:59 UTC
Torsten Bronger <bronger@physik.rwth-aachen.de> writes:
> 1. How can I input an arbitrary Unicode Character?
> I open a file in UTF-8 encoding, and input C-g [C-q?]
> and an octal unicode number, but always the message:
> "invalid character".
Generally, opening files in a certain encoding does not change your
keyboard layout. So to tell Emacs that keyboard input is in UTF-8,
you should use set-keyboard-coding-system. I have not tried this
myself, so there may be other limitations that prevent this from
working.
Note that this will only work for "arbitrary" Unicode characters where
"arbitrary" means the subset of Unicode that Emacs supports.
> 2. What is the most convenient way to install the XML
> mode in Emacs? It isn't part of the standard
> installation. (A pity.)
sgml-mode is part of the standard installation, but I assume you mean
psgml (which is difficult to set up, and dependent on external tools,
but well worth it if you do a lot of SGML or XML editing).
* Re: Input of XML documents (input of unicode)
From: Torsten Bronger @ 2002-10-11 23:49 UTC
Hello!
Jason Rumney wrote:
> Torsten Bronger <bronger@physik.rwth-aachen.de> writes:
>
>> 1. How can I input an arbitrary Unicode Character?
>> I open a file in UTF-8 encoding, and input C-g [C-q?]
>> and an octal unicode number, but always the message:
>> "invalid character".
>
> Generally, opening files in a certain encoding does not change your
> keyboard layout. So to tell Emacs that keyboard input is in UTF-8,
> you should use set-keyboard-coding-system. I have not tried this
> myself, so there may be other limitations that prevent this from
> working.
Unfortunately they do. I see the Greek theta, because it was
in the file before I opened it, but when I do what you suggest
and type C-q 1 7 2 1: 'Invalid Character'. Why is my keyboard
layout important here? (I can type digits on every keyboard
after all.)
Bye,
Torsten.
* Re: Input of XML documents (input of unicode)
From: Charles Muller @ 2002-10-12 1:47 UTC
Cc: help-gnu-emacs
Torsten wrote:
> I've just compiled Emacs 21.2.1 on my Linux system. I've
> used Emacs 20 for a rather long time now, and the most
> important reason for me to migrate would be better
> XML and unicode support.
You might want to try adding the TEI-Emacs package to your present
Emacs. The TEI package was developed precisely for the purpose of adding
enhanced XML and Unicode functionality to Emacs. It includes a wide range of W3C
DTDs for XML, XHTML, and HTML, in addition to XSLT output with debugging and
an array of style sheets. It also corrects the problem of non-display of
non-Western characters encoded in UTF-8. It does not, however, provide
anything special for character input. It is also a rather large package,
which you probably don't want to try downloading through a modem.
http://www.tei-c.org/Software
The package also solves the problem with the present SGML/XML functionality
wherein XML files are mistaken for SGML, which produces many spurious
errors when trying to validate XML.
I am doing input of many non-ASCII characters using the abbreviation
function. I created a table that contains most of the characters I need in
Windows, and imported it into my .abbrev_defs file.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
* Re: Input of XML documents (input of unicode)
From: Jesper Harder @ 2002-10-12 3:22 UTC
Torsten Bronger <bronger@physik.rwth-aachen.de> writes:
> 1. How can I input an arbitrary Unicode Character? I open a file in
> UTF-8 encoding, and input C-g and an octal unicode number, but always
> the message: "invalid character".
Dave Love's 'ucs-tables.el' has this command:
(defun ucs-insert (arg)
"Insert the Emacs character representation of the given Unicode.
Interactively, prompts for a hex string giving the code."
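The docstring above says the command prompts for a hex string. As a rough
illustration of that input step outside Emacs, here is a small Python sketch.
`parse_hex_code_point` is a hypothetical helper, not part of ucs-tables.el,
and it makes no claim about how ucs-insert itself is implemented.

```python
def parse_hex_code_point(text: str) -> str:
    """Turn a hex string such as "3b8" into the corresponding character."""
    code = int(text, 16)            # raises ValueError on non-hex input
    if not 0 <= code <= 0x10FFFF:   # outside the Unicode code space
        raise ValueError(f"not a Unicode code point: {text!r}")
    if 0xD800 <= code <= 0xDFFF:    # surrogates are not characters
        raise ValueError(f"surrogate code point: {text!r}")
    return chr(code)


print(parse_hex_code_point("3b8"))  # Greek small letter theta
```

Rejecting out-of-range and surrogate values up front is a design choice of this
sketch; it keeps bad input from turning into an unencodable character later.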
* Re: Input of XML documents (input of unicode)
From: Eli Zaretskii @ 2002-10-12 7:35 UTC
> From: Torsten Bronger <bronger@physik.rwth-aachen.de>
> Newsgroups: gnu.emacs.help
> Date: Sat, 12 Oct 2002 01:49:14 +0200
>
> > Generally, opening files in a certain encoding does not change your
> > keyboard layout. So to tell Emacs that keyboard input is in UTF-8,
> > you should use set-keyboard-coding-system. I have not tried this
> > myself, so there may be other limitations that prevent this from
> > working.
>
> Unfortunately they do. I see the Greek theta, because it was
> in the file before I opened it, but when I do what you suggest
> and type C-q 1 7 2 1: 'Invalid Character'.
1721 is not the code of the Greek theta in the Emacs encoding, that's
why it fails. Try this instead:
M-: (insert (decode-char 'ucs #x3b8)) RET
Please read the entry in the file etc/PROBLEMS named "Problems when
using Emacs with UTF-8 locales", perhaps the advice there will help
you.
> Why is my keyboard layout important here?
Because Emacs should know how to interpret each code sent by your
keyboard. If there's a mismatch between what the keyboard sends and
what Emacs knows about that, you will get gibberish in your buffer.
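A quick way to check the numbers in this exchange is to do the base conversion
outside Emacs. The Python sketch below uses only the standard library. It shows
that octal 1721 is U+03D1 (GREEK THETA SYMBOL), whereas the #x3b8 used with
decode-char is U+03B8 (GREEK SMALL LETTER THETA), which is 1670 in octal. Which
of the two thetas the file contained is not stated in the thread, and none of
this says anything about Emacs's internal character codes.

```python
import unicodedata

typed = int("1721", 8)   # the digits typed after C-q, read as octal
theta = 0x3B8            # the code point passed to decode-char

# Print each value as a code point, in octal, and with its Unicode name.
for code in (typed, theta):
    print(f"U+{code:04X}  octal {code:o}  {unicodedata.name(chr(code))}")
```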
* Re: Input of XML documents (input of unicode)
From: Karl Eichwalder @ 2002-10-12 10:12 UTC
Torsten Bronger <bronger@physik.rwth-aachen.de> writes:
> I see the Greek theta, because it was in the file before I opened it,
> but when I do what you suggest and type C-q 1 7 2 1: 'Invalid
> Character'.
Eli explained the reason why.
To enter Unicode letters you can go for the "TeX input method":
C-x RET C-\ TeX RET
At this point you will see "-\" at the start of the mode line. Now
enter (without the spaces):
\ t h e t a
and Emacs will "collapse" the input to: θ (that's the theta). To
switch back to your old input method press:
C-\
Actually, with C-\ you can toggle the input methods.
--
ke@suse.de (work) / keichwa@gmx.net (home): |
http://www.suse.de/~ke/ | ,__o
Free Translation Project: | _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/ | (*)/'(*)
* Re: Input of XML documents (input of unicode)
From: Karl Eichwalder @ 2002-10-12 10:19 UTC
Karl Eichwalder <keichwa@gmx.net> writes:
> and Emacs will "collapse" the input to: θ (that's the theta).
Emacs/Gnus decided to encode it using iso-8859-7 -- but that's okay for
me. Adding an € symbol (EUR) to the same posting might cause Gnus to
go for UTF-8 automagically ;-)
[Or will it cause a multipart message?]
--
ke@suse.de (work) / keichwa@gmx.net (home): |
http://www.suse.de/~ke/ | ,__o
Free Translation Project: | _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/ | (*)/'(*)
* Re: Input of XML documents (input of unicode)
From: Stefan Monnier <foo@acm.com> @ 2002-10-12 19:10 UTC
>> Unfortunately they do. I see the Greek theta, because it was
>> in the file before I opened it, but when I do what you suggest
>> and type C-q 1 7 2 1: 'Invalid Character'.
> 1721 is not the code of the Greek theta in the Emacs encoding, that's
> why it fails. Try this instead:
Maybe this behavior of C-q is actually not desirable.
After all, who remembers Emacs' internal code for particular chars?
Also in a latin-1 locale C-q 240 RET inserts a latin-1 0240
which is 04240 in Emacs' internal encoding, so I'd expect in
a utf-8 environment that C-q <octal-code-of-a-unicode-char> RET
would insert the desired unicode char.
Of course, I know where C-q's current behavior comes from and that
C-q doesn't really do the 0240 -> 04240 conversion itself (and that
it is not done according to the keyboard-coding-system), but maybe
it would make sense to change that.
Also, it would be neat to be able to say C-q x89 RET when you prefer
hex input (of course you can also set read-quoted-char-radix).
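The two octal numbers in the Latin-1 example differ by a fixed amount, which
plain arithmetic makes visible. This Python sketch only restates the figures
given above (0240 and 04240). The variable names are illustrative, and it says
nothing about how Emacs actually performs the conversion.

```python
latin1_code = int("240", 8)     # what the user types after C-q
internal_code = int("4240", 8)  # the internal code quoted above

offset = internal_code - latin1_code
print(f"latin-1 {latin1_code:#x} -> internal {internal_code:#x}")
print(f"offset: {offset:#x} (octal {offset:o})")

# Decoding the single byte as Latin-1 gives U+00A0, the no-break space.
print(repr(bytes([latin1_code]).decode("latin-1")))
```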
Stefan
* Re: Input of XML documents (input of unicode)
From: Eli Zaretskii @ 2002-10-13 5:49 UTC
On 12 Oct 2002, Stefan Monnier <foo@acm.com> wrote:
> >> Unfortunately they do. I see the Greek theta, because it was
> >> in the file before I opened it, but when I do what you suggest
> >> and type C-q 1 7 2 1: 'Invalid Character'.
> > 1721 is not the code of the Greek theta in the Emacs encoding, that's
> > why it fails. Try this instead:
>
> Maybe this behavior of C-q is actually not desirable.
> After all, who remembers Emacs' internal code for particular chars ?
> Also in a latin-1 locale C-q 240 RET inserts a latin-1 0240
> which is 04240 in Emacs' internal encoding, so I'd expect in
> a utf-8 environment that C-q <octal-code-of-a-unicode-char> RET
> would insert the desired unicode char.
It should, but 1721 isn't the octal code of the Greek theta in Unicode,
either.
> Of course, I know where C-q's current behavior comes from and that
> C-q doesn't really do the 0240 -> 04240 conversion itself (and that
> it is not done according to the keyboard-coding-system), but maybe
> it would make sense to change that.
C-q <utf-8-code-of-a-unicode-character> RET should insert a Unicode
character in the UTF-8 locale. Doesn't it?
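"UTF-8 code" can be read either as the Unicode code point or as the encoded
byte sequence, and the two differ for anything outside ASCII. Here is a small
Python sketch of the difference for theta, assuming U+03B8 is the character
meant. Which of the two readings C-q accepts is exactly the open question here,
so this is only the arithmetic, not a statement about Emacs.

```python
theta = "\u03b8"                     # GREEK SMALL LETTER THETA

code_point = ord(theta)              # the abstract Unicode number
utf8_bytes = theta.encode("utf-8")   # the bytes a UTF-8 file would contain

print(f"code point: U+{code_point:04X} (octal {code_point:o})")
print("UTF-8 bytes:", " ".join(f"{b:02X}" for b in utf8_bytes))
print("UTF-8 bytes in octal:", " ".join(f"{b:o}" for b in utf8_bytes))
```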