unofficial mirror of emacs-devel@gnu.org 
* How to convert char from Emacs-20 internal to UTF-8?
From: Adrian Robert @ 2005-03-16 16:46 UTC


Hi all,

I apologize for the "retro" question, but I was wondering if there was 
an easy way to convert a character in the Emacs-20 internal 19-bit 
encoding (from FAST_GLYPH_CHAR(glyph)) to UTF-8 (preferable) or 
straight Unicode.  I'd like to do it fully within C if possible, and it 
needs to be efficient.  I've looked into CCL a little bit and also 
found http://tclab.kaist.ac.kr/~otfried/Mule/ , but it's a bit tough 
getting started with this stuff and since I just want a single special 
case I was wondering if anyone knew a handy invocation, or an 
alternative to CCL.

thanks,
Adrian


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Stefan Monnier @ 2005-03-16 17:19 UTC
  Cc: emacs-devel

> I apologize for the "retro" question, but I was wondering if there was an
> easy way to convert a character in the Emacs-20 internal 19-bit encoding
> (from FAST_GLYPH_CHAR(glyph)) to UTF-8 (preferable) or straight Unicode.
> I'd like to do it fully within C if possible, and it needs to be efficient.

Since Emacs-21's internal chars are a superset of Emacs-20's internal chars,
you can just use Emacs-21's facilities like (encode-coding-string <str>
'utf-8) or (encode-char <char> 'ucs).


        Stefan


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Adrian Robert @ 2005-03-16 19:19 UTC
  Cc: emacs-devel


On Mar 16, 2005, at 12:19 PM, Stefan Monnier wrote:

>> I apologize for the "retro" question, but I was wondering if there was an
>> easy way to convert a character in the Emacs-20 internal 19-bit encoding
>> (from FAST_GLYPH_CHAR(glyph)) to UTF-8 (preferable) or straight Unicode.
>> I'd like to do it fully within C if possible, and it needs to be efficient.
>
> Since Emacs-21's internal chars are a superset of Emacs-20's internal chars,
> you can just use Emacs-21's facilities like (encode-coding-string <str>
> 'utf-8) or (encode-char <char> 'ucs).

Thanks.  Are the encode-coding-string and encode-char functions (a) fast 
enough to be used inside dumpglyphs() for screen rendering and (b) 
something I can easily lift out of 21 and backport to 20?  I'm asking 
because this is for a GNUstep/OS X interface for Emacs 20 
(http://emacs-on-aqua.sf.net/).  While we'd like to bring it up to 
date to work with the coming Emacs 22 (where I assume this problem 
disappears completely), that is a large job, and we'd like to have 
support for 2-byte fonts and other i18n rendering in the meantime.  (In 
the OpenStep APIs, conversions to native font encoding are handled 
internally, so we don't need all of CCL's generality, but we need to 
get characters in UTF-8 or Unicode to give to the APIs in the first 
place.)


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Stefan Monnier @ 2005-03-16 19:38 UTC
  Cc: emacs-devel

> Thanks.  Are the encode-coding-string and encode-char functions a) fast
> enough to be used inside of dumpglyphs() for screen rendering and b)
> something I can easily lift out of 21 and backport to 20?  I'm asking
> because this is for a GNUstep/OS X interface for emacs 20
> (http://emacs-on-aqua.sf.net/), and while we'd like to bring it up to date
> to work with the coming emacs-22 (where I assume this problem disappears
> completely), this is a large job and we'd like to have support for 2-byte
> font and other i18n rendering in the meantime.  (In the OpenStep APIs,
> conversions to native font encoding are handled internally, so we don't need
> all of CCL's generality, but we need to get characters in UTF-8 or unicode
> to give to the APIs in the first place.)

The trunk of the CVS repository (which will become Emacs-22) already
supports OS X (via Carbon).

If that doesn't help you because you want to use some other API, I recommend
you start from the emacs-unicode-2 branch in the CVS repository (which may
become Emacs-23).  That branch changes the internal character set of Emacs
to Unicode, so you won't need to convert chars at all.

But working on Emacs-20 seems so obviously counterproductive to me that I
suspect I just don't understand your motivations well enough to give you
a good response.


        Stefan


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Adrian Robert @ 2005-03-17  3:20 UTC
  Cc: emacs-devel

> The trunk of the CVS repository (which will become Emacs-22)
> already supports OS X (via Carbon).
>
> If that doesn't help you because you want to use some other
> API, I recommend you start from the emacs-unicode-2 branch in
> the CVS repository (which may become Emacs-23).  That branch
> changes the internal character set of Emacs to Unicode, so you
> won't need to convert chars at all.
>
> But working on Emacs-20 seems so obviously counterproductive to
> me that I suspect I just don't understand your motivations well
> enough to give you a good response.

I'm sorry, perhaps I should have explained a bit more at
first.  My desire to avoid a long-winded email has led to three
medium-winded ones.  ;)

http://emacs-on-aqua.sf.net is a project to resurrect the old
NeXTstep port of Emacs.  That port has been maintained to some
extent over the years, to the point where it is based on
Emacs 20.7 and partially works on OS X.  (The Cocoa APIs are
based on OpenStep, the successor of NeXTstep.)  We are interested
in getting this running well on both OS X and GNUstep, an
open-source implementation of OpenStep, and in bringing it up to
date with the latest Emacs.

Rather than trying to do both of these at once, we decided to
bring the port up to date first on Emacs 20.7, since the code has
gotten a little rusty over the years.  This has been going well,
but one of the remaining tasks is 2-byte font rendering, which
led to my query.  We think we can handle it as long as we deliver
UTF-8 (or possibly plain Unicode) to our renderer.  But we need
to try to be sure, and then will come the task of getting
fontsets working.

So, in our code we have a function ns_dumpglyphs() which plays
the role of dumpglyphs() in xterm.c: an array of GLYPHs is
received, the character info is extracted using
FAST_GLYPH_CHAR(), along with other face-related info, and the
text is rendered.  In our case, we need to convert the result of
FAST_GLYPH_CHAR() into UTF-8 or Unicode.


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Adrian Robert @ 2005-03-22 17:30 UTC
  Cc: emacs-devel


On Mar 16, 2005, at 12:19 PM, Stefan Monnier wrote:

>> I apologize for the "retro" question, but I was wondering if there was an
>> easy way to convert a character in the Emacs-20 internal 19-bit encoding
>> (from FAST_GLYPH_CHAR(glyph)) to UTF-8 (preferable) or straight Unicode.
>> I'd like to do it fully within C if possible, and it needs to be efficient.

I found a way to do this using parts of the C program available at:

http://tclab.kaist.ac.kr/~otfried/Mule/

Basically it uses a large table to convert from charset/byte1/byte2 to 
Unicode and then to UTF-8.  I call SPLIT_NON_ASCII_CHAR() to get that 
info out of the 19-bit internal representation stored in the glyph.  CCL 
was not needed, though maybe it would have provided a more compact way 
to solve the problem than a 250K table.
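
The two steps described above (table lookup, then UTF-8 encoding) might
look roughly like the following.  This is only a sketch, not the code
from the program at the URL above: the struct, the charset ids, the
two-entry table and the names demo_lookup() and utf8_encode() are all
hypothetical, and the real table is far larger and organised
differently.  Only the UTF-8 bit layout itself is standard.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL table row: (charset, byte1, byte2) -> Unicode code point.
   The charset ids below are made up for illustration and are not
   Emacs's real charset numbers. */
struct demo_entry {
    uint8_t  charset;
    uint8_t  byte1;
    uint8_t  byte2;   /* 0 for one-byte charsets */
    uint32_t ucs;
};

static const struct demo_entry demo_table[] = {
    { 1, 0x69, 0x00, 0x00E9 },  /* Latin-1 0xE9 with the high bit stripped */
    { 2, 0x30, 0x21, 0x4E9C },  /* JIS X 0208 code 0x3021 */
};

/* Linear search for brevity; a real 250K table would be indexed directly.
   Returns U+FFFD (REPLACEMENT CHARACTER) when no mapping is found. */
uint32_t demo_lookup(uint8_t charset, uint8_t byte1, uint8_t byte2)
{
    size_t i;
    for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++) {
        const struct demo_entry *e = &demo_table[i];
        if (e->charset == charset && e->byte1 == byte1 && e->byte2 == byte2)
            return e->ucs;
    }
    return 0xFFFD;
}

/* Encode one Unicode code point as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 if cp is a surrogate
   or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

In a rendering loop one would presumably call demo_lookup() on the
charset and position bytes obtained from SPLIT_NON_ASCII_CHAR(), then
append the bytes from utf8_encode() to the buffer handed to the
renderer.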

However, I still have an issue: for 2-byte characters, such as Big5 or 
JIS Chinese characters, Emacs 20 is giving me two glyphs for each 
character, with identical values.  Does this have something to do 
with it thinking the font needs a double-wide horizontal space to 
render the character?

thanks,
Adrian


* Re: How to convert char from Emacs-20 internal to UTF-8?
From: Miles Bader @ 2005-03-23  4:52 UTC
  Cc: Stefan Monnier, emacs-devel

It seems like you're going to an awful lot of trouble to get something
to work which will already be obsolete when you're finished...  Are
you really confident that doing it this way is less work than simply
starting with a modern version of Emacs and porting over the
platform-specific bits you want to preserve?

-Miles
