unofficial mirror of emacs-devel@gnu.org 
* Probably dumb question: glyph rendering on unicode-2 branch
@ 2005-10-17 13:46 Adrian Robert
  2005-10-24 14:43 ` Adrian Robert
  0 siblings, 1 reply; 4+ messages in thread
From: Adrian Robert @ 2005-10-17 13:46 UTC (permalink / raw)


Hi,

I apologize if this is a dumb question, but I've been looking through  
the code and can't figure this one out: on the unicode-2 branch, if a  
font specifies "iso-10646-1" for XLFD registry/encoding (and then  
fontset.c sets 'charset' accordingly), what exactly is getting passed  
in struct glyph_string.char2b to x_draw_glyph_string()?  Not UTF-8,  
since it's just 2 bytes.  UCS-2?  UTF-16?  Don't these exclude a lot  
of unicode characters?  Is that what the "composition" machinery is  
for?  (But I thought that had to do with the script itself involving  
composition, like Arabic or Korean Hangul..)

Does emacs provide any internal facility to get UTF-8?

Also, what (encoding) is in glyph.u.ch?  Is that UCS-4?  UTF-32?

thanks,
Adrian


* Re: Probably dumb question: glyph rendering on unicode-2 branch
  2005-10-17 13:46 Probably dumb question: glyph rendering on unicode-2 branch Adrian Robert
@ 2005-10-24 14:43 ` Adrian Robert
  2005-10-25  1:33   ` Kenichi Handa
  0 siblings, 1 reply; 4+ messages in thread
From: Adrian Robert @ 2005-10-24 14:43 UTC (permalink / raw)


Hi,

I didn't get any response to the message below, so let me try asking
it in a different way:

unicode-2 branch:
   dispextern.h:

     struct glyph {
     ...
         /* Character code for character glyphs (type == CHAR_GLYPH).  */
         unsigned ch;
     ...
     }
     ...
     struct glyph_string {
     ...
     /* Characters to be drawn, and number of characters.  */
     XChar2b *char2b;
     int nchars;
     ...
     }

   {x,mac,w32}term.c:

     x_encode_char(int c, XChar2b *char2b, ...)
     {
     ...
     }

     x_draw_glyph_string(struct glyph_string *s)
     {
     ...
     }

Questions:

1) Is 'int c' passed to x_encode_char() the same as 'unsigned ch' in
struct glyph?

2) In either case, what are they -- UCS-2?  UTF-16?  MULE?  UCS-4?   
UTF-32?  What is the byte ordering?

I'll be happy to RTFM if this is documented anywhere..

thanks,
Adrian




On Oct 17, 2005, at 9:46 AM, Adrian Robert wrote:

> Hi,
>
> I apologize if this is a dumb question, but I've been looking  
> through the code and can't figure this one out: on the unicode-2  
> branch, if a font specifies "iso-10646-1" for XLFD registry/ 
> encoding (and then fontset.c sets 'charset' accordingly), what  
> exactly is getting passed in struct glyph_string.char2b to  
> x_draw_glyph_string()?  Not UTF-8, since it's just 2 bytes.   
> UCS-2?  UTF-16?  Don't these exclude a lot of unicode characters?   
> Is that what the "composition" machinery is for?  (But I thought  
> that had to do with the script itself involving composition, like  
> Arabic or Korean Hangul..)
>
> Does emacs provide any internal facility to get UTF-8?
>
> Also, what (encoding) is in glyph.u.ch?  Is that UCS-4?  UTF-32?
>
> thanks,
> Adrian
>
>


* Re: Probably dumb question: glyph rendering on unicode-2 branch
  2005-10-24 14:43 ` Adrian Robert
@ 2005-10-25  1:33   ` Kenichi Handa
  2005-10-25  3:24     ` Adrian Robert
  0 siblings, 1 reply; 4+ messages in thread
From: Kenichi Handa @ 2005-10-25  1:33 UTC (permalink / raw)
  Cc: emacs-devel

In article <BD5A24D1-F3BD-4AC9-8762-8E4917C83D2E@cogsci.ucsd.edu>, Adrian Robert <arobert@cogsci.ucsd.edu> writes:
> I didn't get any response to the message below, so let me try
> asking it in a different way:

Sorry for not responding on this matter.  It seems that I
missed your original mail.

> unicode-2 branch:
>    dispextern.h:

>      struct glyph {
>      ...
>          /* Character code for character glyphs (type == CHAR_GLYPH).  */
>          unsigned ch;
>      ...
>      }
>      ...
>      struct glyph_string {
>      ...
>      /* Characters to be drawn, and number of characters.  */
>      XChar2b *char2b;
>      int nchars;
>      ...
>      }

>    {x,mac,w32}term.c:

>      x_encode_char(int c, XChar2b *char2b, ...)
>      {
>      ...
>      }

>      x_draw_glyph_string(struct glyph_string *s)
>      {
>      ...
>      }

> Questions:

> 1) Is 'int c' passed to x_encode_char() the same as 'unsigned ch' in
> struct glyph?

Mostly yes.  The exception is when x_encode_char is called
on an element of a composition glyph.  In that case,
x_encode_char is called from get_char_face_and_encoding,
which the BUILD_COMPOSITE_GLYPH_STRING macro calls on
each element of a composition glyph.

> 2) In either case, what are they -- UCS-2?  UTF-16?  MULE?  UCS-4?   
> UTF-32?  What is the byte ordering?

It is a character code used in Emacs.  The value range is
0x0..0x3FFFFF, and codes 0x0..0x10FFFF are exactly the
same as Unicode characters.  Asking about the "byte ordering"
of an (int) doesn't really make sense; how its bytes are laid
out in memory depends on your hardware architecture.
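A small C sketch of the two points above: which range a character code falls in, and why "byte ordering" is a property of how an int is stored rather than of the code value itself. The `sketch_*` names and the two limit constants are hypothetical, built only from the ranges stated in the reply; they are not Emacs's own macros (src/character.h has the real definitions).

```c
#include <string.h>

/* Hypothetical limits taken from the ranges quoted in the reply. */
#define SKETCH_MAX_UNICODE_CHAR 0x10FFFFu /* last Unicode code point */
#define SKETCH_MAX_CHAR         0x3FFFFFu /* upper bound of Emacs codes */

/* Return 1 if C lies in the stated range 0x0..0x3FFFFF. */
int sketch_char_valid_p (unsigned c)
{
  return c <= SKETCH_MAX_CHAR;
}

/* Return 1 if C coincides with a Unicode code point (0x0..0x10FFFF). */
int sketch_char_unicode_p (unsigned c)
{
  return c <= SKETCH_MAX_UNICODE_CHAR;
}

/* Portable: arithmetic on the value gives the same low byte on every host. */
unsigned sketch_low_byte_by_value (unsigned c)
{
  return c & 0xFFu;
}

/* Host-dependent: the first byte of C's storage equals the low byte only
   on little-endian machines; on big-endian ones it is the high byte. */
unsigned sketch_first_byte_in_memory (unsigned c)
{
  unsigned char bytes[sizeof c];
  memcpy (bytes, &c, sizeof c);
  return bytes[0];
}
```

The value-based helper is what display code should rely on; the memcpy variant only exists to show where endianness enters the picture.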

> I'll be happy to RTFM if this is documented anywhere..

The file src/character.h contains some documentation about
character code.

>>  I apologize if this is a dumb question, but I've been looking  
>>  through the code and can't figure this one out: on the unicode-2  
>>  branch, if a font specifies "iso-10646-1" for XLFD registry/ 
>>  encoding (and then fontset.c sets 'charset' accordingly), what  
>>  exactly is getting passed in struct glyph_string.char2b to  
>>  x_draw_glyph_string()?

If a font has CHARSET_REGISTRY "iso10646" and
CHARSET_ENCODING "1", the font contains only BMP characters.
Emacs-unicode uses such a font only for BMP characters.
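To make the BMP restriction concrete, here is a sketch of packing a character code into a 2-byte cell. It assumes Xlib's XChar2b layout (byte1 is the high-order byte, byte2 the low-order byte) but defines a local look-alike struct so it compiles without X11; `SketchChar2b` and `sketch_encode_bmp` are hypothetical names and this is not Emacs's x_encode_char.

```c
/* Local stand-in with the same layout as Xlib's XChar2b. */
typedef struct
{
  unsigned char byte1; /* high-order byte */
  unsigned char byte2; /* low-order byte */
} SketchChar2b;

/* Store character C in a 2-byte glyph cell.  Returns 0 on success, or
   -1 if C lies outside the BMP and therefore cannot be represented in
   two bytes (and so cannot be drawn with an iso10646-1 font). */
int sketch_encode_bmp (unsigned c, SketchChar2b *out)
{
  if (c > 0xFFFF)
    return -1;
  out->byte1 = (unsigned char) (c >> 8);
  out->byte2 = (unsigned char) (c & 0xFF);
  return 0;
}
```

For example, U+3042 becomes byte1 = 0x30, byte2 = 0x42, while U+1D11E is rejected.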


>>  Not UTF-8, since it's just 2 bytes.   
>>  UCS-2?  UTF-16?  Don't these exclude a lot of unicode characters?   

Yes.  But, as far as I know, there's no consensus about what
to put in the CHARSET_REGISTRY and CHARSET_ENCODING fields of
a font that supports the SMP or SIP.
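On the UCS-2 vs UTF-16 question: a single 2-byte cell holds exactly one BMP code point (UCS-2). UTF-16 reaches characters beyond the BMP only by splitting them into a surrogate pair of two 16-bit units, which one XChar2b cannot carry. The sketch below is plain, standard UTF-16 encoding for illustration; `sketch_utf16_encode` is a hypothetical name and nothing here claims Emacs does this.

```c
#include <stdint.h>

/* Encode code point C as UTF-16 into OUT.  Returns the number of 16-bit
   units written (1 for BMP characters, 2 for a surrogate pair), or 0 if
   C is above 0x10FFFF or is itself a surrogate code point. */
int sketch_utf16_encode (uint32_t c, uint16_t out[2])
{
  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  if (c < 0x10000)
    {
      out[0] = (uint16_t) c; /* fits in one unit, same as UCS-2 */
      return 1;
    }
  c -= 0x10000;                                 /* 20 bits remain */
  out[0] = (uint16_t) (0xD800 + (c >> 10));     /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (c & 0x3FF));   /* low surrogate */
  return 2;
}
```

U+1D11E, for instance, needs the pair 0xD834 0xDD1E.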

>>  Does emacs provide any internal facility to get UTF-8?

Do you mean a way to convert a character code to a UTF-8 byte
sequence at the C level?  Then you can use the macro CHAR_STRING
(defined in character.h), because Emacs-unicode's internal
string/buffer representation is a UTF-8 byte sequence.
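Inside Emacs, CHAR_STRING is the thing to call. As a rough, self-contained picture of the bytes it should yield for codes in the Unicode range, here is plain UTF-8 encoding. `sketch_char_string` is a hypothetical stand-in written for this note, not the Emacs macro; it deliberately stops at 0x10FFFF, and how Emacs encodes codes above that is not modeled here.

```c
/* Write the UTF-8 encoding of C (0x0..0x10FFFF) into P, which must have
   room for 4 bytes.  Returns the number of bytes written, or 0 if C is
   above 0x10FFFF.  Surrogate code points are not rejected. */
int sketch_char_string (unsigned c, unsigned char *p)
{
  if (c < 0x80)
    {
      p[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (unsigned char) (0xC0 | (c >> 6));
      p[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (unsigned char) (0xE0 | (c >> 12));
      p[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c <= 0x10FFFF)
    {
      p[0] = (unsigned char) (0xF0 | (c >> 18));
      p[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      p[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      p[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;
}
```

For example, U+3042 produces E3 81 82 and U+1D11E produces F0 9D 84 9E.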

---
Kenichi Handa
handa@m17n.org


* Re: Probably dumb question: glyph rendering on unicode-2 branch
  2005-10-25  1:33   ` Kenichi Handa
@ 2005-10-25  3:24     ` Adrian Robert
  0 siblings, 0 replies; 4+ messages in thread
From: Adrian Robert @ 2005-10-25  3:24 UTC (permalink / raw)
  Cc: emacs-devel


On Oct 24, 2005, at 9:33 PM, Kenichi Handa wrote:

> In article <BD5A24D1-F3BD-4AC9-8762-8E4917C83D2E@cogsci.ucsd.edu>, 
> Adrian Robert <arobert@cogsci.ucsd.edu> writes:
>> I didn't get any response to the message below, so let me try
>> asking it in a different way:
>
> Sorry for not responding on this matter.  It seems that I
> missed your original mail.

Hmm, might have been because of my "dumb" subject title.. ;'/

Anyhow, thanks very much -- this is all extremely helpful.

Adrian

> <snip...>

