all messages for Emacs-related lists mirrored at yhetil.org

* Looking through multibyte_char_to_unibyte
@ 2014-06-22 17:09 Dmitry Antipov
  2014-06-22 17:20 ` Eli Zaretskii
  2014-06-22 17:39 ` Andreas Schwab
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Antipov @ 2014-06-22 17:09 UTC
  To: Emacs development discussions; +Cc: Kenichi Handa

Looking through preprocessed src/character.c, I found the following function:

int
multibyte_char_to_unibyte (int c)
{
   if (c < 0x80)
     return c;
   if (((c) > 0x3FFF7F))
     return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
   return (c & 0xFF);
}

Is this a subtle trick which I just can't understand and explain, or is something
really wrong with this?

Dmitry




* Re: Looking through multibyte_char_to_unibyte
  2014-06-22 17:09 Looking through multibyte_char_to_unibyte Dmitry Antipov
@ 2014-06-22 17:20 ` Eli Zaretskii
  2014-06-22 17:25   ` Eli Zaretskii
  2014-06-22 17:39 ` Andreas Schwab
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:20 UTC
  To: Dmitry Antipov; +Cc: handa, emacs-devel

> Date: Sun, 22 Jun 2014 21:09:19 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> Cc: Kenichi Handa <handa@gnu.org>
> 
> int
> multibyte_char_to_unibyte (int c)
> {
>    if (c < 0x80)
>      return c;
>    if (((c) > 0x3FFF7F))
>      return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
>    return (c & 0xFF);
> }
> 
> Is this a subtle trick which I just can't understand and explain, or is something
> really wrong with this?

What do you think this function intends to do?  IOW, how would you go
about "converting a multibyte character to unibyte", something that is
not possible in principle?
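To make Eli's point concrete, here is a standalone C sketch of the pre-patch `multibyte_char_to_unibyte` and `multibyte_char_to_unibyte_safe` (as they appear in the patch later in this thread), with the `CHAR_BYTE8_P` / `CHAR_TO_BYTE8` tests written out by hand from the preprocessed code above. The names `sketch_char_to_unibyte` and `sketch_char_to_unibyte_safe` are hypothetical; this is an illustration, not the Emacs source. It shows why the conversion is lossy in principle: a character that is neither ASCII nor a raw byte is simply truncated to its low 8 bits.

```c
#include <assert.h>

/* Raw 8-bit bytes 0x80..0xFF are assumed to live at character codes
   0x3FFF80..0x3FFFFF, as the constants in the expansion suggest.  */

/* Re-statement of multibyte_char_to_unibyte (hypothetical name).  */
static int
sketch_char_to_unibyte (int c)
{
  assert (c >= 0);
  if (c < 0x80)                 /* ASCII: already a single byte.  */
    return c;
  if (c > 0x3FFF7F)             /* Raw byte: undo the 0x3FFF00 offset.  */
    return c - 0x3FFF00;
  return c & 0xFF;              /* Anything else: keep low 8 bits (lossy).  */
}

/* Re-statement of multibyte_char_to_unibyte_safe (hypothetical name).  */
static int
sketch_char_to_unibyte_safe (int c)
{
  assert (c >= 0);
  if (c < 0x80)
    return c;
  if (c > 0x3FFF7F)
    return c - 0x3FFF00;
  return -1;                    /* No faithful single-byte representation.  */
}
```

For example, U+00E9 and U+01E9 both come out as byte 0xE9 from the first function, while the `_safe` variant returns -1 for both.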




* Re: Looking through multibyte_char_to_unibyte
  2014-06-22 17:20 ` Eli Zaretskii
@ 2014-06-22 17:25   ` Eli Zaretskii
  2014-06-22 17:35     ` Dmitry Antipov
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:25 UTC
  To: dmantipov; +Cc: handa, emacs-devel

> Date: Sun, 22 Jun 2014 20:20:30 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: handa@gnu.org, emacs-devel@gnu.org
> 
> > Date: Sun, 22 Jun 2014 21:09:19 +0400
> > From: Dmitry Antipov <dmantipov@yandex.ru>
> > Cc: Kenichi Handa <handa@gnu.org>
> > 
> > int
> > multibyte_char_to_unibyte (int c)
> > {
> >    if (c < 0x80)
> >      return c;
> >    if (((c) > 0x3FFF7F))
> >      return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
> >    return (c & 0xFF);
> > }
> > 
> > Is this a subtle trick which I just can't understand and explain, or is something
> > really wrong with this?
> 
> What do you think this function intends to do?  IOW, how would you go
> about "converting a multibyte character to unibyte", something that is
> not possible in principle?

Wait, you probably asked about the recursive call to
multibyte_char_to_unibyte, is that right?  If so, that branch is never
taken, and will be optimized away by any reasonable compiler.
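A small standalone sketch can confirm that the branch is dead. It re-creates the pre-patch `CHAR_TO_BYTE8` from the patch in this thread; `CHAR_BYTE8_P` and the `MAX_5_BYTE_CHAR` bound of 0x3FFF7F are assumed, inferred from the preprocessed code quoted above. Inside `if (CHAR_BYTE8_P (c))`, `CHAR_TO_BYTE8 (c)` re-tests the very same condition, so its `: multibyte_char_to_unibyte (c)` arm can never be selected. The sketch compares that form against a hand-simplified version (`without_dead_branch`, a hypothetical name) over every character code.

```c
#include <assert.h>

/* Assumed definitions, inferred from the preprocessed code above.  */
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define CHAR_BYTE8_P(c) ((c) > MAX_5_BYTE_CHAR)
#define CHAR_TO_BYTE8(c) \
  (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c))

/* As written in the pre-patch src/character.c.  The recursive call
   hidden inside CHAR_TO_BYTE8 sits behind a test that the enclosing
   `if' has already established as true, so it never runs.  */
int
multibyte_char_to_unibyte (int c)
{
  if (c < 0x80)
    return c;
  if (CHAR_BYTE8_P (c))
    return CHAR_TO_BYTE8 (c);
  return c & 0xFF;
}

/* The same function with the unreachable arm removed by hand.  */
static int
without_dead_branch (int c)
{
  assert (c >= 0);
  if (c < 0x80)
    return c;
  if (CHAR_BYTE8_P (c))
    return c - 0x3FFF00;
  return c & 0xFF;
}
```

Because the two agree on every input, a compiler that propagates the known-true condition can drop the recursive call entirely, which is what Eli describes.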




* Re: Looking through multibyte_char_to_unibyte
  2014-06-22 17:25   ` Eli Zaretskii
@ 2014-06-22 17:35     ` Dmitry Antipov
  2014-06-22 17:47       ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Antipov @ 2014-06-22 17:35 UTC
  To: Eli Zaretskii; +Cc: handa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 316 bytes --]

On 06/22/2014 09:25 PM, Eli Zaretskii wrote:

> Wait, you probably asked about the recursive call to
> multibyte_char_to_unibyte, is that right?

Yes.

> If so, that branch is never taken, and will be optimized
> away by any reasonable compiler.

If so, can I assume that the following cleanup is correct?

Dmitry



[-- Attachment #2: char_convert.patch --]
[-- Type: text/x-patch, Size: 4012 bytes --]

=== modified file 'src/character.c'
--- src/character.c	2014-01-01 07:43:34 +0000
+++ src/character.c	2014-06-22 17:24:31 +0000
@@ -233,32 +233,6 @@
   return c;
 }
 
-/* Convert ASCII or 8-bit character C to unibyte.  If C is none of
-   them, return (C & 0xFF).  */
-
-int
-multibyte_char_to_unibyte (int c)
-{
-  if (c < 0x80)
-    return c;
-  if (CHAR_BYTE8_P (c))
-    return CHAR_TO_BYTE8 (c);
-  return (c & 0xFF);
-}
-
-/* Like multibyte_char_to_unibyte, but return -1 if C is not supported
-   by charset_unibyte.  */
-
-int
-multibyte_char_to_unibyte_safe (int c)
-{
-  if (c < 0x80)
-    return c;
-  if (CHAR_BYTE8_P (c))
-    return CHAR_TO_BYTE8 (c);
-  return -1;
-}
-
 DEFUN ("characterp", Fcharacterp, Scharacterp, 1, 2, 0,
        doc: /* Return non-nil if OBJECT is a character.
 In Emacs Lisp, characters are represented by character codes, which

=== modified file 'src/character.h'
--- src/character.h	2014-02-14 16:11:11 +0000
+++ src/character.h	2014-06-22 17:23:57 +0000
@@ -70,17 +70,15 @@
   (ASCII_BYTE_P (byte) ? (byte) : BYTE8_TO_CHAR (byte))
 
 /* Return the raw 8-bit byte for character C.  */
-#define CHAR_TO_BYTE8(c)	\
-  (CHAR_BYTE8_P (c)		\
-   ? (c) - 0x3FFF00		\
-   : multibyte_char_to_unibyte (c))
+#define CHAR_TO_BYTE8(c)				\
+  (ASCII_CHAR_P (c) ? c				\
+   : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c & 0xFF)))
 
 /* Return the raw 8-bit byte for character C,
    or -1 if C doesn't correspond to a byte.  */
-#define CHAR_TO_BYTE_SAFE(c)	\
-  (CHAR_BYTE8_P (c)		\
-   ? (c) - 0x3FFF00		\
-   : multibyte_char_to_unibyte_safe (c))
+#define CHAR_TO_BYTE_SAFE(c)			\
+  (ASCII_CHAR_P (c) ? c			\
+   : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : -1))
 
 /* Nonzero iff BYTE is the 1st byte of a multibyte form of a character
    that corresponds to a raw 8-bit byte.  */

=== modified file 'src/cmds.c'
--- src/cmds.c	2014-06-02 00:18:22 +0000
+++ src/cmds.c	2014-06-22 17:26:40 +0000
@@ -360,8 +360,7 @@
   else
     {
       str[0] = (SINGLE_BYTE_CHAR_P (c)
-		? c
-		: multibyte_char_to_unibyte (c));
+		? c : CHAR_TO_BYTE8 (c));
       len = 1;
     }
   if (!NILP (overwrite)

=== modified file 'src/editfns.c'
--- src/editfns.c	2014-06-17 13:50:22 +0000
+++ src/editfns.c	2014-06-22 17:27:46 +0000
@@ -2238,7 +2238,7 @@
 	    len = CHAR_STRING (c, str);
 	  else
 	    {
-	      str[0] = ASCII_CHAR_P (c) ? c : multibyte_char_to_unibyte (c);
+	      str[0] = CHAR_TO_BYTE8 (c);
 	      len = 1;
 	    }
 	  (*insert_func) ((char *) str, len);

=== modified file 'src/lisp.h'
--- src/lisp.h	2014-06-17 16:09:19 +0000
+++ src/lisp.h	2014-06-22 17:24:38 +0000
@@ -3433,8 +3433,6 @@
 /* Defined in character.c.  */
 extern ptrdiff_t chars_in_text (const unsigned char *, ptrdiff_t);
 extern ptrdiff_t multibyte_chars_in_text (const unsigned char *, ptrdiff_t);
-extern int multibyte_char_to_unibyte (int) ATTRIBUTE_CONST;
-extern int multibyte_char_to_unibyte_safe (int) ATTRIBUTE_CONST;
 extern void syms_of_character (void);
 
 /* Defined in charset.c.  */

=== modified file 'src/search.c'
--- src/search.c	2014-04-25 16:11:07 +0000
+++ src/search.c	2014-06-22 17:27:26 +0000
@@ -2596,7 +2596,7 @@
 	    {
 	      FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, newtext, pos, pos_byte);
 	      if (!buf_multibyte)
-		c = multibyte_char_to_unibyte (c);
+		c = CHAR_TO_BYTE8 (c);
 	    }
 	  else
 	    {
@@ -2619,7 +2619,7 @@
 		  FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, newtext,
 						      pos, pos_byte);
 		  if (!buf_multibyte && !ASCII_CHAR_P (c))
-		    c = multibyte_char_to_unibyte (c);
+		    c = CHAR_TO_BYTE8 (c);
 		}
 	      else
 		{

=== modified file 'src/xdisp.c'
--- src/xdisp.c	2014-06-21 19:45:59 +0000
+++ src/xdisp.c	2014-06-22 17:25:56 +0000
@@ -9932,9 +9932,7 @@
 	  for (i = 0; i < nbytes; i += char_bytes)
 	    {
 	      c = string_char_and_length (msg + i, &char_bytes);
-	      work[0] = (ASCII_CHAR_P (c)
-			 ? c
-			 : multibyte_char_to_unibyte (c));
+	      work[0] = CHAR_TO_BYTE8 (c);
 	      insert_1_both (work, 1, 1, 1, 0, 0);
 	    }
 	}



* Re: Looking through multibyte_char_to_unibyte
  2014-06-22 17:09 Looking through multibyte_char_to_unibyte Dmitry Antipov
  2014-06-22 17:20 ` Eli Zaretskii
@ 2014-06-22 17:39 ` Andreas Schwab
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Schwab @ 2014-06-22 17:39 UTC
  To: Dmitry Antipov; +Cc: Kenichi Handa, Emacs development discussions

Dmitry Antipov <dmantipov@yandex.ru> writes:

> Looking through preprocessed src/character.c, I found the following function:
>
> int
> multibyte_char_to_unibyte (int c)
> {
>   if (c < 0x80)
>     return c;
>   if (((c) > 0x3FFF7F))
>     return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
>   return (c & 0xFF);
> }
>
> Is this a subtle trick which I just can't understand and explain, or is something
> really wrong with this?

That's the expansion of the CHAR_BYTE8_P and CHAR_TO_BYTE8 macros.
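One way to reproduce this without preprocessing all of character.c is to stringify a fully expanded macro call. The sketch reuses the pre-patch `CHAR_TO_BYTE8` from the patch in this thread, with `CHAR_BYTE8_P` and `MAX_5_BYTE_CHAR` assumed from the expanded code. The two-level `STR` / `XSTR` helpers (hypothetical names) make the preprocessor expand the argument before `#` turns it into a string. The function name survives untouched because it is not itself a macro, which is how a recursive-looking call ended up in the preprocessed output.

```c
#include <assert.h>
#include <string.h>

/* Assumed definitions, inferred from the preprocessed code above.  */
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define CHAR_BYTE8_P(c) ((c) > MAX_5_BYTE_CHAR)
#define CHAR_TO_BYTE8(c) \
  (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c))

/* STR stringifies its argument as-is; XSTR expands macros first.  */
#define STR(x) #x
#define XSTR(x) STR (x)

/* The text of CHAR_TO_BYTE8 (c) after expansion, roughly
   "(((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c))".
   Exact spacing may differ between preprocessors.  */
static const char expanded[] = XSTR (CHAR_TO_BYTE8 (c));

/* Aborts (via assert) if the expansion lacks an expected piece.  */
static void
check_expansion (void)
{
  assert (strstr (expanded, "0x3FFF7F") != NULL);   /* from CHAR_BYTE8_P */
  assert (strstr (expanded, "0x3FFF00") != NULL);   /* from CHAR_TO_BYTE8 */
  assert (strstr (expanded, "multibyte_char_to_unibyte") != NULL);
}
```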

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Looking through multibyte_char_to_unibyte
  2014-06-22 17:35     ` Dmitry Antipov
@ 2014-06-22 17:47       ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:47 UTC
  To: Dmitry Antipov; +Cc: handa, emacs-devel

> Date: Sun, 22 Jun 2014 21:35:52 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: handa@gnu.org, emacs-devel@gnu.org
> 
> > If so, that branch is never taken, and will be optimized
> > away by any reasonable compiler.
> 
> If so, can I assume that the following cleanup is correct?

Looks OK to me, except that here:

>  /* Return the raw 8-bit byte for character C.  */
> -#define CHAR_TO_BYTE8(c)	\
> -  (CHAR_BYTE8_P (c)		\
> -   ? (c) - 0x3FFF00		\
> -   : multibyte_char_to_unibyte (c))
> +#define CHAR_TO_BYTE8(c)				\
> +  (ASCII_CHAR_P (c) ? c				\
> +   : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c & 0xFF)))

why do you need a separate test for ASCII_CHAR_P?
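One way to check Eli's question is to compare the patched `CHAR_TO_BYTE8` with a version that drops the `ASCII_CHAR_P` test, over every character code from 0 to 0x3FFFFF. For ASCII codes `(c) & 0xFF` already yields `c`, so the extra test should change nothing. This is a standalone sketch: `ASCII_CHAR_P`, `CHAR_BYTE8_P` and `MAX_5_BYTE_CHAR` are simplified stand-ins assumed to match character.h, and `CHAR_TO_BYTE8_PATCHED`, `CHAR_TO_BYTE8_SIMPLE` and `first_mismatch` are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-ins for the character.h predicates (assumed).  */
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define ASCII_CHAR_P(c) ((unsigned) (c) < 0x80)
#define CHAR_BYTE8_P(c) ((c) > MAX_5_BYTE_CHAR)

/* The macro as proposed in the patch.  */
#define CHAR_TO_BYTE8_PATCHED(c)				\
  (ASCII_CHAR_P (c) ? c						\
   : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c & 0xFF)))

/* Without the separate ASCII test, and with C fully parenthesized.  */
#define CHAR_TO_BYTE8_SIMPLE(c) \
  (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c) & 0xFF)

/* Return the first character code in [0, MAX_CHAR] on which the two
   macros disagree, or -1 if they agree everywhere.  */
static int
first_mismatch (int max_char)
{
  assert (max_char >= 0);
  for (int c = 0; c <= max_char; c++)
    if (CHAR_TO_BYTE8_PATCHED (c) != CHAR_TO_BYTE8_SIMPLE (c))
      return c;
  return -1;
}
```

The simpler form also parenthesizes every use of `c`. That matters only for arguments containing lower-precedence operators: with an argument such as `a | b`, the patched `(c & 0xFF)` would expand to `(a | b & 0xFF)`, which parses as `a | (b & 0xFF)`.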




Thread overview: 6+ messages
2014-06-22 17:09 Looking through multibyte_char_to_unibyte Dmitry Antipov
2014-06-22 17:20 ` Eli Zaretskii
2014-06-22 17:25   ` Eli Zaretskii
2014-06-22 17:35     ` Dmitry Antipov
2014-06-22 17:47       ` Eli Zaretskii
2014-06-22 17:39 ` Andreas Schwab
