* Looking through multibyte_char_to_unibyte
@ 2014-06-22 17:09 Dmitry Antipov
2014-06-22 17:20 ` Eli Zaretskii
2014-06-22 17:39 ` Andreas Schwab
0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Antipov @ 2014-06-22 17:09 UTC (permalink / raw)
To: Emacs development discussions; +Cc: Kenichi Handa
Looking through preprocessed src/character.c, I found the following function:
int
multibyte_char_to_unibyte (int c)
{
  if (c < 0x80)
    return c;
  if (((c) > 0x3FFF7F))
    return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
  return (c & 0xFF);
}
Is this a subtle trick which I just can't understand and explain, or something
is really wrong with this?
Dmitry
* Re: Looking through multibyte_char_to_unibyte
2014-06-22 17:09 Looking through multibyte_char_to_unibyte Dmitry Antipov
@ 2014-06-22 17:20 ` Eli Zaretskii
2014-06-22 17:25 ` Eli Zaretskii
2014-06-22 17:39 ` Andreas Schwab
1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:20 UTC (permalink / raw)
To: Dmitry Antipov; +Cc: handa, emacs-devel
> Date: Sun, 22 Jun 2014 21:09:19 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> Cc: Kenichi Handa <handa@gnu.org>
>
> int
> multibyte_char_to_unibyte (int c)
> {
>   if (c < 0x80)
>     return c;
>   if (((c) > 0x3FFF7F))
>     return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
>   return (c & 0xFF);
> }
>
> Is this a subtle trick which I just can't understand and explain, or something
> is really wrong with this?
What do you think this function intends to do? IOW, how would you go
about "converting a multibyte character to unibyte", something that is
not possible in principle?
* Re: Looking through multibyte_char_to_unibyte
2014-06-22 17:20 ` Eli Zaretskii
@ 2014-06-22 17:25 ` Eli Zaretskii
2014-06-22 17:35 ` Dmitry Antipov
0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:25 UTC (permalink / raw)
To: dmantipov; +Cc: handa, emacs-devel
> Date: Sun, 22 Jun 2014 20:20:30 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: handa@gnu.org, emacs-devel@gnu.org
>
> > Date: Sun, 22 Jun 2014 21:09:19 +0400
> > From: Dmitry Antipov <dmantipov@yandex.ru>
> > Cc: Kenichi Handa <handa@gnu.org>
> >
> > int
> > multibyte_char_to_unibyte (int c)
> > {
> >   if (c < 0x80)
> >     return c;
> >   if (((c) > 0x3FFF7F))
> >     return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
> >   return (c & 0xFF);
> > }
> >
> > Is this a subtle trick which I just can't understand and explain, or something
> > is really wrong with this?
>
> What do you think this function intends to do? IOW, how would you go
> about "converting a multibyte character to unibyte", something that is
> not possible in principle?
Wait, you probably asked about the recursive call to
multibyte_char_to_unibyte, is that right? If so, that branch is never
taken, and will be optimized away by any reasonable compiler.
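
[Editorial sketch, not part of the original message.] The claim that the recursive branch is never taken can be checked mechanically. The code below is a minimal illustration, assuming character codes lie in the range 0..0x3FFFFF; the names `preprocessed`, `simplified`, `count_mismatches` and `self_check` are made up for this example and do not exist in Emacs.

```c
#include <assert.h>

/* The function as it appears after preprocessing (quoted above).  */
static int
preprocessed (int c)
{
  if (c < 0x80)
    return c;
  if (c > 0x3FFF7F)
    /* The guard already established c > 0x3FFF7F, so the conditional
       always selects its first arm; the recursive arm is unreachable.  */
    return (c > 0x3FFF7F) ? c - 0x3FFF00 : preprocessed (c);
  return c & 0xFF;
}

/* The same function with the unreachable arm dropped.  */
static int
simplified (int c)
{
  if (c < 0x80)
    return c;
  if (c > 0x3FFF7F)
    return c - 0x3FFF00;
  return c & 0xFF;
}

/* Count the inputs in [0, 0x3FFFFF] on which the two versions disagree.  */
int
count_mismatches (void)
{
  int c, bad = 0;

  for (c = 0; c <= 0x3FFFFF; c++)
    if (preprocessed (c) != simplified (c))
      bad++;
  return bad;
}

/* Aborts if any input distinguishes the two versions.  */
void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Calling `self_check ()` from a small `main` exercises every code in the assumed range. Whether a given compiler actually removes the dead arm depends on its optimization settings.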
* Re: Looking through multibyte_char_to_unibyte
2014-06-22 17:25 ` Eli Zaretskii
@ 2014-06-22 17:35 ` Dmitry Antipov
2014-06-22 17:47 ` Eli Zaretskii
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Antipov @ 2014-06-22 17:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: handa, emacs-devel
On 06/22/2014 09:25 PM, Eli Zaretskii wrote:
> Wait, you probably asked about the recursive call to
> multibyte_char_to_unibyte, is that right?
Yes.
> If so, that branch is never taken, and will be optimized
> away by any reasonable compiler.
If so, can I assume that the following cleanup is correct?
Dmitry
[-- Attachment #2: char_convert.patch --]
[-- Type: text/x-patch, Size: 4012 bytes --]
=== modified file 'src/character.c'
--- src/character.c 2014-01-01 07:43:34 +0000
+++ src/character.c 2014-06-22 17:24:31 +0000
@@ -233,32 +233,6 @@
return c;
}
-/* Convert ASCII or 8-bit character C to unibyte. If C is none of
- them, return (C & 0xFF). */
-
-int
-multibyte_char_to_unibyte (int c)
-{
- if (c < 0x80)
- return c;
- if (CHAR_BYTE8_P (c))
- return CHAR_TO_BYTE8 (c);
- return (c & 0xFF);
-}
-
-/* Like multibyte_char_to_unibyte, but return -1 if C is not supported
- by charset_unibyte. */
-
-int
-multibyte_char_to_unibyte_safe (int c)
-{
- if (c < 0x80)
- return c;
- if (CHAR_BYTE8_P (c))
- return CHAR_TO_BYTE8 (c);
- return -1;
-}
-
DEFUN ("characterp", Fcharacterp, Scharacterp, 1, 2, 0,
doc: /* Return non-nil if OBJECT is a character.
In Emacs Lisp, characters are represented by character codes, which
=== modified file 'src/character.h'
--- src/character.h 2014-02-14 16:11:11 +0000
+++ src/character.h 2014-06-22 17:23:57 +0000
@@ -70,17 +70,15 @@
(ASCII_BYTE_P (byte) ? (byte) : BYTE8_TO_CHAR (byte))
/* Return the raw 8-bit byte for character C. */
-#define CHAR_TO_BYTE8(c) \
- (CHAR_BYTE8_P (c) \
- ? (c) - 0x3FFF00 \
- : multibyte_char_to_unibyte (c))
+#define CHAR_TO_BYTE8(c) \
+ (ASCII_CHAR_P (c) ? c \
+ : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c & 0xFF)))
/* Return the raw 8-bit byte for character C,
or -1 if C doesn't correspond to a byte. */
-#define CHAR_TO_BYTE_SAFE(c) \
- (CHAR_BYTE8_P (c) \
- ? (c) - 0x3FFF00 \
- : multibyte_char_to_unibyte_safe (c))
+#define CHAR_TO_BYTE_SAFE(c) \
+ (ASCII_CHAR_P (c) ? c \
+ : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : -1))
/* Nonzero iff BYTE is the 1st byte of a multibyte form of a character
that corresponds to a raw 8-bit byte. */
=== modified file 'src/cmds.c'
--- src/cmds.c 2014-06-02 00:18:22 +0000
+++ src/cmds.c 2014-06-22 17:26:40 +0000
@@ -360,8 +360,7 @@
else
{
str[0] = (SINGLE_BYTE_CHAR_P (c)
- ? c
- : multibyte_char_to_unibyte (c));
+ ? c : CHAR_TO_BYTE8 (c));
len = 1;
}
if (!NILP (overwrite)
=== modified file 'src/editfns.c'
--- src/editfns.c 2014-06-17 13:50:22 +0000
+++ src/editfns.c 2014-06-22 17:27:46 +0000
@@ -2238,7 +2238,7 @@
len = CHAR_STRING (c, str);
else
{
- str[0] = ASCII_CHAR_P (c) ? c : multibyte_char_to_unibyte (c);
+ str[0] = CHAR_TO_BYTE8 (c);
len = 1;
}
(*insert_func) ((char *) str, len);
=== modified file 'src/lisp.h'
--- src/lisp.h 2014-06-17 16:09:19 +0000
+++ src/lisp.h 2014-06-22 17:24:38 +0000
@@ -3433,8 +3433,6 @@
/* Defined in character.c. */
extern ptrdiff_t chars_in_text (const unsigned char *, ptrdiff_t);
extern ptrdiff_t multibyte_chars_in_text (const unsigned char *, ptrdiff_t);
-extern int multibyte_char_to_unibyte (int) ATTRIBUTE_CONST;
-extern int multibyte_char_to_unibyte_safe (int) ATTRIBUTE_CONST;
extern void syms_of_character (void);
/* Defined in charset.c. */
=== modified file 'src/search.c'
--- src/search.c 2014-04-25 16:11:07 +0000
+++ src/search.c 2014-06-22 17:27:26 +0000
@@ -2596,7 +2596,7 @@
{
FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, newtext, pos, pos_byte);
if (!buf_multibyte)
- c = multibyte_char_to_unibyte (c);
+ c = CHAR_TO_BYTE8 (c);
}
else
{
@@ -2619,7 +2619,7 @@
FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, newtext,
pos, pos_byte);
if (!buf_multibyte && !ASCII_CHAR_P (c))
- c = multibyte_char_to_unibyte (c);
+ c = CHAR_TO_BYTE8 (c);
}
else
{
=== modified file 'src/xdisp.c'
--- src/xdisp.c 2014-06-21 19:45:59 +0000
+++ src/xdisp.c 2014-06-22 17:25:56 +0000
@@ -9932,9 +9932,7 @@
for (i = 0; i < nbytes; i += char_bytes)
{
c = string_char_and_length (msg + i, &char_bytes);
- work[0] = (ASCII_CHAR_P (c)
- ? c
- : multibyte_char_to_unibyte (c));
+ work[0] = CHAR_TO_BYTE8 (c);
insert_1_both (work, 1, 1, 1, 0, 0);
}
}
* Re: Looking through multibyte_char_to_unibyte
2014-06-22 17:09 Looking through multibyte_char_to_unibyte Dmitry Antipov
2014-06-22 17:20 ` Eli Zaretskii
@ 2014-06-22 17:39 ` Andreas Schwab
1 sibling, 0 replies; 6+ messages in thread
From: Andreas Schwab @ 2014-06-22 17:39 UTC (permalink / raw)
To: Dmitry Antipov; +Cc: Kenichi Handa, Emacs development discussions
Dmitry Antipov <dmantipov@yandex.ru> writes:
> Looking through preprocessed src/character.c, I found the following function:
>
> int
> multibyte_char_to_unibyte (int c)
> {
>   if (c < 0x80)
>     return c;
>   if (((c) > 0x3FFF7F))
>     return (((c) > 0x3FFF7F) ? (c) - 0x3FFF00 : multibyte_char_to_unibyte (c));
>   return (c & 0xFF);
> }
>
> Is this a subtle trick which I just can't understand and explain, or something
> is really wrong with this?
That's the expansion of the CHAR_BYTE8_P and CHAR_TO_BYTE8 macros.
Andreas.
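
[Editorial sketch, not part of the original message.] The expansion described above can be reproduced with stand-alone definitions. `CHAR_TO_BYTE8` is copied from the pre-patch lines of the character.h hunk in this thread; the definition of `CHAR_BYTE8_P` is an assumption inferred from the preprocessed output and may differ in detail from the real header. `self_check` is a hypothetical helper added for illustration.

```c
#include <assert.h>

/* Assumed definition, inferred from the preprocessed `((c) > 0x3FFF7F)'.  */
#define CHAR_BYTE8_P(c) ((c) > 0x3FFF7F)

/* Pre-patch definition, as shown in the character.h hunk.  */
#define CHAR_TO_BYTE8(c)		\
  (CHAR_BYTE8_P (c)			\
   ? (c) - 0x3FFF00			\
   : multibyte_char_to_unibyte (c))

/* Source form of the function.  Running the preprocessor over it
   (e.g. with `cc -E') substitutes both macros and yields the text
   quoted at the start of the thread.  */
int
multibyte_char_to_unibyte (int c)
{
  if (c < 0x80)
    return c;
  if (CHAR_BYTE8_P (c))
    return CHAR_TO_BYTE8 (c);
  return (c & 0xFF);
}

/* A few spot checks of the three return paths.  */
void
self_check (void)
{
  assert (multibyte_char_to_unibyte (0x41) == 0x41);     /* ASCII */
  assert (multibyte_char_to_unibyte (0x3FFF80) == 0x80); /* raw byte */
  assert (multibyte_char_to_unibyte (0x3FFFFF) == 0xFF); /* raw byte */
  assert (multibyte_char_to_unibyte (0x3B1) == 0xB1);    /* c & 0xFF */
}
```

The macro refers back to the function in its second arm, which is why the preprocessed text appears to recurse even though that arm is guarded by a condition already known to be true.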
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Looking through multibyte_char_to_unibyte
2014-06-22 17:35 ` Dmitry Antipov
@ 2014-06-22 17:47 ` Eli Zaretskii
0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2014-06-22 17:47 UTC (permalink / raw)
To: Dmitry Antipov; +Cc: handa, emacs-devel
> Date: Sun, 22 Jun 2014 21:35:52 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: handa@gnu.org, emacs-devel@gnu.org
>
> > If so, that branch is never taken, and will be optimized
> > away by any reasonable compiler.
>
> If so, can I assume that the following cleanup is correct?
Looks OK to me, except that here:
> /* Return the raw 8-bit byte for character C. */
> -#define CHAR_TO_BYTE8(c) \
> - (CHAR_BYTE8_P (c) \
> - ? (c) - 0x3FFF00 \
> - : multibyte_char_to_unibyte (c))
> +#define CHAR_TO_BYTE8(c) \
> + (ASCII_CHAR_P (c) ? c \
> + : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : (c & 0xFF)))
why do you need a separate test for ASCII_CHAR_P?
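
[Editorial sketch, not part of the original message.] One plausible reading of the question is that for codes below 0x80 the expression `c & 0xFF` already equals `c`, so the extra `ASCII_CHAR_P` test would not change the result. The code below illustrates that under stated assumptions: character codes are in 0..0x3FFFFF, and the macros are simplified stand-ins rather than the real definitions from src/character.h. The names `CHAR_TO_BYTE8_PATCH`, `CHAR_TO_BYTE8_NO_ASCII`, `count_differences` and `self_check` are hypothetical, and the macro argument is parenthesized here although the patch writes a bare `c`.

```c
#include <assert.h>

/* Simplified stand-ins for the Emacs macros (assumptions).  */
#define ASCII_CHAR_P(c) ((unsigned) (c) < 0x80)
#define CHAR_BYTE8_P(c) ((c) > 0x3FFF7F)

/* Shape of the macro proposed in the patch.  */
#define CHAR_TO_BYTE8_PATCH(c)				\
  (ASCII_CHAR_P (c) ? (c)				\
   : (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : ((c) & 0xFF)))

/* The same macro without the separate ASCII test.  */
#define CHAR_TO_BYTE8_NO_ASCII(c)			\
  (CHAR_BYTE8_P (c) ? (c) - 0x3FFF00 : ((c) & 0xFF))

/* Count the codes in [0, 0x3FFFFF] on which the two variants differ.  */
int
count_differences (void)
{
  int c, diff = 0;

  for (c = 0; c <= 0x3FFFFF; c++)
    if (CHAR_TO_BYTE8_PATCH (c) != CHAR_TO_BYTE8_NO_ASCII (c))
      diff++;
  return diff;
}

/* Aborts if the ASCII test ever changes the result in that range.  */
void
self_check (void)
{
  assert (count_differences () == 0);
}
```

This says nothing about values outside the assumed range, such as negative arguments, where the variants could behave differently.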