bug#24378: [PATCH 0/6] Small fixes and improvements

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#24378: [PATCH 0/6] Small fixes and improvements
@ 2016-09-06 13:28 Michal Nazarewicz
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
  2016-09-09 16:34 ` bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz
  0 siblings, 2 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:28 UTC (permalink / raw)
  To: 24378

Just some small fixes.  If no objections are brought forward I’ll push
it in a few days.

Michal Nazarewicz (6):
  Don’t use FETCH_MULTIBYTE_CHAR when advancing index
  Remove inaccurate comment in regex.c
  Replace decimalnump with alphanumericp
  Remove dead loop iterations in regex.c
  Don’t allocate char-table’s extra slots in regexp-out-charset
  Factor out character category lookup to separate function

 lisp/emacs-lisp/regexp-opt.el |  2 +-
 src/casefiddle.c              |  5 +----
 src/character.c               | 50 ++++++++++++++++++++++++-------------------
 src/character.h               |  2 +-
 src/regex.c                   | 33 ++++++++++++----------------
 5 files changed, 45 insertions(+), 47 deletions(-)

-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index
  2016-09-06 13:28 bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz
@ 2016-09-06 13:31 ` Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 2/6] Remove inaccurate comment in regex.c Michal Nazarewicz
                     ` (5 more replies)
  2016-09-09 16:34 ` bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz
  1 sibling, 6 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

* src/casefiddle.c (casify_region): use STRING_CHAR_AND_LENGTH as a safe
alternative to FETCH_MULTIBYTE_CHAR which is documented in src/buffer.h
as unsafe when used for advancing index.
---
 src/casefiddle.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/casefiddle.c b/src/casefiddle.c
index 6114a6f..5456eac 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -223,10 +223,7 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
       int c2, len;
 
       if (multibyte)
-	{
-	  c = FETCH_MULTIBYTE_CHAR (start_byte);
-	  len = CHAR_BYTES (c);
-	}
+	c = STRING_CHAR_AND_LENGTH (BYTE_POS_ADDR(start_byte), len);
       else
 	{
 	  c = FETCH_BYTE (start_byte);
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 2/6] Remove inaccurate comment in regex.c
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
@ 2016-09-06 13:31   ` Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 3/6] Replace decimalnump with alphanumericp Michal Nazarewicz
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

* src/regex.c (regex_compile): Remove comment indicating that wctype of
some character classes may be negative.  All wctypes are in fact
non-negative.
---
 src/regex.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/regex.c b/src/regex.c
index c191f24..c808398 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -2879,8 +2879,7 @@ regex_compile (const_re_char *pattern, size_t size,
 		    /* Most character classes in a multibyte match just set
 		       a flag.  Exceptions are is_blank, is_digit, is_cntrl, and
 		       is_xdigit, since they can only match ASCII characters.
-		       We don't need to handle them for multibyte.  They are
-		       distinguished by a negative wctype.  */
+		       We don't need to handle them for multibyte.  */
 
 		    /* Setup the gl_state object to its buffer-defined value.
 		       This hardcodes the buffer-global syntax-table for ASCII
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 3/6] Replace decimalnump with alphanumericp
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 2/6] Remove inaccurate comment in regex.c Michal Nazarewicz
@ 2016-09-06 13:31   ` Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 4/6] Remove dead loop iterations in regex.c Michal Nazarewicz
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

decimalnump was used in regex.c only in ISALNUM macro which ored it with
alphabeticp.  Because both of those functions require Unicode general
category lookup, this resulted in unnecessary lookups (if alphabeticp
return false decimalp had to perform another lookup).  Drop decimalnump
in favour of alphanumericp which combines decimelnump with alphabeticp.

* src/character.c (decimalnump): Remove in favour of…
(alphanumericp): …new function.

* src/regex.c (ISALNUM): Use alphanumericp.
---
 src/character.c | 17 +++++++++++++----
 src/character.h |  2 +-
 src/regex.c     |  2 +-
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/character.c b/src/character.c
index 9f60aa7..b19e41d 100644
--- a/src/character.c
+++ b/src/character.c
@@ -983,17 +983,26 @@ alphabeticp (int c)
 	  || gen_cat == UNICODE_CATEGORY_Nl);
 }
 
-/* Return true if C is a decimal-number character.  */
+/* Return true if C is a alphabetic or decimal-number character.  */
 bool
-decimalnump (int c)
+alphanumericp (int c)
 {
   Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
   if (! INTEGERP (category))
     return false;
   EMACS_INT gen_cat = XINT (category);
 
-  /* See UTS #18.  */
-  return gen_cat == UNICODE_CATEGORY_Nd;
+  /* See UTS #18.  Same comment as for alphabeticp applies.  FIXME. */
+  return (gen_cat == UNICODE_CATEGORY_Lu
+	  || gen_cat == UNICODE_CATEGORY_Ll
+	  || gen_cat == UNICODE_CATEGORY_Lt
+	  || gen_cat == UNICODE_CATEGORY_Lm
+	  || gen_cat == UNICODE_CATEGORY_Lo
+	  || gen_cat == UNICODE_CATEGORY_Mn
+	  || gen_cat == UNICODE_CATEGORY_Mc
+	  || gen_cat == UNICODE_CATEGORY_Me
+	  || gen_cat == UNICODE_CATEGORY_Nl
+	  || gen_cat == UNICODE_CATEGORY_Nd);
 }
 
 /* Return true if C is a graphic character.  */
diff --git a/src/character.h b/src/character.h
index 0d0e31c..5499cc1 100644
--- a/src/character.h
+++ b/src/character.h
@@ -679,7 +679,7 @@ extern Lisp_Object Vchar_unify_table;
 extern Lisp_Object string_escape_byte8 (Lisp_Object);
 
 extern bool alphabeticp (int);
-extern bool decimalnump (int);
+extern bool alphanumericp (int);
 extern bool graphicp (int);
 extern bool printablep (int);
 
diff --git a/src/regex.c b/src/regex.c
index c808398..5f51b43 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -324,7 +324,7 @@ enum syntaxcode { Swhitespace = 0, Sword = 1, Ssymbol = 2 };
 		    ? (((c) >= 'a' && (c) <= 'z')	\
 		       || ((c) >= 'A' && (c) <= 'Z')	\
 		       || ((c) >= '0' && (c) <= '9'))	\
-		    : (alphabeticp (c) || decimalnump (c)))
+		    : alphanumericp (c))
 
 # define ISALPHA(c) (IS_REAL_ASCII (c)			\
 		    ? (((c) >= 'a' && (c) <= 'z')	\
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 4/6] Remove dead loop iterations in regex.c
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 2/6] Remove inaccurate comment in regex.c Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 3/6] Replace decimalnump with alphanumericp Michal Nazarewicz
@ 2016-09-06 13:31   ` Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 5/6] Don’t allocate char-table’s extra slots in regexp-out-charset Michal Nazarewicz
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

RE_CHAR_TO_MULTIBYTE(c) yields c for ASCII characters and a byte8
character for c ≥ 0x80.  Furthermore, CHAR_BYTE8_P(c) is true only
for byte8 characters.  This means that

	c = RE_CHAR_TO_MULTIBYTE (ch);
	if (! CHAR_BYTE8_P (c) && re_iswctype (c, cc))

is equivalent to:

	c = c;
	if (! false && re_iswctype (c, cc))

for 0 ⪬ c < 0x80, and

	c = BYTE8_TO_CHAR (c);
	if (! true && re_iswctype (c, cc))

for 0x80 ⪬ c < 0x100.  In other words, the loop never executes for
c ≥ 0x80 and RE_CHAR_TO_MULTIBYTE call is unnecessary for c < 0x80.

* src/regex.c (regex_compile): Simplyfy a for loop by eliminating
dead iterations and unnecessary macro calls.
---
 src/regex.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/src/regex.c b/src/regex.c
index 5f51b43..41c1d3f 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -2888,22 +2888,18 @@ regex_compile (const_re_char *pattern, size_t size,
 		       done until now.  */
 		    SETUP_BUFFER_SYNTAX_TABLE ();
 
-		    for (ch = 0; ch < 256; ++ch)
-		      {
-			c = RE_CHAR_TO_MULTIBYTE (ch);
-			if (! CHAR_BYTE8_P (c)
-			    && re_iswctype (c, cc))
-			  {
-			    SET_LIST_BIT (ch);
-			    c1 = TRANSLATE (c);
-			    if (c1 == c)
-			      continue;
-			    if (ASCII_CHAR_P (c1))
-			      SET_LIST_BIT (c1);
-			    else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0)
-			      SET_LIST_BIT (c1);
-			  }
-		      }
+		    for (c = 0; c < 0x80; ++c)
+		      if (re_iswctype (c, cc))
+			{
+			  SET_LIST_BIT (c);
+			  c1 = TRANSLATE (c);
+			  if (c1 == c)
+			    continue;
+			  if (ASCII_CHAR_P (c1))
+			    SET_LIST_BIT (c1);
+			  else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0)
+			    SET_LIST_BIT (c1);
+			}
 		    SET_RANGE_TABLE_WORK_AREA_BIT
 		      (range_table_work, re_wctype_to_bit (cc));
 #endif	/* emacs */
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 5/6] Don’t allocate char-table’s extra slots in regexp-out-charset
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
                     ` (2 preceding siblings ...)
  2016-09-06 13:31   ` bug#24378: [PATCH 4/6] Remove dead loop iterations in regex.c Michal Nazarewicz
@ 2016-09-06 13:31   ` Michal Nazarewicz
  2016-09-06 13:31   ` bug#24378: [PATCH 6/6] Factor out character category lookup to separate function Michal Nazarewicz
  2016-09-06 14:43   ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Eli Zaretskii
  5 siblings, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

* lisp/emacs-lisp/regexp-opt.el (regexp-opt-charset): Do not use
'case-table as charmap char-table’s property.  The function has nothing
to do with casing and in addition using 'case-table causes unnecessary
extra slots to be allocated which ‘regexp-opt-charset’ does not use.
---
 lisp/emacs-lisp/regexp-opt.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/emacs-lisp/regexp-opt.el b/lisp/emacs-lisp/regexp-opt.el
index b1e132a..cf66530 100644
--- a/lisp/emacs-lisp/regexp-opt.el
+++ b/lisp/emacs-lisp/regexp-opt.el
@@ -236,7 +236,7 @@ regexp-opt-charset
   ;; The basic idea is to find character ranges.  Also we take care in the
   ;; position of character set meta characters in the character set regexp.
   ;;
-  (let* ((charmap (make-char-table 'case-table))
+  (let* ((charmap (make-char-table 'regexp-opt-charset))
 	 (start -1) (end -2)
 	 (charset "")
 	 (bracket "") (dash "") (caret ""))
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 6/6] Factor out character category lookup to separate function
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
                     ` (3 preceding siblings ...)
  2016-09-06 13:31   ` bug#24378: [PATCH 5/6] Don’t allocate char-table’s extra slots in regexp-out-charset Michal Nazarewicz
@ 2016-09-06 13:31   ` Michal Nazarewicz
  2016-09-06 14:54     ` Eli Zaretskii
  2016-09-06 14:43   ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Eli Zaretskii
  5 siblings, 1 reply; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 13:31 UTC (permalink / raw)
  To: 24378

* src/character.c (char_unicode_category): New function returning Unicode
general category of specified character.
(alphabeticp, alphanumericp, graphicp, printablep): Use the above.
---
 src/character.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/src/character.c b/src/character.c
index b19e41d..18a8fa3 100644
--- a/src/character.c
+++ b/src/character.c
@@ -960,14 +960,18 @@ character is not ASCII nor 8-bit character, an error is signaled.  */)
   return make_number (c);
 }
 
+static unicode_category_t
+char_unicode_category (int c)
+{
+  Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (category) ? XINT (category) : UNICODE_CATEGORY_UNKNOWN;
+}
+
 /* Return true if C is an alphabetic character.  */
 bool
 alphabeticp (int c)
 {
-  Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
-  if (! INTEGERP (category))
-    return false;
-  EMACS_INT gen_cat = XINT (category);
+  unicode_category_t gen_cat = char_unicode_category (c);
 
   /* See UTS #18.  There are additional characters that should be
      here, those designated as Other_uppercase, Other_lowercase,
@@ -987,10 +991,7 @@ alphabeticp (int c)
 bool
 alphanumericp (int c)
 {
-  Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
-  if (! INTEGERP (category))
-    return false;
-  EMACS_INT gen_cat = XINT (category);
+  unicode_category_t gen_cat = char_unicode_category (c);
 
   /* See UTS #18.  Same comment as for alphabeticp applies.  FIXME. */
   return (gen_cat == UNICODE_CATEGORY_Lu
@@ -1009,13 +1010,11 @@ alphanumericp (int c)
 bool
 graphicp (int c)
 {
-  Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
-  if (! INTEGERP (category))
-    return false;
-  EMACS_INT gen_cat = XINT (category);
+  unicode_category_t gen_cat = char_unicode_category (c);
 
   /* See UTS #18.  */
-  return (!(gen_cat == UNICODE_CATEGORY_Zs /* space separator */
+  return (!(gen_cat == UNICODE_CATEGORY_UNKNOWN
+	    || gen_cat == UNICODE_CATEGORY_Zs /* space separator */
 	    || gen_cat == UNICODE_CATEGORY_Zl /* line separator */
 	    || gen_cat == UNICODE_CATEGORY_Zp /* paragraph separator */
 	    || gen_cat == UNICODE_CATEGORY_Cc /* control */
@@ -1027,13 +1026,11 @@ graphicp (int c)
 bool
 printablep (int c)
 {
-  Lisp_Object category = CHAR_TABLE_REF (Vunicode_category_table, c);
-  if (! INTEGERP (category))
-    return false;
-  EMACS_INT gen_cat = XINT (category);
+  unicode_category_t gen_cat = char_unicode_category (c);
 
   /* See UTS #18.  */
-  return (!(gen_cat == UNICODE_CATEGORY_Cc /* control */
+  return (!(gen_cat == UNICODE_CATEGORY_UNKNOWN
+	    || gen_cat == UNICODE_CATEGORY_Cc /* control */
 	    || gen_cat == UNICODE_CATEGORY_Cs /* surrogate */
 	    || gen_cat == UNICODE_CATEGORY_Cn)); /* unassigned */
 }
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
                     ` (4 preceding siblings ...)
  2016-09-06 13:31   ` bug#24378: [PATCH 6/6] Factor out character category lookup to separate function Michal Nazarewicz
@ 2016-09-06 14:43   ` Eli Zaretskii
  2016-09-06 15:17     ` Michal Nazarewicz
  2016-09-06 15:24     ` bug#24378: [PATCH] Remove obsolete comment from FETCH_MULTIBYTE_CHAR documentation Michal Nazarewicz
  5 siblings, 2 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-09-06 14:43 UTC (permalink / raw)
  To: Michal Nazarewicz; +Cc: 24378

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Tue,  6 Sep 2016 15:31:29 +0200
> 
> * src/casefiddle.c (casify_region): use STRING_CHAR_AND_LENGTH as a safe
> alternative to FETCH_MULTIBYTE_CHAR which is documented in src/buffer.h
> as unsafe when used for advancing index.

AFAIK, that comment is stale and no longer correct.  I don't think
there's a reason to make this change anymore, since STRING_CHAR no
longer unifies characters.

Thanks.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 6/6] Factor out character category lookup to separate function
  2016-09-06 13:31   ` bug#24378: [PATCH 6/6] Factor out character category lookup to separate function Michal Nazarewicz
@ 2016-09-06 14:54     ` Eli Zaretskii
  2016-09-06 15:11       ` Michal Nazarewicz
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2016-09-06 14:54 UTC (permalink / raw)
  To: Michal Nazarewicz; +Cc: 24378

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Tue,  6 Sep 2016 15:31:34 +0200
> 
> * src/character.c (char_unicode_category): New function returning Unicode
> general category of specified character.
> (alphabeticp, alphanumericp, graphicp, printablep): Use the above.

Is there any future change that needs this refactoring?  Or is there
any other reason for this?

Thanks.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 6/6] Factor out character category lookup to separate function
  2016-09-06 14:54     ` Eli Zaretskii
@ 2016-09-06 15:11       ` Michal Nazarewicz
  2016-09-06 15:21         ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 15:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24378

On Tue, Sep 06 2016, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Date: Tue,  6 Sep 2016 15:31:34 +0200
>> 
>> * src/character.c (char_unicode_category): New function returning Unicode
>> general category of specified character.
>> (alphabeticp, alphanumericp, graphicp, printablep): Use the above.
>
> Is there any future change that needs this refactoring?  Or is there
> any other reason for this?

I have dozen or so patches in pipeline which will build on top of this
change.  I can wait with this patch and resubmit it with the rest of the
changes so it’s clearer how it all fits together.

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index
  2016-09-06 14:43   ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Eli Zaretskii
@ 2016-09-06 15:17     ` Michal Nazarewicz
  2016-09-06 15:27       ` Eli Zaretskii
  2016-09-06 15:24     ` bug#24378: [PATCH] Remove obsolete comment from FETCH_MULTIBYTE_CHAR documentation Michal Nazarewicz
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 15:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24378

On Tue, Sep 06 2016, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Date: Tue,  6 Sep 2016 15:31:29 +0200
>> 
>> * src/casefiddle.c (casify_region): use STRING_CHAR_AND_LENGTH as a safe
>> alternative to FETCH_MULTIBYTE_CHAR which is documented in src/buffer.h
>> as unsafe when used for advancing index.
>
> AFAIK, that comment is stale and no longer correct.  I don't think
> there's a reason to make this change anymore, since STRING_CHAR no
> longer unifies characters.

I’ll drop the patch in favour of updating the comment in
FETCH_MULTIBYTE_CHAR then.

/me wonders if STRING_CHAR_AND_LENGTH must be faster than
FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES.

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 6/6] Factor out character category lookup to separate function
  2016-09-06 15:11       ` Michal Nazarewicz
@ 2016-09-06 15:21         ` Eli Zaretskii
  2016-09-06 15:34           ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2016-09-06 15:21 UTC (permalink / raw)
  To: Michal Nazarewicz; +Cc: 24378

> From: Michal Nazarewicz <mina86@mina86.com>
> Cc: 24378@debbugs.gnu.org
> Date: Tue, 06 Sep 2016 17:11:14 +0200
> 
> >> * src/character.c (char_unicode_category): New function returning Unicode
> >> general category of specified character.
> >> (alphabeticp, alphanumericp, graphicp, printablep): Use the above.
> >
> > Is there any future change that needs this refactoring?  Or is there
> > any other reason for this?
> 
> I have dozen or so patches in pipeline which will build on top of this
> change.  I can wait with this patch and resubmit it with the rest of the
> changes so it’s clearer how it all fits together.

Yes, please commit this together with the changes that build on it.

Thanks.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH] Remove obsolete comment from FETCH_MULTIBYTE_CHAR documentation
  2016-09-06 14:43   ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Eli Zaretskii
  2016-09-06 15:17     ` Michal Nazarewicz
@ 2016-09-06 15:24     ` Michal Nazarewicz
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-06 15:24 UTC (permalink / raw)
  To: 24378

* src/buffer.h (FETCH_MULTIBYTE_CHAR): STRING_CHAR macro no longer
unifies characters so the comment about byte-length changing is no
longer accurate; remove it.  While at it, change the function to use
BYTE_POS_ADDR instead of open-coding it.
---
 src/buffer.h | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/src/buffer.h b/src/buffer.h
index 87b7cee..fa4866e 100644
--- a/src/buffer.h
+++ b/src/buffer.h
@@ -1182,23 +1182,12 @@ buffer_has_overlays (void)
 
 /* Return character code of multi-byte form at byte position POS.  If POS
    doesn't point the head of valid multi-byte form, only the byte at
-   POS is returned.  No range checking.
-
-   WARNING: The character returned by this macro could be "unified"
-   inside STRING_CHAR, if the original character in the buffer belongs
-   to one of the Private Use Areas (PUAs) of codepoints that Emacs
-   uses to support non-unified CJK characters.  If that happens,
-   CHAR_BYTES will return a value that is different from the length of
-   the original multibyte sequence stored in the buffer.  Therefore,
-   do _not_ use FETCH_MULTIBYTE_CHAR if you need to advance through
-   the buffer to the next character after fetching this one.  Instead,
-   use either FETCH_CHAR_ADVANCE or STRING_CHAR_AND_LENGTH.  */
+   POS is returned.  No range checking. */
 
 INLINE int
 FETCH_MULTIBYTE_CHAR (ptrdiff_t pos)
 {
-  unsigned char *p = ((pos >= GPT_BYTE ? GAP_SIZE : 0)
-		      + pos + BEG_ADDR - BEG_BYTE);
+  unsigned char *p = BYTE_POS_ADDR (pos);
   return STRING_CHAR (p);
 }
 
-- 
2.8.0.rc3.226.g39d4020






^ permalink raw reply related	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index
  2016-09-06 15:17     ` Michal Nazarewicz
@ 2016-09-06 15:27       ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-09-06 15:27 UTC (permalink / raw)
  To: Michal Nazarewicz; +Cc: 24378

> From: Michal Nazarewicz <mina86@mina86.com>
> Cc: 24378@debbugs.gnu.org
> Date: Tue, 06 Sep 2016 17:17:30 +0200
> 
> On Tue, Sep 06 2016, Eli Zaretskii wrote:
> >> From: Michal Nazarewicz <mina86@mina86.com>
> >> Date: Tue,  6 Sep 2016 15:31:29 +0200
> >> 
> >> * src/casefiddle.c (casify_region): use STRING_CHAR_AND_LENGTH as a safe
> >> alternative to FETCH_MULTIBYTE_CHAR which is documented in src/buffer.h
> >> as unsafe when used for advancing index.
> >
> > AFAIK, that comment is stale and no longer correct.  I don't think
> > there's a reason to make this change anymore, since STRING_CHAR no
> > longer unifies characters.
> 
> I’ll drop the patch in favour of updating the comment in
> FETCH_MULTIBYTE_CHAR then.

Thanks.  While you are at that, please also update a similar comment
for STRING_CHAR as well.

> /me wonders if STRING_CHAR_AND_LENGTH must be faster than
> FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES.

It might or might not be faster (measurements are welcome), but
FETCH_MULTIBYTE_CHAR is safer when the code could GC, because it uses
a byte position, not a C pointer to buffer text.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 6/6] Factor out character category lookup to separate function
  2016-09-06 15:21         ` Eli Zaretskii
@ 2016-09-06 15:34           ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-09-06 15:34 UTC (permalink / raw)
  To: mina86; +Cc: 24378

> Date: Tue, 06 Sep 2016 18:21:46 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 24378@debbugs.gnu.org
> 
> > I have dozen or so patches in pipeline which will build on top of this
> > change.  I can wait with this patch and resubmit it with the rest of the
> > changes so it’s clearer how it all fits together.
> 
> Yes, please commit this together with the changes that build on it.

To clarify: I meant that only for the char_unicode_category part, not
for the rest of the patches you posted.





^ permalink raw reply	[flat|nested] 16+ messages in thread

* bug#24378: [PATCH 0/6] Small fixes and improvements
  2016-09-06 13:28 bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz
  2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
@ 2016-09-09 16:34 ` Michal Nazarewicz
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Nazarewicz @ 2016-09-09 16:34 UTC (permalink / raw)
  To: 24378-done

On Tue, Sep 06 2016, Michal Nazarewicz wrote:
> Just some small fixes.  If no objections are brought forward I’ll push
> it in a few days.
>
> Michal Nazarewicz (6):
>   Don’t use FETCH_MULTIBYTE_CHAR when advancing index
>   Remove inaccurate comment in regex.c
>   Replace decimalnump with alphanumericp
>   Remove dead loop iterations in regex.c
>   Don’t allocate char-table’s extra slots in regexp-out-charset

All above pushed.

>   Factor out character category lookup to separate function

As per the thread, this will wait for later submission.

>  lisp/emacs-lisp/regexp-opt.el |  2 +-
>  src/casefiddle.c              |  5 +----
>  src/character.c               | 50 ++++++++++++++++++++++++-------------------
>  src/character.h               |  2 +-
>  src/regex.c                   | 33 ++++++++++++----------------
>  5 files changed, 45 insertions(+), 47 deletions(-)

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»





^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-09-09 16:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-06 13:28 bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz
2016-09-06 13:31 ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Michal Nazarewicz
2016-09-06 13:31   ` bug#24378: [PATCH 2/6] Remove inaccurate comment in regex.c Michal Nazarewicz
2016-09-06 13:31   ` bug#24378: [PATCH 3/6] Replace decimalnump with alphanumericp Michal Nazarewicz
2016-09-06 13:31   ` bug#24378: [PATCH 4/6] Remove dead loop iterations in regex.c Michal Nazarewicz
2016-09-06 13:31   ` bug#24378: [PATCH 5/6] Don’t allocate char-table’s extra slots in regexp-out-charset Michal Nazarewicz
2016-09-06 13:31   ` bug#24378: [PATCH 6/6] Factor out character category lookup to separate function Michal Nazarewicz
2016-09-06 14:54     ` Eli Zaretskii
2016-09-06 15:11       ` Michal Nazarewicz
2016-09-06 15:21         ` Eli Zaretskii
2016-09-06 15:34           ` Eli Zaretskii
2016-09-06 14:43   ` bug#24378: [PATCH 1/6] Don’t use FETCH_MULTIBYTE_CHAR when advancing index Eli Zaretskii
2016-09-06 15:17     ` Michal Nazarewicz
2016-09-06 15:27       ` Eli Zaretskii
2016-09-06 15:24     ` bug#24378: [PATCH] Remove obsolete comment from FETCH_MULTIBYTE_CHAR documentation Michal Nazarewicz
2016-09-09 16:34 ` bug#24378: [PATCH 0/6] Small fixes and improvements Michal Nazarewicz

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).