all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Po Lu <luangruo@yahoo.com>,
	73307-done@debbugs.gnu.org, Thomas Klausner <wiz@gatalith.at>
Subject: bug#73307: Fix ctype(3) usage
Date: Tue, 17 Sep 2024 17:05:29 -0700	[thread overview]
Message-ID: <89deb908-92a6-489f-bab9-116556ce8ce4@cs.ucla.edu> (raw)
In-Reply-To: <86r09ibiq3.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On 2024-09-17 04:52, Eli Zaretskii wrote:
> However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'.  Paul, any comments?

Although that patch was an improvement it still had problems, as it 
incorrectly assumed the string does not end in a multibyte space, and 
that Emacs's locale matches the system's.

Emacs itself should not use <ctype.h> unless it knows the string is 
unibyte and the system locale matches Emacs's. I scanned through its 
source code looking for all problematic instances of <ctype.h> that have 
crept in (except I didn't scan the MS-Windows code, where you're the 
expert), and found five other places where ctype.h was obviously 
misused. I installed the attached to fix these glitches and am boldly 
closing this the report.

I can't easily test patch 0003, which fixes Android-specific code. 
Although I think it's an improvement, in unlikely cases I suspect it 
still doesn't exactly match what the Android kernel does with #! lines. 
I don't know whether that matters. I'll CC this to Po Lu (my goto person 
for Android) as a heads-up.

[-- Attachment #2: 0001-Fix-yes-or-no-p-with-multibyte-spaces.patch --]
[-- Type: text/x-patch, Size: 2141 bytes --]

From 43cde03fa5a663a1509a762077c11eb57a60cee8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:22:02 -0700
Subject: [PATCH 1/4] Fix yes-or-no-p with multibyte spaces
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Thomas Klausner (Bug#73307).
Emacs shouldn’t use ctype.h, as it doesn’t work for multibyte
chars and it doesn’t work with Emacs’s locale model anyway.
* src/fns.c: Include syntax.h, not ctype.h.
(Fyes_or_no_p): Check the character category with SYNTAX, not
with isspace, which assumes the current locale and works only
with single-byte characters.
---
 src/fns.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/fns.c b/src/fns.c
index 57113a8c5ed..370f7711b90 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -26,7 +26,6 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include <intprops.h>
 #include <vla.h>
 #include <errno.h>
-#include <ctype.h>
 #include <math.h>
 
 #include "lisp.h"
@@ -36,6 +35,7 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include "composite.h"
 #include "buffer.h"
 #include "intervals.h"
+#include "syntax.h"
 #include "window.h"
 #include "puresize.h"
 #include "gnutls.h"
@@ -3576,13 +3576,15 @@ DEFUN ("yes-or-no-p", Fyes_or_no_p, Syes_or_no_p, 1, 1, 0,
   if (use_short_answers)
     return call1 (Qy_or_n_p, prompt);
 
-  {
-    char *s = SSDATA (prompt);
-    ptrdiff_t len = strlen (s);
-    if ((len > 0) && !isspace (s[len - 1]))
-      prompt = CALLN (Fconcat, prompt, build_string (" "));
-  }
-  prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
+  ptrdiff_t promptlen = SCHARS (prompt);
+  bool prompt_ends_in_nonspace
+    = (0 < promptlen
+       && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
+	   != Swhitespace));
+  AUTO_STRING (space_string, " ");
+  prompt = CALLN (Fconcat, prompt,
+		  prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
+		  Vyes_or_no_prompt);
 
   specpdl_ref count = SPECPDL_INDEX ();
   specbind (Qenable_recursive_minibuffers, Qt);
-- 
2.43.0


[-- Attachment #3: 0002-Fix-misuse-of-toupper-in-sfnt_parse_style.patch --]
[-- Type: text/x-patch, Size: 1360 bytes --]

From 58a44b6ac317c9a17bcdd208e4ec33ff9772429e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:23:47 -0700
Subject: [PATCH 2/4] Fix misuse of toupper in sfnt_parse_style
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/sfntfont.c: Include c-ctype.h, not ctype.h.
(sfnt_parse_style): Upcase just initial ASCII letters;
that’s good enough here.
---
 src/sfntfont.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/sfntfont.c b/src/sfntfont.c
index 8c02c05e7a6..f21b6c0782f 100644
--- a/src/sfntfont.c
+++ b/src/sfntfont.c
@@ -20,7 +20,7 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
 #include <config.h>
 
 #include <fcntl.h>
-#include <ctype.h>
+#include <c-ctype.h>
 
 #include "lisp.h"
 
@@ -534,12 +534,12 @@ sfnt_parse_style (Lisp_Object style_name, struct sfnt_font_desc *desc)
 	}
 
       /* This token is extraneous or was not recognized.  Capitalize
-	 the first letter and set it as the adstyle.  */
+	 the first letter if it's ASCII lowercase, then set the token as
+	 the adstyle.  */
 
       if (strlen (single))
 	{
-	  if (islower (single[0]))
-	    single[0] = toupper (single[0]);
+	  single[0] = c_toupper (single[0]);
 
 	  if (NILP (desc->adstyle))
 	    desc->adstyle = build_string (single);
-- 
2.43.0


[-- Attachment #4: 0003-Fix-some-misparsing-in-check_interpreter.patch --]
[-- Type: text/x-patch, Size: 1889 bytes --]

From e0b027d1215ed32fe0f3d0956d2d7a81552274f2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:38:53 -0700
Subject: [PATCH 3/4] Fix some #! misparsing in check_interpreter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* exec/exec.c: Do not include ctype.h, as the kernel
does not care about the locale.
(check_interpreter): Treat only spaces and tabs as white space.
Do not inspect more bytes than were read.
Although the resulting code does not exactly match what
the Android kernel does, it’s closer than what it was before.
---
 exec/exec.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/exec/exec.c b/exec/exec.c
index 775a8b06b96..f31c9a89744 100644
--- a/exec/exec.c
+++ b/exec/exec.c
@@ -24,7 +24,6 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
 #include <fcntl.h>
 #include <assert.h>
 #include <string.h>
-#include <ctype.h>
 #include <stdlib.h>
 
 #include <sys/ptrace.h>
@@ -116,11 +115,11 @@ check_interpreter (const char *name, int fd, const char **extra)
 
   /* Strip leading whitespace.  */
   start = buffer;
-  while (*start && ((unsigned char) *start) < 128 && isspace (*start))
+  while (start < buffer + rc && (*start == ' ' || *start == '\t'))
     ++start;
 
   /* Look for a newline character.  */
-  end = memchr (start, '\n', rc);
+  end = memchr (start, '\n', buffer + rc - start);
 
   if (!end)
     goto fail;
@@ -130,11 +129,12 @@ check_interpreter (const char *name, int fd, const char **extra)
   *end = '\0';
 
   /* Now look for any whitespace characters.  */
-  ws = strchr (start, ' ');
+  for (ws = start; *ws && *ws != ' ' && *ws != '\t'; ws++)
+    continue;
 
   /* If there's no whitespace, return the entire start.  */
 
-  if (!ws)
+  if (!*ws)
     {
       if (lseek (fd, 0, SEEK_SET))
 	goto fail;
-- 
2.43.0


[-- Attachment #5: 0004-Use-c-ctype.h-in-lib-src.patch --]
[-- Type: text/x-patch, Size: 4041 bytes --]

From 865b54e2acea4fdaa3f302ed225f50281b371d6e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:54:17 -0700
Subject: [PATCH 4/4] Use c-ctype.h in lib-src

This fixes some unlikely bugs and removes the temptation
of using ctype.h.  Although some uses were correct,
many weren't.
* lib-src/ebrowse.c: Include c-ctype.h, not ctype.h.
* lib-src/emacsclient.c: Include c-ctype.h, not ctype.h.
* lib-src/update-game-score.c: Include c-ctype.h, not ctype.h.
All uses changed.
---
 lib-src/ebrowse.c           | 10 +++++-----
 lib-src/emacsclient.c       |  6 +++---
 lib-src/update-game-score.c |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib-src/ebrowse.c b/lib-src/ebrowse.c
index 821c29272a4..13c1befdc3e 100644
--- a/lib-src/ebrowse.c
+++ b/lib-src/ebrowse.c
@@ -22,11 +22,11 @@ Copyright (C) 1992-2024 Free Software Foundation, Inc.
 #include <stddef.h>
 #include <stdlib.h>
 #include <string.h>
-#include <ctype.h>
 #include <assert.h>
 #include <getopt.h>
 
 #include <attribute.h>
+#include <c-ctype.h>
 #include <flexmember.h>
 #include <min-max.h>
 #include <unlocked-io.h>
@@ -1875,7 +1875,7 @@ #define RSTRING_EOF_CHECK						\
 
         int_suffixes:
           /* Integer suffixes.  */
-          while (isalpha (c))
+          while (c_isalpha (c))
             GET (c);
           UNGET ();
           return CINT;
@@ -1907,7 +1907,7 @@ #define RSTRING_EOF_CHECK						\
             }
 
           /* Optional type suffixes.  */
-          while (isalpha (c))
+          while (c_isalpha (c))
             GET (c);
 	  UNGET ();
           return CFLOAT;
@@ -2158,7 +2158,7 @@ init_scanner (void)
   /* Set up character class vectors.  */
   for (i = 0; i < sizeof is_ident; ++i)
     {
-      if (i == '_' || isalnum (i))
+      if (i == '_' || c_isalnum (i))
         is_ident[i] = 1;
 
       if (i >= '0' && i <= '9')
@@ -2946,7 +2946,7 @@ operator_name (int *sc)
           MATCH ();
 
 	  /* If this is a simple operator like `+', stop now.  */
-	  if (!isalpha ((unsigned char) *s) && *s != '(' && *s != '[')
+	  if (!c_isalpha (*s) && *s != '(' && *s != '[')
 	    break;
 
 	  ++tokens_matched;
diff --git a/lib-src/emacsclient.c b/lib-src/emacsclient.c
index 8e64f1e92d3..c1ffa1480ec 100644
--- a/lib-src/emacsclient.c
+++ b/lib-src/emacsclient.c
@@ -68,7 +68,6 @@ Copyright (C) 1986-2024 Free Software Foundation, Inc.
 
 #define DEFAULT_TIMEOUT (30)
 
-#include <ctype.h>
 #include <errno.h>
 #include <getopt.h>
 #include <inttypes.h>
@@ -83,6 +82,7 @@ #define DEFAULT_TIMEOUT (30)
 #include <unistd.h>
 
 #include <attribute.h>
+#include <c-ctype.h>
 #include <filename.h>
 #include <intprops.h>
 #include <min-max.h>
@@ -2124,7 +2124,7 @@ main (int argc, char **argv)
 	      unsigned char c;
 	      do
 		c = *++p;
-	      while (isdigit (c) || c == ':');
+	      while (c_isdigit (c) || c == ':');
 
 	      if (c == 0)
                 {
@@ -2136,7 +2136,7 @@ main (int argc, char **argv)
             }
 #ifdef WINDOWSNT
 	  else if (! IS_ABSOLUTE_FILE_NAME (argv[i])
-		   && (isalpha (argv[i][0]) && argv[i][1] == ':'))
+		   && (c_isalpha (argv[i][0]) && argv[i][1] == ':'))
 	    /* Windows can have a different default directory for each
 	       drive, so the cwd passed via "-dir" is not sufficient
 	       to account for that.
diff --git a/lib-src/update-game-score.c b/lib-src/update-game-score.c
index e3b24ad7717..7545d0390c1 100644
--- a/lib-src/update-game-score.c
+++ b/lib-src/update-game-score.c
@@ -41,11 +41,11 @@ Copyright (C) 2002-2024 Free Software Foundation, Inc.
 #include <stdlib.h>
 #include <time.h>
 #include <pwd.h>
-#include <ctype.h>
 #include <fcntl.h>
 #include <sys/stat.h>
 #include <getopt.h>
 
+#include <c-ctype.h>
 #include <unlocked-io.h>
 
 #ifdef WINDOWSNT
@@ -143,7 +143,7 @@ normalize_integer (char *num)
 {
   bool neg;
   char *p;
-  while (*num != '\n' && isspace (*num))
+  while (*num != '\n' && c_isspace (*num))
     num++;
   neg = *num == '-';
   num += neg || *num == '-';
-- 
2.43.0


  parent reply	other threads:[~2024-09-18  0:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-16 21:25 bug#73307: Fix ctype(3) usage Thomas Klausner
2024-09-17 11:52 ` Eli Zaretskii
2024-09-17 11:55   ` Thomas Klausner
2024-09-18  0:05   ` Paul Eggert [this message]
2024-09-18  2:13     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 11:45     ` Eli Zaretskii
2024-09-18 15:18       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 15:35       ` Paul Eggert
2024-09-18 16:11         ` Eli Zaretskii
2024-09-18 16:22           ` Paul Eggert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89deb908-92a6-489f-bab9-116556ce8ce4@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=73307-done@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=luangruo@yahoo.com \
    --cc=wiz@gatalith.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.