all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#73307: Fix ctype(3) usage
@ 2024-09-16 21:25 Thomas Klausner
  2024-09-17 11:52 ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Klausner @ 2024-09-16 21:25 UTC (permalink / raw)
  To: 73307

[-- Attachment #1: Type: text/plain, Size: 1824 bytes --]

Tags: patch


When compiling emacs on NetBSD-10.99.12/amd64, I get the following
warning

In file included from /usr/include/ctype.h:100,
                 from fns.c:29:
fns.c: In function 'Fyes_or_no_p':
fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
 3582 |     if ((len > 0) && !isspace (s[len - 1]))
      |                                 ^

The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
is quite explicit about the problems with this - ctype(3) functions only
accept -1 and "unsigned char" and you can get very weird problems if
this is disregarded.

The attached patch adds the missing cast.


In GNU Emacs 31.0.50 (build 1, x86_64--netbsd, GTK+ Version 3.24.43,
 cairo version 1.18.0) of 2024-09-16
Repository revision: f27553c30a772a0103d2e6762e4d7f588f302e4b
Repository branch: HEAD
System Description: NetBSD 10.99.12/amd64

Configured using:
 'configure --srcdir=/scratch/wip/emacs-git/work/emacs
 --localstatedir=/var --with-native-compilation --without-ns
 --without-imagemagick --without-xaw3d --with-x-toolkit=gtk3
 --prefix=/usr/pkg --build=x86_64--netbsd --host=x86_64--netbsd
 --infodir=/usr/pkg/info --mandir=/usr/pkg/man
 --enable-option-checking=yes 'CFLAGS=-O2 -g -g -fstack-clash-protection
 -I/usr/pkg/include -I/usr/include -I/usr/pkg/include/freetype2
 -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0
 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/harfbuzz
 -I/usr/pkg/include/libdrm' 'CPPFLAGS=-g -I/usr/pkg/include
 -I/usr/include -I/usr/pkg/include/freetype2 -I/usr/pkg/include/glib-2.0
 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include
 -I/usr/pkg/include/harfbuzz -I/usr/pkg/include/libdrm'
 'LDFLAGS=-Wl,-R/usr/pkg/gcc14/lib -Wl,-zrelro -Wl,-znow -L/usr/pkg/lib
 -Wl,-R/usr/pkg/lib -L/usr/lib -Wl,-R/usr/lib''


[-- Attachment #2: patch-src_fns.c --]
[-- Type: text/x-c, Size: 460 bytes --]

$NetBSD$

Fix ctype(3) usage.

--- src/fns.c.orig	2024-09-16 21:11:40.908684144 +0000
+++ src/fns.c
@@ -3579,7 +3579,7 @@ by a mouse, or by some window-system ges
   {
     char *s = SSDATA (prompt);
     ptrdiff_t len = strlen (s);
-    if ((len > 0) && !isspace (s[len - 1]))
+    if ((len > 0) && !isspace ((unsigned char)s[len - 1]))
       prompt = CALLN (Fconcat, prompt, build_string (" "));
   }
   prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-16 21:25 bug#73307: Fix ctype(3) usage Thomas Klausner
@ 2024-09-17 11:52 ` Eli Zaretskii
  2024-09-17 11:55   ` Thomas Klausner
  2024-09-18  0:05   ` Paul Eggert
  0 siblings, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2024-09-17 11:52 UTC (permalink / raw)
  To: Thomas Klausner, Paul Eggert; +Cc: 73307

> Date: Mon, 16 Sep 2024 23:25:16 +0200
> From: Thomas Klausner <wiz@gatalith.at>
> 
> When compiling emacs on NetBSD-10.99.12/amd64, I get the following
> warning
> 
> In file included from /usr/include/ctype.h:100,
>                  from fns.c:29:
> fns.c: In function 'Fyes_or_no_p':
> fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
>  3582 |     if ((len > 0) && !isspace (s[len - 1]))
>       |                                 ^
> 
> The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
> is quite explicit about the problems with this - ctype(3) functions only
> accept -1 and "unsigned char" and you can get very weird problems if
> this is disregarded.
> 
> The attached patch adds the missing cast.
> [...]
>
> --- src/fns.c.orig	2024-09-16 21:11:40.908684144 +0000
> +++ src/fns.c
> @@ -3579,7 +3579,7 @@ by a mouse, or by some window-system ges
>    {
>      char *s = SSDATA (prompt);
>      ptrdiff_t len = strlen (s);
> -    if ((len > 0) && !isspace (s[len - 1]))
> +    if ((len > 0) && !isspace ((unsigned char)s[len - 1]))
>        prompt = CALLN (Fconcat, prompt, build_string (" "));
>    }
>    prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);

Thanks.  However, I'm not sure this is the right fix, the function is
defined with argument type of 'int'.  Paul, any comments?





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-17 11:52 ` Eli Zaretskii
@ 2024-09-17 11:55   ` Thomas Klausner
  2024-09-18  0:05   ` Paul Eggert
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Klausner @ 2024-09-17 11:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 73307, Paul Eggert

On Tue, Sep 17, 2024 at 02:52:20PM +0300, Eli Zaretskii wrote:
> Thanks.  However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'.  Paul, any comments?

That is exactly the problem.

If you pass a 'signed char' below -1, it gets converted to the same
integer value, also a negative number below -1, and you end up passing
an invalid argument to the function. The function is defined to only
accept EOF and non-negative values in the range of 'unsigned char'.

-1 is only special because that's usually the value of EOF.
 Thomas





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-17 11:52 ` Eli Zaretskii
  2024-09-17 11:55   ` Thomas Klausner
@ 2024-09-18  0:05   ` Paul Eggert
  2024-09-18  2:13     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-18 11:45     ` Eli Zaretskii
  1 sibling, 2 replies; 10+ messages in thread
From: Paul Eggert @ 2024-09-18  0:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Po Lu, 73307-done, Thomas Klausner

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On 2024-09-17 04:52, Eli Zaretskii wrote:
> However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'.  Paul, any comments?

Although that patch was an improvement it still had problems, as it 
incorrectly assumed the string does not end in a multibyte space, and 
that Emacs's locale matches the system's.

Emacs itself should not use <ctype.h> unless it knows the string is 
unibyte and the system locale matches Emacs's. I scanned through its 
source code looking for all problematic instances of <ctype.h> that have 
crept in (except I didn't scan the MS-Windows code, where you're the 
expert), and found five other places where ctype.h was obviously 
misused. I installed the attached to fix these glitches and am boldly 
closing this the report.

I can't easily test patch 0003, which fixes Android-specific code. 
Although I think it's an improvement, in unlikely cases I suspect it 
still doesn't exactly match what the Android kernel does with #! lines. 
I don't know whether that matters. I'll CC this to Po Lu (my goto person 
for Android) as a heads-up.

[-- Attachment #2: 0001-Fix-yes-or-no-p-with-multibyte-spaces.patch --]
[-- Type: text/x-patch, Size: 2141 bytes --]

From 43cde03fa5a663a1509a762077c11eb57a60cee8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:22:02 -0700
Subject: [PATCH 1/4] Fix yes-or-no-p with multibyte spaces
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Thomas Klausner (Bug#73307).
Emacs shouldn’t use ctype.h, as it doesn’t work for multibyte
chars and it doesn’t work with Emacs’s locale model anyway.
* src/fns.c: Include syntax.h, not ctype.h.
(Fyes_or_no_p): Check the character category with SYNTAX, not
with isspace, which assumes the current locale and works only
with single-byte characters.
---
 src/fns.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/fns.c b/src/fns.c
index 57113a8c5ed..370f7711b90 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -26,7 +26,6 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include <intprops.h>
 #include <vla.h>
 #include <errno.h>
-#include <ctype.h>
 #include <math.h>
 
 #include "lisp.h"
@@ -36,6 +35,7 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include "composite.h"
 #include "buffer.h"
 #include "intervals.h"
+#include "syntax.h"
 #include "window.h"
 #include "puresize.h"
 #include "gnutls.h"
@@ -3576,13 +3576,15 @@ DEFUN ("yes-or-no-p", Fyes_or_no_p, Syes_or_no_p, 1, 1, 0,
   if (use_short_answers)
     return call1 (Qy_or_n_p, prompt);
 
-  {
-    char *s = SSDATA (prompt);
-    ptrdiff_t len = strlen (s);
-    if ((len > 0) && !isspace (s[len - 1]))
-      prompt = CALLN (Fconcat, prompt, build_string (" "));
-  }
-  prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
+  ptrdiff_t promptlen = SCHARS (prompt);
+  bool prompt_ends_in_nonspace
+    = (0 < promptlen
+       && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
+	   != Swhitespace));
+  AUTO_STRING (space_string, " ");
+  prompt = CALLN (Fconcat, prompt,
+		  prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
+		  Vyes_or_no_prompt);
 
   specpdl_ref count = SPECPDL_INDEX ();
   specbind (Qenable_recursive_minibuffers, Qt);
-- 
2.43.0


[-- Attachment #3: 0002-Fix-misuse-of-toupper-in-sfnt_parse_style.patch --]
[-- Type: text/x-patch, Size: 1360 bytes --]

From 58a44b6ac317c9a17bcdd208e4ec33ff9772429e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:23:47 -0700
Subject: [PATCH 2/4] Fix misuse of toupper in sfnt_parse_style
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/sfntfont.c: Include c-ctype.h, not ctype.h.
(sfnt_parse_style): Upcase just initial ASCII letters;
that’s good enough here.
---
 src/sfntfont.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/sfntfont.c b/src/sfntfont.c
index 8c02c05e7a6..f21b6c0782f 100644
--- a/src/sfntfont.c
+++ b/src/sfntfont.c
@@ -20,7 +20,7 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
 #include <config.h>
 
 #include <fcntl.h>
-#include <ctype.h>
+#include <c-ctype.h>
 
 #include "lisp.h"
 
@@ -534,12 +534,12 @@ sfnt_parse_style (Lisp_Object style_name, struct sfnt_font_desc *desc)
 	}
 
       /* This token is extraneous or was not recognized.  Capitalize
-	 the first letter and set it as the adstyle.  */
+	 the first letter if it's ASCII lowercase, then set the token as
+	 the adstyle.  */
 
       if (strlen (single))
 	{
-	  if (islower (single[0]))
-	    single[0] = toupper (single[0]);
+	  single[0] = c_toupper (single[0]);
 
 	  if (NILP (desc->adstyle))
 	    desc->adstyle = build_string (single);
-- 
2.43.0


[-- Attachment #4: 0003-Fix-some-misparsing-in-check_interpreter.patch --]
[-- Type: text/x-patch, Size: 1889 bytes --]

From e0b027d1215ed32fe0f3d0956d2d7a81552274f2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:38:53 -0700
Subject: [PATCH 3/4] Fix some #! misparsing in check_interpreter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* exec/exec.c: Do not include ctype.h, as the kernel
does not care about the locale.
(check_interpreter): Treat only spaces and tabs as white space.
Do not inspect more bytes than were read.
Although the resulting code does not exactly match what
the Android kernel does, it’s closer than what it was before.
---
 exec/exec.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/exec/exec.c b/exec/exec.c
index 775a8b06b96..f31c9a89744 100644
--- a/exec/exec.c
+++ b/exec/exec.c
@@ -24,7 +24,6 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
 #include <fcntl.h>
 #include <assert.h>
 #include <string.h>
-#include <ctype.h>
 #include <stdlib.h>
 
 #include <sys/ptrace.h>
@@ -116,11 +115,11 @@ check_interpreter (const char *name, int fd, const char **extra)
 
   /* Strip leading whitespace.  */
   start = buffer;
-  while (*start && ((unsigned char) *start) < 128 && isspace (*start))
+  while (start < buffer + rc && (*start == ' ' || *start == '\t'))
     ++start;
 
   /* Look for a newline character.  */
-  end = memchr (start, '\n', rc);
+  end = memchr (start, '\n', buffer + rc - start);
 
   if (!end)
     goto fail;
@@ -130,11 +129,12 @@ check_interpreter (const char *name, int fd, const char **extra)
   *end = '\0';
 
   /* Now look for any whitespace characters.  */
-  ws = strchr (start, ' ');
+  for (ws = start; *ws && *ws != ' ' && *ws != '\t'; ws++)
+    continue;
 
   /* If there's no whitespace, return the entire start.  */
 
-  if (!ws)
+  if (!*ws)
     {
       if (lseek (fd, 0, SEEK_SET))
 	goto fail;
-- 
2.43.0


[-- Attachment #5: 0004-Use-c-ctype.h-in-lib-src.patch --]
[-- Type: text/x-patch, Size: 4041 bytes --]

From 865b54e2acea4fdaa3f302ed225f50281b371d6e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:54:17 -0700
Subject: [PATCH 4/4] Use c-ctype.h in lib-src

This fixes some unlikely bugs and removes the temptation
of using ctype.h.  Although some uses were correct,
many weren't.
* lib-src/ebrowse.c: Include c-ctype.h, not ctype.h.
* lib-src/emacsclient.c: Include c-ctype.h, not ctype.h.
* lib-src/update-game-score.c: Include c-ctype.h, not ctype.h.
All uses changed.
---
 lib-src/ebrowse.c           | 10 +++++-----
 lib-src/emacsclient.c       |  6 +++---
 lib-src/update-game-score.c |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib-src/ebrowse.c b/lib-src/ebrowse.c
index 821c29272a4..13c1befdc3e 100644
--- a/lib-src/ebrowse.c
+++ b/lib-src/ebrowse.c
@@ -22,11 +22,11 @@ Copyright (C) 1992-2024 Free Software Foundation, Inc.
 #include <stddef.h>
 #include <stdlib.h>
 #include <string.h>
-#include <ctype.h>
 #include <assert.h>
 #include <getopt.h>
 
 #include <attribute.h>
+#include <c-ctype.h>
 #include <flexmember.h>
 #include <min-max.h>
 #include <unlocked-io.h>
@@ -1875,7 +1875,7 @@ #define RSTRING_EOF_CHECK						\
 
         int_suffixes:
           /* Integer suffixes.  */
-          while (isalpha (c))
+          while (c_isalpha (c))
             GET (c);
           UNGET ();
           return CINT;
@@ -1907,7 +1907,7 @@ #define RSTRING_EOF_CHECK						\
             }
 
           /* Optional type suffixes.  */
-          while (isalpha (c))
+          while (c_isalpha (c))
             GET (c);
 	  UNGET ();
           return CFLOAT;
@@ -2158,7 +2158,7 @@ init_scanner (void)
   /* Set up character class vectors.  */
   for (i = 0; i < sizeof is_ident; ++i)
     {
-      if (i == '_' || isalnum (i))
+      if (i == '_' || c_isalnum (i))
         is_ident[i] = 1;
 
       if (i >= '0' && i <= '9')
@@ -2946,7 +2946,7 @@ operator_name (int *sc)
           MATCH ();
 
 	  /* If this is a simple operator like `+', stop now.  */
-	  if (!isalpha ((unsigned char) *s) && *s != '(' && *s != '[')
+	  if (!c_isalpha (*s) && *s != '(' && *s != '[')
 	    break;
 
 	  ++tokens_matched;
diff --git a/lib-src/emacsclient.c b/lib-src/emacsclient.c
index 8e64f1e92d3..c1ffa1480ec 100644
--- a/lib-src/emacsclient.c
+++ b/lib-src/emacsclient.c
@@ -68,7 +68,6 @@ Copyright (C) 1986-2024 Free Software Foundation, Inc.
 
 #define DEFAULT_TIMEOUT (30)
 
-#include <ctype.h>
 #include <errno.h>
 #include <getopt.h>
 #include <inttypes.h>
@@ -83,6 +82,7 @@ #define DEFAULT_TIMEOUT (30)
 #include <unistd.h>
 
 #include <attribute.h>
+#include <c-ctype.h>
 #include <filename.h>
 #include <intprops.h>
 #include <min-max.h>
@@ -2124,7 +2124,7 @@ main (int argc, char **argv)
 	      unsigned char c;
 	      do
 		c = *++p;
-	      while (isdigit (c) || c == ':');
+	      while (c_isdigit (c) || c == ':');
 
 	      if (c == 0)
                 {
@@ -2136,7 +2136,7 @@ main (int argc, char **argv)
             }
 #ifdef WINDOWSNT
 	  else if (! IS_ABSOLUTE_FILE_NAME (argv[i])
-		   && (isalpha (argv[i][0]) && argv[i][1] == ':'))
+		   && (c_isalpha (argv[i][0]) && argv[i][1] == ':'))
 	    /* Windows can have a different default directory for each
 	       drive, so the cwd passed via "-dir" is not sufficient
 	       to account for that.
diff --git a/lib-src/update-game-score.c b/lib-src/update-game-score.c
index e3b24ad7717..7545d0390c1 100644
--- a/lib-src/update-game-score.c
+++ b/lib-src/update-game-score.c
@@ -41,11 +41,11 @@ Copyright (C) 2002-2024 Free Software Foundation, Inc.
 #include <stdlib.h>
 #include <time.h>
 #include <pwd.h>
-#include <ctype.h>
 #include <fcntl.h>
 #include <sys/stat.h>
 #include <getopt.h>
 
+#include <c-ctype.h>
 #include <unlocked-io.h>
 
 #ifdef WINDOWSNT
@@ -143,7 +143,7 @@ normalize_integer (char *num)
 {
   bool neg;
   char *p;
-  while (*num != '\n' && isspace (*num))
+  while (*num != '\n' && c_isspace (*num))
     num++;
   neg = *num == '-';
   num += neg || *num == '-';
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18  0:05   ` Paul Eggert
@ 2024-09-18  2:13     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-18 11:45     ` Eli Zaretskii
  1 sibling, 0 replies; 10+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-09-18  2:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, 73307-done, Thomas Klausner

Paul Eggert <eggert@cs.ucla.edu> writes:

> I can't easily test patch 0003, which fixes Android-specific
> code. Although I think it's an improvement, in unlikely cases I
> suspect it still doesn't exactly match what the Android kernel does
> with #! lines. I don't know whether that matters. I'll CC this to Po
> Lu (my goto person for Android) as a heads-up.

It's OK.  Though I have never encountered shebangs with tabs.





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18  0:05   ` Paul Eggert
  2024-09-18  2:13     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-09-18 11:45     ` Eli Zaretskii
  2024-09-18 15:18       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-18 15:35       ` Paul Eggert
  1 sibling, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2024-09-18 11:45 UTC (permalink / raw)
  To: Paul Eggert, Stefan Monnier; +Cc: luangruo, 73307, wiz

> Date: Tue, 17 Sep 2024 17:05:29 -0700
> Cc: 73307-done@debbugs.gnu.org, Thomas Klausner <wiz@gatalith.at>,
>  Po Lu <luangruo@yahoo.com>
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 2024-09-17 04:52, Eli Zaretskii wrote:
> > However, I'm not sure this is the right fix, the function is
> > defined with argument type of 'int'.  Paul, any comments?
> 
> Although that patch was an improvement it still had problems, as it 
> incorrectly assumed the string does not end in a multibyte space, and 
> that Emacs's locale matches the system's.
> 
> Emacs itself should not use <ctype.h> unless it knows the string is 
> unibyte and the system locale matches Emacs's. I scanned through its 
> source code looking for all problematic instances of <ctype.h> that have 
> crept in (except I didn't scan the MS-Windows code, where you're the 
> expert), and found five other places where ctype.h was obviously 
> misused. I installed the attached to fix these glitches and am boldly 
> closing this the report.
> 
> I can't easily test patch 0003, which fixes Android-specific code. 
> Although I think it's an improvement, in unlikely cases I suspect it 
> still doesn't exactly match what the Android kernel does with #! lines. 
> I don't know whether that matters. I'll CC this to Po Lu (my goto person 
> for Android) as a heads-up.

Thanks.

This part doesn't look right to me:

> -  {
> -    char *s = SSDATA (prompt);
> -    ptrdiff_t len = strlen (s);
> -    if ((len > 0) && !isspace (s[len - 1]))
> -      prompt = CALLN (Fconcat, prompt, build_string (" "));
> -  }
> -  prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
> +  ptrdiff_t promptlen = SCHARS (prompt);
> +  bool prompt_ends_in_nonspace
> +    = (0 < promptlen
> +       && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
> +	   != Swhitespace));
> +  AUTO_STRING (space_string, " ");
> +  prompt = CALLN (Fconcat, prompt,
> +		  prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
> +		  Vyes_or_no_prompt);

You are testing characters for whitespace syntax, which is AFAIU
subject to buffer-local syntax tables.  Thus, a strange enough setting
of buffer syntax could make this code return unexpected results.

Stefan, am I right?

Why can't we use something closer to the original code, which doesn't
depend on buffer-local customizations?  In particular, what's the
problem with using c_isspace (and what do you mean by "multibyte
space"?)






^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18 11:45     ` Eli Zaretskii
@ 2024-09-18 15:18       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-18 15:35       ` Paul Eggert
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-09-18 15:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, 73307, Paul Eggert, wiz

> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables.  Thus, a strange enough setting
> of buffer syntax could make this code return unexpected results.
>
> Stefan, am I right?

Yup.

> Why can't we use something closer to the original code, which doesn't
> depend on buffer-local customizations?

We could use the global/default syntax table?
[ Or rely on some of the unicode tables?  ]


        Stefan






^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18 11:45     ` Eli Zaretskii
  2024-09-18 15:18       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-09-18 15:35       ` Paul Eggert
  2024-09-18 16:11         ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2024-09-18 15:35 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: luangruo, 73307, wiz

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

On 2024-09-18 04:45, Eli Zaretskii wrote:
> what's the
> problem with using c_isspace (and what do you mean by "multibyte
> space"

Using c_isspace would be OK, in that it'd handle most practical 
examples. I wrote the fancier version to handle prompts that end in 
non-ASCII spaces, e.g., (yes-or-no-p "Delete all files? ") where the 
last character is actually U+00A0 NO-BREAK_SPACE instead of U+0020 
SPACE, so its UTF-8 encoding is multiple bytes. If it's not important to 
handle such cases we could just use c_isspace.


> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables.

Thanks, I didn't think of that. How about the attached patch? It is an 
alternative to just using c_isspace.

[-- Attachment #2: fns.c.diff --]
[-- Type: text/x-patch, Size: 825 bytes --]

diff --git a/src/fns.c b/src/fns.c
index 370f7711b90..d3ee98c3bae 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -35,7 +35,6 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include "composite.h"
 #include "buffer.h"
 #include "intervals.h"
-#include "syntax.h"
 #include "window.h"
 #include "puresize.h"
 #include "gnutls.h"
@@ -3579,8 +3578,7 @@ DEFUN ("yes-or-no-p", Fyes_or_no_p, Syes_or_no_p, 1, 1, 0,
   ptrdiff_t promptlen = SCHARS (prompt);
   bool prompt_ends_in_nonspace
     = (0 < promptlen
-       && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
-	   != Swhitespace));
+       && !blankp (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1)))));
   AUTO_STRING (space_string, " ");
   prompt = CALLN (Fconcat, prompt,
 		  prompt_ends_in_nonspace ? space_string : empty_unibyte_string,

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18 15:35       ` Paul Eggert
@ 2024-09-18 16:11         ` Eli Zaretskii
  2024-09-18 16:22           ` Paul Eggert
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2024-09-18 16:11 UTC (permalink / raw)
  To: Paul Eggert; +Cc: luangruo, 73307, wiz, monnier

> Date: Wed, 18 Sep 2024 08:35:18 -0700
> Cc: 73307@debbugs.gnu.org, wiz@gatalith.at, luangruo@yahoo.com
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> > You are testing characters for whitespace syntax, which is AFAIU
> > subject to buffer-local syntax tables.
> 
> Thanks, I didn't think of that. How about the attached patch? It is an 
> alternative to just using c_isspace.

That's much better, thanks.  But I'm not sure we really want to allow
any Zs character there.  For example, what about U+3000 IDEOGRAPHIC
SPACE?  But if this is okay with us, please install.





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#73307: Fix ctype(3) usage
  2024-09-18 16:11         ` Eli Zaretskii
@ 2024-09-18 16:22           ` Paul Eggert
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Eggert @ 2024-09-18 16:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, 73307, wiz, monnier

On 2024-09-18 09:11, Eli Zaretskii wrote:
> what about U+3000 IDEOGRAPHIC SPACE?

That should be OK. For (yes-or-no-p "您想删除所有文件吗? "), where the prompt 
ends in U+3000, no space is needed. I installed the patch.





^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-09-18 16:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-16 21:25 bug#73307: Fix ctype(3) usage Thomas Klausner
2024-09-17 11:52 ` Eli Zaretskii
2024-09-17 11:55   ` Thomas Klausner
2024-09-18  0:05   ` Paul Eggert
2024-09-18  2:13     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 11:45     ` Eli Zaretskii
2024-09-18 15:18       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 15:35       ` Paul Eggert
2024-09-18 16:11         ` Eli Zaretskii
2024-09-18 16:22           ` Paul Eggert

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.