From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Po Lu <luangruo@yahoo.com>,
73307-done@debbugs.gnu.org, Thomas Klausner <wiz@gatalith.at>
Subject: bug#73307: Fix ctype(3) usage
Date: Tue, 17 Sep 2024 17:05:29 -0700 [thread overview]
Message-ID: <89deb908-92a6-489f-bab9-116556ce8ce4@cs.ucla.edu> (raw)
In-Reply-To: <86r09ibiq3.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]
On 2024-09-17 04:52, Eli Zaretskii wrote:
> However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'. Paul, any comments?
Although that patch was an improvement it still had problems, as it
incorrectly assumed the string does not end in a multibyte space, and
that Emacs's locale matches the system's.
Emacs itself should not use <ctype.h> unless it knows the string is
unibyte and the system locale matches Emacs's. I scanned through its
source code looking for all problematic instances of <ctype.h> that have
crept in (except I didn't scan the MS-Windows code, where you're the
expert), and found five other places where ctype.h was obviously
misused. I installed the attached to fix these glitches and am boldly
closing this the report.
I can't easily test patch 0003, which fixes Android-specific code.
Although I think it's an improvement, in unlikely cases I suspect it
still doesn't exactly match what the Android kernel does with #! lines.
I don't know whether that matters. I'll CC this to Po Lu (my goto person
for Android) as a heads-up.
[-- Attachment #2: 0001-Fix-yes-or-no-p-with-multibyte-spaces.patch --]
[-- Type: text/x-patch, Size: 2141 bytes --]
From 43cde03fa5a663a1509a762077c11eb57a60cee8 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:22:02 -0700
Subject: [PATCH 1/4] Fix yes-or-no-p with multibyte spaces
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem reported by Thomas Klausner (Bug#73307).
Emacs shouldn’t use ctype.h, as it doesn’t work for multibyte
chars and it doesn’t work with Emacs’s locale model anyway.
* src/fns.c: Include syntax.h, not ctype.h.
(Fyes_or_no_p): Check the character category with SYNTAX, not
with isspace, which assumes the current locale and works only
with single-byte characters.
---
src/fns.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/src/fns.c b/src/fns.c
index 57113a8c5ed..370f7711b90 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -26,7 +26,6 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
#include <intprops.h>
#include <vla.h>
#include <errno.h>
-#include <ctype.h>
#include <math.h>
#include "lisp.h"
@@ -36,6 +35,7 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
#include "composite.h"
#include "buffer.h"
#include "intervals.h"
+#include "syntax.h"
#include "window.h"
#include "puresize.h"
#include "gnutls.h"
@@ -3576,13 +3576,15 @@ DEFUN ("yes-or-no-p", Fyes_or_no_p, Syes_or_no_p, 1, 1, 0,
if (use_short_answers)
return call1 (Qy_or_n_p, prompt);
- {
- char *s = SSDATA (prompt);
- ptrdiff_t len = strlen (s);
- if ((len > 0) && !isspace (s[len - 1]))
- prompt = CALLN (Fconcat, prompt, build_string (" "));
- }
- prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
+ ptrdiff_t promptlen = SCHARS (prompt);
+ bool prompt_ends_in_nonspace
+ = (0 < promptlen
+ && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
+ != Swhitespace));
+ AUTO_STRING (space_string, " ");
+ prompt = CALLN (Fconcat, prompt,
+ prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
+ Vyes_or_no_prompt);
specpdl_ref count = SPECPDL_INDEX ();
specbind (Qenable_recursive_minibuffers, Qt);
--
2.43.0
[-- Attachment #3: 0002-Fix-misuse-of-toupper-in-sfnt_parse_style.patch --]
[-- Type: text/x-patch, Size: 1360 bytes --]
From 58a44b6ac317c9a17bcdd208e4ec33ff9772429e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 15:23:47 -0700
Subject: [PATCH 2/4] Fix misuse of toupper in sfnt_parse_style
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/sfntfont.c: Include c-ctype.h, not ctype.h.
(sfnt_parse_style): Upcase just initial ASCII letters;
that’s good enough here.
---
src/sfntfont.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/sfntfont.c b/src/sfntfont.c
index 8c02c05e7a6..f21b6c0782f 100644
--- a/src/sfntfont.c
+++ b/src/sfntfont.c
@@ -20,7 +20,7 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
#include <config.h>
#include <fcntl.h>
-#include <ctype.h>
+#include <c-ctype.h>
#include "lisp.h"
@@ -534,12 +534,12 @@ sfnt_parse_style (Lisp_Object style_name, struct sfnt_font_desc *desc)
}
/* This token is extraneous or was not recognized. Capitalize
- the first letter and set it as the adstyle. */
+ the first letter if it's ASCII lowercase, then set the token as
+ the adstyle. */
if (strlen (single))
{
- if (islower (single[0]))
- single[0] = toupper (single[0]);
+ single[0] = c_toupper (single[0]);
if (NILP (desc->adstyle))
desc->adstyle = build_string (single);
--
2.43.0
[-- Attachment #4: 0003-Fix-some-misparsing-in-check_interpreter.patch --]
[-- Type: text/x-patch, Size: 1889 bytes --]
From e0b027d1215ed32fe0f3d0956d2d7a81552274f2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:38:53 -0700
Subject: [PATCH 3/4] Fix some #! misparsing in check_interpreter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* exec/exec.c: Do not include ctype.h, as the kernel
does not care about the locale.
(check_interpreter): Treat only spaces and tabs as white space.
Do not inspect more bytes than were read.
Although the resulting code does not exactly match what
the Android kernel does, it’s closer than what it was before.
---
exec/exec.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/exec/exec.c b/exec/exec.c
index 775a8b06b96..f31c9a89744 100644
--- a/exec/exec.c
+++ b/exec/exec.c
@@ -24,7 +24,6 @@ Copyright (C) 2023-2024 Free Software Foundation, Inc.
#include <fcntl.h>
#include <assert.h>
#include <string.h>
-#include <ctype.h>
#include <stdlib.h>
#include <sys/ptrace.h>
@@ -116,11 +115,11 @@ check_interpreter (const char *name, int fd, const char **extra)
/* Strip leading whitespace. */
start = buffer;
- while (*start && ((unsigned char) *start) < 128 && isspace (*start))
+ while (start < buffer + rc && (*start == ' ' || *start == '\t'))
++start;
/* Look for a newline character. */
- end = memchr (start, '\n', rc);
+ end = memchr (start, '\n', buffer + rc - start);
if (!end)
goto fail;
@@ -130,11 +129,12 @@ check_interpreter (const char *name, int fd, const char **extra)
*end = '\0';
/* Now look for any whitespace characters. */
- ws = strchr (start, ' ');
+ for (ws = start; *ws && *ws != ' ' && *ws != '\t'; ws++)
+ continue;
/* If there's no whitespace, return the entire start. */
- if (!ws)
+ if (!*ws)
{
if (lseek (fd, 0, SEEK_SET))
goto fail;
--
2.43.0
[-- Attachment #5: 0004-Use-c-ctype.h-in-lib-src.patch --]
[-- Type: text/x-patch, Size: 4041 bytes --]
From 865b54e2acea4fdaa3f302ed225f50281b371d6e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Sep 2024 16:54:17 -0700
Subject: [PATCH 4/4] Use c-ctype.h in lib-src
This fixes some unlikely bugs and removes the temptation
of using ctype.h. Although some uses were correct,
many weren't.
* lib-src/ebrowse.c: Include c-ctype.h, not ctype.h.
* lib-src/emacsclient.c: Include c-ctype.h, not ctype.h.
* lib-src/update-game-score.c: Include c-ctype.h, not ctype.h.
All uses changed.
---
lib-src/ebrowse.c | 10 +++++-----
lib-src/emacsclient.c | 6 +++---
lib-src/update-game-score.c | 4 ++--
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/lib-src/ebrowse.c b/lib-src/ebrowse.c
index 821c29272a4..13c1befdc3e 100644
--- a/lib-src/ebrowse.c
+++ b/lib-src/ebrowse.c
@@ -22,11 +22,11 @@ Copyright (C) 1992-2024 Free Software Foundation, Inc.
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
-#include <ctype.h>
#include <assert.h>
#include <getopt.h>
#include <attribute.h>
+#include <c-ctype.h>
#include <flexmember.h>
#include <min-max.h>
#include <unlocked-io.h>
@@ -1875,7 +1875,7 @@ #define RSTRING_EOF_CHECK \
int_suffixes:
/* Integer suffixes. */
- while (isalpha (c))
+ while (c_isalpha (c))
GET (c);
UNGET ();
return CINT;
@@ -1907,7 +1907,7 @@ #define RSTRING_EOF_CHECK \
}
/* Optional type suffixes. */
- while (isalpha (c))
+ while (c_isalpha (c))
GET (c);
UNGET ();
return CFLOAT;
@@ -2158,7 +2158,7 @@ init_scanner (void)
/* Set up character class vectors. */
for (i = 0; i < sizeof is_ident; ++i)
{
- if (i == '_' || isalnum (i))
+ if (i == '_' || c_isalnum (i))
is_ident[i] = 1;
if (i >= '0' && i <= '9')
@@ -2946,7 +2946,7 @@ operator_name (int *sc)
MATCH ();
/* If this is a simple operator like `+', stop now. */
- if (!isalpha ((unsigned char) *s) && *s != '(' && *s != '[')
+ if (!c_isalpha (*s) && *s != '(' && *s != '[')
break;
++tokens_matched;
diff --git a/lib-src/emacsclient.c b/lib-src/emacsclient.c
index 8e64f1e92d3..c1ffa1480ec 100644
--- a/lib-src/emacsclient.c
+++ b/lib-src/emacsclient.c
@@ -68,7 +68,6 @@ Copyright (C) 1986-2024 Free Software Foundation, Inc.
#define DEFAULT_TIMEOUT (30)
-#include <ctype.h>
#include <errno.h>
#include <getopt.h>
#include <inttypes.h>
@@ -83,6 +82,7 @@ #define DEFAULT_TIMEOUT (30)
#include <unistd.h>
#include <attribute.h>
+#include <c-ctype.h>
#include <filename.h>
#include <intprops.h>
#include <min-max.h>
@@ -2124,7 +2124,7 @@ main (int argc, char **argv)
unsigned char c;
do
c = *++p;
- while (isdigit (c) || c == ':');
+ while (c_isdigit (c) || c == ':');
if (c == 0)
{
@@ -2136,7 +2136,7 @@ main (int argc, char **argv)
}
#ifdef WINDOWSNT
else if (! IS_ABSOLUTE_FILE_NAME (argv[i])
- && (isalpha (argv[i][0]) && argv[i][1] == ':'))
+ && (c_isalpha (argv[i][0]) && argv[i][1] == ':'))
/* Windows can have a different default directory for each
drive, so the cwd passed via "-dir" is not sufficient
to account for that.
diff --git a/lib-src/update-game-score.c b/lib-src/update-game-score.c
index e3b24ad7717..7545d0390c1 100644
--- a/lib-src/update-game-score.c
+++ b/lib-src/update-game-score.c
@@ -41,11 +41,11 @@ Copyright (C) 2002-2024 Free Software Foundation, Inc.
#include <stdlib.h>
#include <time.h>
#include <pwd.h>
-#include <ctype.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <getopt.h>
+#include <c-ctype.h>
#include <unlocked-io.h>
#ifdef WINDOWSNT
@@ -143,7 +143,7 @@ normalize_integer (char *num)
{
bool neg;
char *p;
- while (*num != '\n' && isspace (*num))
+ while (*num != '\n' && c_isspace (*num))
num++;
neg = *num == '-';
num += neg || *num == '-';
--
2.43.0
next prev parent reply other threads:[~2024-09-18 0:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-16 21:25 bug#73307: Fix ctype(3) usage Thomas Klausner
2024-09-17 11:52 ` Eli Zaretskii
2024-09-17 11:55 ` Thomas Klausner
2024-09-18 0:05 ` Paul Eggert [this message]
2024-09-18 2:13 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 11:45 ` Eli Zaretskii
2024-09-18 15:18 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-18 15:35 ` Paul Eggert
2024-09-18 16:11 ` Eli Zaretskii
2024-09-18 16:22 ` Paul Eggert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=89deb908-92a6-489f-bab9-116556ce8ce4@cs.ucla.edu \
--to=eggert@cs.ucla.edu \
--cc=73307-done@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=luangruo@yahoo.com \
--cc=wiz@gatalith.at \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.