unofficial mirror of bug-gnu-emacs@gnu.org 
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 66614@debbugs.gnu.org
Subject: bug#66614: 29.1.50; Support not capitalizing words inside symbols
Date: Wed, 18 Oct 2023 15:38:34 -0400
Message-ID: <ierfs274t6t.fsf@janestreet.com>
In-Reply-To: <831qdrok34.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 18 Oct 2023 21:34:55 +0300")

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Wed, 18 Oct 2023 13:01:43 -0400
>> 
>> --- a/doc/lispref/strings.texi
>> +++ b/doc/lispref/strings.texi
>> @@ -1510,7 +1510,9 @@ Case Conversion
>>  
>>  The definition of a word is any sequence of consecutive characters that
>>  are assigned to the word constituent syntax class in the current syntax
>> -table (@pxref{Syntax Class Table}).
>> +table (@pxref{Syntax Class Table}), or if @var{case-symbols-as-words} is
>> +non-nil, also characters assigned to the symbol constituent syntax
>> +class.
>>  
>>  When @var{string-or-char} is a character, this function does the same
>>  thing as @code{upcase}.
>> @@ -1542,7 +1544,9 @@ Case Conversion
>>  
>>  The definition of a word is any sequence of consecutive characters that
>>  are assigned to the word constituent syntax class in the current syntax
>> -table (@pxref{Syntax Class Table}).
>> +table (@pxref{Syntax Class Table}), or if @var{case-symbols-as-words} is
>> +non-nil, also characters assigned to the symbol constituent syntax
>> +class.
>
> These two hunks use @var incorrectly: case-symbols-as-words is a
> literal symbol, so it should have the @code markup.

Fixed.

>> ++++
>> +** New variable 'case-symbols-as-words' to change case behavior for symbols.
>
> "Case behavior" is confusing.  I think you mean
>
>   New variable 'case-symbols-as-words' affects case operations for symbols.

Fixed.

>> +If this is set to non-nil, then case operations such as
>> +'upcase-initials' or 'replace-match' (with nil FIXEDCASE) will treat
>> +symbol constituents as if they were part of words.
>
> Don't you mean
>
>   will treat the entire symbol name as a single word
>
> ?  I find the text you used confusing, FWIW.

Fixed.

>>                                                    This is useful for
>> +programming languages and style where words in the middle of symbols
>> +are never capitalized.
>
> Likewise here: instead of talking about "words in the middle of
> symbols", wouldn't it be better to say something like
>
>   ...style where only the first letter of a symbol's name is ever
>   capitalized.
>
> ?
>
> Also, please say here that the default of this new variable is nil.

Fixed.
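
To illustrate the behavior the NEWS entry describes, here is a minimal
standalone C sketch.  It is not the Emacs implementation: it assumes a
toy syntax table in which ASCII letters and digits are word constituents
and '_' is a symbol constituent, and every toy_ name is hypothetical.
The expected strings are taken from the ERT test in the attached patch.

```c
#include <assert.h>   /* used by toy_self_check below */
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for Emacs's syntax classes.  */
enum toy_syntax { TOY_OTHER, TOY_WORD, TOY_SYMBOL };

/* Assumption: ASCII alphanumerics are word constituents and '_' is a
   symbol constituent; everything else is neither.  */
static enum toy_syntax
toy_syntax_of (unsigned char c)
{
  if (isalnum (c))
    return TOY_WORD;
  if (c == '_')
    return TOY_SYMBOL;
  return TOY_OTHER;
}

/* Same shape as case_ch_is_word in the patch, with the flag passed
   explicitly instead of read from a global variable.  */
static bool
toy_is_word (enum toy_syntax syntax, bool symbols_as_words)
{
  return syntax == TOY_WORD || (symbols_as_words && syntax == TOY_SYMBOL);
}

/* Upcase the first character of each "word" in STR, in place.
   Other characters are left unchanged.  */
static void
toy_upcase_initials (char *str, bool symbols_as_words)
{
  bool inword = false;
  for (char *p = str; *p; p++)
    {
      bool was_inword = inword;
      inword = toy_is_word (toy_syntax_of ((unsigned char) *p),
                            symbols_as_words);
      if (inword && !was_inword)
        *p = (char) toupper ((unsigned char) *p);
    }
}

/* Check one input against both settings of the flag.  */
static void
toy_self_check (void)
{
  char buf[32];

  strcpy (buf, "aa_bb cc_dd");
  toy_upcase_initials (buf, false);   /* each word gets an initial */
  assert (strcmp (buf, "Aa_Bb Cc_Dd") == 0);

  strcpy (buf, "aa_bb cc_dd");
  toy_upcase_initials (buf, true);    /* each symbol gets an initial */
  assert (strcmp (buf, "Aa_bb Cc_dd") == 0);
}
```

Calling toy_self_check () from a main function exercises both settings.
With the flag off, '_' ends a word and so "bb" gets its own initial;
with the flag on, "aa_bb" counts as one word and only "a" is upcased.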

>> +  DEFVAR_BOOL ("case-symbols-as-words", case_symbols_as_words,
>> +	       doc: /* If non-nil, case functions treat symbol syntax as part of words.
>> +
>> +Functions such as `upcase-initials' and `replace-match' check or modify
>> +the case pattern of sequences of characters.  Normally, these operate on
>> +sequences of characters whose syntax is word constituent.  If this
>> +variable is non-nil, then they operate on sequences of characters who
>> +syntax is either word constituent or symbol constituent.
>> +
>> +This is useful for programming styles which wish to capitalize the
>> +beginning of symbols, but not capitalize individual words in a symbol.*/);
>
> Similar comments about this doc string.

Fixed.

> Also, shouldn't this variable be buffer-local?  You want certain major
> modes to set it, right?

Yes, I want certain major modes to set it, although some users may also
want to set it globally.

Are you suggesting it should be a DEFVAR_PER_BUFFER?  I can do that, but
I didn't think it was worth adding another slot to struct buffer.
Also, let-binding a DEFVAR_PER_BUFFER variable is slow (O(#buffers)),
and I expect users might want to let-bind this one sometimes.

>> -	      if (SYNTAX (prevc) != Sword)
>> +	      if (SYNTAX (prevc) != Sword
>> +		  && (!case_symbols_as_words || SYNTAX (prevc) != Ssymbol))
>
> I think the code will be more clear if you use
>
> 		  && !(case_symbols_as_words && SYNTAX (prevc) == Ssymbol))

Fixed.
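
For the record, the two spellings are equivalent by De Morgan's law, so
this is purely a readability change.  A small standalone C sketch that
checks every combination is below; the demo_ names and the three-value
enum are hypothetical stand-ins, not the real Emacs definitions.

```c
#include <assert.h>   /* used by check_conditions_agree below */
#include <stdbool.h>

/* Hypothetical stand-in for the relevant syntax classes.  */
enum demo_syntax { DEMO_OTHER, DEMO_WORD, DEMO_SYMBOL };

/* Condition as written in the first version of the patch.  */
static bool
starts_word_v1 (enum demo_syntax prev, bool symbols_as_words)
{
  return prev != DEMO_WORD
         && (!symbols_as_words || prev != DEMO_SYMBOL);
}

/* Condition as suggested in review.  */
static bool
starts_word_v2 (enum demo_syntax prev, bool symbols_as_words)
{
  return prev != DEMO_WORD
         && !(symbols_as_words && prev == DEMO_SYMBOL);
}

/* Exhaustively compare both forms over all six input combinations.  */
static void
check_conditions_agree (void)
{
  for (int flag = 0; flag <= 1; flag++)
    for (int s = DEMO_OTHER; s <= DEMO_SYMBOL; s++)
      assert (starts_word_v1 ((enum demo_syntax) s, flag != 0)
              == starts_word_v2 ((enum demo_syntax) s, flag != 0));
}
```

Both functions return true exactly when the previous character is
outside a "word" under the current setting, which is when the current
character counts as an initial.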

>>  	  else if (uppercasep (c))
>>  	    {
>>  	      some_uppercase = 1;
>> -	      if (SYNTAX (prevc) != Sword)
>> +	      if (SYNTAX (prevc) != Sword
>> +		  && (!case_symbols_as_words || SYNTAX (prevc) != Ssymbol))
>
> Same here.
>

Fixed.

>>  	      /* If the initial is a caseless word constituent,
>>  		 treat that like a lowercase initial.  */
>> -	      if (SYNTAX (prevc) != Sword)
>> +	      if (SYNTAX (prevc) != Sword
>> +		  && (!case_symbols_as_words || SYNTAX (prevc) != Ssymbol))
>>  		some_nonuppercase_initial = 1;
>
> And here.
>

Fixed.


[-- Attachment #2: 0001-Add-case-symbols-as-words-to-configure-symbol-case-b.patch --]
[-- Type: text/x-patch, Size: 8673 bytes --]

From 8286118c70288217badbbb2afd7863ae2ba6848c Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Wed, 18 Oct 2023 12:51:37 -0400
Subject: [PATCH] Add case-symbols-as-words to configure symbol case behavior

In some programming languages and styles, a symbol (or every symbol in
a sequence of symbols) might be capitalized, but the individual words
making up the symbol should never be capitalized.

For example, in OCaml, type names Look_like_this and variable names
look_like_this, but it is basically never correct for something to
Look_Like_This.  And one might have "aa_bb cc_dd ee_ff" or "Aa_bb
Cc_dd Ee_ff", but never "Aa_Bb Cc_Dd Ee_Ff".

To support this, the new variable case-symbols-as-words causes symbol
constituents to be treated as part of words only for case operations.

* src/casefiddle.c (case_ch_is_word): Add.
(case_character_impl): Use case_ch_is_word.
(case_character): Use case_ch_is_word.
(syms_of_casefiddle): Define case-symbols-as-words. (bug#66614)
* src/search.c (Freplace_match): Use case-symbols-as-words when
calculating case pattern.
* test/src/casefiddle-tests.el (casefiddle-tests--check-syms)
(casefiddle-case-symbols-as-words): Test case-symbols-as-words.
* etc/NEWS: Announce case-symbols-as-words.
* doc/lispref/strings.texi (Case Conversion): Document
case-symbols-as-words.
---
 doc/lispref/strings.texi     |  8 ++++++--
 etc/NEWS                     |  8 ++++++++
 src/casefiddle.c             | 23 +++++++++++++++++++++--
 src/search.c                 | 11 +++++++----
 test/src/casefiddle-tests.el | 12 ++++++++++++
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 7d11db49def..665d4f9a8dc 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1510,7 +1510,9 @@ Case Conversion
 
 The definition of a word is any sequence of consecutive characters that
 are assigned to the word constituent syntax class in the current syntax
-table (@pxref{Syntax Class Table}).
+table (@pxref{Syntax Class Table}), or if @code{case-symbols-as-words}
+is non-nil, also characters assigned to the symbol constituent syntax
+class.
 
 When @var{string-or-char} is a character, this function does the same
 thing as @code{upcase}.
@@ -1542,7 +1544,9 @@ Case Conversion
 
 The definition of a word is any sequence of consecutive characters that
 are assigned to the word constituent syntax class in the current syntax
-table (@pxref{Syntax Class Table}).
+table (@pxref{Syntax Class Table}), or if @code{case-symbols-as-words}
+is non-nil, also characters assigned to the symbol constituent syntax
+class.
 
 When the argument to @code{upcase-initials} is a character,
 @code{upcase-initials} has the same result as @code{upcase}.
diff --git a/etc/NEWS b/etc/NEWS
index 129017f7dbe..23867aafe6f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1100,6 +1100,14 @@ instead of "ctags", "ebrowse", "etags", "hexl", "emacsclient", and
 "rcs2log", when starting one of these built in programs in a
 subprocess.
 
++++
+** New variable 'case-symbols-as-words' affects case operations for symbols.
+If non-nil, then case operations such as 'upcase-initials' or
+'replace-match' (with nil FIXEDCASE) will treat the entire symbol name
+as a single word.  This is useful for programming languages and styles
+where only the first letter of a symbol's name is ever capitalized.
+It defaults to nil.
+
 +++
 ** 'x-popup-menu' now understands touch screen events.
 When a 'touchscreen-begin' or 'touchscreen-end' event is passed as the
diff --git a/src/casefiddle.c b/src/casefiddle.c
index d567a5e353a..47e8950cda6 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -92,6 +92,12 @@ prepare_casing_context (struct casing_context *ctx,
     SETUP_BUFFER_SYNTAX_TABLE ();	/* For syntax_prefix_flag_p.  */
 }
 
+static bool
+case_ch_is_word (enum syntaxcode syntax)
+{
+  return syntax == Sword || (case_symbols_as_words && syntax == Ssymbol);
+}
+
 struct casing_str_buf
 {
   unsigned char data[max (6, MAX_MULTIBYTE_LENGTH)];
@@ -115,7 +121,7 @@ case_character_impl (struct casing_str_buf *buf,
 
   /* Update inword state */
   bool was_inword = ctx->inword;
-  ctx->inword = SYNTAX (ch) == Sword &&
+  ctx->inword = case_ch_is_word (SYNTAX (ch)) &&
     (!ctx->inbuffer || was_inword || !syntax_prefix_flag_p (ch));
 
   /* Normalize flag so its one of CASE_UP, CASE_DOWN or CASE_CAPITALIZE.  */
@@ -222,7 +228,7 @@ case_character (struct casing_str_buf *buf, struct casing_context *ctx,
      has a word syntax (i.e. current character is end of word), use final
      sigma.  */
   if (was_inword && ch == GREEK_CAPITAL_LETTER_SIGMA && changed
-      && (!next || SYNTAX (STRING_CHAR (next)) != Sword))
+      && (!next || !case_ch_is_word (SYNTAX (STRING_CHAR (next)))))
     {
       buf->len_bytes = CHAR_STRING (GREEK_SMALL_LETTER_FINAL_SIGMA, buf->data);
       buf->len_chars = 1;
@@ -720,6 +726,19 @@ syms_of_casefiddle (void)
   3rd argument.  */);
   Vregion_extract_function = Qnil; /* simple.el sets this.  */
 
+  DEFVAR_BOOL ("case-symbols-as-words", case_symbols_as_words,
+	       doc: /* If non-nil, case functions treat symbol syntax as part of words.
+
+Functions such as `upcase-initials' and `replace-match' check or modify
+the case pattern of sequences of characters.  Normally, these operate on
+sequences of characters whose syntax is word constituent.  If this
+variable is non-nil, then they operate on sequences of characters whose
+syntax is either word constituent or symbol constituent.
+
+This is useful for programming languages and styles where only the first
+letter of a symbol's name is ever capitalized.*/);
+  case_symbols_as_words = 0;
+
   defsubr (&Supcase);
   defsubr (&Sdowncase);
   defsubr (&Scapitalize);
diff --git a/src/search.c b/src/search.c
index e9b29bb7179..692d8488049 100644
--- a/src/search.c
+++ b/src/search.c
@@ -2365,7 +2365,7 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
 convert NEWTEXT to all caps.  Otherwise if all words are capitalized
 in the replaced text, capitalize each word in NEWTEXT.  Note that
 what exactly is a word is determined by the syntax tables in effect
-in the current buffer.
+in the current buffer, and the variable `case-symbols-as-words'.
 
 If optional third arg LITERAL is non-nil, insert NEWTEXT literally.
 Otherwise treat `\\' as special:
@@ -2479,7 +2479,8 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
 	      /* Cannot be all caps if any original char is lower case */
 
 	      some_lowercase = 1;
-	      if (SYNTAX (prevc) != Sword)
+	      if (SYNTAX (prevc) != Sword
+		  && !(case_symbols_as_words && SYNTAX (prevc) == Ssymbol))
 		some_nonuppercase_initial = 1;
 	      else
 		some_multiletter_word = 1;
@@ -2487,7 +2488,8 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
 	  else if (uppercasep (c))
 	    {
 	      some_uppercase = 1;
-	      if (SYNTAX (prevc) != Sword)
+	      if (SYNTAX (prevc) != Sword
+		  && !(case_symbols_as_words && SYNTAX (prevc) == Ssymbol))
 		;
 	      else
 		some_multiletter_word = 1;
@@ -2496,7 +2498,8 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
 	    {
 	      /* If the initial is a caseless word constituent,
 		 treat that like a lowercase initial.  */
-	      if (SYNTAX (prevc) != Sword)
+	      if (SYNTAX (prevc) != Sword
+		  && !(case_symbols_as_words && SYNTAX (prevc) == Ssymbol))
 		some_nonuppercase_initial = 1;
 	    }
 
diff --git a/test/src/casefiddle-tests.el b/test/src/casefiddle-tests.el
index e7f4348b0c6..12984d898b9 100644
--- a/test/src/casefiddle-tests.el
+++ b/test/src/casefiddle-tests.el
@@ -294,4 +294,16 @@ casefiddle-turkish
     ;;(should (string-equal (capitalize "indIá") "İndıa"))
     ))
 
+(defun casefiddle-tests--check-syms (init with-words with-symbols)
+  (let ((case-symbols-as-words nil))
+    (should (string-equal (upcase-initials init) with-words)))
+  (let ((case-symbols-as-words t))
+    (should (string-equal (upcase-initials init) with-symbols))))
+
+(ert-deftest casefiddle-case-symbols-as-words ()
+  (casefiddle-tests--check-syms "Aa_bb Cc_dd" "Aa_Bb Cc_Dd" "Aa_bb Cc_dd")
+  (casefiddle-tests--check-syms "Aa_bb cc_DD" "Aa_Bb Cc_DD" "Aa_bb Cc_DD")
+  (casefiddle-tests--check-syms "aa_bb cc_dd" "Aa_Bb Cc_Dd" "Aa_bb Cc_dd")
+  (casefiddle-tests--check-syms "Aa_Bb Cc_Dd" "Aa_Bb Cc_Dd" "Aa_Bb Cc_Dd"))
+
 ;;; casefiddle-tests.el ends here
-- 
2.39.3

