unofficial mirror of help-gnu-emacs@gnu.org
* Strange whitespace remains after emoji regexp replace
@ 2024-12-25 11:38 Jean Louis
  2024-12-25 12:51 ` Eli Zaretskii
From: Jean Louis @ 2024-12-25 11:38 UTC
  To: Help GNU Emacs

There is this function:

(defun wrs-search-clean-entry (entry)
  "Clean and normalize a ENTRY string.

Prepare it for easier searching"
  (let* ((entry (replace-regexp-in-string (rx (one-or-more (or (not alnum) "\n" blank))) " " entry))
	 (entry (replace-regexp-in-string (rx (one-or-more " ")) " " entry))
	 (string-trim entry))
    entry))
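Separately from the emoji problem, there is an issue with the last binding in the `let*` above. Every element of a `let*` binding list has the form (VARIABLE VALUE). So `(string-trim entry)` binds a local variable named `string-trim` to the value of `entry`. It does not call the function, which means the result is never trimmed. A minimal illustration follows; the two function names are hypothetical and exist only for this demonstration:

```elisp
(require 'subr-x)  ; `string-trim' was originally defined in subr-x

;; Mirrors the pattern above: (string-trim entry) is a binding, not a call.
(defun my-let-binding-demo ()
  (let* ((entry "  padded  ")
         (string-trim entry))          ; binds a variable named `string-trim'
    entry))

;; Rebinding `entry' to the result of the call actually trims it.
(defun my-let-binding-fixed ()
  (let* ((entry "  padded  ")
         (entry (string-trim entry)))  ; calls `string-trim' on entry
    entry))
```

(my-let-binding-demo) returns "  padded  ". (my-let-binding-fixed) returns "padded".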

This emoji seems to leave a strange, wide whitespace behind in the
result. I do not know whether anybody else can see it. It is invisible,
but it comes right after the first quote in the result:

(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ " ️ "

It sits at the same position as the X below:
(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ "X "
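The input looks like a single smiley, but it can hold more than one character. The following is a small helper for making such hidden characters visible. `my-string-code-points` is a hypothetical name, and the example assumes the smiley is U+263A WHITE SMILING FACE:

```elisp
(defun my-string-code-points (string)
  "Return a list of \"U+XXXX\" strings, one per character in STRING."
  (mapcar (lambda (char) (format "U+%04X" char)) string))
```

With those assumptions, (my-string-code-points "\u263A\uFE0F") returns ("U+263A" "U+FE0F"). A string made of a space, U+FE0F and another space, which matches the describe-char output below, gives ("U+0020" "U+FE0F" "U+0020").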

Running M-x describe-char at that position gives me:
 
             position: 800 of 923 (87%), column: 50
            character: SPC (displayed as SPC) (codepoint 32, #o40, #x20)
              charset: ascii (ASCII (ISO646 IRV))
code point in charset: 0x20
               script: latin
               syntax:   	which means: whitespace
             category: .:Base, a:ASCII, l:Latin
             to input: type "C-x 8 RET 20" or "C-x 8 RET SPACE"
          buffer code: #x20
            file code: not encodable by coding system nil
              display: composed to form " ️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-23-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 32 3 29 0 0 0 0 nil]
  [0 1 65039 3 29 0 0 0 0 [0 0 0]]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16

Character code properties: customize what to show
  name: SPACE
  general-category: Zs (Separator, Space)
  decomposition: (32) (' ')

There are text properties here:
  fontified            t

The difference from a normal space is that it is composed with the
character #xfe0f, VARIATION SELECTOR-16.
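Why does the regexp leave the selector alone? U+FE0F has the Unicode general category Mn (nonspacing mark). The behavior above suggests that Emacs treats combining marks as alphabetic when matching `[:alnum:]`, so `(not alnum)` never matches the selector. The following is a quick check of that assumption. `my-vs16-check` is a hypothetical name:

```elisp
(defun my-vs16-check ()
  "Return the general categories of U+FE0F and U+263A, and whether
[:alnum:] matches a lone U+FE0F (a match position, or nil)."
  (list (get-char-code-property #xFE0F 'general-category)   ; Mn
        (get-char-code-property #x263A 'general-category)   ; So
        (string-match-p "[[:alnum:]]" (string #xFE0F))))    ; 0 = matched
```

If the assumption holds, (my-vs16-check) returns (Mn So 0). The smiley is a symbol (So), so the regexp replaces it. The selector is a mark, and `[:alnum:]` accepts it.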

But I don't want it. I want to remove EVERYTHING that is not
alphanumeric from the string.

How do I make sure of it?
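One way to keep only letters and digits, without relying on `[:alnum:]`, is to check each character's general category. The sketch below uses the hypothetical names `my-letter-or-digit-p` and `my-clean-entry-strict`. It keeps only the categories Lu, Ll, Lt, Lm, Lo (letters) and Nd (decimal digits). Every other character becomes a space, including marks such as U+FE0F. It then collapses runs of spaces and trims the result:

```elisp
(require 'subr-x)  ; `string-trim' was originally defined in subr-x

(defun my-letter-or-digit-p (char)
  "Non-nil if CHAR is a letter (Lu, Ll, Lt, Lm, Lo) or a decimal digit (Nd)."
  (memq (get-char-code-property char 'general-category)
        '(Lu Ll Lt Lm Lo Nd)))

(defun my-clean-entry-strict (entry)
  "Return ENTRY with every run of non-letter, non-digit characters
replaced by a single space, and leading/trailing spaces removed."
  (let* ((chars (mapcar (lambda (char)
                          ;; Keep letters and digits, turn the rest into spaces.
                          (if (my-letter-or-digit-p char) char ?\s))
                        entry))
         (spaced (apply #'string chars))
         (collapsed (replace-regexp-in-string " +" " " spaced)))
    (string-trim collapsed)))
```

For example, (my-clean-entry-strict "\u263A\uFE0F )(**(&&^%^$##@!))") returns "" and (my-clean-entry-strict "Hello, world 42!") returns "Hello world 42".

This rule has a trade-off. A letter written in decomposed form, as a base letter plus a combining accent, would lose its accent. Normalizing first with `ucs-normalize-NFC-string` should help for most such text. If the emoji selectors are the only problem, a narrower option is to delete them before the existing replacement, e.g. (replace-regexp-in-string "[\uFE00-\uFE0F]" "" entry).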

Jean Louis



Thread overview (3 messages):
2024-12-25 11:38 Strange whitespace remains after emoji regexp replace Jean Louis
2024-12-25 12:51 ` Eli Zaretskii
2024-12-25 13:44   ` SOLVED - " Jean Louis