From: Jean Louis <bugs@gnu.support>
To: Help GNU Emacs <help-gnu-emacs@gnu.org>
Subject: Strange whitespace remains after emoji regexp replace
Date: Wed, 25 Dec 2024 14:38:14 +0300
Message-ID: <15c8344dc02960139c391f6706c7307a.support1@rcdrun.com>
There is this function:
(defun wrs-search-clean-entry (entry)
  "Clean and normalize a ENTRY string.
Prepare it for easier searching"
  (let* ((entry (replace-regexp-in-string (rx (one-or-more (or (not alnum) "\n" blank))) " " entry))
         (entry (replace-regexp-in-string (rx (one-or-more " ")) " " entry))
         (string-trim entry))
    entry))
This emoji seems to leave behind a strange, wide whitespace. I do not
know if anybody can see it, since it is invisible, but it comes right
after the first quote in the result:
(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ " ️ "
It is in the position marked with X below:
(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ "X "
M-x describe-char
gives me:
position: 800 of 923 (87%), column: 50
character: SPC (displayed as SPC) (codepoint 32, #o40, #x20)
charset: ascii (ASCII (ISO646 IRV))
code point in charset: 0x20
script: latin
syntax: which means: whitespace
category: .:Base, a:ASCII, l:Latin
to input: type "C-x 8 RET 20" or "C-x 8 RET SPACE"
buffer code: #x20
file code: not encodable by coding system nil
display: composed to form " ️" (see below)
Composed with the following character(s) "️" using this font:
ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-23-*-*-*-m-0-iso10646-1
by these glyphs:
[0 1 32 3 29 0 0 0 0 nil]
[0 1 65039 3 29 0 0 0 0 [0 0 0]]
with these character(s):
️ (#xfe0f) VARIATION SELECTOR-16
Character code properties: customize what to show
name: SPACE
general-category: Zs (Separator, Space)
decomposition: (32) (' ')
There are text properties here:
fontified t
The difference from a normal space is that it is composed with
(#xfe0f) VARIATION SELECTOR-16.
But I don't want it. I want to clean EVERYTHING that is not
alphanumeric from the string.
How do I make sure of that?
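To check which characters a string really contains, regardless of how the display composes them, one option is to list its code points. This is only a minimal sketch: `my-string-codepoints` is a hypothetical helper name, and the sample input is typed by hand as space, U+FE0F, space, on the assumption that this is what the result contains.

```elisp
;; Hypothetical helper, not part of the original function.
(defun my-string-codepoints (string)
  "Return the characters of STRING as a list of \"#xNNNN\" strings."
  ;; `mapcar' accepts any sequence, so it can walk a string directly.
  (mapcar (lambda (char) (format "#x%04X" char)) string))

;; Assumed sample: SPACE, VARIATION SELECTOR-16, SPACE.
(my-string-codepoints " \uFE0F ")
;; => ("#x0020" "#xFE0F" "#x0020")
```

Called on the real return value of `wrs-search-clean-entry`, this would show whether U+FE0F survived the replacement as a separate character.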
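One possible approach, offered as a sketch and not as a confirmed answer, is to stop relying on the `alnum` class and test each character's Unicode general category instead. The assumption here is that `alnum` lets U+FE0F through because its general category is Mn (a nonspacing mark), which Emacs may count as alphabetic. The names `my-letter-or-digit-p` and `my-search-clean-entry` are hypothetical, and the list of categories to keep (letters plus decimal digits) is a choice that can be adjusted.

```elisp
(require 'subr-x) ; provides `string-trim' on older Emacs versions

;; Hypothetical predicate: keep only letters (L*) and decimal digits (Nd).
;; Marks such as U+FE0F (category Mn) are deliberately not in the list.
(defun my-letter-or-digit-p (char)
  "Return non-nil if CHAR is a letter or decimal digit by general category."
  (memq (get-char-code-property char 'general-category)
        '(Lu Ll Lt Lm Lo Nd)))

;; Hypothetical variant of the cleaning function.
(defun my-search-clean-entry (entry)
  "Replace every non-letter, non-digit character in ENTRY with a space.
Runs of spaces are collapsed to one, and the result is trimmed."
  (string-trim
   (replace-regexp-in-string
    " +" " "
    (mapconcat (lambda (char)
                 (if (my-letter-or-digit-p char) (string char) " "))
               entry ""))))

(my-search-clean-entry "Hello, \u263A\uFE0F world 42!")
;; => "Hello world 42"
```

Note that the sketch applies `string-trim` to the final value. In the function as posted, `(string-trim entry)` sits inside the `let*` binding list, so it appears to bind a variable named `string-trim` instead of trimming the string, which would explain the spaces left at both ends of the result.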
Jean Louis