From: Jean Louis
Newsgroups: gmane.emacs.help
To: Help GNU Emacs
Subject: Strange whitespace remains after emoji regexp replace
Date: Wed, 25 Dec 2024 14:38:14 +0300
Message-ID: <15c8344dc02960139c391f6706c7307a.support1@rcdrun.com>

There is this function:

(defun wrs-search-clean-entry (entry)
  "Clean and normalize the string ENTRY.
Prepare it for easier searching."
  (let* ((entry (replace-regexp-in-string
                 (rx (one-or-more (or (not alnum) "\n" blank)))
                 " " entry))
         (entry (replace-regexp-in-string (rx (one-or-more " ")) " " entry))
         ;; NB: inside `let*' this binds a variable named `string-trim'
         ;; to ENTRY; the function `string-trim' is never called.
         (string-trim entry))
    entry))

And this emoji, it seems, leaves a strange wide whitespace behind.
I do not know whether anybody else can see that wide whitespace. It is invisible, but it comes right after the first quote in the result:

(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ " ️ "

It is in the same position as the X here:

(wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ "X "

M-x describe-char gives me:

             position: 800 of 923 (87%), column: 50
            character: SPC (displayed as SPC) (codepoint 32, #o40, #x20)
              charset: ascii (ASCII (ISO646 IRV))
code point in charset: 0x20
               script: latin
               syntax: which means: whitespace
             category: .:Base, a:ASCII, l:Latin
             to input: type "C-x 8 RET 20" or "C-x 8 RET SPACE"
          buffer code: #x20
            file code: not encodable by coding system nil
              display: composed to form " ️" (see below)

Composed with the following character(s) "️" using this font:
  ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-23-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 32 3 29 0 0 0 0 nil]
  [0 1 65039 3 29 0 0 0 0 [0 0 0]]
with these character(s):
  ️ (#xfe0f) VARIATION SELECTOR-16

Character code properties: customize what to show
  name: SPACE
  general-category: Zs (Separator, Space)
  decomposition: (32) (' ')

There are text properties here:
  fontified            t

The difference from a normal space is that it is composed with ️ (#xfe0f) VARIATION SELECTOR-16. But I don't want it. I want to remove EVERYTHING that is not alphanumeric from the string. How can I make sure of that?

Jean Louis
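A minimal sketch of one way to do this, assuming "alphanumeric" should mean Unicode letters (general categories Lu, Ll, Lt, Lm, Lo) and decimal digits (Nd). A plausible cause of the leftover character: Emacs's `[:alnum:]` appears to count combining marks as alphabetic, and U+FE0F is a combining mark (category Mn), so `(not alnum)` never matches it. Separately, `(string-trim entry)` inside `let*` binds a variable instead of calling the function, which is why the surrounding spaces survive. The sketch avoids `[:alnum:]` altogether and tests each character's `general-category` with `get-char-code-property`. The names `my-alnum-char-p` and `my-search-clean-entry` are hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'subr-x)  ; for `string-trim' on older Emacs versions

(defun my-alnum-char-p (ch)
  "Return non-nil if CH is a letter or decimal digit by Unicode category.
Hypothetical helper.  Combining marks (Mn, Mc, Me), such as U+FE0F
VARIATION SELECTOR-16, are rejected, unlike with [:alnum:]."
  (memq (get-char-code-property ch 'general-category)
        '(Lu Ll Lt Lm Lo Nd)))

(defun my-search-clean-entry (entry)
  "Replace non-alphanumeric runs in ENTRY with single spaces, then trim.
Hypothetical alternative to `wrs-search-clean-entry'."
  (string-trim
   ;; Collapse the runs of spaces produced by the mapping below.
   (replace-regexp-in-string
    " +" " "
    ;; Keep letters and digits; map every other character,
    ;; including newlines, emoji and variation selectors, to a space.
    (mapconcat (lambda (ch)
                 (if (my-alnum-char-p ch) (string ch) " "))
               entry
               ""))))
```

With this sketch, `(my-search-clean-entry "Hello, \u263A\uFE0F world!")` should return "Hello world", and the emoji-only input from above should come back as an empty string. Text in decomposed form (a base letter followed by a combining accent) would be split as well; passing it through `ucs-normalize-NFC-string` from the ucs-normalize library first may avoid that.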