From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Strange whitespace remains after emoji regexp replace
Date: Wed, 25 Dec 2024 14:51:24 +0200
Message-ID: <86cyhg0vub.fsf@gnu.org>
References: <15c8344dc02960139c391f6706c7307a.support1@rcdrun.com>
In-Reply-To: <15c8344dc02960139c391f6706c7307a.support1@rcdrun.com> (message from Jean Louis on Wed, 25 Dec 2024 14:38:14 +0300)
To: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor

> Date: Wed, 25 Dec 2024 14:38:14 +0300
> From: Jean Louis
>
> There is this function:
>
> (defun wrs-search-clean-entry (entry)
>   "Clean and normalize a ENTRY string.
>
> Prepare it for easier searching"
>   (let* ((entry (replace-regexp-in-string (rx (one-or-more (or (not alnum) "\n" blank))) " " entry))
>          (entry (replace-regexp-in-string (rx (one-or-more " ")) " " entry))
>          (string-trim entry))
>     entry))
>
> And now this emoji here, probably, creates some strange wide white
> space. I do not know if anybody can see that wide whitespace, it is
> invisible though it comes after the first quote in the result
>
> (wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ " ️ "
>
> It is in the above position, same as X in the below position:
> (wrs-search-clean-entry "☺️ )(**(&&^%^$##@!))") ➜ "X "
>
> M-x describe-char
>
> gives me:
>
> position: 800 of 923 (87%), column: 50
> character: SPC (displayed as SPC) (codepoint 32, #o40, #x20)
> charset: ascii (ASCII (ISO646 IRV))
> code point in charset: 0x20
> script: latin
> syntax: which means: whitespace
> category: .:Base, a:ASCII, l:Latin
> to input: type "C-x 8 RET 20" or "C-x 8 RET SPACE"
> buffer code: #x20
> file code: not encodable by coding system nil
> display: composed to form " ️" (see below)
>
> Composed with the following character(s) "️" using this font:
> ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-23-*-*-*-m-0-iso10646-1
> by these glyphs:
> [0 1 32 3 29 0 0 0 0 nil]
> [0 1 65039 3 29 0 0 0 0 [0 0 0]]
> with these character(s):
> ️ (#xfe0f) VARIATION SELECTOR-16
>
> Character code properties: customize what to show
> name: SPACE
> general-category: Zs (Separator, Space)
> decomposition: (32) (' ')
>
> There are text properties here:
> fontified t
>
> The difference to normal space is that it has some ️ (#xfe0f)
> VARIATION SELECTOR-16
>
> But I don't want it. I want to clean EVERYTHING what is not
> alpha-numeric from the string.
>
> How do I make sure of it?

Remove the VS-16 character as well, how else?
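
A minimal sketch of that suggestion, not code from this thread. The helper name `my-clean-entry` is hypothetical, and deleting the whole variation-selector range U+FE00..U+FE0F (rather than only U+FE0F) is an assumption. As far as I can tell, VS-16 survives the quoted regexp because the Emacs `[:alnum:]` class also accepts nonspacing marks (general category Mn), so `(not alnum)` never matches it; that is why the sketch removes the selectors explicitly before cleaning the rest.

```elisp
(require 'subr-x) ; provides `string-trim' on older Emacs versions

(defun my-clean-entry (entry)
  "Return ENTRY with each non-alphanumeric run collapsed to one space.
Hypothetical helper: variation selectors are deleted first."
  (let* (;; Delete variation selectors U+FE00..U+FE0F (VS-1 .. VS-16).
         (entry (replace-regexp-in-string "[\uFE00-\uFE0F]" "" entry))
         ;; Collapse each run of non-alphanumeric characters to a space.
         (entry (replace-regexp-in-string "[^[:alnum:]]+" " " entry)))
    ;; Trim in the body, so the trimmed string is the return value.
    (string-trim entry)))

(my-clean-entry "foo\uFE0F, bar!")                   ; => "foo bar"
(my-clean-entry "\u263A\uFE0F )(**(&&^%^$##@!))")    ; => ""
```

The call to `string-trim` sits in the body of `let*` on purpose: written inside the binding list, as `(string-trim entry)`, it would only bind a variable named `string-trim` and the result would come back untrimmed.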