From: Platon Pronko
Newsgroups: gmane.emacs.help
Subject: Re: How to convert an arbitrary string into a filename
Date: Thu, 27 Apr 2023 16:06:59 +0800
Message-ID: <289d6468-3219-8d9a-be1c-b297e34e9835@gmail.com>
References: <87wn1z8fgo.fsf@mbork.pl> <83h6t2u6zl.fsf@gnu.org> <834jp1ub04.fsf@gnu.org>
In-Reply-To: <834jp1ub04.fsf@gnu.org>
To: Eli Zaretskii, help-gnu-emacs@gnu.org

On 2023-04-27 13:53, Eli Zaretskii wrote:
>> Date: Thu, 27 Apr 2023 07:52:55 +0300
>> From: Jean Louis
>> Cc: help-gnu-emacs@gnu.org
>>
>> * Eli Zaretskii [2023-04-26 16:09]:
>>> If you need to convert an accented character to its base character
>>> (i.e. "remove" the accent), Emacs has much more general facilities:
>>>
>>>   (require 'ucs-normalize)
>>>   (substring (ucs-normalize-NFKD-string "Ć") 0 1)
>>>    => "C"
>>
>> Alright, then like this:
>>
>> (defun string-slug (s &optional random)
>>   "Return slug for Website Revision System by using string S.
>>
>> RANDOM number may be added on the end."
>>   (let* ((random (or random nil))
>>          ;; (case-fold-search t)
>>          (s (replace-regexp-in-string "[^[:word:]]" " " s))
>>          (s (replace-regexp-in-string " +" " " s))
>>          (s (substring (ucs-normalize-NFKD-string s) 0 1))
>>          (s (replace-regexp-in-string "^[[:space:]]+" "" s))
>>          (s (replace-regexp-in-string "[[:space:]]+$" "" s))
>>          (s (replace-regexp-in-string " " "-" s))
>>          (s (if random (concat s "-" (number-to-string (random-number))) s)))
>>     s))
>>
>> (string-slug " OK, here, üößčć") ➜ ""
>>
>> It doesn't give good result.
>
> Of course.  Because you didn't understand how to use
> ucs-normalize-NFKD-string for your purposes.  Please read its doc
> string, and try to play with it, starting from the example I've shown.
>

I think something like this should work better:

(replace-regexp-in-string
 ucs-normalize-combining-chars-regexp ""
 (ucs-normalize-NFKD-string "Ć"))

The idea is to replace each precomposed ("combined") codepoint with its
Compatibility Decomposition: instead of the single codepoint "Ć" (0x0106)
you get "C" (0x43) followed by the "combining acute accent" codepoint
(0x0301).  You can then remove these combining characters with a regexp
replace and are left with the clean string.
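
For completeness, here is a rough sketch of how that expression could be
folded back into a slug function.  The names my-strip-diacritics and
my-string-slug are made up for this example and are not part of Emacs or
of the code quoted above.  I also assumed [:alnum:] instead of [:word:]
so that the result does not depend on the current syntax table, and I
left out the RANDOM suffix.  Treat it as a starting point, not a
finished implementation.

```elisp
(require 'ucs-normalize)
(require 'subr-x)                       ; for `string-trim' on older Emacs

;; Hypothetical helper: decompose with NFKD, then delete the combining marks.
(defun my-strip-diacritics (s)
  "Return S with combining characters removed after NFKD decomposition."
  (replace-regexp-in-string
   ucs-normalize-combining-chars-regexp ""
   (ucs-normalize-NFKD-string s)))

;; Hypothetical helper: build a hyphen-separated slug from S.
(defun my-string-slug (s)
  "Return a slug for S: accents stripped, other characters turned into hyphens."
  (let* ((s (my-strip-diacritics s))
         ;; Collapse every run of non-alphanumeric characters into one space.
         (s (replace-regexp-in-string "[^[:alnum:]]+" " " s))
         ;; Drop the leading and trailing space left by the previous step.
         (s (string-trim s)))
    (replace-regexp-in-string " " "-" s)))
```

With these definitions, (my-string-slug " OK, here, üöčć") should return
"OK-here-uocc".  Note that the accents are stripped before the
non-alphanumeric characters are replaced; in the other order the
combining marks would be turned into spaces.  Characters that have no
decomposition, such as "ß", pass through unchanged, so they need
separate handling if the filename must be pure ASCII.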