* How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-26 3:55 UTC (permalink / raw)
To: Help Gnu Emacs mailing list
Hi all,
given an arbitrary string, say "Hello, world!!!", I want to have
a filename with all the runs of weird characters (that is,
non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
there a function for that in Emacs already or should I write my own?
TIA,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Platon Pronko @ 2023-04-26 4:03 UTC (permalink / raw)
To: Marcin Borkowski, Help Gnu Emacs mailing list
On 2023-04-26 11:55, Marcin Borkowski wrote:
> Hi all,
>
> given an arbitrary string, say "Hello, world!!!", I want to have
> a filename with all the runs of weird characters (that is,
> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
> there a function for that in Emacs already or should I write my own?
Something like this?
(let ((input "Hello, world!!!"))
  (replace-regexp-in-string "[^a-zA-Z0-9]+" "-" input))
;; "Hello-world-"
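The trailing dash comes from the "!!!" at the end of the input. A
minimal sketch of the same idea with trimming added; the name
`my-simple-slug' is made up for illustration and is not an Emacs
built-in:

```elisp
;; Sketch only: `my-simple-slug' is a hypothetical name.
(defun my-simple-slug (input)
  "Replace runs of non-ASCII-alphanumerics in INPUT with a dash, then trim."
  (let ((dashed (replace-regexp-in-string "[^a-zA-Z0-9]+" "-" input)))
    ;; Each run collapses to a single dash, so at most one dash can be
    ;; left at the start and one at the end of the string.
    (replace-regexp-in-string "\\`-\\|-\\'" "" dashed)))

(my-simple-slug "Hello, world!!!")
;; "Hello-world"
```

The \\` and \\' anchors match only at the very start and end of the
string, so dashes between words are left alone.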
--
Best regards,
Platon Pronko
PGP 2A62D77A7A2CB94E
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-26 4:42 UTC (permalink / raw)
To: Platon Pronko; +Cc: Help Gnu Emacs mailing list
On 2023-04-26, at 06:03, Platon Pronko <platon7pronko@gmail.com> wrote:
> On 2023-04-26 11:55, Marcin Borkowski wrote:
>> Hi all,
>> given an arbitrary string, say "Hello, world!!!", I want to have
>> a filename with all the runs of weird characters (that is,
>> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
>> there a function for that in Emacs already or should I write my own?
>
> Something like this?
>
> (let ((input "Hello, world!!!"))
> (replace-regexp-in-string "[^a-zA-Z0-9]+" "-" input))
> ;; "Hello-world-"
More or less, possibly with trimming - I know it's simple, I just didn't
want to do it if it already exists. Which I still don't know... (I
suppose not, but I'm not sure.)
Thanks,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Platon Pronko @ 2023-04-26 5:39 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list
On 2023-04-26 12:42, Marcin Borkowski wrote:
>
> On 2023-04-26, at 06:03, Platon Pronko <platon7pronko@gmail.com> wrote:
>
>> On 2023-04-26 11:55, Marcin Borkowski wrote:
>>> Hi all,
>>> given an arbitrary string, say "Hello, world!!!", I want to have
>>> a filename with all the runs of weird characters (that is,
>>> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
>>> there a function for that in Emacs already or should I write my own?
>>
>> Something like this?
>>
>> (let ((input "Hello, world!!!"))
>> (replace-regexp-in-string "[^a-zA-Z0-9]+" "-" input))
>> ;; "Hello-world-"
>
> More or less, possibly with trimming - I know it's simple, I just didn't
> want to do it if it already exists. Which I still don't know... (I
> suppose not, but I'm not sure.)
The term you are looking for is "slug" ("slugify").
A quick Google search suggests that nothing is built in, but there may be implementations in other packages. Here is a reference to the implementation in the org-roam package: https://mailb.org/pipermail/emacs-berlin/2022/000897.html
Also might be of interest:
https://github.com/masasam/emacs-easy-hugo/issues/64
https://melpa.org/#/unidecode
* Re: How to convert an arbitrary string into a filename
From: Yuri Khan @ 2023-04-26 5:42 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list
On Wed, 26 Apr 2023 at 10:56, Marcin Borkowski <mbork@mbork.pl> wrote:
> given an arbitrary string, say "Hello, world!!!", I want to have
> a filename with all the runs of weird characters (that is,
> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
> there a function for that in Emacs already or should I write my own?
It looks like you want to generate file names for blog posts or
articles so that URLs look pretty. While a useful goal, the mechanic
you ask for is lossy, so multiple titles could map into the same file
name, which, in the worst case, could lead to data loss, or require
disambiguation of some kind.
(Also, depending on how wide your audience is and what your
implementation’s definition of alphanumeric is, it might turn titles
written in non-Latin-based scripts into an empty string, and fixing
that might be moderately easy for some scripts (e.g. Greek, Cyrillic)
and hard for others (e.g. CJK).)
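Both effects are easy to reproduce. The following is only an
illustration with made-up titles; it compares an ASCII-only character
class with [:alnum:], which in Emacs also matches letters and digits
outside ASCII:

```elisp
;; Illustration only; the titles below are made-up examples.
(let ((ascii-only "[^a-zA-Z0-9]+")
      (any-script "[^[:alnum:]]+"))
  (list
   ;; Two different titles collapse to the same name.
   (replace-regexp-in-string ascii-only "-" "C++ tips")
   (replace-regexp-in-string ascii-only "-" "C# tips")
   ;; A Cyrillic title contains no ASCII alphanumerics at all.
   (replace-regexp-in-string ascii-only "-" "Привет, мир")
   ;; [:alnum:] keeps the Cyrillic letters.
   (replace-regexp-in-string any-script "-" "Привет, мир")))
;; ("C-tips" "C-tips" "-" "Привет-мир")
```

Keeping non-ASCII letters avoids the empty result, but whether such
names are acceptable depends on the file system and the tools that
will read them.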
* Re: How to convert an arbitrary string into a filename
From: Jean Louis @ 2023-04-26 10:08 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list
(defun string-slug (s &optional random)
  "Return slug for Website Revision System by using string S.

RANDOM number may be added on the end."
  (let* ((random (or random nil))
         ;; (case-fold-search t)
         (s (replace-regexp-in-string "[^[:word:]]" " " s))
         (s (replace-regexp-in-string " +" " " s))
         (s (replace-regexp-in-string "ž" "z" s))
         (s (replace-regexp-in-string "Ž" "Z" s))
         (s (replace-regexp-in-string "š" "s" s))
         (s (replace-regexp-in-string "Š" "S" s))
         (s (replace-regexp-in-string "č" "c" s))
         (s (replace-regexp-in-string "Č" "C" s))
         (s (replace-regexp-in-string "Ć" "C" s))
         (s (replace-regexp-in-string "ć" "c" s))
         (s (replace-regexp-in-string "đ" "d" s))
         (s (replace-regexp-in-string "Đ" "D" s))
         (s (replace-regexp-in-string "^[[:space:]]+" "" s))
         (s (replace-regexp-in-string "[[:space:]]+$" "" s))
         (s (replace-regexp-in-string " " "-" s))
         (s (if random (concat s "-" (number-to-string (random-number))) s)))
    s))
(string-slug "Hello there, how are you?") ➜ "Hello-there-how-are-you"
From:
GNU Emacs package: rcd-utilities.el:
https://gnu.support/gnu-emacs/packages/rcd-utilities-el.html
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
In support of Richard M. Stallman
https://stallmansupport.org/
* Re: How to convert an arbitrary string into a filename
From: Jean Louis @ 2023-04-26 12:30 UTC (permalink / raw)
To: Marcin Borkowski, Help Gnu Emacs mailing list
* Jean Louis <bugs@gnu.support> [2023-04-26 15:26]:
> (defun string-slug (s &optional random)
> "Return slug for Website Revision System by using string S.
>
> RANDOM number may be added on the end."
(defun random-number (&optional digits)
  "Return the random number with 6 digits by default."
  (let ((digits (if digits digits 6))
        (count 0))
    (string-to-number
     (with-output-to-string
       (while (/= count digits)
         (princ (number-to-string (1+ (random 9))))
         (setq count (1+ count)))))))
(string-slug " Hello there, how are you? =)((&/&%))") ➜ "Hello-there-how-are-you"
(string-slug " Hello
there, how are you? =)((&/&%))" t) ➜ "Hello-there-how-are-you-292681"
Random number does not guarantee uniqueness of the slug.
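When the slug becomes a file name, one way to get uniqueness is to
check the target directory and append a counter. A minimal sketch;
`my-unique-file-name' and its arguments are hypothetical and not part
of rcd-utilities.el:

```elisp
;; Sketch only: `my-unique-file-name' is a hypothetical helper.
(defun my-unique-file-name (slug directory &optional extension)
  "Return a file name in DIRECTORY based on SLUG that does not exist yet.
EXTENSION, if non-nil, should include the leading dot.
Try SLUG itself first, then SLUG-2, SLUG-3 and so on."
  (let* ((ext (or extension ""))
         (candidate (expand-file-name (concat slug ext) directory))
         (n 1))
    ;; Keep bumping the counter until the candidate is free.
    (while (file-exists-p candidate)
      (setq n (1+ n))
      (setq candidate
            (expand-file-name (format "%s-%d%s" slug n ext) directory)))
    candidate))
```

This only checks at the time of the call; if another process creates
the same file before you write yours, the names can still collide.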
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
In support of Richard M. Stallman
https://stallmansupport.org/
* Re: How to convert an arbitrary string into a filename
From: Eli Zaretskii @ 2023-04-26 13:07 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Wed, 26 Apr 2023 13:08:59 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: Help Gnu Emacs mailing list <help-gnu-emacs@gnu.org>
>
> (defun string-slug (s &optional random)
> "Return slug for Website Revision System by using string S.
>
> RANDOM number may be added on the end."
> (let* ((random (or random nil))
> ;; (case-fold-search t)
> (s (replace-regexp-in-string "[^[:word:]]" " " s))
> (s (replace-regexp-in-string " +" " " s))
> (s (replace-regexp-in-string "ž" "z" s))
> (s (replace-regexp-in-string "Ž" "Z" s))
> (s (replace-regexp-in-string "š" "s" s))
> (s (replace-regexp-in-string "Š" "S" s))
> (s (replace-regexp-in-string "č" "c" s))
> (s (replace-regexp-in-string "Č" "C" s))
> (s (replace-regexp-in-string "Ć" "C" s))
> (s (replace-regexp-in-string "ć" "c" s))
> (s (replace-regexp-in-string "đ" "d" s))
> (s (replace-regexp-in-string "Đ" "D" s))
If you need to convert an accented character to its base character
(i.e. "remove" the accent), Emacs has much more general facilities:
(require 'ucs-normalize)
(substring (ucs-normalize-NFKD-string "Ć") 0 1)
=> "C"
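To see why taking the first character works here, it helps to look at
what the decomposition returns. A small illustration, using only
ucs-normalize-NFKD-string from the example above:

```elisp
(require 'ucs-normalize)

(let ((decomposed (ucs-normalize-NFKD-string "Ć")))
  ;; The single precomposed character becomes a base letter followed
  ;; by a combining mark.
  (list (length decomposed)         ; 2
        (append decomposed nil)))   ; (67 769), i.e. ?C and U+0301
;; (2 (67 769))
```

For a longer string, the base letters and combining marks are
interleaved, so `substring' with fixed positions only fits the
one-character case.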
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-26 18:30 UTC (permalink / raw)
To: Jean Louis; +Cc: Help Gnu Emacs mailing list
On 2023-04-26, at 12:08, Jean Louis <bugs@gnu.support> wrote:
> (defun string-slug (s &optional random)
> "Return slug for Website Revision System by using string S.
>
> RANDOM number may be added on the end."
> (let* ((random (or random nil))
> ;; (case-fold-search t)
> (s (replace-regexp-in-string "[^[:word:]]" " " s))
> (s (replace-regexp-in-string " +" " " s))
> (s (replace-regexp-in-string "ž" "z" s))
> (s (replace-regexp-in-string "Ž" "Z" s))
> (s (replace-regexp-in-string "š" "s" s))
> (s (replace-regexp-in-string "Š" "S" s))
> (s (replace-regexp-in-string "č" "c" s))
> (s (replace-regexp-in-string "Č" "C" s))
> (s (replace-regexp-in-string "Ć" "C" s))
> (s (replace-regexp-in-string "ć" "c" s))
> (s (replace-regexp-in-string "đ" "d" s))
> (s (replace-regexp-in-string "Đ" "D" s))
> (s (replace-regexp-in-string "^[[:space:]]+" "" s))
> (s (replace-regexp-in-string "[[:space:]]+$" "" s))
> (s (replace-regexp-in-string " " "-" s))
> (s (if random (concat s "-" (number-to-string (random-number))) s)))
> s))
>
> (string-slug "Hello there, how are you?") ➜ "Hello-there-how-are-you"
>
> From:
>
> GNU Emacs package: rcd-utilities.el:
> https://gnu.support/gnu-emacs/packages/rcd-utilities-el.html
Thanks, though - as Eli said - this is very specific. Most of these
letters have 0 probability of appearing in _my_ files, and most letters
from (Polish alphabet \setminus English alphabet) are not there
anyway...
Best,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-26 18:32 UTC (permalink / raw)
To: Yuri Khan; +Cc: Help Gnu Emacs mailing list
On 2023-04-26, at 07:42, Yuri Khan <yuri.v.khan@gmail.com> wrote:
> On Wed, 26 Apr 2023 at 10:56, Marcin Borkowski <mbork@mbork.pl> wrote:
>
>> given an arbitrary string, say "Hello, world!!!", I want to have
>> a filename with all the runs of weird characters (that is,
>> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
>> there a function for that in Emacs already or should I write my own?
>
> It looks like you want to generate file names for blog posts or
> articles so that URLs look pretty. While a useful goal, the mechanic
Nice try;-), but no, my use case is different.
> you ask for is lossy, so multiple titles could map into the same file
> name, which, in the worst case, could lead to data loss, or require
> disambiguation of some kind.
I am well aware of that - I don't care about lossiness, and I have ways
to combat ambiguity (the "slug" would only be one of several parts of
the filename).
> (Also, depending on how wide your audience is and what your
> implementation’s definition of alphanumeric is, it might turn titles
> written in non-Latin-based scripts into an empty string, and fixing
> that might be moderately easy for some scripts (e.g. Greek, Cyrillic)
> and hard for others (e.g. CJK).)
My audience is exactly one person, so again - not a problem. (Though
I /will/ blog about it at some point, and then it's possible that
someone might have to adapt the code.)
Best,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-26 18:32 UTC (permalink / raw)
To: Platon Pronko; +Cc: Help Gnu Emacs mailing list
On 2023-04-26, at 07:39, Platon Pronko <platon7pronko@gmail.com> wrote:
> On 2023-04-26 12:42, Marcin Borkowski wrote:
>> On 2023-04-26, at 06:03, Platon Pronko <platon7pronko@gmail.com>
>> wrote:
>>
>>> On 2023-04-26 11:55, Marcin Borkowski wrote:
>>>> Hi all,
>>>> given an arbitrary string, say "Hello, world!!!", I want to have
>>>> a filename with all the runs of weird characters (that is,
>>>> non-alphanumeric ones) converted to dashes (say, "Hello-world"). Is
>>>> there a function for that in Emacs already or should I write my own?
>>>
>>> Something like this?
>>>
>>> (let ((input "Hello, world!!!"))
>>> (replace-regexp-in-string "[^a-zA-Z0-9]+" "-" input))
>>> ;; "Hello-world-"
>> More or less, possibly with trimming - I know it's simple, I just
>> didn't
>> want to do it if it already exists. Which I still don't know... (I
>> suppose not, but I'm not sure.)
>
> The term you are looking for is "slug" ("slugify").
Thanks!
> Quick google search indicates that there's nothing built-in, but there might be some implementations inside other packages. Here's a reference to an implementation in org-roam package: https://mailb.org/pipermail/emacs-berlin/2022/000897.html
>
> Also might be of interest:
> https://github.com/masasam/emacs-easy-hugo/issues/64
> https://melpa.org/#/unidecode
Thanks, too!
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Emanuel Berg @ 2023-04-26 21:29 UTC (permalink / raw)
To: help-gnu-emacs
Marcin Borkowski wrote:
> given an arbitrary string, say "Hello, world!!!", I want to
> have a filename with all the runs of weird characters (that
> is, non-alphanumeric ones) converted to dashes (say,
> "Hello-world"). Is there a function for that in Emacs
> already or should I write my own?
There are functions to do this to strings; what comes to mind
is `replace-regexp-in-string'. But what difference do you see
between filenames and strings that look like filenames?
To me it would be enough to downcase all chars, convert chars
with various decorations to their ASCII base equivalents (e.g.
our Swedish å, ä, and ö to a, a, and o - not that one should
use non-English for filenames so a bad example, but sometimes
English has such chars as well), then use the dash char as the
word delimiter, and after that probably just drop remaining
non-alphanumerics.
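Those steps can be sketched roughly as follows. The name `my-slugify'
is made up, and the accent removal assumes ucs-normalize and its
variable ucs-normalize-combining-chars-regexp, which ship with Emacs:

```elisp
(require 'ucs-normalize)

;; Sketch only: `my-slugify' is a hypothetical name, not an Emacs built-in.
(defun my-slugify (string)
  "Return a lower-case, dash-separated ASCII slug for STRING."
  (let* ((s (downcase string))
         ;; Decompose accented letters, then delete the combining marks.
         (s (replace-regexp-in-string ucs-normalize-combining-chars-regexp ""
                                      (ucs-normalize-NFKD-string s)))
         ;; Runs of whitespace become the word delimiter.
         (s (replace-regexp-in-string "[ \t\n\r]+" "-" s))
         ;; Drop everything that is not an ASCII letter, digit or dash.
         (s (replace-regexp-in-string "[^a-z0-9-]" "" s))
         ;; Collapse repeated dashes.
         (s (replace-regexp-in-string "-+" "-" s)))
    ;; Trim a dash left at either end.
    (replace-regexp-in-string "\\`-\\|-\\'" "" s)))

(my-slugify "Hello, world!!!")
;; "hello-world"
```

Letters that have no Unicode decomposition are simply dropped by the
third step, so this is lossy for anything beyond accented Latin.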
--
underground experts united
https://dataswamp.org/~incal
* Re: How to convert an arbitrary string into a filename
From: Emanuel Berg @ 2023-04-26 21:32 UTC (permalink / raw)
To: help-gnu-emacs
Marcin Borkowski wrote:
> Is there a function for that in Emacs already or should
> I write my own?
There should be one, and then you could pass arguments to
convey your style, so that Elisp people
would-get-filenames-like-this.png and Java people likeThis.png
and C people lk_ths.png :)
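A rough sketch of what such a style argument could look like. The
function name and the STYLE symbols are invented for illustration,
and dropping vowels for the C people is left out:

```elisp
;; Sketch only: the function name and STYLE symbols are hypothetical.
(defun my-words-to-name (string &optional style)
  "Join the ASCII alphanumeric words of STRING according to STYLE.
STYLE is one of the symbols `kebab' (the default), `snake' or `camel'."
  (let ((words (split-string string "[^a-zA-Z0-9]+" t)))
    (cond
     ((null words) "")
     ((eq style 'snake) (mapconcat #'downcase words "_"))
     ;; First word lower-case, following words capitalized.
     ((eq style 'camel) (concat (downcase (car words))
                                (mapconcat #'capitalize (cdr words) "")))
     (t (mapconcat #'downcase words "-")))))

(my-words-to-name "Would get filenames, like this!")
;; "would-get-filenames-like-this"
(my-words-to-name "like this" 'camel)
;; "likeThis"
(my-words-to-name "like this" 'snake)
;; "like_this"
```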
--
underground experts united
https://dataswamp.org/~incal
* Re: How to convert an arbitrary string into a filename
From: Jean Louis @ 2023-04-27 4:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: help-gnu-emacs
* Eli Zaretskii <eliz@gnu.org> [2023-04-26 16:09]:
> If you need to convert an accented character to its base character
> (i.e. "remove" the accent), Emacs has much more general facilities:
>
> (require 'ucs-normalize)
> (substring (ucs-normalize-NFKD-string "Ć") 0 1)
> => "C"
Alright, then like this:
(defun string-slug (s &optional random)
  "Return slug for Website Revision System by using string S.

RANDOM number may be added on the end."
  (let* ((random (or random nil))
         ;; (case-fold-search t)
         (s (replace-regexp-in-string "[^[:word:]]" " " s))
         (s (replace-regexp-in-string " +" " " s))
         (s (substring (ucs-normalize-NFKD-string s) 0 1))
         (s (replace-regexp-in-string "^[[:space:]]+" "" s))
         (s (replace-regexp-in-string "[[:space:]]+$" "" s))
         (s (replace-regexp-in-string " " "-" s))
         (s (if random (concat s "-" (number-to-string (random-number))) s)))
    s))
(string-slug " OK, here, üößčć") ➜ ""
It doesn't give good result.
--
Jean
Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns
In support of Richard M. Stallman
https://stallmansupport.org/
* Re: How to convert an arbitrary string into a filename
From: Eli Zaretskii @ 2023-04-27 5:53 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Thu, 27 Apr 2023 07:52:55 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: help-gnu-emacs@gnu.org
>
> * Eli Zaretskii <eliz@gnu.org> [2023-04-26 16:09]:
> > If you need to convert an accented character to its base character
> > (i.e. "remove" the accent), Emacs has much more general facilities:
> >
> > (require 'ucs-normalize)
> > (substring (ucs-normalize-NFKD-string "Ć") 0 1)
> > => "C"
>
> Alright, then like this:
>
> (defun string-slug (s &optional random)
> "Return slug for Website Revision System by using string S.
>
> RANDOM number may be added on the end."
> (let* ((random (or random nil))
> ;; (case-fold-search t)
> (s (replace-regexp-in-string "[^[:word:]]" " " s))
> (s (replace-regexp-in-string " +" " " s))
> (s (substring (ucs-normalize-NFKD-string s) 0 1))
> (s (replace-regexp-in-string "^[[:space:]]+" "" s))
> (s (replace-regexp-in-string "[[:space:]]+$" "" s))
> (s (replace-regexp-in-string " " "-" s))
> (s (if random (concat s "-" (number-to-string (random-number))) s)))
> s))
>
> (string-slug " OK, here, üößčć") ➜ ""
>
> It doesn't give good result.
Of course. Because you didn't understand how to use
ucs-normalize-NFKD-string for your purposes. Please read its doc
string, and try to play with it, starting from the example I've shown.
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-27 6:49 UTC (permalink / raw)
To: Emanuel Berg; +Cc: help-gnu-emacs
On 2023-04-26, at 23:32, Emanuel Berg <incal@dataswamp.org> wrote:
> Marcin Borkowski wrote:
>
>> Is there a function for that in Emacs already or should
>> I write my own?
>
> There should be one, and then you could pass arguments to
> convey your style, so that Elisp people
> would-get-filenames-like-this.png and Java people likeThis.png
> and C people lk_ths.png :)
:-)
This is actually pretty funny. OTOH, I'd be afraid of scope creep...
Best,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Marcin Borkowski @ 2023-04-27 6:51 UTC (permalink / raw)
To: Emanuel Berg; +Cc: help-gnu-emacs
On 2023-04-26, at 23:29, Emanuel Berg <incal@dataswamp.org> wrote:
> Marcin Borkowski wrote:
>
>> given an arbitrary string, say "Hello, world!!!", I want to
>> have a filename with all the runs of weird characters (that
>> is, non-alphanumeric ones) converted to dashes (say,
>> "Hello-world"). Is there a function for that in Emacs
>> already or should I write my own?
>
> There are functions to do this to strings, what comes to mind
> is `replace-regexp-in-string', but what difference do you mean
> there are between filenames and strings that look like filenames?
Not sure if I wrote anything like that...
> To me it would be enough downcase all chars, convert chars
> with various decorations to their ASCII base equivalents (e.g.
> our Swedish å, ä, and ö to a, a, and o - not that one should
> use non-English for filenames so a bad example, but sometimes
> English has such chars as well), then use the dash char as the
> word delimiter, and after that probably just drop remaining
> non-alphanumerics.
Yeah, more or less. (Actually, I'm a bit on the fence about downcasing,
and I agree with the rest.)
Best,
--
Marcin Borkowski
http://mbork.pl
* Re: How to convert an arbitrary string into a filename
From: Platon Pronko @ 2023-04-27 8:06 UTC (permalink / raw)
To: Eli Zaretskii, help-gnu-emacs
On 2023-04-27 13:53, Eli Zaretskii wrote:
>> Date: Thu, 27 Apr 2023 07:52:55 +0300
>> From: Jean Louis <bugs@gnu.support>
>> Cc: help-gnu-emacs@gnu.org
>>
>> * Eli Zaretskii <eliz@gnu.org> [2023-04-26 16:09]:
>>> If you need to convert an accented character to its base character
>>> (i.e. "remove" the accent), Emacs has much more general facilities:
>>>
>>> (require 'ucs-normalize)
>>> (substring (ucs-normalize-NFKD-string "Ć") 0 1)
>>> => "C"
>>
>> Alright, then like this:
>>
>> (defun string-slug (s &optional random)
>> "Return slug for Website Revision System by using string S.
>>
>> RANDOM number may be added on the end."
>> (let* ((random (or random nil))
>> ;; (case-fold-search t)
>> (s (replace-regexp-in-string "[^[:word:]]" " " s))
>> (s (replace-regexp-in-string " +" " " s))
>> (s (substring (ucs-normalize-NFKD-string s) 0 1))
>> (s (replace-regexp-in-string "^[[:space:]]+" "" s))
>> (s (replace-regexp-in-string "[[:space:]]+$" "" s))
>> (s (replace-regexp-in-string " " "-" s))
>> (s (if random (concat s "-" (number-to-string (random-number))) s)))
>> s))
>>
>> (string-slug " OK, here, üößčć") ➜ ""
>>
>> It doesn't give good result.
>
> Of course. Because you didn't understand how to use
> ucs-normalize-NFKD-string for your purposes. Please read its doc
> string, and try to play with it, starting from the example I've shown.
>
I think something like this should work better:
(require 'ucs-normalize)
(replace-regexp-in-string ucs-normalize-combining-chars-regexp ""
                          (ucs-normalize-NFKD-string "Ć"))
The idea here is to replace precomposed codepoints with their
compatibility decomposition, so instead of the single codepoint "Ć"
(U+0106) you get "C" (U+0043) followed by the combining acute accent
(U+0301). Then you can regex-replace these combining characters and
get the clean string.
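Wrapped into a function and applied to a whole string, that could
look like the sketch below. The name `my-strip-diacritics' is made up;
the two ucs-normalize names are the ones used above:

```elisp
(require 'ucs-normalize)

;; Sketch only: `my-strip-diacritics' is a hypothetical name.
(defun my-strip-diacritics (string)
  "Return STRING with combining marks removed after NFKD decomposition."
  (replace-regexp-in-string ucs-normalize-combining-chars-regexp ""
                            (ucs-normalize-NFKD-string string)))

(my-strip-diacritics "Ćma, žaba")
;; "Cma, zaba"
(my-strip-diacritics "łódź")
;; "łodz"
```

Letters such as "ł", "đ" and "ß" have no Unicode decomposition, so
they pass through unchanged and still need an explicit mapping if
plain ASCII is required.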
* Re: How to convert an arbitrary string into a filename
From: James Thomas @ 2023-04-29 6:20 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list
Marcin Borkowski wrote:
> Is there a function for that in Emacs already
I don't think so, because the denote package, for example, uses a
custom function called denote-sluggify:
https://git.sr.ht/~protesilaos/denote/tree/main/item/denote.el
There's also something similar in the function `make-backup-file-name-1`
in files.el, for paths.