* How to check whether a character (or one-character string) is a letter?
From: Marcin Borkowski @ 2014-10-04 0:29 UTC
To: help-gnu-emacs@gnu.org

Hello,

this is a problem I have.

Assume that I have a character (taken from some string, which in turn is
copied from the buffer - so it need not be ASCII).  What is the best way
to check whether it is a letter within ASCII range?

The reason I'm asking is that I'm writing a function which converts an
arbitrary string to a valid (and nice) filename (e.g., only letters and
hyphens) - so basically I want to walk a string character by character
and convert any space to a hyphen and omit any other non-letter.  Am I
reinventing the wheel?

TIA,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University

* Re: How to check whether a character (or one-character string) is a letter?
From: Thorsten Jolitz @ 2014-10-04 1:38 UTC
To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> Hello,
>
> this is a problem I have.
>
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

there is something similar in PicoLisp:

#+BEGIN_SRC picolisp :results pp
(fold "abc-?/@ä-12 YZ #+ü")
#+END_SRC

#+results:
: "abcä12yzü"

but not quite ...

--
cheers,
Thorsten

* Fwd: How to check whether a character (or one-character string) is a letter?
From: John Mastro @ 2014-10-04 2:47 UTC
To: help-gnu-emacs@gnu.org

[I first sent this directly to Marcin in error - yeah, I use the email gateway]

Hi Marcin,

Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

There are a bunch of ways to do this, but one reasonable approach is to
use a regular expression. I think this will do what you want:

    (defun reasonable-filename (str)
      (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
             (str (replace-regexp-in-string "[^a-zA-Z-]" "" str)))
        str))

This is a variation which will also allow the result to contain numbers:

    (defun reasonable-filename (str)
      (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
             (str (replace-regexp-in-string "[^a-zA-Z0-9-]" "" str)))
        str))

To answer your question about identifying whether a character is an
ASCII letter, the key is that Emacs's characters are really "just"
integers. Wikipedia has some charts[1] that show the numbers associated
with the characters. The uppercase letters (65-90) and the lowercase
letters (97-122) each form a contiguous block, with six punctuation
characters in between, so we can use something like this:

    (defun ascii-letter-p (char)
      (and (characterp char)
           (or (and (>= char 65) (<= char 90))       ; A-Z
               (and (>= char 97) (<= char 122)))))   ; a-z

(Of course, this only works if it's really a character, as opposed to a
string of length one. If it's a string of length one you could either
"extract" the character with `aref' or use a regular expression
instead.)

Hope that helps.

[1] https://en.wikipedia.org/wiki/ASCII#ASCII_printable_code_chart

--
john

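For the one-character-string case mentioned at the end of the message above, a minimal sketch might look like the following. The names `my-ascii-letter-p', `my-ascii-letter-string-p' and `my-ascii-letter-string-regexp-p' are hypothetical and do not appear in the thread; the sketch assumes that "letter" means exactly A-Z and a-z, and it tests two separate ranges because code points 91 through 96 lie between Z and a.

```elisp
;; Hypothetical helper names, for illustration only.

(defun my-ascii-letter-p (char)
  "Return non-nil if CHAR is an ASCII letter (A-Z or a-z)."
  (and (characterp char)
       (or (and (>= char ?A) (<= char ?Z))
           (and (>= char ?a) (<= char ?z)))))

(defun my-ascii-letter-string-p (str)
  "Return non-nil if STR is a one-character string holding an ASCII letter.
The character is extracted with `aref'."
  (and (stringp str)
       (= (length str) 1)
       (my-ascii-letter-p (aref str 0))))

(defun my-ascii-letter-string-regexp-p (str)
  "Like `my-ascii-letter-string-p', but using a regular expression."
  ;; Bind `case-fold-search' to nil so that case folding cannot let a
  ;; non-ASCII character match one of the ranges.
  (let ((case-fold-search nil))
    (and (stringp str)
         (string-match-p "\\`[A-Za-z]\\'" str)
         t)))
```

Both string variants should reject the empty string and strings longer than one character; which one reads better is largely a matter of taste.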
* Re: Fwd: How to check whether a character (or one-character string) is a letter?
From: Marcin Borkowski @ 2014-10-05 0:11 UTC
To: help-gnu-emacs@gnu.org

On 2014-10-04, at 04:47, John Mastro wrote:

> [I first sent this directly to Marcin in error - yeah, I use the email gateway]
>
> Hi Marcin,
>
> Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
>> Assume that I have a character (taken from some string, which in turn is
>> copied from the buffer - so it need not be ASCII).  What is the best way
>> to check whether it is a letter within ASCII range?
>>
>> The reason I'm asking is that I'm writing a function which converts an
>> arbitrary string to a valid (and nice) filename (e.g., only letters and
>> hyphens) - so basically I want to walk a string character by character
>> and convert any space to a hyphen and omit any other non-letter.  Am I
>> reinventing the wheel?
>
> There are a bunch of ways to do this, but one reasonable approach is to
> use a regular expression. I think this will do what you want:
>
>     (defun reasonable-filename (str)
>       (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
>              (str (replace-regexp-in-string "[^a-zA-Z-]" "" str)))
>         str))

I think this is probably better than mapcar'ing through the string...

> This is a variation which will also allow the result to contain numbers:
>
>     (defun reasonable-filename (str)
>       (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
>              (str (replace-regexp-in-string "[^a-zA-Z0-9-]" "" str)))
>         str))

This I don't want, since in case of equal filenames, I want to
differentiate them by appending a number, and allowing digits might
break this.  But thanks anyway.

> To answer your question about identifying whether a character is an
> ASCII letter, the key is that Emacs's characters are really "just"
> integers. Wikipedia has some charts[1] that show the numbers associated
> with the characters. The uppercase letters (65-90) and the lowercase
> letters (97-122) each form a contiguous block, with six punctuation
> characters in between, so we can use something like this:
>
>     (defun ascii-letter-p (char)
>       (and (characterp char)
>            (or (and (>= char 65) (<= char 90))       ; A-Z
>                (and (>= char 97) (<= char 122)))))   ; a-z
>
> (Of course, this only works if it's really a character, as opposed to a
> string of length one. If it's a string of length one you could either
> "extract" the character with `aref' or use a regular expression
> instead.)
>
> Hope that helps.

Yes it does!

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University

* Re: How to check whether a character (or one-character string) is a letter?
From: Eric Abrahamsen @ 2014-10-04 2:58 UTC
To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> Hello,
>
> this is a problem I have.
>
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

The safest thing is probably still replace-regexp-in-string. You can use
the [:ascii:] character class to strip out anything that isn't ASCII,
with a regexp like "[^[:ascii:]]+". Messing with the string at the
character level shouldn't be necessary.

Eric

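As a rough illustration of the [:ascii:] suggestion above, here is a small sketch. The function name `my-strip-non-ascii' is hypothetical and not part of the thread.

```elisp
;; Hypothetical helper name, for illustration only.
(defun my-strip-non-ascii (str)
  "Return a copy of STR with every non-ASCII character removed."
  (replace-regexp-in-string "[^[:ascii:]]+" "" str))

;; (my-strip-non-ascii "naïve café 42!")  ; => "nave caf 42!"
```

Note that [:ascii:] covers every character in the 0-127 range, so digits, punctuation and control characters survive this pass. Getting down to letters and hyphens only would still need a second, narrower replacement such as "[^a-zA-Z-]".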
* Re: How to check whether a character (or one-character string) is a letter?
From: Yuri Khan @ 2014-10-04 4:08 UTC
To: Marcin Borkowski; +Cc: help-gnu-emacs@gnu.org

On Sat, Oct 4, 2014 at 7:29 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:

> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

What are your assumptions about input string arbitrariness, your
requirements about output filename niceness, and your requirements
about the properties of the mapping?

Because these may be in conflict.

For example, if you assume any arbitrary strings, want only
[-0-9A-Za-z_] characters, and want reasonably different strings to map
into different filenames, then you will end up having to preserve
non-nice characters as ugly character encodings (in the spirit of
urlencode, XML character references, or Punycode). Otherwise, whole
words or sentences in Russian, Japanese or Greek will map into an
empty filename.

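The collapse described above is easy to reproduce. The sketch below applies the letters-and-hyphens rule from the original question to a Cyrillic word, and then shows one possible guard that falls back to a caller-supplied name when no ASCII letter survives. The names `my-letters-and-hyphens' and `my-letters-and-hyphens-or-default', and the DEFAULT argument, are hypothetical; the fallback is only one way to handle the problem and is not something proposed in the thread.

```elisp
;; Hypothetical helper names, for illustration only.

(defun my-letters-and-hyphens (str)
  "Turn whitespace in STR into hyphens, then drop all but ASCII letters and hyphens."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "[^a-zA-Z-]" ""
     (replace-regexp-in-string "[ \t\n\r]" "-" str))))

(defun my-letters-and-hyphens-or-default (str default)
  "Like `my-letters-and-hyphens', but return DEFAULT if no letter survives."
  (let ((name (my-letters-and-hyphens str)))
    (if (string-match-p "[a-zA-Z]" name)
        name
      default)))

;; (my-letters-and-hyphens "Привет")      ; => ""
;; (my-letters-and-hyphens "Привет мир")  ; => "-"
```

A fallback name keeps the result non-empty, but distinct inputs still map to the same output, so it does not address the uniqueness concern raised in the message.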
* Re: How to check whether a character (or one-character string) is a letter?
From: Marcin Borkowski @ 2014-10-05 0:08 UTC
To: help-gnu-emacs@gnu.org

On 2014-10-04, at 06:08, Yuri Khan wrote:

> On Sat, Oct 4, 2014 at 7:29 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
>
>> The reason I'm asking is that I'm writing a function which converts an
>> arbitrary string to a valid (and nice) filename (e.g., only letters and
>> hyphens) - so basically I want to walk a string character by character
>> and convert any space to a hyphen and omit any other non-letter.  Am I
>> reinventing the wheel?
>
> What are your assumptions about input string arbitrariness, your
> requirements about output filename niceness, and your requirements
> about the properties of the mapping?
>
> Because these may be in conflict.
>
> For example, if you assume any arbitrary strings, want only
> [-0-9A-Za-z_] characters, and want reasonably different strings to map
> into different filenames, then you will end up having to preserve
> non-nice characters as ugly character encodings (in the spirit of
> urlencode, XML character references, or Punycode). Otherwise, whole
> words or sentences in Russian, Japanese or Greek will map into an
> empty filename.

Good point.  However, I intend to keep a list of filenames, and in case
some of them is already taken, append a number to it.  (This is an
extremely primitive hashing function, but it will suffice for my needs.)

Regards,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University

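The "keep a list and append a number" idea above could be sketched roughly as follows. The function name `my-unique-filename', the TAKEN argument, and the choice to start counting at 1 and append the digits directly are all assumptions made for illustration; the thread does not specify any of them.

```elisp
;; Hypothetical helper name and numbering scheme, for illustration only.
(defun my-unique-filename (name taken)
  "Return NAME if it is not in the list TAKEN, else NAME plus a number.
The number is the smallest positive integer N such that NAME
followed by N is not a member of TAKEN."
  (if (not (member name taken))
      name
    (let ((n 1))
      (while (member (format "%s%d" name n) taken)
        (setq n (1+ n)))
      (format "%s%d" name n))))

;; (my-unique-filename "notes" '("notes" "notes1"))  ; => "notes2"
```

Because the base names contain no digits, a numbered name can never coincide with the base name of a different input, which is presumably why digits were ruled out of the base name in the first place.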
* Re: How to check whether a character (or one-character string) is a letter?
From: Eli Zaretskii @ 2014-10-04 7:29 UTC
To: help-gnu-emacs

> From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
> Date: Sat, 04 Oct 2014 02:29:24 +0200
>
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?

One way is this:

    (= 1 (string-bytes (char-to-string ch)))

where 'ch' is the character you want to test.

Note that this will pass any 7-bit ASCII character, including control
characters, digits, and punctuation, not just "letters".

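Wrapped in a function, the test above might look like the sketch below. The name `my-ascii-char-p' is hypothetical. The checks in the comments illustrate the caveat from the message: digits and control characters pass, while a non-ASCII letter does not.

```elisp
;; Hypothetical helper name, for illustration only.
(defun my-ascii-char-p (ch)
  "Return non-nil if the character CH is in the 7-bit ASCII range.
A character takes exactly one byte in Emacs's internal
representation only when it is ASCII."
  (= 1 (string-bytes (char-to-string ch))))

;; (my-ascii-char-p ?a)    ; => t
;; (my-ascii-char-p ?7)    ; => t   (a digit, not a letter)
;; (my-ascii-char-p ?\n)   ; => t   (a control character)
;; (my-ascii-char-p ?ä)    ; => nil
```

To restrict the result to letters, this test would have to be combined with an explicit range check on A-Z and a-z.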