all messages for Emacs-related lists mirrored at yhetil.org
* How to check whether a character (or one-character string) is a letter?
@ 2014-10-04  0:29 Marcin Borkowski
  2014-10-04  1:38 ` Thorsten Jolitz
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Marcin Borkowski @ 2014-10-04  0:29 UTC
  To: help-gnu-emacs@gnu.org

Hello,

this is a problem I have.

Assume that I have a character (taken from some string, which in turn is
copied from the buffer - so it need not be ASCII).  What is the best way
to check whether it is a letter within ASCII range?

The reason I'm asking is that I'm writing a function which converts an
arbitrary string to a valid (and nice) filename (e.g., only letters and
hyphens) - so basically I want to walk a string character by character
and convert any space to a hyphen and omit any other non-letter.  Am I
reinventing the wheel?
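
The character-by-character walk described above could be sketched roughly as follows. This assumes that only plain spaces become hyphens and that everything other than an ASCII letter is dropped; `my-filename-by-walking` is a hypothetical name, not an existing Emacs function.

```elisp
(defun my-filename-by-walking (str)
  "Return STR with spaces turned into hyphens and other non-letters dropped.
Hypothetical sketch: only ASCII letters A-Z and a-z are kept."
  (mapconcat (lambda (ch)
               (cond ((= ch ?\s) "-")                   ; space -> hyphen
                     ((or (<= ?a ch ?z) (<= ?A ch ?Z))  ; ASCII letter: keep
                      (char-to-string ch))
                     (t "")))                           ; anything else: drop
             str ""))
```

For example, (my-filename-by-walking "Hello, World!") returns "Hello-World".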

TIA,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University




* Re: How to check whether a character (or one-character string) is a letter?
  2014-10-04  0:29 How to check whether a character (or one-character string) is a letter? Marcin Borkowski
@ 2014-10-04  1:38 ` Thorsten Jolitz
       [not found] ` <CAOj2CQQsnNxtUPzPV8Vw_DgfGFXUUkZExHbArAu_zDjQn-prvw@mail.gmail.com>
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Thorsten Jolitz @ 2014-10-04  1:38 UTC
  To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

Hello,

> this is a problem I have.
>
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

there is something similar in PicoLisp:

#+BEGIN_SRC picolisp :results pp
 (fold "abc-?/@ä-12 YZ  #+ü")
#+END_SRC

#+results:
: "abcä12yzü"

but not quite ...
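
A rough Emacs Lisp approximation of the `fold` output shown above might downcase the string and then delete everything that is not a letter or digit. This is a sketch under assumptions: it relies on a recent Emacs, where `[:alnum:]` matches non-ASCII letters by their Unicode general category, `my-fold` is a hypothetical name, and it is not claimed to reproduce PicoLisp's `fold` in every case.

```elisp
(defun my-fold (str)
  "Downcase STR and delete every character that is not a letter or digit.
Hypothetical approximation of the PicoLisp `fold' example."
  ;; [:alnum:] also matches non-ASCII letters such as ä and ü.
  (replace-regexp-in-string "[^[:alnum:]]+" "" (downcase str)))
```

With the same input, (my-fold "abc-?/@ä-12 YZ  #+ü") gives "abcä12yzü", which, like the PicoLisp result, still keeps non-ASCII letters.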


-- 
cheers,
Thorsten





* Fwd: How to check whether a character (or one-character string) is a letter?
       [not found] ` <CAOj2CQQsnNxtUPzPV8Vw_DgfGFXUUkZExHbArAu_zDjQn-prvw@mail.gmail.com>
@ 2014-10-04  2:47   ` John Mastro
  2014-10-05  0:11     ` Marcin Borkowski
  0 siblings, 1 reply; 8+ messages in thread
From: John Mastro @ 2014-10-04  2:47 UTC
  To: help-gnu-emacs@gnu.org

[I first sent this directly to Marcin in error - yeah, I use the email gateway]

Hi Marcin,

Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

There are a bunch of ways to do this, but one reasonable approach is to
use a regular expression. I think this will do what you want:

    (defun reasonable-filename (str)
      (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
             (str (replace-regexp-in-string "[^a-zA-Z-]" "" str)))
        str))

This is a variation which will also allow the result to contain numbers:

    (defun reasonable-filename (str)
      (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
             (str (replace-regexp-in-string "[^a-zA-Z0-9-]" "" str)))
        str))
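
Both versions turn every single whitespace character into its own hyphen, so "a  b" becomes "a--b". One possible tweak, shown here as a hypothetical `my-reasonable-filename`, adds `+` to the whitespace class so that a run of blanks becomes a single hyphen:

```elisp
(defun my-reasonable-filename (str)
  "Turn STR into a filename made of ASCII letters and hyphens.
Each run of spaces, tabs or newlines becomes one hyphen; every other
character that is not an ASCII letter or a hyphen is then deleted."
  (replace-regexp-in-string
   "[^a-zA-Z-]" ""
   (replace-regexp-in-string "[ \t\n\r]+" "-" str)))
```

For example, (my-reasonable-filename "Hello,  big\tWorld!") returns "Hello-big-World".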

To answer your question about identifying whether a character is an
ASCII letter, the key is that Emacs's characters are really "just"
integers. Wikipedia has some charts[1] that show the numbers associated
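with the characters. The letters sit in two contiguous ranges, 65-90
(A-Z) and 97-122 (a-z); the codes 91-96 in between are punctuation
([ \ ] ^ _ `), so both ranges have to be checked separately:

    (defun ascii-letter-p (char)
      (and (characterp char)
           (or (<= ?A char ?Z)      ; 65-90
               (<= ?a char ?z))))   ; 97-122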

(Of course, this only works if it's really a character, as opposed to a
string of length one. If it's a string of length one you could either
"extract" the character with `aref' or use a regular expression
instead.)
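
For the length-one-string case, a minimal sketch could either extract the character with `aref` or test the whole string with an anchored regexp. All names below are hypothetical, and the letter test deliberately checks the two ranges ?A..?Z and ?a..?z, since the codes 91 to 96 between them are punctuation:

```elisp
(defun my-ascii-letter-p (char)
  "Return non-nil if CHAR is an ASCII letter (A-Z or a-z)."
  (and (characterp char)
       (or (<= ?A char ?Z)
           (<= ?a char ?z))))

(defun my-ascii-letter-string-p (str)
  "Return non-nil if STR is a one-character string holding an ASCII letter."
  (and (stringp str)
       (= (length str) 1)
       (my-ascii-letter-p (aref str 0))))   ; extract the char with `aref'

(defun my-ascii-letter-string-re-p (str)
  "Regexp variant of `my-ascii-letter-string-p'."
  ;; \\` and \\' anchor at the string boundaries; `case-fold-search' is
  ;; bound to nil so the result does not depend on the caller's setting.
  (let ((case-fold-search nil))
    (and (string-match-p "\\`[A-Za-z]\\'" str) t)))
```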

Hope that helps.

[1] https://en.wikipedia.org/wiki/ASCII#ASCII_printable_code_chart

--
john




* Re: How to check whether a character (or one-character string) is a letter?
  2014-10-04  0:29 How to check whether a character (or one-character string) is a letter? Marcin Borkowski
  2014-10-04  1:38 ` Thorsten Jolitz
       [not found] ` <CAOj2CQQsnNxtUPzPV8Vw_DgfGFXUUkZExHbArAu_zDjQn-prvw@mail.gmail.com>
@ 2014-10-04  2:58 ` Eric Abrahamsen
  2014-10-04  4:08 ` Yuri Khan
  2014-10-04  7:29 ` Eli Zaretskii
  4 siblings, 0 replies; 8+ messages in thread
From: Eric Abrahamsen @ 2014-10-04  2:58 UTC
  To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> Hello,
>
> this is a problem I have.
>
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?
>
> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

The safest thing is probably still replace-regexp-in-string. You can use
the [:ascii:] character class to strip out anything that isn't ASCII,
with a regexp like "[^[:ascii:]]+". Messing with the string at the
character level shouldn't be necessary.
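
A minimal sketch of that suggestion, using the regexp quoted above; `my-strip-non-ascii` is a hypothetical name. Note that it only removes non-ASCII characters, so ASCII punctuation and digits survive and would still need a second pass such as the `[^a-zA-Z-]` replacement from the earlier reply:

```elisp
(defun my-strip-non-ascii (str)
  "Delete every run of non-ASCII characters from STR."
  (replace-regexp-in-string "[^[:ascii:]]+" "" str))
```

For example, (my-strip-non-ascii "abc-?/@ä-12 YZ  #+ü") returns "abc-?/@-12 YZ  #+".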

Eric





* Re: How to check whether a character (or one-character string) is a letter?
  2014-10-04  0:29 How to check whether a character (or one-character string) is a letter? Marcin Borkowski
                   ` (2 preceding siblings ...)
  2014-10-04  2:58 ` Eric Abrahamsen
@ 2014-10-04  4:08 ` Yuri Khan
  2014-10-05  0:08   ` Marcin Borkowski
  2014-10-04  7:29 ` Eli Zaretskii
  4 siblings, 1 reply; 8+ messages in thread
From: Yuri Khan @ 2014-10-04  4:08 UTC
  To: Marcin Borkowski; +Cc: help-gnu-emacs@gnu.org

On Sat, Oct 4, 2014 at 7:29 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:

> The reason I'm asking is that I'm writing a function which converts an
> arbitrary string to a valid (and nice) filename (e.g., only letters and
> hyphens) - so basically I want to walk a string character by character
> and convert any space to a hyphen and omit any other non-letter.  Am I
> reinventing the wheel?

What are your assumptions about input string arbitrariness, your
requirements about output filename niceness, and your requirements
about the properties of the mapping?

Because these may be in conflict.

For example, if you assume any arbitrary strings, want only
[-0-9A-Za-z_] characters, and want reasonably different strings to map
into different filenames, then you will end up having to preserve
non-nice characters as ugly character encodings (in the spirit of
urlencode, XML character references, or Punycode). Otherwise, whole
words or sentences in Russian, Japanese or Greek will map into an
empty filename.
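
To make the idea concrete, here is one hypothetical escaping scheme in the spirit of urlencode: characters in [-0-9A-Za-z] are kept, and every other character, including `_` itself, becomes `_`, its code point in hex, and another `_`. Because a literal `_` never survives unescaped, different inputs always give different outputs. The scheme and the name `my-escape-filename` are assumptions for illustration, not a standard:

```elisp
(defun my-escape-filename (str)
  "Hypothetical injective filename encoding of STR.
Characters in [-0-9A-Za-z] are kept as-is; any other character C,
including `_', becomes \"_HEX_\" where HEX is C's code point in hex."
  (mapconcat (lambda (ch)
               (if (or (<= ?a ch ?z) (<= ?A ch ?Z) (<= ?0 ch ?9) (= ch ?-))
                   (char-to-string ch)
                 (format "_%x_" ch)))
             str ""))
```

With this, a Russian word such as "Мир" maps to "_41c__438__440_" instead of an empty filename.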




* Re: How to check whether a character (or one-character string) is a letter?
  2014-10-04  0:29 How to check whether a character (or one-character string) is a letter? Marcin Borkowski
                   ` (3 preceding siblings ...)
  2014-10-04  4:08 ` Yuri Khan
@ 2014-10-04  7:29 ` Eli Zaretskii
  4 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2014-10-04  7:29 UTC
  To: help-gnu-emacs

> From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
> Date: Sat, 04 Oct 2014 02:29:24 +0200
> 
> Assume that I have a character (taken from some string, which in turn is
> copied from the buffer - so it need not be ASCII).  What is the best way
> to check whether it is a letter within ASCII range?

One way is this:

  (= 1 (string-bytes (char-to-string ch)))

where 'ch' is the character you want to test.  Note that this will
pass any 7-bit ASCII character, including control characters, digits,
and punctuation, not just "letters".
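
Wrapped in a function and combined with a letter test that excludes the digits and punctuation mentioned above, this might look like the sketch below. Both names are hypothetical; `[:alpha:]` would also match non-ASCII letters, so it is safe here only because the one-byte check runs first:

```elisp
(defun my-ascii-char-p (ch)
  "Return non-nil if CH occupies a single byte in Emacs's internal encoding.
This is the ASCII test from the message above."
  (= 1 (string-bytes (char-to-string ch))))

(defun my-ascii-letter-via-bytes-p (ch)
  "Return non-nil if CH is an ASCII character that is also a letter."
  (and (my-ascii-char-p ch)
       (string-match-p "[[:alpha:]]" (char-to-string ch))
       t))
```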





* Re: How to check whether a character (or one-character string) is a letter?
  2014-10-04  4:08 ` Yuri Khan
@ 2014-10-05  0:08   ` Marcin Borkowski
  0 siblings, 0 replies; 8+ messages in thread
From: Marcin Borkowski @ 2014-10-05  0:08 UTC
  To: help-gnu-emacs@gnu.org


On 2014-10-04, at 06:08, Yuri Khan wrote:

> On Sat, Oct 4, 2014 at 7:29 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
>
>> The reason I'm asking is that I'm writing a function which converts an
>> arbitrary string to a valid (and nice) filename (e.g., only letters and
>> hyphens) - so basically I want to walk a string character by character
>> and convert any space to a hyphen and omit any other non-letter.  Am I
>> reinventing the wheel?
>
> What are your assumptions about input string arbitrariness, your
> requirements about output filename niceness, and your requirements
> about the properties of the mapping?
>
> Because these may be in conflict.
>
> For example, if you assume any arbitrary strings, want only
> [-0-9A-Za-z_] characters, and want reasonably different strings to map
> into different filenames, then you will end up having to preserve
> non-nice characters as ugly character encodings (in the spirit of
> urlencode, XML character references, or Punycode). Otherwise, whole
> words or sentences in Russian, Japanese or Greek will map into an
> empty filename.

Good point.  However, I intend to keep a list of filenames, and if a
name is already taken, append a number to it.  (This is an
extremely primitive hashing function, but it will suffice for my needs.)
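
A minimal sketch of that numbering scheme, assuming the names already in use are kept in a plain list of strings; the name `my-unique-filename` and the choice to start counting at 2 are assumptions:

```elisp
(defun my-unique-filename (base taken)
  "Return BASE, or BASE with a number appended, so it is not in TAKEN.
TAKEN is a list of filenames already in use."
  (if (not (member base taken))
      base
    (let ((n 2))
      ;; Try BASE2, BASE3, ... until a free name is found.
      (while (member (format "%s%d" base n) taken)
        (setq n (1+ n)))
      (format "%s%d" base n))))
```

Because the base names contain only letters and hyphens, any trailing digits can only come from this counter, which is why digits are kept out of the base names.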

Regards,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University




* Re: Fwd: How to check whether a character (or one-character string) is a letter?
  2014-10-04  2:47   ` Fwd: " John Mastro
@ 2014-10-05  0:11     ` Marcin Borkowski
  0 siblings, 0 replies; 8+ messages in thread
From: Marcin Borkowski @ 2014-10-05  0:11 UTC
  To: help-gnu-emacs@gnu.org


On 2014-10-04, at 04:47, John Mastro wrote:

> [I first sent this directly to Marcin in error - yeah, I use the email gateway]
>
> Hi Marcin,
>
> Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
>> Assume that I have a character (taken from some string, which in turn is
>> copied from the buffer - so it need not be ASCII).  What is the best way
>> to check whether it is a letter within ASCII range?
>>
>> The reason I'm asking is that I'm writing a function which converts an
>> arbitrary string to a valid (and nice) filename (e.g., only letters and
>> hyphens) - so basically I want to walk a string character by character
>> and convert any space to a hyphen and omit any other non-letter.  Am I
>> reinventing the wheel?
>
> There are a bunch of ways to do this, but one reasonable approach is to
> use a regular expression. I think this will do what you want:
>
>     (defun reasonable-filename (str)
>       (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
>              (str (replace-regexp-in-string "[^a-zA-Z-]" "" str)))
>         str))

I think this is probably better than mapcar'ing through the string...

> This is a variation which will also allow the result to contain numbers:
>
>     (defun reasonable-filename (str)
>       (let* ((str (replace-regexp-in-string "[ \t\n\r]" "-" str))
>              (str (replace-regexp-in-string "[^a-zA-Z0-9-]" "" str)))
>         str))

This I don't want, since in case of equal filenames, I want to
differentiate them by appending a number, and allowing digits might
break this.  But thanks anyway.

> To answer your question about identifying whether a character is an
> ASCII letter, the key is that Emacs's characters are really "just"
> integers. Wikipedia has some charts[1] that show the numbers associated
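> with the characters. The letters sit in two contiguous ranges, 65-90
> (A-Z) and 97-122 (a-z); the codes 91-96 in between are punctuation
> ([ \ ] ^ _ `), so both ranges have to be checked separately:
>
>     (defun ascii-letter-p (char)
>       (and (characterp char)
>            (or (<= ?A char ?Z)      ; 65-90
>                (<= ?a char ?z))))   ; 97-122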
>
> (Of course, this only works if it's really a character, as opposed to a
> string of length one. If it's a string of length one you could either
> "extract" the character with `aref' or use a regular expression
> instead.)
>
> Hope that helps.

Yes it does!

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University





Thread overview: 8+ messages
2014-10-04  0:29 How to check whether a character (or one-character string) is a letter? Marcin Borkowski
2014-10-04  1:38 ` Thorsten Jolitz
     [not found] ` <CAOj2CQQsnNxtUPzPV8Vw_DgfGFXUUkZExHbArAu_zDjQn-prvw@mail.gmail.com>
2014-10-04  2:47   ` Fwd: " John Mastro
2014-10-05  0:11     ` Marcin Borkowski
2014-10-04  2:58 ` Eric Abrahamsen
2014-10-04  4:08 ` Yuri Khan
2014-10-05  0:08   ` Marcin Borkowski
2014-10-04  7:29 ` Eli Zaretskii
