unofficial mirror of help-gnu-emacs@gnu.org
* national to ASCII character conversion
@ 2009-06-17 20:34 Michal
  2009-06-17 20:56 ` Drew Adams
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Michal @ 2009-06-17 20:34 UTC (permalink / raw)
  To: help-gnu-emacs

Hello Group Members

I have a string which contains national characters and I would like to
convert them to ASCII equivalents.

for example:
ó -> o
ł -> l
ż -> z

etc.

One way is simply to call:
(replace-regexp-in-string "ó" "o" my_string)

once for each national character, but I believe there is a much better
way.

Maybe you know it?

Best regards
Michal




* RE: national to ASCII character conversion
  2009-06-17 20:34 national to ASCII character conversion Michal
@ 2009-06-17 20:56 ` Drew Adams
  2009-06-17 21:26 ` Teemu Likonen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Drew Adams @ 2009-06-17 20:56 UTC (permalink / raw)
  To: 'Michal', help-gnu-emacs

> I have a string which contains national characters and I would like to
> convert them to ASCII equivalents. for example: ó -> o ł -> l ż -> z
>
> one of ways is just using: (replace-regexp-in-string "ó" "o" my_string)
> for each national character, but I believe that there is much better way.

I'm sure there is a good way, but I don't know it. (I don't use such chars
much anymore.)

If someone doesn't offer a good way (which I doubt), then you can at least do
what I used to do:
http://www.emacswiki.org/emacs/unaccent.el.

If there is no good way currently (which I also doubt), then perhaps someone
will code something up.

When I used to do this a lot, I found it handy to have commands that do it for
one or more words (`unaccent-word') and for the region (`unaccent-region'). I
bind those to M-" and C-x ", respectively.
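
A possible init-file fragment for those bindings, as a sketch only. It assumes
unaccent.el has been saved somewhere on `load-path'; loading it by file name
with `load' is an assumption made here, not something taken from the library's
own documentation.

```elisp
;; Assumption: unaccent.el (from the EmacsWiki link above) is on `load-path'.
;; The second argument t makes `load' return nil instead of signaling
;; an error when the file cannot be found.
(when (load "unaccent" t)
  ;; Command names and keys as described in the message above.
  (global-set-key (kbd "M-\"") 'unaccent-word)
  (global-set-key (kbd "C-x \"") 'unaccent-region))
```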






* Re: national to ASCII character conversion
  2009-06-17 20:34 national to ASCII character conversion Michal
  2009-06-17 20:56 ` Drew Adams
@ 2009-06-17 21:26 ` Teemu Likonen
  2009-06-17 22:09   ` Teemu Likonen
  2009-06-17 21:57 ` Xah Lee
       [not found] ` <mailman.819.1245272178.2239.help-gnu-emacs@gnu.org>
  3 siblings, 1 reply; 7+ messages in thread
From: Teemu Likonen @ 2009-06-17 21:26 UTC (permalink / raw)
  To: help-gnu-emacs

On 2009-06-17 22:34 (+0200), Michal wrote:

> I have a string which contains national characters and I would like to
> convert them to ASCII equivalents.
>
> for example:
> ó -> o
> ł -> l
> ż -> z

I can't think of any simple and elegant way, but using the external tool
"iconv" is one solution. Put your string into a temporary buffer and
filter the buffer's content through iconv. A shell example:

    $ echo ółż | iconv -t ASCII//TRANSLIT
    olz

In Lisp it would be something like this:

    (setq my-string "ółż")

    (with-temp-buffer
      (insert my-string)
      (shell-command-on-region (point-min) (point-max)
                               "/usr/bin/iconv -t ASCII//TRANSLIT"
                               (current-buffer) t)
      (setq my-string
            (buffer-substring-no-properties (point-min) (point-max))))
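
For comparison, here is a sketch that stays inside Emacs instead of calling an
external program. It assumes an Emacs that ships ucs-normalize.el: the string
is decomposed (NFD) so that "ó" becomes "o" plus a combining accent, and the
combining marks are then dropped. Letters such as "ł" have no such
decomposition, so they need an explicit table. The names
`my-unaccent-extra-map' and `my-unaccent-string' are hypothetical, and the
table is deliberately incomplete.

```elisp
(require 'ucs-normalize)

;; Hypothetical table for letters that have no canonical decomposition
;; (the stroke in ł is not a combining mark); extend as needed.
(defvar my-unaccent-extra-map
  '((?ł . "l") (?Ł . "L") (?ø . "o") (?Ø . "O") (?đ . "d") (?Đ . "D"))
  "Alist mapping non-decomposable characters to ASCII strings.")

(defun my-unaccent-string (string)
  "Return STRING with combining marks stripped after NFD normalization.
Characters listed in `my-unaccent-extra-map' are replaced from the table.
Any other non-ASCII character is left unchanged."
  (let ((decomposed (ucs-normalize-NFD-string string)))
    (mapconcat
     (lambda (ch)
       (cond
        ;; Drop combining diacritical marks (U+0300 to U+036F).
        ((and (>= ch #x300) (<= ch #x36f)) "")
        ;; Use the explicit table when the character is listed there.
        ((cdr (assq ch my-unaccent-extra-map)))
        (t (char-to-string ch))))
     decomposed "")))

(my-unaccent-string "ółż")   ; ⇒ "olz"
```

Unlike iconv's transliteration, this sketch does not touch symbols such as
"€"; they pass through unchanged.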



* Re: national to ASCII character conversion
  2009-06-17 20:34 national to ASCII character conversion Michal
  2009-06-17 20:56 ` Drew Adams
  2009-06-17 21:26 ` Teemu Likonen
@ 2009-06-17 21:57 ` Xah Lee
       [not found] ` <mailman.819.1245272178.2239.help-gnu-emacs@gnu.org>
  3 siblings, 0 replies; 7+ messages in thread
From: Xah Lee @ 2009-06-17 21:57 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 17, 1:34 pm, Michal <rabbi...@tenbit.pl> wrote:
> Hallo Group Members
>
> I have a string which contains national characters and I would like to
> convert them to ASCII equivalents.
>
> for example:
> ó -> o
> ł -> l
> ż -> z
>
> etc.
>
> one of ways is just using:
> (replace-regexp-in-string "ó" "o" my_string)
>
> for each national character, but I believe that there is much better
> way.

Here's my solution. I frequently need to replace several pairs at once, for
many applications: e.g. HTML encoding, URL encoding, alpha beta gamma
to α β γ, Infinity and Pi to ∞ π, C-x to Ctrl+x and M- to Alt+,
text-replacement-based syntax transforms of basic math formulas between
langs, etc.

Here are the 4 basic functions I call in my other elisp programs.

(defun replace-pairs-in-string (str pairs)
  "Replace substrings of STR according to the find/replace PAIRS sequence.

Example:
 (replace-pairs-in-string \"abcdef\"
  '([\"a\" \"1\"] [\"b\" \"2\"] [\"c\" \"3\"]))  ⇒ \"123def\"

The pairs are applied one after another, so text produced by an
earlier pair may be replaced again by a later one.
Both the find string and the replacement are taken literally, and
the replacement is inserted without case conversion.

Whether the search ignores letter case depends on the variable
`case-fold-search'.  For a case-sensitive search, bind it to nil:
 (let ((case-fold-search nil)) (replace-pairs-in-string ...))

This function calls `replace-regexp-in-string' to do its work.

See also `replace-regexp-pairs-in-string'."
  (let ((mystr str))
    (mapc
     (lambda (x)
       ;; `regexp-quote' makes the find string match literally; the two
       ;; t arguments are FIXEDCASE and LITERAL.
       (setq mystr (replace-regexp-in-string
                    (regexp-quote (elt x 0))
                    (elt x 1) mystr t t)))
     pairs)
    mystr))

(defun replace-regexp-pairs-in-string (str pairs &optional fixedcase)
  "Replace matches in STR using the regex find/replace PAIRS sequence.

Form:
 (replace-regexp-pairs-in-string
  STR
  '([REGEX1 REPLACE1] [REGEX2 REPLACE2] ...)
  FIXEDCASE)

PAIRS can be any Lisp sequence type.  The pairs are applied one after
another, so text produced by an earlier pair may be matched again by
a later one.

If the third argument FIXEDCASE is non-nil, the replacement text is
inserted exactly as given.  If it is nil, Emacs may adjust the letter
case of the replacement to follow that of the matched text.

Whether the regex ignores letter case depends on the variable
`case-fold-search'.  For a case-sensitive match, bind it to nil:
 (let ((case-fold-search nil)) (replace-regexp-pairs-in-string ...))

This function calls `replace-regexp-in-string' to do its work.

See also `replace-pairs-in-string'."
  (let ((mystr str))
    (mapc
     (lambda (x) (setq mystr (replace-regexp-in-string
                              (elt x 0)
                              (elt x 1) mystr fixedcase)))
     pairs)
    mystr))

(defun replace-pairs-region (start end pairs)
  "Replace literal string find/replace PAIRS in the region START to END.

For detail, see `replace-pairs-in-string'."
  (let ((mystr (buffer-substring-no-properties start end)))
    ;; Go to START first so the new text lands where the old text was,
    ;; even when point is somewhere else in the buffer.
    (goto-char start)
    (delete-region start end)
    (insert (replace-pairs-in-string mystr pairs))))

(defun replace-regexp-pairs-region (start end pairs &optional fixedcase)
  "Replace regex find/replace PAIRS in the region START to END.

For detail, see `replace-regexp-pairs-in-string'."
  (let ((mystr (buffer-substring-no-properties start end)))
    (goto-char start)
    (delete-region start end)
    (insert (replace-regexp-pairs-in-string mystr pairs fixedcase))))


So I have about 10 other functions that do replacements of various
types by calling one of the above.
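
Applied to the original question, a character table can also be walked in a
single pass. The following is only a sketch, and it does not call the
functions above: the table `my-polish-to-ascii' and the function
`my-polish-to-ascii-string' are hypothetical names, and the table covers
Polish letters only.

```elisp
;; Hypothetical table covering only the Polish letters; not from the thread.
(defvar my-polish-to-ascii
  '((?ą . ?a) (?ć . ?c) (?ę . ?e) (?ł . ?l) (?ń . ?n)
    (?ó . ?o) (?ś . ?s) (?ź . ?z) (?ż . ?z)
    (?Ą . ?A) (?Ć . ?C) (?Ę . ?E) (?Ł . ?L) (?Ń . ?N)
    (?Ó . ?O) (?Ś . ?S) (?Ź . ?Z) (?Ż . ?Z))
  "Alist mapping Polish letters to their closest ASCII letters.")

(defun my-polish-to-ascii-string (string)
  "Return a copy of STRING with Polish letters replaced by ASCII ones.
Each character is looked up once, so a replacement is never replaced again."
  (concat
   ;; `mapcar' over a string yields a list of characters, which
   ;; `concat' turns back into a string.
   (mapcar (lambda (ch) (or (cdr (assq ch my-polish-to-ascii)) ch))
           string)))

(my-polish-to-ascii-string "Zażółć gęślą jaźń")   ; ⇒ "Zazolc gesla jazn"
```

The same mapping could be written as pairs for `replace-pairs-in-string',
e.g. '(["ó" "o"] ["ł" "l"] ["ż" "z"]), at the cost of one pass over the
string per pair.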

If all you want is just something one-time, as opposed to the above
more general functions, please see:


Q: How to replace “&” by “&amp;” in a region?

Place the following in your emacs init file:

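(defun replace-string-pairs-region (start end mylist)
  "Replace string pairs in region.
MYLIST is a list of (FIND REPLACEMENT) string pairs, applied in order."
  (save-restriction
    (narrow-to-region start end)
    (mapc
     (lambda (arg)
       (goto-char (point-min))
       (while (search-forward (car arg) nil t)
         ;; The two t arguments are FIXEDCASE and LITERAL: insert the
         ;; replacement exactly as given.
         (replace-match (cadr arg) t t)))
     mylist)))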

(defun replace-html-chars (start end)
  "Replace “<” by “&lt;” and other HTML chars that need encoding."
  (interactive "r")
  (replace-string-pairs-region start end
                               '(("&" "&amp;")
                                 ("<" "&lt;")
                                 (">" "&gt;"))))

With the above code, you can select a region, then press “Alt+x
replace-html-chars”, and have all “&”, “>”, “<” replaced by their
encoded entity. You can define a keyboard shortcut for easy operation.

You can also use the code to replace some HTML entities by their
actual unicode characters. For example:

&ldquo;    →    “
&rdquo;    →    ”
&eacute;   →    é
&copy;     →    ©

->         →    →
=>         →    ⇒
Pi         →    π
Infinity   →    ∞

This makes the HTML source code more elegant and readable. (You need
to declare your charset as one of the Unicode encodings. See Character
Sets and Encoding in HTML)
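
A sketch of that reverse direction for strings, limited to the four entities
listed above. The names `my-html-entity-table' and `my-decode-html-entities'
are hypothetical. It makes a single pass with one regex built by `regexp-opt'
and looks each match up in the table, so entities not in the table are left
alone.

```elisp
;; Hypothetical table; only the four entities mentioned above.
(defvar my-html-entity-table
  '(("&ldquo;" . "“") ("&rdquo;" . "”") ("&eacute;" . "é") ("&copy;" . "©"))
  "Alist mapping HTML entities to the characters they stand for.")

(defun my-decode-html-entities (string)
  "Return STRING with the entities in `my-html-entity-table' decoded."
  ;; Entity names are case sensitive, so search case-sensitively.
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     (regexp-opt (mapcar #'car my-html-entity-table))
     ;; The function receives the matched entity text as its argument.
     (lambda (entity) (cdr (assoc entity my-html-entity-table)))
     string
     t      ; FIXEDCASE: do not adjust letter case
     t)))   ; LITERAL: insert the replacement as is

(my-decode-html-entities "&copy; 2009 caf&eacute;")   ; ⇒ "© 2009 café"
```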

above from:

• Emacs and HTML Tips
  http://xahlee.org/emacs/emacs_html.html

Emacs string replacement can be tricky, due to Emacs's feature of
smartly choosing letter case plus a weird way to toggle it (usually
using a global var instead of a function parameter, but not always
because ...). Emacs regex can also be tricky because it differs from
Perl or POSIX and in particular has a severe toothpick syndrome that
basically doesn't exist in any other lang. For some detail, see:

• Find and Replace with Emacs
  http://xahlee.org/emacs/emacs_find_replace.html

• Text Pattern Matching in Emacs
  http://xahlee.org/emacs/emacs_regex.html
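
A few small expressions that illustrate both points, as a sketch to evaluate
one at a time (for example with M-: or in the *scratch* buffer). The sample
strings are made up for illustration.

```elisp
;; Case folding: with `case-fold-search' non-nil the search ignores case,
;; and with FIXEDCASE nil the replacement follows the case of each match.
(let ((case-fold-search t))
  (replace-regexp-in-string "foo" "bar" "FOO Foo foo"))     ; ⇒ "BAR Bar bar"

;; FIXEDCASE non-nil: the replacement is inserted exactly as written.
(let ((case-fold-search t))
  (replace-regexp-in-string "foo" "bar" "FOO Foo foo" t))   ; ⇒ "bar bar bar"

;; Case-sensitive search: only the lower-case occurrence matches.
(let ((case-fold-search nil))
  (replace-regexp-in-string "foo" "bar" "FOO Foo foo" t))   ; ⇒ "FOO Foo bar"

;; Toothpicks: grouping and alternation are written \( \| \) in the regex,
;; and every backslash is doubled inside a Lisp string.
(string-match-p "\\(foo\\|bar\\)baz" "barbaz")              ; ⇒ 0

;; `regexp-quote' escapes special characters for a literal match.
(regexp-quote "a.b*")                                       ; ⇒ "a\\.b\\*"
```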

  Xah
∑ http://xahlee.org/

* Re: national to ASCII character conversion
  2009-06-17 21:26 ` Teemu Likonen
@ 2009-06-17 22:09   ` Teemu Likonen
  0 siblings, 0 replies; 7+ messages in thread
From: Teemu Likonen @ 2009-06-17 22:09 UTC (permalink / raw)
  To: help-gnu-emacs

On 2009-06-17 21:26 (UTC), Teemu Likonen wrote:

> On 2009-06-17 22:34 (+0200), Michal wrote:
>> I have a string which contains national characters and I would like
>> to convert them to ASCII equivalents.

> In Lisp it would be something like this:

Ok, let's turn it into a function:

    (defun my-ascii-translit (string)
      "Turn STRING to Ascii and transliterate non-Ascii chars."
      (with-temp-buffer
        (insert string)
        (shell-command-on-region (point-min) (point-max)
                                 "/usr/bin/iconv -t ASCII//TRANSLIT"
                                 (current-buffer) t)
        (buffer-substring-no-properties (point-min) (point-max))))

The function returns asciified version of its argument string:

    (my-ascii-translit "ółż € —")

    => "olz EUR --"
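
A variant sketch that avoids both the shell and the hard-coded
/usr/bin/iconv path by using `executable-find' and `call-process-region'.
The function name `my-ascii-translit-no-shell' is hypothetical, and passing
"-f UTF-8" together with the UTF-8 coding-system bindings is an assumption
about the desired encoding. The //TRANSLIT suffix is an extension that not
every iconv implementation supports, so the exact output may differ between
systems.

```elisp
(defun my-ascii-translit-no-shell (string)
  "Transliterate STRING to ASCII by piping it through iconv.
Signal an error if iconv is missing or exits with a non-zero status."
  (let ((iconv (or (executable-find "iconv")
                   (error "Cannot find iconv on `exec-path'")))
        ;; Assumption: talk to iconv in UTF-8 in both directions.
        (coding-system-for-write 'utf-8)
        (coding-system-for-read 'utf-8))
    (with-temp-buffer
      (insert string)
      ;; DELETE = t and BUFFER = t: the buffer text is sent to iconv,
      ;; deleted, and replaced by the program's output.
      (let ((status (call-process-region (point-min) (point-max) iconv
                                         t t nil
                                         "-f" "UTF-8" "-t" "ASCII//TRANSLIT")))
        (unless (eq status 0)
          (error "iconv failed with status %s" status)))
      (buffer-string))))
```

Calling the program directly means the arguments are never parsed by a
shell, so no quoting is needed.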



* Re: national to ASCII character conversion
       [not found] ` <mailman.819.1245272178.2239.help-gnu-emacs@gnu.org>
@ 2009-06-17 23:02   ` Michal
  2009-06-17 23:42     ` Drew Adams
  0 siblings, 1 reply; 7+ messages in thread
From: Michal @ 2009-06-17 23:02 UTC (permalink / raw)
  To: help-gnu-emacs


Thank you, people, for all your answers!
Thanks to you I found the solution!
Best regards!
Michal



* RE: national to ASCII character conversion
  2009-06-17 23:02   ` Michal
@ 2009-06-17 23:42     ` Drew Adams
  0 siblings, 0 replies; 7+ messages in thread
From: Drew Adams @ 2009-06-17 23:42 UTC (permalink / raw)
  To: 'Michal', help-gnu-emacs

> I found the solution!!!

And it is...?






