From: Xah Lee
Newsgroups: gmane.emacs.help
Subject: Re: national to ASCII character conversion
Date: Wed, 17 Jun 2009 14:57:48 -0700 (PDT)
Message-ID: <08c6029f-d776-42c8-b2cc-007fbfbe4c2d@z8g2000prd.googlegroups.com>
To: help-gnu-emacs@gnu.org

On Jun 17, 1:34 pm, Michal wrote:
> Hallo Group Members
>
> I have a string which contains national characters and I would like to
> convert them to ASCII equivalents.
>
> for example:
> ó -> o
> ł -> l
> ż -> z
>
> etc.
>
> one of ways is just using:
> (replace-regexp-in-string "ó" "o" my_string)
>
> for each national character, but I believe that there is much better
> way.

Here's my solution.

I frequently need to replace several pairs, for many applications,
e.g. HTML encoding, URL encoding, alpha beta gamma to α β γ, Infinity
and Pi to ∞ π, C-x to Ctrl+x and M- to Alt+, transforming basic math
formula syntax between languages by text replacement, etc.

Here are the 4 basic functions I call in my other elisp programs.

(defun replace-pairs-in-string (str pairs)
  "Replace literal strings in STR, one find/replace pair at a time.
PAIRS is a sequence of pairs, each pair a sequence [FIND REPLACE].

Example:
 (replace-pairs-in-string \"abcdef\" '([\"a\" \"1\"] [\"b\" \"2\"] [\"c\" \"3\"]))
 ⇒ \"123def\"

The pairs are applied in order, each to the result of the previous
one, so text inserted by an earlier pair may be replaced again by a
later pair.

FIND is matched literally, not as a regex, and REPLACE is inserted
literally with its case unchanged.  Matching follows
`case-fold-search'.  If you want the search to be case sensitive,
bind it to nil, like this:

 (let ((case-fold-search nil))
   (replace-pairs-in-string ...))

This function calls `replace-regexp-in-string' to do its work.

See also `replace-regexp-pairs-in-string'."
  (let ((mystr str))
    (mapc
     (lambda (x)
       ;; regexp-quote makes the find string literal; the two t
       ;; arguments are FIXEDCASE and LITERAL.
       (setq mystr (replace-regexp-in-string
                    (regexp-quote (elt x 0))
                    (elt x 1) mystr t t)))
     pairs)
    mystr))

(defun replace-regexp-pairs-in-string (str pairs &optional fixedcase)
  "Replace regex matches in STR, one find/replace pair at a time.

Form:
 (replace-regexp-pairs-in-string
  STR '([REGEX1 REPLACE1] [REGEX2 REPLACE2] ...) FIXEDCASE)

PAIRS can be any lisp sequence data type.  The pairs are applied in
order, each to the result of the previous one.

If the third argument FIXEDCASE is non-nil, the replacement text is
inserted with its case unchanged.  If it is nil, Emacs may adjust the
letter case of the replacement to match the case of the matched text.

If you want the regex to be case sensitive, bind the global variable
`case-fold-search' to nil, like this:

 (let ((case-fold-search nil))
   (replace-regexp-pairs-in-string ...))

This function calls `replace-regexp-in-string' to do its work.

See also `replace-pairs-in-string'."
  (let ((mystr str))
    (mapc
     (lambda (x)
       (setq mystr (replace-regexp-in-string
                    (elt x 0)
                    (elt x 1) mystr fixedcase)))
     pairs)
    mystr))

(defun replace-pairs-region (start end pairs)
  "Replace literal string find/replace PAIRS in the region START to END.
For detail, see `replace-pairs-in-string'."
  (let ((mystr (buffer-substring-no-properties start end)))
    (save-excursion
      (delete-region start end)
      ;; Insert at the region's position, not wherever point happens to be.
      (goto-char start)
      (insert (replace-pairs-in-string mystr pairs)))))

(defun replace-regexp-pairs-region (start end pairs &optional fixedcase)
  "Replace regex find/replace PAIRS in the region START to END.
For detail, see `replace-regexp-pairs-in-string'."
  (let ((mystr (buffer-substring-no-properties start end)))
    (save-excursion
      (delete-region start end)
      ;; Insert at the region's position, not wherever point happens to be.
      (goto-char start)
      (insert (replace-regexp-pairs-in-string mystr pairs fixedcase)))))

So I have about 10 other functions that do replacement of various
types by calling one of the above.

If all you want is just something one-time, as opposed to the above
more general functions, please see:

Q: How to replace “&” by “&amp;” in a region?
Place the following in your emacs init file:

(defun replace-string-pairs-region (start end mylist)
  "Replace string pairs in region.
MYLIST is a list of (FIND REPLACE) string pairs."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (mapc
       (lambda (arg)
         (goto-char (point-min))
         (while (search-forward (car arg) nil t)
           ;; t t = FIXEDCASE and LITERAL: insert the replacement as is.
           (replace-match (cadr arg) t t)))
       mylist))))

(defun replace-html-chars (start end)
  "Replace “<” by “&lt;” and other similar HTML chars that need to be encoded."
  (interactive "r")
  (replace-string-pairs-region start end
                               '(("&" "&amp;")
                                 ("<" "&lt;")
                                 (">" "&gt;"))))

With the above code, you can select a region, then press “Alt+x
replace-html-chars”, and have all “&”, “>”, “<” replaced by their
encoded entity. You can define a keyboard shortcut for easy operation.

You can also use the code to replace some HTML entities by their
actual unicode characters. For example:

 &ldquo; → “
 &rdquo; → ”
 &eacute; → é
 &copy; → ©
 -> → →
 => → ⇒
 Pi → π
 Infinity → ∞

This makes the HTML source code more elegant and readable. (You need
to declare your charset as one of the Unicode encodings. See Character
Sets and Encoding in HTML)

above from:
• Emacs and HTML Tips
  http://xahlee.org/emacs/emacs_html.html

Emacs string replacement can be tricky, due to Emacs's feature of
smartly choosing letter case plus a weird way to toggle it (usually
using a global var instead of a function parameter, but not always
because ...). Emacs regex can also be tricky because it's different
from Perl or POSIX, and in particular has severe toothpick syndrome
(piles of backslashes) that basically doesn't exist in any other lang.

For some detail, see:

• Find and Replace with Emacs
  http://xahlee.org/emacs/emacs_find_replace.html

• Text Pattern Matching in Emacs
  http://xahlee.org/emacs/emacs_regex.html

  Xah
∑ http://xahlee.org/

☄
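PS: for the original question (national characters to ASCII), the pair-replacement idea could be applied roughly like this. This is only a minimal sketch: the names `my-national-to-ascii-pairs' and `my-national-to-ascii' are made up for illustration, and the table covers just the three letters from the question plus their capitals. It is written to stand alone, so it calls `replace-regexp-in-string' directly.

```elisp
;; Hypothetical names, for illustration only.  The table is an
;; assumption: extend it with whatever letters your text needs.
(defvar my-national-to-ascii-pairs
  '(("ó" "o") ("ł" "l") ("ż" "z")
    ("Ó" "O") ("Ł" "L") ("Ż" "Z"))
  "List of (NATIONAL ASCII) string pairs.")

(defun my-national-to-ascii (str)
  "Return STR with each string in `my-national-to-ascii-pairs' replaced."
  ;; Bind case-fold-search to nil so that \"ł\" does not also match \"Ł\".
  (let ((case-fold-search nil))
    (dolist (pair my-national-to-ascii-pairs str)
      ;; regexp-quote makes the search literal; t t = FIXEDCASE, LITERAL.
      (setq str (replace-regexp-in-string
                 (regexp-quote (car pair)) (cadr pair) str t t)))))

(my-national-to-ascii "żółw")   ; ⇒ "zolw"
```

Binding `case-fold-search' to nil matters here: with the default value, the pair for “ł” would also match “Ł” and turn it into a lowercase “l”. Characters that are not in the table pass through unchanged.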