all messages for Emacs-related lists mirrored at yhetil.org
* ncr (numeric character reference) to unicode
@ 2009-04-13 20:17 B. T. Raven
  2009-04-13 20:52 ` Eli Zaretskii
  2009-04-14  3:07 ` Miles Bader
  0 siblings, 2 replies; 12+ messages in thread
From: B. T. Raven @ 2009-04-13 20:17 UTC
  To: help-gnu-emacs

Does any of you know whether nxhtml has the capability to convert 
sequences like this:

&#1513;&#1473;&#1463;&#1500;&#1493;&#1465;&#1501;.
(shalom in Hebrew)

to the equivalent Unicode string? N. Walsh had a couple of .el files 
that implemented this, I think, but they also required cl to be loaded.
Will Emacs 23.1 also support bidi when it is released?

Thanks,

Ed



* Re: ncr (numeric character reference) to unicode
  2009-04-13 20:17 ncr (numeric character reference) to unicode B. T. Raven
@ 2009-04-13 20:52 ` Eli Zaretskii
  2009-04-14  3:07 ` Miles Bader
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2009-04-13 20:52 UTC
  To: help-gnu-emacs

> Date: Mon, 13 Apr 2009 15:17:47 -0500
> From: "B. T. Raven" <nihil@nihilo.net>
> 
> Will Emacs 23.1 also support bidi when it is released?

Sadly, no.  Bidirectional editing needs quite a bit of supporting
code and major changes to some fundamental Emacs features, such as
text fill, and no one has yet succeeded in writing the code for that.





* Re: ncr (numeric character reference) to unicode
  2009-04-13 20:17 ncr (numeric character reference) to unicode B. T. Raven
  2009-04-13 20:52 ` Eli Zaretskii
@ 2009-04-14  3:07 ` Miles Bader
  2009-04-14 16:42   ` B. T. Raven
  1 sibling, 1 reply; 12+ messages in thread
From: Miles Bader @ 2009-04-14  3:07 UTC
  To: help-gnu-emacs

"B. T. Raven" <nihil@nihilo.net> writes:
> Does any of you know whether nxhtml has the capability to convert
> sequences like this:
>
> &#1513;&#1473;&#1463;&#1500;&#1493;&#1465;&#1501;.
> (shalom in Hebrew)

The following should work:

   (defun expand-html-encoded-chars (start end)
     (interactive "r")
     (save-excursion
       (goto-char start)
       (while (re-search-forward "&#\\([0-9]+\\);" end t)
         (replace-match 
          (char-to-string
           (decode-char 'ucs (string-to-number (match-string 1))) )
          t t))))
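
A possible variant, offered only as a sketch: the function above takes END
as a fixed number, so once replacements shorten the text the search bound
can reach past the original region. The version below keeps the bound in a
marker and also accepts hexadecimal references such as &#x5E9;. It assumes
Emacs 23 or later, where a character is simply its Unicode code point; the
name my-expand-ncrs is made up for this illustration.

```elisp
(defun my-expand-ncrs (start end)
  "Replace decimal and hexadecimal NCRs between START and END with characters.
Hypothetical variant of `expand-html-encoded-chars'."
  (interactive "r")
  (save-excursion
    ;; A marker moves with the text, so the bound stays at the real
    ;; end of the region while replacements shrink the buffer.
    (let ((limit (copy-marker end)))
      (goto-char start)
      (while (re-search-forward
              "&#\\(?:[xX]\\([0-9a-fA-F]+\\)\\|\\([0-9]+\\)\\);" limit t)
        (let* ((hex (match-string 1))
               (code (if hex
                         (string-to-number hex 16)
                       (string-to-number (match-string 2)))))
          ;; Assumption: CODE is a valid Unicode code point.
          (replace-match (char-to-string code) t t)))
      ;; Detach the marker so it no longer has to be updated.
      (set-marker limit nil))))
```

It can be called with M-x my-expand-ncrs on an active region, in the same
way as the original command.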

-Miles

-- 
Guilt, n. The condition of one who is known to have committed an indiscretion,
as distinguished from the state of him who has covered his tracks.



* Re: ncr (numeric character reference) to unicode
  2009-04-14  3:07 ` Miles Bader
@ 2009-04-14 16:42   ` B. T. Raven
  2009-04-15 21:43     ` Stephen Berman
                       ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: B. T. Raven @ 2009-04-14 16:42 UTC
  To: help-gnu-emacs

Miles Bader wrote:
> "B. T. Raven" <nihil@nihilo.net> writes:
>> Does any of you know whether nxhtml has the capability to convert
>> sequences like this:
>>
>> &#1513;&#1473;&#1463;&#1500;&#1493;&#1465;&#1501;.
>> (shalom in Hebrew)
> 
> The following should work:
> 
>    (defun expand-html-encoded-chars (start end)
>      (interactive "r")
>      (save-excursion
>        (goto-char start)
>        (while (re-search-forward "&#\\([0-9]+\\);" end t)
>          (replace-match 
>           (char-to-string
>            (decode-char 'ucs (string-to-number (match-string 1))) )
>           t t))))
> 
> -Miles
> 

Thanks, Eli and Miles. The conversion works fine (with uncomposed 
glyphs, that is, points as separate characters, same as in the html 
codes). I referenced the command in an alias:

(defalias 'xhc 'expand-html-encoded-chars)

and then tried to do the same with this function:

(defun reverse-string (beg end)
   (interactive "r")
   (setq str (buffer-substring beg end))
   (apply #'string (nreverse (string-to-list str))))

but it doesn't seem to work, although it doesn't produce errors in a 
traceback buffer. What am I missing?

Thanks,

Ed




* Re: ncr (numeric character reference) to unicode
  2009-04-14 16:42   ` B. T. Raven
@ 2009-04-15 21:43     ` Stephen Berman
       [not found]     ` <mailman.5401.1239831823.31690.help-gnu-emacs@gnu.org>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Stephen Berman @ 2009-04-15 21:43 UTC
  To: help-gnu-emacs

On Tue, 14 Apr 2009 11:42:09 -0500 "B. T. Raven" <nihil@nihilo.net> wrote:

> Miles Bader wrote:
>> "B. T. Raven" <nihil@nihilo.net> writes:
>>> Does any of you know whether nxhtml has the capability to convert
>>> sequences like this:
>>>
>>> &#1513;&#1473;&#1463;&#1500;&#1493;&#1465;&#1501;.
>>> (shalom in Hebrew)
>>
>> The following should work:
>>
>>    (defun expand-html-encoded-chars (start end)
>>      (interactive "r")
>>      (save-excursion
>>        (goto-char start)
>>        (while (re-search-forward "&#\\([0-9]+\\);" end t)
>>          (replace-match           (char-to-string
>>            (decode-char 'ucs (string-to-number (match-string 1))) )
>>           t t))))
>>
>> -Miles
>>
>
> Thanks, Eli and Miles. The conversion works fine (with uncomposed glyphs, that
> is, points as separate characters, same as in the html codes). I referenced
> the command in an alias:
>
> (defalias 'xhc 'expand-html-encoded-chars)
>
> and then tried to do the same with this function:
>
> (defun reverse-string (beg end)
>   (interactive "r")
>   (setq str (buffer-substring beg end))
>   (apply #'string (nreverse (string-to-list str))))
>
> but it doesn't seem to work, although it doesn't produce errors in a traceback
> buffer. What am I missing?
>
> Thanks,
>
> Ed

Does this do what you want?

(defun reverse-string (beg end)
  (interactive "r")
  (xhc beg end)
  (let* ((beg (region-beginning))
	 (end (region-end))
	 (str1 (buffer-substring beg end))
	 (str2 (apply #'string (nreverse (string-to-list str1)))))
    (replace-string str1 str2 nil beg end)))

Steve Berman






* Re: ncr (numeric character reference) to unicode
       [not found]     ` <mailman.5401.1239831823.31690.help-gnu-emacs@gnu.org>
@ 2009-04-16  1:42       ` B. T. Raven
  2009-04-16  4:35         ` Kevin Rodgers
  2009-04-16 13:23         ` Stephen Berman
  0 siblings, 2 replies; 12+ messages in thread
From: B. T. Raven @ 2009-04-16  1:42 UTC
  To: help-gnu-emacs

Stephen Berman wrote:
> On Tue, 14 Apr 2009 11:42:09 -0500 "B. T. Raven" <nihil@nihilo.net> wrote:
> 
>> Miles Bader wrote:
>>> "B. T. Raven" <nihil@nihilo.net> writes:
>>>> Does any of you know whether nxhtml has the capability to convert
>>>> sequences like this:
>>>>
>>>> &#1513;&#1473;&#1463;&#1500;&#1493;&#1465;&#1501;.
>>>> (shalom in Hebrew)
>>> The following should work:
>>>
>>>    (defun expand-html-encoded-chars (start end)
>>>      (interactive "r")
>>>      (save-excursion
>>>        (goto-char start)
>>>        (while (re-search-forward "&#\\([0-9]+\\);" end t)
>>>          (replace-match           (char-to-string
>>>            (decode-char 'ucs (string-to-number (match-string 1))) )
>>>           t t))))
>>>
>>> -Miles
>>>
>> Thanks, Eli and Miles. The conversion works fine (with uncomposed glyphs, that
>> is, points as separate characters, same as in the html codes). I referenced
>> the command in an alias:
>>
>> (defalias 'xhc 'expand-html-encoded-chars)
>>
>> and then tried to do the same with this function:
>>
>> (defun reverse-string (beg end)
>>   (interactive "r")
>>   (setq str (buffer-substring beg end))
>>   (apply #'string (nreverse (string-to-list str))))
>>
>> but it doesn't seem to work, although it doesn't produce errors in a traceback
>> buffer. What am I missing?
>>
>> Thanks,
>>
>> Ed
> 
> Does this do what you want?
> 
> (defun reverse-string (beg end)
>   (interactive "r")
>   (xhc beg end)
>   (let* ((beg (region-beginning))
> 	 (end (region-end))
> 	 (str1 (buffer-substring beg end))
> 	 (str2 (apply #'string (nreverse (string-to-list str1)))))
>     (replace-string str1 str2 nil beg end)))
> 
> Steve Berman
> 
> 
> 

That would probably do a little more than I want. Miles' expand html 
function is only needed if someone sends these ncr sequences in email. 
Btw, why are beg and end calculated in the function if they are passed 
to it? This almost does what I want:

(defun reverse-bufsubstring (beg end)
   (interactive "r")
   (let* (
	 (str1 (buffer-substring beg end))
	 (str2 (apply #'string (nreverse (string-to-list str1)))))
     (replace-string str1 str2 nil beg end)))


except that it converts

same
one
as
before

into this:

erofeb
sa
eno
emas

so now that has to be reversed line by line rather than character by 
character. Anyway, all of this is just a kludge until the gurus come up 
with a real bidi functionality.

Thanks again,

Ed



* Re: ncr (numeric character reference) to unicode
  2009-04-14 16:42   ` B. T. Raven
  2009-04-15 21:43     ` Stephen Berman
       [not found]     ` <mailman.5401.1239831823.31690.help-gnu-emacs@gnu.org>
@ 2009-04-16  4:20     ` Kevin Rodgers
       [not found]     ` <mailman.5427.1239855645.31690.help-gnu-emacs@gnu.org>
  3 siblings, 0 replies; 12+ messages in thread
From: Kevin Rodgers @ 2009-04-16  4:20 UTC
  To: help-gnu-emacs

B. T. Raven wrote:
> (defalias 'xhc 'expand-html-encoded-chars)
> 
> and then tried to do the same with this function:
> 
> (defun reverse-string (beg end)
>   (interactive "r")
>   (setq str (buffer-substring beg end))
>   (apply #'string (nreverse (string-to-list str))))
> 
> but it doesn't seem to work, although it doesn't produce errors in a 
> traceback buffer. What am I missing?

Miles' expand-html-encoded-chars function modifies the buffer, with 
replace-match.  Your reverse-string function generates a value, but
does not modify the buffer.
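
To make the difference concrete, here is a minimal sketch of a command that
computes the same reversed string and then writes it back into the buffer.
The name my-reverse-region-chars is hypothetical, and deleting and
re-inserting the region is just one of several ways to do the replacement.

```elisp
(defun my-reverse-region-chars (beg end)
  "Replace the text between BEG and END with its characters reversed.
Hypothetical example; unlike a function that only returns a string,
this one changes the buffer."
  (interactive "r")
  ;; Compute the value first, in a local variable rather than a
  ;; global one set with `setq'.
  (let ((reversed (apply #'string
                         (nreverse (string-to-list
                                    (buffer-substring beg end))))))
    (save-excursion
      (goto-char beg)
      ;; This is the step the value-only version lacks.
      (delete-region beg end)
      (insert reversed))))
```

Calling it non-interactively, for example inside `with-temp-buffer', shows
that the buffer contents themselves change.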

-- 
Kevin Rodgers
Denver, Colorado, USA






* Re: ncr (numeric character reference) to unicode
  2009-04-16  1:42       ` B. T. Raven
@ 2009-04-16  4:35         ` Kevin Rodgers
  2009-04-16 13:23         ` Stephen Berman
  1 sibling, 0 replies; 12+ messages in thread
From: Kevin Rodgers @ 2009-04-16  4:35 UTC
  To: help-gnu-emacs

B. T. Raven wrote:
> That would probably do a little more than I want. Miles' expand html 
> function is only needed if someone sends these ncr sequences in email. 
> Btw, why are beg and end calculated in the function if they are passed 
> to it? This almost does what I want:
> 
> (defun reverse-bufsubstring (beg end)
>   (interactive "r")
>   (let* (
>      (str1 (buffer-substring beg end))
>      (str2 (apply #'string (nreverse (string-to-list str1)))))
>     (replace-string str1 str2 nil beg end)))
> 
> 
> except that it converts
> 
> same
> one
> as
> before
> 
> into this:
> 
> erofeb
> sa
> eno
> emas
> 
> so now that has to be reversed line by line rather than character by 
> character. Anyway, all of this is just a kludge until the gurus come up 
> with a real bidi functionality.

(defun reverse-region-by-line (beg end)
   (interactive "r")
   (save-excursion
     (goto-char beg)
     (while (and (< (point) end) (re-search-forward "\\=.*$" end t))
       (replace-match (apply #'string
			    (nreverse (string-to-list (match-string 0)))))
       (forward-line))))
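
One caveat, offered as a sketch rather than a correction: when
`replace-match' is called without its FIXEDCASE and LITERAL arguments, it
may adjust the case of the replacement to match the replaced text, and it
treats a backslash in the replacement specially. A capitalized line or a
line containing a backslash may therefore not come out as a plain reversal.
The hypothetical variant below passes t for both arguments and is otherwise
the same loop.

```elisp
(defun my-reverse-region-by-line (beg end)
  "Reverse the characters of each line between BEG and END.
Hypothetical variant of `reverse-region-by-line'."
  (interactive "r")
  (save-excursion
    (goto-char beg)
    ;; Reversal does not change the length of a line, so END stays valid.
    (while (and (< (point) end)
                (re-search-forward "\\=.*$" end t))
      (replace-match (apply #'string
                            (nreverse (string-to-list (match-string 0))))
                     t    ; FIXEDCASE: leave the case of the new text alone
                     t)   ; LITERAL: insert backslashes as ordinary characters
      (forward-line 1))))
```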


-- 
Kevin Rodgers
Denver, Colorado, USA






* Re: ncr (numeric character reference) to unicode
  2009-04-16  1:42       ` B. T. Raven
  2009-04-16  4:35         ` Kevin Rodgers
@ 2009-04-16 13:23         ` Stephen Berman
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Berman @ 2009-04-16 13:23 UTC
  To: help-gnu-emacs

On Wed, 15 Apr 2009 20:42:49 -0500 "B. T. Raven" <nihil@nihilo.net> wrote:

> Stephen Berman wrote:
[...]
>> Does this do what you want?
>>
>> (defun reverse-string (beg end)
>>   (interactive "r")
>>   (xhc beg end)
>>   (let* ((beg (region-beginning))
>> 	 (end (region-end))
>> 	 (str1 (buffer-substring beg end))
>> 	 (str2 (apply #'string (nreverse (string-to-list str1)))))
>>     (replace-string str1 str2 nil beg end)))
>>
>> Steve Berman
>>
>>
>>
>
> That would probably do a little more than I want. Miles' expand html function
> is only needed if someone sends these ncr sequences in email. Btw, why are beg
> and end calculated in the function if they are passed to it? 

When xhc is called on beg and end (since I mistakenly thought you wanted
to convert the HTML entities and reverse the result in one blow) the
region is changed, so it has to be recalculated for the arguments of
buffer-substring (actually, only region-end changes, so beg really
shouldn't be recalculated).  Of course, new variables could have been
used in the let* clause.
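
The effect described above can be seen in a small sketch. The helper below
is hypothetical: it records the end of the text as a plain number and as a
marker, expands the decimal references the same way the command does, and
returns both values. The number goes stale; the marker follows the text.

```elisp
(defun my-region-end-after-expansion (text)
  "Expand decimal NCRs in TEXT inside a temporary buffer.
Return a list (OLD-END NEW-END): the end position recorded as a number
before the expansion, and the position of a marker that started at the
same place.  Hypothetical helper for illustration."
  (with-temp-buffer
    (insert text)
    (let ((old-end (point-max))
          (end-marker (copy-marker (point-max))))
      (goto-char (point-min))
      (while (re-search-forward "&#\\([0-9]+\\);" end-marker t)
        (replace-match
         (char-to-string (string-to-number (match-string 1))) t t))
      (list old-end (marker-position end-marker)))))
```

For example, (my-region-end-after-expansion "&#97;&#98;!") returns (12 4):
the eleven-character text shrinks to "ab!", so the real end moves from
position 12 to position 4 while the saved number still says 12.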

Steve Berman






* Re: ncr (numeric character reference) to unicode
       [not found]     ` <mailman.5427.1239855645.31690.help-gnu-emacs@gnu.org>
@ 2009-04-17  3:39       ` B. T. Raven
  2009-04-17 15:19         ` Stephen Berman
       [not found]         ` <mailman.5538.1239981609.31690.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: B. T. Raven @ 2009-04-17  3:39 UTC
  To: help-gnu-emacs

Kevin Rodgers wrote:
> B. T. Raven wrote:
>> (defalias 'xhc 'expand-html-encoded-chars)
>>
>> and then tried to do the same with this function:
>>
>> (defun reverse-string (beg end)
>>   (interactive "r")
>>   (setq str (buffer-substring beg end))
>>   (apply #'string (nreverse (string-to-list str))))
>>
>> but it doesn't seem to work, although it doesn't produce errors in a 
>> traceback buffer. What am I missing?
> 
> Miles' expand-html-encoded-chars function modifies the buffer, with 
> replace-match.  Your reverse-string function generates a value, but
> does not modify the buffer.
> 

Yes, of course. That finally dawned on me. Thanks Kevin and Steve. This 
is finally what I want:

(defun reverse-string (str)
   (apply #'string (nreverse (string-to-list str))))


(defun reverse-region-by-line (beg end)
   (interactive "r")
   (save-excursion
     (goto-char beg)
     (while (and (< (point) end) (re-search-forward "\\=.*$" end t))
       (replace-match (reverse-string (match-string 0)))
       (forward-line))))

But now I find that if I copy-paste from Emacs 23.0.90.1, the Greek letters


αβγδ

appear in Mozilla Tbird (here) in the original order but


בִּּרֵאשׁיתבָּרָּאא לֹהִים אלתשָּׁמַיִם וְ אלת הָ ּאָרֶ ׃


is automatically reversed without running the above command on its 
region. ??? Is there invisible bidi info in the string or is it just the 
fact that the characters are Hebrew that causes this?



* Re: ncr (numeric character reference) to unicode
  2009-04-17  3:39       ` B. T. Raven
@ 2009-04-17 15:19         ` Stephen Berman
       [not found]         ` <mailman.5538.1239981609.31690.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Berman @ 2009-04-17 15:19 UTC
  To: help-gnu-emacs

On Thu, 16 Apr 2009 22:39:43 -0500 "B. T. Raven" <nihil@nihilo.net> wrote:

> But now I find that if I copy-paste from Emacs 23.0.90.1, the Greek letters
>
>
> αβγδ
>
> appear in Mozilla Tbird (here) in the original order but
>
>
> בִּּרֵאשׁיתבָּרָּאא לֹהִים אלתשָּׁמַיִם וְ אלת הָ ּאָרֶ ׃
>
>
> is automatically reversed without running the above command on its region. ???
> Is there invisible bidi info in the string or is it just the fact that the
> characters are Hebrew that causes this?

Presumably the latter.  I guess Thunderbird works like OpenOffice.org,
which also automatically reverses the Hebrew text, and whose Help entry
for "bi-directional writing" says: 
,----
| Currently, OpenOffice.org supports Hindi, Thai, Hebrew, and Arabic as
| CTL [Complex Text Layout] languages.  If you select the text flow from
| right to left, embedded Western text still runs from left to
| right. The cursor responds to the arrow keys in that Right Arrow moves
| it "to the text end" and Left Arrow "to the text start".
`----
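
The direction comes from the characters themselves: the Unicode
bidirectional algorithm assigns each character a class (L for
left-to-right letters such as Greek, R for Hebrew letters, NSM for
combining points), and a bidi-aware program lays the text out from those
classes without needing any invisible marks in the string. As a rough
sketch, assuming an Emacs recent enough to expose the `bidi-class'
character property (24 or later, as far as I know), the classes can be
listed like this; the function name is made up.

```elisp
(defun my-bidi-classes (string)
  "Return a list of the Unicode bidi classes of the characters in STRING.
Hypothetical helper; relies on the `bidi-class' character property."
  (mapcar (lambda (ch) (get-char-code-property ch 'bidi-class))
          (string-to-list string)))
```

Evaluating it on a string that mixes Greek and Hebrew letters shows which
characters a bidi-aware display will treat as right-to-left.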

Steve Berman






* Re: ncr (numeric character reference) to unicode
       [not found]         ` <mailman.5538.1239981609.31690.help-gnu-emacs@gnu.org>
@ 2009-04-17 23:20           ` B. T. Raven
  0 siblings, 0 replies; 12+ messages in thread
From: B. T. Raven @ 2009-04-17 23:20 UTC
  To: help-gnu-emacs

Stephen Berman wrote:
> On Thu, 16 Apr 2009 22:39:43 -0500 "B. T. Raven" <nihil@nihilo.net> wrote:
> 
>> But now I find that if I copy-paste from Emacs 23.0.90.1, the Greek letters
>>
>>
>> αβγδ
>>
>> appear in Mozilla Tbird (here) in the original order but
>>
>>
>>

Saw this right to left order in Emacs 23:

בִּּרֵאשׁיתבָּרָּאא לֹהִים אלתשָּׁמַיִם וְ אלת הָ ּאָרֶ ׃

Copy-pasted here in Tbird:


׃ ֶרָאּ ָה תלא ְו םִיַמָּׁשתלא םיִהֹל אאָּרָּבתיׁשאֵרִּּב

First line C-c C-v (CUA) here in Tbird:

בִּּרֵאשׁיתבָּרָּאא לֹהִים אלתשָּׁמַיִם וְ אלת הָ ּאָרֶ ׃

And it doesn't matter which direction the Hebrew text is selected in. In 
fact, S-arrow won't move over the Hebrew character by character; it 
selects the whole line, whether the characters are in forward or reverse 
order and whether the cursor starts at the left or the right.

Ed




>>
>>
>> is automatically reversed without running the above command on its region. ???
>> Is there invisible bidi info in the string or is it just the fact that the
>> characters are Hebrew that causes this?
> 
> Presumably the latter.  I guess Thunderbird works like OpenOffice.org,
> which also automatically reverses the Hebrew text, and whose Help entry
> for "bi-directional writing" says: 
> ,----
> | Currently, OpenOffice.org supports Hindi, Thai, Hebrew, and Arabic as
> | CTL [Complex Text Layout] languages.  If you select the text flow from
> | right to left, embedded Western text still runs from left to
> | right. The cursor responds to the arrow keys in that Right Arrow moves
> | it "to the text end" and Left Arrow "to the text start".
> `----
> 
> Steve Berman


> 
> 
> 



