unofficial mirror of emacs-devel@gnu.org 
* [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
@ 2007-07-08 22:23 Richard Stallman
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Stallman @ 2007-07-08 22:23 UTC (permalink / raw)
  To: emacs-devel

Would someone please DTRT then ack?

------- Start of forwarded message -------
To: emacs-devel@gnu.org
From: William Xu <william.xwl@gmail.com>
Date: Wed, 04 Jul 2007 18:34:51 +0800
Organization: the Church of Emacs
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Subject: webjump-url-encode and non-ascii characters

webjump-url-encode fails to encode non-ascii characters correctly.

Here's a patch: 

--- webjump.el	2007-06-03 14:54:53.000000000 +0800
+++ webjump.el.new	2007-07-04 18:29:41.000000000 +0800
@@ -451,14 +451,13 @@
 
 (defun webjump-url-encode (str)
   (mapconcat '(lambda (c)
-		(cond ((= c 32) "+")
-		      ((or (and (>= c ?a) (<= c ?z))
-			   (and (>= c ?A) (<= c ?Z))
-			   (and (>= c ?0) (<= c ?9)))
-		       (char-to-string c))
-		      (t (upcase (format "%%%02x" c)))))
-	     str
-	     ""))
+               (let ((s (char-to-string c)))
+                 (cond ((string= s " ") "+")
+                       ((string-match "[a-zA-Z_.-/]" s) s)
+                       (t (upcase (format "%%%02x" c))))))
+             (string-to-list
+              (encode-coding-string str buffer-file-coding-system))
+             ""))
 
 (defun webjump-url-fix (url)
   (if (webjump-null-or-blank-string-p url)

-- 
William


_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
------- End of forwarded message -------


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
       [not found] <E1ICpXt-00033U-DV@fencepost.gnu.org>
@ 2007-07-24  1:52 ` Kenichi Handa
  2007-07-24  3:36   ` William
  2007-07-24 22:16   ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: Kenichi Handa @ 2007-07-24  1:52 UTC (permalink / raw)
  To: rms; +Cc: william.xwl, emacs-devel

Sorry for the late response.

In article <E1ICpXt-00033U-DV@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> [I sent this message a week ago but did not get a response.
> Is this in your area?  I would expect it is, since it deals
> with non-ASCII characters, but maybe it isn't.  If it isn't,
> please say so.

> Please respond!]

> Is this patch correct?  Most particularly, is it correct to use
> buffer-file-coding-system for a URL?  I have doubts about that.

I have doubts too.  I'm not an expert on URL (or URI) encoding,
but, as far as I remember, non-ASCII characters in a URL must
first be encoded in UTF-8 and then %-encoded.  So, for
instance, à (U+00E0) must be encoded as "%C3%A0".
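A minimal sketch of the two steps described above (UTF-8 first, then %-encoding of each resulting byte).  The helper name `my-percent-encode-utf-8' is hypothetical and not part of webjump.el; to keep the illustration short it encodes every byte, ASCII included:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-percent-encode-utf-8 (str)
  "Return STR with every byte of its UTF-8 encoding written as %XX."
  ;; `encode-coding-string' returns a unibyte string, so `mapconcat'
  ;; hands the lambda one byte (0..255) at a time.
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             (encode-coding-string str 'utf-8)
             ""))

;; U+4E2D has the three UTF-8 bytes E4 B8 AD.
(my-percent-encode-utf-8 "\u4e2d")   ; => "%E4%B8%AD"
```

A multi-byte character therefore turns into one %XX group per UTF-8 byte, not one group per character.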

---
Kenichi Handa
handa@m17n.org


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24  1:52 ` [william.xwl@gmail.com: webjump-url-encode and non-ascii characters] Kenichi Handa
@ 2007-07-24  3:36   ` William
  2007-07-24  4:16     ` Kenichi Handa
  2007-07-24 22:16   ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: William @ 2007-07-24  3:36 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: rms, emacs-devel

2007/7/24, Kenichi Handa <handa@m17n.org>:

> > Is this patch correct?  Most particularly, is it correct to use
> > buffer-file-coding-system for a URL?  I have doubts about that.
>
> I have doubts too.  I'm not an expert on URL (or URI) encoding,
> but, as far as I remember, non-ASCII characters in a URL must
> first be encoded in UTF-8 and then %-encoded.  So, for
> instance, à (U+00E0) must be encoded as "%C3%A0".

Oh, you are right. buffer-file-coding-system should be replaced by
'utf-8. It happens to be the same value here.

-- 
William


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24  3:36   ` William
@ 2007-07-24  4:16     ` Kenichi Handa
  2007-07-24 12:31       ` William Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Kenichi Handa @ 2007-07-24  4:16 UTC (permalink / raw)
  To: William; +Cc: rms, emacs-devel

In article <ded049930707232036j5abbe258j290c4266f69410a7@mail.gmail.com>, William <william.xwl@gmail.com> writes:

> 2007/7/24, Kenichi Handa <handa@m17n.org>:
> > > Is this patch correct?  Most particularly, is it correct to use
> > > buffer-file-coding-system for a URL?  I have doubts about that.
> >
> > I have doubts too.  I'm not an expert on URL (or URI) encoding,
> > but, as far as I remember, non-ASCII characters in a URL must
> > first be encoded in UTF-8 and then %-encoded.  So, for
> > instance, à (U+00E0) must be encoded as "%C3%A0".

> Oh, you are right. buffer-file-coding-system should be replaced by
> 'utf-8. It happens to be the same value here.

One more note about your change.

As mapconcat accepts a sequence (including string),
string-to-list is not necessary.
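A small illustration of that point, using an arbitrary sample string: `mapconcat' walks the characters of a string directly, so both forms below give the same result.

```elisp
;; Passing the string itself: each element is a character code.
(mapconcat (lambda (c) (format "%02X" c)) "abc" " ")
;; => "61 62 63"

;; Converting to a list first produces the same result, with an
;; extra intermediate list that is not needed.
(mapconcat (lambda (c) (format "%02X" c)) (string-to-list "abc") " ")
;; => "61 62 63"
```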

---
Kenichi Handa
handa@m17n.org


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24  4:16     ` Kenichi Handa
@ 2007-07-24 12:31       ` William Xu
  2007-07-25  0:55         ` Kenichi Handa
  0 siblings, 1 reply; 16+ messages in thread
From: William Xu @ 2007-07-24 12:31 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> One more note about your change.
>
> As mapconcat accepts a sequence (including string),
> string-to-list is not necessary.

Correct, thanks!  Here's the updated patch.

cvs diff: Diffing .
Index: webjump.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/net/webjump.el,v
retrieving revision 1.4.6.10
diff -u -r1.4.6.10 webjump.el
--- webjump.el	30 May 2007 14:40:32 -0000	1.4.6.10
+++ webjump.el	24 Jul 2007 12:30:46 -0000
@@ -451,14 +451,12 @@
 
 (defun webjump-url-encode (str)
   (mapconcat '(lambda (c)
-		(cond ((= c 32) "+")
-		      ((or (and (>= c ?a) (<= c ?z))
-			   (and (>= c ?A) (<= c ?Z))
-			   (and (>= c ?0) (<= c ?9)))
-		       (char-to-string c))
-		      (t (upcase (format "%%%02x" c)))))
-	     str
-	     ""))
+                (let ((s (char-to-string c)))
+                  (cond ((string= s " ") "+")
+                        ((string-match "[a-zA-Z_.-/]" s) s)
+                        (t (upcase (format "%%%02x" c))))))
+             (encode-coding-string str 'utf-8)
+             ""))
 
 (defun webjump-url-fix (url)
   (if (webjump-null-or-blank-string-p url)
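
One detail of the character class "[a-zA-Z_.-/]" may be worth checking: `.-/' reads as a range that covers only `.' and `/', and digits are not listed, so as far as I can tell `-' and 0-9 are emitted as %2D and %30..%39.  That still decodes correctly, it is just more verbose.  Below is a sketch of a variant that passes them through unchanged; the name `my-webjump-url-encode' is hypothetical, and this is not the code that was installed:

```elisp
;; Hypothetical variant, for illustration only.
(defun my-webjump-url-encode (str)
  "Encode STR as UTF-8, then %-encode bytes outside a small safe set."
  (mapconcat (lambda (byte)
               (cond ((= byte ?\s) "+")          ; space becomes "+"
                     ;; Letters, digits and a few punctuation
                     ;; characters are copied through as they are.
                     ((or (<= ?a byte ?z)
                          (<= ?A byte ?Z)
                          (<= ?0 byte ?9)
                          (memq byte '(?_ ?. ?/ ?-)))
                      (char-to-string byte))
                     ;; Everything else, including each byte of a
                     ;; multi-byte character, becomes %XX.
                     (t (format "%%%02X" byte))))
             (encode-coding-string str 'utf-8)
             ""))

(my-webjump-url-encode "emacs 24 \u4e2d")   ; => "emacs+24+%E4%B8%AD"
```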

-- 
William

《蜀先主庙》
作者:刘禹锡
天地英雄气,千秋尚凛然。
势分三足鼎,业复五铢钱。
得相能开国,生儿不象贤。
凄凉蜀故妓,来舞魏宫前。


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24  1:52 ` [william.xwl@gmail.com: webjump-url-encode and non-ascii characters] Kenichi Handa
  2007-07-24  3:36   ` William
@ 2007-07-24 22:16   ` Richard Stallman
  2007-07-25  0:56     ` Kenichi Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2007-07-24 22:16 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: william.xwl, emacs-devel

    I have doubts too.  I'm not an expert on URL (or URI) encoding,
    but, as far as I remember, non-ASCII characters in a URL must
    first be encoded in UTF-8 and then %-encoded.  So, for
    instance, à (U+00E0) must be encoded as "%C3%A0".

Can you implement it with that change?


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24 12:31       ` William Xu
@ 2007-07-25  0:55         ` Kenichi Handa
  2007-07-25  2:35           ` William Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Kenichi Handa @ 2007-07-25  0:55 UTC (permalink / raw)
  To: William Xu; +Cc: emacs-devel

In article <87sl7easqc.fsf@www.williamxu.com>, William Xu <william.xwl@gmail.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > One more note about your change.
> >
> > As mapconcat accepts a sequence (including string),
> > string-to-list is not necessary.

> Correct, thanks!  Here's the updated patch.

OK, I installed it as a tiny change.  Have you already
signed the copyright assignment papers for the FSF?  If so,
I'll delete the "(tiny change)" part.

2007-07-25  William Xu  <william.xwl@gmail.com>  (tiny change)

	* net/webjump.el (webjump-url-encode): Fix for non-ASCII
	characters.

---
Kenichi Handa
handa@m17n.org


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-24 22:16   ` Richard Stallman
@ 2007-07-25  0:56     ` Kenichi Handa
  0 siblings, 0 replies; 16+ messages in thread
From: Kenichi Handa @ 2007-07-25  0:56 UTC (permalink / raw)
  To: rms; +Cc: william.xwl, emacs-devel

In article <E1IDSgb-0006CE-Vk@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     I have doubts too.  I'm not an expert on URL (or URI) encoding,
>     but, as far as I remember, non-ASCII characters in a URL must
>     first be encoded in UTF-8 and then %-encoded.  So, for
>     instance, à (U+00E0) must be encoded as "%C3%A0".

> Can you implement it with that change?

As he sent me a fixed version, I installed it in the trunk
as a tiny change.

---
Kenichi Handa
handa@m17n.org


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  0:55         ` Kenichi Handa
@ 2007-07-25  2:35           ` William Xu
  2007-07-25  4:30             ` Stefan Monnier
  0 siblings, 1 reply; 16+ messages in thread
From: William Xu @ 2007-07-25  2:35 UTC (permalink / raw)
  To: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> OK, I installed it as a tiny change.  Have you already
> signed the copyright assignment papers for the FSF?  If so,
> I'll delete the "(tiny change)" part.

Yes, I have.

-- 
William

题目:《棋》
作者:王安石(1021-1086)
莫将戏事扰真情,且可随缘道我赢。
战罢两奁分白黑,一枰何处有亏成。


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  2:35           ` William Xu
@ 2007-07-25  4:30             ` Stefan Monnier
  2007-07-25  4:50               ` William Xu
  2007-07-25 20:11               ` Richard Stallman
  0 siblings, 2 replies; 16+ messages in thread
From: Stefan Monnier @ 2007-07-25  4:30 UTC (permalink / raw)
  To: William Xu; +Cc: emacs-devel

>> OK, I installed it as a tiny change.  Have you already
>> signed the copyright assignment papers for the FSF?  If so,
>> I'll delete the "(tiny change)" part.

> Yes, I have.

I don't see you in the copyright.list.
I grepped for "Xu" and got two apparently unrelated entries.  Did you use
some other name and email address?  Or is it very recent (and the FSF
hasn't yet received or processed the papers)?


        Stefan


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  4:30             ` Stefan Monnier
@ 2007-07-25  4:50               ` William Xu
  2007-07-25  6:50                 ` Stefan Monnier
  2007-07-25 20:11               ` Richard Stallman
  1 sibling, 1 reply; 16+ messages in thread
From: William Xu @ 2007-07-25  4:50 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I don't see you in the copyright.list.
> I grepped for "Xu" and got two apparently unrelated entries.  Did you use
> some other name and email address?  Or is it very recent (and the FSF
> hasn't yet received or processed the papers)?

Hmm, I might have misunderstood.  I signed the assignment papers to the
FSF for the EMMS project (nearly two years ago).  Does this count?

-- 
William

《新嫁娘》
作者:王建
三日入厨下,洗手作羹汤。
未谙姑食性,先遣小姑尝。


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  4:50               ` William Xu
@ 2007-07-25  6:50                 ` Stefan Monnier
  2007-07-25  6:57                   ` William Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2007-07-25  6:50 UTC (permalink / raw)
  To: William Xu; +Cc: emacs-devel

>> I don't see you in the copyright.list.
>> I grepped for "Xu" and got two apparently unrelated entries.  Did you use
>> some other name and email address?  Or is it very recent (and the FSF
>> hasn't yet received or processed the papers)?

> Hmm, I might have misunderstood.  I signed the assignment papers to the
> FSF for the EMMS project (nearly two years ago).  Does this count?

It wouldn't since the code you submit isn't part of EMMS.
But I can't find your entry for EMMS either in the list anyway.
Please contact assign@gnu.org to figure out what's going on.


        Stefan


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  6:50                 ` Stefan Monnier
@ 2007-07-25  6:57                   ` William Xu
  2007-07-25 14:48                     ` Stefan Monnier
  0 siblings, 1 reply; 16+ messages in thread
From: William Xu @ 2007-07-25  6:57 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> It wouldn't since the code you submit isn't part of EMMS.

Okay, I don't know what to do here; just do what's appropriate then.

> But I can't find your entry for EMMS either in the list anyway.
> Please contact assign@gnu.org to figure out what's going on.

Ah!  At that time, I used this name: William XWL.  Could you confirm?

P.S. Is it possible to update it?

-- 
William


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  6:57                   ` William Xu
@ 2007-07-25 14:48                     ` Stefan Monnier
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2007-07-25 14:48 UTC (permalink / raw)
  To: William Xu; +Cc: emacs-devel

>> It wouldn't since the code you submit isn't part of EMMS.

> Okay, I don't know what to do here; just do what's appropriate then.

>> But I can't find your entry for EMMS either in the list anyway.
>> Please contact assign@gnu.org to figure out what's going on.

> Ah!  At that time, I used this name: William XWL.  Could you confirm?

Oh, yes, it's there.

> P.S. Is it possible to update it?

Probably.  But you'd have to ask people at assign@gnu.org to do that.


        Stefan


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25  4:30             ` Stefan Monnier
  2007-07-25  4:50               ` William Xu
@ 2007-07-25 20:11               ` Richard Stallman
  2007-07-26  2:22                 ` William Xu
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Stallman @ 2007-07-25 20:11 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: william.xwl, emacs-devel

    I grepped for "Xu" and got two apparently unrelated entries.  Did you use
    some other name and email address?  Or is it very recent (and the FSF
    hasn't yet received or processed the papers)?

William, we recorded your name wrong in copyright.list, as William
Xwl.  I will get the error corrected.

However, that assignment is for EMMS, not for Emacs changes.
Would you like to sign another for Emacs changes?


* Re: [william.xwl@gmail.com: webjump-url-encode and non-ascii characters]
  2007-07-25 20:11               ` Richard Stallman
@ 2007-07-26  2:22                 ` William Xu
  0 siblings, 0 replies; 16+ messages in thread
From: William Xu @ 2007-07-26  2:22 UTC (permalink / raw)
  To: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> William, we recorded your name wrong in copyright.list, as William
> Xwl.  I will get the error corrected.

Thanks!

> However, that assignment is for EMMS, not for Emacs changes.
> Would you like to sign another for Emacs changes?

I'd be happy to.

-- 
William

《赋得暮雨送李胄》
作者:韦应物
楚江微雨里,建业暮钟时。
漠漠帆来重,冥冥鸟去迟。
海门深不见,浦树远含滋。
相送情无限,沾襟比散丝。
