* encoding and content-length for url-http.el
@ 2005-06-10 15:41 Mark A. Hershberger
2005-06-10 15:53 ` Mark A. Hershberger
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Mark A. Hershberger @ 2005-06-10 15:41 UTC (permalink / raw)
[-- Attachment #1.1: Type: text/plain, Size: 1162 bytes --]
Could I get input on the following patch before I apply it? The first
part (using string-bytes instead of length) seems like a no-brainer.
The second part, I'm less sure about.
--- url-http.el 4 Jun 2005 18:37:16 -0000 1.14
+++ url-http.el 10 Jun 2005 18:36:06 -0000
@@ -259,7 +259,7 @@
(if url-request-data
(concat
"Content-length: " (number-to-string
- (length url-request-data))
+ (string-bytes url-request-data))
"\r\n"))
;; End request
"\r\n"
@@ -1066,6 +1066,9 @@
(set-process-buffer connection buffer)
(set-process-sentinel connection 'url-http-end-of-document-sentinel)
(set-process-filter connection 'url-http-generic-filter)
+ (set-process-coding-system connection
+ (detect-coding-string url-request-data t)
+ url-request-coding-system)
(process-send-string connection (url-http-create-request url))))
buffer))
--
http://mah.everybody.org/weblog/
GPG Fingerprint: 7E15 362D A32C DFAB E4D2 B37A 735E F10A 2DFC BFF5
More people are killed every year by pigs than by sharks, which shows
you how good we are at evaluating risk. -- Bruce Schneier
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 142 bytes --]
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
* Re: encoding and content-length for url-http.el
2005-06-10 15:41 encoding and content-length for url-http.el Mark A. Hershberger
@ 2005-06-10 15:53 ` Mark A. Hershberger
2005-06-10 19:47 ` Stefan Monnier
2005-06-11 11:06 ` Kenichi Handa
2 siblings, 0 replies; 9+ messages in thread
From: Mark A. Hershberger @ 2005-06-10 15:53 UTC (permalink / raw)
[-- Attachment #1.1.1: Type: text/plain, Size: 880 bytes --]
On Fri, 2005-06-10 at 11:46 -0400, Mark A. Hershberger wrote:
> Could I get input on the following patch before I apply it? The first
> part (using string-bytes instead of length) seems like a no-brainer.
> The second part, I'm less sure about.
Full patch included this time.
And a fuller explanation.
I've come across circumstances where it appears that url-http.el isn't
doing the right thing. For instance, I'm using xml-rpc to post weblog
entries that contain unicode quotes. I need these changes to get it to
do the right thing. For (a little) more information, see
<http://elisp.info/archive/80614312>.
[-- Attachment #1.1.2: tmp.diff --]
[-- Type: text/x-patch, Size: 1618 bytes --]
Index: lisp/url/url-http.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/url/url-http.el,v
retrieving revision 1.14
diff -u -r1.14 url-http.el
--- lisp/url/url-http.el 4 Jun 2005 18:37:16 -0000 1.14
+++ lisp/url/url-http.el 10 Jun 2005 18:45:05 -0000
@@ -259,7 +259,7 @@
(if url-request-data
(concat
"Content-length: " (number-to-string
- (length url-request-data))
+ (string-bytes url-request-data))
"\r\n"))
;; End request
"\r\n"
@@ -1066,6 +1066,9 @@
(set-process-buffer connection buffer)
(set-process-sentinel connection 'url-http-end-of-document-sentinel)
(set-process-filter connection 'url-http-generic-filter)
+ (set-process-coding-system connection
+ (detect-coding-string url-request-data t)
+ url-request-coding-system)
(process-send-string connection (url-http-create-request url))))
buffer))
Index: lisp/url/url-vars.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/url/url-vars.el,v
retrieving revision 1.9
diff -u -r1.9 url-vars.el
--- lisp/url/url-vars.el 9 Feb 2005 15:50:36 -0000 1.9
+++ lisp/url/url-vars.el 10 Jun 2005 18:45:05 -0000
@@ -218,6 +218,8 @@
(defvar url-request-data nil "Any data to send with the next request.")
+(defvar url-request-coding-system 'binary "The coding system to use for the request.")
+
(defvar url-request-extra-headers nil
"A list of extra headers to send with the next request.
Should be an assoc list of headers/contents.")
* Re: encoding and content-length for url-http.el
2005-06-10 15:41 encoding and content-length for url-http.el Mark A. Hershberger
2005-06-10 15:53 ` Mark A. Hershberger
@ 2005-06-10 19:47 ` Stefan Monnier
2005-06-10 17:14 ` Mark A. Hershberger
2005-06-11 11:06 ` Kenichi Handa
2 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2005-06-10 19:47 UTC (permalink / raw)
Cc: emacs-devel
> Could I get input on the following patch before I apply it? The first
> part (using string-bytes instead of length) seems like a no-brainer.
> The second part, I'm less sure about.
> --- url-http.el 4 Jun 2005 18:37:16 -0000 1.14
> +++ url-http.el 10 Jun 2005 18:36:06 -0000
> @@ -259,7 +259,7 @@
> (if url-request-data
> (concat
> "Content-length: " (number-to-string
> - (length url-request-data))
> + (string-bytes url-request-data))
I must say I haven't looked at the code, but it's anything but
a no-brainer. I'd rather say that it's obviously wrong. `string-bytes'
will give you the number of bytes used by Emacs for the internal
representation of the string, not the number of bytes that the string will
use on the wire.
Actually the two will be the same in 2 cases:
1 - url-request-data is a unibyte string (in which case `length' also
returns the same value and the patch makes no difference).
2 - the process's coding system is `emacs-mule'.
The first case is probably what we want. The second is unlikely to ever
be right.
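To make the distinction concrete, here is a minimal sketch comparing the counts for the test string that appears later in this thread. The function name `demo-byte-counts` is made up for illustration. One caveat: this thread predates Emacs 23, where the internal representation was emacs-mule; in Emacs 23 and later the internal representation is UTF-8 based, so there `string-bytes` happens to agree with the UTF-8 length for Unicode characters, but still not with any other encoding.

```elisp
(defun demo-byte-counts (s)
  "Return character and byte counts for S (hypothetical helper).
The result is a list: characters, internal bytes, octets when
encoded as UTF-8, octets when encoded as ISO-8859-1."
  (list (length s)                                      ; characters
        (string-bytes s)                                ; internal representation
        (length (encode-coding-string s 'utf-8))        ; on the wire as UTF-8
        (length (encode-coding-string s 'iso-8859-1)))) ; on the wire as Latin-1

;; On Emacs 23 or later this evaluates to (20 27 27 20):
(demo-byte-counts "Iñtërnâtiônàlizætiøn")
```

Only the last two numbers describe what a server would actually receive, and they depend on the coding system chosen, not on the string alone.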
> + (defvar url-request-coding-system 'binary "The coding system to use for the request.")
[...]
> + (set-process-coding-system connection
> + (detect-coding-string url-request-data t)
> + url-request-coding-system)
This says "the data we send is binary (i.e. unibyte, thus case 1 above) and
the data we receive uses the coding system that we infer from
url-request-data". Does that sound right to you?
Assuming the data we send is url-request-data, it doesn't sound right to me.
Using binary when sending sounds right (assuming url-request-data is
unibyte, which is desirable). But when receiving we then probably would
want to use binary as well.
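As a rough sketch of that suggestion (not the actual url-http.el code), a connection can be switched to raw bytes in both directions like this. The helper name `demo-make-connection-binary` is hypothetical; the argument order of `set-process-coding-system` is PROCESS, then DECODING (for data received), then ENCODING (for data sent).

```elisp
(defun demo-make-connection-binary (connection)
  "Make CONNECTION send and receive raw bytes, with no conversion.
CONNECTION is a process object.  Hypothetical helper for illustration."
  ;; Second argument: decoding of received data.
  ;; Third argument: encoding of sent data.
  (set-process-coding-system connection 'binary 'binary)
  connection)
```

With this in place, whatever is passed to `process-send-string` should already be a unibyte string of octets, and the response has to be decoded explicitly afterwards.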
Also I'm not sure what is the purpose of url-request-coding-system.
Would it make sense to set it to something else?
If the change from length to string-bytes solves your problem, it means that
url-request-data is not unibyte (i.e. not a seq of bytes, but a seq of
chars), in which case using `binary' when sending can't be right.
Stefan
* Re: encoding and content-length for url-http.el
2005-06-10 19:47 ` Stefan Monnier
@ 2005-06-10 17:14 ` Mark A. Hershberger
2005-06-10 21:22 ` Stefan Monnier
0 siblings, 1 reply; 9+ messages in thread
From: Mark A. Hershberger @ 2005-06-10 17:14 UTC (permalink / raw)
Cc: Emacs Development
[-- Attachment #1.1: Type: text/plain, Size: 1814 bytes --]
On Fri, 2005-06-10 at 15:47 -0400, Stefan Monnier wrote:
> > - (length url-request-data))
> > + (string-bytes url-request-data))
>
> I must say I haven't looked at the code, but it's anything but
> a no-brainer. I'd rather say that it's obviously wrong. `string-bytes'
> will give you the number of bytes used by Emacs for the internal
> representation of the string, not the number of bytes that the string will
> use on the wire.
So I was wrong. But length is even more obviously wrong than
string-bytes.
The description for length says "If the string contains multibyte
characters, this is not necessarily the number of bytes in the string;
it is the number of characters. To get the number of bytes, use
`string-bytes'."
Which is why I thought this was a no-brainer. We want number of bytes,
not number of characters. RFC2616 says "The Content-Length
entity-header field indicates the size of the entity-body, in decimal
number of OCTETs, sent to the recipient"
> If the change from length to string-bytes solves your problem, it means that
> url-request-data is not unibyte (i.e. not a seq of bytes, but a seq of
> chars), in which case using `binary' when sending can't be right.
I've been using the patch successfully for some time on unicode strings
(seq of chars). It works for me and works where what is currently in CVS
fails.
I'm quite willing to concede that it's wrong, but I've had trouble
finding documentation for this stuff. And, like I said, this works
better for me than what is in CVS.
* Re: encoding and content-length for url-http.el
2005-06-10 17:14 ` Mark A. Hershberger
@ 2005-06-10 21:22 ` Stefan Monnier
2005-06-16 4:21 ` Mark A. Hershberger
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2005-06-10 21:22 UTC (permalink / raw)
Cc: Emacs Development
>> > - (length url-request-data))
>> > + (string-bytes url-request-data))
>>
>> I must say I haven't looked at the code, but it's anything but
>> a no-brainer. I'd rather say that it's obviously wrong. `string-bytes'
>> will give you the number of bytes used by Emacs for the internal
>> representation of the string, not the number of bytes that the string will
>> use on the wire.
> So I was wrong. But length is even more obviously wrong than
> string-bytes.
> The description for length says "If the string contains multibyte
> characters, this is not necessarily the number of bytes in the string;
> it is the number of characters. To get the number of bytes, use
> `string-bytes'."
> Which is why I thought this was a no-brainer. We want number of bytes,
> not number of characters. RFC2616 says "The Content-Length
> entity-header field indicates the size of the entity-body, in decimal
> number of OCTETs, sent to the recipient"
Problem is that the byte length depends on the encoding that will be used.
I.e. it's not just a property of the string itself.
I think the code should keep `length' while making sure that
url-request-data is always a sequence of bytes rather than a sequence of
chars (i.e. its content has already been explicitly encoded in whichever
coding-system was deemed appropriate).
>> If the change from length to string-bytes solves your problem, it means that
>> url-request-data is not unibyte (i.e. not a seq of bytes, but a seq of
>> chars), in which case using `binary' when sending can't be right.
> I've been using the patch successfully for some time on unicode strings
> (seq of chars). It works for me and works where what is currently in CVS
> fails.
I believe you, that your code worked on your test cases, but if it does it
seems to be by accident.
> I'm quite willing to concede that it's wrong, but I've had trouble
> finding documentation for this stuff. And, like I said, this works
> better for me than what is in CVS.
Could you describe much more precisely what you're doing (especially how
you use the URL package: which functions of it you call, etc.)?
Are you using WebDAV (i.e. url-dav.el)?
I've found url-dav.el to be pretty buggy and looking through it, I see some
places where a few more encode-coding-string wouldn't be amiss.
Stefan
* Re: encoding and content-length for url-http.el
2005-06-10 21:22 ` Stefan Monnier
@ 2005-06-16 4:21 ` Mark A. Hershberger
2005-06-16 7:05 ` Kenichi Handa
0 siblings, 1 reply; 9+ messages in thread
From: Mark A. Hershberger @ 2005-06-16 4:21 UTC (permalink / raw)
Cc: Kenichi Handa, Emacs Development
[-- Attachment #1.1.1: Type: text/plain, Size: 3427 bytes --]
On Fri, 2005-06-10 at 17:22 -0400, Stefan Monnier wrote:
> Could you describe much more precisely what you're doing (especially how
> you use the URL package: which functions of it you call, etc...).
> Are you using WebDAV (i.e. url-dav.el)?
I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.
However, to eliminate the reliance on external code, I've pulled the bit
from xml-rpc.el that makes the call to post to a weblog hosted on
Blogger.com:
(let ((url-debug t)) (setq url-request-data "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>")
  (setq my-resp (unwind-protect
                    (save-excursion
                      (let ((url-working-buffer (get-buffer-create
                                                 (xml-rpc-get-temp-buffer-name)))
                            (url-request-method "POST")
                            (url-request-coding-system 'utf-8)
                            (url-http-attempt-keepalives nil)
                            (url-request-extra-headers (list
                                                        (cons "Content-Type" "text/xml; charset=utf-8"))))
                        (set-buffer url-working-buffer)
                        (let ((buffer (url-retrieve-synchronously "http://plant.blogger.com/api/RPC2"))
                              result)
                          (set-buffer buffer)
                          (url-http-parse-headers)
                          (if (> url-http-response-status 299)
                              (error "Error during request: %s"
                                     url-http-response-status))
                          (url-extract-mime-headers)
                          (setq result (xml-rpc-request-process-buffer buffer))
                          result))))))
Without the patch that I supplied, this results in a server error:
"unexpected end of file found"
With the patch, it works perfectly. The result can be seen at
http://emacs-weblogger.blogspot.com/
(I originally used the LiveJournal platform, but decided it wasn't good
enough for this demonstration since it uses Perl's SOAP::Lite which is
very liberal in what it accepts. Blogger.com runs on Java and Java's
static typing makes it stricter in what it will accept when it comes to
XML-RPC types.)
For purposes of this report, I've attached a .zip file with my elisp
snippets, a Perl snippet, and some packet traces of the working and
non-working code. I've elided my username and password, but you'll be
able to see that the server fails when I submit without the patch and
succeeds when I submit with the patch.
Further, the only differences in the packet traces are the way
"Iñtërnâtiônàlizætiøn" is encoded and the Content-Length header.
I'm fully prepared to admit that my patch only works in this case
because of some fluke, but the fact is that url-http.el works with the
patch and fails without it. I humbly ask your assistance in fixing
url-http.el.
[-- Attachment #1.1.2: emacs-xml-url.zip --]
[-- Type: application/zip, Size: 6968 bytes --]
* Re: encoding and content-length for url-http.el
2005-06-16 4:21 ` Mark A. Hershberger
@ 2005-06-16 7:05 ` Kenichi Handa
2005-06-16 16:05 ` Mark A. Hershberger
0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2005-06-16 7:05 UTC (permalink / raw)
Cc: monnier, emacs-devel
In article <1118895704.7936.19.camel@localhost.localdomain>, "Mark A. Hershberger" <mah@everybody.org> writes:
[...]
> I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.
> However, to eliminate the reliance on external code, I've pulled the bit
> from xml-rpc.el that makes the call to post to a weblog hosted on
> Blogger.com:
> (let ((url-debug t)) (setq url-request-data "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>")
[...]
> result))))))
> Without the patch that I supplied, this results in a server error:
> "unexpected end of file found"
> With the patch, it works perfectly. The result can be seen at
> http://emacs-weblogger.blogspot.com/
In the code above, you set url-request-data to a multibyte
string.
All non-ASCII characters in "Iñtërnâtiônàlizætiøn" belong to
iso-8859-1, and Emacs internally represents each non-ASCII
iso-8859-1 character in 2 bytes. That means string-bytes on
url-request-data returns, by chance, the same byte length as
the result of encoding it in utf-8.
(string-bytes "Iñtërnâtiônàlizætiøn")
== (length (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8))
== 27
That's why your change to url-http.el works for the above
case. But, that is just coincidence. If the string
contains, for instance, an Ethiopic character, it doesn't
work.
What I still don't know is what value url-request-data
should have.
If it should be an already encoded string (making it the
caller's responsibility to pre-encode the string), just using
`length' as now is ok. And you can use this kind of code:
(let ((url-debug t)) (setq url-request-data (encode-coding-string "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>" 'utf-8))
Please try it after cancelling your change.
If it should be a multibyte string, the correct way to
calculate Content-length: is to use this code:
(length (encode-coding-string "Iñtërnâtiônàlizætiøn"
url-request-coding-system))
with your patch for introducing url-request-coding-system.
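A minimal sketch of this second option, assuming the library itself does the encoding, might look like the following. Both function names are hypothetical and are not part of url-http.el; CODING-SYSTEM stands in for the proposed `url-request-coding-system`.

```elisp
(defun demo-request-body-octets (data coding-system)
  "Return DATA as the unibyte string that would go on the wire.
Multibyte DATA is encoded in CODING-SYSTEM; unibyte DATA is assumed
to be encoded already and is returned unchanged."
  (if (multibyte-string-p data)
      (encode-coding-string data coding-system)
    data))

(defun demo-content-length-header (data coding-system)
  "Return a Content-length header line for DATA sent in CODING-SYSTEM."
  ;; Count octets of the encoded body, never characters or internal bytes.
  (format "Content-length: %d\r\n"
          (length (demo-request-body-octets data coding-system))))

;; Evaluates to "Content-length: 27\r\n":
(demo-content-length-header "Iñtërnâtiônàlizætiøn" 'utf-8)
```

The important design point is that the same encoded string is used both for the header and for the bytes actually sent, so the two cannot disagree.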
Anyway, this change of yours:
> + (set-process-coding-system connection
> + (detect-coding-string url-request-data t)
> + url-request-coding-system)
is bad as Stefan wrote. The second arg must be `binary',
and we should decode the received data according to the
contents (perhaps by parsing the header and detecting what
charset is specified and falling back to Emacs' code
detection routine).
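One possible shape for that decoding step is sketched below, under the assumption that the raw response body and the Content-Type header value have already been extracted as strings. The function `demo-decode-response-body` is hypothetical, and the charset parsing is deliberately simplistic.

```elisp
(defun demo-decode-response-body (body content-type)
  "Decode unibyte BODY using the charset named in CONTENT-TYPE, if any.
Fall back to `detect-coding-string' when CONTENT-TYPE is nil or names
no coding system that Emacs knows."
  (let* ((charset (and content-type
                       (string-match "charset=\"?\\([^\";[:space:]]+\\)"
                                     content-type)
                       (intern (downcase (match-string 1 content-type)))))
         (coding (if (and charset (coding-system-p charset))
                     charset
                   ;; Second argument t: return only the most likely guess.
                   (detect-coding-string body t))))
    (decode-coding-string body coding)))

;; Example: a UTF-8 body declared as such is decoded back to characters.
(demo-decode-response-body
 (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8)
 "text/xml; charset=utf-8")
```

This only works if the connection delivered the body undecoded, which is why the decoding argument of the process coding system needs to be `binary`.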
---
Kenichi Handa
handa@m17n.org
* Re: encoding and content-length for url-http.el
2005-06-10 15:41 encoding and content-length for url-http.el Mark A. Hershberger
2005-06-10 15:53 ` Mark A. Hershberger
2005-06-10 19:47 ` Stefan Monnier
@ 2005-06-11 11:06 ` Kenichi Handa
2 siblings, 0 replies; 9+ messages in thread
From: Kenichi Handa @ 2005-06-11 11:06 UTC (permalink / raw)
Cc: emacs-devel
In article <1118418076.8854.41.camel@localhost.localdomain>, "Mark A. Hershberger" <mah@everybody.org> writes:
> Could I get input on the following patch before I apply it? The first
> part (using string-bytes instead of length) seems like a no-brainer.
> The second part, I'm less sure about.
If url-request-data is a string of raw bytes (i.e. not-yet
decoded, or already-encoded), you had better use length
because it works both for multibyte-string and
unibyte-string.
Otherwise, you must first encode it and check the
resulting length to decide the value for "Content-Length:",
and calling detect-coding-string on url-request-data is
nonsense.
---
Kenichi Handa
handa@m17n.org
> --- url-http.el 4 Jun 2005 18:37:16 -0000 1.14
> +++ url-http.el 10 Jun 2005 18:36:06 -0000
> @@ -259,7 +259,7 @@
> (if url-request-data
> (concat
> "Content-length: " (number-to-string
> - (length url-request-data))
> + (string-bytes url-request-data))
> "\r\n"))
> ;; End request
> "\r\n"
> @@ -1066,6 +1066,9 @@
> (set-process-buffer connection buffer)
> (set-process-sentinel connection 'url-http-end-of-document-sentinel)
> (set-process-filter connection 'url-http-generic-filter)
> + (set-process-coding-system connection
> + (detect-coding-string url-request-data t)
> + url-request-coding-system)
> (process-send-string connection (url-http-create-request url))))
> buffer))
Thread overview: 9+ messages
2005-06-10 15:41 encoding and content-length for url-http.el Mark A. Hershberger
2005-06-10 15:53 ` Mark A. Hershberger
2005-06-10 19:47 ` Stefan Monnier
2005-06-10 17:14 ` Mark A. Hershberger
2005-06-10 21:22 ` Stefan Monnier
2005-06-16 4:21 ` Mark A. Hershberger
2005-06-16 7:05 ` Kenichi Handa
2005-06-16 16:05 ` Mark A. Hershberger
2005-06-11 11:06 ` Kenichi Handa