From: Kenichi Handa <handa@m17n.org>
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
Subject: Re: encoding and content-length for url-http.el
Date: Thu, 16 Jun 2005 16:05:50 +0900
Message-ID: <E1DioRi-0006bF-00@etlken>
In-Reply-To: <1118895704.7936.19.camel@localhost.localdomain> (mah@everybody.org)
In article <1118895704.7936.19.camel@localhost.localdomain>, "Mark A. Hershberger" <mah@everybody.org> writes:
[...]
> I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.
> However, to eliminate the reliance on external code, I've pulled the bit
> from xml-rpc.el that makes the call to post to a weblog hosted on
> Blogger.com:
> (let ((url-debug t)) (setq url-request-data "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>")
[...]
> result))))))
> Without the patch that I supplied, this results in a server error:
> "unexpected end of file found"
> With the patch, it works perfectly. The result can be seen at
> http://emacs-weblogger.blogspot.com/
In the code above, you set url-request-data to a multibyte
string.
All non-ASCII characters in "Iñtërnâtiônàlizætiøn" are in
iso-8859-1, and Emacs internally represents each iso-8859-1
character in 2 bytes. That means string-bytes on
url-request-data returns, by chance, the same byte length as
the result of encoding it in utf-8.
(string-bytes "Iñtërnâtiônàlizætiøn")
== (length (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8))
== 27
That's why your change to url-http.el works for the above
case. But that is just a coincidence. If the string
contains, for instance, an Ethiopic character, it doesn't
work.
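To illustrate the point, here is a minimal sketch that compares the
character count of a string with the number of bytes it occupies once
encoded. The helper name my-char-and-byte-lengths is made up for this
example and is not part of url-http.el; the Ethiopic character is
written as the escape \u1200 (ETHIOPIC SYLLABLE HA).

```elisp
;; Hypothetical helper for illustration only; not part of url-http.el.
(defun my-char-and-byte-lengths (string coding-system)
  "Return (CHARS . BYTES) for STRING.
CHARS is the number of characters in STRING; BYTES is the length of
STRING after encoding it in CODING-SYSTEM."
  (cons (length string)
        (length (encode-coding-string string coding-system))))

;; One Ethiopic character takes three bytes in utf-8.
(my-char-and-byte-lengths "\u1200" 'utf-8)   ; => (1 . 3)

;; Pure ASCII text has the same count either way.
(my-char-and-byte-lengths "abc" 'utf-8)      ; => (3 . 3)
```

Only the second number is what a server expects to see in
Content-length:, whatever the internal representation happens to be.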
What I still don't know is what value url-request-data
should have.
If it should be an already encoded string (making it the
caller's responsibility to pre-encode the string), just using
`length' as now is ok. And you can use this kind of code:
< (let ((url-debug t)) (setq url-request-data (encode-coding-string "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>" 'utf-8))
Please try it after cancelling your change.
If it should be a multibyte string, the correct way to
calculate Content-length: is to use this code:
(length (encode-coding-string "Iñtërnâtiônàlizætiøn"
url-request-coding-system))
with your patch for introducing url-request-coding-system.
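As a rough sketch of that second approach: encode the body once, count
the encoded bytes, and send exactly those bytes. The function name
my-content-length-header is hypothetical, and the coding system is
passed as an argument here instead of being read from
url-request-coding-system, since that variable only exists with the
patch under discussion.

```elisp
;; Hypothetical helper for illustration only; not the actual
;; url-http.el implementation.
(defun my-content-length-header (body coding-system)
  "Encode multibyte BODY in CODING-SYSTEM.
Return (ENCODED-BODY . HEADER), where HEADER is a Content-length:
line that counts the bytes of ENCODED-BODY."
  (let ((encoded (encode-coding-string body coding-system)))
    (cons encoded
          (format "Content-length: %d\r\n" (length encoded)))))

;; The header reflects encoded bytes, not characters.
(cdr (my-content-length-header "\u1200" 'utf-8))
;; => "Content-length: 3\r\n"
```

Returning the encoded string together with the header keeps the
counted bytes and the transmitted bytes from drifting apart.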
Anyway, this change of yours:
> + (set-process-coding-system connection
> + (detect-coding-string url-request-data t)
> + url-request-coding-system)
is bad, as Stefan wrote. The second arg must be `binary',
and we should decode the received data according to the
contents (perhaps by parsing the header and detecting which
charset is specified, and falling back to Emacs' code
detection routine).
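A minimal sketch of that decoding step, assuming the process delivered
the body as raw bytes and that the Content-Type value has already been
extracted from the header. The function name my-decode-response-body
is hypothetical, and the sketch further assumes that the charset name,
once downcased, is also the name of an Emacs coding system, which does
not hold for every MIME charset.

```elisp
;; Hypothetical helper for illustration only; not part of url-http.el.
(defun my-decode-response-body (raw-body content-type)
  "Decode unibyte RAW-BODY according to CONTENT-TYPE.
Use the charset parameter of CONTENT-TYPE when it names a known
coding system; otherwise fall back to Emacs' code detection."
  (let* ((case-fold-search t)
         (charset (and content-type
                       (string-match "charset=\"?\\([^\"; \t]+\\)"
                                     content-type)
                       (match-string 1 content-type)))
         (named (and charset (intern (downcase charset))))
         (coding (if (and named (coding-system-p named))
                     named
                   ;; No usable charset: let Emacs guess.
                   (detect-coding-string raw-body t))))
    (decode-coding-string raw-body coding)))

;; Example: three utf-8 bytes decode back to one Ethiopic character.
(my-decode-response-body (encode-coding-string "\u1200" 'utf-8)
                         "text/xml; charset=UTF-8")
```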
---
Kenichi Handa
handa@m17n.org