all messages for Emacs-related lists mirrored at yhetil.org
From: Kenichi Handa <handa@m17n.org>
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
Subject: Re: encoding and content-length for url-http.el
Date: Thu, 16 Jun 2005 16:05:50 +0900
Message-ID: <E1DioRi-0006bF-00@etlken>
In-Reply-To: <1118895704.7936.19.camel@localhost.localdomain> (mah@everybody.org)

In article <1118895704.7936.19.camel@localhost.localdomain>, "Mark A. Hershberger" <mah@everybody.org> writes:
[...]
> I'm not using url-dav.el -- I'm using xml-rpc.el which I maintain.

> However, to eliminate the reliance on external code, I've pulled the bit
> from xml-rpc.el that makes the call to post to a weblog hosted on
> Blogger.com:

>         (let ((url-debug t)) (setq url-request-data "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>")
        
[...]
>         	    result))))))

> Without the patch that I supplied, this results in a server error:
> "unexpected end of file found"

> With the patch, it works perfectly.  The result can be seen at
> http://emacs-weblogger.blogspot.com/

In the code above, you set url-request-data to a multibyte
string.

All non-ASCII characters in "Iñtërnâtiônàlizætiøn" belong to
iso-8859-1, and Emacs internally represents each iso-8859-1
character in 2 bytes, which is also the number of bytes UTF-8
uses for each of them.  That means string-bytes on
url-request-data returns, by chance, the same byte length as
the result of encoding it with utf-8.

(string-bytes "Iñtërnâtiônàlizætiøn")
== (length (encode-coding-string "Iñtërnâtiônàlizætiøn" 'utf-8))
== 27

That's why your change to url-http.el works for the above
case.  But that is just a coincidence.  If the string
contains, for instance, an Ethiopic character, Emacs'
internal byte count no longer matches the UTF-8 length, and
it doesn't work.
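A minimal sketch of the point above, assuming a current Emacs: Content-Length has to be the byte length of the data after encoding it with the coding system actually used on the wire, while `string-bytes' only measures Emacs' internal representation.  `my-wire-length' is a hypothetical helper for illustration.  Since the internal representation changed after this message (Emacs 23 and later use a UTF-8-based one), the sketch contrasts utf-8 with iso-8859-1 instead of relying on the Ethiopic case:

```elisp
;; Hypothetical helper, for illustration only: the number of bytes
;; STRING occupies once encoded with CODING-SYSTEM, i.e. the value a
;; Content-Length header has to report.
(defun my-wire-length (string coding-system)
  "Return the byte length of STRING after encoding it with CODING-SYSTEM."
  (length (encode-coding-string string coding-system)))

;; The same text gives different byte counts depending on the wire coding.
(my-wire-length "Iñtërnâtiônàlizætiøn" 'utf-8)      ; => 27
(my-wire-length "Iñtërnâtiônàlizætiøn" 'iso-8859-1) ; => 20

;; `string-bytes' reports Emacs' internal size, a property of how Emacs
;; stores the string, not of the bytes sent in the request.
(string-bytes "Iñtërnâtiônàlizætiøn")
```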

What I still don't know is what kind of value
url-request-data should hold.

If it should be an already encoded string (making it the
caller's responsibility to encode the string beforehand),
just using `length' as now is fine, and you can use this
kind of code:

<         (let ((url-debug t)) (setq url-request-data (encode-coding-string "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>blogger.newPost</methodName><params><param><value><string>0123456789ABCDEF</string></value></param><param><value><string>9380140</string></value></param><param><value><string>usrname</string></value></param><param><value><string>passwrd</string></value></param><param><value><string>Iñtërnâtiônàlizætiøn from emacs with patch</string></value></param><param><value><boolean>1</boolean></value></param></params></methodCall>" 'utf-8))

Please try it after reverting your change.
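A minimal sketch of the "caller pre-encodes" convention, assuming url-request-data is meant to hold bytes.  `my-encode-request-body' and `xml-payload' are hypothetical names; `url-request-method', `url-request-extra-headers', `url-request-data' and `url-retrieve-synchronously' are the url.el names, and the endpoint URL is a placeholder:

```elisp
;; Hypothetical helper: the caller encodes the request body before
;; binding `url-request-data', so url-http.el only ever sees bytes and
;; can keep computing Content-Length with plain `length'.
(defun my-encode-request-body (body &optional coding-system)
  "Return BODY as a unibyte string encoded with CODING-SYSTEM (default utf-8).
A BODY that is already unibyte is assumed to be encoded and is returned as is."
  (if (multibyte-string-p body)
      (encode-coding-string body (or coding-system 'utf-8))
    body))

;; Usage sketch (XML-PAYLOAD and the URL are placeholders):
;; (let ((url-request-method "POST")
;;       (url-request-extra-headers
;;        '(("Content-Type" . "text/xml; charset=utf-8")))
;;       (url-request-data (my-encode-request-body xml-payload)))
;;   (url-retrieve-synchronously "http://example.invalid/RPC2"))
```

Treating any unibyte string as already encoded keeps pure-ASCII literals, which Emacs reads as unibyte, working unchanged.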

If it should be a multibyte string, the correct way to
calculate the Content-Length: header is this code:
  (length (encode-coding-string url-request-data
                                url-request-coding-system))
together with your patch that introduces url-request-coding-system.
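A minimal sketch of the multibyte alternative.  `my-request-coding-system' is a hypothetical stand-in for the proposed url-request-coding-system (its default of utf-8 is an assumption), and `my-content-length-header' is an illustrative name, not code from url-http.el:

```elisp
;; Hypothetical stand-in for the proposed `url-request-coding-system';
;; the name and default here are assumptions for this sketch.
(defvar my-request-coding-system 'utf-8
  "Coding system used to encode a multibyte request body.")

(defun my-content-length-header (data)
  "Return a Content-Length header line for request body DATA.
A multibyte DATA is encoded with `my-request-coding-system' first, so
the header counts the bytes that are actually sent rather than Emacs'
internal representation."
  (let ((bytes (if (multibyte-string-p data)
                   (encode-coding-string data my-request-coding-system)
                 data)))
    (format "Content-Length: %d\r\n" (length bytes))))
```

Binding `my-request-coding-system' to iso-8859-1 makes the same string report 20 bytes instead of 27, which is why the header must be computed with the same coding system that is used to send the body.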

Anyway, this change of yours:

> +	(set-process-coding-system connection
> +				   (detect-coding-string url-request-data t)
> +				   url-request-coding-system)

is wrong, as Stefan wrote.  The second argument (the
decoding coding system) must be `binary', and we should
decode the received data according to its contents (perhaps
by parsing the header to find which charset is specified,
and falling back to Emacs' coding-system detection routine
otherwise).
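A rough sketch of the decoding side, assuming the connection's decoding coding system is `binary' so that the body arrives as raw bytes.  `my-decode-http-body' is a hypothetical helper: it looks for a charset= parameter in the Content-Type header value and falls back to `detect-coding-string' when no charset is given or Emacs does not know the name.  It does not look at a charset declared only inside the document itself (for example an XML encoding declaration):

```elisp
;; Assumed setup (not shown): the process was configured with
;;   (set-process-coding-system connection 'binary 'binary)
;; so BODY below is an undecoded unibyte string.  Encoding is also
;; `binary' here on the assumption that url-request-data is pre-encoded.
(defun my-decode-http-body (content-type body)
  "Decode unibyte BODY according to the charset in CONTENT-TYPE.
CONTENT-TYPE is the value of the Content-Type header, or nil.  When it
names no charset, or one Emacs does not recognize as a coding system,
fall back to Emacs' own coding detection."
  (let* ((charset (and content-type
                       (string-match "charset=\"?\\([^\";[:space:]]+\\)"
                                     content-type)
                       (intern (downcase (match-string 1 content-type)))))
         (coding (if (and charset (coding-system-p charset))
                     charset
                   ;; With HIGHEST non-nil this returns a single coding system.
                   (detect-coding-string body t))))
    (decode-coding-string body coding)))
```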

---
Kenichi Handa
handa@m17n.org
