unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
@ 2016-07-31  8:26 Sho Takemori
  2016-07-31 14:31 ` Eli Zaretskii
From: Sho Takemori @ 2016-07-31  8:26 UTC
  To: 24117

I got an error "error in process sentinel: url-http-create-request:
Multibyte text in HTTP request" when I visited a Python file which contains
a multibyte character with `anaconda-eldoc-mode' turned on.

At first, I thought this was a bug in anaconda-mode. So I opened an issue
on GitHub (https://github.com/proofit404/anaconda-mode/issues/189).

I guess `(= (string-bytes request) (length request))` in
`url-http-create-request' should be `(= (string-bytes url-http-data)
(length url-http-data))`, because `(= (string-bytes request) (length
request))` may be `nil' even if `(= (string-bytes url-http-data) (length
url-http-data))` is `t'.
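
A minimal sketch of the situation described above, assuming (this is a
guess, not something confirmed from url-http.el) that some other component
of the request is a multibyte string. The helpers `my/bytes-equal-chars-p'
and `my/demo-request-check' are hypothetical names used only for
illustration, and `header' is forced to be multibyte with
`string-to-multibyte' to mimic such a component.

```elisp
;; Sketch only: `header' is made multibyte on purpose.  This is an
;; assumption about how a request could become multibyte; it is not
;; taken from url-http.el.

(defun my/bytes-equal-chars-p (string)
  "Return non-nil if STRING has as many bytes as characters.
Hypothetical helper mirroring the `string-bytes' / `length' comparison."
  (= (string-bytes string) (length string)))

(defun my/demo-request-check ()
  "Return check results for the body alone and for the whole request."
  (let* (;; U+2227 is the non-ASCII character seen in the reported file.
         ;; Encoding it as UTF-8 yields a unibyte string.
         (body (encode-coding-string "{\"source\":\"\u2227\"}" 'utf-8))
         ;; ASCII-only text, but deliberately stored as a multibyte string.
         (header (string-to-multibyte "POST / HTTP/1.1\r\n\r\n"))
         ;; Mixing multibyte and unibyte arguments gives a multibyte result.
         (request (concat header body)))
    (list (multibyte-string-p body)
          (my/bytes-equal-chars-p body)
          (multibyte-string-p request)
          (my/bytes-equal-chars-p request))))

(my/demo-request-check)
;; => (nil t t nil)
```

Under these assumptions the body alone passes the comparison, while the
concatenated request fails it: `concat' turns the three UTF-8 bytes into
raw-byte characters, each of which occupies two bytes in a multibyte
string.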

Sho Takemori

In GNU Emacs 25.1.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9)
 of 2016-07-26 built on HP-500-270jp
Repository revision: 0f0b191a5324115fe9e8c438eceef4043decf209
Windowing system distributor 'The X.Org Foundation', version 11.0.11803000
System Description: Ubuntu 16.04.1 LTS

Configured using:
 'configure --with-sound=no --with-modules'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK GPM DBUS GCONF GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LC_MONETARY: ja_JP.UTF-8
  value of $LC_NUMERIC: ja_JP.UTF-8
  value of $LC_TIME: ja_JP.UTF-8
  value of $LANG: ja_JP.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  diff-auto-refine-mode: t
  shell-dirtrack-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
MIME-Version: 1.0
Connection: keep-alive
Extension: Security/Digest Security/SSL
Host: 127.0.0.1:9001
Accept-encoding: gzip
Accept: */*
User-Agent: URL/Emacs
Content-length: 20552

{"jsonrpc":"2.0","id":1,"method":"eldoc","params":{"source":"# -*- coding:
utf-8 -*-\nimport itertools\nfrom itertools import groupby\n\nfrom sage.all
import mul\nfrom sage.arith.all import kronecker_symbol\nfrom
sage.functions.all import ceil, floor, sgn\nfrom sage.matrix.all import
(block_diagonal_matrix, block_matrix,\n
diagonal_matrix, identity_matrix, matrix)\nfrom sage.misc.all import
cached_function\nfrom sage.quadratic_forms.all import QuadraticForm,
least_quadratic_nonresidue\nfrom sage.rings.all import QQ, ZZ,
CyclotomicField, FiniteField, PolynomialRing\n\n\ndef
_index_of_gamma_0_gl_n(alpha, p):\n    ’’’\n    Returns delta(a1, ..., an)
defined in Shimura, Euler products and Eisenstein\n    series, pp 118,
(15.1.7).\n    ’’’\n    if p in ZZ:\n        p = ZZ(p)\n\n    def _bn(n):\n
       return mul(1 - p ** (-i) for i in xrange(1, n + 1))\n\n    e_r_ls =
[(k, len(list(v)))\n              for k, v in groupby(sorted(alpha), lambda
x: x)]\n    res = _bn(len(alpha)) / mul(_bn(r) for _, r in e_r_ls)\n    for
i, (ei, ri) in enumerate(e_r_ls):\n        for j, (ej, rj) in
enumerate(e_r_ls):\n            if i < j:\n                res *= p ** ((ej
- ei) * ri * rj)\n    return res\n\n\ndef _gl2_coset_gamma0(a, p):\n    w =
matrix([[0, -1],\n                [1, 0]])\n    for m12 in range(p ** a):\n
       yield matrix([[1, m12],\n                      [0, 1]])\n    for m21
in range(p ** (a - 1)):\n        m = matrix([[1, 0],\n
 [p * m21, 1]])\n        yield w * m\n\n\ndef _gl3_coset_gamma0(alpha,
p):\n    r’’’\n    Let alpha = [a0, a1, a2] with a0 <= a1 <= a2,\n    g =
diag([p^a0, p^a1, p^a2]), and Gamma0 = g^(-1) GL3(Z) g ∧ GL3(Z).\n
 Return a complete set Gamma0 \\ GL3(Z).\n    ’’’\n    if p in ZZ:\n
 p = ZZ(p)\n    a0, a1, a2 = alpha\n    if a0 < a1 < a2:\n        return
list(__gl3_coset_gamma0_distinct(a0, a1, a2, p))\n    elif a0 == a1 and a1
< a2:\n        return list(__gl3_coset_gamma0_2_1(a0, a2, p))\n    elif a0
< a1 and a1 == a2:\n        return list(__gl3_coset_gamma0_1_2(a0, a2,
p))\n    elif a0 == a1 == a2:\n        return [identity_matrix(ZZ, 3)]\n
 else:\n        raise ValueError\n\n\ndef __gl3_coset_gamma0_2_1(a1, a3,
p):\n    w23 = matrix([[1, 0, 0],\n                  [0, 0, 1],\n
       [0, 1, 0]])\n    for m13 in range(p ** (a3 - a1 - 1)):\n        for
m23 in range(p ** (a3 - a1 - 1)):\n            m = matrix([[1, 0, p *
m13],\n                        [0, 1, p * m23],\n
 [0, 0, 1]])\n            yield m\n\n    for m32 in range(p ** (a3 -
a1)):\n        m = matrix([[1, 0, 0],\n                    [0, 1, 0],\n
               [0, m32, 1]])\n        for g in _gl2_coset_gamma0(a3 - a1,
p):\n            n = block_diagonal_matrix(g, matrix([[1]]))\n
 yield w23 * m * n\n\n\ndef __gl3_coset_gamma0_1_2(a1, a2, p):\n    w12 =
matrix([[0, 1, 0],\n                  [1, 0, 0],\n                  [0, 0,
1]])\n\n    for m12 in range(p ** (a2 - a1 - 1)):\n        for m13 in
range(p ** (a2 - a1 - 1)):\n            m = matrix([[1, p * m12, p *
m13],\n                        [0, 1, 0],\n                        [0, 0,
1]])\n            yield m\n    for m21 in range(p ** (a2 - a1)):\n        m
= matrix([[1, 0, 0],\n                    [m21, 1, 0],\n
 [0, 0, 1]])\n        for g in _gl2_coset_gamma0(a2 - a1, p):\n
 n = block_diagonal_matrix(matrix([[1]]), g)\n            yield w12 * m *
n\n\n\ndef __gl3_coset_gamma0_distinct(a1, a2, a3, p):\n\n    w12 =
matrix([[0, 1, 0],\n                  [1, 0, 0],\n                  [0, 0,
1]])\n\n    w23 = matrix([[1, 0, 0],\n                  [0, 0, 1],\n
           [0, 1, 0]])\n\n    w13 = matrix([[0, 0, 1],\n
 [0, 1, 0],\n                  [1, 0, 0]])\n\n    w123 = matrix([[0, 1,
0],\n                   [0, 0, 1],\n                   [1, 0, 0]])\n\n
 w132 = matrix([[0, 0, 1],\n                   [1, 0, 0],\n
  [0, 1, 0]])\n\n    # w = 1\n    for m12 in range(p ** (a2 - a1 - 1)):\n
     for m13 in range(p ** (a3 - a1 - 1)):\n            for m23 in range(p
** (a3 - a2 - 1)):\n                yield matrix([[1, p * m12, p * m13],\n
                             [0, 1, p * m23],\n
 [0, 0, 1]])\n    # w = (12)\n    for m13 in range(p ** (a3 - a2 - 1)):\n
     for m21 in range(p ** (a2 - a1)):\n            for m23 in range(p **
(a3 - a1 - 1)):\n                m = matrix([[1, 0, p * m13],\n
               [m21, 1, p * m23],\n                            [0, 0,
1]])\n                yield w12 * m\n    # w = (23)\n    for m12 in range(p
** (a3 - a1 - 1)):\n        for m13 in range(p ** (a2 - a1 - 1)):\n
   for m32 in range(p ** (a3 - a2)):\n                m = matrix([[1, p *
m12, p * m13],\n                            [0, 1, 0],\n
         [0, m32, 1]])\n                yield w23 * m\n\n    # w = (13)\n
 for m21 in range(p ** (a3 - a2)):\n        for m31 in range(p ** (a3 -
a1)):\n            for m32 in range(p ** (a2 - a1)):\n                m =
matrix([[1, 0, 0],\n                            [m21, 1, 0],\n
               [m31, m32, 1]])\n                yield w13 * m\n\n    # w =
(123)\n    for m21 in range(p ** (a3 - a1)):\n        for m23 in range(p **
(a2 - a1 - 1)):\n            for m31 in range(p ** (a3 - a2)):\n
     m = matrix([[1, 0, 0],\n                            [m21, 1, p *
m23],\n                            [m31, 0, 1]])\n                yield
w123 * m\n    # w = (132)\n    for m12 in range(p ** (a3 - a2 - 1)):\n
   for m31 in range(p ** (a2 - a1)):\n            for m32 in range(p ** (a3
- a1)):\n                m = matrix([[1, p * m12, 0],\n
       [0, 1, 0],\n                            [m31, m32, 1]])\n
     yield w132 * m\n\n\nclass HalfIntMatElement(object):\n\n    def
__init__(self, T):\n        ’’’\n        :params T: half integral matrix of
size 3 or a list\n        ’’’\n        if isinstance(T, list):\n
 a, b, c, d, e, f = [ZZ(x) for x in T]\n            mat = matrix([[a, f /
2, e / 2],\n                          [f / 2, b, d / 2],\n
         [e / 2, d / 2, c]])\n        else:\n            mat = T\n
 self.__entries = tuple(mat.list())\n\n    def __eq__(self, other):\n
 if isinstance(other, HalfIntMatElement):\n            return
self.__entries == other.__entries\n        else:\n            raise
NotImplementedError\n\n    def __repr__(self):\n        return
self.T.__repr__()\n\n    def __hash__(self):\n        return
hash(self.__entries)\n\n    @property\n    def T(self):\n        return
matrix(3, self.__entries)\n\n    def right_action(self, g):\n        ’’’\n
       :param g: matrix of size n\n        return self[g] (Siegel’s
notation)\n        ’’’\n        S = g.transpose() * self.T * g\n
 return HalfIntMatElement(S)\n\n    def satisfy_cong_condition_tp(self, p,
alpha):\n        ’’’\n        Test if sum_{B mod D} exp(2pi T B D^(-1)) is
zero, where D = diag(p^a1, p^a2, a^a3),\n        a1, a2, a3 = alpha.\n
   ’’’\n        return (all(ZZ(self.T[i, i]) % p ** alpha[i] == 0 for i in
range(3)) and\n                all(ZZ(self.T[i, j] * 2) % p ** alpha[i] ==
0\n                    for i in range(3) for j in range(i + 1, 3)))\n\n
 def is_divisible_by(self, m):\n        ’’’\n        Test if self is
divisible by m\n        :param m: integer\n        ’’’\n        return
_half_int_mat_is_div_by(self.T, m)\n\n    def __floordiv__(self, other):\n
       S = matrix(QQ, 3)\n        for i in range(3):\n            S[i, i] =
ZZ(self.T[i, i]) // other\n        for i in range(3):\n            for j in
range(i + 1, 3):\n                S[i, j] = S[j, i] = (ZZ(self.T[i, j] * 2)
// other) / 2\n        return HalfIntMatElement(S)\n\n\ndef
alpha_list(dl):\n    ’’’\n    Return a list of (a0, a1, a2) with 0 <= a0 <=
a1 <= a2 <= dl\n    ’’’\n    return [(a0, a1, a2) for a0 in range(dl + 1)\n
           for a1 in range(a0, dl + 1) for a2 in range(a1, dl +
1)]\n\n\ndef tp_action_fourier_coeff(p, T, F):\n    ’’’\n    Return the Tth
Fourier coefficient of F|T(p), where F is a modular form.\n    :param p: a
prime number\n    :param T: a half integral matrix or an instance of
HalfIntMatElement\n    :param F: a dictionary or a Siegel modular form of
degree 3\n    ’’’\n    p = ZZ(p)\n    return
_action_fc_base(tp_action_fc_alist(p, T), F, T)\n\n\ndef
tp2_action_fourier_coeff(p, i, T, F):\n    ’’’\n    Similar to
tp_action_fourier_coeff for T_i(p^2).\n    ’’’\n    p = ZZ(p)\n    return
_action_fc_base(tp2_action_fc_alist(p, T, i), F, T)\n\n\ndef
_action_fc_base(ls, F, T):\n    if not isinstance(T, HalfIntMatElement):\n
       T = HalfIntMatElement(T)\n    res = 0\n    for s, a, g in ls:\n
   res = a * F[s].left_action(g) + res\n    return res\n\n\ndef
hecke_eigenvalue_tp(p, F, T=None):\n    ’’’\n    p, F, T: same as aruments
of tp_action_fourier_coeff.\n    Assuming F is an eigenform, return the
eigenvalue for T(p),\n    T is used for the computation of Fourier
coefficients.\n    If T is omitted, T will be set to\n    matrix([[1, 1/2,
1/2], [1/2, 1, 1/2], [1/2, 1/2, 1]]).\n    ’’’\n    return
_hecke_eigenvalue_base(lambda s: tp_action_fourier_coeff(p, s, F), F,
T=T)\n\n\ndef hecke_eigenvalue_tp2(p, i, F, T=None):\n    ’’’\n    Similar
to hecke_eigenvalue_tp for T(p^2).\n    ’’’\n    return
_hecke_eigenvalue_base(lambda s: tp2_action_fourier_coeff(p, i, s, F), F,
T=T)\n\n\ndef spinor_l_euler_factor(p, F, t=None, T=None):\n    ’’’\n    F:
a dict or Siegel modular form of degree 3.\n    Return a polynomial G(t) of
degree 8, s.t.\n    G(p^(-s))^(-1) is the p-Euler factor of the spinor L
function of F.\n    ’’’\n    p = ZZ(p)\n    if t is None:\n        t =
PolynomialRing(QQ, 1, names=’t’, order=\"neglex\").gens()[0]\n    c = {}\n
   tp = hecke_eigenvalue_tp(p, F, T=T)\n    tpp1, tpp2, tpp3 =
[hecke_eigenvalue_tp2(p, i, F, T=T) for i in [1, 2, 3]]\n    c[0] = ZZ(1)\n
   c[1] = tp\n    c[2] = p * (tpp1 + (p**2 + 1) * tpp2 + (p**2 + 1)**2 *
tpp3)\n    c[3] = p**3 * tp * (tpp2 + tpp3)\n    c[4] = p**6 * (tp**2 *
tpp3 + tpp2**2 - 2 * p * tpp1 * tpp3 -\n                   2 * (p - 1) *
tpp2 * tpp3 -\n                   (p**6 + 2 * p**5 + 2 * p**3 + 2 * p - 1)
* tpp3**2)\n    c[5] = p**6 * tpp3 * c[3]\n    c[6] = p**12 * tpp3 ** 2 *
c[2]\n    c[7] = p**18 * tpp3 ** 3 * c[1]\n    c[8] = p**24 * tpp3 ** 4\n
 return sum((-1)**k * v * t**k for k, v in c.items())\n\n\ndef
rankin_convolution_degree1(f, g, p, name=None):\n    u’’’\n    f, g:
primitive forms of degree 1 and level 1.\n    Return p-euler factor of the
Rankin convolution of f and g as\n    a polynomial.\n    ’’’\n    k1 =
f.weight()\n    k2 = g.weight()\n    ap = f[p]\n    bp = g[p]\n    t =
PolynomialRing(QQ, 1, names=’t’ if name is None else name,\n
        order=\"neglex\").gens()[0]\n    return (1 - ap * bp * t +\n
     (ap**2 * p**(k2 - 1) + bp**2 * p**(k1 - 1) - 2 * p**(k1 + k2 - 2)) *
t**2 -\n            ap * bp * p**(k1 + k2 - 2) * t**3 + p**(2 * (k1 + k2 -
2)) * t**4)\n\n\ndef _hecke_eigenvalue_base(fc_func, F, T=None):\n    if T
is None:\n        T = HalfIntMatElement(matrix([[ZZ(1), ZZ(1) / ZZ(2),
ZZ(1) / ZZ(2)],\n                                      [ZZ(1) / ZZ(2),
ZZ(1), ZZ(1) / ZZ(2)],\n                                      [ZZ(1) /
ZZ(2), ZZ(1) / ZZ(2), ZZ(1)]]))\n    if not isinstance(T,
HalfIntMatElement):\n        T = HalfIntMatElement(T)\n    v1 =
fc_func(T).vector\n    v = F[T].vector\n    if v == 0:\n        raise
ZeroDivisionError\n    else:\n        i = next(i for i in range(len(v)) if
v[i] != 0)\n        return v1[i] / v[i]\n\n\n@cached_function\ndef
tp_action_fc_alist(p, T):\n    ’’’\n    return a list of tuples (S, a, g)
s.t.\n    S: an instance of HalfIntMatElement\n    a: integer\n    g: 3 by
3 matrix s.t.\n    F|T(p) = sum(a rho(g) F[S] | (a, g, S)).\n    ’’’\n
 res1 = []\n    for alpha in alpha_list(1):\n        D = diagonal_matrix([p
** a for a in alpha])\n        for V in _gl3_coset_gamma0(alpha, p):\n
       M = D * V\n            S = T.right_action(M.transpose())\n
 if S.is_divisible_by(p):\n                S = S // p\n                if
S.satisfy_cong_condition_tp(p, alpha):\n                    # p**(-6) and p
in the third item are for normalization.\n
 res1.append(\n                        (S, p ** (-6) * mul(p ** alpha[i]
for i in range(3) for j in range(i, 3)),\n                         M **
(-1) * p))\n    return __convert_reduced_nonisom_matrices(res1)\n\n\ndef
__convert_reduced_nonisom_matrices(alst):\n    red_res = []\n    for s, a,
g in alst:\n        u = _minkowski_reduction_transform_matrix(s.T)\n
 t = s.right_action(u)\n        red_res.append((t, a, g * u.transpose() **
(-1)))\n\n    non_isoms = []\n\n    for s, a, g in red_res:\n        q =
QuadraticForm(ZZ, 2 * s.T)\n        u = None\n        for t, _, _ in
non_isoms:\n            q1 = QuadraticForm(ZZ, 2 * t.T)\n            if
q.det() == q1.det():\n                u = q.is_globally_equivalent_to(q1,
return_matrix=True)\n                if u and u.transpose() *
q.Gram_matrix_rational() * u == q1.Gram_matrix_rational():\n
     break\n        if u:\n            non_isoms.append((s.right_action(u),
a, g * u.transpose() ** (-1)))\n        else:\n
 non_isoms.append((s, a, g))\n    return non_isoms\n\n\n@cached_function\ndef
tp2_action_fc_alist(p, T, i):\n    ’’’\n    similar to tp_action_fc_alist
for T_i(p^2) for i = 0, 1, 2, 3.\n    ’’’\n    res1 = []\n\n    for alpha
in alpha_list(2):\n        D = diagonal_matrix([p ** a for a in alpha])\n
     for V in _gl3_coset_gamma0(alpha, p):\n            M = D * V\n
   S = T.right_action(M.transpose())\n            if S.is_divisible_by(p **
2):\n                S = S // (p ** 2)\n                res1.append((S, p
** (-12) * _expt_sum(S, p, alpha, i),\n                             M **
(-1) * p ** 2))\n\n    return __convert_reduced_nonisom_matrices([(a, b, c)
for a, b, c in res1 if b != 0])\n\n\ndef _nearest_integer(x):\n    r =
floor(x)\n    if x - r > 0.5:\n        return r + 1\n    else:\n
 return r\n\n\ndef _gaussian_reduction(b1, b2, S):\n    ’’’\n    b1, b2:
vectors of length 3\n    S: symmetric matrix of size 3\n    ’’’\n    while
True:\n        nb1 = b1 * S * b1\n        nb2 = b2 * S * b2\n        if nb2
< nb1:\n            b1, b2 = b2, b1\n        x = (b2 * S * b1) / (b1 * S *
b1)\n        r = _nearest_integer(x)\n        a = b2 - r * b1\n        if a
* S * a >= b2 * S * b2:\n            return (b1, b2)\n        else:\n
     b1, b2 = a, b1\n\n\ndef _sym_mat_gen(p, n):\n    if n == 1:\n
 for a in range(p):\n            yield matrix([[a]])\n    else:\n
 for s in _sym_mat_gen(p, n - 1):\n            ls = [range(p) for _ in
range(n)]\n            for a in itertools.product(*ls):\n                v
= matrix([a[:-1]])\n                yield block_matrix([[s, v.transpose()],
[v, matrix([[a[-1]]])]])\n\n\ndef _gen_gauss_sum_direct_way(N, p, r):\n
 res = 0\n    K = CyclotomicField(p)\n    zeta = K.gen()\n    for S in
_sym_mat_gen(p, N.ncols()):\n        if
S.change_ring(FiniteField(p)).rank() == r:\n            res += zeta ** ((N
* S).trace())\n    try:\n        return QQ(res)\n    except TypeError:\n
     return res\n\n\ndef _generalized_gauss_sum(N, p, r):\n    if r == 0:\n
       return 1\n    if p == 2:\n        return
_gen_gauss_sum_direct_way(N, p, r)\n    else:\n        N_mp =
N.change_ring(FiniteField(p))\n        d, _, v = N_mp.smith_form()\n
 t = d.rank()\n        N1 = (v.transpose() * N_mp *\n
 v).matrix_from_rows_and_columns(range(t), range(t))\n        eps =
kronecker_symbol(N1.det(), p)\n        return _gen_gauss_sum_non_dyadic(p,
eps, N.ncols(), t, r)\n\n\ndef _half_int_mat_is_div_by(S, m):\n    n =
S.ncols()\n    return (all(ZZ(S[i, i]) % m == 0 for i in range(n)) and\n
         all(ZZ(2 * S[i, j]) % m == 0 for i in range(n) for j in range(i +
1, n)))\n\n\n@cached_function\ndef _gen_gauss_sum_non_dyadic(p, eps, n, t,
r):\n    ’’’\n    cf. H. Saito, a generalization of Gauss sums\n    ’’’\n\n
   def parenthesis_prod(a, b, m):\n        if m == 0:\n            return
1\n        else:\n            return mul(1 - a * b ** i for i in
range(m))\n\n    if (n - t) % 2 == 0:\n        m = (n - t) // 2\n
 else:\n        m = (n - t + 1) // 2\n\n    if n == r:\n        if n % 2 ==
1:\n            return ((-1) ** ((n - 2 * m + 1) // 2) * p ** ((n ** 2 + (2
* m) ** 2 - 1) // 4) *\n                    parenthesis_prod(p ** (-1), p
** (-2), m))\n        elif n % 2 == t % 2 == 0:\n            return
((-kronecker_symbol(-1, p)) ** ((n - 2 * m) // 2) *\n
 eps * p ** ((n ** 2 + (2 * m + 1) ** 2 - 1) // 4) *\n
 parenthesis_prod(p ** (-1), p ** (-2), m))\n        else:\n
 return 0\n    else:\n        diag = [1 for _ in range(t)]\n        if eps
== -1:\n            diag[-1] = least_quadratic_nonresidue(p)\n        diag
= diag + [0 for _ in range(n - t)]\n        N =
diagonal_matrix(diag).change_ring(FiniteField(p))\n        return
_gen_gauss_sum_direct_way(N, p, r)\n\n\ndef _expt_sum(S, p, alpha, i):\n
 ’’’\n    Return the exponential sum in Miyawaki’s paper, where alpha[-1]
<= 2, for T_i(p^2).\n    ’’’\n    a, b, c = [alpha.count(_i) for _i in
range(3)]\n    S33 = S.T.matrix_from_rows_and_columns(range(a + b, 3),
range(a + b, 3))\n    S22 = S.T.matrix_from_rows_and_columns(range(a, a +
b), range(a, a + b))\n    S32 = S.T.matrix_from_rows_and_columns(range(a +
b, 3), range(a))\n\n    if c > 0 and not _half_int_mat_is_div_by(S33, p **
2):\n        return 0\n    if c > 0 and b > 0 and any(x % p != 0 for x in
(S32 * ZZ(2)).change_ring(ZZ).list()):\n        return 0\n\n    if b == 0
and a + c == 3 - i:\n        return p ** (c * (c + 1))\n    elif b == 0:\n
       return 0\n    else:\n        return p ** (c * (c + 1)) * p ** (b *
c) * _generalized_gauss_sum(S22, p, b - i)\n\n\ndef
_minkowski_reduction(b1, b2, b3, S):\n\n    def inner_prod(x, y):\n
 return x * S * y\n\n    while True:\n        b1, b2, b3 = sorted([b1, b2,
b3], key=lambda b: b * S * b)\n\n        b1, b2 = _gaussian_reduction(b1,
b2, S)\n\n        b11 = inner_prod(b1, b1)\n        b12 = inner_prod(b1,
b2)\n        b13 = inner_prod(b1, b3)\n        b22 = inner_prod(b2, b2)\n
     b23 = inner_prod(b2, b3)\n        b33 = inner_prod(b3, b3)\n\n
 y1 = - (b13 / b11 - b12 * b23 / (b11 * b22)) / \\\n            (1 - b12 **
2 / (b11 * b22))\n        y2 = - (b23 / b22 - b12 * b13 / (b11 * b22)) /
\\\n            (1 - b12 ** 2 / (b11 * b22))\n\n        # Find integers x1,
x2 so that norm(b3 + x2 * b2 + x1 * b1) is minimal.\n        a_norms_alst =
[]\n\n        for x1 in [floor(y1), ceil(y1)]:\n            for x2 in
[floor(y2), ceil(y2)]:\n                a = b3 + x2 * b2 + x1 * b1\n
         a_norms_alst.append((x1, x2, a, inner_prod(a, a)))\n
 _inner_prod_a = min(x[-1] for x in a_norms_alst)\n        x1, x2, a, _ =
next(x for x in a_norms_alst if x[-1] == _inner_prod_a)\n\n        if
_inner_prod_a >= b33:\n            # Change sings of b1, b2, b3 and
terminate the alogrithm\n            sngs = [sgn(b12), sgn(b13),
sgn(b23)]\n            bs = [b1, b2, b3]\n            try:\n
 # If b12, b13 or b23 is zero, change sgns of b1, b2, b3 so that\n
       # b12, b13, b23 >= 0.\n                zero_i = sngs.index(0)\n
           set_ls = [set([1, 2]), set([1, 3]), set([2, 3])]\n
 t = set_ls[zero_i]\n                _other = [x for x in [1, 2, 3] if x
not in t][0]\n                for x in t:\n                    i =
set_ls.index(set([x, _other]))\n                    if sngs[i] < 0:\n
                 bs[x - 1] *= -1\n                b1, b2, b3 = bs\n
   except ValueError:\n                # Else change sgns so that b12, b13
> 0\n                if b12 < 0:\n                    b2 = -b2\n
     if b13 < 0:\n                    b3 = -b3\n            return (b1, b2,
b3)\n        else:\n            b3 = a\n\n\ndef
_minkowski_reduction_transform_matrix(S):\n    ’’’\n    Return a unimodular
matrix u such that u^t * S * u is reduced in Minkowski’s sense.\n    ’’’\n
   b1, b2, b3 = identity_matrix(QQ, 3).columns()\n    c1, c2, c3 =
_minkowski_reduction(b1, b2, b3, S)\n    return matrix([c1, c2,
c3]).transpose()\n","line":10,"column":0,"path":"/home/sho/work/sage_packages/e8theta_degree3/hecke_module.py"}}
Quit [2 times]

Load-path shadows:
/home/sho/.emacs.d/elpa/helm-20160723.2238/helm-multi-match hides
/home/sho/.emacs.d/elpa/helm-core-20160723.944/helm-multi-match
/home/sho/.emacs.d/elpa/scala-mode-20160519.731/ob-scala hides
/usr/local/share/emacs/25.1/lisp/org/ob-scala
/home/sho/.emacs.d/elpa/seq-2.16/seq hides
/usr/local/share/emacs/25.1/lisp/emacs-lisp/seq

Features:
(shadow sort mail-extr emacsbug message dired rfc822 mml mml-sec epg
mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
mail-utils network-stream nsm starttls url-cache url-http tls gnutls
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums url-gw url-auth
anaconda-mode pythonic f url url-proxy url-privacy url-expand
url-methods url-history url-cookie url-domsuf url-util mailcap vc-git
diff-mode easy-mmode python tramp-sh tramp tramp-compat tramp-loaddefs
trampver shell pcomplete format-spec comint ring ansi-color finder-inf
tex-site advice edmacro kmacro gh-common gh-profile url-parse
auth-source gnus-util mm-util help-fns mail-prsvr password-cache
url-vars s ucs-normalize marshal eieio-compat cl-seq json map dash eieio
eieio-core cl-macs go-mode-autoloads rx info package epg-config seq
byte-opt gv bytecomp byte-compile cl-extra help-mode easymenu cconv
cl-loaddefs pcase cl-lib time-date mule-util japan-util tooltip eldoc
electric uniquify ediff-hook vc-hooks lisp-float-type mwheel x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help
simple abbrev minibuffer cl-preloaded nadvice loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
dbusbind inotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 279045 5729)
 (symbols 48 28763 0)
 (miscs 40 591 109)
 (strings 32 47201 10304)
 (string-bytes 1 1610049)
 (vectors 16 45065)
 (vector-slots 8 885950 3678)
 (floats 8 389 220)
 (intervals 56 445 0)
 (buffers 976 23)
 (heap 1024 44595 2062))

* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-07-31  8:26 bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request Sho Takemori
@ 2016-07-31 14:31 ` Eli Zaretskii
  2016-07-31 23:21   ` Sho Takemori
From: Eli Zaretskii @ 2016-07-31 14:31 UTC
  To: Sho Takemori; +Cc: 24117

> From: Sho Takemori <stakemorii@gmail.com>
> Date: Sun, 31 Jul 2016 17:26:37 +0900
> 
> I got an error "error in process sentinel: url-http-create-request: Multibyte text in HTTP request" when I visited a
> Python file which contains a multibyte character with `anaconda-eldoc-mode' turned on.

That file name should have been encoded by the time it is passed to
url-http.el, so the problem should not have happened, because encoded
strings are unibyte strings.

> At first, I thought this was a bug in anaconda-mode. So I opened an issue on GitHub
> (https://github.com/proofit404/anaconda-mode/issues/189).
> 
> I guess `(= (string-bytes request) (length request))` in `url-http-create-request' should be `(= (string-bytes
> url-http-data) (length url-http-data))`, because `(= (string-bytes request) (length request))` may be `nil' even if
> `(= (string-bytes url-http-data) (length url-http-data))` is `t'.

I don't think I agree in general: all the strings that are used by
url-http-create-request should be unibyte strings.  If they all are
unibyte strings, then I think the situation you describe should not
happen.  However, you didn't provide enough details to analyze the
situation, so perhaps I'm missing something.  Could you please show
all the details, specifically, what were the values of the various
variables used by url-http-create-request to generate the request?
For each value that is a string, please also tell whether it's a
unibyte or a multibyte string.

Thanks.
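
One possible way to collect the information requested above, as a sketch:
`my/describe-string' is a hypothetical helper, not part of the url
library, that reports whether a value is a unibyte or a multibyte string
together with its character and byte counts.

```elisp
(defun my/describe-string (name value)
  "Return a one-line description of VALUE, labeled with NAME.
Hypothetical helper for reporting unibyte vs. multibyte strings."
  (if (stringp value)
      (format "%s: %s string, %d chars, %d bytes"
              name
              (if (multibyte-string-p value) "multibyte" "unibyte")
              (length value)
              (string-bytes value))
    ;; Non-string values are printed as-is.
    (format "%s: not a string (%S)" name value)))

(my/describe-string "method" "POST")
;; => "method: unibyte string, 4 chars, 4 bytes"
```

It could be called on each variable of interest at the point where the
error is signaled, for example
(my/describe-string "url-request-data" url-request-data).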





* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-07-31 14:31 ` Eli Zaretskii
@ 2016-07-31 23:21   ` Sho Takemori
  2016-08-01 13:17     ` Eli Zaretskii
From: Sho Takemori @ 2016-07-31 23:21 UTC
  To: Eli Zaretskii; +Cc: 24117

It seems that anaconda-mode uses two global variables (url-request-method
and url-request-data) to generate the request.

https://github.com/proofit404/anaconda-mode/blob/master/anaconda-mode.el#L349

url-request-method is bound to an ASCII string "POST".
In my situation, url-request-data is bound to a unibyte string as below.

"{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"eldoc\",\"params\":{\"source\":\"#
-*- coding: utf-8 -*-\\nimport itertools\\nfrom itertools import
groupby\\n\\nfrom sage.all import mul\\nfrom sage.arith.all import
kronecker_symbol\\nfrom sage.functions.all import ceil, floor, sgn\\nfrom
sage.matrix.all import (block_diagonal_matrix, block_matrix,\\n
                diagonal_matrix, identity_matrix, matrix)\\nfrom
sage.misc.all import cached_function\\nfrom sage.quadratic_forms.all import
QuadraticForm, least_quadratic_nonresidue\\nfrom sage.rings.all import QQ,
ZZ, CyclotomicField, FiniteField, PolynomialRing\\n\\n\\ndef
_index_of_gamma_0_gl_n(alpha, p):\\n    '''\\n    Returns delta(a1, ...,
an) defined in Shimura, Euler products and Eisenstein\\n    series, pp 118,
(15.1.7).\\n    '''\\n    if p in ZZ:\\n        p = ZZ(p)\\n\\n    def
_bn(n):\\n        return mul(1 - p ** (-i) for i in xrange(1, n + 1))\\n\\n
   e_r_ls = [(k, len(list(v)))\\n              for k, v in
groupby(sorted(alpha), lambda x: x)]\\n    res = _bn(len(alpha)) /
mul(_bn(r) for _, r in e_r_ls)\\n    for i, (ei, ri) in
enumerate(e_r_ls):\\n        for j, (ej, rj) in enumerate(e_r_ls):\\n
     if i < j:\\n                res *= p ** ((ej - ei) * ri * rj)\\n
 return res\\n\\n\\ndef _gl2_coset_gamma0(a, p):\\n    w = matrix([[0,
-1],\\n                [1, 0]])\\n    for m12 in range(p ** a):\\n
 yield matrix([[1, m12],\\n                      [0, 1]])\\n    for m21 in
range(p ** (a - 1)):\\n        m = matrix([[1, 0],\\n                    [p
* m21, 1]])\\n        yield w * m\\n\\n\\ndef _gl3_coset_gamma0(alpha,
p):\\n    r'''\\n    Let alpha = [a0, a1, a2] with a0 <= a1 <= a2,\\n    g
= diag([p^a0, p^a1, p^a2]), and Gamma0 = g^(-1) GL3(Z) g \342\210\247
GL3(Z).\\n    Return a complete set Gamma0 \\\\ GL3(Z).\\n    '''\\n    if
p in ZZ:\\n        p = ZZ(p)\\n    a0, a1, a2 = alpha\\n    if a0 < a1 <
a2:\\n        return list(__gl3_coset_gamma0_distinct(a0, a1, a2, p))\\n
 elif a0 == a1 and a1 < a2:\\n        return
list(__gl3_coset_gamma0_2_1(a0, a2, p))\\n    elif a0 < a1 and a1 == a2:\\n
       return list(__gl3_coset_gamma0_1_2(a0, a2, p))\\n    elif a0 == a1
== a2:\\n        return [identity_matrix(ZZ, 3)]\\n    else:\\n
 raise ValueError\\n\\n\\ndef __gl3_coset_gamma0_2_1(a1, a3, p):\\n    w23
= matrix([[1, 0, 0],\\n                  [0, 0, 1],\\n                  [0,
1, 0]])\\n    for m13 in range(p ** (a3 - a1 - 1)):\\n        for m23 in
range(p ** (a3 - a1 - 1)):\\n            m = matrix([[1, 0, p * m13],\\n
                     [0, 1, p * m23],\\n                        [0, 0,
1]])\\n            yield m\\n\\n    for m32 in range(p ** (a3 - a1)):\\n
     m = matrix([[1, 0, 0],\\n                    [0, 1, 0],\\n
       [0, m32, 1]])\\n        for g in _gl2_coset_gamma0(a3 - a1, p):\\n
         n = block_diagonal_matrix(g, matrix([[1]]))\\n            yield
w23 * m * n\\n\\n\\ndef __gl3_coset_gamma0_1_2(a1, a2, p):\\n    w12 =
matrix([[0, 1, 0],\\n                  [1, 0, 0],\\n                  [0,
0, 1]])\\n\\n    for m12 in range(p ** (a2 - a1 - 1)):\\n        for m13 in
range(p ** (a2 - a1 - 1)):\\n            m = matrix([[1, p * m12, p *
m13],\\n                        [0, 1, 0],\\n                        [0, 0,
1]])\\n            yield m\\n    for m21 in range(p ** (a2 - a1)):\\n
 m = matrix([[1, 0, 0],\\n                    [m21, 1, 0],\\n
     [0, 0, 1]])\\n        for g in _gl2_coset_gamma0(a2 - a1, p):\\n
     n = block_diagonal_matrix(matrix([[1]]), g)\\n            yield w12 *
m * n\\n\\n\\ndef __gl3_coset_gamma0_distinct(a1, a2, a3, p):\\n\\n    w12
= matrix([[0, 1, 0],\\n                  [1, 0, 0],\\n                  [0,
0, 1]])\\n\\n    w23 = matrix([[1, 0, 0],\\n                  [0, 0, 1],\\n
                 [0, 1, 0]])\\n\\n    w13 = matrix([[0, 0, 1],\\n
       [0, 1, 0],\\n                  [1, 0, 0]])\\n\\n    w123 =
matrix([[0, 1, 0],\\n                   [0, 0, 1],\\n                   [1,
0, 0]])\\n\\n    w132 = matrix([[0, 0, 1],\\n                   [1, 0,
0],\\n                   [0, 1, 0]])\\n\\n    # w = 1\\n    for m12 in
range(p ** (a2 - a1 - 1)):\\n        for m13 in range(p ** (a3 - a1 -
1)):\\n            for m23 in range(p ** (a3 - a2 - 1)):\\n
 yield matrix([[1, p * m12, p * m13],\\n                              [0,
1, p * m23],\\n                              [0, 0, 1]])\\n    # w =
(12)\\n    for m13 in range(p ** (a3 - a2 - 1)):\\n        for m21 in
range(p ** (a2 - a1)):\\n            for m23 in range(p ** (a3 - a1 -
1)):\\n                m = matrix([[1, 0, p * m13],\\n
       [m21, 1, p * m23],\\n                            [0, 0, 1]])\\n
           yield w12 * m\\n    # w = (23)\\n    for m12 in range(p ** (a3 -
a1 - 1)):\\n        for m13 in range(p ** (a2 - a1 - 1)):\\n            for
m32 in range(p ** (a3 - a2)):\\n                m = matrix([[1, p * m12, p
* m13],\\n                            [0, 1, 0],\\n
   [0, m32, 1]])\\n                yield w23 * m\\n\\n    # w = (13)\\n
 for m21 in range(p ** (a3 - a2)):\\n        for m31 in range(p ** (a3 -
a1)):\\n            for m32 in range(p ** (a2 - a1)):\\n                m =
matrix([[1, 0, 0],\\n                            [m21, 1, 0],\\n
                 [m31, m32, 1]])\\n                yield w13 * m\\n\\n    #
w = (123)\\n    for m21 in range(p ** (a3 - a1)):\\n        for m23 in
range(p ** (a2 - a1 - 1)):\\n            for m31 in range(p ** (a3 -
a2)):\\n                m = matrix([[1, 0, 0],\\n
 [m21, 1, p * m23],\\n                            [m31, 0, 1]])\\n
       yield w123 * m\\n    # w = (132)\\n    for m12 in range(p ** (a3 -
a2 - 1)):\\n        for m31 in range(p ** (a2 - a1)):\\n            for m32
in range(p ** (a3 - a1)):\\n                m = matrix([[1, p * m12, 0],\\n
                           [0, 1, 0],\\n                            [m31,
m32, 1]])\\n                yield w132 * m\\n\\n\\nclass
HalfIntMatElement(object):\\n\\n    def __init__(self, T):\\n        '''\\n
       :params T: half integral matrix of size 3 or a list\\n        '''\\n
       if isinstance(T, list):\\n            a, b, c, d, e, f = [ZZ(x) for
x in T]\\n            mat = matrix([[a, f / 2, e / 2],\\n
       [f / 2, b, d / 2],\\n                          [e / 2, d / 2,
c]])\\n        else:\\n            mat = T\\n        self.__entries =
tuple(mat.list())\\n\\n    def __eq__(self, other):\\n        if
isinstance(other, HalfIntMatElement):\\n            return self.__entries
== other.__entries\\n        else:\\n            raise
NotImplementedError\\n\\n    def __repr__(self):\\n        return
self.T.__repr__()\\n\\n    def __hash__(self):\\n        return
hash(self.__entries)\\n\\n    @property\\n    def T(self):\\n        return
matrix(3, self.__entries)\\n\\n    def right_action(self, g):\\n
 '''\\n        :param g: matrix of size n\\n        return self[g]
(Siegel's notation)\\n        '''\\n        S = g.transpose() * self.T *
g\\n        return HalfIntMatElement(S)\\n\\n    def
satisfy_cong_condition_tp(self, p, alpha):\\n        '''\\n        Test if
sum_{B mod D} exp(2pi T B D^(-1)) is zero, where D = diag(p^a1, p^a2,
a^a3),\\n        a1, a2, a3 = alpha.\\n        '''\\n        return
(all(ZZ(self.T[i, i]) % p ** alpha[i] == 0 for i in range(3)) and\\n
         all(ZZ(self.T[i, j] * 2) % p ** alpha[i] == 0\\n
 for i in range(3) for j in range(i + 1, 3)))\\n\\n    def
is_divisible_by(self, m):\\n        '''\\n        Test if self is divisible
by m\\n        :param m: integer\\n        '''\\n        return
_half_int_mat_is_div_by(self.T, m)\\n\\n    def __floordiv__(self,
other):\\n        S = matrix(QQ, 3)\\n        for i in range(3):\\n
   S[i, i] = ZZ(self.T[i, i]) // other\\n        for i in range(3):\\n
       for j in range(i + 1, 3):\\n                S[i, j] = S[j, i] =
(ZZ(self.T[i, j] * 2) // other) / 2\\n        return
HalfIntMatElement(S)\\n\\n\\ndef alpha_list(dl):\\n    '''\\n    Return a
list of (a0, a1, a2) with 0 <= a0 <= a1 <= a2 <= dl\\n    '''\\n    return
[(a0, a1, a2) for a0 in range(dl + 1)\\n            for a1 in range(a0, dl
+ 1) for a2 in range(a1, dl + 1)]\\n\\n\\ndef tp_action_fourier_coeff(p, T,
F):\\n    '''\\n    Return the Tth Fourier coefficient of F|T(p), where F
is a modular form.\\n    :param p: a prime number\\n    :param T: a half
integral matrix or an instance of HalfIntMatElement\\n    :param F: a
dictionary or a Siegel modular form of degree 3\\n    '''\\n    p =
ZZ(p)\\n    return _action_fc_base(tp_action_fc_alist(p, T), F,
T)\\n\\n\\ndef tp2_action_fourier_coeff(p, i, T, F):\\n    '''\\n
 Similar to tp_action_fourier_coeff for T_i(p^2).\\n    '''\\n    p =
ZZ(p)\\n    return _action_fc_base(tp2_action_fc_alist(p, T, i), F,
T)\\n\\n\\ndef _action_fc_base(ls, F, T):\\n    if not isinstance(T,
HalfIntMatElement):\\n        T = HalfIntMatElement(T)\\n    res = 0\\n
 for s, a, g in ls:\\n        res = a * F[s].left_action(g) + res\\n
 return res\\n\\n\\ndef hecke_eigenvalue_tp(p, F, T=None):\\n    '''\\n
 p, F, T: same as aruments of tp_action_fourier_coeff.\\n    Assuming F is
an eigenform, return the eigenvalue for T(p),\\n    T is used for the
computation of Fourier coefficients.\\n    If T is omitted, T will be set
to\\n    matrix([[1, 1/2, 1/2], [1/2, 1, 1/2], [1/2, 1/2, 1]]).\\n
 '''\\n    return _hecke_eigenvalue_base(lambda s:
tp_action_fourier_coeff(p, s, F), F, T=T)\\n\\n\\ndef
hecke_eigenvalue_tp2(p, i, F, T=None):\\n    '''\\n    Similar to
hecke_eigenvalue_tp for T(p^2).\\n    '''\\n    return
_hecke_eigenvalue_base(lambda s: tp2_action_fourier_coeff(p, i, s, F), F,
T=T)\\n\\n\\ndef spinor_l_euler_factor(p, F, t=None, T=None):\\n    '''\\n
   F: a dict or Siegel modular form of degree 3.\\n    Return a polynomial
G(t) of degree 8, s.t.\\n    G(p^(-s))^(-1) is the p-Euler factor of the
spinor L function of F.\\n    '''\\n    p = ZZ(p)\\n    if t is None:\\n
     t = PolynomialRing(QQ, 1, names='t',
order=\\\"neglex\\\").gens()[0]\\n    c = {}\\n    tp =
hecke_eigenvalue_tp(p, F, T=T)\\n    tpp1, tpp2, tpp3 =
[hecke_eigenvalue_tp2(p, i, F, T=T) for i in [1, 2, 3]]\\n    c[0] =
ZZ(1)\\n    c[1] = tp\\n    c[2] = p * (tpp1 + (p**2 + 1) * tpp2 + (p**2 +
1)**2 * tpp3)\\n    c[3] = p**3 * tp * (tpp2 + tpp3)\\n    c[4] = p**6 *
(tp**2 * tpp3 + tpp2**2 - 2 * p * tpp1 * tpp3 -\\n                   2 * (p
- 1) * tpp2 * tpp3 -\\n                   (p**6 + 2 * p**5 + 2 * p**3 + 2 *
p - 1) * tpp3**2)\\n    c[5] = p**6 * tpp3 * c[3]\\n    c[6] = p**12 * tpp3
** 2 * c[2]\\n    c[7] = p**18 * tpp3 ** 3 * c[1]\\n    c[8] = p**24 * tpp3
** 4\\n    return sum((-1)**k * v * t**k for k, v in c.items())\\n\\n\\ndef
rankin_convolution_degree1(f, g, p, name=None):\\n    u'''\\n    f, g:
primitive forms of degree 1 and level 1.\\n    Return p-euler factor of the
Rankin convolution of f and g as\\n    a polynomial.\\n    '''\\n    k1 =
f.weight()\\n    k2 = g.weight()\\n    ap = f[p]\\n    bp = g[p]\\n    t =
PolynomialRing(QQ, 1, names='t' if name is None else name,\\n
        order=\\\"neglex\\\").gens()[0]\\n    return (1 - ap * bp * t +\\n
           (ap**2 * p**(k2 - 1) + bp**2 * p**(k1 - 1) - 2 * p**(k1 + k2 -
2)) * t**2 -\\n            ap * bp * p**(k1 + k2 - 2) * t**3 + p**(2 * (k1
+ k2 - 2)) * t**4)\\n\\n\\ndef _hecke_eigenvalue_base(fc_func, F,
T=None):\\n    if T is None:\\n        T =
HalfIntMatElement(matrix([[ZZ(1), ZZ(1) / ZZ(2), ZZ(1) / ZZ(2)],\\n
                             [ZZ(1) / ZZ(2), ZZ(1), ZZ(1) / ZZ(2)],\\n
                                 [ZZ(1) / ZZ(2), ZZ(1) / ZZ(2),
ZZ(1)]]))\\n    if not isinstance(T, HalfIntMatElement):\\n        T =
HalfIntMatElement(T)\\n    v1 = fc_func(T).vector\\n    v = F[T].vector\\n
   if v == 0:\\n        raise ZeroDivisionError\\n    else:\\n        i =
next(i for i in range(len(v)) if v[i] != 0)\\n        return v1[i] /
v[i]\\n\\n\\n@cached_function\\ndef tp_action_fc_alist(p, T):\\n    '''\\n
   return a list of tuples (S, a, g) s.t.\\n    S: an instance of
HalfIntMatElement\\n    a: integer\\n    g: 3 by 3 matrix s.t.\\n    F|T(p)
= sum(a rho(g) F[S] | (a, g, S)).\\n    '''\\n    res1 = []\\n    for alpha
in alpha_list(1):\\n        D = diagonal_matrix([p ** a for a in alpha])\\n
       for V in _gl3_coset_gamma0(alpha, p):\\n            M = D * V\\n
       S = T.right_action(M.transpose())\\n            if
S.is_divisible_by(p):\\n                S = S // p\\n                if
S.satisfy_cong_condition_tp(p, alpha):\\n                    # p**(-6) and
p in the third item are for normalization.\\n
 res1.append(\\n                        (S, p ** (-6) * mul(p ** alpha[i]
for i in range(3) for j in range(i, 3)),\\n                         M **
(-1) * p))\\n    return
__convert_reduced_nonisom_matrices(res1)\\n\\n\\ndef
__convert_reduced_nonisom_matrices(alst):\\n    red_res = []\\n    for s,
a, g in alst:\\n        u = _minkowski_reduction_transform_matrix(s.T)\\n
     t = s.right_action(u)\\n        red_res.append((t, a, g *
u.transpose() ** (-1)))\\n\\n    non_isoms = []\\n\\n    for s, a, g in
red_res:\\n        q = QuadraticForm(ZZ, 2 * s.T)\\n        u = None\\n
   for t, _, _ in non_isoms:\\n            q1 = QuadraticForm(ZZ, 2 *
t.T)\\n            if q.det() == q1.det():\\n                u =
q.is_globally_equivalent_to(q1, return_matrix=True)\\n                if u
and u.transpose() * q.Gram_matrix_rational() * u ==
q1.Gram_matrix_rational():\\n                    break\\n        if u:\\n
         non_isoms.append((s.right_action(u), a, g * u.transpose() **
(-1)))\\n        else:\\n            non_isoms.append((s, a, g))\\n
 return non_isoms\\n\\n\\n@cached_function\\ndef tp2_action_fc_alist(p, T,
i):\\n    '''\\n    similar to tp_action_fc_alist for T_i(p^2) for i = 0,
1, 2, 3.\\n    '''\\n    res1 = []\\n\\n    for alpha in alpha_list(2):\\n
       D = diagonal_matrix([p ** a for a in alpha])\\n        for V in
_gl3_coset_gamma0(alpha, p):\\n            M = D * V\\n            S =
T.right_action(M.transpose())\\n            if S.is_divisible_by(p **
2):\\n                S = S // (p ** 2)\\n                res1.append((S, p
** (-12) * _expt_sum(S, p, alpha, i),\\n                             M **
(-1) * p ** 2))\\n\\n    return __convert_reduced_nonisom_matrices([(a, b,
c) for a, b, c in res1 if b != 0])\\n\\n\\ndef _nearest_integer(x):\\n    r
= floor(x)\\n    if x - r > 0.5:\\n        return r + 1\\n    else:\\n
   return r\\n\\n\\ndef _gaussian_reduction(b1, b2, S):\\n    '''\\n    b1,
b2: vectors of length 3\\n    S: symmetric matrix of size 3\\n    '''\\n
 while True:\\n        nb1 = b1 * S * b1\\n        nb2 = b2 * S * b2\\n
   if nb2 < nb1:\\n            b1, b2 = b2, b1\\n        x = (b2 * S * b1)
/ (b1 * S * b1)\\n        r = _nearest_integer(x)\\n        a = b2 - r *
b1\\n        if a * S * a >= b2 * S * b2:\\n            return (b1, b2)\\n
       else:\\n            b1, b2 = a, b1\\n\\n\\ndef _sym_mat_gen(p,
n):\\n    if n == 1:\\n        for a in range(p):\\n            yield
matrix([[a]])\\n    else:\\n        for s in _sym_mat_gen(p, n - 1):\\n
       ls = [range(p) for _ in range(n)]\\n            for a in
itertools.product(*ls):\\n                v = matrix([a[:-1]])\\n
     yield block_matrix([[s, v.transpose()], [v,
matrix([[a[-1]]])]])\\n\\n\\ndef _gen_gauss_sum_direct_way(N, p, r):\\n
 res = 0\\n    K = CyclotomicField(p)\\n    zeta = K.gen()\\n    for S in
_sym_mat_gen(p, N.ncols()):\\n        if
S.change_ring(FiniteField(p)).rank() == r:\\n            res += zeta ** ((N
* S).trace())\\n    try:\\n        return QQ(res)\\n    except
TypeError:\\n        return res\\n\\n\\ndef _generalized_gauss_sum(N, p,
r):\\n    if r == 0:\\n        return 1\\n    if p == 2:\\n        return
_gen_gauss_sum_direct_way(N, p, r)\\n    else:\\n        N_mp =
N.change_ring(FiniteField(p))\\n        d, _, v = N_mp.smith_form()\\n
   t = d.rank()\\n        N1 = (v.transpose() * N_mp *\\n
 v).matrix_from_rows_and_columns(range(t), range(t))\\n        eps =
kronecker_symbol(N1.det(), p)\\n        return _gen_gauss_sum_non_dyadic(p,
eps, N.ncols(), t, r)\\n\\n\\ndef _half_int_mat_is_div_by(S, m):\\n    n =
S.ncols()\\n    return (all(ZZ(S[i, i]) % m == 0 for i in range(n)) and\\n
           all(ZZ(2 * S[i, j]) % m == 0 for i in range(n) for j in range(i
+ 1, n)))\\n\\n\\n@cached_function\\ndef _gen_gauss_sum_non_dyadic(p, eps,
n, t, r):\\n    '''\\n    cf. H. Saito, a generalization of Gauss sums\\n
 '''\\n\\n    def parenthesis_prod(a, b, m):\\n        if m == 0:\\n
     return 1\\n        else:\\n            return mul(1 - a * b ** i for i
in range(m))\\n\\n    if (n - t) % 2 == 0:\\n        m = (n - t) // 2\\n
 else:\\n        m = (n - t + 1) // 2\\n\\n    if n == r:\\n        if n %
2 == 1:\\n            return ((-1) ** ((n - 2 * m + 1) // 2) * p ** ((n **
2 + (2 * m) ** 2 - 1) // 4) *\\n                    parenthesis_prod(p **
(-1), p ** (-2), m))\\n        elif n % 2 == t % 2 == 0:\\n
 return ((-kronecker_symbol(-1, p)) ** ((n - 2 * m) // 2) *\\n
       eps * p ** ((n ** 2 + (2 * m + 1) ** 2 - 1) // 4) *\\n
     parenthesis_prod(p ** (-1), p ** (-2), m))\\n        else:\\n
   return 0\\n    else:\\n        diag = [1 for _ in range(t)]\\n        if
eps == -1:\\n            diag[-1] = least_quadratic_nonresidue(p)\\n
 diag = diag + [0 for _ in range(n - t)]\\n        N =
diagonal_matrix(diag).change_ring(FiniteField(p))\\n        return
_gen_gauss_sum_direct_way(N, p, r)\\n\\n\\ndef _expt_sum(S, p, alpha,
i):\\n    '''\\n    Return the exponential sum in Miyawaki's paper, where
alpha[-1] <= 2, for T_i(p^2).\\n    '''\\n    a, b, c = [alpha.count(_i)
for _i in range(3)]\\n    S33 = S.T.matrix_from_rows_and_columns(range(a +
b, 3), range(a + b, 3))\\n    S22 =
S.T.matrix_from_rows_and_columns(range(a, a + b), range(a, a + b))\\n
 S32 = S.T.matrix_from_rows_and_columns(range(a + b, 3), range(a))\\n\\n
 if c > 0 and not _half_int_mat_is_div_by(S33, p ** 2):\\n        return
0\\n    if c > 0 and b > 0 and any(x % p != 0 for x in (S32 *
ZZ(2)).change_ring(ZZ).list()):\\n        return 0\\n\\n    if b == 0 and a
+ c == 3 - i:\\n        return p ** (c * (c + 1))\\n    elif b == 0:\\n
   return 0\\n    else:\\n        return p ** (c * (c + 1)) * p ** (b * c)
* _generalized_gauss_sum(S22, p, b - i)\\n\\n\\ndef
_minkowski_reduction(b1, b2, b3, S):\\n\\n    def inner_prod(x, y):\\n
   return x * S * y\\n\\n    while True:\\n        b1, b2, b3 = sorted([b1,
b2, b3], key=lambda b: b * S * b)\\n\\n        b1, b2 =
_gaussian_reduction(b1, b2, S)\\n\\n        b11 = inner_prod(b1, b1)\\n
   b12 = inner_prod(b1, b2)\\n        b13 = inner_prod(b1, b3)\\n
 b22 = inner_prod(b2, b2)\\n        b23 = inner_prod(b2, b3)\\n        b33
= inner_prod(b3, b3)\\n\\n        y1 = - (b13 / b11 - b12 * b23 / (b11 *
b22)) / \\\\\\n            (1 - b12 ** 2 / (b11 * b22))\\n        y2 = -
(b23 / b22 - b12 * b13 / (b11 * b22)) / \\\\\\n            (1 - b12 ** 2 /
(b11 * b22))\\n\\n        # Find integers x1, x2 so that norm(b3 + x2 * b2
+ x1 * b1) is minimal.\\n        a_norms_alst = []\\n\\n        for x1 in
[floor(y1), ceil(y1)]:\\n            for x2 in [floor(y2), ceil(y2)]:\\n
             a = b3 + x2 * b2 + x1 * b1\\n
 a_norms_alst.append((x1, x2, a, inner_prod(a, a)))\\n        _inner_prod_a
= min(x[-1] for x in a_norms_alst)\\n        x1, x2, a, _ = next(x for x in
a_norms_alst if x[-1] == _inner_prod_a)\\n\\n        if _inner_prod_a >=
b33:\\n            # Change sings of b1, b2, b3 and terminate the
alogrithm\\n            sngs = [sgn(b12), sgn(b13), sgn(b23)]\\n
 bs = [b1, b2, b3]\\n            try:\\n                # If b12, b13 or
b23 is zero, change sgns of b1, b2, b3 so that\\n                # b12,
b13, b23 >= 0.\\n                zero_i = sngs.index(0)\\n
 set_ls = [set([1, 2]), set([1, 3]), set([2, 3])]\\n                t =
set_ls[zero_i]\\n                _other = [x for x in [1, 2, 3] if x not in
t][0]\\n                for x in t:\\n                    i =
set_ls.index(set([x, _other]))\\n                    if sngs[i] < 0:\\n
                   bs[x - 1] *= -1\\n                b1, b2, b3 = bs\\n
       except ValueError:\\n                # Else change sgns so that b12,
b13 > 0\\n                if b12 < 0:\\n                    b2 = -b2\\n
           if b13 < 0:\\n                    b3 = -b3\\n            return
(b1, b2, b3)\\n        else:\\n            b3 = a\\n\\n\\ndef
_minkowski_reduction_transform_matrix(S):\\n    '''\\n    Return a
unimodular matrix u such that u^t * S * u is reduced in Minkowski's
sense.\\n    '''\\n    b1, b2, b3 = identity_matrix(QQ, 3).columns()\\n
 c1, c2, c3 = _minkowski_reduction(b1, b2, b3, S)\\n    return matrix([c1,
c2,
c3]).transpose()\\n\",\"line\":52,\"column\":41,\"path\":\"/home/sho/work/sage_packages/e8theta_degree3/hecke_module.py\"}}"

The file contains a multibyte string "∧" and anaconda-mode converts it to
"\342\210\247".

Sho Takemori


2016-07-31 23:31 GMT+09:00 Eli Zaretskii <eliz@gnu.org>:

> > From: Sho Takemori <stakemorii@gmail.com>
> > Date: Sun, 31 Jul 2016 17:26:37 +0900
> >
> > I got an error "error in process sentinel: url-http-create-request:
> > Multibyte text in HTTP request" when I visited a Python file which
> > contains a multibyte character with `anaconda-eldoc-mode' turned on.
>
> That file name should have been encoded by the time it is passed to
> url-http.el, so the problem should not have happened, because encoded
> strings are unibyte strings.
>
> > At first, I thought this was a bug of anaconda-mode. So I opened an
> > issue in github
> > (https://github.com/proofit404/anaconda-mode/issues/189).
> >
> > I guess `(= (string-bytes request) (length request))` in
> > `url-http-create-request' should be `(= (string-bytes url-http-data)
> > (length url-http-data))`, because `(= (string-bytes request) (length
> > request))` may be `nil' even if `(= (string-bytes url-http-data)
> > (length url-http-data))` is `t'.
>
> I don't think I agree in general: all the strings that are used by
> url-http-create-request should be unibyte strings.  If they all are
> unibyte strings, then I think the situation you describe should not
> happen.  However, you didn't provide enough details to analyze the
> situation, so perhaps I'm missing something.  Could you please show
> all the details, specifically, what were the values of the various
> variables used by url-http-create-request to generate the request?
> For each value that is a string, please also tell whether it's a
> unibyte or a multibyte string.
>
> Thanks.
>


* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-07-31 23:21   ` Sho Takemori
@ 2016-08-01 13:17     ` Eli Zaretskii
  2016-08-02  0:52       ` Dmitry Gutov
  2016-08-02  3:26       ` Sho Takemori
  0 siblings, 2 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-01 13:17 UTC (permalink / raw)
  To: Sho Takemori; +Cc: 24117

> From: Sho Takemori <stakemorii@gmail.com>
> Date: Mon, 1 Aug 2016 08:21:39 +0900
> Cc: 24117@debbugs.gnu.org
> 
> It seems that anaconda-mode uses two global variables (url-request-method and url-request-data)
> to generate the request.
> 
> https://github.com/proofit404/anaconda-mode/blob/master/anaconda-mode.el#L349
> 
> url-request-method is bound to an ASCII string "POST".
> In my situation, url-request-data is bound to a unibyte string as below.

I don't see any non-ASCII characters in that string.  So how come it
causes the error message?

> The file contains a multibyte string "∧"

I don't see this character in the string you show.

> and anaconda-mode converts it to "\342\210\247".

Which is a correct UTF-8 encoding of that character, and should
produce a unibyte string.
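
The encoding claim above is easy to cross-check outside Emacs. The following is only a minimal sketch in Python, used here purely for illustration; the helper name `octal_escapes` is hypothetical and is not part of Emacs or anaconda-mode. It encodes "∧" as UTF-8 and prints the bytes in the same backslash-octal notation used in this thread.

```python
def octal_escapes(text: str) -> str:
    """Return the UTF-8 bytes of text written as backslash-octal escapes."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


wedge = "\u2227"                  # the character "∧" (U+2227)
encoded = wedge.encode("utf-8")   # the bytes that would go on the wire

print(len(wedge))                 # number of characters
print(len(encoded))               # number of bytes
print(octal_escapes(wedge))       # octal form, as quoted in the report
```

One character becomes three bytes, which is why a character count and a byte count only agree once the text has really been encoded.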

To summarize, I still don't understand how come the error happened.
Could you perhaps step with Edebug into url-http-create-request, and
see what is going on there?  Or come up with a reproducible recipe of
calling url-http-create-request that I could examine on my machine?

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-01 13:17     ` Eli Zaretskii
@ 2016-08-02  0:52       ` Dmitry Gutov
  2016-08-02 15:25         ` Eli Zaretskii
  2016-08-02  3:26       ` Sho Takemori
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-02  0:52 UTC (permalink / raw)
  To: Eli Zaretskii, Sho Takemori; +Cc: 24117

On 08/01/2016 04:17 PM, Eli Zaretskii wrote:

> To summarize, I still don't understand how come the error happened.
> Could you perhaps step with Edebug into url-http-create-request, and
> see what is going on there?  Or come up with a reproducible recipe of
> calling url-http-create-request that I could examine on my machine?

Here's the essence of the problem:

(length (concat (encode-coding-string "фыва" 'utf-8) 
(string-as-multibyte "abc")))

=> 11

(string-bytes (concat (encode-coding-string "фыва" 'utf-8) 
(string-as-multibyte "abc")))

=> 19

And

(multibyte-string-p (url-host (url-generic-parse-url "http://127.0.0.1")))

=> t

Apparently, url-generic-parse-url creates a multibyte string for the 
host name because it performs its parsing in a buffer. And 
url-http-create-request uses the return value of (url-host 
url-http-target-url) to set the Host header. And all of that gets 
concatenated in the request.
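
The 11-versus-19 mismatch above can be imitated outside Emacs. This is a rough analogy in Python, not a description of Emacs internals: it assumes that promoting each raw byte to its own character (done here by decoding as Latin-1) is a fair stand-in for what happens when a unibyte string is concatenated with a multibyte one. Emacs uses its own internal raw-byte characters, so only the counts are meant to match.

```python
# Encode first, as encode-coding-string does: 4 characters become 8 bytes.
encoded = "фыва".encode("utf-8")

# Imitate the promotion to a "multibyte" string: every raw byte becomes
# one character.  Latin-1 maps byte N to code point U+00NN.
promoted = encoded.decode("latin-1")

# Concatenate with ordinary text, like the host name in the request.
request = promoted + "abc"

char_count = len(request)                   # what `length' would report
byte_count = len(request.encode("utf-8"))   # what `string-bytes' would report

print(char_count, byte_count)
```

Each promoted byte is at or above 0x80, so it needs two bytes when stored as a character; the eight promoted bytes plus "abc" give 11 characters but 19 bytes.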

Some possible solutions:

- Perform the "string-bytes = length" verification only for 
url-http-data, not the whole request string. This strikes me as 
ugly, but apparently we've been living with using a multibyte string 
here for a while.

- Call url-encode-url on the return value of (url-host 
url-http-target-url), and hope that no similar problem pops up with any 
of the related variables. This does solve the immediate problem with 
anaconda-mode, I've checked.

- Something else?






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-01 13:17     ` Eli Zaretskii
  2016-08-02  0:52       ` Dmitry Gutov
@ 2016-08-02  3:26       ` Sho Takemori
  1 sibling, 0 replies; 62+ messages in thread
From: Sho Takemori @ 2016-08-02  3:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24117


> I don't see this character in the string you show.

It does not contain "∧" but it contains "\342\210\247".
That example was unnecessarily big. I should have provided a minimal one.

Sho Takemori


2016-08-01 22:17 GMT+09:00 Eli Zaretskii <eliz@gnu.org>:

> > From: Sho Takemori <stakemorii@gmail.com>
> > Date: Mon, 1 Aug 2016 08:21:39 +0900
> > Cc: 24117@debbugs.gnu.org
> >
> > It seems that anaconda-mode uses two global variables (url-request-method
> > and url-request-data) to generate the request.
> >
> > https://github.com/proofit404/anaconda-mode/blob/master/anaconda-mode.el#L349
> >
> > url-request-method is bound to an ASCII string "POST".
> > In my situation, url-request-data is bound to a unibyte string as below.
>
> I don't see any non-ASCII characters in that string.  So how come it
> causes the error message?
>
> > The file contains a multibyte string "∧"
>
> I don't see this character in the string you show.
>
> > and anaconda-mode converts it to "\342\210\247".
>
> Which is a correct UTF-8 encoding of that character, and should
> produce a unibyte string.
>
> To summarize, I still don't understand how come the error happened.
> Could you perhaps step with Edebug into url-http-create-request, and
> see what is going on there?  Or come up with a reproducible recipe of
> calling url-http-create-request that I could examine on my machine?
>
> Thanks.
>


* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-02  0:52       ` Dmitry Gutov
@ 2016-08-02 15:25         ` Eli Zaretskii
  2016-08-03  2:39           ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-02 15:25 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

> Cc: 24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 2 Aug 2016 03:52:25 +0300
> 
> (length (concat (encode-coding-string "фыва" 'utf-8) 
> (string-as-multibyte "abc")))
> 
> => 11
> 
> (string-bytes (concat (encode-coding-string "фыва" 'utf-8) 
> (string-as-multibyte "abc")))
> 
> => 19
> 
> And
> 
> (multibyte-string-p (url-host (url-generic-parse-url "http://127.0.0.1")))
> 
> => t
> 
> Apparently, url-generic-parse-url creates a multibyte string for the 
> host name because it performs its parsing in a buffer. And 
> url-http-create-request uses the return value of (url-host 
> url-http-target-url) to set the Host header. And all of that gets 
> concatenated in the request.

Thanks for spelling this out.

> Some possible solutions:
> 
> - Perform the "string-bytes = length" verification only for 
> url-http-data, not the whole request string. This strikes me as 
> ugly, but apparently we've been living with using a multibyte string 
> here for a while.
> 
> - Call url-encode-url on the return value of (url-host 
> url-http-target-url), and hope that no similar problem pops up with any 
> of the related variables. This does solve the immediate problem with 
> anaconda-mode, I've checked.
> 
> - Something else?

How about making the temporary buffer parsed by url-generic-parse-url
a unibyte buffer?  Does that fix the problem?  AFAIU, RFC 3986 doesn't
allow non-ASCII characters, so we should be okay handling that in a
unibyte buffer, right?  I mean something like this:

    (with-temp-buffer
      ;; Don't let those temp-buffer modifications accidentally
      ;; deactivate the mark of the current-buffer.
      (let ((deactivate-mark nil))
        (set-syntax-table url-parse-syntax-table)
	(erase-buffer)
	(set-buffer-multibyte nil)   ;; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
	(insert url)
	(goto-char (point-min))
	...
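
The idea of parsing in a unibyte buffer has a loose counterpart in other URL libraries. The sketch below uses Python's `urllib.parse`, which is an assumption made only for illustration and says nothing about how url-parse.el behaves: when the URL is handed over as bytes, the parsed components come back as bytes, so they can be concatenated with an already encoded body without any implicit conversion.

```python
from urllib.parse import urlsplit

# Parsing text yields text components; parsing bytes yields bytes.
text_host = urlsplit("http://127.0.0.1/").hostname
byte_host = urlsplit(b"http://127.0.0.1/").hostname

# An already encoded body, like url-request-data in the report.
body = "∧".encode("utf-8")

# Bytes plus bytes stays bytes, so the length is a byte count throughout.
request = b"POST / HTTP/1.1\r\nHost: " + byte_host + b"\r\n\r\n" + body

# Python refuses to mix the two kinds of string instead of converting.
mixing_rejected = False
try:
    b"Host: " + text_host
except TypeError:
    mixing_rejected = True

print(type(text_host).__name__, type(byte_host).__name__, mixing_rejected)
```

Python rejects the mixed concatenation outright, whereas Emacs converts silently, which is how the multibyte host name went unnoticed until the length test was added.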

As for other possible problems like that, are there any that could be
expected already?  If so, we could try fixing them now.
Alternatively, we could just wait for them to come up; after all,
catching those was the main rationale for introducing the length test,
right?

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-02 15:25         ` Eli Zaretskii
@ 2016-08-03  2:39           ` Dmitry Gutov
  2016-08-04 17:02             ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-03  2:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, Lars Magne Ingebrigtsen, 24117

On 08/02/2016 06:25 PM, Eli Zaretskii wrote:

> How about making the temporary buffer parsed by url-generic-parse-url
> a unibyte buffer?  Does that fix the problem?

It does fix anaconda-mode, yes.

> AFAIU, RFC 3986 doesn't
> allow non-ASCII characters, so we should be okay handling that in a
> unibyte buffer, right?

I don't really know. RFC 3986 or not, I suppose in practice the url 
could be quoted before or after it's parsed. And url-parse-tests.el 
doesn't specify this case.

Lars, what do you think?

> I mean something like this:
>
>     (with-temp-buffer
>       ;; Don't let those temp-buffer modifications accidentally
>       ;; deactivate the mark of the current-buffer.
>       (let ((deactivate-mark nil))
>         (set-syntax-table url-parse-syntax-table)
> 	(erase-buffer)
> 	(set-buffer-multibyte nil)   ;; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 	(insert url)
> 	(goto-char (point-min))
> 	...

Heh, that's exactly where I added the line, without looking at your code.

> As for other possible problems like that, are there any that could be
> expected already?  If so, we could try fixing them now.

Nothing else jumps out so far. The function depends on quite a few 
global variables. To be really certain, we'd have to trace how all of 
them are created, and for all that are not directly bound by the user, 
the chains of calls that produce them.

> Alternatively, we could just wait for them to come up;

I'm worried about having a problem crop up in some significant use case 
after we release 25.1. That doesn't feel very probable, but still.

> after all,
> catching those was the main rationale for introducing the length test,
> right?

The most important part was to make sure that the length of the body in 
bytes is equal to the value of the Content-Length header (the difference 
caused actual problems).

But then we decided to make the check wider and test that the whole 
request string is unibyte-ish. Which made sense, but seems to be working 
out less well than we expected.
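
To make the Content-Length concern concrete, here is a small sketch in Python (an illustration only; `build_request` is a hypothetical helper, not the logic of url-http-create-request). It assumes a UTF-8 body and computes the header from the encoded bytes rather than from the character count.

```python
def build_request(host: str, body_text: str) -> bytes:
    """Build a minimal POST request whose Content-Length counts bytes."""
    body = body_text.encode("utf-8")
    head = (
        "POST / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"   # bytes, not characters
        "\r\n"
    ).encode("ascii")
    return head + body


text = "a ∧ b"                     # 5 characters, 7 bytes in UTF-8
request = build_request("127.0.0.1", text)
head, _, body = request.partition(b"\r\n\r\n")

print(len(text), len(body))
print(head.decode("ascii").splitlines()[-1])
```

Had the header been computed from `len(text)`, it would announce 5 while 7 bytes follow, which is the kind of mismatch the length test was meant to catch.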






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-03  2:39           ` Dmitry Gutov
@ 2016-08-04 17:02             ` Eli Zaretskii
  2016-08-08  1:56               ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-04 17:02 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, 24117

> Cc: stakemorii@gmail.com, 24117@debbugs.gnu.org,
>  Lars Magne Ingebrigtsen <larsi@gnus.org>
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 3 Aug 2016 05:39:31 +0300
> 
> On 08/02/2016 06:25 PM, Eli Zaretskii wrote:
> 
> > How about making the temporary buffer parsed by url-generic-parse-url
> > a unibyte buffer?  Does that fix the problem?
> 
> It does fix anaconda-mode, yes.

Hmm, but url-generic-parse-url is called from gazillion other places,
so maybe this is not safe.

> > AFAIU, RFC 3986 doesn't
> > allow non-ASCII characters, so we should be okay handling that in a
> > unibyte buffer, right?
> 
> I don't really know. RFC 3986 or not, I suppose in practice the url 
> could be quoted before or after it's parsed. And url-parse-tests.el 
> doesn't specify this case.

No, I meant that since RFC 3986 doesn't allow non-ASCII characters,
and url-generic-parse-url doesn't do anything about that, it is either
already broken for non-ASCII characters, or already copes with them.
So we don't need to worry about that.

However, a safer change would be to do something like this:

   (or (not (multibyte-string-p url-http-target-url))
       (setq url-http-target-url
             (decode-coding-string url-http-target-url 'utf-8)))

in url-http-create-request.  Can you try this?

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-04 17:02             ` Eli Zaretskii
@ 2016-08-08  1:56               ` Dmitry Gutov
  2016-08-08 13:32                 ` Ted Zlatanov
                                   ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-08  1:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, 24117

Hi Eli,

On 08/04/2016 08:02 PM, Eli Zaretskii wrote:

> Hmm, but url-generic-parse-url is called from gazillion other places,
> so maybe this is not safe.

Only about 40 places, all of them either in lisp/url or lisp/gnus. 
Sadly, Lars is being silent on the matter.

It might not be 100% safe, but maybe doing TRT could be enough.

> No, I meant that since RFC 3986 doesn't allow non-ASCII characters,

Indeed.

> and url-generic-parse-url doesn't do anything about that, it is either
> already broken for non-ASCII characters, or already copes with them.
> So we don't need to worry about that.

I imagined that some code that uses the return value of 
url-http-create-request might perform the escaping. But that doesn't 
seem to be the case, see below.

> However, a safer change would be to do something like this:
>
>    (or (not (multibyte-string-p url-http-target-url))
>        (setq url-http-target-url
>              (decode-coding-string url-http-target-url 'utf-8)))
>
> in url-http-create-request.  Can you try this?

I'll try it if you insist, but that choice of encoding seems rather 
arbitrary. I think we should go with your previous suggestion: make the 
URL parsing buffer unibyte.

But we do try to handle non-ASCII URLs on the level above 
url-generic-parse-url. See url-retrieve-internal: one of the first 
things it does is (setq url (url-encode-url url)). And only after that, 
(setq url (url-generic-parse-url url)).
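
The order described above can be sketched roughly as follows. This is
only an illustration, not the body of url-retrieve-internal: the helper
name my/encode-then-parse and the host example.com are assumptions, and
an ASCII host is used so that only the path needs escaping.

```elisp
(require 'url-parse)  ; url-generic-parse-url, url-host, url-filename
(require 'url-util)   ; url-encode-url

;; Hypothetical helper (not part of url.el): percent-encode first, then
;; parse, in the same order url-retrieve-internal uses.
(defun my/encode-then-parse (url-string)
  "Return (ENCODED HOST FILENAME) for URL-STRING."
  (let* ((encoded (url-encode-url url-string))
         (parsed (url-generic-parse-url encoded)))
    (list encoded (url-host parsed) (url-filename parsed))))

;; "example.com" is a placeholder host.
(my/encode-then-parse "http://example.com/fóo")
```

With an ASCII host the parser only ever sees ASCII text, which is why
the multibyte question does not arise on this code path.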

The URL package doesn't seem to support international domains anyway. 
This fails:

(url-retrieve-synchronously "http://банки.рф")

However, the error it fails with is a bit more comprehensible if the URL 
parsing buffer is unibyte:

Debugger entered--Lisp error: (error "банки.рф/80 Name or service not 
known")

Instead of:

Debugger entered--Lisp error: (error 
"\301\220\300\261\301\220\300\260\301\220\300\275\301\220\300\272\301\220\300\270.\301\221\300\200\301\221\300\204/80 
Name or service not known")






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08  1:56               ` Dmitry Gutov
@ 2016-08-08 13:32                 ` Ted Zlatanov
  2016-08-08 23:48                   ` Katsumi Yamaoka
  2016-08-08 15:33                 ` Eli Zaretskii
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Ted Zlatanov @ 2016-08-08 13:32 UTC (permalink / raw)
  To: Dmitry Gutov, Katsumi Yamaoka; +Cc: stakemorii, 24117, larsi

On Mon, 8 Aug 2016 04:56:58 +0300 Dmitry Gutov <dgutov@yandex.ru> wrote: 

DG> Hi Eli,
DG> On 08/04/2016 08:02 PM, Eli Zaretskii wrote:

>> Hmm, but url-generic-parse-url is called from gazillion other places,
>> so maybe this is not safe.

DG> Only about 40 places, all of them either in lisp/url or lisp/gnus. Sadly, Lars
DG> is being silent on the matter.

DG> It might not be 100% safe, but maybe doing TRT could be enough.

Lars tends to work in batches (from experience) so waiting on him can
take a while. Looking at the discussion, I think the change is OK. I've
CC-ed Katsumi Yamaoka since he may have some feedback as well.

If this is pushed, since it's a fairly low-level change, it should
include a test for the specific issue it fixes (I didn't see that in the
discussion so far). Then if we need to tune the code further, we'll have
something to keep us from creating a regression.

Thanks
Ted






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08  1:56               ` Dmitry Gutov
  2016-08-08 13:32                 ` Ted Zlatanov
@ 2016-08-08 15:33                 ` Eli Zaretskii
  2016-08-08 15:52                 ` Lars Ingebrigtsen
  2016-08-08 15:54                 ` Lars Ingebrigtsen
  3 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 15:33 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, 24117

> Cc: stakemorii@gmail.com, larsi@gnus.org, 24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 8 Aug 2016 04:56:58 +0300
> 
> >    (or (not (multibyte-string-p url-http-target-url))
> >        (setq url-http-target-url
> >              (decode-coding-string url-http-target-url 'utf-8)))
> >
> > in url-http-create-request.  Can you try this?
> 
> I'll try it if you insist, but that choice of encoding seems rather 
> arbitrary. I think we should go with your previous suggestion: make the 
> URL parsing buffer unibyte.

OK, let's go with that.  Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08  1:56               ` Dmitry Gutov
  2016-08-08 13:32                 ` Ted Zlatanov
  2016-08-08 15:33                 ` Eli Zaretskii
@ 2016-08-08 15:52                 ` Lars Ingebrigtsen
  2016-08-08 15:54                 ` Lars Ingebrigtsen
  3 siblings, 0 replies; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 15:52 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

>> Hmm, but url-generic-parse-url is called from gazillion other places,
>> so maybe this is not safe.
>
> Only about 40 places, all of them either in lisp/url or
> lisp/gnus. Sadly, Lars is being silent on the matter.

It's also called in ffap, eww and newst, according to grep.  :-)

> The URL package doesn't seem to support international domains
> anyway. This fails:
>
> (url-retrieve-synchronously "http://банки.рф")

I think that's a bug...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08  1:56               ` Dmitry Gutov
                                   ` (2 preceding siblings ...)
  2016-08-08 15:52                 ` Lars Ingebrigtsen
@ 2016-08-08 15:54                 ` Lars Ingebrigtsen
  2016-08-08 16:14                   ` Eli Zaretskii
  2016-08-08 19:46                   ` Dmitry Gutov
  3 siblings, 2 replies; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 15:54 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

> I'll try it if you insist, but that choice of encoding seems rather
> arbitrary. I think we should go with your previous suggestion: make
> the URL parsing buffer unibyte.

It'd be sad if

(url-generic-parse-url "http://góogle.com/fóo")

stopped working.

This function is Emacs' workhorse for chopping up URLs, and it's a very
useful function.  There's a bunch of hand-rolled URL parsing
implementations that don't quite work right.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 15:54                 ` Lars Ingebrigtsen
@ 2016-08-08 16:14                   ` Eli Zaretskii
  2016-08-08 16:18                     ` Lars Ingebrigtsen
  2016-08-08 16:21                     ` Lars Ingebrigtsen
  2016-08-08 19:46                   ` Dmitry Gutov
  1 sibling, 2 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 16:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org
> Date: Mon, 08 Aug 2016 17:54:33 +0200
> 
> It'd be sad if
> 
> (url-generic-parse-url "http://góogle.com/fóo")
> 
> stopped working.

It's already broken, because that function does nothing special for
non-ASCII characters, although the corresponding RFC says they are not
allowed.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:14                   ` Eli Zaretskii
@ 2016-08-08 16:18                     ` Lars Ingebrigtsen
  2016-08-08 16:33                       ` Eli Zaretskii
  2016-08-08 16:21                     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 16:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, 24117, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> It'd be sad if
>> 
>> (url-generic-parse-url "http://góogle.com/fóo")
>> 
>> stopped working.
>
> It's already broken, because that function does nothing special for
> non-ASCII characters, although the corresponding RFC says they are not
> allowed.

No, it does what users expect it to.

(url-generic-parse-url "http://góogle.com/fóo")
=> [cl-struct-url "http" nil nil "góogle.com" nil "/fóo" nil nil t nil t]

"Aha, so 'góogle.com' is the domain name."

That you have to encode the data returned before doing network stuff
(which is what the RFC talks about) is a completely different matter.

The function is used to decompose URLs found in nature.  Those URLs
contain non-ASCII characters.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:14                   ` Eli Zaretskii
  2016-08-08 16:18                     ` Lars Ingebrigtsen
@ 2016-08-08 16:21                     ` Lars Ingebrigtsen
  2016-08-08 16:33                       ` Eli Zaretskii
  1 sibling, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 16:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, 24117, dgutov

(And it is perfectly valid for the domain bits of URLs to contain
non-ASCII characters after the IDNA changeover in the RFCs.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:18                     ` Lars Ingebrigtsen
@ 2016-08-08 16:33                       ` Eli Zaretskii
  2016-08-08 17:11                         ` Andreas Schwab
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 16:33 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: dgutov@yandex.ru,  stakemorii@gmail.com,  24117@debbugs.gnu.org
> Date: Mon, 08 Aug 2016 18:18:18 +0200
> 
> That you have to encode the data returned before doing network stuff
> (which is what the RFC talks about) is a completely different matter.

But no one does.  So evidently, leaving this to applications just
makes us buggy.

> The function is used to decompose URLs found in nature.  Those URLs
> contain non-ASCII characters.

URLs found in nature are all unibyte strings.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:21                     ` Lars Ingebrigtsen
@ 2016-08-08 16:33                       ` Eli Zaretskii
  2016-08-08 16:58                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 16:33 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: dgutov@yandex.ru,  stakemorii@gmail.com,  24117@debbugs.gnu.org
> Date: Mon, 08 Aug 2016 18:21:37 +0200
> 
> (And it is perfectly valid for the domain bits of URLs to contain
> non-ASCII characters after the IDNA changeover in the RFCs.)

They must be unibyte.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:33                       ` Eli Zaretskii
@ 2016-08-08 16:58                         ` Lars Ingebrigtsen
  2016-08-08 17:11                           ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, 24117, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> (And it is perfectly valid for the domain bits of URLs to contain
>> non-ASCII characters after the IDNA changeover in the RFCs.)
>
> They must be unibyte.

I have no idea what you mean.

If we have an <a href="http://góogle.com/foo"> instance, we have to
decompose the URL into the domain part (góogle.com) and the local part
(/foo), and then connect to the domain part (after IDNA encoding) and
issue "GET /foo".  If the decomposition function barfs on the URL, then
we can't make the connection.

Anyway, I won't have much time to carry on bickering here, so do
whatever you want to break the way Emacs handles URLs.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:33                       ` Eli Zaretskii
@ 2016-08-08 17:11                         ` Andreas Schwab
  2016-08-08 17:30                           ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Andreas Schwab @ 2016-08-08 17:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, Lars Ingebrigtsen, 24117, dgutov

On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:

> URLs found in nature are all unibyte strings.

http://góogle.com/fóo isn't unibyte.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 16:58                         ` Lars Ingebrigtsen
@ 2016-08-08 17:11                           ` Eli Zaretskii
  0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 17:11 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: dgutov@yandex.ru,  stakemorii@gmail.com,  24117@debbugs.gnu.org
> Date: Mon, 08 Aug 2016 18:58:23 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> (And it is perfectly valid for the domain bits of URLs to contain
> >> non-ASCII characters after the IDNA changeover in the RFCs.)
> >
> > They must be unibyte.
> 
> I have no idea what you mean.

What I said: the URL strings must be unibyte strings.  Then they will
still work in url-generic-parse-url, and the problems which started
this bug report won't happen.

> If we have an <a href="http://góogle.com/foo"> instance, we have to
> decompose the URL into the domain part (góogle.com) and the local part
> (/foo), and then connect to the domain part (after IDNA encoding) and
> issue "GET /foo".

You can do all that with unibyte strings.

> If the decomposition function barfs on the URL, then we can't make
> the connection.

No one intends to make that function barf.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 17:11                         ` Andreas Schwab
@ 2016-08-08 17:30                           ` Eli Zaretskii
  2016-08-08 19:16                             ` Andreas Schwab
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-08 17:30 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: stakemorii, larsi, 24117, dgutov

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> Date: Mon, 08 Aug 2016 19:11:41 +0200
> 
> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > URLs found in nature are all unibyte strings.
> 
> http://góogle.com/fóo isn't unibyte.

Every string outside of Emacs is unibyte.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 17:30                           ` Eli Zaretskii
@ 2016-08-08 19:16                             ` Andreas Schwab
  2016-08-09  2:32                               ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Andreas Schwab @ 2016-08-08 19:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, 24117, dgutov

On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org,  dgutov@yandex.ru
>> Date: Mon, 08 Aug 2016 19:11:41 +0200
>> 
>> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> > URLs found in nature are all unibyte strings.
>> 
>> http://góogle.com/fóo isn't unibyte.
>
> Every string outside of Emacs is unibyte.

This string is inside Emacs.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 15:54                 ` Lars Ingebrigtsen
  2016-08-08 16:14                   ` Eli Zaretskii
@ 2016-08-08 19:46                   ` Dmitry Gutov
  2016-08-08 20:19                     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-08 19:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117

On 08/08/2016 06:54 PM, Lars Ingebrigtsen wrote:

> It'd be sad if
>
> (url-generic-parse-url "http://góogle.com/fóo")
>
> stopped working.

Why don't you pass the url string through `url-encode-url' first?

url-retrieve-internal does that.

> This function is Emacs' workhorse for chopping up URLs, and it's a very
> useful function.  There's a bunch of hand-rolled URL parsing
> implementations that don't quite work right.







* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 19:46                   ` Dmitry Gutov
@ 2016-08-08 20:19                     ` Lars Ingebrigtsen
  2016-08-08 20:35                       ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 20:19 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

>> (url-generic-parse-url "http://góogle.com/fóo")
>>
>> stopped working.
>
> Why don't you pass the url string through `url-encode-url' first?
>
> url-retrieve-internal does that.

(url-encode-url "http://góogle.com/fóo")
=> "http://góogle.com/f%C3%B3o"

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 20:19                     ` Lars Ingebrigtsen
@ 2016-08-08 20:35                       ` Dmitry Gutov
  2016-08-08 20:36                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-08 20:35 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117

On 08/08/2016 11:19 PM, Lars Ingebrigtsen wrote:

> (url-encode-url "http://góogle.com/fóo")
> => "http://góogle.com/f%C3%B3o"

That is true in master, but not in emacs-25, AFAICS.

(Is that related to your work on punycode?)

On emacs-25, it returns "http://g%C3%B3ogle.com/f%C3%B3o".






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 20:35                       ` Dmitry Gutov
@ 2016-08-08 20:36                         ` Lars Ingebrigtsen
  2016-08-09  2:13                           ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-08 20:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 08/08/2016 11:19 PM, Lars Ingebrigtsen wrote:
>
>> (url-encode-url "http://góogle.com/fóo")
>> => "http://góogle.com/f%C3%B3o"
>
> That is true in master, but not in emacs-25, AFAICS.
>
> (Is that related to your work on punycode?)

Might be; I can't recall, though.

> On emacs-25, it returns "http://g%C3%B3ogle.com/f%C3%B3o".

Which is, of course, completely wrong.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 13:32                 ` Ted Zlatanov
@ 2016-08-08 23:48                   ` Katsumi Yamaoka
  0 siblings, 0 replies; 62+ messages in thread
From: Katsumi Yamaoka @ 2016-08-08 23:48 UTC (permalink / raw)
  To: dgutov; +Cc: stakemorii, larsi, 24117

On Mon, 08 Aug 2016 09:32:44 -0400, Ted Zlatanov wrote:
> CC-ed Katsumi Yamaoka since he may have some feedback as well.

Well, I'm not familiar with url-*.el but there seems to be no
function to encode a url containing non-ASCII characters such as:
"http://банки.рф/"
In emacs-w3m[1] there are some encoder functions:

;; An encoder used for a full url string:
(w3m-url-transfer-encode-string "http://банки.рф/")
 => "http://xn--80abwho.xn--p1ai/"

;; An encoder used for a string that is a part of url:
(concat "https://ja.wikipedia.org/wiki/"
	(w3m-url-encode-string "日本語ドメイン名"))
 => "https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%83%89%E3%83%A1%E3%82%A4%E3%83%B3%E5%90%8D"

Those two encoded urls will work.

[1]
,----
| % cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot login
| CVS password: # No password is set.  Just hit Enter/Return key.
| % cvs -d :pserver:anonymous@cvs.namazu.org:/storage/cvsroot co emacs-w3m
`----






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 20:36                         ` Lars Ingebrigtsen
@ 2016-08-09  2:13                           ` Dmitry Gutov
  2016-08-09  9:39                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-09  2:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117

On 08/08/2016 11:36 PM, Lars Ingebrigtsen wrote:

>>> (url-encode-url "http://góogle.com/fóo")
>>> => "http://góogle.com/f%C3%B3o"
>>
>> That is true in master, but not in emacs-25, AFAICS.
>>
>> (Is that related to your work on punycode?)
>
> Might be; I can't recall, though.

Here's another question: why does url-encode-url pass the argument 
through encode-coding-string before passing it to url-generic-parse-url, 
if the latter is expected to be able to deal with non-ASCII characters?

The only recent change in that function is your commit 8b61c22e dated 
last December, which very much looks like a band-aid in this context.

>> On emacs-25, it returns "http://g%C3%B3ogle.com/f%C3%B3o".
>
> Which is, of course, completely wrong.

I see.

Since you're better versed in this area than me, can you propose a 
specific fix for the currently discussed bug? It is more serious than 
not being able to use unicode in URLs.

On master, the domain part, which is untouched by url-encode-url, is 
converted to an ASCII unibyte string with puny-encode-domain, inside 
url-http-create-request. But real-fname remains a multibyte string, 
triggering the problem anyway.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-08 19:16                             ` Andreas Schwab
@ 2016-08-09  2:32                               ` Eli Zaretskii
  2016-08-09  8:05                                 ` Andreas Schwab
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-09  2:32 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: stakemorii, larsi, 24117, dgutov

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: stakemorii@gmail.com,  larsi@gnus.org,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> Date: Mon, 08 Aug 2016 21:16:21 +0200
> 
> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> >> From: Andreas Schwab <schwab@linux-m68k.org>
> >> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> >> Date: Mon, 08 Aug 2016 19:11:41 +0200
> >> 
> >> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> >> 
> >> > URLs found in nature are all unibyte strings.
> >> 
> >> http://góogle.com/fóo isn't unibyte.
> >
> > Every string outside of Emacs is unibyte.
> 
> This string is inside Emacs.

Then it should be made unibyte before parsing it.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-09  2:32                               ` Eli Zaretskii
@ 2016-08-09  8:05                                 ` Andreas Schwab
  2016-08-09 14:50                                   ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Andreas Schwab @ 2016-08-09  8:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, 24117, dgutov

On Di, Aug 09 2016, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: stakemorii@gmail.com,  larsi@gnus.org,  24117@debbugs.gnu.org,  dgutov@yandex.ru
>> Date: Mon, 08 Aug 2016 21:16:21 +0200
>> 
>> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> >> From: Andreas Schwab <schwab@linux-m68k.org>
>> >> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org,  dgutov@yandex.ru
>> >> Date: Mon, 08 Aug 2016 19:11:41 +0200
>> >> 
>> >> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
>> >> 
>> >> > URLs found in nature are all unibyte strings.
>> >> 
>> >> http://góogle.com/fóo isn't unibyte.
>> >
>> > Every string outside of Emacs is unibyte.
>> 
>> This string is inside Emacs.
>
> Then it should be made unibyte before parsing it.

You can't encode it properly without parsing it first.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-09  2:13                           ` Dmitry Gutov
@ 2016-08-09  9:39                             ` Lars Ingebrigtsen
  2016-08-10  6:50                               ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-09  9:39 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

> Here's another question: why does url-encode-url pass the argument
> through encode-coding-string before passing it to
> url-generic-parse-url, if the latter is expected to be able to deal
> with non-ASCII characters?

I don't know.  I don't think `url-encode-url' has ever really worked in
any sensible way in the presence of non-ASCII.

> The only recent change in that function is your commit 8b61c22e dated
> last December, which very much looks like a band-aid in this context.

It's debatable what that function should return in the presence of
non-ASCII domain names, but it's a debatable function all around.

> Since you're better versed in this area than me, can you propose a
> specific fix for the currently discussed bug? It is more serious than
> not being able to use unicode in URLs.

I didn't understand the original bug report and there was no simple
recipe to reproduce the bug.  Why changing url-generic-parse-url was
proposed as a solution is even less clear.  Perhaps you could write a
test case and summarise what you think the problem is?

> On master, the domain part, which is untouched by url-encode-url, is
> converted to an ASCII unibyte string with puny-encode-domain, inside
> url-http-create-request. But real-fname remains a multibyte string,
> triggering the problem anyway.

The domain is encoded according to IDNA, which is an ASCII string, yes.
(Whether the function returns a unibyte string or not I can't recall.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-09  8:05                                 ` Andreas Schwab
@ 2016-08-09 14:50                                   ` Eli Zaretskii
  2016-08-10  7:12                                     ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-09 14:50 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: stakemorii, larsi, 24117, dgutov

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: stakemorii@gmail.com,  larsi@gnus.org,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> Date: Tue, 09 Aug 2016 10:05:31 +0200
> 
> On Di, Aug 09 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> >> From: Andreas Schwab <schwab@linux-m68k.org>
> >> Cc: stakemorii@gmail.com,  larsi@gnus.org,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> >> Date: Mon, 08 Aug 2016 21:16:21 +0200
> >> 
> >> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> >> 
> >> >> From: Andreas Schwab <schwab@linux-m68k.org>
> >> >> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  stakemorii@gmail.com,  24117@debbugs.gnu.org,  dgutov@yandex.ru
> >> >> Date: Mon, 08 Aug 2016 19:11:41 +0200
> >> >> 
> >> >> On Mo, Aug 08 2016, Eli Zaretskii <eliz@gnu.org> wrote:
> >> >> 
> >> >> > URLs found in nature are all unibyte strings.
> >> >> 
> >> >> http://góogle.com/fóo isn't unibyte.
> >> >
> >> > Every string outside of Emacs is unibyte.
> >> 
> >> This string is inside Emacs.
> >
> > Then it should be made unibyte before parsing it.
> 
> You can't encode it properly without parsing it first.

You don't say what you meant by "encode properly".  It's just a
string, and there are ways to make a string unibyte without any
parsing.
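
One such way, sketched here under the assumption that UTF-8 is the
wanted encoding, is to run the whole URL through encode-coding-string.
The helper name my/url-as-utf8-bytes is hypothetical.

```elisp
;; Hypothetical helper: turn a whole URL into unibyte UTF-8 bytes
;; without looking at its structure.
(defun my/url-as-utf8-bytes (url-string)
  "Return URL-STRING encoded as a unibyte UTF-8 string."
  (encode-coding-string url-string 'utf-8))

(let ((bytes (my/url-as-utf8-bytes "http://góogle.com/fóo")))
  ;; Unibyte means one byte per element, so both tests agree.
  (list (multibyte-string-p bytes)
        (= (string-bytes bytes) (length bytes))))
```

Note that this treats the host and the path alike: the host ends up as
raw UTF-8 bytes, not as an IDNA-encoded name.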






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-09  9:39                             ` Lars Ingebrigtsen
@ 2016-08-10  6:50                               ` Dmitry Gutov
  2016-08-11  1:31                                 ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-10  6:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117

On 08/09/2016 12:39 PM, Lars Ingebrigtsen wrote:

> I don't know.  I don't think `url-encode-url' has ever really worked in
> any sensible way in the presence of non-ASCII.

My point is, you're saying that url-generic-parse-url should accept (and 
handle properly) multibyte URLs. But url-encode-url still encodes the 
URL string before passing it to url-generic-parse-url.

> It's debatable what that function should return in the presence of
> non-ASCII domain names, but it's a debatable function all around.

The way the version in master works makes quite a bit of sense to me.

> I didn't understand the original bug report and there was no simple
> recipe to reproduce the bug.  Why changing url-generic-parse-url was
> proposed as a solution is even less clear.  Perhaps you could write a
> test case and summarise what you think the problem is?

Please try this:

(with-current-buffer
     (let ((url-request-data (encode-coding-string "фыва" 'utf-8)))
       (url-retrieve-synchronously "http://posttestserver.com/post.php"))
   (buffer-string))

You'll get the "Multibyte text in HTTP request" error, which was added 
in a98aa02a5dbf079f7b4f3be5487a2f2b741d103d, to validate request data 
and make sure that users don't have to spend too much time investigating 
problems in their own code like bug#23750.

But the added validation relied on the assumption that the situation 
with multibyte/unibyte strings that url-http-create-request acts on is 
somewhat sane, which is not true, as the current discussion has 
demonstrated.
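
A minimal sketch of how the check can trip even when the request data
itself is properly encoded. The helper my/request-byte-check and the
request line are assumptions for illustration, not code from
url-http.el; the path is taken from a temporary buffer to stand in for
a string produced by parsing in a multibyte buffer.

```elisp
;; Hypothetical helper, not part of url-http.el.
(defun my/request-byte-check (body-text)
  "Build a toy request around BODY-TEXT and report the byte checks."
  (let* ((body (encode-coding-string body-text 'utf-8)) ; unibyte bytes
         ;; Text copied out of a multibyte buffer is a multibyte
         ;; string, even when it is pure ASCII.
         (path (with-temp-buffer
                 (insert "/post.php")
                 (buffer-string)))
         ;; concat yields a multibyte string if any argument is
         ;; multibyte; the body's bytes then become raw-byte
         ;; characters that take two bytes each internally.
         (request (concat "POST " path " HTTP/1.1\r\n\r\n" body)))
    (list :body-ok (= (string-bytes body) (length body))
          :request-multibyte (multibyte-string-p request)
          :request-ok (= (string-bytes request) (length request)))))

(my/request-byte-check "фыва")
```

Under these assumptions the body passes the byte-count test on its own,
while the assembled request does not.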

So we either need to straighten it up, or change the validation logic. 
If everything fails, of course, we can revert the aforementioned commit, 
but that would be bad for users.

>> On master, the domain part, which is untouched by url-encode-url, is
>> converted to an ASCII unibyte string with puny-encode-domain, inside
>> url-http-create-request. But real-fname remains a multibyte string,
>> triggering the problem anyway.
>
> The domain is encoded according to IDNA, which is an ASCII string, yes.
> (Whether the function returns a unibyte string or not I can't recall.)

It does.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-09 14:50                                   ` Eli Zaretskii
@ 2016-08-10  7:12                                     ` Dmitry Gutov
  2016-08-10 14:35                                       ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-10  7:12 UTC (permalink / raw)
  To: Eli Zaretskii, Andreas Schwab; +Cc: stakemorii, larsi, 24117

On 08/09/2016 05:50 PM, Eli Zaretskii wrote:

>> You can't encode it properly without parsing it first.
>
> You don't say what you meant by "encode properly".  It's just a
> string, and there are ways to make a string unibyte without any
> parsing.

Different parts of a URL are supposed to be encoded in different ways.

For instance,

   http://банки.рф/фыва/

turns into

   http://xn--80abwho.xn--p1ai/%D1%84%D1%8B%D0%B2%D0%B0/

The domain is encoded with IDNA, whereas the path uses percent-encoding. 
And they're also often encoded separately (e.g. when you copy-paste the 
above URL from Firefox to a text editor, the result is 
http://банки.рф/%D1%84%D1%8B%D0%B2%D0%B0/).
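
The two encodings can be sketched separately in Emacs Lisp. This assumes puny.el is available (the thread mentions puny-encode-domain as being used on master); the function name my/encode-url-parts is hypothetical:

```elisp
(require 'puny)      ; provides puny-encode-domain
(require 'url-util)  ; provides url-hexify-string

(defun my/encode-url-parts (host path-segment)
  "Return an http URL built from HOST and PATH-SEGMENT, each encoded its own way."
  (format "http://%s/%s/"
          ;; IDNA (Punycode) for the domain, label by label.
          (puny-encode-domain host)
          ;; Percent-encoding of the UTF-8 bytes for the path segment.
          (url-hexify-string path-segment)))

(my/encode-url-parts "банки.рф" "фыва")
```

The result is an ASCII-only string in which the domain and the path were produced by two unrelated encoders, which is why the URL has to be split into parts before it can be encoded.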

So I think the encoding of the URL parts should be performed inside 
url-http-create-request. On the master branch, host is passed through 
IDNA encoding, but real-fname is untouched. On emacs-25, I think we 
should convert both to unibyte.

Not sure encode-coding-string is the way to go (why would we assume 
UTF-8?). Personally, I think using string-as-unibyte makes more sense
(neither string should contain any multibyte characters at that point),
but I defer to more qualified colleagues.

(Why doesn't (encode-coding-string "aaaa" 'ascii) work?)






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-10  7:12                                     ` Dmitry Gutov
@ 2016-08-10 14:35                                       ` Eli Zaretskii
  2016-08-11  2:52                                         ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-10 14:35 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, schwab, 24117

> Cc: stakemorii@gmail.com, larsi@gnus.org, 24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 10 Aug 2016 10:12:40 +0300
> 
> On 08/09/2016 05:50 PM, Eli Zaretskii wrote:
> 
> >> You can't encode it properly without parsing it first.
> >
> > You don't say what you meant by "encode properly".  It's just a
> > string, and there are ways to make a string unibyte without any
> > parsing.
> 
> Different parts of a URL are supposed to be encoded in different ways.
> 
> For instance,
> 
>    http://банки.рф/фыва/
> 
> turns into
> 
>    http://xn--80abwho.xn--p1ai/%D1%84%D1%8B%D0%B2%D0%B0/

Are you saying that url-generic-parse-url performs this encoding, and
that using a unibyte buffer causes that to fail?

> So I think the encoding of the URL parts should be performed inside 
> url-http-create-request.

Fine with me, but when I suggested that, you didn't like the
suggestion.  If you changed your mind, let's do that.

> On the master branch, host is passed through IDNA encoding, but
> real-fname is untouched. On emacs-25, I think we should convert both
> to unibyte.

Not sure I understand why there should be a difference between the two
branches.  Encoding an ASCII string doesn't do any harm.

> Not sure encode-coding-string is the way to go (why would we assume 
> UTF-8?).

Because using UTF-8 doesn't lose anything, you basically get the same
byte stream as stored internally (because 8-bit bytes are not supposed
to happen in URLs).

> (Why doesn't (encode-coding-string "aaaa" 'ascii) work?)

It's 'us-ascii, not 'ascii.
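
For ASCII-only input the choice between the two coding systems does not change the bytes. A small sketch (my/ascii-encodings-agree-p is a hypothetical name) that checks the coding-system name with coding-system-p and then compares the two encodings:

```elisp
(defun my/ascii-encodings-agree-p (s)
  "Return non-nil if S encodes to the same unibyte string under us-ascii and utf-8."
  (let ((ascii (encode-coding-string s 'us-ascii))
        (utf8 (encode-coding-string s 'utf-8)))
    (and (coding-system-p 'us-ascii)        ; the name is a valid coding system
         (not (multibyte-string-p ascii))   ; both results are unibyte
         (not (multibyte-string-p utf8))
         (string= ascii utf8))))            ; and hold the same bytes

(my/ascii-encodings-agree-p (string-to-multibyte "/post.php?q=aaaa"))
```

The two coding systems only differ once the string contains non-ASCII characters, which is the case the rest of the discussion is about.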






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-10  6:50                               ` Dmitry Gutov
@ 2016-08-11  1:31                                 ` Dmitry Gutov
  0 siblings, 0 replies; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-11  1:31 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, 24117

On 08/10/2016 09:50 AM, Dmitry Gutov wrote:
> On 08/09/2016 12:39 PM, Lars Ingebrigtsen wrote:
>
>> I don't know.  I don't think `url-encode-url' has ever really worked in
>> any sensible way in the presence of non-ASCII.
>
> My point is, you're saying that url-generic-parse-url should accept (and
> handle properly) multibyte URLs. But url-encode-url still encodes the
> URL string before passing it to url-generic-parse-url.

By the way, one consequence of the current implementation is that the 
set-buffer-multibyte patch proposed earlier does not break url-retrieve 
on multibyte URLs, on master. So it has one benefit, at least.

Does that assuage your concerns with the patch?






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-10 14:35                                       ` Eli Zaretskii
@ 2016-08-11  2:52                                         ` Dmitry Gutov
  2016-08-11  8:53                                           ` Ted Zlatanov
                                                             ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-11  2:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, schwab, 24117

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

On 08/10/2016 05:35 PM, Eli Zaretskii wrote:

> Are you saying that url-generic-parse-url performs this encoding, and
> that using a unibyte buffer causes that to fail?

No, url-generic-parse-url contains the logic that distinguishes between
the domain and the path parts of a URL. So apparently it might have to
work on multibyte URLs.

That's not strictly necessary, however, given how url-encode-url uses it 
currently (it performs encode-coding-string and decode-coding-string on 
the URL string).

That approach seems flawed to me, but either way, someone will have to 
choose how url-encode-url should use url-generic-parse-url. If we intend 
to leave it as-is, then the proposed patch using set-buffer-multibyte 
actually works fine, even on master, with multibyte URLs.

>> So I think the encoding of the URL parts should be performed inside
>> url-http-create-request.
>
> Fine with me, but when I suggested that, you didn't like the
> suggestion.  If you changed your mind, let's do that.

See below. But yes, I'm more inclined toward this approach now, after 
Lars's objection, and after looking at the code in master.

>> On the master branch, host is passed through IDNA encoding, but
>> real-fname is untouched. On emacs-25, I think we should convert both
>> to unibyte.
>
> Not sure I understand why there should be a difference between the two
> branches.  Encoding an ASCII string doesn't do any harm.

Since it's ASCII, using utf-8 there seems misleading to me. It's a 
question of readability. As a bonus, using us-ascii will validate that 
the strings indeed do not contain any unexpected characters.

>> (Why doesn't (encode-coding-string "aaaa" 'ascii) work?)
>
> It's 'us-ascii, not 'ascii.

Thanks. Attaching a patch, it seems to work well enough.

I'd like to wait for Lars's response now, but someone will have to make 
an executive decision. Both patches (this one and the set-buffer-multibyte 
one) work in the cases I've tested.

This one seems more conservative, but it'll require a manual merge to 
master. The other one is very trivial, will merge automatically, but 
might cause problems for potential less-careful uses of 
url-generic-parse-url.
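
The helper from the attached patch can be tried in isolation. The sketch below copies its body under the hypothetical name my/encode-string, so that it does not clash with the real url-http--encode-string; the sample strings are made up:

```elisp
(defun my/encode-string (s)
  "Return S as a unibyte string, encoding with us-ascii if S is multibyte."
  (if (multibyte-string-p s)
      (encode-coding-string s 'us-ascii)
    s))

(let ((mb (string-to-multibyte "example.com"))        ; ASCII-only, multibyte
      (ub (encode-coding-string "фыва" 'utf-8)))      ; non-ASCII, unibyte
  (list (multibyte-string-p (my/encode-string mb))    ; multibyte ASCII becomes unibyte
        (eq ub (my/encode-string ub))))               ; unibyte input is returned as-is
```

Because unibyte input is passed through untouched, already-encoded data is left alone; only strings that became multibyte along the way are converted.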

[-- Attachment #2: url-http--encode-string.diff --]
[-- Type: text/x-patch, Size: 1359 bytes --]

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 7156e6f..860e652 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -235,7 +235,7 @@ url-http-create-request
 			      'url-http-proxy-basic-auth-storage))
 			 (url-get-authentication url-http-proxy nil 'any nil))))
 	 (real-fname (url-filename url-http-target-url))
-	 (host (url-host url-http-target-url))
+	 (host (url-http--encode-string (url-host url-http-target-url)))
 	 (auth (if (cdr-safe (assoc "Authorization" url-http-extra-headers))
 		   nil
 		 (url-get-authentication (or
@@ -278,7 +278,8 @@ url-http-create-request
           (concat
              ;; The request
              (or url-http-method "GET") " "
-             (if using-proxy (url-recreate-url url-http-target-url) real-fname)
+             (url-http--encode-string
+              (if using-proxy (url-recreate-url url-http-target-url) real-fname))
              " HTTP/" url-http-version "\r\n"
              ;; Version of MIME we speak
              "MIME-Version: 1.0\r\n"
@@ -360,6 +361,11 @@ url-http-create-request
     (url-http-debug "Request is: \n%s" request)
     request))
 
+(defun url-http--encode-string (s)
+  (if (multibyte-string-p s)
+      (encode-coding-string s 'us-ascii)
+    s))
+
 ;; Parsing routines
 (defun url-http-clean-headers ()
   "Remove trailing \r from header lines.


* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11  2:52                                         ` Dmitry Gutov
@ 2016-08-11  8:53                                           ` Ted Zlatanov
  2016-08-11 12:31                                             ` Dmitry Gutov
  2016-08-11 11:05                                           ` Lars Ingebrigtsen
                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Ted Zlatanov @ 2016-08-11  8:53 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, schwab, larsi, 24117

On Thu, 11 Aug 2016 05:52:42 +0300 Dmitry Gutov <dgutov@yandex.ru> wrote: 

DG> I'd like to wait for Lars's response now, but someone will have to make an
DG> executive decision. Both patches (this one and the set-buffer-multibyte
DG> one) work in the cases I've tested.

DG> This one seems more conservative, but it'll require a manual merge to master.
DG> The other one is very trivial, will merge automatically, but might cause
DG> problems for potential less-careful uses of url-generic-parse-url.

Could you add to your patch the cases you've tested? There's a specific
place for URL parsing tests in test/lisp/url/url-parse-tests.el that
would help everyone.

Thanks
Ted






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11  2:52                                         ` Dmitry Gutov
  2016-08-11  8:53                                           ` Ted Zlatanov
@ 2016-08-11 11:05                                           ` Lars Ingebrigtsen
  2016-08-11 14:47                                           ` Eli Zaretskii
  2016-08-13  0:30                                           ` Sho Takemori
  3 siblings, 0 replies; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-11 11:05 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, schwab, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

> This one seems more conservative, but it'll require a manual merge to
> master. The other one is very trivial, will merge automatically, but
> might cause problems for potential less-careful uses of
> url-generic-parse-url.

Yes, the fix here should be in url-http-create-request, not in the URL
parsing functions.  The main issue here is that the URL request buffer
is a multibyte buffer and (as with all network connection buffers), it
shouldn't be.  (Or, rather, that function just creates a string instead
of a buffer, but the same principle applies.)

But I think this fix looks OK:

> -	 (host (url-host url-http-target-url))
> +	 (host (url-http--encode-string (url-host url-http-target-url)))
>  	 (auth (if (cdr-safe (assoc "Authorization" url-http-extra-headers))

(etc)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11  8:53                                           ` Ted Zlatanov
@ 2016-08-11 12:31                                             ` Dmitry Gutov
  2016-08-11 12:57                                               ` Ted Zlatanov
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-11 12:31 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: stakemorii, larsi, schwab, 24117

On 08/11/2016 11:53 AM, Ted Zlatanov wrote:

> Could you add to your patch the cases you've tested? There's a specific
> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
> would help everyone.

Sure, but only one of the patches affects URL parsing (and Lars prefers 
the other one).






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 12:31                                             ` Dmitry Gutov
@ 2016-08-11 12:57                                               ` Ted Zlatanov
  2016-08-11 13:00                                                 ` Lars Ingebrigtsen
  2017-05-08 13:36                                                 ` Dmitry Gutov
  0 siblings, 2 replies; 62+ messages in thread
From: Ted Zlatanov @ 2016-08-11 12:57 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, Lars Ingebrigtsen, schwab, 24117

On Thu, 11 Aug 2016 15:31:11 +0300 Dmitry Gutov <dgutov@yandex.ru> wrote: 

DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>> Could you add to your patch the cases you've tested? There's a specific
>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>> would help everyone.

DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
DG> other one).

Maybe the tests should be in a separate patch then. Neither your Russian
example nor Lars' example has a parallel in the tests AFAICS. I'd also
add the example hostname that Katsumi Yamaoka gave from the w3m source.

Somewhat related: it would be nice if the URL parser also listed the
non-ASCII scripts used in the domain name. Then eww and other programs
could do one of the typical defenses: either ensure only one script is
used; or allow only scripts that match the user's locale; or catch any
non-ASCII domain names. Typically they'd use Punycode to display such
suspicious domain names:
https://en.wikipedia.org/wiki/IDN_homograph_attack

I bring it up since explicitly allowing non-ASCII domain names
automatically opens up these security concerns, and it's a bit hard to
collect the confusables externally:
https://elpa.gnu.org/packages/uni-confusables.html

On Thu, 11 Aug 2016 13:05:12 +0200 Lars Ingebrigtsen <larsi@gnus.org> wrote: 

LI> Yes, the fix here should be in url-http-create-request, not in the URL
LI> parsing functions.  The main issue here is that the URL request buffer
LI> is a multibyte buffer and (as with all network connection buffers), it
LI> shouldn't be.  (Or, rather, that function just creates a string instead
LI> of a buffer, but the same principle applies.)

I think this is correct: the URL parsing should not care about the
provenance or potential use of that URL to make an HTTP request or
otherwise. But maybe the URL parsing can be smart enough to return both
the IDNA version and the original domain name, plus some parsing
information like the list of scripts I suggested above, to save user
agents from doing that extra work?

Ted






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 12:57                                               ` Ted Zlatanov
@ 2016-08-11 13:00                                                 ` Lars Ingebrigtsen
  2016-08-11 13:18                                                   ` Ted Zlatanov
  2017-05-08 13:36                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2016-08-11 13:00 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, schwab, 24117

Ted Zlatanov <tzz@lifelogs.com> writes:

> Somewhat related: it would be nice if the URL parser also listed the
> non-ASCII scripts used in the domain name. Then eww and other programs
> could do one of the typical defenses: either ensure only one script is
> used; or allow only scripts that match the user's locale; or catch any
> non-ASCII domain names. Typically they'd use Punycode to display such
> suspicious domain names:
> https://en.wikipedia.org/wiki/IDN_homograph_attack

This is implemented in puny and eww:

  ;; Check whether the domain only uses "Highly Restricted" Unicode
  ;; IDNA characters.  If not, transform to punycode to indicate that
  ;; there may be funny business going on.
  (let ((parsed (url-generic-parse-url url)))
    (unless (puny-highly-restrictive-domain-p (url-host parsed))
      (setf (url-host parsed) (puny-encode-domain (url-host parsed)))
      (setq url (url-recreate-url parsed))))


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 13:00                                                 ` Lars Ingebrigtsen
@ 2016-08-11 13:18                                                   ` Ted Zlatanov
  0 siblings, 0 replies; 62+ messages in thread
From: Ted Zlatanov @ 2016-08-11 13:18 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, schwab, 24117, Dmitry Gutov

On Thu, 11 Aug 2016 15:00:55 +0200 Lars Ingebrigtsen <larsi@gnus.org> wrote: 

LI> Ted Zlatanov <tzz@lifelogs.com> writes:
>> Somewhat related: it would be nice if the URL parser also listed the
>> non-ASCII scripts used in the domain name. Then eww and other programs
>> could do one of the typical defenses: either ensure only one script is
>> used; or allow only scripts that match the user's locale; or catch any
>> non-ASCII domain names. Typically they'd use Punycode to display such
>> suspicious domain names:
>> https://en.wikipedia.org/wiki/IDN_homograph_attack

LI> This is implemented in puny and eww:

LI>   ;; Check whether the domain only uses "Highly Restricted" Unicode
LI>   ;; IDNA characters.  If not, transform to punycode to indicate that
LI>   ;; there may be funny business going on.
LI>   (let ((parsed (url-generic-parse-url url)))
LI>     (unless (puny-highly-restrictive-domain-p (url-host parsed))
LI>       (setf (url-host parsed) (puny-encode-domain (url-host parsed)))
LI>       (setq url (url-recreate-url parsed))))

Awesome! Thanks for pointing this out, and sorry for digressing.

Ted






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11  2:52                                         ` Dmitry Gutov
  2016-08-11  8:53                                           ` Ted Zlatanov
  2016-08-11 11:05                                           ` Lars Ingebrigtsen
@ 2016-08-11 14:47                                           ` Eli Zaretskii
  2016-08-11 14:59                                             ` Dmitry Gutov
  2016-08-13  0:30                                           ` Sho Takemori
  3 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-11 14:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, schwab, 24117

> Cc: stakemorii@gmail.com, larsi@gnus.org, schwab@linux-m68k.org,
>  24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 11 Aug 2016 05:52:42 +0300
> 
> >> On the master branch, host is passed through IDNA encoding, but
> >> real-fname is untouched. On emacs-25, I think we should convert both
> >> to unibyte.
> >
> > Not sure I understand why there should be a difference between the two
> > branches.  Encoding an ASCII string doesn't do any harm.
> 
> Since it's ASCII, using utf-8 there seems misleading to me. It's a 
> question of readability.

But AFAIU it doesn't have to be ASCII, it could include non-ASCII
characters, no?

> As a bonus, using us-ascii will validate that the strings indeed do
> not contain any unexpected characters.

If we did allow non-ASCII characters until now, we will definitely
hear from someone who'd say this is a regression.

> Thanks. Attaching a patch, it seems to work well enough.

LGTM, modulo the considerations about the encoding.

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 14:47                                           ` Eli Zaretskii
@ 2016-08-11 14:59                                             ` Dmitry Gutov
  2016-08-11 15:31                                               ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-11 14:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, schwab, 24117

On 08/11/2016 05:47 PM, Eli Zaretskii wrote:

> But AFAIU it doesn't have to be ASCII, it could include non-ASCII
> characters, no?

Not the values we pass to url-http--encode-string in this patch, no. As 
you said, when the request hits the wire, it has to be unibyte.

When the strings reach url-http-create-request, they either contain no 
multibyte characters, or url-http-create-request has to convert the 
strings in some meaningful fashion, and url-http--encode-string is not that.

In master, it uses puny-encode-domain. We can safely assume that 
internationalized domain names don't work in emacs-25.

>> As a bonus, using us-ascii will validate that the strings indeed do
>> not contain any unexpected characters.
>
> If we did allow non-ASCII characters until now, we will definitely
> hear from someone who'd say this is a regression.

I find that highly unlikely. HTTP URLs with multibyte characters need to 
be encoded in a specific way.

> LGTM, modulo the considerations about the encoding.

Thanks. Can we designate this bug as a blocker?






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 14:59                                             ` Dmitry Gutov
@ 2016-08-11 15:31                                               ` Eli Zaretskii
  2016-08-11 18:07                                                 ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-11 15:31 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, schwab, 24117

> Cc: stakemorii@gmail.com, larsi@gnus.org, schwab@linux-m68k.org,
>  24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 11 Aug 2016 17:59:16 +0300
> 
> On 08/11/2016 05:47 PM, Eli Zaretskii wrote:
> 
> > But AFAIU it doesn't have to be ASCII, it could include non-ASCII
> > characters, no?
> 
> Not the values we pass to url-http--encode-string in this patch, no. As 
> you said, when the request hits the wire, it has to be unibyte.

Unibyte != ASCII.

> > LGTM, modulo the considerations about the encoding.
> 
> Thanks. Can we designate this bug as a blocker?

Why not just push the fix to emacs-25?






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 15:31                                               ` Eli Zaretskii
@ 2016-08-11 18:07                                                 ` Dmitry Gutov
  2016-08-11 19:47                                                   ` Eli Zaretskii
  2016-08-12 21:44                                                   ` John Wiegley
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-11 18:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stakemorii, larsi, schwab, 24117

On 08/11/2016 06:31 PM, Eli Zaretskii wrote:

>> Not the values we pass to url-http--encode-string in this patch, no. As
>> you said, when the request hits the wire, it has to be unibyte.
>
> Unibyte != ASCII.

Indeed. And the above function converts the latter to the former.

>> Thanks. Can we designate this bug as a blocker?
>
> Why not just push the fix to emacs-25?

Because John likes to question commits in emacs-25 that do not reference 
bugs marked as blocking. See also the recent message in the emacs-devel 
thread.

Anyway, pushed. I'll add some tests to master a bit later.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 18:07                                                 ` Dmitry Gutov
@ 2016-08-11 19:47                                                   ` Eli Zaretskii
  2016-08-12 21:44                                                   ` John Wiegley
  1 sibling, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-11 19:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, schwab, 24117

> Cc: stakemorii@gmail.com, larsi@gnus.org, schwab@linux-m68k.org,
>  24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 11 Aug 2016 21:07:50 +0300
> 
> > Why not just push the fix to emacs-25?
> 
> Because John likes to question commits in emacs-25 that do not reference 
> bugs marked as blocking. See also the recent message in the emacs-devel 
> thread.
> 
> Anyway, pushed. I'll add some tests to master a bit later.

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 18:07                                                 ` Dmitry Gutov
  2016-08-11 19:47                                                   ` Eli Zaretskii
@ 2016-08-12 21:44                                                   ` John Wiegley
  1 sibling, 0 replies; 62+ messages in thread
From: John Wiegley @ 2016-08-12 21:44 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, schwab, larsi, 24117

>>>>> "DG" == Dmitry Gutov <dgutov@yandex.ru> writes:

>>> Thanks. Can we designate this bug as a blocker?
>> 
>> Why not just push the fix to emacs-25?

DG> Because John likes to question commits in emacs-25 that do not reference
DG> bugs marked as blocking. See also the recent message in the emacs-devel
DG> thread.

Yes, you know me well. :)  In fact, I came to this thread after seeing your
commit for exactly this reason, so I appreciate this comment.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11  2:52                                         ` Dmitry Gutov
                                                             ` (2 preceding siblings ...)
  2016-08-11 14:47                                           ` Eli Zaretskii
@ 2016-08-13  0:30                                           ` Sho Takemori
  2016-08-13  7:02                                             ` Eli Zaretskii
  3 siblings, 1 reply; 62+ messages in thread
From: Sho Takemori @ 2016-08-13  0:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: schwab, 24117, larsi

[-- Attachment #1: Type: text/plain, Size: 1577 bytes --]

Is this bug fixed in emacs-25? With "emacs -Q", I do not see this error
anymore. Thank you very much.

However, I set the language environment with (set-language-environment
"Japanese") in ~/.emacs.d/init.el. With that configuration, I still see the
same error when using anaconda-mode.

Also, url-mime-charset-string in url-http-create-request is a multibyte string
and is bound to "euc-jp;q=1, iso-2022-jp;q=0.5, shift_jis;q=0.5,
euc-jis-2004;q=0.5, iso-2022-jp-2004;q=0.5, utf-8;q=0.5, iso-8859-1;q=0.5,
big5;q=0.5, iso-2022-jp-2;q=0.5, gb2312;q=0.5, euc-tw;q=0.5, euc-kr;q=0.5,
us-ascii;q=0.5, utf-7;q=0.5, hz-gb-2312;q=0.5, big5-hkscs;q=0.5, gbk;q=0.5,
gb18030;q=0.5, iso-8859-5;q=0.5, koi8-r;q=0.5, koi8-u;q=0.5, cp866;q=0.5,
koi8-t;q=0.5, windows-1251;q=0.5, cp855;q=0.5, iso-8859-2;q=0.5,
iso-8859-3;q=0.5, iso-8859-4;q=0.5, iso-8859-9;q=0.5, iso-8859-10;q=0.5,
iso-8859-13;q=0.5, iso-8859-14;q=0.5, iso-8859-15;q=0.5,
windows-1250;q=0.5, windows-1252;q=0.5, windows-1254;q=0.5,
windows-1257;q=0.5, cp775;q=0.5, cp850;q=0.5, cp852;q=0.5, cp857;q=0.5,
cp858;q=0.5, cp860;q=0.5, cp861;q=0.5, cp863;q=0.5, cp865;q=0.5,
cp437;q=0.5, macintosh;q=0.5, next;q=0.5, hp-roman8;q=0.5,
adobe-standard-encoding;q=0.5, iso-8859-16;q=0.5, iso-8859-7;q=0.5,
windows-1253;q=0.5, cp737;q=0.5, cp851;q=0.5, cp869;q=0.5,
iso-8859-8;q=0.5, windows-1255;q=0.5, cp862;q=0.5, cp874;q=0.5,
iso-8859-11;q=0.5, viscii;q=0.5, windows-1258;q=0.5, iso-8859-6;q=0.5,
windows-1256;q=0.5, iso-2022-cn;q=0.5, iso-2022-cn-ext;q=0.5,
iso-2022-kr;q=0.5, utf-16le;q=0.5, utf-16be;q=0.5, utf-16;q=0.5,
x-ctext;q=0.5".
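
A quick way to find which request parts are multibyte is to test each one with multibyte-string-p. The helper below is only a diagnostic sketch; the name my/multibyte-parts and the sample values are hypothetical:

```elisp
(defun my/multibyte-parts (parts)
  "Return the names in PARTS, an alist of (NAME . STRING), whose strings are multibyte."
  (let (names)
    (dolist (part parts (nreverse names))
      (when (and (stringp (cdr part))
                 (multibyte-string-p (cdr part)))
        (push (car part) names)))))

;; Sample call with made-up values: one multibyte string, one unibyte
;; string, and one unset (nil) entry.
(my/multibyte-parts
 (list (cons 'charset (string-to-multibyte "utf-8;q=1"))
       (cons 'host "example.com")
       (cons 'language nil)))
```

In a live session the same function could be given (cons 'charset url-mime-charset-string) and similar pairs to see which values would make the concatenated request multibyte.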



* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13  0:30                                           ` Sho Takemori
@ 2016-08-13  7:02                                             ` Eli Zaretskii
  2016-08-13  7:31                                               ` Sho Takemori
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-13  7:02 UTC (permalink / raw)
  To: Sho Takemori; +Cc: schwab, larsi, 24117, dgutov

> From: Sho Takemori <stakemorii@gmail.com>
> Date: Sat, 13 Aug 2016 09:30:35 +0900
> Cc: Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org, schwab@linux-m68k.org, 
> 	24117@debbugs.gnu.org
> 
> But I set the language environment by (set-language-environment "Japanese") in ~/.emacs.d/init.el.
> With the configuration, I still see the same error when using anaconda-mode.
> 
> And url-mime-charset-string in url-http-create-request is a multibyte string

I guess we need to run url-mime-charset-string through
url-http--encode-string as well.  Can you try that?

Thanks.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13  7:02                                             ` Eli Zaretskii
@ 2016-08-13  7:31                                               ` Sho Takemori
  2016-08-13  8:31                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Sho Takemori @ 2016-08-13  7:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, Lars Ingebrigtsen, 24117, Brief Busters


[-- Attachment #1.1: Type: text/plain, Size: 798 bytes --]

> I guess we need to run url-mime-charset-string through
> url-http--encode-string as well.  Can you try that?

Yes. I attached a patch.

2016-08-13 16:02 GMT+09:00 Eli Zaretskii <eliz@gnu.org>:

> > From: Sho Takemori <stakemorii@gmail.com>
> > Date: Sat, 13 Aug 2016 09:30:35 +0900
> > Cc: Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org, schwab@linux-m68k.org,
> >       24117@debbugs.gnu.org
> >
> > But I set the language environment by (set-language-environment
> "Japanese") in ~/.emacs.d/init.el.
> > With the configuration, I still see the same error when using
> anaconda-mode.
> >
> > And url-mime-charset-string in url-http-create-request is a multibyte
> string
>
> I guess we need to run url-mime-charset-string through
> url-http--encode-string as well.  Can you try that?
>
> Thanks.
>


[-- Attachment #2: diff.patch --]
[-- Type: text/x-patch, Size: 763 bytes --]

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 860e652..5840a08 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -313,9 +313,10 @@ url-http-create-request
 			  (setq url-mime-encoding-string "gzip")))
                  (concat
                   "Accept-encoding: " url-mime-encoding-string "\r\n"))
-             (if url-mime-charset-string
-                 (concat
-                  "Accept-charset: " url-mime-charset-string "\r\n"))
+             (url-http--encode-string
+              (if url-mime-charset-string
+                  (concat
+                   "Accept-charset: " url-mime-charset-string "\r\n")))
              ;; Languages we understand
              (if url-mime-language-string
                  (concat


* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13  7:31                                               ` Sho Takemori
@ 2016-08-13  8:31                                                 ` Eli Zaretskii
  2016-08-13 13:02                                                   ` Sho Takemori
  2016-08-13 15:32                                                   ` Dmitry Gutov
  0 siblings, 2 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-13  8:31 UTC (permalink / raw)
  To: Sho Takemori; +Cc: schwab, larsi, 24117, dgutov

> From: Sho Takemori <stakemorii@gmail.com>
> Date: Sat, 13 Aug 2016 16:31:56 +0900
> Cc: Brief Busters <dgutov@yandex.ru>, Lars Ingebrigtsen <larsi@gnus.org>, schwab@linux-m68k.org, 
> 	24117@debbugs.gnu.org
> 
> > I guess we need to run url-mime-charset-string through
> > url-http--encode-string as well. Can you try that?
> 
> Yes. I attached a patch.

Thanks.  I pushed a slightly different fix, please test it.
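
For readers following along, here is a minimal sketch of why a single multibyte header value is enough to trip the check in url-http-create-request, assuming the check compares (string-bytes request) with (length request) as described at the top of this thread. The helper names sketch-build-request and sketch-request-sendable-p are hypothetical and not part of url-http.el, the header and body are made-up stand-ins, and the sketch does not reproduce the fix that was actually pushed.

```elisp
;; Hypothetical helpers for illustration only; not part of url-http.el.
(defun sketch-build-request (charset body encode)
  "Concatenate a toy Accept-charset header for CHARSET with BODY.
If ENCODE is non-nil, first convert CHARSET to a unibyte ASCII string."
  (concat "Accept-charset: "
          (if encode (encode-coding-string charset 'us-ascii) charset)
          "\r\n\r\n"
          body))

(defun sketch-request-sendable-p (request)
  "Return non-nil if REQUEST occupies exactly one byte per character."
  (= (string-bytes request) (length request)))

(let ((body (encode-coding-string "\u00e9" 'utf-8)) ; unibyte, two raw bytes
      (charset (string-to-multibyte "utf-8")))      ; ASCII-only, yet multibyte
  (list (sketch-request-sendable-p (sketch-build-request charset body nil))
        (sketch-request-sendable-p (sketch-build-request charset body t))))
```

With the multibyte charset string, concat returns a multibyte result and the raw bytes of the body are stored as two-byte eight-bit characters, so the byte count exceeds the length. Encoding the header value first keeps every piece unibyte, and the two counts agree.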






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13  8:31                                                 ` Eli Zaretskii
@ 2016-08-13 13:02                                                   ` Sho Takemori
  2016-08-13 13:11                                                     ` Eli Zaretskii
  2016-08-13 15:32                                                   ` Dmitry Gutov
  1 sibling, 1 reply; 62+ messages in thread
From: Sho Takemori @ 2016-08-13 13:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, Lars Ingebrigtsen, 24117, Brief Busters

[-- Attachment #1: Type: text/plain, Size: 602 bytes --]

>
> Thanks. I pushed a slightly different fix, please test it.
>
anaconda-mode works fine. Thanks!

2016-08-13 17:31 GMT+09:00 Eli Zaretskii <eliz@gnu.org>:

> > From: Sho Takemori <stakemorii@gmail.com>
> > Date: Sat, 13 Aug 2016 16:31:56 +0900
> > Cc: Brief Busters <dgutov@yandex.ru>, Lars Ingebrigtsen <larsi@gnus.org>, schwab@linux-m68k.org,
> >       24117@debbugs.gnu.org
> >
> > > I guess we need to run url-mime-charset-string through
> > > url-http--encode-string as well. Can you try that?
> >
> > Yes. I attached a patch.
>
> Thanks.  I pushed a slightly different fix, please test it.
>

[-- Attachment #2: Type: text/html, Size: 1348 bytes --]


* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13 13:02                                                   ` Sho Takemori
@ 2016-08-13 13:11                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-13 13:11 UTC (permalink / raw)
  To: Sho Takemori; +Cc: schwab, larsi, 24117, dgutov

> From: Sho Takemori <stakemorii@gmail.com>
> Date: Sat, 13 Aug 2016 22:02:17 +0900
> Cc: Brief Busters <dgutov@yandex.ru>, Lars Ingebrigtsen <larsi@gnus.org>, schwab@linux-m68k.org, 
> 	24117@debbugs.gnu.org
> 
> > Thanks.  I pushed a slightly different fix, please test it.
> 
> anaconda-mode works fine. Thanks!

Great, thanks for testing.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13  8:31                                                 ` Eli Zaretskii
  2016-08-13 13:02                                                   ` Sho Takemori
@ 2016-08-13 15:32                                                   ` Dmitry Gutov
  2016-08-13 15:56                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2016-08-13 15:32 UTC (permalink / raw)
  To: Eli Zaretskii, Sho Takemori; +Cc: schwab, larsi, 24117

On 08/13/2016 11:31 AM, Eli Zaretskii wrote:

> I pushed a slightly different fix, please test it.

Thanks, Eli. Sorry I didn't get to it first.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-13 15:32                                                   ` Dmitry Gutov
@ 2016-08-13 15:56                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2016-08-13 15:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, larsi, schwab, 24117

> Cc: schwab@linux-m68k.org, larsi@gnus.org, 24117@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 13 Aug 2016 18:32:36 +0300
> 
> On 08/13/2016 11:31 AM, Eli Zaretskii wrote:
> 
> > I pushed a slightly different fix, please test it.
> 
> Thanks, Eli. Sorry I didn't get to it first.

No sweat.  You make up for it many times over, when I'm not available.






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2016-08-11 12:57                                               ` Ted Zlatanov
  2016-08-11 13:00                                                 ` Lars Ingebrigtsen
@ 2017-05-08 13:36                                                 ` Dmitry Gutov
  2017-05-08 20:57                                                   ` Lars Ingebrigtsen
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Gutov @ 2017-05-08 13:36 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: stakemorii, Lars Magne Ingebrigtsen, schwab, 24117

On 11.08.2016 15:57, Ted Zlatanov wrote:

> DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>>> Could you add to your patch the cases you've tested? There's a specific
>>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>>> would help everyone.
> 
> DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
> DG> other one).
> 
> Maybe the tests should be in a separate patch then. Neither your Russian
> example nor Lars' example have a parallel in the tests AFAICS. I'd also
> add the example hostname that Katsumi Yamaoka gave from the w3m source.

Just got around to this. The test I came up with looks like this:

(ert-deftest url-generic-parse-url/multibyte-host-and-path ()
   (should (equal (url-generic-parse-url "http://банки.рф/фыва/")
                  (url-parse-make-urlobj "http" nil nil "банки.рф" nil
                                         "/фыва/" nil nil t))))

But! What behavior would this test? If we're making sure here that
url-generic-parse-url can cope with multibyte characters anywhere in the
URL, the encode-coding-string/decode-coding-string logic in
url-encode-url is extraneous. I'm not sure whether it is, or whether
there are some edge cases (are they fixable? should we add tests for them?).

So if this test goes in, it should be accompanied by the
simplification of url-encode-url.

Lars, what do you think?
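
As a hedged illustration of what the test above exercises, the sketch below reads the host and path back through the struct accessors instead of comparing whole URL objects. The names sketch-url and sketch-host-and-path are hypothetical, and the URL is spelled with \u escapes only so that the snippet does not depend on how the file is decoded.

```elisp
(require 'url-parse)

;; "http://банки.рф/фыва/" written with escapes.
(defvar sketch-url
  "http://\u0431\u0430\u043d\u043a\u0438.\u0440\u0444/\u0444\u044b\u0432\u0430/"
  "Hypothetical sample URL with a multibyte host and path.")

(defun sketch-host-and-path (url)
  "Return (HOST . PATH) as parsed from URL by `url-generic-parse-url'."
  (let ((parsed (url-generic-parse-url url)))
    (cons (url-host parsed) (url-filename parsed))))

;; Should yield the Cyrillic host and path unchanged, without any
;; percent-encoding, which matches the decomposition in the test.
(sketch-host-and-path sketch-url)
```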






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2017-05-08 13:36                                                 ` Dmitry Gutov
@ 2017-05-08 20:57                                                   ` Lars Ingebrigtsen
  2017-05-10  0:40                                                     ` Dmitry Gutov
  0 siblings, 1 reply; 62+ messages in thread
From: Lars Ingebrigtsen @ 2017-05-08 20:57 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: stakemorii, Ted Zlatanov, schwab, 24117

Dmitry Gutov <dgutov@yandex.ru> writes:

> Just got around to this. The test I came up with looks like this:
>
> (ert-deftest url-generic-parse-url/multibyte-host-and-path ()
>   (should (equal (url-generic-parse-url "http://банки.рф/фыва/")
>                  (url-parse-make-urlobj "http" nil nil "банки.рф" nil
>                                         "/фыва/" nil nil t))))

That looks like the correct decomposition of this URL, so
url-generic-parse-url does the right thing.

> But! What behavior would this test? If we're making sure here that
> url-generic-parse-url can cope with multibyte characters anywhere in
> the URL, the encode-coding-string/decode-coding-string logic in
> url-encode-url is extraneous. I'm not sure whether it is, or whether
> there are some edge cases (are they fixable? should we add tests for them?).

(url-encode-url "http://банки.рф/фыва/")
=> "http://банки.рф/%D1%84%D1%8B%D0%B2%D0%B0/"

It is perhaps debatable whether the host name should be encoded (with
punycode) here, but this is otherwise correct.

> So if this test goes in, it should be accompanied with the
> simplification of url-encode-url.
>
> Lars, what do you think?

The utf-8 encoding does seem superfluous, especially since
url-hexify-string also does the encoding...

(url-hexify-string "фыва")
=> "%D1%84%D1%8B%D0%B2%D0%B0"
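
A small sketch of the redundancy, assuming only that url-hexify-string encodes multibyte input as UTF-8 itself, as the result above suggests. The helper name sketch-hexify-both-ways is hypothetical.

```elisp
(require 'url-util)

(defun sketch-hexify-both-ways (text)
  "Return TEXT hexified directly and hexified after explicit UTF-8 encoding."
  (list (url-hexify-string text)
        (url-hexify-string (encode-coding-string text 'utf-8))))

;; "фыва" spelled with escapes; both list elements come out identical,
;; so encoding to UTF-8 beforehand adds nothing.
(sketch-hexify-both-ways "\u0444\u044b\u0432\u0430")
```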

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
  2017-05-08 20:57                                                   ` Lars Ingebrigtsen
@ 2017-05-10  0:40                                                     ` Dmitry Gutov
  0 siblings, 0 replies; 62+ messages in thread
From: Dmitry Gutov @ 2017-05-10  0:40 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: stakemorii, Ted Zlatanov, schwab, 24117-done

On 08.05.2017 23:57, Lars Ingebrigtsen wrote:

> The utf-8 encoding does seem superfluous, especially since
> url-hexify-string also does the encoding...

Simplified and pushed. With that, I am closing the bug.

Thanks all.






end of thread, other threads:[~2017-05-10  0:40 UTC | newest]

Thread overview: 62+ messages
2016-07-31  8:26 bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request Sho Takemori
2016-07-31 14:31 ` Eli Zaretskii
2016-07-31 23:21   ` Sho Takemori
2016-08-01 13:17     ` Eli Zaretskii
2016-08-02  0:52       ` Dmitry Gutov
2016-08-02 15:25         ` Eli Zaretskii
2016-08-03  2:39           ` Dmitry Gutov
2016-08-04 17:02             ` Eli Zaretskii
2016-08-08  1:56               ` Dmitry Gutov
2016-08-08 13:32                 ` Ted Zlatanov
2016-08-08 23:48                   ` Katsumi Yamaoka
2016-08-08 15:33                 ` Eli Zaretskii
2016-08-08 15:52                 ` Lars Ingebrigtsen
2016-08-08 15:54                 ` Lars Ingebrigtsen
2016-08-08 16:14                   ` Eli Zaretskii
2016-08-08 16:18                     ` Lars Ingebrigtsen
2016-08-08 16:33                       ` Eli Zaretskii
2016-08-08 17:11                         ` Andreas Schwab
2016-08-08 17:30                           ` Eli Zaretskii
2016-08-08 19:16                             ` Andreas Schwab
2016-08-09  2:32                               ` Eli Zaretskii
2016-08-09  8:05                                 ` Andreas Schwab
2016-08-09 14:50                                   ` Eli Zaretskii
2016-08-10  7:12                                     ` Dmitry Gutov
2016-08-10 14:35                                       ` Eli Zaretskii
2016-08-11  2:52                                         ` Dmitry Gutov
2016-08-11  8:53                                           ` Ted Zlatanov
2016-08-11 12:31                                             ` Dmitry Gutov
2016-08-11 12:57                                               ` Ted Zlatanov
2016-08-11 13:00                                                 ` Lars Ingebrigtsen
2016-08-11 13:18                                                   ` Ted Zlatanov
2017-05-08 13:36                                                 ` Dmitry Gutov
2017-05-08 20:57                                                   ` Lars Ingebrigtsen
2017-05-10  0:40                                                     ` Dmitry Gutov
2016-08-11 11:05                                           ` Lars Ingebrigtsen
2016-08-11 14:47                                           ` Eli Zaretskii
2016-08-11 14:59                                             ` Dmitry Gutov
2016-08-11 15:31                                               ` Eli Zaretskii
2016-08-11 18:07                                                 ` Dmitry Gutov
2016-08-11 19:47                                                   ` Eli Zaretskii
2016-08-12 21:44                                                   ` John Wiegley
2016-08-13  0:30                                           ` Sho Takemori
2016-08-13  7:02                                             ` Eli Zaretskii
2016-08-13  7:31                                               ` Sho Takemori
2016-08-13  8:31                                                 ` Eli Zaretskii
2016-08-13 13:02                                                   ` Sho Takemori
2016-08-13 13:11                                                     ` Eli Zaretskii
2016-08-13 15:32                                                   ` Dmitry Gutov
2016-08-13 15:56                                                     ` Eli Zaretskii
2016-08-08 16:21                     ` Lars Ingebrigtsen
2016-08-08 16:33                       ` Eli Zaretskii
2016-08-08 16:58                         ` Lars Ingebrigtsen
2016-08-08 17:11                           ` Eli Zaretskii
2016-08-08 19:46                   ` Dmitry Gutov
2016-08-08 20:19                     ` Lars Ingebrigtsen
2016-08-08 20:35                       ` Dmitry Gutov
2016-08-08 20:36                         ` Lars Ingebrigtsen
2016-08-09  2:13                           ` Dmitry Gutov
2016-08-09  9:39                             ` Lars Ingebrigtsen
2016-08-10  6:50                               ` Dmitry Gutov
2016-08-11  1:31                                 ` Dmitry Gutov
2016-08-02  3:26       ` Sho Takemori
