* [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Danny Milosavljevic @ 2017-01-22 23:31 UTC (permalink / raw)
To: guix-devel
* gnu/packages/python.scm (python-2.7)[arguments]: Modify.
---
gnu/packages/python.scm | 1 +
1 file changed, 1 insertion(+)
diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
index fd423d311..6caaeaaf8 100644
--- a/gnu/packages/python.scm
+++ b/gnu/packages/python.scm
@@ -170,6 +170,7 @@
        (list "--enable-shared"                    ;allow embedding
              "--with-system-ffi"                  ;build ctypes
              "--with-ensurepip=install"           ;install pip and setuptools
+             "--enable-unicode=ucs4"
              (string-append "LDFLAGS=-Wl,-rpath="
                             (assoc-ref %outputs "out") "/lib"))
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Maxim Cournoyer @ 2017-01-23 3:55 UTC (permalink / raw)
To: Danny Milosavljevic; +Cc: guix-devel
Danny Milosavljevic <dannym@scratchpost.org> writes:
Hi Danny!
> * gnu/packages/python.scm (python-2.7)[arguments]: Modify.
> ---
> gnu/packages/python.scm | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index fd423d311..6caaeaaf8 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -170,6 +170,7 @@
> (list "--enable-shared" ;allow embedding
> "--with-system-ffi" ;build ctypes
> "--with-ensurepip=install" ;install pip and setuptools
> + "--enable-unicode=ucs4"
> (string-append "LDFLAGS=-Wl,-rpath="
> (assoc-ref %outputs "out") "/lib"))
>
LGTM!
Maxim
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Marius Bakke @ 2017-01-23 16:30 UTC (permalink / raw)
To: Danny Milosavljevic, guix-devel
Danny Milosavljevic <dannym@scratchpost.org> writes:
> * gnu/packages/python.scm (python-2.7)[arguments]: Modify.
> ---
> gnu/packages/python.scm | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index fd423d311..6caaeaaf8 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -170,6 +170,7 @@
> (list "--enable-shared" ;allow embedding
> "--with-system-ffi" ;build ctypes
> "--with-ensurepip=install" ;install pip and setuptools
> + "--enable-unicode=ucs4"
> (string-append "LDFLAGS=-Wl,-rpath="
> (assoc-ref %outputs "out") "/lib"))
>
Hi Danny,
Can you push this to 'core-updates' instead?
It will cause a rebuild of more than 2000 packages on 'python-tests' and
invalidate almost all substitutes, and I would like to merge it sooner
rather than later.
Otherwise LGTM. I checked some other distros and they seem to have this
enabled. Thanks!
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Ludovic Courtès @ 2017-01-23 22:41 UTC (permalink / raw)
To: Marius Bakke; +Cc: guix-devel
Marius Bakke <mbakke@fastmail.com> writes:
> Danny Milosavljevic <dannym@scratchpost.org> writes:
>
>> * gnu/packages/python.scm (python-2.7)[arguments]: Modify.
>> ---
>> gnu/packages/python.scm | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
>> index fd423d311..6caaeaaf8 100644
>> --- a/gnu/packages/python.scm
>> +++ b/gnu/packages/python.scm
>> @@ -170,6 +170,7 @@
>> (list "--enable-shared" ;allow embedding
>> "--with-system-ffi" ;build ctypes
>> "--with-ensurepip=install" ;install pip and setuptools
>> + "--enable-unicode=ucs4"
>> (string-append "LDFLAGS=-Wl,-rpath="
>> (assoc-ref %outputs "out") "/lib"))
>>
>
> Hi Danny,
>
> Can you push this to 'core-updates' instead?
>
> It will cause a rebuild of more than 2000 packages on 'python-tests' and
> invalidate almost all substitutes, and I would like to merge it sooner
> rather than later.
>
> Otherwise LGTM. I checked some other distros and they seem to have this
> enabled. Thanks!
That means that strings are internally UCS-4-encoded, right? What’s the
rationale, and what happens when this flag is omitted?
(I’m not objecting to the patch, just trying to educate myself. :-))
Ludo’.
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Danny Milosavljevic @ 2017-01-23 23:46 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Hi Ludo,
> > Otherwise LGTM. I checked some other distros and they seem to have this
> > enabled. Thanks!
>
> That means that strings are internally UCS-4-encoded, right? What’s the
> rationale, and what happens when this flag is omitted?
The CPython C interface changes depending on the flag, and some Python extensions don't work with the narrow UTF-16 Unicode build, which is what you get if the flag is not specified.
The default, UTF-16, is basically just historical baggage from when Unicode had fewer than 65536 codepoints in the standard.
The highest codepoint nowadays is 1114111 (U+10FFFF).
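As a minimal sketch of how to tell the two kinds of build apart: sys.maxunicode reports the largest codepoint a single string element can hold. The helper name is_wide_build is made up for this illustration. Note that on Python 3.3 and later the check always reports the full range, so it is only informative on Python 2.

```python
import sys


def is_wide_build():
    """Return True if one string element can hold any Unicode codepoint.

    sys.maxunicode is 0xFFFF on a narrow (16-bit) build and
    0x10FFFF (1114111) on a wide (UCS-4) build.
    """
    return sys.maxunicode == 0x10FFFF


if __name__ == "__main__":
    print("maxunicode = %d, wide build: %s" % (sys.maxunicode, is_wide_build()))
```

On a narrow Python 2.7 build, a codepoint above 65535 occupies two string elements, so len() and indexing see the two halves of a surrogate pair rather than one character.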
UCS-4 encoding means that just one 32-bit word encodes one Unicode codepoint (it's 1:1). It's the most straightforward encoding if you don't care about size wastage.
If you *do* care about size wastage, you use UTF-8.
Only if you are tied down by some kind of backward-compatibility constraint do you use UTF-16 or UCS-2 (the latter has no way AT ALL to encode codepoints over 65535, whereas UTF-16 uses a variable-length encoding to represent those).
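To make the size trade-off between the encodings above concrete, here is a small sketch written for Python 3. The helper names encoded_sizes and to_surrogate_pair are invented for this example; the surrogate arithmetic is the standard UTF-16 scheme for codepoints above 65535.

```python
def encoded_sizes(ch):
    """Return the number of bytes ch needs in UTF-32, UTF-16 and UTF-8."""
    return {
        "utf-32": len(ch.encode("utf-32-be")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-8": len(ch.encode("utf-8")),
    }


def to_surrogate_pair(code_point):
    """Split a codepoint above 0xFFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only codepoints above 0xFFFF need surrogates")
    offset = code_point - 0x10000
    # The upper 10 bits go into the high surrogate, the lower 10 into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(repr(ch), encoded_sizes(ch))
    print([hex(unit) for unit in to_surrogate_pair(0x1F600)])
```

UTF-32 (the byte-level form of UCS-4) always takes four bytes per codepoint, while UTF-16 and UTF-8 vary with the codepoint's magnitude.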
Python builds on Microsoft Windows and Mac OS X usually use UTF-16 for Unicode strings, while on GNU/Linux distributions we usually use UCS-4.
Python 3 does the obvious thing and has only one string class and switches the internal string encoding depending on what codepoints are used. That way the user is none the wiser and it still saves space.
But Python 2.7 still has "strings" and "unicode strings", which are distinct types with no such optimizations.
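The adaptive storage in Python 3 mentioned above can be observed roughly as follows. This is only a sketch: it relies on sys.getsizeof and on a CPython implementation detail (the flexible string representation introduced by PEP 393 in Python 3.3), so other interpreters may behave differently. The helper name bytes_per_char is made up for the example.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate storage per character by growing a string of ch by n characters."""
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


if __name__ == "__main__":
    samples = (("ASCII", "a"), ("16-bit range", "\u20ac"), ("above 65535", "\U0001F600"))
    for label, ch in samples:
        print(label, bytes_per_char(ch))
```

The per-character cost grows with the widest codepoint in the string, which is how Python 3 saves space without exposing the encoding to the user.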
So this patch basically just makes sure that we do the same as other distributions so that all the Python 2.7 extensions work.
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Hartmut Goebel @ 2017-01-24 8:27 UTC (permalink / raw)
To: guix-devel
Hi Danny,
thanks for the explanation. I have wondered about this ever since I first
stumbled upon it (but never bothered to investigate it).
--
Regards
Hartmut Goebel
| Hartmut Goebel | h.goebel@crazy-compilers.com |
| www.crazy-compilers.com | compilers which you thought are impossible |
* Re: [PATCH python-tests] gnu: python-2.7: Enable UCS-4 Unicode encoding.
From: Ludovic Courtès @ 2017-01-24 21:08 UTC (permalink / raw)
To: Danny Milosavljevic; +Cc: guix-devel
Hi Danny,
Danny Milosavljevic <dannym@scratchpost.org> writes:
>> > Otherwise LGTM. I checked some other distros and they seem to have
>> > this enabled. Thanks!
>> That means that strings are internally UCS-4-encoded, right?
>> What’s the rationale, and what happens when this flag is omitted?
>
> The CPython C interface changes depending on the flag and some Python
> extensions don't work with the narrow UTF-16 Unicode - which is what
> it would use if you don't specify.
>
> The default, UTF-16, is basically just historical baggage from when
> Unicode had fewer than 65536 codepoints in the standard.
[...]
Thanks for the explanation, it makes a lot of sense!
Ludo’.