From: Marko Rauhamaa
Newsgroups: gmane.lisp.guile.user
To: David Kastrup
Cc: guile-user@gnu.org
Subject: Re: Running script from directory with UTF-8 characters
Date: Thu, 24 Dec 2015 00:20:55 +0200
Message-ID: <87poxwajso.fsf@elektro.pacujo.net>
In-Reply-To: <874mf8hlx1.fsf@fencepost.gnu.org> (David Kastrup's message of "Wed, 23 Dec 2015 22:53:14 +0100")

David Kastrup:

> That's more economical than Python's method which uses the encodings
> of surrogate words not allowed in properly encoded UTF-8, taking
> 3 bytes rather than the 2 Emacs makes do with. Using high codepoints
> above the Unicode space would even take 4 bytes.

Actually, CPython (since version 3.3) represents strings internally even
less "economically": it uses one byte per character (Latin-1) if every
code point in the string fits. If it can't, it uses two bytes per
character (UCS-2). If it can't do even that, it uses four bytes per
character (UCS-4).

Thus, a single code point above 65535 is enough to make the whole
string consist of 4-byte integers.

Marko
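A minimal sketch for checking both byte counts on a local CPython 3.3+ interpreter. It assumes that "Python's method" in the quoted text refers to the `surrogateescape` error handler, which maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF) that needs 3 bytes when forced back out as UTF-8 with `surrogatepass`. The storage-width check relies on `sys.getsizeof`, whose exact values are a CPython implementation detail. The helper names `predicted_width`, `measured_width` and `escaped_byte_utf8_len` are hypothetical, made up for this illustration.

```python
import sys

def predicted_width(s: str) -> int:
    """Bytes per character CPython 3.3+ should pick for s,
    judged only by the largest code point in the string."""
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1  # every code point fits in Latin-1
    if top <= 0xFFFF:
        return 2  # fits in UCS-2
    return 4      # at least one code point above 65535: UCS-4

def measured_width(s: str) -> int:
    """Estimate bytes per character from sys.getsizeof (CPython-specific).

    Doubling s keeps the same maximum code point, hence the same storage
    width, so the size difference is exactly len(s) characters' worth."""
    if not s:
        raise ValueError("need a non-empty string")
    return (sys.getsizeof(s * 2) - sys.getsizeof(s)) // len(s)

def escaped_byte_utf8_len(b: int) -> int:
    """UTF-8 length of the lone surrogate that surrogateescape produces
    for a single undecodable byte (assumes 0x80 <= b <= 0xFF)."""
    ch = bytes([b]).decode("utf-8", "surrogateescape")  # e.g. 0xFF -> U+DCFF
    return len(ch.encode("utf-8", "surrogatepass"))

if __name__ == "__main__":
    samples = {
        "ASCII": "a" * 1000,
        "Latin-1": "\u00e9" * 1000,
        "BMP": "\u20ac" * 1000,
        "one astral": "a" * 999 + "\U0001F600",
    }
    for name, s in samples.items():
        print(f"{name:>10}: predicted {predicted_width(s)}, "
              f"measured {measured_width(s)}")
    print("escaped byte 0xFF as UTF-8:", escaped_byte_utf8_len(0xFF), "bytes")
```

On CPython the "one astral" sample should report a width of 4 even though 999 of its 1000 characters are plain ASCII, which is the effect described above. Other implementations (PyPy, for instance) store strings differently, so `measured_width` is not meaningful there.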