From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 22:49:58 +0300
Message-ID: <831tf0l6l5.fsf@gnu.org>
References: <20150818101352.GA6744@tuxteam.de> <83mvxoll2g.fsf@gnu.org>
 <20150818144530.GB15783@tuxteam.de> <83k2sslk0d.fsf@gnu.org>
 <20150818160145.GA18309@tuxteam.de> <83fv3glfm0.fsf@gnu.org>
 <20150818193049.GA24519@tuxteam.de>
In-reply-to: <20150818193049.GA24519@tuxteam.de>
To: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor
Xref: news.gmane.org gmane.emacs.help:106684

> Date: Tue, 18 Aug 2015 21:30:49 +0200
> Cc: help-gnu-emacs@gnu.org
> From:
>
> I was having difficulties in understanding you

Sorry about that. It's a complex issue to explain in a few words.

> Now I understand: Emacs's internal (raw) coding system can represent
> "characters not expressible in utf-8".

More accurately, it can represent characters outside the Unicode code
space. And please don't call that "raw"; the internal representation
of characters used by Emacs is known as 'utf-8-emacs'.

> The function encode-coding-string passes those bytes silently
> through, outputting an invalid utf-8 sequence.

Yes. Although in interactive functions Emacs will normally complain
and ask for a better encoding.

> So I venture the guess that when the Emacs buffer contains something
> expressible as valid utf-8, 'utf-8 and 'raw are equivalent

Yes.

> (what about combining characters?)

Emacs doesn't normalize/compose/decompose characters when it encodes
text (with the notable exception of the utf-8-hfs encoding).
Applications that want this should do that themselves, e.g. using the
facilities in ucs-normalize.el.
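
To illustrate the last point, here is a minimal sketch, not a definitive recipe. The names prefixed with `my/` are hypothetical helpers invented for this example; `encode-coding-string` and `ucs-normalize-NFC-string` are the existing Emacs functions. It counts the UTF-8 bytes of the same visible character in precomposed and decomposed form, and shows that encoding by itself leaves the decomposed form alone:

```elisp
;; Sketch only: everything named `my/...' is a hypothetical helper,
;; not part of Emacs.
(require 'ucs-normalize)

(defun my/utf-8-byte-count (string)
  "Return the number of bytes STRING occupies once encoded as UTF-8.
No normalization is applied; the characters are encoded as they are."
  ;; `encode-coding-string' returns a unibyte string, so its length
  ;; is a byte count.
  (length (encode-coding-string string 'utf-8)))

;; LATIN SMALL LETTER E WITH ACUTE as one precomposed character, U+00E9.
(defvar my/precomposed "\u00e9")

;; The same visible text as "e" followed by COMBINING ACUTE ACCENT, U+0301.
(defvar my/decomposed "e\u0301")

(my/utf-8-byte-count my/precomposed)    ; => 2
(my/utf-8-byte-count my/decomposed)     ; => 3, encoding did not compose it

;; Normalizing explicitly to NFC before encoding gives the shorter form.
(my/utf-8-byte-count (ucs-normalize-NFC-string my/decomposed))  ; => 2
```

Whether NFC, NFD or no normalization at all is the right choice depends on what the consumer of the bytes expects, so the caller has to decide that before measuring the size.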