From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: string-bytes and coding systems
Date: Thu, 09 Mar 2017 18:01:37 +0200
Message-ID: <83tw72fnha.fsf@gnu.org>
References: <87r327nyto.fsf@ericabrahamsen.net>
To: help-gnu-emacs@gnu.org
In-reply-to: <87r327nyto.fsf@ericabrahamsen.net> (message from Eric Abrahamsen on Wed, 08 Mar 2017 15:17:07 -0800)
List-Id: Users list for the GNU Emacs text editor

> From: Eric Abrahamsen
> Date: Wed, 08 Mar 2017 15:17:07 -0800
>
> I'm essentially taking the `string-bytes' of each line, and if it's too
> long, popping characters off the end until it's fewer than 75 bytes.
>
> My understanding/assumption is that `string-bytes' returns the number of
> bytes according to Emacs' internal coding system

Yes.

> which is close enough to utf-8 to make no difference.

No.  The deviations from UTF-8 could be significant in some cases,
with some exotic characters and with raw bytes.

> When this text gets written to file it will also be encoded as
> utf-8, ergo testing string lengths with `string-bytes' is going to
> always produce the right results in the final file.

I suggest using `filepos-to-bufferpos' to find where to break text into
lines.
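
To illustrate the distinction, here is a minimal sketch that measures each piece by its encoded size rather than by `string-bytes'.  Note that it swaps in `encode-coding-string' instead of `filepos-to-bufferpos', so it is an alternative way to get at the same quantity, not the approach suggested above.  The function name `my-fold-string-by-bytes' and its arguments are hypothetical, and encoding one character at a time is only reasonable for a stateless encoding such as UTF-8.

```elisp
;; Hypothetical helper, not part of Emacs: split STRING into pieces
;; whose size *after encoding* is at most LIMIT bytes.
(defun my-fold-string-by-bytes (string limit &optional coding-system)
  "Split STRING into pieces of at most LIMIT bytes once encoded.
CODING-SYSTEM defaults to `utf-8'.  A single character that is
larger than LIMIT still gets a piece of its own."
  (let ((coding-system (or coding-system 'utf-8))
        (pieces '())
        (start 0)
        (len (length string)))
    (while (< start len)
      (let ((end start)
            (bytes 0)
            (done nil))
        (while (and (not done) (< end len))
          ;; Size of the next character as it would appear in the file,
          ;; not its size in the internal representation.
          (let ((n (length (encode-coding-string
                            (substring string end (1+ end))
                            coding-system))))
            (if (and (> (+ bytes n) limit) (> end start))
                (setq done t)
              (setq bytes (+ bytes n)
                    end (1+ end)))))
        (push (substring string start end) pieces)
        (setq start end)))
    (nreverse pieces)))
```

For the case discussed in this thread one would call something like (my-fold-string-by-bytes line 75), adjusting the limit to whatever the target format actually requires.  Since the helper calls `encode-coding-string' once per character it is slow on long text; it is meant to show the idea, not to be efficient.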