From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
Subject: Re: string-bytes and coding systems
Date: Thu, 09 Mar 2017 09:35:24 -0800
Message-ID: <87tw72mjz7.fsf@ericabrahamsen.net>
References: <87r327nyto.fsf@ericabrahamsen.net> <83tw72fnha.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

Eli Zaretskii writes:

>> From: Eric Abrahamsen
>> Date: Wed, 08 Mar 2017 15:17:07 -0800
>>
>> I'm essentially taking the `string-bytes' of each line, and if it's too
>> long, popping characters off the end until it's fewer than 75 bytes.
>>
>> My understanding/assumption is that `string-bytes' returns the number of
>> bytes according to Emacs' internal coding system
>
> Yes.
>
>> which is close enough to utf-8 to make no difference.
>
> No.  The deviations from UTF-8 could be significant in some cases,
> with some exotic characters and with raw bytes.

Good to know.

>> When this text gets written to file it will also be encoded as
>> utf-8, ergo testing string lengths with `string-bytes' is going to
>> always produce the right results in the final file.
>
> I suggest to use filepos-to-bufferpos to find where to break text into
> lines.

I'll look into that. Thank you!

Eric
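
For reference, a minimal sketch of the byte-counting point raised above: since `string-bytes' measures the internal representation, one way to get the size the text will have on disk is to encode it as utf-8 first and count the bytes of the result. The function names and the limit argument are hypothetical, chosen only for illustration. This is also not the filepos-to-bufferpos approach suggested in the thread, just a direct variant of the "pop characters off the end" idea.

```elisp
;; Hypothetical helpers, not from the thread; names are assumptions.

(defun my/utf8-byte-length (str)
  "Return the number of bytes STR occupies when encoded as UTF-8."
  ;; `encode-coding-string' returns a unibyte string, so its
  ;; `length' is the encoded byte count.
  (length (encode-coding-string str 'utf-8)))

(defun my/truncate-to-utf8-bytes (str limit)
  "Return the longest prefix of STR that encodes to at most LIMIT UTF-8 bytes.
Characters are dropped whole, so no multibyte sequence is split."
  (let ((end (length str)))
    ;; Shrink the prefix one character at a time until it fits.
    (while (and (> end 0)
                (> (my/utf8-byte-length (substring str 0 end)) limit))
      (setq end (1- end)))
    (substring str 0 end)))
```

With these definitions, something like (my/truncate-to-utf8-bytes line 74) would keep a line under 75 bytes in the written file. Re-encoding each prefix is quadratic in the line length, which is probably fine for short lines but would want a smarter search for long ones.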