From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 03 Jul 2022 16:26:54 +0300
Message-ID: <83k08u9vj5.fsf@gnu.org>
References: <0ed1c9c7-26c1-b801-1910-6d5bb50dec3d@yahoo.de>
 <46c6dd22-ecff-aa7d-e019-1784060574c2@yahoo.de> <83zgx265bm.fsf@gnu.org>
 <83sg2u5zz5.fsf@gnu.org> <87fsyur1rx.fsf@gnus.org> <87fsyu4jof.fsf@igel.home>
 <83lf8m5x2c.fsf@gnu.org> <35838176-9518-6c4a-eb71-25ce7cb0ec4e@yahoo.de>
 <83k0o65vf9.fsf@gnu.org> <87bl9i4g7m.fsf@igel.home> <83cztx5vey.fsf@gnu.org>
 <83tun82h9k.fsf@gnu.org> <87y1xbxzio.fsf@gnus.org> <83k08vbhe4.fsf@gnu.org>
 <87pmimbgiz.fsf@gnus.org> <87letabdrk.fsf@gnus.org> <83pmim9wqo.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
To: larsi@gnus.org
In-Reply-To: <83pmim9wqo.fsf@gnu.org> (message from Eli Zaretskii on Sun, 03 Jul 2022 16:00:47 +0300)

> Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> Date: Sun, 03 Jul 2022 16:00:47 +0300
> From: Eli Zaretskii
>
> > From: Lars Ingebrigtsen
> > Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> > Date: Sun, 03 Jul 2022 14:07:43 +0200
> >
> > Lars Ingebrigtsen writes:
> >
> > > Hm... I guess the only reliable solution across all coding systems is
> > > (like your comment in the code says) to drop the encode-every-char and
> > > try encoding strings, and then see whether the result is short enough.
> > > That could be done somewhat efficiently using a binary search. I'll
> > > have a go at it...
> >
> > And while I was at it, I changed it to return complete glyphs, not just
> > complete code points.
> >
> > There's a behavioural change, though. This:
> >
> > (string-limit "foóá" 6 t 'utf-16)
> >
> > Now returns a string with a BOM, whereas previously it didn't.
>
> So you get 6 characters + the BOM?

I see that it's actually 6 bytes _including_ the BOM.
So I think this is confusing: if we are going to return a string with
the BOM, we should not count the BOM as part of the LENGTH bytes.
After all, if I ask for the characters that fit into N bytes, I should
get those N bytes of payload.

Or maybe we should have an optional argument to control whether LENGTH
includes or excludes the BOM. In any case, we should mention this
aspect in the doc string, I think.
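
To make the two counting policies concrete, here is a rough sketch in Python rather than Emacs Lisp. It is not the string-limit implementation: limit_encoded, bom_length and the count_bom flag are hypothetical names invented for this illustration. It uses the binary search over prefixes mentioned in the quoted text, and it cuts at code points, not at complete glyphs.

```python
def bom_length(encoding: str) -> int:
    """Bytes the codec emits before any payload (2 for 'utf-16', 0 for 'utf-8')."""
    return len("".encode(encoding))


def limit_encoded(text: str, limit: int, encoding: str,
                  count_bom: bool = True) -> bytes:
    """Return the longest encoded prefix of `text` that fits the byte budget.

    count_bom=True  -> the BOM is charged against `limit`.
    count_bom=False -> the BOM is still emitted, but `limit` applies to the
                       payload only (hypothetical option, for illustration).
    """
    budget = limit if count_bom else limit + bom_length(encoding)
    # If not even the bare BOM fits, there is nothing useful to return.
    if bom_length(encoding) > budget:
        return b""
    lo, hi = 0, len(text)
    # Binary search over prefix lengths. This relies on the encoded size
    # never shrinking as the prefix grows, which holds for UTF-8 and UTF-16.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode(encoding)) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo].encode(encoding)


sample = "fo\u00f3\u00e1"  # the "foóá" string from the example above
with_bom_counted = limit_encoded(sample, 6, "utf-16")
payload_only = limit_encoded(sample, 6, "utf-16", count_bom=False)
print(len(with_bom_counted), with_bom_counted.decode("utf-16"))
print(len(payload_only), payload_only.decode("utf-16"))
```

With the BOM counted, a limit of 6 leaves room for only two characters (4 bytes of payload); with the BOM excluded, the same limit yields three characters and an 8-byte result. Either behaviour is defensible, which is why the doc string should say which one applies.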