From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Mon, 04 Jul 2022 14:31:01 +0300
Message-ID: <831qv19ksq.fsf@gnu.org>
In-Reply-To: <87wnct88ui.fsf@gnus.org> (message from Lars Ingebrigtsen on Mon, 04 Jul 2022 12:34:29 +0200)
To: Lars Ingebrigtsen
Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org

> From: Lars Ingebrigtsen
> Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> Date: Mon, 04 Jul 2022 12:34:29 +0200
>
> Eli Zaretskii writes:
>
> > I see that it's actually 6 bytes _including_ the BOM.  So I think this
> > is confusing: if we are going to return a string with the BOM, we
> > should not count the BOM as part of the LENGTH bytes.  Because if I
> > requested to get characters which fit into N bytes, I should get those
> > N bytes of payload.  Or maybe we should have an optional argument to
> > control whether LENGTH includes or excludes the BOM.
>
> If the caller has asked for a max number of bytes in a coding system
> that includes a BOM, then the BOM has to be counted -- otherwise the
> bytes won't fit into whatever field the protocol they're using limits
> the string to.

You obviously have a very specific use case in mind.  But there are
others.  Moreover, UTF with a BOM is a special case, where the prefix
is known in advance.  Other encodings, notably from the ISO-2022
family, are harder because the exact shift-in sequence is not always
easy to guess.
Which is why I thought a way to control this aspect could be needed.
But we could just document the subtlety and wait for someone to come
up with a practical scenario where it would be needed.

> (And we don't have a -without-signature variant, do we?)

We do: utf-16le and utf-16be.

> > In any case, we should mention this aspect in the doc string, I think.
>
> Yes.  But should we have -without-signature variants for utf-16?  Then
> the doc string could recommend using that if the caller wants BOM-less
> bytes.

See above.
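
To make the two readings of LENGTH concrete, here is a minimal sketch
in Python rather than Emacs Lisp.  It is only an illustration of the
trade-off discussed above, not the Emacs implementation: the helper
names `prefix_length` and `limit_to_bytes` and the `count_bom` flag
are hypothetical, and Python's "utf-16" / "utf-16-le" codecs merely
stand in for a coding system with and without a signature.

```python
def prefix_length(encoding):
    """Bytes the codec emits before any payload (e.g. a BOM).

    Assumption: encoding the empty string yields exactly that fixed
    prefix.  This holds for Python's "utf-16" and "utf-8-sig" codecs.
    """
    return len("".encode(encoding))


def limit_to_bytes(text, length, encoding, count_bom=True):
    """Return the longest prefix of TEXT that fits in LENGTH bytes.

    count_bom=True:  LENGTH covers everything emitted, BOM included
                     (the "must fit into a protocol field" reading).
    count_bom=False: LENGTH covers the payload only; the BOM is extra.
    """
    budget = length if count_bom else length + prefix_length(encoding)
    result = ""
    for end in range(len(text) + 1):
        if len(text[:end].encode(encoding)) > budget:
            break
        result = text[:end]
    return result


# With a 6-byte limit, the BOM costs one UTF-16 character of payload.
with_bom = limit_to_bytes("foobar", 6, "utf-16", count_bom=True)
payload_only = limit_to_bytes("foobar", 6, "utf-16", count_bom=False)
no_signature = limit_to_bytes("foobar", 6, "utf-16-le")
print(with_bom, payload_only, no_signature)

# ISO-2022 overhead is not a fixed prefix: it depends on the text,
# because escape sequences appear only when the character set changes.
ascii_len = len("a".encode("iso2022_jp"))
kana_len = len("\u3042".encode("iso2022_jp"))
print(ascii_len, kana_len)
```

In this sketch a BOM-less encoding such as "utf-16-le" gives the same
result as excluding the BOM from the count, which is the point made
about utf-16le and utf-16be.  The ISO-2022 lines show why no such
fixed adjustment exists there: the overhead cannot be computed without
looking at the text itself.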