From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#60750: 29.0.60; encode-coding-char fails for utf-8-auto coding system
Date: Thu, 12 Jan 2023 14:32:52 +0200
Message-ID: <83fscgaq6j.fsf@gnu.org>
References: <87zgaof7cg.fsf@gmail.com>
In-Reply-To: <87zgaof7cg.fsf@gmail.com> (message from Robert Pluim on Thu, 12 Jan 2023 10:08:31 +0100)
To: Robert Pluim
Cc: 60750@debbugs.gnu.org

> From: Robert Pluim
> Date: Thu, 12 Jan 2023 10:08:31 +0100
>
> src/emacs -Q
> M-x toggle-debug-on-error
> M-: (setq buffer-file-coding-system 'utf-8-auto)
> C-b
> C-u C-x =
>
> =>
> Debugger entered--Lisp error: (args-out-of-range "))" 3 1)
>   encode-coding-char(41 utf-8-auto ascii)
>   describe-char(189)
>   what-cursor-position((4))
>
> This is because utf-8-auto has a non-nil :bom property:
>
> (define-coding-system 'utf-8-auto
>   "UTF-8 (auto-detect signature (BOM))"
>   :coding-type 'utf-8
>   :mnemonic ?U
>   :charset-list '(unicode)
>   :bom '(utf-8-with-signature . utf-8))

Right.  This is a very old bug in encoding with the utf-8 family of
coding systems that have a :bom property which is a cons cell.  The
fix is simple, but I wonder what this will break out there.  So:

> Iʼm not sure if this needs fixing, but it was surprising, and the
> docstring of `define-coding-system' didnʼt make it clear to me whether
> a BOM should have been produced here or not.

Actually, the doc string is clear:

  If the value is a cons cell, on decoding, check the first two bytes.
  If they are 0xFE 0xFF, use the car part coding system of the value.
  If they are 0xFF 0xFE, use the cdr part coding system of the value.
  Otherwise, treat them as bytes for a normal character.  On encoding,
  produce BOM bytes according to the value of ‘:endian’.

Note the last sentence: it should unconditionally produce the BOM on
encoding, which is what we do in your scenario.

> (Iʼm willing to be told that buffer-file-coding-system shouldnʼt be
> 'utf-8-auto, but I never set that explicitly as far as I know 😀)

Who does set utf-8-auto?  Where did you originally bump into this?
This is an obscure coding-system, and the fix to make it work as
documented will produce an incompatible change in behavior.  So before
I decide whether to make the change and on what branch, I'd like to
know how in the world you encountered this.

Thanks.
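
For readers following along outside Emacs, here is a minimal sketch of what "produce the BOM on encoding" means at the byte level for a single ASCII character such as the ")" (character 41) in the backtrace above.  It is not Emacs code: it assumes Python's standard "utf-8-sig" codec is a fair stand-in for a UTF-8 coding system that writes a signature, and the helper name encode_char is hypothetical.

```python
import codecs


def encode_char(ch: str, with_signature: bool) -> bytes:
    """Encode one character as UTF-8, optionally preceded by the UTF-8 signature (BOM).

    Hypothetical helper for illustration only; "utf-8-sig" stands in for a
    coding system that emits a signature when encoding.
    """
    codec = "utf-8-sig" if with_signature else "utf-8"
    return ch.encode(codec)


plain = encode_char(")", with_signature=False)
signed = encode_char(")", with_signature=True)

print(plain.hex(" "))   # 29
print(signed.hex(" "))  # ef bb bf 29

# The signature adds exactly three bytes in front of the encoded character.
print(signed.startswith(codecs.BOM_UTF8), len(signed) - len(plain))  # True 3
```

A caller that expects one byte back for an ASCII character gets four once the signature is emitted, which is the kind of mismatch any fix has to take into account.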