From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Coding warning attributes to wrong char
Date: Sat, 17 Jun 2023 09:30:51 +0300
Message-ID: <834jn6siqs.fsf@gnu.org>
References: <87bkhebtvp.fsf@ypei.org>
In-Reply-To: <87bkhebtvp.fsf@ypei.org> (message from Yuchen Pei on Sat, 17 Jun 2023 14:22:18 +1000)
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: Yuchen Pei
Cc: emacs-devel@gnu.org
Xref: news.gmane.io gmane.emacs.devel:306855

> From: Yuchen Pei
> Date: Sat, 17 Jun 2023 14:22:18 +1000
>
> These default coding systems were tried to encode the following
> problematic characters in the buffer ‘encoding.txt’:
>   Coding System           Pos  Codepoint  Char
>   utf-8-unix               23  #x3FFFE2   \342
>                            24  #x3FFF80   \200
>                            25  #x3FFF99   \231
>
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \342 \200 \231
>
> Click on a character (or switch to this window by ‘C-x o’
> and select the characters by RET) to jump to the place it appears,
> where ‘C-u C-x =’ will give information about it.
>
> Select one of the safe coding systems listed below,
> or cancel the writing with C-g and edit the buffer
> to remove or modify the problematic characters,
> or specify any other coding system (and risk losing
> the problematic characters).
>
>   raw-text no-conversion
> --8<---------------cut here---------------end--------------->8---
>
> Despite the warning, the correct fix is to remove the nul character.
>
> This can be quite misleading, especially when one wants to fix encoding
> issues in big text files.

What is your proposal for better dealing with this situation?

The basic problem here is that Emacs cannot know whether the null
characters are or aren't supposed to be in the file.  You as the user
do know, presumably because you know where this file came from or what
is its purpose.  But Emacs doesn't know.  It also cannot easily know
that removing the null character would solve all the other problems,
since it examines each such character individually.
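
To make the "examines each such character individually" point concrete,
here is a rough sketch in plain Python (not Emacs code, and not how
Emacs implements the check).  It assumes, based only on the table in
the quoted warning, that the reported codepoints are raw bytes stored
at an offset of #x3FFF00; the names RAW_BYTE_BASE and raw_byte_value
are made up for this illustration.

```python
# Illustration only: plain Python, not Emacs internals.
# Assumption taken from the quoted table: #x3FFFE2 corresponds to the
# byte \342 (0xE2), i.e. raw bytes sit at an offset of #x3FFF00.
RAW_BYTE_BASE = 0x3FFF00


def raw_byte_value(codepoint: int) -> int:
    """Map a raw-byte codepoint (#x3FFF80..#x3FFFFF) back to its byte."""
    if not 0x3FFF80 <= codepoint <= 0x3FFFFF:
        raise ValueError(f"not a raw-byte codepoint: {codepoint:#x}")
    return codepoint - RAW_BYTE_BASE


# The three codepoints listed in the warning, in buffer order.
reported = [0x3FFFE2, 0x3FFF80, 0x3FFF99]
raw = bytes(raw_byte_value(cp) for cp in reported)

# Octal form, as the warning prints them.
octal = [format(b, "o") for b in raw]
print(octal)            # ['342', '200', '231']

# Taken together, the three bytes are one well-formed UTF-8 sequence.
decoded = raw.decode("utf-8")
print(repr(decoded))    # '\u2019' (RIGHT SINGLE QUOTATION MARK)

# A NUL, judged on its own, encodes to UTF-8 without complaint, so a
# per-character check has no reason to list it as problematic.
print("\x00".encode("utf-8"))   # b'\x00'
```

So the three "problematic characters" are just the bytes of a single
curly apostrophe, while the null character is perfectly encodable when
looked at in isolation.  That is consistent with the warning naming the
three bytes and saying nothing about the null.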