From mboxrd@z Thu Jan 1 00:00:00 1970
From: "R. Diez"
Newsgroups: gmane.emacs.help
Subject: Re: Text copied from *grep* buffer has NUL (0x00) characters
Date: Sun, 9 May 2021 23:13:36 +0200
Message-ID: <634e880a-b43b-8c59-eb4c-b0c07813bb12@yahoo.de>
References: <83bl9k8buk.fsf@gnu.org> <3e892a2e-1d04-7712-d129-e4f59382457b@yahoo.de> <83a6p37n15.fsf@gnu.org>
In-Reply-To: <83a6p37n15.fsf@gnu.org>
To: Eli Zaretskii, Stefan Monnier
Cc: help-gnu-emacs@gnu.org
List-Id: Users list for the GNU Emacs text editor

EZ> That's not the same. The warning you saw is triggered by a failure to
EZ> convert to the external encoding, so it consumes no extra CPU cycles.

But it could be, from my (admittedly naive) point of view:

  (convert-to-external-encoding but-with-some-extra-flag-to-warn-about-NUL-chars)

EZ> Null bytes will not fail anything, so you should test for them
EZ> explicitly (and in some encodings, like UTF-16, they are necessary and
EZ> cannot be avoided).

I didn't know that about UTF-16, and I could not find any information about it either. Why is a NUL char necessary in UTF-16 and not in UTF-8? Or do you mean that UTF-16 text tends to have many interleaved zero bytes? In that case, I would have thought that the problem would be the 16-bit NUL character, I mean 0x0000. That is the character to watch out for in UTF-16.

Encodings like UTF-16, which always need more than one byte per character, are uncommon, won't work with many text editors or tools like 'grep', and most people will expect problems with them anyway. So I wouldn't worry too much about them.

The NUL char issue (the unexpected problems I talked about), which you are likely to run into sooner or later, will probably only affect the popular byte-oriented encodings like ASCII, ISO/IEC 8859-1 and UTF-8.

SM> I do think there's a real plain bug here, tho, if you change your
SM> "recipe" to `utf-8` instead of `utf-8-with-signature`: take a utf-8 text
SM> file (in a UTF-8 locale), add a NUL byte to it, save, close, and
SM> re-open: you now get a unibyte buffer showing the bytes rather than
SM> the chars.
SM>
SM> Emacs should generally try and warn you when saving a file with a coding
SM> system different than the one it would guess when later re-opening the file.
SM> The problem doesn't show up with `utf-8-with-signature` because
SM> apparently the BOM is given more weight than the NUL byte in determining
SM> which coding system to use.

Thanks for pointing that out. That is why I think that NUL may be a valid character, perfectly fine in theory, but it easily trips up even Emacs itself. This is why I would make Emacs smarter and have it warn about NUL characters, either on paste or on save.

There may be one more quirk in this area, because my text file had somehow lost its UTF-8 BOM too, and I only edit it with Emacs.

I cannot invest more time into this issue at the moment. I hope these posts provide enough information if somebody is interested in the future.

Regards,
  rdiez
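
P.S. The UTF-16 question above (interleaved zero bytes versus a real NUL character) can be checked with a short sketch. It is written in Python purely for illustration and is not part of Emacs; the helper name count_zero_bytes and the sample strings are made up for this example. It counts 0x00 bytes in the same text under several encodings:

```python
# Count 0x00 bytes in the encoded form of a string.
# Hypothetical helper for illustration only; not an Emacs API.

def count_zero_bytes(text: str, encoding: str) -> int:
    """Return how many 0x00 bytes `text` contains once encoded."""
    return text.encode(encoding).count(0)


sample = "grep"          # plain ASCII text, no NUL character in it
with_nul = "gr\x00ep"    # the same text with one NUL character (U+0000)

for enc in ("ascii", "latin-1", "utf-8", "utf-16-le"):
    # Show the raw bytes and the zero-byte count for both strings.
    print(enc, sample.encode(enc), count_zero_bytes(sample, enc),
          count_zero_bytes(with_nul, enc))
```

In the byte-oriented encodings a zero byte shows up only when the text really contains U+0000, whereas in UTF-16 every ASCII character already carries one zero byte, so there a byte-level test for 0x00 says nothing about NUL characters.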