From: "R. Diez"
Newsgroups: gmane.emacs.help
Subject: Re: Text copied from *grep* buffer has NUL (0x00) characters
Date: Sun, 9 May 2021 20:47:28 +0200
Message-ID: <3e892a2e-1d04-7712-d129-e4f59382457b@yahoo.de>
References: <83bl9k8buk.fsf@gnu.org>
To: Eli Zaretskii, monnier@iro.umontreal.ca
Cc: help-gnu-emacs@gnu.org

> There's nothing wrong with null bytes in a UTF-8 encoded file, not in
> general.

Well, that's true by the book.

I already mentioned Meld and Pluma. Xfce's text editor, Mousepad, also refuses to open UTF-8 files with BOM if they contain a NUL character.

gedit at least used to have the same problem:

https://superuser.com/questions/246014/use-gedit-to-open-file-with-null-characters

Geany truncates the file at the first NUL.

So it is a problem in practice.

But we could of course insist on everyone switching to a proper text editor when they try to open our UTF-8 files with embedded NULs. That will surely make us even more popular...
]8-)

> We could have an optional warning about null bytes (when?
> when you save the buffer?). But I see no reason to do that by
> default, especially since such a feature would require a costly search
> of the entire buffer.

Some terminal emulators warn when pasting suspicious text.

Emacs is already checking all bytes on save. I inserted an invalid sequence and got this warning on save:

-------------8<-------------8<-------------

These default coding systems were tried to encode text
in the buffer ‘Test3.txt’:
  (utf-8-with-signature-dos (11 . 4194176) (12 . 4194239))
However, each of them encountered characters it couldn’t encode:
  utf-8-with-signature-dos cannot encode these: \200 \277

Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘M-x universal-argument C-x =’ will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text no-conversion

-------------8<-------------8<-------------

Therefore, I don't think it would cost too much to check for NULs at the same time, and give users the choice.

> This is easy to fix: customize the Grep command to not include
> "--null". That switch is mainly for systems that allow newlines in
> file names, which MS-Windows doesn't allow, so if this switch causes
> trouble in your usage, simply remove it.

I am using Linux. Of course, now that I know what the issue is, I can just remove --null from the grep command and be done with it. That would quietly fix the problem for me.

The reason I wrote a long e-mail is to illustrate my head scratching when I got hit several times, because it is not obvious where the problem is coming from.
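
To make the gotcha concrete, here is a minimal sketch in Python (not Emacs code) of what one line of `grep --null` output looks like as bytes. The sample line and the helper names are made up for illustration. It assumes only what the grep documentation states: with --null, the byte after the file name is NUL instead of ":". The result is still valid UTF-8, which is why nothing complains until another editor chokes on it.

```python
# Hypothetical sample of one line of `grep --null -nH` output: the byte after
# the file name is NUL (0x00) instead of the usual ":".
SAMPLE_LINE = b"src/main.c\x0042:int main(void)\n"


def find_nul_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every NUL (0x00) in data."""
    return [offset for offset, byte in enumerate(data) if byte == 0]


def nul_to_colon(data: bytes) -> bytes:
    """Replace each NUL with ":", the separator grep prints without --null."""
    return data.replace(b"\x00", b":")


if __name__ == "__main__":
    # NUL is U+0000, so strict UTF-8 decoding succeeds with it embedded.
    decoded = SAMPLE_LINE.decode("utf-8")
    print(repr(decoded))
    print(find_nul_offsets(SAMPLE_LINE))
    print(nul_to_colon(SAMPLE_LINE).decode("utf-8"), end="")
```

Running `find_nul_offsets` over text pasted from elsewhere is a quick way to tell whether a stray NUL came along.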

I'll post again if I manage to reproduce a more serious variant of this issue where the file started to show Chinese characters in other editors, while Emacs decided to start showing ^M at the end of the lines. My guess is that it was a similar gotcha, because I have been copying from the *grep* buffer a few times in the last few days.

I believe that this NUL gotcha is going to hit many people, who will then think "this is just another Emacs quirk". After all, the grep --null is a relatively recent change in Emacs 26.1. And many log files have embedded NUL characters too, so you may inadvertently copy NUL characters along.

> For the detection of NULs in UTF-8 files, you could also ask for such
> a feature via `M-x report-emacs-bug` but it should be pretty easy to get
> something comparable with something like:
> [...]

I don't think it is desirable for users to install such Lisp hooks to deal with such corner cases. My opinion is that Emacs should be more helpful here by default. But maybe this mailing list post is enough, if users facing such "corruption" or character encoding problems manage to enter the right search terms.

> This "what you see in NOT what you get" is indeed undesirable. I'm not
> sure it's easy to fix in a reliable way in Emacs (beside not using
> `--null` as Eli points out), but I suggest you `M-x report-emacs-bug`.
> Maybe grep-mode can add a `filter-buffer-substring-function` that
> converts those NUL into `:`.

That seems fair. I'll report that as a bug.

Regards,
  rdiez