From: "R. Diez"
Newsgroups: gmane.emacs.help
Subject: Text copied from *grep* buffer has NUL (0x00) characters
Date: Sun, 9 May 2021 11:19:38 +0200
To: help-gnu-emacs@gnu.org

Hi all:

I have been using encoding utf-8-with-signature-dos for years for my main notes.txt file, because it is very portable. Even ancient versions of Windows Notepad honour the UTF-8 BOM correctly.

Recently, my notes.txt became corrupt a few times. I started seeing ^M characters at the end of each line, and other text editors started complaining about invalid UTF-8 sequences inside. I suspected an unreliable network connection, a failing local disk, or a bug in Emacs.

Restoring the notes.txt file wasn't easy, because it was not obvious what was wrong with it. I couldn't find a command-line tool that would easily replace any invalid UTF-8 sequences with their hex-code equivalents, but I must admit that I did not actually invest much time looking. After all, I have automated backups.
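Python 3's standard "backslashreplace" decoding error handler comes close to such a tool. Below is a minimal sketch of a repair helper, assuming Python 3 is available. The script name escape_bad_utf8.py and the function names are hypothetical.

The script rewrites each invalid byte as visible \xNN text. It also escapes NUL bytes explicitly, because a 0x00 byte is valid UTF-8 and would otherwise pass through silently. It only makes the damage visible; the escaped spots still have to be fixed by hand.

```python
import sys


def escape_bad_utf8(data: bytes) -> str:
    """Decode data as UTF-8, turning every invalid byte into visible \\xNN text.

    NUL (0x00) bytes are valid UTF-8, so the decoder accepts them silently;
    they are escaped explicitly here so that they show up as well.
    """
    # "backslashreplace" substitutes each undecodable byte with \xNN text.
    text = data.decode("utf-8", errors="backslashreplace")
    return text.replace("\x00", "\\x00")


def main(args) -> None:
    if len(args) != 1:
        print("usage: python3 escape_bad_utf8.py BROKEN_FILE > REPAIRED_FILE",
              file=sys.stderr)
        return
    with open(args[0], "rb") as f:
        data = f.read()
    # Write raw UTF-8 bytes so that CR/LF line endings and the BOM pass
    # through unchanged (no newline translation by a text-mode stdout).
    sys.stdout.buffer.write(escape_bad_utf8(data).encode("utf-8"))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The BOM and the DOS line endings are preserved, because valid UTF-8 (including U+FEFF and CR/LF) decodes and re-encodes unchanged. Only the invalid bytes and NULs are altered.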
Yesterday, I remembered exactly what I had done last: I had copied text from the *grep* buffer after using 'rgrep'. After some investigation, it turns out that Emacs' default "Grep Command" is "grep --color -nH --null -e ", which includes option "--null". This means that grep outputs an ASCII NUL character (a binary 0x00) after each filename, instead of the usual ':'.

This is what an rgrep text search occurrence looks like in the *grep* buffer:

  ./some/file.txt:123:some text line

The first ':' is actually a binary null, but the *grep* buffer hides this fact. If you copy that text line to an Emacs text file buffer, it then looks like this:

  ./some/file.txt^@123:some text line

The ^@ is how Emacs displays the binary null.

In my preliminary testing, I could not reproduce the kind of text file "corruption" I had seen before, but other text editors started complaining again about an invalid UTF-8 sequence or similar. For example, the MATE Desktop text editor, Pluma, complained about an "incomplete multibyte sequence in input". Pluma refuses to open short files with embedded NUL characters, claiming either that it cannot detect the character encoding or that the file looks like a binary file. The merge tool 'Meld' also complained about invalid characters.

I would say that Emacs has 2 issues here:

1) If a text file's encoding is utf-8-with-signature-dos, I do not think that it is a good idea for Emacs to allow binary zeros without any warning. A character sequence like ^@ is easy to miss in the middle of long text lines, as it is not coloured in red and has no other visible hint. A 0x00 byte is technically valid UTF-8 (it encodes U+0000), but it is likely to cause problems in many places. This kind of problem is not new; see also "modified UTF-8". I think I have seen warnings from Emacs before about characters that could not be encoded in the current buffer encoding. I would welcome such a warning for binary zeros.
2) Copying text from a *grep* buffer that looks like ":" should not suddenly deliver a NUL character instead. That is just unexpected and prone to problems down the line.

Regards,
  rdiez