bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters
       [not found] <0ed1c9c7-26c1-b801-1910-6d5bb50dec3d.ref@yahoo.de>
@ 2021-05-09 19:14 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-09 21:38   ` bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-11 12:53   ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters Lars Ingebrigtsen
  0 siblings, 2 replies; 28+ messages in thread
From: R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-09 19:14 UTC (permalink / raw)
  To: 48321

Emacs' default "Grep Command" is "grep --color -nH --null -e ", which
includes option "--null". This means that grep is embedding an ASCII NUL
character (a binary 0x00) after the filenames.

This is what an rgrep text search occurrence looks like in the *grep* buffer:

./some/file.txt:123:some text line

The first ':' is actually a binary null, but the *grep* buffer hides this fact.
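
To make the hidden byte visible, here is a minimal Python sketch of what such a line looks like as raw bytes and how it splits apart. Python is used purely for illustration; the sample line and the helper name split_grep_line are made up for this example and are not part of grep or Emacs.

```python
# Raw bytes as grep emits them with -nH --null: the file name is
# terminated by NUL (0x00) instead of ':'.
raw = b"./some/file.txt\x00123:some text line"


def split_grep_line(line: bytes):
    """Split one `grep -nH --null` output line into (file, line number, text)."""
    filename, rest = line.split(b"\x00", 1)   # NUL ends the file name
    lineno, text = rest.split(b":", 1)        # ':' still follows the line number
    return filename.decode(), int(lineno), text.decode()


print(split_grep_line(raw))  # ('./some/file.txt', 123, 'some text line')
```

The NUL makes the file name unambiguous even when it contains ':' itself, which is presumably why the option is in the default command.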

If you copy that text line to an Emacs text file buffer, it then looks like
this:

./some/file.txt^@123:some text line

The ^@ is the representation for the binary null, but that is easy to
miss in long text lines.

A text file with an embedded NUL character causes problems
everywhere. There are errors or warnings with Meld, Pluma, Geany,
Mousepad, and probably many more.

In my opinion, copying text from a *grep* buffer that looks like ":"
should not suddenly deliver a NUL character instead. That's just
unexpected and prone to problems down the line.

Stefan Monnier suggested the following:

----8<----8<----8<----
This "what you see is NOT what you get" is indeed undesirable.  I'm not
sure it's easy to fix in a reliable way in Emacs (beside not using
`--null` as Eli points out), but I suggest you `M-x report-emacs-bug`.
Maybe grep-mode can add a `filter-buffer-substring-function` that
converts those NUL into `:`.
----8<----8<----8<----
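
The kind of conversion suggested above could look roughly like the following Python sketch. It is only a model of the idea, not the Emacs implementation: the function name nul_to_colon is hypothetical, and it assumes that only the first NUL on each line is the file-name terminator that the *grep* buffer displays as ':'.

```python
def nul_to_colon(copied: str) -> str:
    """Replace the first NUL on each line with ':' (what the buffer displays)."""
    return "\n".join(
        line.replace("\x00", ":", 1)   # count=1: leave any later NULs alone
        for line in copied.split("\n")
    )


copied = "./some/file.txt\x00123:some text line"
print(nul_to_colon(copied))  # ./some/file.txt:123:some text line
```

Touching only the first NUL per line keeps any NUL bytes that belong to the matched text itself, at the cost of assuming every copied line starts with a file name.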

For more information, see the discussion starting with this mailing list
message:

Text copied from *grep* buffer has NUL (0x00) characters
https://lists.gnu.org/archive/html/help-gnu-emacs/2021-05/msg00360.html

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-09 19:14 ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-09 21:38   ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-10 14:17     ` Eli Zaretskii
  2021-05-11 12:53   ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters Lars Ingebrigtsen
  1 sibling, 1 reply; 28+ messages in thread
From: R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-09 21:38 UTC (permalink / raw)
  To: 48324

I think that hexl-mode has problems with the UTF-8 BOM byte sequence at the beginning of a text file. The steps to reproduce this issue are:

Create a text file with a single line with 3 characters: 123

Do a (set-buffer-file-coding-system 'utf-8-with-signature-dos) and save the file.

The file should now have the following contents (8 bytes):

ef bb bf 31 32 33 0d 0a

That is the UTF-8 BOM (ef bb bf), the ASCII digits 1, 2 and 3, and end-of-line sequence (CR LF).
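
The expected byte sequence can be checked independently of Emacs. The sketch below uses Python's "utf-8-sig" codec as a stand-in for Emacs's utf-8-with-signature-dos (an assumption for illustration); the CR LF line ending is written out explicitly because the Python codec does not translate newlines.

```python
import codecs

# "123" followed by a DOS line ending, encoded as UTF-8 with a signature.
content = "123\r\n".encode("utf-8-sig")

print(content.hex(" "))                  # ef bb bf 31 32 33 0d 0a
print(len(content))                      # 8
print(content[:3] == codecs.BOM_UTF8)    # True
```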

Now change to hexl-mode, place the cursor at the '1' character (31 in hex), call hexl-insert-hex-char, and enter 00 in order to replace the '1' with a binary zero (NUL character).

The result is puzzling. Instead of replacing the '1' (31) with NUL (00), the UTF-8 BOM is duplicated: the characters '1', '2' and '3' have been overwritten with the new copy of the BOM, character CR has been replaced with NUL, and character LF is intact:

ef bb bf ef bb bf 00 0a

If you save, close and reload the file, it gains one byte, but that is probably not important, just a consequence of having lost the CR character:

ef bb bf ef bb bf 00 0d 0a

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-09 21:38   ` bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-10 14:17     ` Eli Zaretskii
       [not found]       ` <e250f934-6f7b-bec3-9df4-d2b242599a45@yahoo.de>
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-10 14:17 UTC (permalink / raw)
  To: R. Diez; +Cc: 48324

> Date: Sun, 9 May 2021 23:38:18 +0200
> From:  "R. Diez" via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I think that hexl-mode has problems with the UTF-8 BOM byte sequence at the beginning of a text file. The steps to reproduce this issue are:
> 
> Create a text file with a single line with 3 characters: 123
> 
> Do a (set-buffer-file-coding-system 'utf-8-with-signature-dos) and save the file.
> 
> The file should now have the following contents (8 bytes):
> 
> ef bb bf 31 32 33 0d 0a
> 
> That is the UTF-8 BOM (ef bb bf), the ASCII digits 1, 2 and 3, and end-of-line sequence (CR LF).
> 
> Now change to hexl-mode, place the cursor at the '1' character (31 in hex), call hexl-insert-hex-char, and enter 00 in order to replace the '1' with a 
> binary zero (NUL character).
> 
> The result is puzzling. Instead of replacing the '1' (31) with NUL (00), the UTF-8 BOM is duplicated, the characters '1' and '2' and '3' have been 
> overwritten with the new copy of BOM, character CR has been replaced with NUL, and character LF is intact:
> 
> ef bb bf ef bb bf 00 0a
> 
> If you save, close and reload the file, it gains one byte, but that is probably not important, just a consequence of having lost the CR character:
> 
> ef bb bf ef bb bf 00 0d 0a

I cannot reproduce this.  Are you sure you are using the hexl executable
that came with Emacs 27.2 and not some older/incompatible version?
Are you sure your hexl.el is the one which came with Emacs 27.2?

And on what OS is this (you have omitted all the information collected
by report-emacs-bug, so I cannot know that)?

Thanks.

[parent not found: <e250f934-6f7b-bec3-9df4-d2b242599a45@yahoo.de>]

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
       [not found]       ` <e250f934-6f7b-bec3-9df4-d2b242599a45@yahoo.de>
@ 2021-05-10 16:13         ` Eli Zaretskii
  2021-05-10 16:28           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-10 16:13 UTC (permalink / raw)
  To: R. Diez; +Cc: 48324

[Please use Reply All to keep the bug address on the CC list.]

> From: "R. Diez" <rdiezmail-emacs@yahoo.de>
> Date: Mon, 10 May 2021 16:36:45 +0200
> 
> > I cannot reproduce this.  Are you sure you are using hexl executable
> > which came with Emacs 27.2 and not some older/incompatible version?
> > Are you sure your hexl.el is the one which came with Emacs 27.2?
> 
> I am running Ubuntu MATE 20.04.2, but I built Emacs myself.
> 
> When I ask for help on hexl-mode and follow the link, I end up in this file:
> 
> ~/rdiez/LocalSoftware/Emacs/emacs-27.2-bin/share/emacs/27.2/lisp/hexl.el.gz
> 
> There is no hexl executable on the PATH as far as I can tell, but there is one here:
> 
> /home/rdiez/rdiez/LocalSoftware/Emacs/emacs-27.2-bin/libexec/emacs/27.2/x86_64-pc-linux-gnu/hexl

Strange.  All of the above sounds fine, and yet I cannot reproduce the
problem here.

Is anyone else able to reproduce it?

> The full system information is:
> 
> In GNU Emacs 27.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0)
>   of 2021-05-08 built on rdiez4
> Windowing system distributor 'The X.Org Foundation', version 11.0.12009000
> System Description: Ubuntu 20.04.2 LTS
> 
> Recent messages:
> Mark set
> Mark saved where search started
> Mark set
> Making completion list... [2 times]
> Quit [2 times]
> user-error: No window up from selected window
> user-error: You didn’t specify a function symbol
> Type C-x 1 to delete the help window, C-M-v to scroll help.
> Mark set [3 times]
> Making completion list...
> 
> Configured using:
>   'configure 'CFLAGS=-g3 -O3 -march=native -flto' --with-x-toolkit=gtk3
>   --with-cairo --with-xwidgets
>   --prefix=/home/rdiez/rdiez/LocalSoftware/Emacs/emacs-27.2-bin'
> 
> Configured features:
> XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM DBUS GSETTINGS GLIB NOTIFY
> INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF
> ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS XWIDGETS
> LIBSYSTEMD JSON PDUMPER LCMS2 GMP
> 
> Important settings:
>    value of $LC_MONETARY: de_DE.UTF-8
>    value of $LC_NUMERIC: de_DE.UTF-8
>    value of $LC_TIME: de_DE.UTF-8
>    value of $LANG: en_US.UTF-8
>    locale-coding-system: utf-8-unix
> 
> Major mode: Term
> 
> Minor modes in effect:
>    hexl-follow-ascii: t
>    global-undo-tree-mode: t
>    save-place-mode: t
>    which-key-mode: t
>    hes-mode: t
>    tabbar-mwheel-mode: t
>    tabbar-mode: t
>    shell-dirtrack-mode: t
>    recentf-mode: t
>    xterm-mouse-mode: t
>    savehist-mode: t
>    dtrt-indent-global-mode: t
>    override-global-mode: t
>    delete-selection-mode: t
>    show-paren-mode: t
>    tooltip-mode: t
>    global-eldoc-mode: t
>    electric-indent-mode: t
>    mouse-wheel-mode: t
>    menu-bar-mode: t
>    file-name-shadow-mode: t
>    global-font-lock-mode: t
>    font-lock-mode: t
>    blink-cursor-mode: t
>    auto-composition-mode: t
>    auto-encryption-mode: t
>    auto-compression-mode: t
>    buffer-read-only: t
>    column-number-mode: t
>    line-number-mode: t
> 
> Load-path shadows:
> None found.
> 
> Features:
> (shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
> derived epg epg-config gnus-util rmail rmail-loaddefs
> text-property-search mm-decode mm-bodies mm-encode mail-parse rfc2231
> mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
> mm-util mail-prsvr mail-utils jka-compr eieio-opt speedbar sb-image
> ezimage dframe find-func help-fns radix-tree hexl texinfo dired-aux
> find-dired ffap thingatpt grep misearch multi-isearch pp vc-git
> diff-mode perl-mode bm cc-mode cc-fonts cc-guess cc-menus cc-cmds etags
> fileloop generator xref project dired-single undo-tree diff iso-transl
> multi-term saveplace which-key highlight-escape-sequences cc-styles
> cc-align cc-engine cc-vars cc-defs dired dired-loaddefs compile term
> disp-table ehelp tabbar tab-line tramp-cache tramp-sh tramp
> tramp-loaddefs trampver tramp-integration files-x tramp-compat shell
> pcomplete comint ansi-color parse-time iso8601 time-date ls-lisp
> format-spec recentf tree-widget xt-mouse savehist auto-package-update
> dash paradox paradox-menu paradox-commit-list hydra ring lv cus-edit
> wid-edit paradox-execute paradox-github paradox-core spinner pod-mode
> edmacro kmacro cl dtrt-indent advice cl-extra help-mode ascii server
> windmove diminish use-package use-package-ensure use-package-delight
> use-package-diminish use-package-bind-key bind-key easy-mmode
> use-package-core finder-inf delsel paren display-fill-column-indicator
> cua-base cus-start cus-load info package easymenu browse-url
> url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs
> eieio-loaddefs password-cache json subr-x map url-vars seq byte-opt gv
> bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric
> uniquify ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
> term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
> tabulated-list replace newcomment text-mode elisp-mode lisp-mode
> prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
> select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
> term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
> misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
> cp51932 hebrew greek romanian slovak czech european ethiopic indian
> cyrillic chinese composite charscript charprop case-table epa-hook
> jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
> button faces cus-face macroexp files text-properties overlay sha1 md5
> base64 format env code-pages mule custom widget hashtable-print-readable
> backquote threads dbusbind inotify lcms2 dynamic-setting
> system-font-setting font-render-setting xwidget-internal cairo
> move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)
> 
> Memory information:
> ((conses 16 455117 44468)
>   (symbols 48 25447 5)
>   (strings 32 111754 4403)
>   (string-bytes 1 3303255)
>   (vectors 16 43189)
>   (vector-slots 8 1337820 193116)
>   (floats 8 264 219)
>   (intervals 56 17564 0)
>   (buffers 1000 30))
> 
> Regards,
>    rdiez

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 16:13         ` Eli Zaretskii
@ 2021-05-10 16:28           ` Lars Ingebrigtsen
  2021-05-10 16:50             ` Andreas Schwab
  2021-05-10 17:06             ` Eli Zaretskii
  0 siblings, 2 replies; 28+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-10 16:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: R. Diez, 48324

Eli Zaretskii <eliz@gnu.org> writes:

> Is anyone else able to reproduce it?

Yes, the recipe reproduces fine here (Debian/bullseye on the trunk).
Before:


[-- Attachment #2: Type: image/png, Size: 22648 bytes --]

Then inserting 00 on the 31:


[-- Attachment #4: Type: image/png, Size: 19253 bytes --]

Doubled UTF-8 BOM, and then 00 over the 0d instead of the 31.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 16:28           ` Lars Ingebrigtsen
@ 2021-05-10 16:50             ` Andreas Schwab
  2021-05-10 17:16               ` Eli Zaretskii
  2021-05-10 17:06             ` Eli Zaretskii
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Schwab @ 2021-05-10 16:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: R. Diez, 48324

On May 10 2021, Lars Ingebrigtsen wrote:

> Doubled UTF-8 BOM, and then 00 over the 0d instead of the 31.

That only happens when you call hexl-mode with the decoded file
contents.  With hexl-find-file it doesn't happen, presumably because it
doesn't decode the file contents.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 16:50             ` Andreas Schwab
@ 2021-05-10 17:16               ` Eli Zaretskii
  2021-05-10 17:43                 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-10 17:16 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: rdiezmail-emacs, larsi, 48324

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  "R. Diez" <rdiezmail-emacs@yahoo.de>,
>   48324@debbugs.gnu.org
> Date: Mon, 10 May 2021 18:50:40 +0200
> 
> On Mai 10 2021, Lars Ingebrigtsen wrote:
> 
> > Doubled UTF-8 BOM, and then 00 over the 0d instead of the 31.
> 
> That only happens when you call hexl-mode with the decoded file
> contents.  With hexl-find-file it doesn't happen, presumably because it
> doesn't decode the file contents.

Ah, so maybe I didn't use the exact recipe.  I did try hexl-mode as
well as hexl-find-file, but maybe I missed something.  Could someone
please post an exact recipe, step by step?

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 17:16               ` Eli Zaretskii
@ 2021-05-10 17:43                 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-10 17:51                   ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-10 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, Andreas Schwab, 48324

> Ah, so maybe I didn't use the exact recipe.  I did try hexl-mode as
> well as hexl-find-file, but maybe I missed something.  Could someone
> please post an exact recipe, step by step?

I'll try again:

- I created an empty file named Test7.txt with Caja (the MATE Desktop file manager). That empty file is 0 bytes long.
- I then dragged the file to Emacs in order to open it.
The default encoding is utf-8-unix (visible on Emacs' status line).
- I pressed my keyboard shortcut for (eval-expression).
- I changed the encoding by manually evaluating this expression:
(set-buffer-file-coding-system 'utf-8-with-signature-dos)
- I then typed the characters "123" in the buffer for Test7.txt.
- I saved the buffer with menu "File", option "Save".
- I ran in the minibuffer command hexl-mode, which gives me the hex view for that file:
ef bb bf 31 32 33 0d 0a
- I moved the cursor with the arrow keys to the byte with value "31".
- I ran in the minibuffer command hexl-insert-hex-char, in order to overwrite the 31 with a new value.
- I typed in the minibuffer the hex value "00" (a binary null) and pressed enter.
- In the hex view, the BOM is now duplicated.

Best regards,
   rdiez

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 17:43                 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-10 17:51                   ` Eli Zaretskii
  2021-05-10 18:05                     ` Andreas Schwab
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-10 17:51 UTC (permalink / raw)
  To: R. Diez; +Cc: larsi, schwab, 48324

> Cc: larsi@gnus.org, 48324@debbugs.gnu.org,
>  Andreas Schwab <schwab@linux-m68k.org>
> From: "R. Diez" <rdiezmail-emacs@yahoo.de>
> Date: Mon, 10 May 2021 19:43:20 +0200
> 
> - I created an empty file with Caja (the MATE Desktop file manager) named Test7.txt . That empty file is 0 bytes long.
> - I then dragged the file to Emacs in order to open it.
> The default encoding is utf-8-unix (visible on Emacs' status line).
> - I pressed my keyboard shortcut for (eval-expression).
> - I changed the encoding by manually evaluating this expression:
> (set-buffer-file-coding-system 'utf-8-with-signature-dos)
> - I then typed in the buffer for Test7.txt the characters "123".
> - I saved the buffer with menu "File", option "Save".
> - I ran in the minibuffer command hexl-mode, which gives me the hex view for that file:
> ef bb bf 31 32 33 0d 0a
> - I moved the cursor with the arrow keys to the byte with value "31".
> - I ran in the minibuffer command hexl-insert-hex-char, in order to overwrite the 31 with a new value.
> - I typed in the minibuffer the hex value "00" (a binary null) and pressed enter.
> - In the hex view, the BOM is now duplicated.

Thanks, I see it now.

FTR, here's a shorter and easier recipe:

  emacs -Q
  C-x C-f foo.txt RET
  C-x RET f utf-8-with-signature-dos RET
  1 2 3
  C-x C-s
  M-x hexl-mode RET
  M-x hexl-insert-hex-char RET 00 RET

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 17:51                   ` Eli Zaretskii
@ 2021-05-10 18:05                     ` Andreas Schwab
  2021-05-11 12:04                       ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Schwab @ 2021-05-10 18:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: R. Diez, larsi, 48324

On May 10 2021, Eli Zaretskii wrote:

> FTR, here's a shorter and easier recipe:
>
>   emacs -Q
>   C-x C-f foo.txt RET
>   C-x RET f utf-8-with-signature-dos RET
>   1 2 3
>   C-x C-s
>   M-x hexl-mode RET
>   M-x hexl-insert-hex-char RET 00 RET

I guess the gist is that hexl-mode not only needs to account for the EOL
type, but also for the signature when computing original-point.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 18:05                     ` Andreas Schwab
@ 2021-05-11 12:04                       ` Eli Zaretskii
  2021-05-11 20:37                         ` Glenn Morris
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-11 12:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: rdiezmail-emacs, larsi, 48324

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: "R. Diez" <rdiezmail-emacs@yahoo.de>,  larsi@gnus.org,
>   48324@debbugs.gnu.org
> Date: Mon, 10 May 2021 20:05:33 +0200
> 
> On Mai 10 2021, Eli Zaretskii wrote:
> 
> > FTR, here's a shorter and easier recipe:
> >
> >   emacs -Q
> >   C-x C-f foo.txt RET
> >   C-x RET f utf-8-with-signature-dos RET
> >   1 2 3
> >   C-x C-s
> >   M-x hexl-mode RET
> >   M-x hexl-insert-hex-char RET 00 RET
> 
> I guess the gist is that hexl-mode not only needs to account for the EOL
> type, but also for the signature when computing original-point.

Actually, it turned out that wasn't the main problem.  (It was still a
problem, but the same problem happened in a buffer produced by
hexl-find-file.)  The main problems were that (a) hexl.el handled null
bytes as characters that need to be encoded before inserting them (as
if they were non-ASCII characters), and (b) its handling of non-ASCII
characters when the encoding of the original file used a BOM was
incorrect (because encode-coding-char didn't remove the BOM from the
encoded byte sequence).  By contrast, hexl-find-file visits the file
literally, so its encoding of a null byte was trivially correct.
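
One way to read this explanation against the bytes in the original report is the following Python sketch. It is a model under stated assumptions, not the hexl.el code: Python's "utf-8-sig" codec stands in for a coding system with a signature, and the insertion is modelled as overwriting bytes starting at the offset of '1'.

```python
# File contents from the report: BOM, "123", CR LF.
original = bytearray("123\r\n".encode("utf-8-sig"))

# Encoding the single character NUL through a BOM-producing codec yields
# the BOM *plus* the byte, not just the byte.
encoded_nul = "\x00".encode("utf-8-sig")
print(encoded_nul.hex(" "))    # ef bb bf 00

# Modelled buggy behaviour: all four encoded bytes overwrite the data at '1'.
buggy = bytearray(original)
buggy[3:3 + len(encoded_nul)] = encoded_nul
print(buggy.hex(" "))          # ef bb bf ef bb bf 00 0a

# Expected behaviour: a NUL is a single raw byte and needs no encoding.
fixed = bytearray(original)
fixed[3:4] = b"\x00"
print(fixed.hex(" "))          # ef bb bf 00 32 33 0d 0a
```

The modelled buggy result matches the byte sequence given in the report, which is consistent with problems (a) and (b) acting together.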

This should now be fixed on the master branch.

The capability of inserting multibyte characters via Hexl is somewhat
problematic, so I made a point of describing the issues in the
relevant doc strings (because the problems are intrinsic and IMO hard
or impossible to solve in general).

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-11 12:04                       ` Eli Zaretskii
@ 2021-05-11 20:37                         ` Glenn Morris
  2021-05-12 13:50                           ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Morris @ 2021-05-11 20:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rdiezmail-emacs, larsi, Andreas Schwab, 48324

Eli Zaretskii wrote:

> This should be now fixed on the master branch.

The change to encode-coding-char in f3f1947e5b5b causes
test subr-string-limit-coding to fail. Ref eg
https://hydra.nixos.org/build/142879118

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-11 20:37                         ` Glenn Morris
@ 2021-05-12 13:50                           ` Eli Zaretskii
  2022-07-02 16:14                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-12 13:50 UTC (permalink / raw)
  To: Glenn Morris, Lars Ingebrigtsen; +Cc: schwab, 48324

> From: Glenn Morris <rgm@gnu.org>
> Cc: Andreas Schwab <schwab@linux-m68k.org>,  48324@debbugs.gnu.org,  rdiezmail-emacs@yahoo.de,  larsi@gnus.org
> Date: Tue, 11 May 2021 16:37:51 -0400
> 
> Eli Zaretskii wrote:
> 
> > This should be now fixed on the master branch.
> 
> The change to encode-coding-char in f3f1947e5b5b causes
> test subr-string-limit-coding to fail. Ref eg
> https://hydra.nixos.org/build/142879118

Thanks, I fixed that.

The original test results seemed strange, to say the least: it's as if
we shoot first and draw the target later so that it fits.  E.g., how
can the last 4 bytes of encoding "foóá" with UTF-16 be
"\376\377\000\341", with the 2 first bytes coming from the BOM?

This actually reveals a design flaw in string-limit: we cannot simply
use encode-coding-char to encode the characters one by one.  I added a
FIXME comment to explain why, as I don't currently have any clever
ideas for how to implement it more correctly, except by iterations,
which is inelegant.  Ideas welcome.
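
The effect described above can be reproduced outside Emacs with a small Python sketch. The helper encode_utf16_bom is hypothetical and only approximates Emacs's utf-16 coding system as big-endian UTF-16 with a leading BOM, which is what the octal bytes quoted above suggest.

```python
import codecs


def encode_utf16_bom(text: str) -> bytes:
    """Big-endian UTF-16 with a leading BOM (stand-in for Emacs's utf-16)."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


s = "fo\u00f3\u00e1"   # "foóá"

whole = encode_utf16_bom(s)                              # one BOM for the string
per_char = b"".join(encode_utf16_bom(ch) for ch in s)    # one BOM per character

print(len(whole))       # 10
print(len(per_char))    # 16
print(per_char[-4:])    # b'\xfe\xff\x00\xe1'
```

The last four bytes of the per-character result are the BOM followed by the encoding of "á", that is "\376\377\000\341" in octal notation. Summing per-character lengths therefore overcounts by one BOM for every character after the first.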

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-12 13:50                           ` Eli Zaretskii
@ 2022-07-02 16:14                             ` Lars Ingebrigtsen
  2022-07-02 16:37                               ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-02 16:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Glenn Morris, schwab, 48324

Eli Zaretskii <eliz@gnu.org> writes:

> This actually reveals a design flaw in string-limit: we cannot simply
> use encode-coding-char to encode the characters one by one.  I added a
> FIXME comment to explain why, as I don't currently have any clever
> ideas for how to implement it more correctly, except by iterations,
> which is inelegant.  Ideas welcome.

Hm...  do we have some way of knowing that the coding system we're using
is one that should have a BOM?  And a function to remove the BOM?

If we had both, then we could strip the BOM from the individual chars,
and add one to the front.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-02 16:14                             ` Lars Ingebrigtsen
@ 2022-07-02 16:37                               ` Eli Zaretskii
  2022-07-03 11:08                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2022-07-02 16:37 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: rgm, schwab, 48324

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Glenn Morris <rgm@gnu.org>,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 18:14:39 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This actually reveals a design flaw in string-limit: we cannot simply
> > use encode-coding-char to encode the characters one by one.  I added a
> > FIXME comment to explain why, as I don't currently have any clever
> > ideas for how to implement it more correctly, except by iterations,
> > which is inelegant.  Ideas welcome.
> 
> Hm...  do we have some way of knowing that the coding system we're using
> is one that should have a BOM?  And a function to remove the BOM?

The problem is not just with BOM.  The problem will happen with any
coding-system that produces prefix and/or suffix bytes when it encodes
strings.  The FIXME I added mentions ISO-2022 7-bit encodings as
another example.

And then there are coding-systems with pre-write-conversion, and
those can produce any additions they like.
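
As an illustration of prefix and suffix bytes in an ISO-2022 7-bit encoding, here is a short Python sketch. It uses Python's "iso2022_jp" codec, which is assumed here to behave like the corresponding Emacs coding system in this respect.

```python
# U+3042 is HIRAGANA LETTER A.
one = "\u3042".encode("iso2022_jp")
two = "\u3042\u3042".encode("iso2022_jp")

print(one)        # escape sequence, two payload bytes, escape sequence
print(len(one))   # 8
print(len(two))   # 10
```

Each encoded string starts with the escape sequence that switches to JIS X 0208 and ends with the one that switches back to ASCII, so one character costs 8 bytes but two characters cost only 10. Adding up per-character lengths would give 16.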

> If we had both, then we could strip the BOM from the individual chars,
> and add one to the front.

AFAIR, what we have now already handles BOM in coding-systems that
are known to produce a BOM.  See encode-coding-char.

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-02 16:37                               ` Eli Zaretskii
@ 2022-07-03 11:08                                 ` Lars Ingebrigtsen
  2022-07-03 12:07                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-03 11:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, schwab, 48324

Eli Zaretskii <eliz@gnu.org> writes:

> The problem is not just with BOM.  The problem will happen with any
> coding-system that produces prefix and/or suffix bytes when it encodes
> strings.  The FIXME I added mentions ISO-2022 7-bit encodings as
> another example.
>
> And then there are coding-system's with pre-write-conversion, and
> those can produce any additions they like.
>
>> If we had both, then we could strip the BOM from the individual chars,
>> and add one to the front.
>
> AFAIR, what we have now already handles BOM in coding-system's that
> are known to produce a BOM.  See encode-coding-char.

Ah, OK, it uses (coding-system-get coding-system :bom) and then
special-cases utf-8 and -16 to remove the BOM.

Hm...  I guess the only reliable solution across all coding systems is
(like your comment in the code says) to drop the encode-every-char and
try encoding strings, and then see whether the result is short enough.
That could be done somewhat efficiently using a binary search.  I'll
have a go at it...
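
A rough Python sketch of that idea follows. It is not the Emacs implementation: the names string_limit and encode_utf16_bom are used only for illustration, the sketch handles just the prefix case, it works on code points rather than glyphs, and it assumes the encoded length never shrinks as the prefix grows.

```python
import codecs


def encode_utf16_bom(text: str) -> bytes:
    """Big-endian UTF-16 with a leading BOM (stand-in for Emacs's utf-16)."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def string_limit(text: str, max_bytes: int, encode) -> str:
    """Longest prefix of `text` whose encoded form fits in `max_bytes`."""
    lo, hi = 0, len(text)            # the answer's length lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2     # round up so the loop always progresses
        if len(encode(text[:mid])) <= max_bytes:
            lo = mid                 # prefix fits; try a longer one
        else:
            hi = mid - 1             # too long; try a shorter one
    return text[:lo]


s = "fo\u00f3\u00e1"   # "foóá"
print(string_limit(s, 6, encode_utf16_bom))               # fo
print(string_limit(s, 4, lambda t: t.encode("utf-8")))    # foó
```

Because whole prefixes are encoded, any BOM, escape sequence or other addition is counted exactly once per candidate, at the cost of about log2(n) encodings of the string.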

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 11:08                                 ` Lars Ingebrigtsen
@ 2022-07-03 12:07                                   ` Lars Ingebrigtsen
  2022-07-03 13:00                                     ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-03 12:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, schwab, 48324

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Hm...  I guess the only reliable solution across all coding systems is
> (like your comment in the code says) to drop the encode-every-char and
> try encoding strings, and then see whether the result is short enough.
> That could be done somewhat efficiently using a binary search.  I'll
> have a go at it...

And while I was at it, I changed it to return complete glyphs, not just
complete code points.

There's a behavioural change, though.  This: 

(string-limit "foóá" 6 t 'utf-16)

Now returns a string with a BOM, whereas previously it didn't.  I think
that's what callers would want, though (the use case here is really
IRC -- you have to limit the max encoded length, but I think if you're
talking utf-16, you want the BOM).

But it's debatable.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 12:07                                   ` Lars Ingebrigtsen
@ 2022-07-03 13:00                                     ` Eli Zaretskii
  2022-07-03 13:26                                       ` Eli Zaretskii
  2022-07-03 13:28                                       ` Lars Ingebrigtsen
  0 siblings, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-07-03 13:00 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: rgm, schwab, 48324

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: rgm@gnu.org,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> Date: Sun, 03 Jul 2022 14:07:43 +0200
> 
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> 
> > Hm...  I guess the only reliable solution across all coding systems is
> > (like your comment in the code says) to drop the encode-every-char and
> > try encoding strings, and then see whether the result is short enough.
> > That could be done somewhat efficiently using a binary search.  I'll
> > have a go at it...
> 
> And while I was at it, I changed it to return complete glyphs, not just
> complete code points.
> 
> There's a behavioural change, though.  This: 
> 
> (string-limit "foóá" 6 t 'utf-16)
> 
> Now returns a string with a BOM, whereas previously it didn't.

So you get 6 characters + the BOM?






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 13:00                                     ` Eli Zaretskii
@ 2022-07-03 13:26                                       ` Eli Zaretskii
  2022-07-03 13:48                                         ` Andreas Schwab
  2022-07-04 10:34                                         ` Lars Ingebrigtsen
  2022-07-03 13:28                                       ` Lars Ingebrigtsen
  1 sibling, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-07-03 13:26 UTC (permalink / raw)
  To: larsi; +Cc: rgm, schwab, 48324

> Cc: rgm@gnu.org, schwab@linux-m68k.org, 48324@debbugs.gnu.org
> Date: Sun, 03 Jul 2022 16:00:47 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Cc: rgm@gnu.org,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> > Date: Sun, 03 Jul 2022 14:07:43 +0200
> > 
> > Lars Ingebrigtsen <larsi@gnus.org> writes:
> > 
> > > Hm...  I guess the only reliable solution across all coding systems is
> > > (like your comment in the code says) to drop the encode-every-char and
> > > try encoding strings, and then see whether the result is short enough.
> > > That could be done somewhat efficiently using a binary search.  I'll
> > > have a go at it...
> > 
> > And while I was at it, I changed it to return complete glyphs, not just
> > complete code points.
> > 
> > There's a behavioural change, though.  This: 
> > 
> > (string-limit "foóá" 6 t 'utf-16)
> > 
> > Now returns a string with a BOM, whereas previously it didn't.
> 
> So you get 6 characters + the BOM?

I see that it's actually 6 bytes _including_ the BOM.  So I think this
is confusing: if we are going to return a string with the BOM, we
should not count the BOM as part of the LENGTH bytes.  Because if I
requested to get characters which fit into N bytes, I should get those
N bytes of payload.  Or maybe we should have an optional argument to
control whether LENGTH includes or excludes the BOM.

In any case, we should mention this aspect in the doc string, I think.






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 13:26                                       ` Eli Zaretskii
@ 2022-07-03 13:48                                         ` Andreas Schwab
  2022-07-03 13:51                                           ` Eli Zaretskii
  2022-07-04 10:34                                         ` Lars Ingebrigtsen
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Schwab @ 2022-07-03 13:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, larsi, 48324

On Jul 03 2022, Eli Zaretskii wrote:

> Or maybe we should have an optional argument to control whether LENGTH
> includes or excludes the BOM.

utf-8-with-signature?

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 13:48                                         ` Andreas Schwab
@ 2022-07-03 13:51                                           ` Eli Zaretskii
  0 siblings, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-07-03 13:51 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: rgm, larsi, 48324

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: larsi@gnus.org,  rgm@gnu.org,  48324@debbugs.gnu.org
> Date: Sun, 03 Jul 2022 15:48:54 +0200
> 
> On Jul 03 2022, Eli Zaretskii wrote:
> 
> > Or maybe we should have an optional argument to control whether LENGTH
> > includes or excludes the BOM.
> 
> utf-8-with-signature?

No, I mean when the CODING-SYSTEM argument requires a BOM (or
shift-in and shift-out sequences).






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 13:26                                       ` Eli Zaretskii
  2022-07-03 13:48                                         ` Andreas Schwab
@ 2022-07-04 10:34                                         ` Lars Ingebrigtsen
  2022-07-04 11:31                                           ` Eli Zaretskii
  1 sibling, 1 reply; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-04 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, schwab, 48324

Eli Zaretskii <eliz@gnu.org> writes:

> I see that it's actually 6 bytes _including_ the BOM.  So I think this
> is confusing: if we are going to return a string with the BOM, we
> should not count the BOM as part of the LENGTH bytes.  Because if I
> requested to get characters which fit into N bytes, I should get those
> N bytes of payload.  Or maybe we should have an optional argument to
> control whether LENGTH includes or excludes the BOM.

If the caller has asked for a max number of bytes in a coding system
that includes a BOM, then the BOM has to be counted -- otherwise the
bytes won't fit into whatever field the protocol they're using limits
the string to.

However, utf-16 is in a slightly special situation here, since the byte
order is often implied, and people use utf-16 instead of
utf-16be-with-signature (or something), and utf-16 (in Emacs) is defined
to have a BOM.  (And we don't have a -without-signature variant, do we?)

> In any case, we should mention this aspect in the doc string, I think.

Yes.  But should we have -without-signature variants for utf-16?  Then
the doc string could recommend using that if the caller wants BOM-less
bytes.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-04 10:34                                         ` Lars Ingebrigtsen
@ 2022-07-04 11:31                                           ` Eli Zaretskii
  2022-07-05 11:08                                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2022-07-04 11:31 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: rgm, schwab, 48324

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: rgm@gnu.org,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> Date: Mon, 04 Jul 2022 12:34:29 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I see that it's actually 6 bytes _including_ the BOM.  So I think this
> > is confusing: if we are going to return a string with the BOM, we
> > should not count the BOM as part of the LENGTH bytes.  Because if I
> > requested to get characters which fit into N bytes, I should get those
> > N bytes of payload.  Or maybe we should have an optional argument to
> > control whether LENGTH includes or excludes the BOM.
> 
> If the caller has asked for a max number of bytes in a coding system
> that includes a BOM, then the BOM has to be counted -- otherwise the
> bytes won't fit into whatever field the protocol they're using limits
> the string to.

You obviously have a very specific use case in mind.  But there are
others.  Moreover, UTF and BOM is a special case, where the prefix is
known in advance.  Other encodings, notably from the ISO-2022 family,
are harder because the exact shift-in sequence is not always easy to
guess.

Which is why I thought a way to control this aspect could be needed.
But we could just document the subtlety and wait for someone to come
up with a practical scenario where it would be needed.

> (And we don't have a -without-signature variant, do we?)

We do: utf-16le and utf-16be.

> > In any case, we should mention this aspect in the doc string, I think.
> 
> Yes.  But should we have -without-signature variants for utf-16?  Then
> the doc string could recommend using that if the caller wants BOM-less
> bytes.

See above.






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-04 11:31                                           ` Eli Zaretskii
@ 2022-07-05 11:08                                             ` Lars Ingebrigtsen
  0 siblings, 0 replies; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-05 11:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, schwab, 48324

Eli Zaretskii <eliz@gnu.org> writes:

> You obviously have a very specific use case in mind.  But there are
> others.

I don't see any other use cases for requesting a specific number of
bytes than having some restrictions for the usage of that selection of
bytes. 

>> (And we don't have a -without-signature variant, do we?)
>
> We do: utf-16le and utf-16be.

I've now mentioned this in the doc string.
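
As a small illustration of the difference (a sketch, with
`my-encoded-length' being a hypothetical helper and not part of Emacs),
the byte counts for the same two characters can be compared like this:

```elisp
(defun my-encoded-length (string coding-system)
  "Return the number of bytes STRING occupies when encoded in CODING-SYSTEM."
  (length (encode-coding-string string coding-system)))

;; `utf-16' emits a BOM before the two 2-byte code units.
(my-encoded-length "fo" 'utf-16)

;; `utf-16be' and `utf-16le' emit only the two 2-byte code units.
(my-encoded-length "fo" 'utf-16be)
(my-encoded-length "fo" 'utf-16le)
```

So a caller with a hard byte limit who does not want to spend two bytes
on the BOM can pass utf-16le or utf-16be instead of utf-16.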

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2022-07-03 13:00                                     ` Eli Zaretskii
  2022-07-03 13:26                                       ` Eli Zaretskii
@ 2022-07-03 13:28                                       ` Lars Ingebrigtsen
  1 sibling, 0 replies; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-03 13:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rgm, schwab, 48324

Eli Zaretskii <eliz@gnu.org> writes:

>> There's a behavioural change, though.  This: 
>> 
>> (string-limit "foóá" 6 t 'utf-16)
>> 
>> Now returns a string with a BOM, whereas previously it didn't.
>
> So you get 6 characters + the BOM?

Two characters and the BOM (i.e., six bytes).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
  2021-05-10 16:28           ` Lars Ingebrigtsen
  2021-05-10 16:50             ` Andreas Schwab
@ 2021-05-10 17:06             ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2021-05-10 17:06 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: rdiezmail-emacs, 48324

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: "R. Diez" <rdiezmail-emacs@yahoo.de>,  48324@debbugs.gnu.org
> Date: Mon, 10 May 2021 18:28:50 +0200
> 
> > Is anyone else able to reproduce it?
> 
> Yes, the recipe reproduces fine here (Debian/bullseye on the trunk).

Then I guess you or someone else will have to debug that.  Since the
OS upgrade on fencepost, I cannot run Emacs there, and cannot build a
new one.  I have no idea when this will be fixed (sysadmin for now
thinks my request is not valid), but until then I'm limited to what I
see on Windows.






* bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters
  2021-05-09 19:14 ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-09 21:38   ` bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-11 12:53   ` Lars Ingebrigtsen
  2022-07-02 15:59     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 28+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-11 12:53 UTC (permalink / raw)
  To: R. Diez; +Cc: 48321

"R. Diez" <rdiezmail-emacs@yahoo.de> writes:

> In my opinion, copying text from a *grep* buffer that looks like ":"
> should not suddenly deliver a NUL character instead. That's just
> unexpected and prone to problems down the line.

Yup.  This is cleverly done by this bit in `grep-regexp-alist':

     nil nil
     (3 '(face nil display ":")))

That is -- the "highlight" we're applying is a `display' spec that says
that the separator should be displayed as ":".

Stefan's suggestion to transform the nul character in the filter in grep
wouldn't quite work, I think -- the filtering is done before the
matching.  So instead we need the thing that processes
`compilation-error-regexp-alist' to do the transformation after the
matches, I think?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters
  2021-05-11 12:53   ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters Lars Ingebrigtsen
@ 2022-07-02 15:59     ` Lars Ingebrigtsen
  0 siblings, 0 replies; 28+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-02 15:59 UTC (permalink / raw)
  To: R. Diez; +Cc: 48321

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan's suggestion to transform the nul character in the filter in grep
> wouldn't quite work, I think -- the filtering is done before the
> matching.  So instead we need the thing that processes
> `compilation-error-regexp-alist' to do the transformation after the
> matches, I think?

I've instead used kill-transform-function in Emacs 29 to translate the
nul chars to : chars.
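
For anyone on Emacs 28 who wants similar behaviour, the approach can be
sketched along these lines.  This is an assumption about how one might
do it in an init file, not the change that went into Emacs 29;
`my-grep-nul-to-colon' is a hypothetical name.

```elisp
(defun my-grep-nul-to-colon (string)
  "Return STRING with every NUL character replaced by a colon.
Intended as a value for `kill-transform-function' in *grep* buffers."
  (string-replace "\0" ":" string))

;; Install the transform buffer-locally whenever grep-mode starts, so
;; that only text killed or copied from *grep* buffers is affected.
(add-hook 'grep-mode-hook
          (lambda ()
            (setq-local kill-transform-function #'my-grep-nul-to-colon)))
```

Text copied from the *grep* buffer then lands on the kill ring with ":"
where the buffer actually holds the NUL separator.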

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






end of thread, other threads:[~2022-07-05 11:08 UTC | newest]

Thread overview: 28+ messages
     [not found] <0ed1c9c7-26c1-b801-1910-6d5bb50dec3d.ref@yahoo.de>
2021-05-09 19:14 ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-09 21:38   ` bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-10 14:17     ` Eli Zaretskii
     [not found]       ` <e250f934-6f7b-bec3-9df4-d2b242599a45@yahoo.de>
2021-05-10 16:13         ` Eli Zaretskii
2021-05-10 16:28           ` Lars Ingebrigtsen
2021-05-10 16:50             ` Andreas Schwab
2021-05-10 17:16               ` Eli Zaretskii
2021-05-10 17:43                 ` R. Diez via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-10 17:51                   ` Eli Zaretskii
2021-05-10 18:05                     ` Andreas Schwab
2021-05-11 12:04                       ` Eli Zaretskii
2021-05-11 20:37                         ` Glenn Morris
2021-05-12 13:50                           ` Eli Zaretskii
2022-07-02 16:14                             ` Lars Ingebrigtsen
2022-07-02 16:37                               ` Eli Zaretskii
2022-07-03 11:08                                 ` Lars Ingebrigtsen
2022-07-03 12:07                                   ` Lars Ingebrigtsen
2022-07-03 13:00                                     ` Eli Zaretskii
2022-07-03 13:26                                       ` Eli Zaretskii
2022-07-03 13:48                                         ` Andreas Schwab
2022-07-03 13:51                                           ` Eli Zaretskii
2022-07-04 10:34                                         ` Lars Ingebrigtsen
2022-07-04 11:31                                           ` Eli Zaretskii
2022-07-05 11:08                                             ` Lars Ingebrigtsen
2022-07-03 13:28                                       ` Lars Ingebrigtsen
2021-05-10 17:06             ` Eli Zaretskii
2021-05-11 12:53   ` bug#48321: 27.2; Text copied from *grep* buffer has NUL (0x00) characters Lars Ingebrigtsen
2022-07-02 15:59     ` Lars Ingebrigtsen
