bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text


* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Charles A. Roelli @ 2018-04-04 18:26 UTC
  To: 31062

(This test case assumes a locale-coding-system of utf-8-unix, and
LANG: en_GB.UTF-8 or anything similar.)

emacs -q
C-x b test RET
M-: (insert-byte 195 1) RET
M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
C-x C-s /tmp/foo RET		> the path is irrelevant

There's this warning:

These default coding systems were tried to encode text
in the buffer ‘test’:
  (utf-8-unix (1 . 4194243) (2 . 4194236))
However, each of them encountered characters it couldn’t encode:
  utf-8-unix cannot encode these: \303 \274

Is the text "(1 . 4194243) (2 . 4194236)" useful here?  It looks like
it's there by accident.  If it is helpful, could someone please
explain what it means?

* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Eli Zaretskii @ 2018-04-04 19:20 UTC
  To: Charles A. Roelli; +Cc: 31062

> Date: Wed, 04 Apr 2018 20:26:52 +0200
> From: charles@aurox.ch (Charles A. Roelli)
> 
> emacs -q
> C-x b test RET
> M-: (insert-byte 195 1) RET
> M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
> C-x C-s /tmp/foo RET		> the path is irrelevant
> 
> There's this warning:
> 
> These default coding systems were tried to encode text
> in the buffer ‘test’:
>   (utf-8-unix (1 . 4194243) (2 . 4194236))
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \303 \274
> 
> Is the text "(1 . 4194243) (2 . 4194236)" useful here?

It shows the positions and the codepoints of the offending characters,
and the coding-system that was tried.
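
A minimal sketch of how those codepoints relate to the inserted bytes, assuming the usual Emacs convention that a raw byte in the range #x80..#xFF is stored in a multibyte buffer as the character byte + #x3FFF00. The temporary buffer and the COUNT argument of 10 are illustrative choices, not taken from the report:

```elisp
;; Raw 8-bit bytes map to characters in the range #x3FFF80..#x3FFFFF.
(list (unibyte-char-to-multibyte 195)      ; byte \303 -> 4194243
      (unibyte-char-to-multibyte 188)      ; byte \274 -> 4194236
      (multibyte-char-to-unibyte 4194243)  ; -> 195
      (format "#x%X" 4194243))             ; -> "#x3FFFC3"

;; Collect (POSITION . CODEPOINT) pairs for the characters that
;; utf-8-unix reports as unencodable in a scratch copy of the test case.
(with-temp-buffer
  (insert-byte 195 1)
  (insert-byte 188 1)
  (mapcar (lambda (pos) (cons pos (char-after pos)))
          (unencodable-char-position (point-min) (point-max)
                                     'utf-8-unix 10)))
```

Read this way, each pair in the warning names a buffer position and the internal character code of the raw byte found there.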





* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Charles A. Roelli @ 2018-04-05 18:27 UTC
  To: Eli Zaretskii; +Cc: 31062

> Date: Wed, 04 Apr 2018 22:20:55 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > There's this warning:
> > 
> > These default coding systems were tried to encode text
> > in the buffer ‘test’:
> >   (utf-8-unix (1 . 4194243) (2 . 4194236))
> > However, each of them encountered characters it couldn’t encode:
> >   utf-8-unix cannot encode these: \303 \274
> > 
> > Is the text "(1 . 4194243) (2 . 4194236)" useful here?
> 
> It shows the positions and the codepoints of the offending characters,
> and the coding-system that was tried.

Thank you for clarifying.  Could we write something like,

> These default coding systems were tried to encode text in the buffer
> 'test', but failed for the listed (POSITION . CODEPOINT) elements:

to make that clear to the user?





* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Eli Zaretskii @ 2018-04-05 18:47 UTC
  To: Charles A. Roelli; +Cc: 31062

> Date: Thu, 05 Apr 2018 20:27:15 +0200
> From: charles@aurox.ch (Charles A. Roelli)
> CC: 31062@debbugs.gnu.org
> 
> > These default coding systems were tried to encode text in the buffer
> > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
> 
> to make that clear to the user?

Feel free to suggest a patch, but the list includes the coding-systems
tried, not just positions and codepoints.





* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Charles A. Roelli @ 2018-04-08  9:49 UTC
  To: Eli Zaretskii; +Cc: 31062

> Date: Thu, 05 Apr 2018 21:47:11 +0300
> From: Eli Zaretskii <eliz@gnu.org>
>
> > > These default coding systems were tried to encode text in the buffer
> > > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
> > 
> > to make that clear to the user?
> 
> Feel free to suggest a patch, but the list includes the coding-systems
> tried, not just positions and codepoints.

That's true, but after looking at the code of
select-safe-coding-system-interactively, it seems that the "rejected"
list is printed in the same loop as the "unsafe" list, and "rejected"
is indeed a plain list of coding systems.

	    (insert
	     "These default coding systems were tried to encode"
	     (if (stringp from)
		 (concat " \"" (if (> (length from) 10)
				   (concat (substring from 0 10) "...\"")
				 (concat from "\"")))
	       (format-message " text\nin the buffer `%s'" bufname))
	     ":\n")
	    (let ((pos (point))
		  (fill-prefix "  "))
	      (dolist (x (append rejected unsafe)) ; "rejected" printed here
		(princ "  ") (princ x))
	      (insert "\n")
	      (fill-region-as-paragraph pos (point)))

Strangely, the "rejected" list is then printed again, if it's non-nil:

	    (when rejected
	      (insert "These safely encode the text in the buffer,
but are not recommended for encoding text in this context,
e.g., for sending an email message.\n ")
	      (dolist (x rejected)
		(princ " ") (princ x))
	      (insert "\n"))

One solution might be to print the "rejected" list only in this second
form, and in the first form to explain more clearly what the elements
of the "unsafe" list mean.
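
As a rough illustration of that second point, the sketch below renders one element of the "unsafe" list with its fields labeled. The helper name my-describe-unsafe-element and the output wording are hypothetical, not part of Emacs. It only assumes that each element has the shape (CODING-SYSTEM (POSITION . CODEPOINT) ...), as in the warning quoted earlier in this thread:

```elisp
;; Hypothetical helper, not part of Emacs: describe one element of
;; the "unsafe" list, shaped like (CODING-SYSTEM (POS . CHAR) ...).
(defun my-describe-unsafe-element (elt)
  "Return a string describing ELT, a (CODING (POS . CHAR)...) list."
  (concat
   (format "%s cannot encode:\n" (car elt))
   (mapconcat (lambda (pc)
                (format "  position %d: codepoint #x%X"
                        (car pc) (cdr pc)))
              (cdr elt)
              "\n")))

;; Usage with the data from the warning:
(my-describe-unsafe-element
 '(utf-8-unix (1 . 4194243) (2 . 4194236)))
;; => "utf-8-unix cannot encode:
;;   position 1: codepoint #x3FFFC3
;;   position 2: codepoint #x3FFFBC"
```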





* bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
From: Lars Ingebrigtsen @ 2021-09-02  8:52 UTC
  To: Charles A. Roelli; +Cc: 31062

charles@aurox.ch (Charles A. Roelli) writes:

> (This test case assumes a locale-coding-system of utf-8-unix, and
> LANG: en_GB.UTF-8 or anything similar.)
>
> emacs -q
> C-x b test RET
> M-: (insert-byte 195 1) RET
> M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
> C-x C-s /tmp/foo RET		> the path is irrelevant
>
> There's this warning:
>
> These default coding systems were tried to encode text
> in the buffer ‘test’:
>   (utf-8-unix (1 . 4194243) (2 . 4194236))
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \303 \274
>
> Is the text "(1 . 4194243) (2 . 4194236)" useful here?  It looks like
> it's there by accident.  If it is helpful, could someone please
> explain what it means?

In Emacs 28, I've now made this warning more readable (and more
informative) by formatting it as a table and saying what the data means.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




