unofficial mirror of emacs-devel@gnu.org 
* Re: master 58a3c54: Avoid using string-make-unibyte in select.el
       [not found] ` <20190622083525.F1CA5209DE@vcs0.savannah.gnu.org>
@ 2019-06-22  8:58   ` Lars Ingebrigtsen
  2019-06-22  9:38     ` Eli Zaretskii
  2019-06-22 13:26   ` [Emacs-diffs] " Stefan Monnier
  1 sibling, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2019-06-22  8:58 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eli Zaretskii

eliz@gnu.org (Eli Zaretskii) writes:

> +            (or (null (multibyte-string-p str))
> +                (setq str (encode-coding-string 'raw-text-unix str))))

Shouldn't that be

(encode-coding-string str 'raw-text-unix)

?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-22  8:58   ` master 58a3c54: Avoid using string-make-unibyte in select.el Lars Ingebrigtsen
@ 2019-06-22  9:38     ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-22  9:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>
> Date: Sat, 22 Jun 2019 10:58:58 +0200
> 
> > +            (or (null (multibyte-string-p str))
> > +                (setq str (encode-coding-string 'raw-text-unix str))))
> 
> Shouldn't that be
> 
> (encode-coding-string str 'raw-text-unix)

Ouch!  Of course!  Thanks for catching this; fixed.
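
A minimal sketch of the corrected call, for illustration only.  The
helper name `my-selection-bytes` and the sample string are hypothetical
and not taken from select.el; the point is just that
encode-coding-string takes the string first and the coding system
second.

```elisp
(defun my-selection-bytes (str)
  "Return STR as a unibyte string, encoding only if STR is multibyte."
  (if (multibyte-string-p str)
      ;; STRING comes first, CODING-SYSTEM second.
      (encode-coding-string str 'raw-text-unix)
    str))

;; "abc\200" is unibyte; string-to-multibyte turns the raw byte into
;; an eight-bit character, which the encoding turns back into a byte.
(my-selection-bytes (string-to-multibyte "abc\200"))
```

With the arguments swapped, the symbol would be passed where a string is
expected, so the call should signal an error instead of encoding.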




* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
       [not found] ` <20190622083525.F1CA5209DE@vcs0.savannah.gnu.org>
  2019-06-22  8:58   ` master 58a3c54: Avoid using string-make-unibyte in select.el Lars Ingebrigtsen
@ 2019-06-22 13:26   ` Stefan Monnier
  2019-06-22 13:42     ` Eli Zaretskii
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-06-22 13:26 UTC (permalink / raw)
  To: emacs-devel; +Cc: Eli Zaretskii

> +            (or (null (multibyte-string-p str))
> +                (setq str (encode-coding-string 'raw-text-unix str))))

Isn't this the same as (setq str (string-to-unibyte str))?


        Stefan





* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-22 13:26   ` [Emacs-diffs] " Stefan Monnier
@ 2019-06-22 13:42     ` Eli Zaretskii
  2019-06-22 16:44       ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-22 13:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>
> Date: Sat, 22 Jun 2019 09:26:38 -0400
> 
> > +            (or (null (multibyte-string-p str))
> > +                (setq str (encode-coding-string 'raw-text-unix str))))
> 
> Isn't this the same as (setq str (string-to-unibyte str))?

No, because the former doesn't signal an error.  (And I didn't want to
use any of those string-to/as-uni/multibyte functions anyway.)

The only thing we are supposed to do in the multibyte case is to make
sure the raw bytes are converted to their single-byte representation,
which is exactly what raw-text-unix does.
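
The difference in error behavior described above can be sketched as
follows.  The helper `my-try-to-unibyte` is hypothetical, and "\u00e9"
is only a sample non-ASCII character; this is an illustration under
those assumptions, not the code in select.el.

```elisp
(defun my-try-to-unibyte (str)
  "Return (string-to-unibyte STR), or the symbol `failed' on error."
  (condition-case nil
      (string-to-unibyte str)
    (error 'failed)))

;; Only raw bytes: both approaches yield a unibyte string.
(my-try-to-unibyte (string-to-multibyte "\200"))
(encode-coding-string (string-to-multibyte "\200") 'raw-text-unix)

;; A real non-ASCII character: string-to-unibyte signals an error
;; (caught here), while encoding still returns a unibyte string.
(my-try-to-unibyte "\u00e9")
(multibyte-string-p (encode-coding-string "\u00e9" 'raw-text-unix))
```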




* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-22 13:42     ` Eli Zaretskii
@ 2019-06-22 16:44       ` Stefan Monnier
  2019-06-22 16:57         ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-06-22 16:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> > +            (or (null (multibyte-string-p str))
>> > +                (setq str (encode-coding-string 'raw-text-unix str))))
>> Isn't this the same as (setq str (string-to-unibyte str))?
> No, because the former doesn't signal an error.

Oh, right, that's yet another subtle distinction between all those alternatives.

BTW, do we actually need to convert to unibyte here?
(In most places where we expect a unibyte string, we silently convert
from multibyte when needed, in a way that's basically equivalent to the
above.)

> (And I didn't want to use any of those string-to/as-uni/multibyte
> functions anyway.)

I hated those functions and still do for the string-as and string-make
variety, but I'm beginning to like the string-to variety when we need to
convert the representation of a sequence of *bytes* without
encoding/decoding them as chars.

So maybe the present case argues for adding a `no-error` argument to
string-to-unibyte.  I say this because to me (encode-coding-string
str 'raw-text-unix) is an oxymoron since `raw-text-unix` is a synonym of
`binary` and `no-conversion`, which basically says "don't do any
encoding/decoding, instead preserve bytes as bytes".

IOW coding-systems like `raw-text` make sense in places like the
`coding:` tag or in buffer-file-coding-system, where we are forced to
put some kind of coding-system and where it is hence handy to be able to
use `raw-text-unix` to basically skip the en/decoding.
But I find them confusing when passed as a constant to
`en/decode-coding-string`.

> The only thing we are supposed to do in the multibyte case is to make
> sure the raw bytes are converted to their single-byte representation,
> which is exactly what raw-text-unix does.

Right (and indeed string-make-unibyte worked in practice for the same
reason that encoding with pretty much any coding-system preserves the
bytes as well, save for a few exceptions like utf-8-emacs).


        Stefan





* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-22 16:44       ` Stefan Monnier
@ 2019-06-22 16:57         ` Eli Zaretskii
  2019-06-23  2:45           ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-22 16:57 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Sat, 22 Jun 2019 12:44:05 -0400
> 
> So maybe the present case argues for adding a `no-error` argument to
> string-to-unibyte.

What is the use case for string-to-unibyte that cannot be satisfied by
encoding with raw-text/binary, if we also don't signal an error?

> I say this because to me (encode-coding-string str 'raw-text-unix)
> is an oxymoron since `raw-text-unix` is a synonym of `binary` and
> `no-conversion`, which basically says "don't do any encoding/decoding,
> instead preserve bytes as bytes".

For reasons of avoiding mental overload, I prefer not to use
no-conversion where in fact there is a conversion.  That's why I
didn't use 'binary' in this case.

> IOW coding-systems like `raw-text` make sense in places like the
> `coding:` tag or in buffer-file-coding-system, where we are forced to
> put some kind of coding-system and where it is hence handy to be able to
> use `raw-text-unix` to basically skip the en/decoding.
> But I find them confusing when passed as a constant to
> `en/decode-coding-string`.

It's the other way around here.




* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-22 16:57         ` Eli Zaretskii
@ 2019-06-23  2:45           ` Stefan Monnier
  2019-06-23 14:26             ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-06-23  2:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> So maybe the present case argues for adding a `no-error` argument to
>> string-to-unibyte.
> What is the use case for string-to-unibyte that cannot be satisfied by
> encoding with raw-text/binary, if we also don't signal an error?

The use case is clear code that says explicitly that this chunk of code
is not trying to convert between chars and bytes but only to convert
between two representations of a sequence of bytes.

It's also code that clearly does the reverse of string-to-multibyte
(whereas decode-coding-string doesn't do the reverse of
encode-coding-string when it comes to `raw-text`).

>> I say this because to me (encode-coding-string str 'raw-text-unix)
>> is an oxymoron since `raw-text-unix` is a synonym of `binary` and
>> `no-conversion`, which basically says "don't do any encoding/decoding,
>> instead preserve bytes as bytes".
>
> For reasons of avoiding mental overload, I prefer not to use
> no-conversion where in fact there is a conversion.

I also hate `no-conversion`.  But for the same reason I dislike
`raw-text` because the name gives me no intuition and since it is
about preserving bytes rather than characters, it doesn't have much to
do with "text".

> That's why I didn't use 'binary' in this case.

Binary doesn't say what the conversion does, indeed, but it does say
that it applies to binary (rather than text) contents, so I find its
name does provide the needed intuition.

>> IOW coding-systems like `raw-text` make sense in places like the
>> `coding:` tag or in buffer-file-coding-system, where we are forced to
>> put some kind of coding-system and where it is hence handy to be able to
>> use `raw-text-unix` to basically skip the en/decoding.
>> But I find them confusing when passed as a constant to
>> `en/decode-coding-string`.
>
> It's the other way around here.

I don't know what "other way around" means in this context.


        Stefan





* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-23  2:45           ` Stefan Monnier
@ 2019-06-23 14:26             ` Eli Zaretskii
  2019-06-24  3:48               ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-06-23 14:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Sat, 22 Jun 2019 22:45:07 -0400
> 
> >> So maybe the present case argues for adding a `no-error` argument to
> >> string-to-unibyte.
> > What is the use case for string-to-unibyte that cannot be satisfied by
> > encoding with raw-text/binary, if we also don't signal an error?
> 
> The use case is clear code that says explicitly that this chunk of code
> is not trying to convert between chars and bytes but only to convert
> between two representations of a sequence of bytes.

(a) I wouldn't call anything related to string-to-unibyte "clear",
because the act of converting a string to unibyte is not well defined.
(b) Encoding text can also be defined as "converting between two
representations of a sequence of bytes".

> It's also code that clearly does the reverse of string-to-multibyte
> (whereas decode-coding-string doesn't do the reverse of
> encode-coding-string when it comes to `raw-text`).

I think decode-coding-string does do the reverse.

> >> IOW coding-systems like `raw-text` make sense in places like the
> >> `coding:` tag or in buffer-file-coding-system, where we are forced to
> >> put some kind of coding-system and where it is hence handy to be able to
> >> use `raw-text-unix` to basically skip the en/decoding.
> >> But I find them confusing when passed as a constant to
> >> `en/decode-coding-string`.
> >
> > It's the other way around here.
> 
> I don't know what "other way around" means in this context.

It means that our preferences in this case are opposite.




* Re: [Emacs-diffs] master 58a3c54: Avoid using string-make-unibyte in select.el
  2019-06-23 14:26             ` Eli Zaretskii
@ 2019-06-24  3:48               ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2019-06-24  3:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> (a) I wouldn't call anything related to string-to-unibyte "clear",
> because the act of converting a string to unibyte is not well defined.
> (b) Encoding text can also be defined as "converting between two
> representations of a sequence of bytes".

No, encoding and decoding change the bytes, whereas string-to preserves
the bytes: in one case they are represented as a unibyte string (which
can never be anything other than a sequence of bytes), in the other as
a sequence of chars (some of which stand for bytes).

A sequence of bytes can be represented in many different ways:
- a unibyte string is the canonical way (because it can only do that,
  so when you receive such a thing you don't need to look for possible
  non-bytes in the sequence or for a non-proper sequence).
- a vector of integers between 0 and 255.
- a list of integers between 0 and 255.
- a multibyte string with chars within the union of the ascii charset
  and the eight-bit charset.
string-to lets you convert a given sequence of bytes between the first
and the last.
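
As a rough illustration of the four representations listed above: the
function name `my-byte-representations` and the sample bytes are
hypothetical, chosen only to show the conversions side by side.

```elisp
(defun my-byte-representations ()
  "Show one byte sequence in the four representations discussed."
  (let* ((bytes "\377\200abc")                  ; unibyte string
         (as-list (append bytes nil))           ; list of integers 0..255
         (as-vector (vconcat bytes))            ; vector of integers 0..255
         (as-chars (string-to-multibyte bytes)) ; multibyte, eight-bit chars
         (back (string-to-unibyte as-chars)))   ; unibyte again
    (list as-list
          as-vector
          (multibyte-string-p as-chars)
          (equal back bytes)
          (equal (apply #'unibyte-string as-list) bytes))))

(my-byte-representations)
;; => ((255 128 97 98 99) [255 128 97 98 99] t t t)
```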

>> It's also code that clearly does the reverse of string-to-multibyte
>> (whereas decode-coding-string doesn't do the reverse of
>> encode-coding-string when it comes to `raw-text`).
> I think decode-coding-string does do the reverse.

No: decode-coding-string returns a unibyte string when called with
`raw-text` or `binary`, contrary to string-to-multibyte.

>> >> IOW coding-systems like `raw-text` make sense in places like the
>> >> `coding:` tag or in buffer-file-coding-system, where we are forced to
>> >> put some kind of coding-system and where it is hence handy to be able to
>> >> use `raw-text-unix` to basically skip the en/decoding.
>> >> But I find them confusing when passed as a constant to
>> >> `en/decode-coding-string`.
>> > It's the other way around here.
>> I don't know what "other way around" means in this context.
> It means that our preferences in this case are opposite.

AFAIK using `raw-text` or `no-conversion` in auto-coding-alist or in
`coding:` tags is not a matter of preference: you simply can't specify
string-to-*byte in there.


        Stefan





