unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* master 28bf387: Tweak Fdirectory_append for efficiency
@ 2021-07-24 16:27 Eli Zaretskii
  2021-07-24 16:29 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:27 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> --- a/src/fileio.c
> +++ b/src/fileio.c
> @@ -795,7 +795,8 @@ usage: (record DIRECTORY &rest COMPONENTS) */)
>        for (i = 0; i < nargs; i++)
>  	{
>  	  Lisp_Object arg = args[i];
> -	  if (STRING_MULTIBYTE (arg))
> +	  /* Use multibyte or all-ASCII strings as is. */
> +	  if (STRING_MULTIBYTE (arg) || SCHARS (arg) == SBYTES (arg))
>  	    elements[i] = arg;
>  	  else
>  	    elements[i] = make_multibyte_string (SSDATA (arg), SCHARS (arg),
> 

Isn't SCHARS (arg) == SBYTES (arg) true for any unibyte string, even
one that includes non-ASCII characters?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 16:27 master 28bf387: Tweak Fdirectory_append for efficiency Eli Zaretskii
@ 2021-07-24 16:29 ` Lars Ingebrigtsen
  2021-07-24 16:36   ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-24 16:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Isn't SCHARS (arg) == SBYTES (arg) true for any unibyte string, even
> one that includes non-ASCII characters?

Yes, indeed.  I'll get fixing...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 16:29 ` Lars Ingebrigtsen
@ 2021-07-24 16:36   ` Eli Zaretskii
  2021-07-24 16:49     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:36 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 18:29:11 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Isn't SCHARS (arg) == SBYTES (arg) true for any unibyte string, even
> > one that includes non-ASCII characters?
> 
> Yes, indeed.  I'll get fixing...

I actually don't understand the call to make_multibyte_string in that
loop, either.  Can you tell why you needed to convert unibyte strings?
In any case, make_multibyte_string will not convert as you intended,
because you call it with the identical last two arguments, which is
only valid for all-ASCII strings.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 16:36   ` Eli Zaretskii
@ 2021-07-24 16:49     ` Lars Ingebrigtsen
  2021-07-24 16:58       ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-24 16:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I actually don't understand the call to make_multibyte_string in that
> loop, either.  Can you tell why you needed to convert unibyte strings?

So that when constructing the final string all the elements have the
same(ish) multibytedness.

> In any case, make_multibyte_string will not convert as you intended,
> because you call it with the identical last two arguments, which is
> only valid for all-ASCII strings.

Ah, yes, I meant to call Fstring_to_multibyte there...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 16:49     ` Lars Ingebrigtsen
@ 2021-07-24 16:58       ` Eli Zaretskii
  2021-07-24 17:05         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:58 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 18:49:43 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I actually don't understand the call to make_multibyte_string in that
> > loop, either.  Can you tell why you needed to convert unibyte strings?
> 
> So that when constructing the final string all the elements have the
> same(ish) multibytedness.

Why not just require that up front?  It makes no sense to append
strings of different multibytedness to make a single file name from
them.  The only cases that make sense to me are:

  . all the strings are multibyte
  . some of the strings are multibyte, and some unibyte and pure-ASCII
  . all the strings are unibyte

The last case can happen when we call this function very early during
the startup process, before we set up the file-encoding stuff, and
thus all the file names are unibyte strings.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 16:58       ` Eli Zaretskii
@ 2021-07-24 17:05         ` Lars Ingebrigtsen
  2021-07-24 17:13           ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-24 17:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> So that when constructing the final string all the elements have the
>> same(ish) multibytedness.
>
> Why not just require that up front?  It makes no sense to append
> strings of different multibytedness to make a single file name from
> them.  The only cases that make sense to me are:
>
>   . all the strings are multibyte
>   . some of the strings are multibyte, and some unibyte and pure-ASCII
>   . all the strings are unibyte
>
> The last case can happen when we call this function very early during
> the startup process, before we set up the file-encoding stuff, and
> thus all the file names are unibyte strings.

I don't understand your point here, I'm afraid.

Whenever there's

(multibyte-string-p "/tmp")
=> nil

(multibyte-string-p "bár")
=> t

Why shouldn't I be able to run

(directory-append "/tmp" "bár")
=> "/tmp/bár"

without caring about multibytedness?  This is just a string manipulation
function (geared towards file names).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 17:05         ` Lars Ingebrigtsen
@ 2021-07-24 17:13           ` Eli Zaretskii
  2021-07-25  6:38             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2021-07-24 17:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 19:05:20 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The only cases that make sense to me are:
> >
> >   . all the strings are multibyte
> >   . some of the strings are multibyte, and some unibyte and pure-ASCII
> >   . all the strings are unibyte
> >
> > The last case can happen when we call this function very early during
> > the startup process, before we set up the file-encoding stuff, and
> > thus all the file names are unibyte strings.
> 
> I don't understand your point here, I'm afraid.
> 
> Whenever there's
> 
> (multibyte-string-p "/tmp")
> => nil
> 
> (multibyte-string-p "bár")
> => t
> 
> Why shouldn't I be able to run
> 
> (directory-append "/tmp" "bár")
> => "/tmp/bár"
> 
> without caring about multibytedness?  This is just a string manipulation
> function (geared towards file names).

You should indeed be able to do that, that's my case #2.  The case
that doesn't have to work and doesn't make sense is this

  (multibyte-string-p (encode-coding-string "bár" 'latin-1))
  => nil

  (multibyte-string-p "/tmp/bár")
  => t

  (directory-append "/tmp/bár" (encode-coding-string "bár" 'latin-1))
  => "/tmp/b\303\241r/b\341r"



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-24 17:13           ` Eli Zaretskii
@ 2021-07-25  6:38             ` Lars Ingebrigtsen
  2021-07-25  7:08               ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-25  6:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> You should indeed be able to do that, that's my case #2.  The case
> that doesn't have to work and doesn't make sense is this
>
>   (multibyte-string-p (encode-coding-string "bár" 'latin-1))
>   => nil
>
>   (multibyte-string-p "/tmp/bár")
>   => t
>
>   (directory-append "/tmp/bár" (encode-coding-string "bár" 'latin-1))
>   => "/tmp/b\303\241r/b\341r"

The function isn't really concerned with charsets -- that's an
orthogonal issue.

The user may well have a unibyte string that contains non-ASCII octets
(note -- not characters), and I see no reason why a string concatenating
function should refuse to handle those.  It's up to the caller.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: master 28bf387: Tweak Fdirectory_append for efficiency
  2021-07-25  6:38             ` Lars Ingebrigtsen
@ 2021-07-25  7:08               ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2021-07-25  7:08 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 25 Jul 2021 08:38:02 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > You should indeed be able to do that, that's my case #2.  The case
> > that doesn't have to work and doesn't make sense is this
> >
> >   (multibyte-string-p (encode-coding-string "bár" 'latin-1))
> >   => nil
> >
> >   (multibyte-string-p "/tmp/bár")
> >   => t
> >
> >   (directory-append "/tmp/bár" (encode-coding-string "bár" 'latin-1))
> >   => "/tmp/b\303\241r/b\341r"
> 
> The function isn't really concerned with charsets -- that's an
> orthogonal issue.

I wasn't talking about charsets.  I used encode-coding-string just to
make sure the string is unibyte and emulates the file names we get
from the filesystem before file-coding-system and friends is set up
during startup.

IOW, what you see above is what will happen when the function gets
passed COMPONENTS some of which are multibyte, and some unibyte but
non-ASCII.  I'm saying that such a situation makes no sense, and
perhaps should be even flagged as an error, because it probably means
one of the COMPONENTS wasn't decoded correctly.  If you can describe a
situation where this is legitimate, please do.

> The user may well have a unibyte string that contains non-ASCII octets
> (note -- not characters), and I see no reason why a string concatenating
> function should refuse to handle those.  It's up to the caller.

I tried to explain that above.  If you are still unconvinced, so be
it.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-07-25  7:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-24 16:27 master 28bf387: Tweak Fdirectory_append for efficiency Eli Zaretskii
2021-07-24 16:29 ` Lars Ingebrigtsen
2021-07-24 16:36   ` Eli Zaretskii
2021-07-24 16:49     ` Lars Ingebrigtsen
2021-07-24 16:58       ` Eli Zaretskii
2021-07-24 17:05         ` Lars Ingebrigtsen
2021-07-24 17:13           ` Eli Zaretskii
2021-07-25  6:38             ` Lars Ingebrigtsen
2021-07-25  7:08               ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).