unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
@ 2021-10-01 17:10 Alan Mackenzie
  2021-10-01 17:53 ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-01 17:10 UTC (permalink / raw)
  To: 50946

Hello, Emacs.

In emacs -Q in the emacs-28 branch, create the following two line file,
foobar.el, and try to load it:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defvar foo-baz "foobar-baz")
FOOBARELISP-SHORTHANDS: (("foo" . "foobar")))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

This will throw an error, but that isn't important.

What is important is that the symbol foobar-baz is created by the
elisp-shorthands facility.

This shouldn't happen since:
1/- There is no Local Variables section.
2/- There is no variable elisp-shorthands in that non-existent section.

The following errors are evident in hack-elisp-shorthands:
1/- The code doesn't check for a correctly formatted Local Variables
  section.
2/- The code, even if it did check, would only check the last 3000 bytes
  in the file.  The section can occur anywhere in the last 3000
  CHARACTERS.
3/- The code doesn't do a case-sensitive search for "elisp-shorthands".
4/- The code doesn't check for "elisp-shorthands" being a complete
  symbol.
5/- The code doesn't even check that "elisp-shorthands" is in a comment.

I would suggest that these errors be corrected.  I would also suggest
that the entire code and documentation for this new facility be
carefully reviewed by somebody who isn't the original author.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-01 17:10 bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands Alan Mackenzie
@ 2021-10-01 17:53 ` Eli Zaretskii
  2021-10-01 21:15   ` João Távora
  2021-10-02  0:48   ` João Távora
  0 siblings, 2 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-01 17:53 UTC (permalink / raw)
  To: Alan Mackenzie, João Távora; +Cc: 50946

> Date: Fri, 1 Oct 2021 17:10:57 +0000
> From: Alan Mackenzie <acm@muc.de>
> 
> In emacs -Q in the emacs-28 branch, create the following two line file,
> foobar.el, and try to load it:
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (defvar foo-baz "foobar-baz")
> FOOBARELISP-SHORTHANDS: (("foo" . "foobar")))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> This will throw an error, but that isn't important.
> 
> What is important is that the symbol foobar-baz is created by the
> elisp-shorthands facility.
> 
> This shouldn't happen since:
> 1/- There is no Local Variables section.
> 2/- There is no variable elisp-shorthands in that non-existent section.
> 
> The following errors are evident in hack-elisp-shorthands:
> 1/- The code doesn't check for a correctly formatted Local Variables
>   section.
> 2/- The code, even if it did check, would only check the last 3000 bytes
>   in the file.  The section can occur anywhere in the last 3000
>   CHARACTERS.
> 3/- The code doesn't do a case-sensitive search for "elisp-shorthands".
> 4/- The code doesn't check for "elisp-shorthands" being a complete
>   symbol.
> 5/- The code doesn't even check that "elisp-shorthands" is in a comment.

Thanks.

João, could you please look into this?





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-01 17:53 ` Eli Zaretskii
@ 2021-10-01 21:15   ` João Távora
  2021-10-02  6:10     ` Eli Zaretskii
  2021-10-02  0:48   ` João Távora
  1 sibling, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-01 21:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, 50946

[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]

Ok, no problem.

I'm afraid I'm the original author though. Frankly, what a pityful
request...

João

On Fri, Oct 1, 2021, 18:53 Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Fri, 1 Oct 2021 17:10:57 +0000
> > From: Alan Mackenzie <acm@muc.de>
> >
> > In emacs -Q in the emacs-28 branch, create the following two line file,
> > foobar.el, and try to load it:
> >
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > (defvar foo-baz "foobar-baz")
> > FOOBARELISP-SHORTHANDS: (("foo" . "foobar")))
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >
> > This will throw an error, but that isn't important.
> >
> > What is important is that the symbol foobar-baz is created by the
> > elisp-shorthands facility.
> >
> > This shouldn't happen since:
> > 1/- There is no Local Variables section.
> > 2/- There is no variable elisp-shorthands in that non-existent section.
> >
> > The following errors are evident in hack-elisp-shorthands:
> > 1/- The code doesn't check for a correctly formatted Local Variables
> >   section.
> > 2/- The code, even if it did check, would only check the last 3000 bytes
> >   in the file.  The section can occur anywhere in the last 3000
> >   CHARACTERS.
> > 3/- The code doesn't do a case-sensitive search for "elisp-shorthands".
> > 4/- The code doesn't check for "elisp-shorthands" being a complete
> >   symbol.
> > 5/- The code doesn't even check that "elisp-shorthands" is in a comment.
>
> Thanks.
>
> João, could you please look into this?
>

[-- Attachment #2: Type: text/html, Size: 2246 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-01 17:53 ` Eli Zaretskii
  2021-10-01 21:15   ` João Távora
@ 2021-10-02  0:48   ` João Távora
  2021-10-02 10:50     ` Alan Mackenzie
  1 sibling, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-02  0:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, 50946-done

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Fri, 1 Oct 2021 17:10:57 +0000
>> From: Alan Mackenzie <acm@muc.de>
>> 
>> In emacs -Q in the emacs-28 branch, create the following two line file,
>> foobar.el, and try to load it:
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> (defvar foo-baz "foobar-baz")
>> FOOBARELISP-SHORTHANDS: (("foo" . "foobar")))
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> This will throw an error, but that isn't important.
>> 
>> What is important is that the symbol foobar-baz is created by the
>> elisp-shorthands facility.
>> 
>> This shouldn't happen since:
>> 1/- There is no Local Variables section.
>> 2/- There is no variable elisp-shorthands in that non-existent section.
>> 
>> The following errors are evident in hack-elisp-shorthands:
>> 1/- The code doesn't check for a correctly formatted Local Variables
>>   section.
>> 2/- The code, even if it did check, would only check the last 3000 bytes
>>   in the file.  The section can occur anywhere in the last 3000
>>   CHARACTERS.
>> 3/- The code doesn't do a case-sensitive search for "elisp-shorthands".
>> 4/- The code doesn't check for "elisp-shorthands" being a complete
>>   symbol.
>> 5/- The code doesn't even check that "elisp-shorthands" is in a comment.
>
> Thanks.
>
> João, could you please look into this?

Done.  In the Emacs 28 branch.  All tests pass (except a strange
'seccomp' one that never did).  Let me know if some more bugs lurk.

Addressed all the points except the last one which doesn't make much
sense, since normal `hack-local-variables` also doesn't do any such
check.  In fact what I'm doing is re-using
hack-local-variables--find-variables from files.el, as I had wanted to
anyway.

João






^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-01 21:15   ` João Távora
@ 2021-10-02  6:10     ` Eli Zaretskii
  0 siblings, 0 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-02  6:10 UTC (permalink / raw)
  To: João Távora; +Cc: acm, 50946

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 1 Oct 2021 22:15:03 +0100
> Cc: Alan Mackenzie <acm@muc.de>, 50946@debbugs.gnu.org
> 
> I'm afraid I'm the original author though. Frankly, what a pityful request...

Which is why I edited out the last Alan's request ;-)

Thanks for working on the fixes.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02  0:48   ` João Távora
@ 2021-10-02 10:50     ` Alan Mackenzie
  2021-10-02 11:13       ` João Távora
  2021-10-02 11:38       ` João Távora
  0 siblings, 2 replies; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-02 10:50 UTC (permalink / raw)
  To: João Távora; +Cc: 50946-done

Hello, João.

On Sat, Oct 02, 2021 at 01:48:31 +0100, João Távora wrote:
> Eli Zaretskii <eliz@gnu.org> writes:

> >> Date: Fri, 1 Oct 2021 17:10:57 +0000
> >> From: Alan Mackenzie <acm@muc.de>

> >> In emacs -Q in the emacs-28 branch, create the following two line file,
> >> foobar.el, and try to load it:

> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >> (defvar foo-baz "foobar-baz")
> >> FOOBARELISP-SHORTHANDS: (("foo" . "foobar")))
> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

> >> This will throw an error, but that isn't important.

> >> What is important is that the symbol foobar-baz is created by the
> >> elisp-shorthands facility.

> >> This shouldn't happen since:
> >> 1/- There is no Local Variables section.
> >> 2/- There is no variable elisp-shorthands in that non-existent section.

> >> The following errors are evident in hack-elisp-shorthands:
> >> 1/- The code doesn't check for a correctly formatted Local Variables
> >>   section.
> >> 2/- The code, even if it did check, would only check the last 3000 bytes
> >>   in the file.  The section can occur anywhere in the last 3000
> >>   CHARACTERS.
> >> 3/- The code doesn't do a case-sensitive search for "elisp-shorthands".
> >> 4/- The code doesn't check for "elisp-shorthands" being a complete
> >>   symbol.
> >> 5/- The code doesn't even check that "elisp-shorthands" is in a comment.

> > Thanks.

> > João, could you please look into this?

> Done.

No you're not.  The bug is only partially fixed.  Why did you not check
your patch with me before closing the bug?  I've reopened it.

> In the Emacs 28 branch.  All tests pass (except a strange 'seccomp' one
> that never did).  Let me know if some more bugs lurk.

Not more bugs, just the original bug badly "fixed".

> Addressed all the points except the last one which doesn't make much
> sense, ....

You did address it, since hack-local-variables--find-variables checks for
the needed comment openers.

> .... since normal `hack-local-variables` also doesn't do any such
> check.  In fact what I'm doing is re-using
> hack-local-variables--find-variables from files.el, as I had wanted to
> anyway.

Why aren't you just using that function on the buffer anyway, instead of all this
clumsy messing around with temporary buffers, file-attributes, and
successive 100 bytes, and so on?

All right.  What is not fixed?  Say you have a file 3150 bytes long,
which is less than 3000 characters in Emacs.  Your function will load
only 3100 bytes, less than 3000 characters, into the temporary buffer.
It thus may fail to find a Local Variables section, even if this scenario
is highly unusual.

Have you checked that things work if the first byte in your temporary
buffer isn't at the start of a character?

Also, it is unclear whether the elisp-shorthands symbol on the first line
of a file is valid.  I think the intention is that it not be valid.
However, the current version of the code will read it from the first line
of a sufficiently small file.  You should look at this.  Possibly this
needs to be clarified in the documentation for shorthands.

I suggest you narrow the first line of the temporary buffer out in such
circumstances.  Better yet, dispense with all this clever stuff about
3000 bytes and 100 byte chunks, and just use
hack-local-variables--find-variables on the file's buffer (which you'll
have visited anyway, or shortly will visit)?

> João

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 10:50     ` Alan Mackenzie
@ 2021-10-02 11:13       ` João Távora
  2021-10-02 11:38       ` João Távora
  1 sibling, 0 replies; 37+ messages in thread
From: João Távora @ 2021-10-02 11:13 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946-done

Alan Mackenzie <acm@muc.de> writes:

> Why aren't you just using that function on the buffer anyway, instead of all this
> clumsy messing around with temporary buffers, file-attributes, and
> successive 100 bytes, and so on?

I used it to quickly and efficiently discover the value of
elisp-shorthands as soon as possible, before the 'read' takes place in
load-code-with-conversion.

Doing other wise is possible but it is an intrusive change into the
longstanding load-with-code-conversion, which I avoided during the early
days of this feature.  If you can do it and it doens't break anything,
go ahead.  Stefan is investigating this as well.

> ... even if this scenario is highly unusual.

You think?

João





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 10:50     ` Alan Mackenzie
  2021-10-02 11:13       ` João Távora
@ 2021-10-02 11:38       ` João Távora
  2021-10-02 12:38         ` Alan Mackenzie
  1 sibling, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-02 11:38 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946-done

Alan Mackenzie <acm@muc.de> writes:

> clumsy messing around with temporary buffers, file-attributes, and

Also, if you want to converse, be polite.  Your petty posturing doesn't
intimidate me.  So go insult someone else.

João





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 11:38       ` João Távora
@ 2021-10-02 12:38         ` Alan Mackenzie
  2021-10-02 12:52           ` Eli Zaretskii
  2021-10-02 14:47           ` João Távora
  0 siblings, 2 replies; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-02 12:38 UTC (permalink / raw)
  To: João Távora; +Cc: 50946

Hello, João.

On Sat, Oct 02, 2021 at 12:38:52 +0100, João Távora wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > clumsy messing around with temporary buffers, file-attributes, and

I stand by that description.  It was accurate.

> Also, if you want to converse, be polite.  Your petty posturing doesn't
> intimidate me.  So go insult someone else.

I respectfully request you to deal with the issues I have raised.  If
you want to direct disrespect at me, private email would probably be a
better place.

To be perfectly honest, I was shocked when I saw the state of the coding
in that defun.  I have not said on this list what I really think about
it, and will not do so.  I can only hope that you just threw something
together as a proof of concept, and then forgot to change it into
acceptable code before committing it.  Or something like that.  If so,
an apology would be appropriate.

You closed this bug without fixing it.  There are corner cases the "fix"
doesn't handle, which I pointed out earlier and you ignored.  You closed
the bug without even having the decency to ask me to check the patch
first.  That is not polite, and not the normal way things are done,
here.

I think that with hack-elisp-shorthands having been coded without
attention to detail, there is a good chance the rest of this feature is
similarly lacking in attention to detail, which is why I asked for an
independent person to check.  Eli seems to think this isn't a problem.

You haven't fixed this bug.  When you first closed it this afternoon, I
assumed you did so by accident, since the -done@debbugs.gnu.org was on
the Cc:.  Your recent closing of this unfixed bug was clearly
deliberate.

I'm not going to get into a childish game with you, opening and closing
this bug repeatedly.  Instead, I invite you to calm down, think it over
over the next few days, and consider whether such unfixed bugs are
really the right thing for Emacs.

> João

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 12:38         ` Alan Mackenzie
@ 2021-10-02 12:52           ` Eli Zaretskii
  2021-10-02 13:57             ` Alan Mackenzie
  2021-10-02 14:47           ` João Távora
  1 sibling, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-02 12:52 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sat, 2 Oct 2021 12:38:58 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> I think that with hack-elisp-shorthands having been coded without
> attention to detail, there is a good chance the rest of this feature is
> similarly lacking in attention to detail, which is why I asked for an
> independent person to check.  Eli seems to think this isn't a problem.

If you want me to take your critique seriously, please be specific:
what are the aspects of that function that you think lack attention to
detail, and what detail?  We _are_ having a technical discussion of a
Lisp program, and quite a short one at that, right?  If so, it
shouldn't be hard for you to provide the details of what worries you
there.  And while at that, please try to distinguish between general
problems of hack-local-variables--find-variables, which affect all of
Emacs, and hack-elisp-shorthands, which is specific to this feature.

> You haven't fixed this bug.  When you first closed it this afternoon, I
> assumed you did so by accident, since the -done@debbugs.gnu.org was on
> the Cc:.  Your recent closing of this unfixed bug was clearly
> deliberate.

Which bug (or bugs) is that?

> I'm not going to get into a childish game with you, opening and closing
> this bug repeatedly.  Instead, I invite you to calm down, think it over
> over the next few days, and consider whether such unfixed bugs are
> really the right thing for Emacs.

I don't think you are calm enough, either.  So the invitation to calm
down goes both ways here.  Please focus on technical issues and leave
ad-hominem out of this, okay?





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 12:52           ` Eli Zaretskii
@ 2021-10-02 13:57             ` Alan Mackenzie
  2021-10-02 14:19               ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-02 13:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sat, Oct 02, 2021 at 15:52:29 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 12:38:58 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > I think that with hack-elisp-shorthands having been coded without
> > attention to detail, there is a good chance the rest of this feature is
> > similarly lacking in attention to detail, which is why I asked for an
> > independent person to check.  Eli seems to think this isn't a problem.

> If you want me to take your critique seriously, please be specific:
> what are the aspects of that function that you think lack attention to
> detail, and what detail?

The five aspects I enumerated on my original bug report.  Not checking
for a properly formatted Local Variables: section, not checking for
lower-case in that variable being searched for, not going back at least
3000 characters, etc.  Additionally, the original code didn't check for
a page boundary.

That, to me, is detail that should have been checked, even tested, but
clearly wasn't.

> We _are_ having a technical discussion of a Lisp program, and quite a
> short one at that, right?  If so, it shouldn't be hard for you to
> provide the details of what worries you there.

I think I did that in the original bug report.  I also stated in detail
in one of the followups why the patch the João committed didn't entirely
fix the bug.  It took me quite a bit of time to put that followup
together.

There were 6 individual bugs in hack-elisp-shorthands (the five in my
original bug report plus not checking for a page boundary).  They
weren't difficult complicated things, they were things obvious to me
just perusing the code.  They ought to have been obvious to João, too.
This was unusually bad coding.  There's been no (public) explanation for
this from João, though I suggested one to him.

What worries me is that this lack of attention to detail might extend to
the new code in lread.c, or the documentation page.  I still think
somebody who isn't João and isn't me should check this.  I think I saw
you making a small amendment to the code in lread.c, so maybe you have
already checked that bit.

I worry, to a lesser degree, it is not entirely clear whether setting
the elisp-shorthands variable in the first line of a short file should
be valid or not.  I don't think the current hack-elisp-shorthands is
careful enough about this.

> And while at that, please try to distinguish between general problems
> of hack-local-variables--find-variables, which affect all of Emacs,
> and hack-elisp-shorthands, which is specific to this feature.

I think all the things I've said in this bug report are about
hack-elisp-shorthands and the other new code/doc in this feature.

> > You haven't fixed this bug.  When you first closed it this afternoon, I
> > assumed you did so by accident, since the -done@debbugs.gnu.org was on
> > the Cc:.  Your recent closing of this unfixed bug was clearly
> > deliberate.

> Which bug (or bugs) is that?

Bug #50946.  I have pointed out unfixed edge cases in this thread.

> > I'm not going to get into a childish game with you, opening and closing
> > this bug repeatedly.  Instead, I invite you to calm down, think it over
> > over the next few days, and consider whether such unfixed bugs are
> > really the right thing for Emacs.

> I don't think you are calm enough, either.

I'm not.  When I submit a bug report, I expect it and me to be treated
with respect.  That hasn't happened in this case.

> So the invitation to calm down goes both ways here.  Please focus on
> technical issues and leave ad-hominem out of this, okay?

Is this bug to be fixed completely, or are the edge cases just to be
ignored as unimportant?  That is the current technical issue as I see
it.  I really don't want to fix the bug myself, but am prepared to do so
if nobody else is.  If you think the bug indeed needs completely fixing,
please reopen it - there's no point in me trying to reopen it again.  If
you don't, just leave things the way they are.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 13:57             ` Alan Mackenzie
@ 2021-10-02 14:19               ` Eli Zaretskii
  2021-10-02 14:45                 ` Alan Mackenzie
  2021-10-02 15:02                 ` bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands João Távora
  0 siblings, 2 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-02 14:19 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sat, 2 Oct 2021 13:57:21 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > If you want me to take your critique seriously, please be specific:
> > what are the aspects of that function that you think lack attention to
> > detail, and what detail?
> 
> The five aspects I enumerated on my original bug report.  Not checking
> for a properly formatted Local Variables: section

That is not part of the function in question, is it?  It's in
hack-local-variables--find-variables, which we use everywhere.

> not checking for lower-case in that variable being searched for

That's also in hack-local-variables--find-variables, right?

> not going back at least 3000 characters

That is now fixed, right?

> Additionally, the original code didn't check for a page boundary.

No longer relevant, right?

> What worries me is that this lack of attention to detail might extend to
> the new code in lread.c, or the documentation page.

You are welcome to point out specific issues with that.

> I still think somebody who isn't João and isn't me should check
> this.

I did that when reviewing the commit.

> I worry, to a lesser degree, it is not entirely clear whether setting
> the elisp-shorthands variable in the first line of a short file should
> be valid or not.  I don't think the current hack-elisp-shorthands is
> careful enough about this.

Why does it matter?

> > And while at that, please try to distinguish between general problems
> > of hack-local-variables--find-variables, which affect all of Emacs,
> > and hack-elisp-shorthands, which is specific to this feature.
> 
> I think all the things I've said in this bug report are about
> hack-elisp-shorthands and the other new code/doc in this feature.

See above: I don't think I agree.

> > So the invitation to calm down goes both ways here.  Please focus on
> > technical issues and leave ad-hominem out of this, okay?
> 
> Is this bug to be fixed completely, or are the edge cases just to be
> ignored as unimportant?  That is the current technical issue as I see
> it.  I really don't want to fix the bug myself, but am prepared to do so
> if nobody else is.  If you think the bug indeed needs completely fixing,
> please reopen it - there's no point in me trying to reopen it again.  If
> you don't, just leave things the way they are.

Please forget what happened before, and let's pick up from the above
points I mention.  What else needs to be fixed in shorthands.el, and
why?





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 14:19               ` Eli Zaretskii
@ 2021-10-02 14:45                 ` Alan Mackenzie
  2021-10-02 15:00                   ` Eli Zaretskii
  2021-10-02 15:02                 ` bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands João Távora
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-02 14:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sat, Oct 02, 2021 at 17:19:17 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 13:57:21 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > If you want me to take your critique seriously, please be specific:
> > > what are the aspects of that function that you think lack attention to
> > > detail, and what detail?

> > The five aspects I enumerated on my original bug report.  Not checking
> > for a properly formatted Local Variables: section

> That is not part of the function in question, is it?  It's in
> hack-local-variables--find-variables, which we use everywhere.

It is now.  It wasn't when I raised the bug a day or two ago.

> > not checking for lower-case in that variable being searched for

> That's also in hack-local-variables--find-variables, right?

It is now.  It wasn't when I raised the bug

> > not going back at least 3000 characters

> That is now fixed, right?

No, it's not.  In certain edge cases, it will go back fewer than 3000
characters.

> > Additionally, the original code didn't check for a page boundary.

> No longer relevant, right?

No longer relevant.

> > What worries me is that this lack of attention to detail might extend to
> > the new code in lread.c, or the documentation page.

> You are welcome to point out specific issues with that.

In other words, I should do the review myself, you mean?  I really don't
want to do this.  I am not at all enthusiastic about the shorthands
feature and think somebody with more enthusiasm should do it.

> > I still think somebody who isn't João and isn't me should check
> > this.

> I did that when reviewing the commit.

Ah, good!  Thanks!

> > I worry, to a lesser degree, it is not entirely clear whether setting
> > the elisp-shorthands variable in the first line of a short file should
> > be valid or not.  I don't think the current hack-elisp-shorthands is
> > careful enough about this.

> Why does it matter?

Because the first line definition should either be valid or not valid.
Currently it works for a sufficiently small file, but not for a normal
sized file.  This, I think, is a bug.

> > > And while at that, please try to distinguish between general problems
> > > of hack-local-variables--find-variables, which affect all of Emacs,
> > > and hack-elisp-shorthands, which is specific to this feature.

> > I think all the things I've said in this bug report are about
> > hack-elisp-shorthands and the other new code/doc in this feature.

> See above: I don't think I agree.

I've largely been talking about the version of hack-elisp-shorthands I
raised the bug on.

> > > So the invitation to calm down goes both ways here.  Please focus on
> > > technical issues and leave ad-hominem out of this, okay?

> > Is this bug to be fixed completely, or are the edge cases just to be
> > ignored as unimportant?  That is the current technical issue as I see
> > it.  I really don't want to fix the bug myself, but am prepared to do so
> > if nobody else is.  If you think the bug indeed needs completely fixing,
> > please reopen it - there's no point in me trying to reopen it again.  If
> > you don't, just leave things the way they are.

> Please forget what happened before, and let's pick up from the above
> points I mention.  What else needs to be fixed in shorthands.el, and
> why?

I documented those carefully in my post to this thread this morning.
Here are some excerpts from that:

#########################################################################
Say you have a file 3150 bytes long, which is less than 3000 characters
in Emacs.  Your function will load only 3100 bytes, less than 3000
characters, into the temporary buffer.  It thus may fail to find a Local
Variables section, even if this scenario is highly unusual.

Have you checked that things work if the first byte in your temporary
buffer isn't at the start of a character?

Also, it is unclear whether the elisp-shorthands symbol on the first
line of a file is valid.  I think the intention is that it not be valid.
However, the current version of the code will read it from the first
line of a sufficiently small file.  You should look at this.  Possibly
this needs to be clarified in the documentation for shorthands.
#########################################################################

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 12:38         ` Alan Mackenzie
  2021-10-02 12:52           ` Eli Zaretskii
@ 2021-10-02 14:47           ` João Távora
  1 sibling, 0 replies; 37+ messages in thread
From: João Távora @ 2021-10-02 14:47 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946

On Sat, Oct 2, 2021 at 1:38 PM Alan Mackenzie <acm@muc.de> wrote:

> I stand by that description.  It was accurate.

You are now standing in an ignore list.  Kthanksbye.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 14:45                 ` Alan Mackenzie
@ 2021-10-02 15:00                   ` Eli Zaretskii
  2021-10-02 20:07                     ` Alan Mackenzie
  2021-10-03 12:10                     ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] Alan Mackenzie
  0 siblings, 2 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-02 15:00 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sat, 2 Oct 2021 14:45:52 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > The five aspects I enumerated on my original bug report.  Not checking
> > > for a properly formatted Local Variables: section
> 
> > That is not part of the function in question, is it?  It's in
> > hack-local-variables--find-variables, which we use everywhere.
> 
> It is now.  It wasn't when I raised the bug a day or two ago.

So this issue is no longer pertinent, right?

> > > not checking for lower-case in that variable being searched for
> 
> > That's also in hack-local-variables--find-variables, right?
> 
> It is now.  It wasn't when I raised the bug

Also not pertinent, right?

> > > not going back at least 3000 characters
> 
> > That is now fixed, right?
> 
> No, it's not.  In certain edge cases, it will go back fewer than 3000
> characters.

Does the patch below solve this?

> > > I worry, to a lesser degree, it is not entirely clear whether setting
> > > the elisp-shorthands variable in the first line of a short file should
> > > be valid or not.  I don't think the current hack-elisp-shorthands is
> > > careful enough about this.
> 
> > Why does it matter?
> 
> Because the first line definition should either be valid or not valid.
> Currently it works for a sufficiently small file, but not for a normal
> sized file.  This, I think, is a bug.

No, I don't think it's a bug, at least not a bug specific to
shorthands.  That's how file-local variables work in general.

> Say you have a file 3150 bytes long, which is less than 3000 characters
> in Emacs.  Your function will load only 3100 bytes, less than 3000
> characters, into the temporary buffer.  It thus may fail to find a Local
> Variables section, even if this scenario is highly unusual.

This should be solved by the change below.

> Have you checked that things work if the first byte in your temporary
> buffer isn't at the start of a character?

I don't see why this matter, can you explain?

Here's the patch I promised:

diff --git a/lisp/shorthands.el b/lisp/shorthands.el
index b8204d6..6162efd 100644
--- a/lisp/shorthands.el
+++ b/lisp/shorthands.el
@@ -40,7 +40,10 @@ hack-elisp-shorthands
     (with-temp-buffer
       (while (and (< (buffer-size) 3000) (>= from 0))
         (insert-file-contents fullname nil from to)
-        (setq to from from (- from 100)))
+        (setq to from
+              from (cond
+                    ((= from 0) -1)
+                    (t (max 0 (- from 100))))))
       ;; FIXME: relies on the `hack-local-variables--find-variables'
       ;; detail of files.el.  That function should be exported,
       ;; possibly be refactored into two parts, since we're only





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 14:19               ` Eli Zaretskii
  2021-10-02 14:45                 ` Alan Mackenzie
@ 2021-10-02 15:02                 ` João Távora
  2021-10-04  0:14                   ` Richard Stallman
  1 sibling, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-02 15:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, Stefan Monnier

Eli,

You can reopen this bug if you want.  Eventually, hack-elisp-shorthands will
go into load-with-code-conversion or something just before it evaluates the
buffer (and thus calls 'read').  I just didn't do it yet because this is really
the most puny of issues right now, IMO.  Stefan is also thinking that l-w-c-c
needs to be updated regarding the lexical-binding local variable, too.
And doing that will also enable shorthands to be configured in other parts
of the buffer, not just the end section.

João

On Sat, Oct 2, 2021 at 3:19 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > Date: Sat, 2 Oct 2021 13:57:21 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>
> >
> > > If you want me to take your critique seriously, please be specific:
> > > what are the aspects of that function that you think lack attention to
> > > detail, and what detail?
> >
> > The five aspects I enumerated on my original bug report.  Not checking
> > for a properly formatted Local Variables: section
>
> That is not part of the function in question, is it?  It's in
> hack-local-variables--find-variables, which we use everywhere.
>
> > not checking for lower-case in that variable being searched for
>
> That's also in hack-local-variables--find-variables, right?
>
> > not going back at least 3000 characters
>
> That is now fixed, right?
>
> > Additionally, the original code didn't check for a page boundary.
>
> No longer relevant, right?
>
> > What worries me is that this lack of attention to detail might extend to
> > the new code in lread.c, or the documentation page.
>
> You are welcome to point out specific issues with that.
>
> > I still think somebody who isn't João and isn't me should check
> > this.
>
> I did that when reviewing the commit.
>
> > I worry, to a lesser degree, it is not entirely clear whether setting
> > the elisp-shorthands variable in the first line of a short file should
> > be valid or not.  I don't think the current hack-elisp-shorthands is
> > careful enough about this.
>
> Why does it matter?
>
> > > And while at that, please try to distinguish between general problems
> > > of hack-local-variables--find-variables, which affect all of Emacs,
> > > and hack-elisp-shorthands, which is specific to this feature.
> >
> > I think all the things I've said in this bug report are about
> > hack-elisp-shorthands and the other new code/doc in this feature.
>
> See above: I don't think I agree.
>
> > > So the invitation to calm down goes both ways here.  Please focus on
> > > technical issues and leave ad-hominem out of this, okay?
> >
> > Is this bug to be fixed completely, or are the edge cases just to be
> > ignored as unimportant?  That is the current technical issue as I see
> > it.  I really don't want to fix the bug myself, but am prepared to do so
> > if nobody else is.  If you think the bug indeed needs completely fixing,
> > please reopen it - there's no point in me trying to reopen it again.  If
> > you don't, just leave things the way they are.
>
> Please forget what happened before, and let's pick up from the above
> points I mention.  What else needs to be fixed in shorthands.el, and
> why?



-- 
João Távora





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 15:00                   ` Eli Zaretskii
@ 2021-10-02 20:07                     ` Alan Mackenzie
  2021-10-03 11:45                       ` Eli Zaretskii
  2021-10-03 12:10                     ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] Alan Mackenzie
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-02 20:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sat, Oct 02, 2021 at 18:00:38 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 14:45:52 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > > The five aspects I enumerated on my original bug report.  Not checking
> > > > for a properly formatted Local Variables: section

> > > That is not part of the function in question, is it?  It's in
> > > hack-local-variables--find-variables, which we use everywhere.

> > It is now.  It wasn't when I raised the bug a day or two ago.

> So this issue is no longer pertinent, right?

It was pertinent to my observation, but is no longer so, given that you
have reviewed the new C code.

[ .... ]

> > > > not going back at least 3000 characters

> > > That is now fixed, right?

> > No, it's not.  In certain edge cases, it will go back fewer than 3000
> > characters.

> Does the patch below solve this?

I think it does, yes.  Thanks!

> > > > I worry, to a lesser degree, it is not entirely clear whether setting
> > > > the elisp-shorthands variable in the first line of a short file should
> > > > be valid or not.  I don't think the current hack-elisp-shorthands is
> > > > careful enough about this.

> > > Why does it matter?

Otherwise we could have the scenario where somebody sets elisp-shorthands
in the first line of a file, finds it works, then types more into the
file, saves the buffer, then finds when she visits the file again that it
no longer works.  This, I think, would be a Bad Thing.

> > Because the first line definition should either be valid or not valid.
> > Currently it works for a sufficiently small file, but not for a normal
> > sized file.  This, I think, is a bug.

> No, I don't think it's a bug, at least not a bug specific to
> shorthands.  That's how file-local variables work in general.

No, not quite.  For normal file-local variables, having one set in the
first line works regardless of the length of the file.  It wouldn't for
elisp-shorthands, where it would only work for short files.

> > Say you have a file 3150 bytes long, which is less than 3000 characters
> > in Emacs.  Your function will load only 3100 bytes, less than 3000
> > characters, into the temporary buffer.  It thus may fail to find a Local
> > Variables section, even if this scenario is highly unusual.

> This should be solved by the change below.

Yes, thanks.

> > Have you checked that things work if the first byte in your temporary
> > buffer isn't at the start of a character?

> I don't see why this matters, can you explain?

It might matter, I simply don't know.  I rarely type characters into
Emacs which are longer than a single byte in UTF8.  I don't know whether
insert-file-contents does the Right Thing when there's half a character
at point-min, then insert-file-contents inserts the other half of the
character before it.  I don't know to what extent normal Emacs functions
work when there are invalid "characters" at point-min or point-max.  If I
were writing this function, I would want to check these things.

You are an expert on Unicode, so you are far likelier to know how Emacs
handles such things.

> Here's the patch I promised:

> diff --git a/lisp/shorthands.el b/lisp/shorthands.el
> index b8204d6..6162efd 100644
> --- a/lisp/shorthands.el
> +++ b/lisp/shorthands.el
> @@ -40,7 +40,10 @@ hack-elisp-shorthands
>      (with-temp-buffer
>        (while (and (< (buffer-size) 3000) (>= from 0))
>          (insert-file-contents fullname nil from to)
> -        (setq to from from (- from 100)))
> +        (setq to from
> +              from (cond
> +                    ((= from 0) -1)
> +                    (t (max 0 (- from 100))))))
>        ;; FIXME: relies on the `hack-local-variables--find-variables'
>        ;; detail of files.el.  That function should be exported,
>        ;; possibly be refactored into two parts, since we're only

Thanks!  I think it's right.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 20:07                     ` Alan Mackenzie
@ 2021-10-03 11:45                       ` Eli Zaretskii
  0 siblings, 0 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 11:45 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sat, 2 Oct 2021 20:07:26 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > > > not going back at least 3000 characters
> 
> > > > That is now fixed, right?
> 
> > > No, it's not.  In certain edge cases, it will go back fewer than 3000
> > > characters.
> 
> > Does the patch below solve this?
> 
> I think it does, yes.  Thanks!

Now installed.

> > > > > I worry, to a lesser degree, it is not entirely clear whether setting
> > > > > the elisp-shorthands variable in the first line of a short file should
> > > > > be valid or not.  I don't think the current hack-elisp-shorthands is
> > > > > careful enough about this.
> 
> > > > Why does it matter?
> 
> Otherwise we could have the scenario where somebody sets elisp-shorthands
> in the first line of a file, finds it works, then types more into the
> file, saves the buffer, then finds when she visits the file again that it
> no longer works.  This, I think, would be a Bad Thing.

I'm not sure.  If the user doesn't obey the rules, the user gets
amply punished.

> > > Because the first line definition should either be valid or not valid.
> > > Currently it works for a sufficiently small file, but not for a normal
> > > sized file.  This, I think, is a bug.
> 
> > No, I don't think it's a bug, at least not a bug specific to
> > shorthands.  That's how file-local variables work in general.
> 
> No, not quite.  For normal file-local variables, having one set in the
> first line works regardless of the length of the file.  It wouldn't for
> elisp-shorthands, where it would only work for short files.

I don't think I follow.  can you show an example of a problematic
file, so that we are sure we are talking about the same issue?

> > > Have you checked that things work if the first byte in your temporary
> > > buffer isn't at the start of a character?
> 
> > I don't see why this matters, can you explain?
> 
> It might matter, I simply don't know.  I rarely type characters into
> Emacs which are longer than a single byte in UTF8.  I don't know whether
> insert-file-contents does the Right Thing when there's half a character
> at point-min, then insert-file-contents inserts the other half of the
> character before it.

The character read in separate parts will indeed be incorrect, but how
does this affect searching for the local-variables section?  I don't
think it does, because there are only ASCII characters in the header
of that section.

> I don't know to what extent normal Emacs functions
> work when there are invalid "characters" at point-min or point-max.

They aren't invalid characters, they are raw 8-bit bytes.  Emacs
search functions can cope with them without a problem.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-02 15:00                   ` Eli Zaretskii
  2021-10-02 20:07                     ` Alan Mackenzie
@ 2021-10-03 12:10                     ` Alan Mackenzie
  2021-10-03 12:40                       ` Eli Zaretskii
  1 sibling, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-03 12:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

On Sat, Oct 02, 2021 at 18:00:38 +0300, Eli Zaretskii wrote:
> > Date: Sat, 2 Oct 2021 14:45:52 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> > Have you checked that things work if the first byte in your temporary
> > buffer isn't at the start of a character?

> I don't see why this matter, can you explain?

Yes, it does matter.  Because....

Create a file utf8-chars.txt as follows.  All the non-ascii characters
are 2-byte German UTF8 characters:
Yes, it does matter.  Because....

Create a file ~/utf8-chars.txt as follows.  All the non-ascii characters
are 2-byte German UTF8 characters.  Only the Q is an ascii character.
There is a LF at the end:

ÄäÖöQÜüß

Now, in an empty buffer,

   M-: (insert-file-contents "~/utf8-chars.txt" nil 3 15)

..  The first character of this buffer is now the Emacs encoding of the
raw byte 0xa4.

Now do

   M-: (insert-file-contents "~/utf8-chars.txt" nil 0 3)

The entire buffer, apart from the Q and the LF, now consists of raw
bytes, and the buffer is now 16 characters long.  (Is this a bug?).
Note that the Q is now further back from the end of the buffer than it
should be.

Using insert-file-contents-literally instead doesn't help.

So insert-file-contents corrupts the buffer when BEG or END is not at a
character boundary.  This matters for hack-elisp-shorthands, because this
corruption could push the Local Variables: start further back than 3000
characters.  Possibly other problems could happen, too.

My opinion, for what it's worth, is that using insert-file-contents in
hack-elisp-shorthands is a Bad Thing.  Even if it is possible to get it
working rigorously, it is surely not worth the trouble.  Why not simply
visit the file in a buffer, and check for buffer local variables in the
normal fashion?

#########################################################################

There are bugs in the documentation of insert-file-contents in the elisp
manual.  It confuses bytes with characters, and it fails to mention the
need to keep BEG and END at character boundaries.  I propose installing
the following patch to the release branch:



diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..c344e18c2b 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -556,6 +556,9 @@ Reading from Files
 
 If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
 that are byte offsets specifying the portion of the file to insert.
+Be careful to ensure that these byte positions are at character
+boundaries.  Otherwise, Emacs's input functions will corrupt the
+buffer.
 In this case, @var{visit} must be @code{nil}.  For example,
 
 @example
@@ -563,7 +566,7 @@ Reading from Files
 @end example
 
 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
 
 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -580,7 +583,8 @@ Reading from Files
 This function works like @code{insert-file-contents} except that it
 does not run @code{after-insert-file-functions}, and does not do
 format decoding, character code conversion, automatic uncompression,
-and so on.
+and so on.  @var{beg} and @var{end}, if non-@code{nil}, should be at
+character boundaries, as in @code{insert-file-contents}.
 @end defun
 
 If you want to pass a file name to another process so that another


The doc strings of insert-file-contents\(-literally\)? will also need to
be updated.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 12:10                     ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] Alan Mackenzie
@ 2021-10-03 12:40                       ` Eli Zaretskii
  2021-10-03 13:33                         ` Alan Mackenzie
                                           ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 12:40 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sun, 3 Oct 2021 12:10:19 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> Create a file ~/utf8-chars.txt as follows.  All the non-ascii characters
> are 2-byte German UTF8 characters.  Only the Q is an ascii character.
> There is a LF at the end:
> 
> ÄäÖöQÜüß
> 
> Now, in an empty buffer,
> 
>    M-: (insert-file-contents "~/utf8-chars.txt" nil 3 15)
> 
> ..  The first character of this buffer is now the Emacs encoding of the
> raw byte 0xa4.
> 
> Now do
> 
>    M-: (insert-file-contents "~/utf8-chars.txt" nil 0 3)
> 
> The entire buffer, apart from the Q and the LF, now consists of raw
> bytes, and the buffer is now 16 characters long.  (Is this a bug?).
> Note that the Q is now further back from the end of the buffer than it
> should be.

OK, thanks.

> My opinion, for what it's worth, is that using insert-file-contents in
> hack-elisp-shorthands is a Bad Thing.  Even if it is possible to get it
> working rigorously, it is surely not worth the trouble.  Why not simply
> visit the file in a buffer, and check for buffer local variables in the
> normal fashion?

We already visit the file when we load it.

João, why didn't you simply insert

  (alist-get 'elisp-shorthands (hack-local-variables--find-variables))

in load-with-code-conversion, immediately after it calls
insert-file-contents?  Are there any problems with that, and if so,
what are they?

> There are bugs in the documentation of insert-file-contents in the elisp
> manual.  It confuses bytes with characters, and it fails to mention the
> need to keep BEG and END at character boundaries.  I propose installing
> the following patch to the release branch:

Thanks, I will review this later.  However:

> @@ -580,7 +583,8 @@ Reading from Files
>  This function works like @code{insert-file-contents} except that it
>  does not run @code{after-insert-file-functions}, and does not do
>  format decoding, character code conversion, automatic uncompression,
> -and so on.
> +and so on.  @var{beg} and @var{end}, if non-@code{nil}, should be at
> +character boundaries, as in @code{insert-file-contents}.
>  @end defun

I don't think I understand why you made this second correction:
insert-file-contents-literally deals with bytes to begin with.

> The doc strings of insert-file-contents\(-literally\)? will also need to
> be updated.

In some sense, yes.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 12:40                       ` Eli Zaretskii
@ 2021-10-03 13:33                         ` Alan Mackenzie
  2021-10-03 15:04                         ` bug#50946: insert-file-contents can corrupt buffers Alan Mackenzie
  2021-10-03 15:34                         ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] João Távora
  2 siblings, 0 replies; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-03 13:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sun, Oct 03, 2021 at 15:40:24 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 12:10:19 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> > There are bugs in the documentation of insert-file-contents in the
> > elisp manual.  It confuses bytes with characters, and it fails to
> > mention the need to keep BEG and END at character boundaries.  I
> > propose installing the following patch to the release branch:

> Thanks, I will review this later.  However:

> > @@ -580,7 +583,8 @@ Reading from Files
> >  This function works like @code{insert-file-contents} except that it
> >  does not run @code{after-insert-file-functions}, and does not do
> >  format decoding, character code conversion, automatic uncompression,
> > -and so on.
> > +and so on.  @var{beg} and @var{end}, if non-@code{nil}, should be at
> > +character boundaries, as in @code{insert-file-contents}.
> >  @end defun

> I don't think I understand why you made this second correction:
> insert-file-contents-literally deals with bytes to begin with.

OK, thanks, I think I was mistaken, there.  Raw bytes is what we want
from i-f-c-literally.  I find this difficult to see from the text in
files.texi.  It describes the result in terms of the internal processing
rather than the effect seen by the user.  Maybe I could improve that.

> > The doc strings of insert-file-contents\(-literally\)? will also need to
> > be updated.

> In some sense, yes.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers.
  2021-10-03 12:40                       ` Eli Zaretskii
  2021-10-03 13:33                         ` Alan Mackenzie
@ 2021-10-03 15:04                         ` Alan Mackenzie
  2021-10-03 15:25                           ` Eli Zaretskii
  2021-10-03 15:34                         ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] João Távora
  2 siblings, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-03 15:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sun, Oct 03, 2021 at 15:40:24 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 12:10:19 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

[ .... ]

> > There are bugs in the documentation of insert-file-contents in the elisp
> > manual.  It confuses bytes with characters, and it fails to mention the
> > need to keep BEG and END at character boundaries.  I propose installing
> > the following patch to the release branch:

> Thanks, I will review this later.  However:

> > @@ -580,7 +583,8 @@ Reading from Files
> >  This function works like @code{insert-file-contents} except that it
> >  does not run @code{after-insert-file-functions}, and does not do
> >  format decoding, character code conversion, automatic uncompression,
> > -and so on.
> > +and so on.  @var{beg} and @var{end}, if non-@code{nil}, should be at
> > +character boundaries, as in @code{insert-file-contents}.
> >  @end defun

> I don't think I understand why you made this second correction:
> insert-file-contents-literally deals with bytes to begin with.

> > The doc strings of insert-file-contents\(-literally\)? will also need to
> > be updated.

> In some sense, yes.

Here is an updated patch, superseding my patch from midday.  I have
amended the descriptions of the two functions, replacing "corruption" of
the buffer by "inserting raw-text characters" in the first function, and
added explanation to the second.

I wasn't able to find a suitable target for a cross-reference explaining
"raw-text".



diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..63eaf24769 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -556,14 +556,18 @@ Reading from Files
 
 If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
 that are byte offsets specifying the portion of the file to insert.
-In this case, @var{visit} must be @code{nil}.  For example,
+In this case, @var{visit} must be @code{nil}.  Be careful to ensure
+that these byte positions are at character boundaries.  Otherwise,
+Emacs's character code conversion will insert one or more raw-text
+characters into the buffer, which is probably not what you want.  For
+example,
 
 @example
 (insert-file-contents filename nil 0 500)
 @end example
 
 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
 
 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -577,10 +581,11 @@ Reading from Files
 @end defun
 
 @defun insert-file-contents-literally filename &optional visit beg end replace
-This function works like @code{insert-file-contents} except that it
-does not run @code{after-insert-file-functions}, and does not do
-format decoding, character code conversion, automatic uncompression,
-and so on.
+This function works like @code{insert-file-contents} except that each
+byte in the file is handled separately, being converted into a
+raw-text character if needed.  It does not run
+@code{after-insert-file-functions}, and does not do format decoding,
+character code conversion, automatic uncompression, and so on.
 @end defun
 
 If you want to pass a file name to another process so that another


-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers.
  2021-10-03 15:04                         ` bug#50946: insert-file-contents can corrupt buffers Alan Mackenzie
@ 2021-10-03 15:25                           ` Eli Zaretskii
  2021-10-03 17:21                             ` Alan Mackenzie
  0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 15:25 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sun, 3 Oct 2021 15:04:27 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> Here is an updated patch, superseding my patch from midday.  I have
> amended the descriptions of the two functions, replacing "corruption" of
> the buffer by "inserting raw-text characters" in the first function, and
> added explanation to the second.

Thanks, see below some comments.

> I wasn't able to find a suitable target for a cross-reference explaining
> "raw-text".

I think "Coding System Basics" is where we describe that encoding.

> --- a/doc/lispref/files.texi
> +++ b/doc/lispref/files.texi
> @@ -556,14 +556,18 @@ Reading from Files
>  
>  If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
>  that are byte offsets specifying the portion of the file to insert.
> -In this case, @var{visit} must be @code{nil}.  For example,
> +In this case, @var{visit} must be @code{nil}.  Be careful to ensure
> +that these byte positions are at character boundaries.  Otherwise,
> +Emacs's character code conversion will insert one or more raw-text
> +characters into the buffer, which is probably not what you want.  For

This isn't the whole story.  The problem is mainly with the
autodetection of encoding: it can go awry if you give it only a
portion of the file.  But if you bind coding-system-for-read, that
problem goes away, and the only effect of using BEG and END arguments
is limited to the first character/byte read.  In particular, if you
read a file in chunks, the character at the boundary could end up as 2
or more raw bytes -- but as long as you bind coding-system-for-read,
no other parts are supposed to be affected.  And the problematic
sequence of raw bytes can then be converted back to the original
character with very simple Lisp.

So the text you propose is too "frightening", in that it basically
says "don't use that".  Which is too tough, because valid use cases to
use that feature do exist, and if the programmer knows what he/she is
doing it doesn't have to produce garbled buffers.  For the manual, we
need more informative text, which mentions coding-system-for-read.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 12:40                       ` Eli Zaretskii
  2021-10-03 13:33                         ` Alan Mackenzie
  2021-10-03 15:04                         ` bug#50946: insert-file-contents can corrupt buffers Alan Mackenzie
@ 2021-10-03 15:34                         ` João Távora
  2021-10-03 15:42                           ` João Távora
  2021-10-03 15:56                           ` Eli Zaretskii
  2 siblings, 2 replies; 37+ messages in thread
From: João Távora @ 2021-10-03 15:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

Eli Zaretskii <eliz@gnu.org> writes:

> João, why didn't you simply insert
>
>   (alist-get 'elisp-shorthands (hack-local-variables--find-variables))
>
> in load-with-code-conversion, immediately after it calls
> insert-file-contents?  Are there any problems with that, and if so,
> what are they?

There shouldn't be any big problems.  As I said, I think that is
cleaner.  However, doing it "from the outside" is safer (except for
these insert-file-contents bugs/edge cases, which frankly escape me).

Your suggestion has a very minor problem, that you're doing this stuff
in lisp/international/mule.el, which slightly icky.

A bigger problem is that hack-local-variables--find-variables isn't
defined at that point and the function will then be used to load
lisp/files.el itself (which happens to be where h-l-v--f-v is defined).

So either we change file loading order -- a bit scary -- or we setup
some kind of indirection (a hook, a variable).  I tried with a hook but
it doesn't work: it breaks a shorthands test, a most basic one.  Maybe
you can understand where the problem is?  It takes me a while to debug
cause I take this stuff in loadup.el to always need a 'make bootstrap'
after any changes.

I attach the patch I used.

(BTW this is after the agreed renaming to read-symbol-shorthands, which
I just pushed)

João

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 2a855b5673..be6d397f79 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -294,6 +294,9 @@ define-charset
 
     (apply 'define-charset-internal name (mapcar 'cdr attrs))))
 
+(defvar load-with-code-conversion-hook nil
+  "Hook run in `load-with-code-conversion'.")
+
 (defun load-with-code-conversion (fullname file &optional noerror nomessage)
   "Execute a file of Lisp code named FILE whose absolute name is FULLNAME.
 The file contents are decoded before evaluation if necessary.
@@ -328,6 +331,7 @@ load-with-code-conversion
 	      ;; Don't let deactivate-mark remain set.
 	      (let (deactivate-mark)
 		(insert-file-contents fullname))
+              (run-hooks 'load-with-code-conversion-hook)
 	      ;; If the loaded file was inserted with no-conversion or
 	      ;; raw-text coding system, make the buffer unibyte.
 	      ;; Otherwise, eval-buffer might try to interpret random
diff --git a/lisp/loadup.el b/lisp/loadup.el
index 3fb6b81328..3a55d2c805 100644
--- a/lisp/loadup.el
+++ b/lisp/loadup.el
@@ -355,7 +355,6 @@
 (load "paren")
 
 (load "shorthands")
-(setq load-source-file-function #'load-with-shorthands-and-code-conversion)
 
 (load "emacs-lisp/eldoc")
 (load "cus-start") ;Late to reduce customize-rogue (needs loaddefs.el anyway)
diff --git a/lisp/shorthands.el b/lisp/shorthands.el
index c31ef3d216..ecf04ac587 100644
--- a/lisp/shorthands.el
+++ b/lisp/shorthands.el
@@ -28,35 +28,17 @@
 (require 'files)
 (eval-when-compile (require 'cl-lib))
 
-(defun hack-read-symbol-shorthands (fullname)
-  "Return value of `read-symbol-shorthands' file-local variable in FULLNAME.
-FULLNAME is the absolute file name of an Elisp .el file which
-potentially specifies a file-local value for
-`read-symbol-shorthands'.  The Elisp code in FULLNAME isn't read
-or evaluated in any way, except for extraction of the
-buffer-local value of `read-symbol-shorthands'."
-  (let* ((size (nth 7 (file-attributes fullname)))
-         (from (max 0 (- size 3000)))
-         (to size))
-    (with-temp-buffer
-      (while (and (< (buffer-size) 3000) (>= from 0))
-        (insert-file-contents fullname nil from to)
-        (setq to from
-              from (cond
-                    ((= from 0) -1)
-                    (t (max 0 (- from 100))))))
-      ;; FIXME: relies on the `hack-local-variables--find-variables'
-      ;; detail of files.el.  That function should be exported,
-      ;; possibly be refactored into two parts, since we're only
-      ;; interested in basic "Local Variables" parsing.
-      (alist-get 'read-symbol-shorthands (hack-local-variables--find-variables)))))
-
-(defun load-with-shorthands-and-code-conversion (fullname file noerror nomessage)
-  "Like `load-with-code-conversion', but also consider Elisp shorthands.
-This function uses shorthands defined in the file FULLNAME's local
-value of `read-symbol-shorthands', when it processes that file's Elisp code."
-  (let ((read-symbol-shorthands (hack-read-symbol-shorthands fullname)))
-    (load-with-code-conversion fullname file noerror nomessage)))
+(add-hook 'load-with-code-conversion-hook #'hack-read-symbol-shorthands)
+
+(defun hack-read-symbol-shorthands ()
+  "Set `read-symbol-shorthands' from Local Variables section."
+  ;; FIXME: relies on the `hack-local-variables--find-variables'
+  ;; detail of files.el.  That function should be exported,
+  ;; possibly be refactored into two parts, since we're only
+  ;; interested in basic "Local Variables" parsing.
+  (setq-local read-symbol-shorthands
+              (alist-get 'read-symbol-shorthands
+                         (hack-local-variables--find-variables))))
 
 \f
 ;; FIXME: move this all to progmodes/elisp-mode.el?  OTOH it'd make





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 15:34                         ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] João Távora
@ 2021-10-03 15:42                           ` João Távora
  2021-10-03 15:56                           ` Eli Zaretskii
  1 sibling, 0 replies; 37+ messages in thread
From: João Távora @ 2021-10-03 15:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

João Távora <joaotavora@gmail.com> writes:

> I tried with a hook but it doesn't work: it breaks a shorthands test,
> a most basic one.  Maybe you can understand where the problem is?

As usual, digging a bit after posting the broken patch brought me to an
apparent solution.  The following patch seems to work, slightly
complexifies load-with-code-conversion, but simplifies shorthands.el
very much (and should solve the problems).  But please have a look.

João

commit 9bc76f971597d42ba0c8b4655501223d03181b3b
Author: João Távora <joaotavora@gmail.com>
Date:   Sun Oct 3 16:05:40 2021 +0100

    Simplify hack-read-symbol-shorthands again (bug#50946)
    
    * lisp/loadup.el (load-source-file-function): Don't set twice.
    
    * lisp/shorthands.el (hack-read-symbol-shorthands): Simplify.
    (load-with-shorthands-and-code-conversion): Remove.
    
    * lisp/international/mule.el (load-with-code-conversion): Call
    load-with-code-coversion-hook.  Set up shorthands.

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 2a855b5673..b612ea5c77 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -294,6 +294,9 @@ define-charset
 
     (apply 'define-charset-internal name (mapcar 'cdr attrs))))
 
+(defvar load-with-code-conversion-hook nil
+  "Hook run in `load-with-code-conversion'.")
+
 (defun load-with-code-conversion (fullname file &optional noerror nomessage)
   "Execute a file of Lisp code named FILE whose absolute name is FULLNAME.
 The file contents are decoded before evaluation if necessary.
@@ -319,7 +322,8 @@ load-with-code-conversion
 	  (let ((load-true-file-name fullname)
                 (load-file-name fullname)
                 (set-auto-coding-for-load t)
-		(inhibit-file-name-operation nil))
+		(inhibit-file-name-operation nil)
+                shorthands)
 	    (with-current-buffer buffer
               ;; So that we don't get completely screwed if the
               ;; file is encoded in some complicated character set,
@@ -328,6 +332,8 @@ load-with-code-conversion
 	      ;; Don't let deactivate-mark remain set.
 	      (let (deactivate-mark)
 		(insert-file-contents fullname))
+              (run-hooks 'load-with-code-conversion-hook)
+              (setq shorthands read-symbol-shorthands)
 	      ;; If the loaded file was inserted with no-conversion or
 	      ;; raw-text coding system, make the buffer unibyte.
 	      ;; Otherwise, eval-buffer might try to interpret random
@@ -338,11 +344,13 @@ load-with-code-conversion
 		  (set-buffer-multibyte nil))
 	      ;; Make `kill-buffer' quiet.
 	      (set-buffer-modified-p nil))
-	    ;; Have the original buffer current while we eval.
-	    (eval-buffer buffer nil
-			 ;; This is compatible with what `load' does.
-                         (if dump-mode file fullname)
-			 nil t))
+	    ;; Have the original buffer current while we eval,
+            ;; but consider shorthands of the eval'ed one.
+	    (let ((read-symbol-shorthands shorthands))
+              (eval-buffer buffer nil
+			   ;; This is compatible with what `load' does.
+                           (if dump-mode file fullname)
+			   nil t)))
 	(let (kill-buffer-hook kill-buffer-query-functions)
 	  (kill-buffer buffer)))
       (do-after-load-evaluation fullname)
diff --git a/lisp/loadup.el b/lisp/loadup.el
index 3fb6b81328..3a55d2c805 100644
--- a/lisp/loadup.el
+++ b/lisp/loadup.el
@@ -355,7 +355,6 @@
 (load "paren")
 
 (load "shorthands")
-(setq load-source-file-function #'load-with-shorthands-and-code-conversion)
 
 (load "emacs-lisp/eldoc")
 (load "cus-start") ;Late to reduce customize-rogue (needs loaddefs.el anyway)
diff --git a/lisp/shorthands.el b/lisp/shorthands.el
index c31ef3d216..ecf04ac587 100644
--- a/lisp/shorthands.el
+++ b/lisp/shorthands.el
@@ -28,35 +28,17 @@
 (require 'files)
 (eval-when-compile (require 'cl-lib))
 
-(defun hack-read-symbol-shorthands (fullname)
-  "Return value of `read-symbol-shorthands' file-local variable in FULLNAME.
-FULLNAME is the absolute file name of an Elisp .el file which
-potentially specifies a file-local value for
-`read-symbol-shorthands'.  The Elisp code in FULLNAME isn't read
-or evaluated in any way, except for extraction of the
-buffer-local value of `read-symbol-shorthands'."
-  (let* ((size (nth 7 (file-attributes fullname)))
-         (from (max 0 (- size 3000)))
-         (to size))
-    (with-temp-buffer
-      (while (and (< (buffer-size) 3000) (>= from 0))
-        (insert-file-contents fullname nil from to)
-        (setq to from
-              from (cond
-                    ((= from 0) -1)
-                    (t (max 0 (- from 100))))))
-      ;; FIXME: relies on the `hack-local-variables--find-variables'
-      ;; detail of files.el.  That function should be exported,
-      ;; possibly be refactored into two parts, since we're only
-      ;; interested in basic "Local Variables" parsing.
-      (alist-get 'read-symbol-shorthands (hack-local-variables--find-variables)))))
-
-(defun load-with-shorthands-and-code-conversion (fullname file noerror nomessage)
-  "Like `load-with-code-conversion', but also consider Elisp shorthands.
-This function uses shorthands defined in the file FULLNAME's local
-value of `read-symbol-shorthands', when it processes that file's Elisp code."
-  (let ((read-symbol-shorthands (hack-read-symbol-shorthands fullname)))
-    (load-with-code-conversion fullname file noerror nomessage)))
+(add-hook 'load-with-code-conversion-hook #'hack-read-symbol-shorthands)
+
+(defun hack-read-symbol-shorthands ()
+  "Set `read-symbol-shorthands' from Local Variables section."
+  ;; FIXME: relies on the `hack-local-variables--find-variables'
+  ;; detail of files.el.  That function should be exported,
+  ;; possibly be refactored into two parts, since we're only
+  ;; interested in basic "Local Variables" parsing.
+  (setq-local read-symbol-shorthands
+              (alist-get 'read-symbol-shorthands
+                         (hack-local-variables--find-variables))))
 
 \f
 ;; FIXME: move this all to progmodes/elisp-mode.el?  OTOH it'd make





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 15:34                         ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] João Távora
  2021-10-03 15:42                           ` João Távora
@ 2021-10-03 15:56                           ` Eli Zaretskii
  2021-10-03 16:02                             ` João Távora
  1 sibling, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 15:56 UTC (permalink / raw)
  To: João Távora; +Cc: 50946

> From: João Távora <joaotavora@gmail.com>
> Cc:  50946@debbugs.gnu.org
> Date: Sun, 03 Oct 2021 16:34:19 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > João, why didn't you simply insert
> >
> >   (alist-get 'elisp-shorthands (hack-local-variables--find-variables))
> >
> > in load-with-code-conversion, immediately after it calls
> > insert-file-contents?  Are there any problems with that, and if so,
> > what are they?
> 
> There shouldn't be any big problems.  As I said, I think that is
> cleaner.  However, doing it "from the outside" is safer (except for
> these insert-file-contents bugs/edge cases, which frankly escape me).
> 
> Your suggestion has a very minor problem, that you're doing this stuff
> in lisp/international/mule.el, which slightly icky.

Why "icky"?  The whole load-with-code-conversion stuff is defined
there, so that's a natural place for any things that need to be done
during loading, IMO.

> A bigger problem is that hack-local-variables--find-variables isn't
> defined at that point and the function will then be used to load
> lisp/files.el itself (which happens to be where h-l-v--f-v is defined).

You are talking about loadup?  We can easily condition the call by
that function's being fboundp, no?  Preloaded files, at least those
loaded before file.el, should not use shorthands, so bypassing the
call should not produce any problems.  Am I missing something?

> (BTW this is after the agreed renaming to read-symbol-shorthands, which
> I just pushed)

OK, thanks.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 15:56                           ` Eli Zaretskii
@ 2021-10-03 16:02                             ` João Távora
  2021-10-03 16:20                               ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-03 16:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

On Sun, Oct 3, 2021 at 4:56 PM Eli Zaretskii <eliz@gnu.org> wrote:

> Why "icky"?  The whole load-with-code-conversion stuff is defined
> there, so that's a natural place for any things that need to be done
> during loading, IMO.

"_Slightly_ icky", because that file is supposed to be about coding systems
and internationalisms, at least so I presumed.  So coupling it to a concept
such as shorthands (which is defined after) is _slightly_ icky. But not
the end of the world.

> > A bigger problem is that hack-local-variables--find-variables isn't
> > defined at that point and the function will then be used to load
> > lisp/files.el itself (which happens to be where h-l-v--f-v is defined).
>
> You are talking about loadup?  We can easily condition the call by
> that function's being fboundp, no?  Preloaded files, at least those
> loaded before file.el, should not use shorthands, so bypassing the
> call should not produce any problems.  Am I missing something?

No, fboundp also works, probably.  Do you prefer that to a hook?
A hook is, in theory, more powerful.

João





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 16:02                             ` João Távora
@ 2021-10-03 16:20                               ` Eli Zaretskii
  2021-10-03 17:05                                 ` João Távora
  0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 16:20 UTC (permalink / raw)
  To: João Távora; +Cc: 50946

> From: João Távora <joaotavora@gmail.com>
> Date: Sun, 3 Oct 2021 17:02:04 +0100
> Cc: 50946@debbugs.gnu.org
> 
> > > A bigger problem is that hack-local-variables--find-variables isn't
> > > defined at that point and the function will then be used to load
> > > lisp/files.el itself (which happens to be where h-l-v--f-v is defined).
> >
> > You are talking about loadup?  We can easily condition the call by
> > that function's being fboundp, no?  Preloaded files, at least those
> > loaded before file.el, should not use shorthands, so bypassing the
> > call should not produce any problems.  Am I missing something?
> 
> No, fboundp also works, probably.  Do you prefer that to a hook?
> A hook is, in theory, more powerful.

Well, using a hook in our own sources is IMO ... "icky".

So yes, I prefer the fboundp test.  And then shorthands.el will deal
with stuff other than loading, which I think is a Good Thing.

Thanks.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 16:20                               ` Eli Zaretskii
@ 2021-10-03 17:05                                 ` João Távora
  2021-10-03 17:56                                   ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-03 17:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

Eli Zaretskii <eliz@gnu.org> writes:


>> No, fboundp also works, probably.  Do you prefer that to a hook?
>> A hook is, in theory, more powerful.
> Well, using a hook in our own sources is IMO ... "icky".
> So yes, I prefer the fboundp test.  And then shorthands.el will deal

Hmm, I think we do use them a lot.  Or at least foo-function variables
for the same purpose.

Icky or not, the fboundp one isn't working, the hook one I gave you
earlier does.

With the fboundp, I get some recursive load error (not the first time).
I didn't investigate, maybe you can tell what's going on?  I am missing
sometehing obvious?

After my sig is the patch, followed by the error (search for "make
error".

João

The patch:

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 2a855b5673..11d344433f 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -319,7 +319,8 @@ load-with-code-conversion
 	  (let ((load-true-file-name fullname)
                 (load-file-name fullname)
                 (set-auto-coding-for-load t)
-		(inhibit-file-name-operation nil))
+		(inhibit-file-name-operation nil)
+                (shorthands))
 	    (with-current-buffer buffer
               ;; So that we don't get completely screwed if the
               ;; file is encoded in some complicated character set,
@@ -328,6 +329,10 @@ load-with-code-conversion
 	      ;; Don't let deactivate-mark remain set.
 	      (let (deactivate-mark)
 		(insert-file-contents fullname))
+              (setq shorthands
+                    (and (fboundp 'hack-local-variables--find-variables)
+                         (alist-get 'read-symbol-shorthands
+                                    (hack-local-variables--find-variables))))
 	      ;; If the loaded file was inserted with no-conversion or
 	      ;; raw-text coding system, make the buffer unibyte.
 	      ;; Otherwise, eval-buffer might try to interpret random
@@ -338,11 +343,13 @@ load-with-code-conversion
 		  (set-buffer-multibyte nil))
 	      ;; Make `kill-buffer' quiet.
 	      (set-buffer-modified-p nil))
-	    ;; Have the original buffer current while we eval.
-	    (eval-buffer buffer nil
-			 ;; This is compatible with what `load' does.
-                         (if dump-mode file fullname)
-			 nil t))
+	    ;; Have the original buffer current while we eval,
+            ;; but consider shorthands of the eval'ed one.
+	    (let ((read-symbol-shorthands shorthands))
+              (eval-buffer buffer nil
+			   ;; This is compatible with what `load' does.
+                           (if dump-mode file fullname)
+			   nil t)))
 	(let (kill-buffer-hook kill-buffer-query-functions)
 	  (kill-buffer buffer)))
       (do-after-load-evaluation fullname)
diff --git a/lisp/loadup.el b/lisp/loadup.el
index 3fb6b81328..3a55d2c805 100644
--- a/lisp/loadup.el
+++ b/lisp/loadup.el
@@ -355,7 +355,6 @@
 (load "paren")
 
 (load "shorthands")
-(setq load-source-file-function #'load-with-shorthands-and-code-conversion)
 
 (load "emacs-lisp/eldoc")
 (load "cus-start") ;Late to reduce customize-rogue (needs loaddefs.el anyway)
diff --git a/lisp/shorthands.el b/lisp/shorthands.el
index c31ef3d216..adbbf3713b 100644
--- a/lisp/shorthands.el
+++ b/lisp/shorthands.el
@@ -25,40 +25,8 @@
 ;; Basic helpers for loading files with Shorthands.
 
 ;;; Code:
-(require 'files)
 (eval-when-compile (require 'cl-lib))
 
-(defun hack-read-symbol-shorthands (fullname)
-  "Return value of `read-symbol-shorthands' file-local variable in FULLNAME.
-FULLNAME is the absolute file name of an Elisp .el file which
-potentially specifies a file-local value for
-`read-symbol-shorthands'.  The Elisp code in FULLNAME isn't read
-or evaluated in any way, except for extraction of the
-buffer-local value of `read-symbol-shorthands'."
-  (let* ((size (nth 7 (file-attributes fullname)))
-         (from (max 0 (- size 3000)))
-         (to size))
-    (with-temp-buffer
-      (while (and (< (buffer-size) 3000) (>= from 0))
-        (insert-file-contents fullname nil from to)
-        (setq to from
-              from (cond
-                    ((= from 0) -1)
-                    (t (max 0 (- from 100))))))
-      ;; FIXME: relies on the `hack-local-variables--find-variables'
-      ;; detail of files.el.  That function should be exported,
-      ;; possibly be refactored into two parts, since we're only
-      ;; interested in basic "Local Variables" parsing.
-      (alist-get 'read-symbol-shorthands (hack-local-variables--find-variables)))))
-
-(defun load-with-shorthands-and-code-conversion (fullname file noerror nomessage)
-  "Like `load-with-code-conversion', but also consider Elisp shorthands.
-This function uses shorthands defined in the file FULLNAME's local
-value of `read-symbol-shorthands', when it processes that file's Elisp code."
-  (let ((read-symbol-shorthands (hack-read-symbol-shorthands fullname)))
-    (load-with-code-conversion fullname file noerror nomessage)))
-
-\f
 ;; FIXME: move this all to progmodes/elisp-mode.el?  OTOH it'd make
 ;; more sense there, OTOH all the elisp font-lock stuff is actually in
 ;; lisp/emacs-lisp/lisp-mode.el, which isn't right either.  So


The make error I get:

make -C lib all
make[1]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/lib'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/lib'
make -C lib-src all
make[1]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/lib-src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/lib-src'
make -C src VCSWITNESS='$(srcdir)/../.git/logs/HEAD' BIN_DESTDIR=''/usr/local/bin/'' \
	 ELN_DESTDIR='/usr/local/lib/emacs/28.0.60/' all
make[1]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/src'
make -C ../admin/charsets all
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
make -C ../admin/unidata charscript.el
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make[2]: Nothing to be done for 'charscript.el'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make -C ../admin/unidata emoji-zwj.el
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make[2]: Nothing to be done for 'emoji-zwj.el'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make -C ../lisp autoloads EMACS="../src/bootstrap-emacs"
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/lisp'
make -C ../leim all EMACS="../src/bootstrap-emacs"
make[3]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/leim'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/leim'
make -C ../admin/grammars all EMACS="../../src/bootstrap-emacs"
make[3]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/grammars'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/grammars'
  GEN      loaddefs.el
  SCRAPE   . ./calc ./calendar ./cedet ./cedet/ede ./cedet/semantic ...
  SCRAPE   ./cedet/semantic/analyze ./cedet/semantic/bovine ...
  SCRAPE   ./cedet/semantic/decorate ./cedet/semantic/symref ...
  SCRAPE   ./cedet/semantic/wisent ./cedet/srecode ./emacs-lisp ./emulation ...
  SCRAPE   ./erc ./eshell ./gnus ./image ./international ./language ./leim ...
  SCRAPE   ./leim/ja-dic ./leim/quail ./mail ./mh-e ./net ./nxml ./org ...
  SCRAPE   ./play ./progmodes ./textmodes ./url ./vc
  INFO     Scraping files for loaddefs.el... 
  INFO     Scraping files for loaddefs.el...done
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/lisp'
make -C ../admin/unidata all EMACS="../../src/bootstrap-emacs"
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/unidata'
make -C ../admin/charsets cp51932.el
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
make[2]: Nothing to be done for 'cp51932.el'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
make -C ../admin/charsets eucjp-ms.el
make[2]: Entering directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
make[2]: Nothing to be done for 'eucjp-ms.el'.
make[2]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/admin/charsets'
LC_ALL=C ./temacs -batch  -l loadup --temacs=pdump \
	--bin-dest /usr/local/bin/ --eln-dest /usr/local/lib/emacs/28.0.60/
Loading loadup.el (source)...
Dump mode: pdump
Using load-path (/home/capitaomorte/Source/Emacs/emacs/lisp)
Loading emacs-lisp/byte-run...
Loading emacs-lisp/backquote...
Loading subr...
Loading version...
Loading widget...
Loading custom...
Loading emacs-lisp/map-ynp...
Loading international/mule...
Loading international/mule-conf...
Loading env...
Loading format...
Loading bindings...
Loading window...
Loading files...
Loading emacs-lisp/macroexp...
Loading cus-face...
Loading faces...
Loading loaddefs.el (source)...
Loading button...
Loading emacs-lisp/nadvice...
Loading emacs-lisp/cl-preloaded...
Loading obarray...
Loading abbrev...
Loading simple...
Loading help...
Loading jka-cmpr-hook...
Loading epa-hook...
Loading international/mule-cmds...
Loading case-table...
Loading international/charprop.el (source)...
Loading international/characters...
Recursive load: "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-special-lowercase.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-special-lowercase.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-special-lowercase.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-special-lowercase.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-special-lowercase.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/uni-bidi.el", "/home/capitaomorte/Source/Emacs/emacs/lisp/international/characters.elc", "/home/capitaomorte/Source/Emacs/emacs/lisp/loadup.el"
make[1]: *** [Makefile:587: emacs.pdmp] Error 255
make[1]: Leaving directory '/home/capitaomorte/Source/Emacs/emacs/src'
make: *** [Makefile:449: src] Error 2





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers.
  2021-10-03 15:25                           ` Eli Zaretskii
@ 2021-10-03 17:21                             ` Alan Mackenzie
  2021-10-03 17:36                               ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-03 17:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sun, Oct 03, 2021 at 18:25:57 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 15:04:27 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > Here is an updated patch, superseding my patch from midday.  I have
> > amended the descriptions of the two functions, replacing "corruption" of
> > the buffer by "inserting raw-text characters" in the first function, and
> > added explanation to the second.

> Thanks, see below some comments.

> > I wasn't able to find a suitable target for a cross-reference explaining
> > "raw-text".

> I think "Coding System Basics" is where we describe that encoding.

OK.

> > --- a/doc/lispref/files.texi
> > +++ b/doc/lispref/files.texi
> > @@ -556,14 +556,18 @@ Reading from Files
 
> >  If @var{beg} and @var{end} are non-@code{nil}, they should be numbers
> >  that are byte offsets specifying the portion of the file to insert.
> > -In this case, @var{visit} must be @code{nil}.  For example,
> > +In this case, @var{visit} must be @code{nil}.  Be careful to ensure
> > +that these byte positions are at character boundaries.  Otherwise,
> > +Emacs's character code conversion will insert one or more raw-text
> > +characters into the buffer, which is probably not what you want.  For

> This isn't the whole story.  The problem is mainly with the
> autodetection of encoding: it can go awry if you give it only a
> portion of the file.  But if you bind coding-system-for-read, that
> problem goes away, and the only effect of using BEG and END arguments
> is limited to the first character/byte read.  In particular, if you
> read a file in chunks, the character at the boundary could end up as 2
> or more raw bytes -- but as long as you bind coding-system-for-read,
> no other parts are supposed to be affected.  And the problematic
> sequence of raw bytes can then be converted back to the original
> character with very simple Lisp.

OK, I've learnt something new.  Thanks!

> So the text you propose is too "frightening", in that it basically
> says "don't use that".  Which is too tough, because valid use cases to
> use that feature do exist, and if the programmer knows what he/she is
> doing it doesn't have to produce garbled buffers.  For the manual, we
> need more informative text, which mentions coding-system-for-read.

OK, how about this third version of my patch?



diff --git a/doc/lispref/files.texi b/doc/lispref/files.texi
index 2dc808e694..e73f53b040 100644
--- a/doc/lispref/files.texi
+++ b/doc/lispref/files.texi
@@ -563,7 +563,17 @@ Reading from Files
 @end example
 
 @noindent
-inserts the first 500 characters of a file.
+inserts the characters coded by the first 500 bytes of a file.
+
+If @var{beg} or @var{end} fails to be at a character boundary, Emacs's
+character code conversion will insert one or more raw-text characters
+(@pxref{Coding System Basics}) into the buffer.  If you want to read
+part of a file this way, you are recommended to bind
+@code{coding-system-for-read} to a suitable value around the call to
+this function (@pxref{Specifying Coding Systems}), and to write Lisp
+code which will check for raw-text characters at the boundaries, read
+the rest of these characters from the file, and convert them back to
+valid characters.
 
 If the argument @var{replace} is non-@code{nil}, it means to replace the
 contents of the buffer (actually, just the accessible portion) with the
@@ -577,10 +587,11 @@ Reading from Files
 @end defun
 
 @defun insert-file-contents-literally filename &optional visit beg end replace
-This function works like @code{insert-file-contents} except that it
-does not run @code{after-insert-file-functions}, and does not do
-format decoding, character code conversion, automatic uncompression,
-and so on.
+This function works like @code{insert-file-contents} except that each
+byte in the file is handled separately, being converted into a
+raw-text character if needed.  It does not run
+@code{after-insert-file-functions}, and does not do format decoding,
+character code conversion, automatic uncompression, and so on.
 @end defun
 
 If you want to pass a file name to another process so that another


-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers.
  2021-10-03 17:21                             ` Alan Mackenzie
@ 2021-10-03 17:36                               ` Eli Zaretskii
  2021-10-03 18:19                                 ` Alan Mackenzie
  0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 17:36 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 50946, joaotavora

> Date: Sun, 3 Oct 2021 17:21:35 +0000
> Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > So the text you propose is too "frightening", in that it basically
> > says "don't use that".  Which is too tough, because valid use cases to
> > use that feature do exist, and if the programmer knows what he/she is
> > doing it doesn't have to produce garbled buffers.  For the manual, we
> > need more informative text, which mentions coding-system-for-read.
> 
> OK, how about this third version of my patch?

LGTM, thanks.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 17:05                                 ` João Távora
@ 2021-10-03 17:56                                   ` Eli Zaretskii
  2021-10-03 18:59                                     ` João Távora
  0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 17:56 UTC (permalink / raw)
  To: João Távora; +Cc: 50946

> From: João Távora <joaotavora@gmail.com>
> Cc: 50946@debbugs.gnu.org
> Date: Sun, 03 Oct 2021 18:05:33 +0100
> 
> Icky or not, the fboundp one isn't working, the hook one I gave you
> earlier does.
> 
> With the fboundp, I get some recursive load error (not the first time).
> I didn't investigate, maybe you can tell what's going on?  I am missing
> sometehing obvious?

Not entirely sure, but I don't think the fboundp test is the culprit.
The trigger is something else:

> --- a/lisp/loadup.el
> +++ b/lisp/loadup.el
> @@ -355,7 +355,6 @@
>  (load "paren")
>  
>  (load "shorthands")
> -(setq load-source-file-function #'load-with-shorthands-and-code-conversion)

Note that previously, the shorthand searching and application was
effectively turned off until very late into the loadup procedure.  But
now, we enable it as soon as files.el is loaded, which is way
earlier.  Somewhere there is the reason for the problem.

So I think, instead of the fboundp test, introduce a variable,
say inhibit-shorthands, set it to t at the beginning of loadup, then
reset to nil after shorthands.el has been loaded.  Then in mule.el
condition the call to hack-local-variables--find-variables on that new
variable instead.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers.
  2021-10-03 17:36                               ` Eli Zaretskii
@ 2021-10-03 18:19                                 ` Alan Mackenzie
  0 siblings, 0 replies; 37+ messages in thread
From: Alan Mackenzie @ 2021-10-03 18:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946, joaotavora

Hello, Eli.

On Sun, Oct 03, 2021 at 20:36:04 +0300, Eli Zaretskii wrote:
> > Date: Sun, 3 Oct 2021 17:21:35 +0000
> > Cc: joaotavora@gmail.com, 50946@debbugs.gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > So the text you propose is too "frightening", in that it basically
> > > says "don't use that".  Which is too tough, because valid use cases to
> > > use that feature do exist, and if the programmer knows what he/she is
> > > doing it doesn't have to produce garbled buffers.  For the manual, we
> > > need more informative text, which mentions coding-system-for-read.

> > OK, how about this third version of my patch?

> LGTM, thanks.

Thanks, I've committed it to the emacs-28 branch.

-- 
Alan Mackenzie (Nuremberg, Germany).





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 17:56                                   ` Eli Zaretskii
@ 2021-10-03 18:59                                     ` João Távora
  2021-10-03 19:51                                       ` Eli Zaretskii
  0 siblings, 1 reply; 37+ messages in thread
From: João Távora @ 2021-10-03 18:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

Eli Zaretskii <eliz@gnu.org> writes:

>> From: João Távora <joaotavora@gmail.com>
>> Cc: 50946@debbugs.gnu.org
>> Date: Sun, 03 Oct 2021 18:05:33 +0100

> Note that previously, the shorthand searching and application was
> effectively turned off until very late into the loadup procedure.  But
> now, we enable it as soon as files.el is loaded, which is way
> earlier.  Somewhere there is the reason for the problem.

Yes, I agree, this makes sense.

> So I think, instead of the fboundp test, introduce a variable,
> say inhibit-shorthands, set it to t at the beginning of loadup, then
> reset to nil after shorthands.el has been loaded.

At this point, I think that's slightly worse than introducing
hack-read-symbol-shorthands-function... or introducing a hook as I had
before.

Given you dislike hooks, the patch with
hack-read-symbol-shorthands-function is below.  Looks good?

João

commit d4416d7f2083bd55984193b94241edb76bb2879c
Author: João Távora <joaotavora@gmail.com>
Date:   Sun Oct 3 16:05:40 2021 +0100

    Simplify hack-read-symbol-shorthands again (bug#50946)
    
    * lisp/loadup.el (load-source-file-function): Don't set twice.
    
    * lisp/shorthands.el (hack-read-symbol-shorthands): Simplify.
    (load-with-shorthands-and-code-conversion): Remove.
    
    * lisp/international/mule.el (load-with-code-conversion): Call
    hack-read-symbol-shorthands-function.  Set up shorthands.
    (hack-read-symbol-shorthands-function): New variable
    
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 2a855b5673..d00e39d228 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -294,6 +294,9 @@ define-charset
 
     (apply 'define-charset-internal name (mapcar 'cdr attrs))))
 
+(defvar hack-read-symbol-shorthands-function nil
+  "Holds function to compute `read-symbol-shorthands'.")
+
 (defun load-with-code-conversion (fullname file &optional noerror nomessage)
   "Execute a file of Lisp code named FILE whose absolute name is FULLNAME.
 The file contents are decoded before evaluation if necessary.
@@ -319,7 +322,8 @@ load-with-code-conversion
 	  (let ((load-true-file-name fullname)
                 (load-file-name fullname)
                 (set-auto-coding-for-load t)
-		(inhibit-file-name-operation nil))
+		(inhibit-file-name-operation nil)
+                shorthands)
 	    (with-current-buffer buffer
               ;; So that we don't get completely screwed if the
               ;; file is encoded in some complicated character set,
@@ -328,6 +332,12 @@ load-with-code-conversion
 	      ;; Don't let deactivate-mark remain set.
 	      (let (deactivate-mark)
 		(insert-file-contents fullname))
+              (setq shorthands
+                    ;; We need this indirection because hacking local
+                    ;; variables in too early seems to have cause recursive
+                    ;; load loops (bug#50946).
+                    (and hack-read-symbol-shorthands-function
+                         (funcall hack-read-symbol-shorthands-function)))
 	      ;; If the loaded file was inserted with no-conversion or
 	      ;; raw-text coding system, make the buffer unibyte.
 	      ;; Otherwise, eval-buffer might try to interpret random
@@ -338,11 +348,13 @@ load-with-code-conversion
 		  (set-buffer-multibyte nil))
 	      ;; Make `kill-buffer' quiet.
 	      (set-buffer-modified-p nil))
-	    ;; Have the original buffer current while we eval.
-	    (eval-buffer buffer nil
-			 ;; This is compatible with what `load' does.
-                         (if dump-mode file fullname)
-			 nil t))
+	    ;; Have the original buffer current while we eval,
+            ;; but consider shorthands of the eval'ed one.
+	    (let ((read-symbol-shorthands shorthands))
+              (eval-buffer buffer nil
+			   ;; This is compatible with what `load' does.
+                           (if dump-mode file fullname)
+			   nil t)))
 	(let (kill-buffer-hook kill-buffer-query-functions)
 	  (kill-buffer buffer)))
       (do-after-load-evaluation fullname)
diff --git a/lisp/loadup.el b/lisp/loadup.el
index 3fb6b81328..3a55d2c805 100644
--- a/lisp/loadup.el
+++ b/lisp/loadup.el
@@ -355,7 +355,6 @@
 (load "paren")
 
 (load "shorthands")
-(setq load-source-file-function #'load-with-shorthands-and-code-conversion)
 
 (load "emacs-lisp/eldoc")
 (load "cus-start") ;Late to reduce customize-rogue (needs loaddefs.el anyway)
diff --git a/lisp/shorthands.el b/lisp/shorthands.el
index c31ef3d216..40a960ff7d 100644
--- a/lisp/shorthands.el
+++ b/lisp/shorthands.el
@@ -28,35 +28,15 @@
 (require 'files)
 (eval-when-compile (require 'cl-lib))
 
-(defun hack-read-symbol-shorthands (fullname)
-  "Return value of `read-symbol-shorthands' file-local variable in FULLNAME.
-FULLNAME is the absolute file name of an Elisp .el file which
-potentially specifies a file-local value for
-`read-symbol-shorthands'.  The Elisp code in FULLNAME isn't read
-or evaluated in any way, except for extraction of the
-buffer-local value of `read-symbol-shorthands'."
-  (let* ((size (nth 7 (file-attributes fullname)))
-         (from (max 0 (- size 3000)))
-         (to size))
-    (with-temp-buffer
-      (while (and (< (buffer-size) 3000) (>= from 0))
-        (insert-file-contents fullname nil from to)
-        (setq to from
-              from (cond
-                    ((= from 0) -1)
-                    (t (max 0 (- from 100))))))
-      ;; FIXME: relies on the `hack-local-variables--find-variables'
-      ;; detail of files.el.  That function should be exported,
-      ;; possibly be refactored into two parts, since we're only
-      ;; interested in basic "Local Variables" parsing.
-      (alist-get 'read-symbol-shorthands (hack-local-variables--find-variables)))))
-
-(defun load-with-shorthands-and-code-conversion (fullname file noerror nomessage)
-  "Like `load-with-code-conversion', but also consider Elisp shorthands.
-This function uses shorthands defined in the file FULLNAME's local
-value of `read-symbol-shorthands', when it processes that file's Elisp code."
-  (let ((read-symbol-shorthands (hack-read-symbol-shorthands fullname)))
-    (load-with-code-conversion fullname file noerror nomessage)))
+(defun hack-read-symbol-shorthands ()
+  "Compute `read-symbol-shorthands' from Local Variables section."
+  ;; FIXME: relies on the `hack-local-variables--find-variables'
+  ;; detail of files.el.  That function should be exported,
+  ;; possibly be refactored into two parts, since we're only
+  ;; interested in basic "Local Variables" parsing.
+  (alist-get 'read-symbol-shorthands (hack-local-variables--find-variables)))
+
+(setq hack-read-symbol-shorthands-function #'hack-read-symbol-shorthands)
 
 \f
 ;; FIXME: move this all to progmodes/elisp-mode.el?  OTOH it'd make





^ permalink raw reply related	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 18:59                                     ` João Távora
@ 2021-10-03 19:51                                       ` Eli Zaretskii
  2021-10-03 19:59                                         ` João Távora
  0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-03 19:51 UTC (permalink / raw)
  To: João Távora; +Cc: 50946

> From: João Távora <joaotavora@gmail.com>
> Cc: 50946@debbugs.gnu.org
> Date: Sun, 03 Oct 2021 19:59:24 +0100
> 
> > Note that previously, the shorthand searching and application was
> > effectively turned off until very late into the loadup procedure.  But
> > now, we enable it as soon as files.el is loaded, which is way
> > earlier.  Somewhere there is the reason for the problem.
> 
> Yes, I agree, this makes sense.
> 
> > So I think, instead of the fboundp test, introduce a variable,
> > say inhibit-shorthands, set it to t at the beginning of loadup, then
> > reset to nil after shorthands.el has been loaded.
> 
> At this point, I think that's slightly worse than introducing
> hack-read-symbol-shorthands-function... or introducing a hook as I had
> before.

Not at all: we have several inhibit-foo bits in loadup already, so one
more won't make any change.

> Given you dislike hooks, the patch with
> hack-read-symbol-shorthands-function is below.  Looks good?

I like it slightly less than my proposal, because it spreads the
arrangement over 3 files instead of just 2.  But if you want a hook so
badly, I won't fight.  Just put a comment in mule.el describing how we
arrange for this to start being in effect.

Thanks.





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands]
  2021-10-03 19:51                                       ` Eli Zaretskii
@ 2021-10-03 19:59                                         ` João Távora
  0 siblings, 0 replies; 37+ messages in thread
From: João Távora @ 2021-10-03 19:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50946

On Sun, Oct 3, 2021 at 8:51 PM Eli Zaretskii <eliz@gnu.org> wrote:

> > Given you dislike hooks, the patch with
> > hack-read-symbol-shorthands-function is below.  Looks good?
>
> I like it slightly less than my proposal, because it spreads the
> arrangement over 3 files instead of just 2.

No, you're not counting right, or I'm not. I count only 2 in my arrangement.

In my arrangement, loadup.el knows nothing about the agreement
between mule.el and shorthands.el. So it's the same, but without the
negative "inhibit" logic and mule.el knowing about a function in files.el,
which is loaded after it.

It's shorthands.el who knows about two things loaded before it: a variable
and a function, as it The Right Thing.  Which reminds me, shorthand.el
should (require 'mule)

>  But if you want a hook so badly, I won't fight.

I don't want it badly, I just think it's simpler and easier to reason about.
The version I propose uses a "-function" variable instead of a hook.

> Just put a comment in mule.el describing how we arrange for this to start
> being in effect.

I have, already.

João





^ permalink raw reply	[flat|nested] 37+ messages in thread

* bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands
  2021-10-02 15:02                 ` bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands João Távora
@ 2021-10-04  0:14                   ` Richard Stallman
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Stallman @ 2021-10-04  0:14 UTC (permalink / raw)
  To: João Távora; +Cc: 50946, monnier

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > You can reopen this bug if you want.  Eventually, hack-elisp-shorthands will
  > go into load-with-code-conversion or something just before it evaluates the
  > buffer (and thus calls 'read').

I think now is the time to do that.  hack-elisp-shorthands may have sufficed
to get this to be usable, but we should now clean up anything that needs it.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)







^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-10-04  0:14 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-10-01 17:10 bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands Alan Mackenzie
2021-10-01 17:53 ` Eli Zaretskii
2021-10-01 21:15   ` João Távora
2021-10-02  6:10     ` Eli Zaretskii
2021-10-02  0:48   ` João Távora
2021-10-02 10:50     ` Alan Mackenzie
2021-10-02 11:13       ` João Távora
2021-10-02 11:38       ` João Távora
2021-10-02 12:38         ` Alan Mackenzie
2021-10-02 12:52           ` Eli Zaretskii
2021-10-02 13:57             ` Alan Mackenzie
2021-10-02 14:19               ` Eli Zaretskii
2021-10-02 14:45                 ` Alan Mackenzie
2021-10-02 15:00                   ` Eli Zaretskii
2021-10-02 20:07                     ` Alan Mackenzie
2021-10-03 11:45                       ` Eli Zaretskii
2021-10-03 12:10                     ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] Alan Mackenzie
2021-10-03 12:40                       ` Eli Zaretskii
2021-10-03 13:33                         ` Alan Mackenzie
2021-10-03 15:04                         ` bug#50946: insert-file-contents can corrupt buffers Alan Mackenzie
2021-10-03 15:25                           ` Eli Zaretskii
2021-10-03 17:21                             ` Alan Mackenzie
2021-10-03 17:36                               ` Eli Zaretskii
2021-10-03 18:19                                 ` Alan Mackenzie
2021-10-03 15:34                         ` bug#50946: insert-file-contents can corrupt buffers. [Was: bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands] João Távora
2021-10-03 15:42                           ` João Távora
2021-10-03 15:56                           ` Eli Zaretskii
2021-10-03 16:02                             ` João Távora
2021-10-03 16:20                               ` Eli Zaretskii
2021-10-03 17:05                                 ` João Távora
2021-10-03 17:56                                   ` Eli Zaretskii
2021-10-03 18:59                                     ` João Távora
2021-10-03 19:51                                       ` Eli Zaretskii
2021-10-03 19:59                                         ` João Távora
2021-10-02 15:02                 ` bug#50946: Emacs-28: Inadequate coding in hack-elisp-shorthands João Távora
2021-10-04  0:14                   ` Richard Stallman
2021-10-02 14:47           ` João Távora

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).