Re: Emacs-devel Digest, Vol 246, Issue 17

unofficial mirror of emacs-devel@gnu.org 

* Re: Emacs-devel Digest, Vol 246, Issue 17
       [not found] <mailman.39.1723910423.12184.emacs-devel@gnu.org>
@ 2024-08-17 22:49 ` ali_gnu2
  2024-08-18  0:10   ` Po Lu
  0 siblings, 1 reply; 137+ messages in thread
From: ali_gnu2 @ 2024-08-17 22:49 UTC
  To: emacs-devel

On 8/17/24 10:00 AM, emacs-devel-request@gnu.org wrote:
>> Moreover, whatever becomes of the portable ELF unexec should
>> not affect the Solaris unexec, which is provided by the operating system
>> and should function without the likes of gmalloc.
> AFAIK, the portable dumper is the default also on Solaris, so there is
> no need to keep the unexec build around just for that platform.

    The Solaris code that does that is called dldump() and was
invented years ago (~25 years?) to support emacs. We used to
get occasional bug reports about emacs not dumping from time
to time, and dldump() put an end to that.

I'm the person who maintains that code in Solaris, and also the
person who packages Emacs for our platform. We stopped using the
unexec code the moment the portable dumper arrived, and haven't
looked back. I don't think we'd even notice if unexec() went away.

There are open source variants of Solaris for whom I don't
speak, but from what I know about our common code, they should
not be any more stuck on unexec() than we are. pdumper really
doesn't use any unix features that didn't exist decades ago.

Thanks for caring, but don't let us slow this down. The portable
dumper is The Way.

- Ali

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Emacs-devel Digest, Vol 246, Issue 17
  2024-08-17 22:49 ` Emacs-devel Digest, Vol 246, Issue 17 ali_gnu2
@ 2024-08-18  0:10   ` Po Lu
  2024-08-18  0:19     ` Po Lu
                       ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Po Lu @ 2024-08-18  0:10 UTC
  To: ali_gnu2; +Cc: emacs-devel

ali_gnu2@emvision.com writes:

>    The Solaris code that does that is called dldump() and was
> invented years ago (~25 years?) to support emacs. We used to
> get occasional bug reports about emacs not dumping from time
> to time, and dldump() put an end to that.
>
> I'm the person who maintains that code in Solaris, and also the
> person who packages Emacs for our platform. We stopped using the
> unexec code the moment the portable dumper arrived, and haven't
> looked back. I don't think we'd even notice if unexec() went away.
>
> Thanks for caring, but don't let us slow this down. The portable
> dumper is The Way.
>
> - Ali

Hello Ali!

I think you underestimate the number of programs using dldump.  I've
seen both Perl 5 and GNU Make hacked to save state with dldump, on
Oracle Solaris, producing binaries that don't depend on the presence of
a state file and probably start faster as well.  Meanwhile
pdumper-dumped binaries appear to crash in an x86 Solaris 10 zone,
though I don't really use this configuration and I'm not interested in
trying the portable dumper on sparc:

core 'core' of 7021:        ../../src/bootstrap-emacs -batch --no-site-file --no-site-lisp -f batc
 00007fffaf433dc2 ???????? ()
 00007fffaf5eb3d7 ???????? ()
 00007fffaf5ec590 ???????? ()
 00007fffae3f351a _lwp_kill () + a
 00007fffae3981b9 raise () + 19
 00000000008baf90 terminate_due_to_signal () + c0
 000000000090236e ???????? ()
 0000000000902334 deliver_thread_signal () + 74
 00000000009023b0 deliver_fatal_thread_signal () + 10
 00000000009024ef handle_sigsegv () + 4f
 00007fffae3edd16 __sighndlr () + 6
 00007fffae3e25e2 call_user_handler () + 252
 00007fffae3e280e sigacthandler () + ee
 00007fffaf5ea82d ???????? ()
 ffffffffffffffff ???????? ()
 00000000009c77e7 lisp_align_malloc () + 4d7
 00000000009c9dd2 make_float () + 42
 00000000009d2e9d init_alloc () + d
 00000000008bd373 main () + bb3
 00000000006d15ab ???????? ()

> There are open source variants of Solaris for whom I don't
> speak, but from what I know about our common code, they should
> not be any more stuck on unexec() than we are. pdumper really
> doesn't use any unix features that didn't exist decades ago.

I don't believe we try to support Illumos.  If Emacs should work, more
power to them, but they have bigger fish to fry when GCC exception
handling fails if an exception is raised the instant an object is
unmapped, prompting dl_iterate_phdr to return -1.




* Re: Emacs-devel Digest, Vol 246, Issue 17
  2024-08-18  0:10   ` Po Lu
@ 2024-08-18  0:19     ` Po Lu
  2024-08-18  1:15     ` Solaris dldump (was: Pure space) ali_gnu2
  2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
  2 siblings, 0 replies; 137+ messages in thread
From: Po Lu @ 2024-08-18  0:19 UTC
  To: ali_gnu2; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> I don't believe we try to support Illumos.  If Emacs should work, more
> power to them, but they have bigger fish to fry when GCC exception
> handling fails if an exception is raised the instant an object is
> unmapped, prompting dl_iterate_phdr to return -1.

As when an iconv_t is closed while an exception is raised in another
thread.




* Solaris dldump (was: Pure space)
  2024-08-18  0:10   ` Po Lu
  2024-08-18  0:19     ` Po Lu
@ 2024-08-18  1:15     ` ali_gnu2
  2024-08-18  1:25       ` Solaris dldump Po Lu
  2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 137+ messages in thread
From: ali_gnu2 @ 2024-08-18  1:15 UTC
  To: Po Lu; +Cc: emacs-devel

On 8/17/24 6:10 PM, Po Lu wrote:
> Hello Ali!
> 
> I think you underestimate the number of programs using dldump.  I've
> seen both Perl 5 and GNU Make hacked to save state with dldump, on
> Oracle Solaris, producing binaries that don't depend on the presence of
> a state file and probably start faster as well.  Meanwhile
> pdumper-dumped binaries appear to crash in an x86 Solaris 10 zone,
> though I don't really use this configuration and I'm not interested in
> trying the portable dumper on sparc:
> 
> core 'core' of 7021:        ../../src/bootstrap-emacs -batch --no-site-file --no-site-lisp -f batc
>   00007fffaf433dc2 ???????? ()
>   00007fffaf5eb3d7 ???????? ()
>   00007fffaf5ec590 ???????? ()
>   00007fffae3f351a _lwp_kill () + a
>   00007fffae3981b9 raise () + 19
>   00000000008baf90 terminate_due_to_signal () + c0
>   000000000090236e ???????? ()
>   0000000000902334 deliver_thread_signal () + 74
>   00000000009023b0 deliver_fatal_thread_signal () + 10
>   00000000009024ef handle_sigsegv () + 4f
>   00007fffae3edd16 __sighndlr () + 6
>   00007fffae3e25e2 call_user_handler () + 252
>   00007fffae3e280e sigacthandler () + ee
>   00007fffaf5ea82d ???????? ()
>   ffffffffffffffff ???????? ()
>   00000000009c77e7 lisp_align_malloc () + 4d7
>   00000000009c9dd2 make_float () + 42
>   00000000009d2e9d init_alloc () + d
>   00000000008bd373 main () + bb3
>   00000000006d15ab ???????? ()
> 

Hello!

    Is that stack from the s10 zone?

You're probably right that I don't know who is using
dldump(), outside of emacs, but not to worry, it's not
going away. It's a committed interface, so hard to remove,
and at the same time, isn't causing any problems. Nonetheless,
it's not our favored way to deploy emacs, and I wouldn't want
anyone to think we prefer its use, or require it.

We use pdumper on newer Solaris 11.4, both x86 and sparc,
with no reported issues. I wasn't aware of the Solaris 10
zone problems (haven't seen any reports). If you end up
looking at it, and think that the s10 zone is somehow at
fault, please feel free to contact me offline. However,
given that s10 is 20 years old, it wouldn't be unreasonable
to drop it off the support tail. From discussions with Rainer
Orth, who maintains gcc for Solaris, I believe that s10 support
for gcc has ended, or is very close to ending. My personal
opinion is that anyone happy to use a 20 year old OS should
have no problem using an older gcc, or emacs, so it's not
really the end of the road for those folks.

>> There are open source variants of Solaris for whom I don't
>> speak, but from what I know about our common code, they should
>> not be any more stuck on unexec() than we are. pdumper really
>> doesn't use any unix features that didn't exist decades ago.
> 
> I don't believe we try to support Illumos.  If Emacs should work, more
> power to them, but they have bigger fish to fry when GCC exception
> handling fails if an exception is raised the instant an object is
> unmapped, prompting dl_iterate_phdr to return -1.

I expect that we're both benefiting from your work anyway.
Isn't emacs still largely C (not C++)? I wouldn't expect
exception handling to be needed, so maybe it's OK.

I do know that dl_iterate_phdr() is a relatively recent
addition for us, and was done after the split, so that
is a case where the code is not common. No doubt the fix
for Illumos would not be difficult, if/when they get to it.

Thanks!

- Ali


* Re: Solaris dldump
  2024-08-18  1:15     ` Solaris dldump (was: Pure space) ali_gnu2
@ 2024-08-18  1:25       ` Po Lu
  2024-08-18 22:27         ` Stefan Kangas
  0 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-08-18  1:25 UTC
  To: ali_gnu2; +Cc: emacs-devel

ali_gnu2@emvision.com writes:

> On 8/17/24 6:10 PM, Po Lu wrote:
>> Hello Ali!
>> I think you underestimate the number of programs using dldump.  I've
>> seen both Perl 5 and GNU Make hacked to save state with dldump, on
>> Oracle Solaris, producing binaries that don't depend on the presence of
>> a state file and probably start faster as well.  Meanwhile
>> pdumper-dumped binaries appear to crash in an x86 Solaris 10 zone,
>> though I don't really use this configuration and I'm not interested in
>> trying the portable dumper on sparc:
>> core 'core' of 7021:        ../../src/bootstrap-emacs -batch
>> --no-site-file --no-site-lisp -f batc
>>   00007fffaf433dc2 ???????? ()
>>   00007fffaf5eb3d7 ???????? ()
>>   00007fffaf5ec590 ???????? ()
>>   00007fffae3f351a _lwp_kill () + a
>>   00007fffae3981b9 raise () + 19
>>   00000000008baf90 terminate_due_to_signal () + c0
>>   000000000090236e ???????? ()
>>   0000000000902334 deliver_thread_signal () + 74
>>   00000000009023b0 deliver_fatal_thread_signal () + 10
>>   00000000009024ef handle_sigsegv () + 4f
>>   00007fffae3edd16 __sighndlr () + 6
>>   00007fffae3e25e2 call_user_handler () + 252
>>   00007fffae3e280e sigacthandler () + ee
>>   00007fffaf5ea82d ???????? ()
>>   ffffffffffffffff ???????? ()
>>   00000000009c77e7 lisp_align_malloc () + 4d7
>>   00000000009c9dd2 make_float () + 42
>>   00000000009d2e9d init_alloc () + d
>>   00000000008bd373 main () + bb3
>>   00000000006d15ab ???????? ()
>> 
>
> Hello!
>
>    Is that stack from the s10 zone?

Yes.

> You're probably right that I don't know who is using
> dldump(), outside of emacs, but not to worry, it's not
> going away. It's a committed interface, so hard to remove,
> and at the same time, isn't causing any problems. Nonetheless,
> it's not our favored way to deploy emacs, and I wouldn't want
> anyone to think we prefer its use, or require it.
>
> We use pdumper on newer Solaris 11.4, both x86 and sparc,
> with no reported issues. I wasn't aware of the Solaris 10
> zone problems (haven't seen any reports). If you end up
> looking at it

I plan to, but not till Emacs 30 is released.

> and think that the s10 zone is somehow at fault, please feel free to
> contact me offline. However, given that s10 is 20 years old, it
> wouldn't be unreasonable to drop it off the support tail. From
> discussions with Rainer Orth, who maintains gcc for Solaris, I believe
> that s10 support for gcc has ended, or is very close to ending. My
> personal opinion is that anyone happy to use a 20 year old OS should
> have no problem using an older gcc, or emacs, so it's not really the
> end of the road for those folks.

I'm fine with using an older C compiler (whether GCC or no), but we have
plenty of precedent in these quarters for remaining on decades-old
operating systems.  Not least when the operating system is to be
supported for two more years.

> I expect that we're both benefiting from your work anyway.
> Isn't emacs still largely C (not C++)? I wouldn't expect
> exception handling to be needed, so maybe it's OK.

C, but several libraries draw in C++ dependencies and others create
threads: HarfBuzz and librsvg for example.




* Re: Solaris dldump
  2024-08-18  1:25       ` Solaris dldump Po Lu
@ 2024-08-18 22:27         ` Stefan Kangas
  2024-08-18 23:56           ` Po Lu
  0 siblings, 1 reply; 137+ messages in thread
From: Stefan Kangas @ 2024-08-18 22:27 UTC
  To: Po Lu, ali_gnu2; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> ali_gnu2@emvision.com writes:
>
>> [...] However, given that s10 is 20 years old, it
>> wouldn't be unreasonable to drop it off the support tail. From
>> discussions with Rainer Orth, who maintains gcc for Solaris, I believe
>> that s10 support for gcc has ended, or is very close to ending. My
>> personal opinion is that anyone happy to use a 20 year old OS should
>> have no problem using an older gcc, or emacs, so it's not really the
>> end of the road for those folks.

Thank you for sharing your informed opinion.  I also can't see why we
should consider the 20 year old Solaris 10 a blocker for removing the
unexec build in Emacs 31.

For example, even according to current Oracle communications, it will
reach EOL in around two years.  This means that by the time we release
Emacs 31, users will already be busy moving to Solaris 11.  If they
aren't, they're fine on Emacs 30, or they can help us fix pdumper.

> I'm fine with using an older C compiler (whether GCC or no), but we have
> plenty of precedent in these quarters for remaining on decades-old
> operating systems.  Not least when the operating system is to be
> supported for two more years.

If there is interest in that very old proprietary system, and there is
some problem with using pdumper there, then users should report bugs and
volunteers should step up to fix them.


* Re: Solaris dldump
  2024-08-18 22:27         ` Stefan Kangas
@ 2024-08-18 23:56           ` Po Lu
  2024-08-19 11:18             ` Eli Zaretskii
                               ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Po Lu @ 2024-08-18 23:56 UTC
  To: Stefan Kangas; +Cc: ali_gnu2, emacs-devel

Stefan Kangas <stefankangas@gmail.com> writes:

> Thank you for sharing your informed opinion.  I also can't see why we
> should consider the 20 year old Solaris 10 a blocker for removing the
> unexec build in Emacs 31.
>
> For example, even according to current Oracle communications, it will
> reach EOL in around two years.

"Even" implies that it will reach EOL sooner, but by all indications the
EOL date will be as stated, if it is not postponed any further, and
Oracle and related organizations will continue to support the operating
system at a reduced intensity indefinitely.  Why do you suppose this is,
if otherwise than because the operating system is abundantly used?

Fedora 40's remaining support period is shorter; should we not cease to
support it any longer, in view of the one or two crashes in the PGTK
configuration that can only be reproduced with the distribution
packages, and which continue to languish on the bug tracker?

> This means that by the time we release Emacs 31, users will already be
> busy moving to Solaris 11

This is nonsense.  It's impossible to upgrade installed Solaris 10
systems to Solaris 11, and being a robust system many users are content
to remain there till hell freezes over.  

> If they aren't, they're fine on Emacs 30, or they can help us fix
> pdumper.

No one is ever "fine on" an outdated text editor.  You agree that this
principle applies to operating systems, but when Emacs is in question,
the about-face comes very quickly.

It's a waste of my time (and my organization's) that would be totally
needless if you were not so trigger-happy with old and proven features.
It's a-ok to retain pure space to avoid burdening someone with very
hypothetical additional labor, but it's not possible to take a far less
radical measure to conserve my time.  In any event, I promised to devote
some of it to this issue after Emacs 30 is released.

> If there is interest in that very old proprietary system, and there is
> some problem with using pdumper there, then users should report bugs and
> volunteers should step up to fix them.

According to Microsoft, Windows XP reached EOL in 2014, and yet its
users are none the less inclined to the latest releases of Emacs (nor
has it been prevented from retaining 0.38% of Windows's aggregate market
share, in excess of Windows 8's 0.24%):

  https://gs.statcounter.com/os-version-market-share/windows/desktop/worldwide.


* Re: Solaris dldump
  2024-08-18 23:56           ` Po Lu
@ 2024-08-19 11:18             ` Eli Zaretskii
  2024-08-19 12:09               ` Po Lu
  2024-08-19 11:44             ` Pip Cet
  2024-08-19 20:35             ` Stefan Kangas
  2 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-08-19 11:18 UTC
  To: Po Lu; +Cc: stefankangas, ali_gnu2, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Mon, 19 Aug 2024 07:56:13 +0800
> 
> Stefan Kangas <stefankangas@gmail.com> writes:
> 
> > Thank you for sharing your informed opinion.  I also can't see why we
> > should consider the 20 year old Solaris 10 a blocker for removing the
> > unexec build in Emacs 31.
> >
> > For example, even according to current Oracle communications, it will
> > reach EOL in around two years.
> 
> "Even" implies that it will reach EOL sooner, but by all indications the
> EOL date will be as stated, if it is not postponed any further, and
> Oracle and related organizations will continue to support the operating
> system at a reduced intensity indefinitely.  Why do you suppose this is,
> if otherwise than because the operating system is abundantly used?
> 
> Fedora 40's remaining support period is shorter; should we not cease to
> support it any longer, in view of the one or two crashes in the PGTK
> configuration that can only be reproduced with the distribution
> packages, and which continue to languish on the bug tracker?

These aspects are almost unrelated to the issue at hand: we don't make
our decisions of dropping support of some platform or feature because
it is EOLed by its vendor or developers.  Instead, we make our own
decisions, and in general try not to drop any feature/platform if we
don't have to.

In this case, keeping the support of unexec longer becomes a
maintenance burden (just look at the #ifdef mess it requires), and
that is the reason why we think we should drop those platforms that
don't currently support pdumper.  The fact that all those platforms
are either very old or have better alternatives is just a supporting
consideration, not the main reason.

> It's a waste of my time (and my organization's) that would be totally
> needless if you were not so trigger-happy with old and proven features.

We are very far from being "trigger-happy" in these matters.  In fact,
we are often accused of the opposite.  E.g., Gnulib dropped support
for some of these platforms long ago, and couldn't be convinced to
reconsider, even when told that Emacs needs that continued support.

So what you say above is completely uncalled-for and unfair.

> It's a-ok to retain pure space to avoid burdening someone with very
> hypothetical additional labor, but it's not possible to take a far less
> radical measure to conserve my time.  In any event, I promised to devote
> some of it to this issue after Emacs 30 is released.

If you intend to work on modifying the unexec code to not use pure
space, don't waste your time: I will object to any serious development
of the unexec code.  The only way forward for the platforms that
currently need unexec is to start using pdumper.

> > If there is interest in that very old proprietary system, and there is
> > some problem with using pdumper there, then users should report bugs and
> > volunteers should step up to fix them.
> 
> According to Microsoft, Windows XP reached EOL in 2014, and yet its
> users are none the less inclined to the latest releases of Emacs (nor
> has it been prevented from retaining 0.38% of Windows's aggregate market
> share, in excess of Windows 8's 0.24%):
> 
>   https://gs.statcounter.com/os-version-market-share/windows/desktop/worldwide.

Once again, it is immaterial when a platform was EOLed.  That is not
the reason why we want to drop unexec.


* Re: Solaris dldump
  2024-08-19 11:18             ` Eli Zaretskii
@ 2024-08-19 12:09               ` Po Lu
  2024-08-19 12:50                 ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-08-19 12:09 UTC
  To: Eli Zaretskii; +Cc: stefankangas, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> In this case, keeping the support of unexec longer becomes a
> maintenance burden (just look at the #ifdef mess it requires), and
> that is the reason why we think we should drop those platforms that
> don't currently support pdumper.  The fact that all those platforms
> are either very old or have better alternatives is just a supporting
> consideration, not the main reason.

You mean the 35 instances of "HAVE_UNEXEC" in C source files, not
excepting the "HAVE_PDUMPER || HAVE_UNEXEC" conditions, or the malloc
and Gnulib flags that aren't necessary on unexsol?  I would be as glad
as you to see most of them removed, as they are not significant on the
systems where unexec should be retained.
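
A count like the one above can be reproduced mechanically.  What follows
is only a rough sketch in Python, not a script from the Emacs tree: it
counts preprocessor conditional lines that mention a given macro.  The
sample text, including prepare_dump, dumping and HAVE_UNEXEC_EXTRA, is
invented for illustration; real figures would come from running it over
the files under src/.

```python
import re

# Matches preprocessor conditionals such as "#ifdef X" or "# if defined X".
_CONDITIONAL = re.compile(r"^\s*#\s*(?:if|ifdef|ifndef|elif)\b(.*)$")


def count_conditionals(source, macro):
    """Count preprocessor conditional lines in SOURCE that mention MACRO."""
    # Word boundaries keep HAVE_UNEXEC from matching HAVE_UNEXEC_EXTRA.
    word = re.compile(r"\b" + re.escape(macro) + r"\b")
    total = 0
    for line in source.splitlines():
        match = _CONDITIONAL.match(line)
        if match and word.search(match.group(1)):
            total += 1
    return total


# Hypothetical sample input, not taken from the Emacs sources.
SAMPLE = """\
#ifdef HAVE_UNEXEC
static void prepare_dump (void);
#endif
#if defined HAVE_PDUMPER || defined HAVE_UNEXEC
int dumping;
#endif
#ifdef HAVE_UNEXEC_EXTRA
#endif
"""

print(count_conditionals(SAMPLE, "HAVE_UNEXEC"))  # prints 2
```

Counting conditional lines rather than raw occurrences is a design
choice of the sketch; a plain text search would also count comments and
macro definitions.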

> If you intend to work on modifying the unexec code to not use pure
> space, don't waste your time: I will object to any serious development
> of the unexec code.  The only way forward for the platforms that
> currently need unexec is to start using pdumper.

I need not modify the unexec code, or adapt it to configurations without
pure space, as there simply is no code to adapt.  unexsol.c works _now_
with or without pure space, and I would be immensely surprised if the
same were not true of DJGPP, and as it happens, whether in Emacs or
elsewhere.

> Once again, it is immaterial when a platform was EOLed.  That is not
> the reason why we want to drop unexec.

That's not what I heard just one message removed from mine, where being
two years from EOL was stated to be sufficient grounds to withdraw
support.


* Re: Solaris dldump
  2024-08-19 12:09               ` Po Lu
@ 2024-08-19 12:50                 ` Eli Zaretskii
  0 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-08-19 12:50 UTC
  To: Po Lu; +Cc: stefankangas, ali_gnu2, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: stefankangas@gmail.com,  ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Mon, 19 Aug 2024 20:09:36 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > In this case, keeping the support of unexec longer becomes a
> > maintenance burden (just look at the #ifdef mess it requires), and
> > that is the reason why we think we should drop those platforms that
> > don't currently support pdumper.  The fact that all those platforms
> > are either very old or have better alternatives is just a supporting
> > consideration, not the main reason.
> 
> You mean the 35 instances of "HAVE_UNEXEC" in C source files, not
> excepting the "HAVE_PDUMPER || HAVE_UNEXEC" conditions, or the malloc
> and Gnulib flags that aren't necessary on unexsol?

I mean all of them, and I also mean the need to understand the fine
details of unexec, the differences between it and pdumper mode, and
the reason for some tricky code we need for unexec.  Most of the current
frequent contributors to Emacs have no idea about that, and thus the
unexec build is very easy to break by some change that doesn't take it
into account.

> I would be as glad as you to see most of them removed, as they are
> not significant on the systems where unexec should be retained.

They are necessary.

> > Once again, it is immaterial when a platform was EOLed.  That is not
> > the reason why we want to drop unexec.
> 
> That's not what I heard just one message removed from mine

You've misunderstood what Stefan meant.  He was just responding to
your message, nothing more, nothing less.




* Re: Solaris dldump
  2024-08-18 23:56           ` Po Lu
  2024-08-19 11:18             ` Eli Zaretskii
@ 2024-08-19 11:44             ` Pip Cet
  2024-08-19 11:57               ` Po Lu
  2024-08-19 20:35             ` Stefan Kangas
  2 siblings, 1 reply; 137+ messages in thread
From: Pip Cet @ 2024-08-19 11:44 UTC
  To: Po Lu; +Cc: Stefan Kangas, ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> It's a-ok to retain pure space to avoid burdening someone with very
> hypothetical additional labor

Wait, I'm not sure I understand that part.  How does removing pure space
burden anyone with additional labor, hypothetical or not?

Also, do the systems that don't support pdumper but do support unexec
work without dumping, when running temacs directly? It takes very long
to build Emacs that way, but since we're talking non-free operating
systems it might be acceptable to ask people to cross-compile for now,
kind of like we do for the Android builds where the .elc files are
generated on the build systems.

Pip


* Re: Solaris dldump
  2024-08-19 11:44             ` Pip Cet
@ 2024-08-19 11:57               ` Po Lu
  2024-08-19 12:10                 ` Pip Cet
  0 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-08-19 11:57 UTC
  To: Pip Cet; +Cc: Stefan Kangas, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> Wait, I'm not sure I understand that part.  How does removing pure space
> burden anyone with additional labor, hypothetical or not?

Isn't this theoretical burden the reason that pure space is not to be
removed except along with unexec?

> Also, do the systems that don't support pdumper but do support unexec
> work without dumping, when running temacs directly? It takes very long
> to build Emacs that way, but since we're talking non-free operating
> systems it might be acceptable to ask people to cross-compile for now,
> kind of like we do for the Android builds where the .elc files are
> generated on the build systems.

There is a substantial segment of our users who don't expect Emacs to
start in 5+ seconds, if only judging by the hullabaloo that erupts
whenever startup performance is threatened or even mildly retarded.
Even in the Android port, this penalty is paid once on installation and
a dump file is retained for subsequent initializations of the same
binary.
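
The pay-once pattern described here can be illustrated in a few lines.
This is a loose analogy in Python, assuming nothing about how the
Android port or pdumper is actually implemented: expensive_init,
load_state and the state.dump file name are all hypothetical, and pickle
merely stands in for a dump format.

```python
import os
import pickle
import tempfile


def expensive_init():
    """Stand-in for the slow work of loading the preloaded Lisp files."""
    return {"loaded": ["loadup", "subr", "files"], "ready": True}


def load_state(dump_path):
    """Return (state, restored); RESTORED is True when a dump was reused."""
    if os.path.exists(dump_path):
        # Fast path: a previous run already paid the initialization cost.
        with open(dump_path, "rb") as dump:
            return pickle.load(dump), True
    # Slow path: initialize from scratch, then save a dump for next time.
    state = expensive_init()
    with open(dump_path, "wb") as dump:
        pickle.dump(state, dump)
    return state, False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.dump")
    first, restored_first = load_state(path)
    second, restored_second = load_state(path)
    print(restored_first, restored_second, first == second)
```

The first call takes the slow path and writes the dump; the second call
finds the file and restores the same state from it.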

Anyway, I want pure space gone as much as any of us, I just don't agree
that taking unexec down with it is justified.  Maybe the ELF, XCOFF, and
Windows unexecs, but not the Solaris or DOS ones.


* Re: Solaris dldump
  2024-08-19 11:57               ` Po Lu
@ 2024-08-19 12:10                 ` Pip Cet
  2024-08-19 12:55                   ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Pip Cet @ 2024-08-19 12:10 UTC
  To: Po Lu; +Cc: Stefan Kangas, ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:
> Pip Cet <pipcet@protonmail.com> writes:
>> Wait, I'm not sure I understand that part.  How does removing pure space
>> burden anyone with additional labor, hypothetical or not?
>
> Isn't this theoretical burden the reason that pure space is not to be
> removed except along with unexec?

Maybe a compromise would be to keep unexec but put it on probation,
promising to remove it if problems arise that cannot be convincingly and
immediately fixed?

>> Also, do the systems that don't support pdumper but do support unexec
>> work without dumping, when running temacs directly? It takes very long
>> to build Emacs that way, but since we're talking non-free operating
>> systems it might be acceptable to ask people to cross-compile for now,
>> kind of like we do for the Android builds where the .elc files are
>> generated on the build systems.
>
> There is a substantial segment of our users who don't expect Emacs to
> start in 5+ seconds, if only judging by the hullabaloo that erupts
> whenever startup performance is threatened or even mildly retarded.

I agree, but I also think some compromise will have to be found.

> Even in the Android port, this penalty is paid once on installation and
> a dump file is retained for subsequent initializations of the same
> binary.

A very clever hack, I must say!

> Anyway, I want pure space gone as much as any of us, I just don't agree
> that taking unexec down with it is justified.  Maybe the ELF, XCOFF, and
> Windows unexecs, but not the Solaris or DOS ones.

DOS in particular is what triggered my question: given the limitations
of DOS systems, it's quite possible temacs-as-emacs just wouldn't fly on
those machines.

Pip





* Re: Solaris dldump
  2024-08-19 12:10                 ` Pip Cet
@ 2024-08-19 12:55                   ` Eli Zaretskii
  2024-08-19 13:46                     ` Pip Cet
  0 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-08-19 12:55 UTC
  To: Pip Cet; +Cc: luangruo, stefankangas, ali_gnu2, emacs-devel

> Date: Mon, 19 Aug 2024 12:10:19 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>, ali_gnu2@emvision.com,
>  emacs-devel@gnu.org
> 
> "Po Lu" <luangruo@yahoo.com> writes:
> > Pip Cet <pipcet@protonmail.com> writes:
> >> Wait, I'm not sure I understand that part.  How does removing pure space
> >> burden anyone with additional labor, hypothetical or not?
> >
> > Isn't this theoretical burden the reason that pure space is not to be
> > removed except along with unexec?
> 
> Maybe a compromise would be to keep unexec but put it on probation,
> promising to remove it if problems arise that cannot be convincingly and
> immediately fixed?

That'd just add to code churn and maintenance burden.  So I prefer
removing it to begin with.

> > Anyway, I want pure space gone as much as any of us, I just don't agree
> > that taking unexec down with it is justified.  Maybe the ELF, XCOFF, and
> > Windows unexecs, but not the Solaris or DOS ones.
> 
> DOS in particular is what triggered my question: given the limitations
> of DOS systems, it's quite possible temacs-as-emacs just wouldn't fly on
> those machines.

Those limitations are not relevant in our case.

But this all is beside the point.




* Re: Solaris dldump
  2024-08-19 12:55                   ` Eli Zaretskii
@ 2024-08-19 13:46                     ` Pip Cet
  2024-08-19 14:39                       ` Eli Zaretskii
  2024-08-19 20:51                       ` Stefan Kangas
  0 siblings, 2 replies; 137+ messages in thread
From: Pip Cet @ 2024-08-19 13:46 UTC
  To: Eli Zaretskii; +Cc: luangruo, stefankangas, ali_gnu2, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Mon, 19 Aug 2024 12:10:19 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>, ali_gnu2@emvision.com,
>>  emacs-devel@gnu.org
>>
>> "Po Lu" <luangruo@yahoo.com> writes:
>> > Pip Cet <pipcet@protonmail.com> writes:
>> >> Wait, I'm not sure I understand that part.  How does removing pure space
>> >> burden anyone with additional labor, hypothetical or not?
>> >
>> > Isn't this theoretical burden the reason that pure space is not to be
>> > removed except along with unexec?
>>
>> Maybe a compromise would be to keep unexec but put it on probation,
>> promising to remove it if problems arise that cannot be convincingly and
>> immediately fixed?
>
> That'd just add to code churn and maintenance burden.  So I prefer
> removing it to begin with.

I've just gone through configure.ac removing all the code that depends
on unexec (no doubt I've missed some), and I must say I now agree it is
time for unexec to go.  In particular, it had so far escaped my
attention that it's incompatible with native compilation!

So I'll update the scratch/no-purespace branch to also remove unexec,
and of course I'm offering to help anyone who wants to fix the remaining
non-pdumper ports.

And while I am skeptical of the value of ASLR, it would be really
embarrassing to run into a security issue that's exploitable only
because Emacs disables ASLR for unexec builds.

>> > Anyway, I want pure space gone as much as any of us, I just don't agree
>> > that taking unexec down with it is justified.  Maybe the ELF, XCOFF, and
>> > Windows unexecs, but not the Solaris or DOS ones.
>>
>> DOS in particular is what triggered my question: given the limitations
>> of DOS systems, it's quite possible temacs-as-emacs just wouldn't fly on
>> those machines.
>
> Those limitations are not relevant in our case.

I think it's relevant whether DOS will become completely unusable or
merely difficult to use once unexec is removed, and what can be done to
fix it.

Does the DOS port work on free DOS clones?  And is there a way to gain
access to a Solaris machine to fix pdumper on it?

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Solaris dldump
  2024-08-19 13:46                     ` Pip Cet
@ 2024-08-19 14:39                       ` Eli Zaretskii
  2024-08-19 15:26                         ` Corwin Brust
  2024-08-19 20:51                       ` Stefan Kangas
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-08-19 14:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, stefankangas, ali_gnu2, emacs-devel

> Date: Mon, 19 Aug 2024 13:46:11 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: luangruo@yahoo.com, stefankangas@gmail.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> I've just gone through configure.ac removing all the code that depends
> on unexec (no doubt I've missed some), and I must say I now agree it is
> time for unexec to go.  In particular, it had so far escaped my
> attention that it's incompatible with native compilation!

None of the major new features supports unexec; native compilation was
just the first.

> And while I am skeptical of the value of ASLR, it would be really
> embarrassing to run into a security issue that's exploitable only
> because Emacs disables ASLR for unexec builds.

We disable ASLR only during the build, AFAIR, not when we run the
dumped Emacs.

> >> DOS in particular is what triggered my question: given the limitations
> >> of DOS systems, it's quite possible temacs-as-emacs just wouldn't fly on
> >> those machines.
> >
> > Those limitations are not relevant in our case.
> 
> I think it's relevant whether DOS will become completely unusable or
> merely difficult to use once unexec is removed, and what can be done to
> fix it.

I was talking specifically about running temacs.

> Does the DOS port work on free DOS clones?

Yes, AFAIK (although I myself have only run the DOS port on Windows
for many years now).

> And is there a way to gain access to a Solaris machine to fix
> pdumper on it?

No idea, perhaps Po Lu does.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Solaris dldump
  2024-08-19 14:39                       ` Eli Zaretskii
@ 2024-08-19 15:26                         ` Corwin Brust
  2024-08-19 15:31                           ` Corwin Brust
  0 siblings, 1 reply; 137+ messages in thread
From: Corwin Brust @ 2024-08-19 15:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, luangruo, stefankangas, ali_gnu2, emacs-devel

On Mon, Aug 19, 2024 at 9:39 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > Date: Mon, 19 Aug 2024 13:46:11 +0000
> > From: Pip Cet <pipcet@protonmail.com>
> > Cc: luangruo@yahoo.com, stefankangas@gmail.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> >
>
> > And is there a way to gain access to a Solaris machine to fix
> > pdumper on it?
>
> No idea, perhaps Po Lu does.
>

It may be worth reaching out to the GCC devs/testers - a peer on the FSF
sysadmin team thinks they may have a few machines running Solaris (but
isn't sure if 10 or 11 or both).



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Solaris dldump
  2024-08-19 15:26                         ` Corwin Brust
@ 2024-08-19 15:31                           ` Corwin Brust
  0 siblings, 0 replies; 137+ messages in thread
From: Corwin Brust @ 2024-08-19 15:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, luangruo, stefankangas, ali_gnu2, emacs-devel

On Mon, Aug 19, 2024 at 10:26 AM Corwin Brust <corwin@bru.st> wrote:
>
> It may be worth reaching out to the GCC devs/testers - a peer on the FSF
> sysadmin team thinks they may have a few machines running Solaris (but
> isn't sure if 10 or 11 or both).

More information!

cfarm210: Solaris 10

See:  https://portal.cfarm.net/machines/list/



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Solaris dldump
  2024-08-19 13:46                     ` Pip Cet
  2024-08-19 14:39                       ` Eli Zaretskii
@ 2024-08-19 20:51                       ` Stefan Kangas
  1 sibling, 0 replies; 137+ messages in thread
From: Stefan Kangas @ 2024-08-19 20:51 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> I've just gone through configure.ac removing all the code that depends
> on unexec (no doubt I've missed some), and I must say I now agree it is
> time for unexec to go.  In particular, it had so far escaped my
> attention that it's incompatible with native compilation!
>
> So I'll update the scratch/no-purespace branch to also remove unexec,
> and of course I'm offering to help anyone who wants to fix the remaining
> non-pdumper ports.

Thanks for working on this.  Please push the branch to Savannah when
it's ready, and let's take it from there.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Solaris dldump
  2024-08-18 23:56           ` Po Lu
  2024-08-19 11:18             ` Eli Zaretskii
  2024-08-19 11:44             ` Pip Cet
@ 2024-08-19 20:35             ` Stefan Kangas
  2 siblings, 0 replies; 137+ messages in thread
From: Stefan Kangas @ 2024-08-19 20:35 UTC (permalink / raw)
  To: Po Lu; +Cc: ali_gnu2, emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> Fedora 40's remaining support period is shorter; should we not cease to
> support it any longer,

We seem to be miscommunicating.  I'm not suggesting explicitly
desupporting Solaris 10.

Spending considerable time on keeping the unexec build alive has stopped
making sense for the project as a whole.  The good news is that there is
nothing to suggest that the portable dumper should not be fixable on the
systems where it's reportedly not yet up to scratch.

I'm saying 1) that the reported problems with the portable dumper on
some proprietary systems (MS-DOS, Windows 98, Solaris 10) should be
fixed, 2) that I do not consider this blocking us from dropping the
unexec build at the present time, and 3) that I urge volunteers to step
forward to improve and/or fix pdumper on these systems.

I recommend reading the "Information for Maintainers of GNU Software"
manual to get a better view of some of the principles that are guiding
my thinking:

    https://www.gnu.org/prep/maintain/maintain.html#Platforms

Note in particular this part:

    "Supporting other platforms is optional -- we do it when that seems
    like a good idea, but we don’t consider it obligatory.  If the users
    don’t take care of a certain platform, you may have to desupport it
    unless and until users come forward to help.  Conversely, if a user
    offers changes to support an additional platform, you will probably
    want to install them, but you don’t have to.  If you feel the
    changes are complex and ugly, if you think that they will increase
    the burden of future maintenance, you can and should reject them.
    This includes both free or mainly-free platforms such as OpenBSD,
    FreeBSD, and NetBSD, and nonfree platforms such as Windows."

These are the basics, applicable to the GNU project as a whole.  There
are special considerations for Emacs, of course, some of which have been
indicated in this thread.  For example, we are probably "best-in-class"
among GNU projects when it comes to supporting various platforms, even
very old/obsolete ones, and at the cost of valuable time and resources.

So don't let anyone believe that we rush to leave (even fringe) groups
of users behind for no good reason.  That's just not the case.  But we
do have certain things that we prioritize ahead of others.  That
sometimes means asking for volunteer help to do things that are merely
secondary, if that helps us advance our primary goals.

Right now, it's very clear that the unexec build has reached the end of
the road.  Thus, we must ask volunteers to help us improve pdumper on
systems that, with respect to existing users, are not currently primary
considerations.

I hope that helps make things more clear.

> In any event, I promised to devote some of it to this issue after
> Emacs 30 is released.

That is good and welcome.  Thanks in advance for your efforts.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* pdumper on Solaris 10
  2024-08-18  0:10   ` Po Lu
  2024-08-18  0:19     ` Po Lu
  2024-08-18  1:15     ` Solaris dldump (was: Pure space) ali_gnu2
@ 2024-12-08 12:17     ` Pip Cet via Emacs development discussions.
  2024-12-08 13:05       ` Eli Zaretskii
                         ` (2 more replies)
  2 siblings, 3 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-08 12:17 UTC (permalink / raw)
  To: Po Lu; +Cc: ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> pdumper-dumped binaries appear to crash in an x86 Solaris 10 zone,
> though I don't really use this configuration and I'm not interested in
> trying the portable dumper on sparc:
>
> core 'core' of 7021:        ../../src/bootstrap-emacs -batch --no-site-file --no-site-lisp -f batc
>  00007fffaf433dc2 ???????? ()
>  00007fffaf5eb3d7 ???????? ()
>  00007fffaf5ec590 ???????? ()
>  00007fffae3f351a _lwp_kill () + a
>  00007fffae3981b9 raise () + 19
>  00000000008baf90 terminate_due_to_signal () + c0
>  000000000090236e ???????? ()
>  0000000000902334 deliver_thread_signal () + 74
>  00000000009023b0 deliver_fatal_thread_signal () + 10
>  00000000009024ef handle_sigsegv () + 4f
>  00007fffae3edd16 __sighndlr () + 6
>  00007fffae3e25e2 call_user_handler () + 252
>  00007fffae3e280e sigacthandler () + ee
>  00007fffaf5ea82d ???????? ()
>  ffffffffffffffff ???????? ()
>  00000000009c77e7 lisp_align_malloc () + 4d7
>  00000000009c9dd2 make_float () + 42
>  00000000009d2e9d init_alloc () + d
>  00000000008bd373 main () + bb3
>  00000000006d15ab ???????? ()

FWIW, this issue doesn't appear to happen on a "fresh" Solaris 10
install, in a qemu virtual machine, on x86.  I used the
sol-10-u11-ga-x86-dvd.iso image, installed to a new disk, then installed
OpenCSW and built Emacs from the master branch with and without
CFLAGS="-m64" (plus the linker path selection). Both builds appear to
work.

What's odd about that backtrace is that lisp_align_malloc in the current
build is only 435 bytes long (with -m64), so it's hard to guess
which part of the alignment code used to be at offset 0x4d7.

But while we're talking about rare and unusual systems, !USE_LSB builds
are currently broken except for the WIDE_EMACS_INT case, because the
stack scanning code makes no attempt to remove MSB tags.  It may be time
to simply remove MSB tag support, unless there are systems around that
actually fail to align static objects to 8-byte boundaries (but such
systems would have been broken for a while now).

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
@ 2024-12-08 13:05       ` Eli Zaretskii
  2024-12-08 13:52         ` Pip Cet via Emacs development discussions.
  2024-12-09  0:58       ` Po Lu
  2024-12-09  1:01       ` Po Lu
  2 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-08 13:05 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

> Date: Sun, 08 Dec 2024 12:17:05 +0000
> Cc: ali_gnu2@emvision.com, emacs-devel@gnu.org
> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> 
> "Po Lu" <luangruo@yahoo.com> writes:
> 
> But while we're talking about rare and unusual systems, !USE_LSB builds
> are currently broken except for the WIDE_EMACS_INT case, because the
> stack scanning code makes no attempt to remove MSB tags.

Which builds except WIDE_EMACS_INT need to use !USE_LSB?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 13:05       ` Eli Zaretskii
@ 2024-12-08 13:52         ` Pip Cet via Emacs development discussions.
  2024-12-08 14:52           ` Eli Zaretskii
  2024-12-09  1:08           ` Po Lu
  0 siblings, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-08 13:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sun, 08 Dec 2024 12:17:05 +0000
>> Cc: ali_gnu2@emvision.com, emacs-devel@gnu.org
>> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>>
>> "Po Lu" <luangruo@yahoo.com> writes:
>>
>> But while we're talking about rare and unusual systems, !USE_LSB builds
>> are currently broken except for the WIDE_EMACS_INT case, because the
>> stack scanning code makes no attempt to remove MSB tags.
>
> Which builds except WIDE_EMACS_INT need to use !USE_LSB?

The only platforms that "need" to use !USE_LSB are those that cannot
guarantee 8-byte alignment for static objects, which is why I asked
about those.  If those exist, we should have received bug reports
indicating that !WIDE_EMACS_INT builds don't work on such platforms.

In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
currently does is a very questionable optimization at best (fixnum
manipulation may be very slightly faster with !USE_LSB, but pointer
manipulation will be slower and requires extra registers, which is an
issue on i386).

For example, NILP() would only need to look at a single 32-bit word for
the WIDE_EMACS_INT + USE_LSB configuration. I strongly suspect that
effect alone would make WIDE_EMACS_INT + USE_LSB faster than
WIDE_EMACS_INT + !USE_LSB (of course, the relevant optimization would
have to be made first).

(Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
as I can tell, the justification for its continued existence is that
some C code assumes buffer positions are fixnums (and, because we expose
fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
had implemented fixnums to be transparent, we could simply remove
WIDE_EMACS_INT, but that mistake has been made...)

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 13:52         ` Pip Cet via Emacs development discussions.
@ 2024-12-08 14:52           ` Eli Zaretskii
  2024-12-08 16:17             ` Pip Cet via Emacs development discussions.
  2024-12-09  1:08           ` Po Lu
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-08 14:52 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

> Date: Sun, 08 Dec 2024 13:52:09 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > Which builds except WIDE_EMACS_INT need to use !USE_LSB?
> 
> The only platforms that "need" to use !USE_LSB are those that cannot
> guarantee 8-byte alignment for static objects, which is why I asked
> about those.

That means: none, AFAIK.  At least not given the platforms we
currently support.  So it's little wonder that configuration has
bit-rotted.

> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
> currently does is a very questionable optimization at best (fixnum
> manipulation may be very slightly faster with !USE_LSB, but pointer
> manipulation will be slower and requires extra registers, which is an
> issue on i386).

Where can one find i386 these days, except in a museum?

> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
> as I can tell, the justification for its continued existence is that
> some C code assumes buffer positions are fixnums (and, because we expose
> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
> had implemented fixnums to be transparent, we could simply remove
> WIDE_EMACS_INT, but that mistake has been made...)

I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not
recommended ;-)

In fact, one of my strongest reservations about the igc branch is that
it will most probably force me to lose WIDE_EMACS_INT.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 14:52           ` Eli Zaretskii
@ 2024-12-08 16:17             ` Pip Cet via Emacs development discussions.
  2024-12-08 16:49               ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-08 16:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sun, 08 Dec 2024 13:52:09 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > Which builds except WIDE_EMACS_INT need to use !USE_LSB?
>>
>> The only platforms that "need" to use !USE_LSB are those that cannot
>> guarantee 8-byte alignment for static objects, which is why I asked
>> about those.
>
> That means: none, AFAIK.  At least not given the platforms we
> currently support.  So it's little wonder that configuration has
> bit-rotted.

So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?

>> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
>> currently does is a very questionable optimization at best (fixnum
>> manipulation may be very slightly faster with !USE_LSB, but pointer
>> manipulation will be slower and requires extra registers, which is an
>> issue on i386).
>
> Where can one find i386 these days, except in a museum?

I meant all x86 systems using the 32-bit instruction set (and, in
particular, its limited exposed register set).  Those will be around for
a while.

>> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
>> as I can tell, the justification for its continued existence is that
>> some C code assumes buffer positions are fixnums (and, because we expose
>> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
>> had implemented fixnums to be transparent, we could simply remove
>> WIDE_EMACS_INT, but that mistake has been made...)
>
> I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not
> recommended ;-)

I don't think you should be happy; WIDE_EMACS_INT is sadly necessary to
support buffers > 512MB on 32-bit systems, but you're wasting 32 bits
for almost every Lisp_Object, and registers as well.

As 32-bit systems go away, it will become harder to write Lisp code that
works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to
make WIDE_EMACS_INT the default at some point.

> In fact, one of my strongest reservations about the igc branch is that
> it will most probably force me to lose WIDE_EMACS_INT.

I believe that problem is exclusively due to the fact that
WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
use WIDE_EMACS_INT normally in MPS builds, I think.

(This is somewhat theoretical because I can't build mingw32 Emacs right
now; https://dl.osdn.net alternates between being entirely unreachable
and responding with an expired certificate.)

The "low-hanging fruit" performance improvements USE_LSB allows for
(faster stack scanning during GC and many places which don't need to
look at the MSB word at all) are, I think, real, while the way in which
!USE_LSB is superior (we dereference pointer words without having to
untag them first) may reduce code size slightly, but shouldn't really
affect performance.

Of course, if we set out to do so, 32-bit Emacs could be optimized in
many other ways, but it's too late for that.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 16:17             ` Pip Cet via Emacs development discussions.
@ 2024-12-08 16:49               ` Eli Zaretskii
  2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
                                   ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-08 16:49 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

> Date: Sun, 08 Dec 2024 16:17:53 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> Date: Sun, 08 Dec 2024 13:52:09 +0000
> >> From: Pip Cet <pipcet@protonmail.com>
> >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> >>
> >> "Eli Zaretskii" <eliz@gnu.org> writes:
> >>
> >> > Which builds except WIDE_EMACS_INT need to use !USE_LSB?
> >>
> >> The only platforms that "need" to use !USE_LSB are those that cannot
> >> guarantee 8-byte alignment for static objects, which is why I asked
> >> about those.
> >
> > That means: none, AFAIK.  At least not given the platforms we
> > currently support.  So it's little wonder that configuration has
> > bit-rotted.
> 
> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?

That'd be a waste of effort.  What we have now works, and works well.
I'm not interested in throwing away a lot of hard work which got us to
where we are with WIDE_EMACS_INT, for advantages which I'm not sure
even exist, let alone are significant.

Those bits are unused in the WIDE_EMACS_INT build, so using them is a
no-brainer, IMO.

> >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
> >> currently does is a very questionable optimization at best (fixnum
> >> manipulation may be very slightly faster with !USE_LSB, but pointer
> >> manipulation will be slower and requires extra registers, which is an
> >> issue on i386).
> >
> > Where can one find i386 these days, except in a museum?
> 
> I meant all x86 systems using the 32-bit instruction set (and, in
> particular, its limited exposed register set).  Those will be around for
> a while.

Modern x86 CPUs can handle 64-bit values just fine, thank you.

> >> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
> >> as I can tell, the justification for its continued existence is that
> >> some C code assumes buffer positions are fixnums (and, because we expose
> >> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
> >> had implemented fixnums to be transparent, we could simply remove
> >> WIDE_EMACS_INT, but that mistake has been made...)
> >
> > I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not
> > recommended ;-)
> 
> I don't think you should be happy; WIDE_EMACS_INT is sadly necessary to
> support buffers > 512MB on 32-bit systems, but you're wasting 32 bits
> for almost every Lisp_Object, and registers as well.

Why should I care?  It isn't like each wasted bit comes with some
monetary fine, does it?

> As 32-bit systems go away, it will become harder to write Lisp code that
> works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to
> make WIDE_EMACS_INT the default at some point.

If you are trying to convince me to switch to 64-bit development
environment, you are wasting your time.  I have my very good reasons,
and don't plan on doing so any time soon.

And 64-bit Windows supports 32-bit code perfectly for my needs.

> > In fact, one of my strongest reservations about the igc branch is that
> > it will most probably force me to lose WIDE_EMACS_INT.
> 
> I believe that problem is exclusively due to the fact that
> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
> use WIDE_EMACS_INT normally in MPS builds, I think.

No, there's also a built-in assumption in MPS about the size of a
word.

> The "low-hanging fruit" performance improvements USE_LSB allows for
> (faster stack scanning during GC and many places which don't need to
> look at the MSB word at all) are, I think, real, while the way in which
> !USE_LSB is superior (we dereference pointer words without having to
> untag them first) may reduce code size slightly, but shouldn't really
> affect performance.

I have no problems with performance that I can report, so I don't
expect anyone to waste time and effort on these optimizations.  We
have enough real problems for the resources we have.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 16:49               ` Eli Zaretskii
@ 2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
  2024-12-08 18:41                   ` Eli Zaretskii
  2024-12-08 18:47                 ` Pip Cet via Emacs development discussions.
  2024-12-09  1:13                 ` Po Lu
  2 siblings, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-08 17:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?
>
> That'd be a waste of effort.

It'd be a good investment of effort today, in exchange for making the GC
code significantly easier to understand and maintain in the future. It
would certainly not be without its benefits, so calling it a "waste of
effort" is unfair.

> I'm not interested in throwing away a lot of hard work which got us to
> where we are with WIDE_EMACS_INT, for advantages which I'm not sure
> even exist, let alone are significant.

I think maintainability of the GC code is significant.

> Those bits are unused in the WIDE_EMACS_INT build, so using them is a
> no-brainer, IMO.

As are the low-order bits of pointers, which have the advantage of
already being present in the 32-bit register rather than needing a
second register.

>> >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
>> >> currently does is a very questionable optimization at best (fixnum
>> >> manipulation may be very slightly faster with !USE_LSB, but pointer
>> >> manipulation will be slower and requires extra registers, which is an
>> >> issue on i386).
>> >
>> > Where can one find i386 these days, except in a museum?
>>
>> I meant all x86 systems using the 32-bit instruction set (and, in
>> particular, its limited exposed register set).  Those will be around for
>> a while.
>
> Modern x86 CPUs can handle 64-bit values just fine, thank you.

Modern x86 CPUs running 32-bit code (x86, not x32) still need two
register names for each 64-bit value.  With 8 GPRs, that's a significant
problem. So, no, "just fine" isn't accurate here.

>> >> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
>> >> as I can tell, the justification for its continued existence is that
>> >> some C code assumes buffer positions are fixnums (and, because we expose
>> >> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
>> >> had implemented fixnums to be transparent, we could simply remove
>> >> WIDE_EMACS_INT, but that mistake has been made...)
>> >
>> > I'm a very happy user of WIDE_EMACS_INT, so bad-mouthing it is not
>> > recommended ;-)
>>
>> I don't think you should be happy; WIDE_EMACS_INT is sadly necessary to
>> support buffers > 512MB on 32-bit systems, but you're wasting 32 bits
>> for almost every Lisp_Object, and registers as well.
>
> Why should I care?  It isn't like each wasted bit comes with some
> monetary fine, does it?

I think most users of 32-bit systems at this stage do care about wasting
a lot of memory, even if you personally don't.

>> As 32-bit systems go away, it will become harder to write Lisp code that
>> works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to
>> make WIDE_EMACS_INT the default at some point.
>
> If you are trying to convince me to switch to 64-bit development
> environment, you are wasting your time.  I have my very good reasons,
> and don't plan on doing so any time soon.

I wasn't, and I'm not sure how you got the impression I was. I meant
what I said, that we may have to give up on !WIDE_EMACS_INT 32-bit
builds at some point. As you're using WIDE_EMACS_INT already, this
wouldn't affect you.

>> > In fact, one of my strongest reservations about the igc branch is that
>> > it will most probably force me to lose WIDE_EMACS_INT.
>>
>> I believe that problem is exclusively due to the fact that
>> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
>> use WIDE_EMACS_INT normally in MPS builds, I think.
>
> No, there's also a built-in assumption in MPS about the size of a
> word.

That's very vague. If there is an assumption that EMACS_INT ==
mps_word_t, it would certainly not be built into MPS, which doesn't know
about EMACS_INT at all.  But as it is, I have no idea where you even
suspect this "built-in" assumption is made.

>> The "low-hanging fruit" performance improvements USE_LSB allows for
>> (faster stack scanning during GC and many places which don't need to
>> look at the MSB word at all) are, I think, real, while the way in which
>> !USE_LSB is superior (we dereference pointer words without having to
>> untag them first) may reduce code size slightly, but shouldn't really
>> affect performance.
>
> I have no problems with performance that I can report, so I don't
> expect anyone to waste time and effort on these optimizations.  We
> have enough real problems for the resources we have.

If performance and wasted memory aren't issues, then it's a tradeoff
between leaving old code untouched and simplifying it to enable future
development.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
@ 2024-12-08 18:41                   ` Eli Zaretskii
  2024-12-08 19:15                     ` Gerd Möllmann
                                       ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-08 18:41 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

> Date: Sun, 08 Dec 2024 17:37:50 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?
> >
> > That'd be a waste of effort.
> 
> It'd be a good investment of effort today, in exchange for making the GC
> code significantly easier to understand and maintain in the future. It
> would certainly not be without its benefits, so calling it a "waste of
> effort" is unfair.

I disagree.  We've lived with this GC code for a long time, and I
don't see any complications due to !USE_LSB.  And if we are going to
switch to igc at some point, investment in GC is even less sensible.

I don't see what's unfair in making my position clear.

> > I'm not interested in throwing away a lot of hard work which got us to
> > where we are with WIDE_EMACS_INT, for advantages which I'm not sure
> > even exist, let alone are significant.
> 
> I think maintainability of the GC code is significant.

It is, but there are no significant issues there at this time due to
!USE_LSB.

> > Those bits are unused in the WIDE_EMACS_INT build, so using them is a
> > no-brainer, IMO.
> 
> As are the low-order bits of pointers, which have the advantage of
> already being present in the 32-bit register rather than needing a
> second register.

What's your point?  The !USE_LSB code works, the one you suggest needs
to be written and debugged.

> >> >> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
> >> >> currently does is a very questionable optimization at best (fixnum
> >> >> manipulation may be very slightly faster with !USE_LSB, but pointer
> >> >> manipulation will be slower and requires extra registers, which is an
> >> >> issue on i386).
> >> >
> >> > Where can one find i386 these days, except in a museum?
> >>
> >> I meant all x86 systems using the 32-bit instruction set (and, in
> >> particular, its limited exposed register set).  Those will be around for
> >> a while.
> >
> > Modern x86 CPUs can handle 64-bit values just fine, thank you.
> 
> Modern x86 CPUs running 32-bit code (x86, not x32) still need two
> register names for each 64-bit value.  With 8 GPRs, that's a significant
> problem. So, no, "just fine" isn't accurate here.

I again disagree.  And you forget other registers.

> >> > In fact, one of my strongest reservations about the igc branch is that
> >> > it will most probably force me to lose WIDE_EMACS_INT.
> >>
> >> I believe that problem is exclusively due to the fact that
> >> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
> >> use WIDE_EMACS_INT normally in MPS builds, I think.
> >
> > No, there's also a built-in assumption in MPS about the size of a
> > word.
> 
> That's very vague. If there is an assumption that EMACS_INT ==
> mps_word_t, it would certainly not be built into MPS, which doesn't know
> about EMACS_INT at all.

Not EMACS_INT, Lisp_Object.  At least that's what Gerd explained to me
back when I asked about WIDE_EMACS_INT in the MPS build.  Maybe he can
chime in and clarify this.

> >> The "low-hanging fruit" performance improvements USE_LSB allows for
> >> (faster stack scanning during GC and many places which don't need to
> >> look at the MSB word at all) are, I think, real, while the way in which
> >> !USE_LSB is superior (we dereference pointer words without having to
> >> untag them first) may reduce code size slightly, but shouldn't really
> >> affect performance.
> >
> > I have no problems with performance that I can report, so I don't
> > expect anyone to waste time and effort on these optimizations.  We
> > have enough real problems for the resources we have.
> 
> If performance and wasted memory aren't issues, then it's a tradeoff
> between leaving old code untouched and simplifying it to enable future
> development.

The existing code neither precludes nor interferes with future
development.  So yes, leaving working code untouched is the preference
here.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 18:41                   ` Eli Zaretskii
@ 2024-12-08 19:15                     ` Gerd Möllmann
  2024-12-08 20:38                       ` Eli Zaretskii
  2024-12-09  4:59                     ` Stefan Kangas
  2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-08 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sun, 08 Dec 2024 17:37:50 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>> 
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>> 
>> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?
>> >
>> > That'd be a waste of effort.
>> 
>> It'd be a good investment of effort today, in exchange for making the GC
>> code significantly easier to understand and maintain in the future. It
>> would certainly not be without its benefits, so calling it a "waste of
>> effort" is unfair.
>
> I disagree.  We've lived with this GC code for a long time, and I
> don't see any complications due to !USE_LSB.  And if we are going to
> switch to igc at some point, investment in GC is even less sensible.
>
> I don't see what's unfair in making my position clear.

I think Pip meant igc. That would be a lot simpler without the 32-bit
stuff, wide ints or not. I said already what I think about that before.

>
>> >> > In fact, one of my strongest reservations about the igc branch is that
>> >> > it will most probably force me to lose WIDE_EMACS_INT.
>> >>
>> >> I believe that problem is exclusively due to the fact that
>> >> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
>> >> use WIDE_EMACS_INT normally in MPS builds, I think.
>> >
>> > No, there's also a built-in assumption in MPS about the size of a
>> > word.
>> 
>> That's very vague. If there is an assumption that EMACS_INT ==
>> mps_word_t, it would certainly not be built into MPS, which doesn't know
>> about EMACS_INT at all.
>
> Not EMACS_INT, Lisp_Object.  At least that's what Gerd explained to me
> back when I asked about WIDE_EMACS_INT in the MPS build.  Maybe he can
> chime in and clarify this.

(Not sure I understand the context in which you are discussing.)

As far as igc goes, a Lisp_Object consisting of 2 mps_word_t poses a
problem because we scan one mps_word_t at a time. Depending on where the
tag bits are, we need the other mps_word_t belonging to a Lisp_Object to
be able to determine its type (Lisp_Int0/1, Lisp_Symbol, ...). IIRC
this is currently the case, and it's a major PITA. 
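
To make the one-word-at-a-time problem concrete, here is a minimal
sketch. It is not code from igc.c or lisp.h: the 3-bit tag width, the
type names (wide_object, scan_word) and all helper functions are
assumptions for illustration only. It models a 64-bit object seen as two
32-bit scan words and shows which word carries the tag under each
tagging scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not the lisp.h definitions.  */
enum { TAG_BITS = 3 };
#define TAG_MASK ((1u << TAG_BITS) - 1)

typedef uint64_t wide_object;	/* stands in for a 64-bit Lisp_Object */
typedef uint32_t scan_word;	/* stands in for a 32-bit mps_word_t */

/* Tag in the low bits: PTR must be aligned so those bits are free.  */
static wide_object
make_low_tagged (uint32_t ptr, unsigned tag)
{
  assert ((ptr & TAG_MASK) == 0 && tag <= TAG_MASK);
  return (wide_object) ptr | tag;
}

/* Tag in the topmost bits of the 64-bit value.  */
static wide_object
make_high_tagged (uint32_t ptr, unsigned tag)
{
  assert (tag <= TAG_MASK);
  return ((wide_object) tag << (64 - TAG_BITS)) | ptr;
}

/* Split by value rather than by memory layout, so the sketch does not
   depend on endianness.  */
static scan_word low_word (wide_object o) { return (scan_word) o; }
static scan_word high_word (wide_object o) { return (scan_word) (o >> 32); }

/* Low tags: the word that holds the pointer also holds the tag.  */
static unsigned tag_in_low_word (scan_word w) { return w & TAG_MASK; }

/* High tags: the tag sits in the other word, away from the pointer.  */
static unsigned tag_in_high_word (scan_word w) { return w >> (32 - TAG_BITS); }
```

In this model a scanner handed only low_word() can classify a low-tagged
object by itself, while for a high-tagged object it must also fetch
high_word() to learn the type.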




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 19:15                     ` Gerd Möllmann
@ 2024-12-08 20:38                       ` Eli Zaretskii
  2024-12-09  3:09                         ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-08 20:38 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Pip Cet <pipcet@protonmail.com>,  luangruo@yahoo.com,
>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Sun, 08 Dec 2024 20:15:09 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> It'd be a good investment of effort today, in exchange for making the GC
> >> code significantly easier to understand and maintain in the future. It
> >> would certainly not be without its benefits, so calling it a "waste of
> >> effort" is unfair.
> >
> > I disagree.  We've lived with this GC code for a long time, and I
> > don't see any complications due to !USE_LSB.  And if we are going to
> > switch to igc at some point, investment in GC is even less sensible.
> >
> > I don't see what's unfair in making my position clear.
> 
> I think Pip meant igc.

Then it's all a huge misunderstanding, and I apologize for not
guessing that it was about igc.  In my defense I can only say that igc
was never mentioned.

> That would be a lot simpler without the 32-bit
> stuff, wide ints or not. I said already what I think about that before.

If you want to drop the 32-bit stuff, then (a) you will need to find
someone else to regularly build and test the Windows port of the
branch, and (b) we will need to agree on emacs-devel right now that
32-bit builds of Emacs will be dropped when igc lands.

> >> > No, there's also a built-in assumption in MPS about the size of a
> >> > word.
> >> 
> >> That's very vague. If there is an assumption that EMACS_INT ==
> >> mps_word_t, it would certainly not be built into MPS, which doesn't know
> >> about EMACS_INT at all.
> >
> > Not EMACS_INT, Lisp_Object.  At least that's what Gerd explained to me
> > back when I asked about WIDE_EMACS_INT in the MPS build.  Maybe he can
> > chime in and clarify this.
> 
> (Not sure I understand the context in which you are discussing.)
> 
> As far as igc goes, a Lisp_Object consisting of 2 mps_word_t poses a
> problem because we scan one mps_word_t at a time. Depending on where the
> tag bits are, we need the other mps_word_t belonging to a Lisp_Object to
> be able to determine its type (Lisp_Int0/1, Lisp_Symbol, ...). IIRC
> this is currently the case, and it's a major PITA. 

That's what I remembered from when you explained that a few months
ago.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 20:38                       ` Eli Zaretskii
@ 2024-12-09  3:09                         ` Gerd Möllmann
  2024-12-09  3:32                           ` Eli Zaretskii
  2024-12-09  9:56                           ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-09  3:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Pip Cet <pipcet@protonmail.com>,  luangruo@yahoo.com,
>>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
>> Date: Sun, 08 Dec 2024 20:15:09 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> It'd be a good investment of effort today, in exchange for making the GC
>> >> code significantly easier to understand and maintain in the future. It
>> >> would certainly not be without its benefits, so calling it a "waste of
>> >> effort" is unfair.
>> >
>> > I disagree.  We've lived with this GC code for a long time, and I
>> > don't see any complications due to !USE_LSB.  And if we are going to
>> > switch to igc at some point, investment in GC is even less sensible.
>> >
>> > I don't see what's unfair in making my position clear.
>> 
>> I think Pip meant igc.
>
> Then it's all a huge misunderstanding, and I apologize for not
> guessing that it was about igc.  In my defense I can only say that igc
> was never mentioned.

Or I'm wrong, and Pip meant something else.

>> That would be a lot simpler without the 32-bit
>> stuff, wide ints or not. I said already what I think about that before.
>
> If you want to drop the 32-bit stuff, then (a) you will need to find
> someone else to regularly build and test the Windows port of the
> branch, and (b) we will need to agree on emacs-devel right now that
> 32-bit builds of Emacs will be dropped when igc lands.

I would recommend that, indeed, but I don't expect it to happen any time
soon :-).

>> >> > No, there's also a built-in assumption in MPS about the size of a
>> >> > word.
>> >> 
>> >> That's very vague. If there is an assumption that EMACS_INT ==
>> >> mps_word_t, it would certainly not be built into MPS, which doesn't know
>> >> about EMACS_INT at all.
>> >
>> > Not EMACS_INT, Lisp_Object.  At least that's what Gerd explained to me
>> > back when I asked about WIDE_EMACS_INT in the MPS build.  Maybe he can
>> > chime in and clarify this.
>> 
>> (Not sure I understand the context in which you are discussing.)
>> 
>> As far as igc goes, a Lisp_Object consisting of 2 mps_word_t poses a
>> problem because we scan one mps_word_t at a time. Depending on where the
>> tag bits are, we need the other mps_word_t belonging to a Lisp_Object to
>> be able to determine its type (Lisp_Int0/1, Lisp_Symbol, ...). IIRC
>> this is currently the case, and it's a major PITA. 
>
> That's what I remembered from when you explained that a few months
> ago.

What about dropping, officially sanctioned so to speak, WIDE_EMACS_INT
support for igc? That would help.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  3:09                         ` Gerd Möllmann
@ 2024-12-09  3:32                           ` Eli Zaretskii
  2024-12-09  3:43                             ` Gerd Möllmann
  2024-12-09  9:56                           ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-09  3:32 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  luangruo@yahoo.com,  ali_gnu2@emvision.com,
>   emacs-devel@gnu.org
> Date: Mon, 09 Dec 2024 04:09:39 +0100
> 
> What about dropping, officially sanctioned so to speak, WIDE_EMACS_INT
> support for igc? That would help.

You already dropped it, didn't you?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  3:32                           ` Eli Zaretskii
@ 2024-12-09  3:43                             ` Gerd Möllmann
  2024-12-09  4:53                               ` Stefan Kangas
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-09  3:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  luangruo@yahoo.com,  ali_gnu2@emvision.com,
>>   emacs-devel@gnu.org
>> Date: Mon, 09 Dec 2024 04:09:39 +0100
>> 
>> What about dropping, officially sanctioned so to speak, WIDE_EMACS_INT
>> support for igc? That would help.
>
> You already dropped it, didn't you?

There is

#ifdef WIDE_EMACS_INT
# error "WIDE_EMACS_INT not supported"
#endif

in igc.c simply because it's not implemented.

Mentally, I've dropped it, yes. I think it would make things really
ugly, and not having it doesn't take away anything from users of
WIDE_EMACS_INT which they currently have, i.e. the current GC.




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  3:43                             ` Gerd Möllmann
@ 2024-12-09  4:53                               ` Stefan Kangas
  2024-12-09  5:26                                 ` Gerd Möllmann
  2024-12-09 13:58                                 ` Eli Zaretskii
  0 siblings, 2 replies; 137+ messages in thread
From: Stefan Kangas @ 2024-12-09  4:53 UTC (permalink / raw)
  To: Gerd Möllmann, Eli Zaretskii; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> There is
>
> #ifdef WIDE_EMACS_INT
> # error "WIDE_EMACS_INT not supported"
> #endif
>
> in igc.c simply because it's not implemented.
>
> Mentally, I've dropped it, yes. I think it would make things really
> ugly, and not having it doesn't take away anything from users of
> WIDE_EMACS_INT which they currently have, i.e. the current GC.

Is the idea to continue supporting both the old GC and MPS for the
foreseeable future?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  4:53                               ` Stefan Kangas
@ 2024-12-09  5:26                                 ` Gerd Möllmann
  2024-12-09 13:58                                 ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-09  5:26 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Eli Zaretskii, pipcet, luangruo, ali_gnu2, emacs-devel

Stefan Kangas <stefankangas@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> There is
>>
>> #ifdef WIDE_EMACS_INT
>> # error "WIDE_EMACS_INT not supported"
>> #endif
>>
>> in igc.c simply because it's not implemented.
>>
>> Mentally, I've dropped it, yes. I think it would make things really
>> ugly, and not having it doesn't take away anything from users of
>> WIDE_EMACS_INT which they currently have, i.e. the current GC.
>
> Is the idea to continue supporting both the old GC and MPS for the
> foreseeable future?

ISTR Po Lu mentioning that some OS (Android?) does not support MPS, so
yes from me.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  4:53                               ` Stefan Kangas
  2024-12-09  5:26                                 ` Gerd Möllmann
@ 2024-12-09 13:58                                 ` Eli Zaretskii
  2024-12-10  0:02                                   ` Po Lu
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-09 13:58 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: gerd.moellmann, pipcet, luangruo, ali_gnu2, emacs-devel

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Sun, 8 Dec 2024 20:53:05 -0800
> Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, 
> 	emacs-devel@gnu.org
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > There is
> >
> > #ifdef WIDE_EMACS_INT
> > # error "WIDE_EMACS_INT not supported"
> > #endif
> >
> > in igc.c simply because it's not implemented.
> >
> > Mentally, I've dropped it, yes. I think it would make things really
> > ugly, and not having it doesn't take away anything from users of
> > WIDE_EMACS_INT which they currently have, i.e. the current GC.
> 
> Is the idea to continue supporting both the old GC and MPS for the
> foreseeable future?

It could be, if MPS support is less than universal.

But that doesn't necessarily decide the fate of WIDE_EMACS_INT.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09 13:58                                 ` Eli Zaretskii
@ 2024-12-10  0:02                                   ` Po Lu
  0 siblings, 0 replies; 137+ messages in thread
From: Po Lu @ 2024-12-10  0:02 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Stefan Kangas, gerd.moellmann, pipcet, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stefan Kangas <stefankangas@gmail.com>
>> Date: Sun, 8 Dec 2024 20:53:05 -0800
>> Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, 
>> 	emacs-devel@gnu.org
>> 
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> 
>> > There is
>> >
>> > #ifdef WIDE_EMACS_INT
>> > # error "WIDE_EMACS_INT not supported"
>> > #endif
>> >
>> > in igc.c simply because it's not implemented.
>> >
>> > Mentally, I've dropped it, yes. I think it would make things really
>> > ugly, and not having it doesn't take away anything from users of
>> > WIDE_EMACS_INT which they currently have, i.e. the current GC.
>> 
>> Is the idea to continue supporting both the old GC and MPS for the
>> foreseeable future?
>
> It could be, if MPS support is less than universal.

It is less than universal, although it was simpler to port than I
anticipated.  E.g. Solaris on SPARC is not a supported configuration
upstream.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  3:09                         ` Gerd Möllmann
  2024-12-09  3:32                           ` Eli Zaretskii
@ 2024-12-09  9:56                           ` Pip Cet via Emacs development discussions.
  2024-12-10  0:04                             ` Po Lu
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-09  9:56 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, luangruo, ali_gnu2, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: Pip Cet <pipcet@protonmail.com>,  luangruo@yahoo.com,
>>>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
>>> Date: Sun, 08 Dec 2024 20:15:09 +0100
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>> >> It'd be a good investment of effort today, in exchange for making the GC
>>> >> code significantly easier to understand and maintain in the future. It
>>> >> would certainly not be without its benefits, so calling it a "waste of
>>> >> effort" is unfair.
>>> >
>>> > I disagree.  We've lived with this GC code for a long time, and I
>>> > don't see any complications due to !USE_LSB.  And if we are going to
>>> > switch to igc at some point, investment in GC is even less sensible.
>>> >
>>> > I don't see what's unfair in making my position clear.
>>>
>>> I think Pip meant igc.
>>
>> Then it's all a huge misunderstanding, and I apologize for not
>> guessing that it was about igc.  In my defense I can only say that igc
>> was never mentioned.
>
> Or I'm wrong, and Pip meant something else.

I was talking about the non-mps branch, yes.  We should drop !USE_LSB,
which doesn't work in its original use case today and hasn't for a
while.  It does happen to work in the WIDE_EMACS_INT case, but that's a
fortuitous accident at best.

>>> >> > No, there's also a built-in assumption in MPS about the size of a
>>> >> > word.
>>> >>
>>> >> That's very vague. If there is an assumption that EMACS_INT ==
>>> >> mps_word_t, it would certainly not be built into MPS, which doesn't know
>>> >> about EMACS_INT at all.
>>> >
>>> > Not EMACS_INT, Lisp_Object.  At least that's what Gerd explained to me

(Of course, we have

typedef EMACS_INT Lisp_Word;
typedef Lisp_Word Lisp_Object;

so this is the same thing)

>>> > back when I asked about WIDE_EMACS_INT in the MPS build.  Maybe he can
>>> > chime in and clarify this.
>>>
>>> (Not sure I understand the context in which you are discussing.)
>>>
>>> As far as igc goes, a Lisp_Object consisting of 2 mps_word_t poses a
>>> problem because we scan one mps_word_t at a time. Depending on where the
>>> tag bits are, we need the other mps_word_t belonging to a Lisp_Object to
>>> be able to determine its type (Lisp_Int0/1, Lisp_Symbol, ...). IIRC
>>> this is currently the case, and it's a major PITA.

So the problem is !USE_LSB, not WIDE_EMACS_INT.  Another reason we
should drop !USE_LSB, since it gives us working WIDE_EMACS_INT + MPS
builds.

>> That's what I remembered from when you explained that a few months
>> ago.
>
> What about dropping, officially sanctioned so to speak, WIDE_EMACS_INT
> support for igc? That would help.

I don't see a technical reason to do so, since WIDE_EMACS_INT + USE_LSB
works fine (see the patch I sent).  We already refuse to build igc.c in
!USE_LSB situations and should continue to do so.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  9:56                           ` Pip Cet via Emacs development discussions.
@ 2024-12-10  0:04                             ` Po Lu
  2024-12-10  3:34                               ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-12-10  0:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: Gerd Möllmann, Eli Zaretskii, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> I was talking about the non-mps branch, yes.  We should drop !USE_LSB,
> which doesn't work in its original use case today and hasn't for a
> while.  It does happen to work in the WIDE_EMACS_INT case, but that's a
> fortuitous accident at best.

I propose to make it work again.  It ought to be a simple matter of
scanning stack slots twice, with and without tag bits.
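
A rough sketch of what "scanning twice" could look like, under stated
assumptions: this is not alloc.c code, the 32-bit slot type, the 3-bit
high tag and the linear known_live() lookup are all invented for
illustration (the real collector looks candidates up in its own data
structures). Each slot is tested once as a raw pointer and once with the
assumed tag bits masked off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 3 tag bits at the top of a 32-bit word.  */
enum { TAG_BITS = 3 };
#define VALUE_MASK (UINT32_MAX >> TAG_BITS)

/* Stand-in for the collector's "does ADDR start a live object?" test.  */
static bool
known_live (uint32_t addr, const uint32_t *live, size_t nlive)
{
  for (size_t i = 0; i < nlive; i++)
    if (live[i] == addr)
      return true;
  return false;
}

/* Count the slots that may reference a live object, either as an
   untagged C pointer or as a high-tagged object word.  */
static size_t
count_ambiguous_refs (const uint32_t *slots, size_t nslots,
		      const uint32_t *live, size_t nlive)
{
  size_t hits = 0;
  for (size_t i = 0; i < nslots; i++)
    {
      uint32_t raw = slots[i];			/* first look: as is */
      uint32_t untagged = raw & VALUE_MASK;	/* second look: tag removed */
      if (known_live (raw, live, nlive)
	  || known_live (untagged, live, nlive))
	hits++;
    }
  return hits;
}
```

The second lookup is what makes the scan conservative for tagged words;
it also doubles the lookup work per slot in this naive form.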



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-10  0:04                             ` Po Lu
@ 2024-12-10  3:34                               ` Eli Zaretskii
  2024-12-11  1:13                                 ` Po Lu
  0 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10  3:34 UTC (permalink / raw)
  To: Po Lu; +Cc: pipcet, gerd.moellmann, ali_gnu2, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,  Eli
>  Zaretskii <eliz@gnu.org>,
>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Tue, 10 Dec 2024 08:04:03 +0800
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > I was talking about the non-mps branch, yes.  We should drop !USE_LSB,
> > which doesn't work in its original use case today and hasn't for a
> > while.  It does happen to work in the WIDE_EMACS_INT case, but that's a
> > fortuitous accident at best.
> 
> I propose to make it work again.  It ought to be a simple matter of
> scanning stack slots twice, with and without tag bits.

Patches to that effect will be welcome, thanks.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-10  3:34                               ` Eli Zaretskii
@ 2024-12-11  1:13                                 ` Po Lu
  2024-12-11 11:29                                   ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-12-11  1:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,  Eli
>>  Zaretskii <eliz@gnu.org>,
>>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
>> Date: Tue, 10 Dec 2024 08:04:03 +0800
>> 
>> Pip Cet <pipcet@protonmail.com> writes:
>> 
>> > I was talking about the non-mps branch, yes.  We should drop !USE_LSB,
>> > which doesn't work in its original use case today and hasn't for a
>> > while.  It does happen to work in the WIDE_EMACS_INT case, but that's a
>> > fortuitous accident at best.
>> 
>> I propose to make it work again.  It ought to be a simple matter of
>> scanning stack slots twice, with and without tag bits.
>
> Patches to that effect will be welcome, thanks.

Yes, like I said at the beginning of this (burgeoning) thread, I intend
to return to active Emacs development after the release of Emacs 30.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-11  1:13                                 ` Po Lu
@ 2024-12-11 11:29                                   ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 11:29 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, gerd.moellmann, ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>>> From: Po Lu <luangruo@yahoo.com>
>>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,  Eli
>>>  Zaretskii <eliz@gnu.org>,
>>>   ali_gnu2@emvision.com,  emacs-devel@gnu.org
>>> Date: Tue, 10 Dec 2024 08:04:03 +0800
>>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>> > I was talking about the non-mps branch, yes.  We should drop !USE_LSB,
>>> > which doesn't work in its original use case today and hasn't for a
>>> > while.  It does happen to work in the WIDE_EMACS_INT case, but that's a
>>> > fortuitous accident at best.
>>>
>>> I propose to make it work again.  It ought to be a simple matter of
>>> scanning stack slots twice, with and without tag bits.
>>
>> Patches to that effect will be welcome, thanks.
>
> Yes, like I said at the beginning of this (burgeoning) thread, I intend
> to return to active Emacs development after the release of Emacs 30.

That's great to hear, but I'd like to make a final (promise!) attempt to
dissuade you from making this particular change ("fixing" the code to
support !USE_LSB_TAG more often).

The changes that are necessary concern the most delicate part of the
garbage collector: ambiguous scanning needs to remove the tag (the easy
part), and live_cons_p etc. have to be changed to allow for more offsets
(we need to recognize pointers to &Lisp_Object + 4 as well as pointers
to &Lisp_Object itself; I think this bug is already present on
big-endian 32-bit builds utilizing WIDE_EMACS_INT, but no one's using
that).  I suspect other changes will be necessary (in particular, I
expect breakage on systems that use the high byte of 64-bit pointers, as
some Android systems do; I also expect there will be sign extension /
zero extension problems). The pdumper code also needs to be studied
carefully, and most likely changed. (Pure space and unexec will likely
have gone away by then, but they would be affected, too).  This is not a
quick fix.
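
As a toy illustration of the "more offsets" point only: the check below
is not live_cons_p, and the 16-byte cell size, the 4-byte half offset
and the function name are assumptions. It shows how an address test that
accepts only cell starts would have to be widened to accept a pointer
into the second 32-bit half as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block of fixed-size cells; sizes are made up.  */
enum { CELL_SIZE = 16, HALF_OFFSET = 4 };

/* Return true if ADDR falls inside the block of NCELLS cells starting
   at BLOCK_START and sits at an accepted offset within its cell.
   With ALLOW_HALF_OFFSET false only cell starts are accepted.  */
static bool
accepts_candidate (uintptr_t addr, uintptr_t block_start, size_t ncells,
		   bool allow_half_offset)
{
  if (addr < block_start
      || addr - block_start >= ncells * CELL_SIZE)
    return false;
  size_t offset = (addr - block_start) % CELL_SIZE;
  return offset == 0 || (allow_half_offset && offset == HALF_OFFSET);
}
```

Every extra accepted offset widens the set of stack words that count as
references, which is part of why such a change needs care.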

What makes this code delicate is that it's very rare for a stack
reference, particularly an unusual one, to be the last reference that
keeps another object alive; even if we fail to recognize an ambiguous
reference and free the object it refers to, the most likely outcome is
an invisible UAF error, because we happen to use-after-free memory right
after the garbage collection, and it'll still have the expected
contents.

This part of the garbage collector has long been in need of some work
(we currently search the RB tree twice for every word, even though the
second pass is usually unnecessary). Obviously, that will be harder if
we change the code in other ways.

The very best outcome of making the changes you propose is that no one
will ever use the changed code; in that case, all that will be achieved
is to add unused code to a function that's already hard to understand,
and to make future changes that much harder.

But that's not what I think will happen. What I think will happen is that
users will start or continue using !USE_LSB_TAG, try to switch to MPS,
run into a problem, (hopefully) report a bug, and we won't be able to
deal with that bug report because we're comparing a USE_LSB_TAG + MPS
build to a !USE_LSB_TAG + !MPS one, and it'll be impossible to tell
which of the two major changes is causing the problem.

In other words, every person affected by your proposed changes will be
unable to usefully test MPS. I think that's bad.

If you insist on making the changes, please make sure there is a visible
"feature" in the corresponding MPS build which will let us know that bug
reports are useless and should be disregarded. I personally won't ask
anyone to test MPS in a setting where they cannot usefully report bugs.

Obviously, reducing the number of people who can usefully test MPS will
make it slightly less likely it'll ever land.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 18:41                   ` Eli Zaretskii
  2024-12-08 19:15                     ` Gerd Möllmann
@ 2024-12-09  4:59                     ` Stefan Kangas
  2024-12-09 14:39                       ` Eli Zaretskii
  2024-12-09 16:21                       ` Pip Cet via Emacs development discussions.
  2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
  2 siblings, 2 replies; 137+ messages in thread
From: Stefan Kangas @ 2024-12-09  4:59 UTC (permalink / raw)
  To: Eli Zaretskii, Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sun, 08 Dec 2024 17:37:50 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?
>> >
>> > That'd be a waste of effort.
>>
>> It'd be a good investment of effort today, in exchange for making the GC
>> code significantly easier to understand and maintain in the future. It
>> would certainly not be without its benefits, so calling it a "waste of
>> effort" is unfair.
>
> I disagree.  We've lived with this GC code for a long time, and I
> don't see any complications due to !USE_LSB.  And if we are going to
> switch to igc at some point, investment in GC is even less sensible.

Assuming that we are 100% sure that MPS will land, then I can agree that
making any changes here is basically wasted effort.  Unless, of course,
the change would also simplify the MPS work (would it?).

On the other hand, IIUC, we have some way to go with making the merging
of the MPS branch a guarantee.  While I'm an enthusiastic supporter of
the great work that's being done on the MPS branch, isn't hedging our
bets prudent until that work is done?

Or am I misunderstanding how close we are to merging the MPS branch?

>> If performance and wasted memory aren't issues, then it's a tradeoff
>> between leaving old code untouched and simplifying it to enable future
>> development.
>
> The existing code doesn't preclude nor interfere with future
> development.  So yes, leaving working code untouched is the preference
> here.

Based on my limited mucking around in the GC, it does interfere somewhat
because you do need to understand both configurations, at least on a
high level, and once you do you need to mentally filter that stuff out
when reading the code.  So I think I'd appreciate the simplification, at
least.

If the only known drawbacks are stability concerns, we could also
consider an intermediate step along these lines:

Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
on master.  See what issues crop up, if any.  If anything does come up,
ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting
to look too hairy, revert WIDE_EMACS_INT back to !USE_LSB_TAG.  If
nothing too bad comes up, we can then consider removing the associated
code in Emacs 32.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  4:59                     ` Stefan Kangas
@ 2024-12-09 14:39                       ` Eli Zaretskii
  2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
  2024-12-10  0:09                         ` pdumper on Solaris 10 Stefan Kangas
  2024-12-09 16:21                       ` Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-09 14:39 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Sun, 8 Dec 2024 23:59:14 -0500
> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Date: Sun, 08 Dec 2024 17:37:50 +0000
> >> From: Pip Cet <pipcet@protonmail.com>
> >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> >>
> >> "Eli Zaretskii" <eliz@gnu.org> writes:
> >>
> >> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB?
> >> >
> >> > That'd be a waste of effort.
> >>
> >> It'd be a good investment of effort today, in exchange for making the GC
> >> code significantly easier to understand and maintain in the future. It
> >> would certainly not be without its benefits, so calling it a "waste of
> >> effort" is unfair.
> >
> > I disagree.  We've lived with this GC code for a long time, and I
> > don't see any complications due to !USE_LSB.  And if we are going to
> > switch to igc at some point, investment in GC is even less sensible.
> 
> Assuming that we are 100% sure that MPS will land, then I can agree that
> making any changes here is basically wasted effort.  Unless, of course,
> the change would also simplify the MPS work (would it?).

The igc branch already dropped WIDE_EMACS_INT support, so it only
supports USE_LSB anyway.

> On the other hand, IIUC, we have some way to go with making the merging
> of the MPS branch a guarantee.  While I'm an enthusiastic supporter of
> the great work that's being done on the MPS branch, isn't hedging our
> bets prudent until that work is done?

From where I stand, what's left to do on the branch is stability:
using the branch, reporting bugs, and fixing them, especially on some
rarer platforms (*BSD, for example).  Plus some decisions: do we fork
MPS or not, for example.  So it isn't such a distant future.

> Or am I misunderstanding how close we are to merging the MPS branch?

Possibly.

> >> If performance and wasted memory aren't issues, then it's a tradeoff
> >> between leaving old code untouched and simplifying it to enable future
> >> development.
> >
> > The existing code doesn't preclude nor interfere with future
> > development.  So yes, leaving working code untouched is the preference
> > here.
> 
> Based on my limited mucking around in the GC, it does interfere somewhat
> because you do need to understand both configurations, at least on a
> high level, and once you do you need to mentally filter that stuff out
> when reading the code.  So I think I'd appreciate the simplification, at
> least.

The simplification is minuscule at best.  We need to mask some bits,
either at the LSB end or at MSB end, that's all the difference.  And
we have macros that hide the differences from most levels.
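
To make the "mask some bits" point concrete, here is a minimal sketch
of the two schemes.  It is not the code from lisp.h: the 3-bit tag
width, the 64-bit word, and all names (lsb_make, msb_make, etc.) are
assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, not the real lisp.h definitions: a 64-bit word
   carrying a 3-bit type tag and a 61-bit payload.  */
enum { TAG_BITS = 3, WORD_BITS = 64 };
#define TAG_MASK ((UINT64_C (1) << TAG_BITS) - 1)
#define PAYLOAD_MASK (UINT64_MAX >> TAG_BITS)

/* LSB scheme: the payload is an 8-byte-aligned address, so its low
   three bits are zero and can hold the tag.  */
static uint64_t
lsb_make (uint64_t addr, unsigned tag)
{
  assert ((addr & TAG_MASK) == 0 && tag <= TAG_MASK);
  return addr | tag;
}

static unsigned
lsb_tag (uint64_t w)
{
  return (unsigned) (w & TAG_MASK);
}

static uint64_t
lsb_payload (uint64_t w)
{
  return w & ~TAG_MASK;
}

/* MSB scheme: the tag sits in the top three bits, the payload in the
   remaining low bits.  */
static uint64_t
msb_make (uint64_t payload, unsigned tag)
{
  assert ((payload & ~PAYLOAD_MASK) == 0 && tag <= TAG_MASK);
  return ((uint64_t) tag << (WORD_BITS - TAG_BITS)) | payload;
}

static unsigned
msb_tag (uint64_t w)
{
  return (unsigned) (w >> (WORD_BITS - TAG_BITS));
}

static uint64_t
msb_payload (uint64_t w)
{
  return w & PAYLOAD_MASK;
}
```

In this sketch the two schemes differ only in which end of the word is
masked; callers that go through the accessor functions do not need to
know which one is in use.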

And remember that the original scheme of tagging in Emacs was
!USE_LSB, so some veterans might even prefer it.

> If the only known drawbacks are stability concerns, we could also
> consider an intermediate step along these lines:
> 
> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
> on master.

That would put the WIDE_EMACS_INT configuration at risk, since that
configuration will need changes.

> See what issues crop up, if any.  If anything does come up,
> ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting
> to look too hairy, revert WIDE_EMACS_INT back to !USE_LSB_TAG.  If
> nothing too bad comes up, we can then consider removing the associated
> code in Emacs 32.

My point is that all of that could be avoided entirely, given some
development decisions which basically drop !USE_LSB_TAG
configurations.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 14:39                       ` Eli Zaretskii
@ 2024-12-09 21:06                         ` Stefan Kangas
  2024-12-09 21:49                           ` Óscar Fuentes
                                             ` (2 more replies)
  2024-12-10  0:09                         ` pdumper on Solaris 10 Stefan Kangas
  1 sibling, 3 replies; 137+ messages in thread
From: Stefan Kangas @ 2024-12-09 21:06 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: pipcet, luangruo, ali_gnu2, emacs-devel, Gerd Möllmann,
	Stefan Monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> On the other hand, IIUC, we have some way to go with making the merging
>> of the MPS branch a guarantee.  While I'm an enthusiastic supporter of
>> the great work that's being done on the MPS branch, isn't hedging our
>> bets prudent until that work is done?
>
> From where I stand, what's left to do on the branch is stability:
> using the branch, reporting bugs, and fixing them, especially on some
> rarer platforms (*BSD, for example).  Plus some decisions: do we fork
> MPS or not, for example.  So it isn't such a distant future.

In that case, I'd suggest that we start working on getting README-IGC
into an excellent state.  In August, when I last tried building the
branch, getting it to build was non-trivial, but I didn't try with the
latest instructions.

Taking a look at README-IGC, it seems like we're still missing build
instructions for Debian.  Maybe people could volunteer to add other
popular distros too, and *BSD, etc.  (If the idea is that such users
should just follow the instructions under "Building MPS yourself", then
we should say that instead of "TBD".)

Once we feel happy that it's reasonably straightforward to follow the
instructions, I'd suggest that Someone (TM) makes a post to emacs-devel,
asking people to start seriously testing the branch.  Such a post should
normally get picked up by Emacs News, Reddit, etc. and hopefully the
branch will then start seeing wider use.  (Remember to Cc Sacha Chua to
get it on Emacs News.)

I'm sure that users will be excited to help test igc once they
understand that we're working seriously on stabilizing it in preparation
for getting it merged.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
@ 2024-12-09 21:49                           ` Óscar Fuentes
  2024-12-10  4:17                             ` Xiyue Deng
  2024-12-10 13:09                             ` Eli Zaretskii
  2024-12-09 23:13                           ` chad
  2024-12-10 12:41                           ` Eli Zaretskii
  2 siblings, 2 replies; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-09 21:49 UTC (permalink / raw)
  To: emacs-devel

Stefan Kangas <stefankangas@gmail.com> writes:

> Taking a look at README-IGC, it seems like we're still missing build
> instructions for Debian.

AFAIK Debian does not package MPS.

The instructions I added to README-IGC for building MPS from their git
repo are distro-agnostic. They are tested on Debian Trixie (a.k.a.
Testing), which is what I have installed on all the machines I regularly
use.

In fact, I'm pretty sure that any experienced autotools hacker can add
MPS to the Emacs build in no time. The only annoying bit is that some
MPS headers collide with Emacs', so I chose to instruct the user to copy
the needed headers to a new directory and tell the config script to use
it.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 21:49                           ` Óscar Fuentes
@ 2024-12-10  4:17                             ` Xiyue Deng
  2024-12-10  4:26                               ` Sean Whitton
                                                 ` (4 more replies)
  2024-12-10 13:09                             ` Eli Zaretskii
  1 sibling, 5 replies; 137+ messages in thread
From: Xiyue Deng @ 2024-12-10  4:17 UTC (permalink / raw)
  To: Óscar Fuentes, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]

Óscar Fuentes <ofv@wanadoo.es> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> Taking a look at README-IGC, it seems like we're still missing build
>> instructions for Debian.
>
> AFAIK Debian does not package MPS.
>
> The instructions I added to README-IGC for building MPS from their git
> repo are distro-agnostic. They are tested in Debian Trixie (a.k.a
> Testing) which is what I have installed on all the machines I regularly
> use.
>
> In fact, I'm pretty sure that any experienced autotools hacker can add
> MPS to the Emacs build in no time. The only annoying bit is that some
> MPS headers collide with Emacs', so I chose to instruct the user to copy
> the needed headers to a new directory and tell the config script to use
> it.
>
>

If making MPS available in Debian would help Emacs packaging, I'm willing
to work on this (in the coming weeks; igc may not land with the upcoming
Emacs 30 release, so there is no hurry).

I have a few questions regarding the Emacs/igc usage of MPS:

* Does igc require only mps.{h,c} or more sources from the MPS source
  package?  It looks like there are many sources and its autotools
  build script fails with GCC 14.2 in Debian Trixie due to several
  "-Werror"s.  It may be easier to just compile and ship the required
  subset, though it may require providing a custom build script.

* Does igc work with a dynamically linked MPS library?  Currently I have
  seen people suggesting directly compiling the source, which is
  effectively like using MPS as a static library.  It would be less
  useful to package a static-only library in Debian because in case of
  any issues (usually security) updating the library is insufficient and
  its dependencies would need to be rebuilt as well.  Using a dynamic
  library would solve this scalability issue, and it would be good to
  know if igc can work with a dynamically linked MPS.

* Does igc work with the latest tagged version (release-1.118.0) or only
  the latest snapshot?  Packaging a tagged version would be easier,
  though working with a snapshot may also work with a bit of extra
  effort.

-- 
Regards,
Xiyue Deng

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 857 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10  4:17                             ` Xiyue Deng
@ 2024-12-10  4:26                               ` Sean Whitton
  2024-12-10  4:42                               ` chad
                                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Sean Whitton @ 2024-12-10  4:26 UTC (permalink / raw)
  To: Xiyue Deng; +Cc: Óscar Fuentes, emacs-devel

Hello,

I can review and sponsor Xiyue’s upload to Debian.

-- 
Sean Whitton

Please excuse top-posting and brevity. I am writing to you from a mobile phone.

> On 10 Dec 2024, at 12:19, Xiyue Deng <manphiz@gmail.com> wrote:
> 
> Óscar Fuentes <ofv@wanadoo.es> writes:
> 
>> Stefan Kangas <stefankangas@gmail.com> writes:
>> 
>>> Taking a look at README-IGC, it seems like we're still missing build
>>> instructions for Debian.
>> 
>> AFAIK Debian does not package MPS.
>> 
>> The instructions I added to README-IGC for building MPS from their git
>> repo are distro-agnostic. They are tested in Debian Trixie (a.k.a
>> Testing) which is what I have installed on all the machines I regularly
>> use.
>> 
>> In fact, I'm pretty sure that any experienced autotools hacker can add
>> MPS to the Emacs build in no time. The only annoying bit is that some
>> MPS headers collide with Emacs', so I chose to instruct the user to copy
>> the needed headers to a new directory and tell the config script to use
>> it.
>> 
>> 
> 
> If making MPS available in Debian would help Emacs packaging I'm willing
> to work on this (in the coming weeks as igc may not land with the
> upcoming Emacs 30 release so not in a hurry.)
> 
> I have a few questions regarding the Emacs/igc usage of MPS:
> 
> * Does igc require only mps.{h,c} or more sources from the MPS source
>  package?  It looks like there are many sources and it's autotools
>  build script fails with GCC 14.2 in Debian Trixie due to several
>  "-Werror"s.  It may be easier to just compile and ship the required
>  subset, though it may require providing a custom build script.
> 
> * Does igc work with a dynamically linked MPS library?  Currently I have
>  seen people suggesting that directly compiling the source, which is
>  effectively like using MPS as a static library.  It would be less
>  useful to package a static-only library in Debian because in case of
>  any issues (usually security) updating the library is insufficient and
>  its dependencies would need to be rebuilt as well.  Using a dynamic
>  library would solve this scalability issue, and it would be good to
>  know if igc can work with a dynamically linked MPS.
> 
> * Does igc work with the latest tagged version (release-1.118.0) or only
>  the latest snapshot?  Packaging a tagged version would be easier,
>  though working with a snapshot may also work with a bit of extra
>  efforts.
> 
> --
> Regards,
> Xiyue Deng




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10  4:17                             ` Xiyue Deng
  2024-12-10  4:26                               ` Sean Whitton
@ 2024-12-10  4:42                               ` chad
  2024-12-10 13:10                               ` Óscar Fuentes
                                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: chad @ 2024-12-10  4:42 UTC (permalink / raw)
  To: Xiyue Deng; +Cc: Óscar Fuentes, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Mon, Dec 9, 2024 at 11:19 PM Xiyue Deng <manphiz@gmail.com> wrote:

> * Does igc require only mps.{h,c} or more sources from the MPS source
>   package?  It looks like there are many sources and it's autotools
>   build script fails with GCC 14.2 in Debian Trixie due to several
>   "-Werror"s.  It may be easier to just compile and ship the required
>   subset, though it may require providing a custom build script.
>

Emacs itself needs:

  #include <mps.h>
  #include <mpsavm.h>
  #include <mpscamc.h>
  #include "mpscams.h"
  #include <mpscawl.h>
  #include <mpslib.h>


This is shorter than mps/code/mps*.h (which I suggested earlier).

> * Does igc work with the latest tagged version (release-1.118.0) or only
>   the latest snapshot?  Packaging a tagged version would be easier,
>   though working with a snapshot may also work with a bit of extra
>   efforts.


I get the impression that Ravenbrook/mps is working towards an updated
release, but at the moment, I believe that you really want patches that
aren't in release-1.118.0.

Hope that helps,
~Chad

[-- Attachment #2: Type: text/html, Size: 1860 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10  4:17                             ` Xiyue Deng
  2024-12-10  4:26                               ` Sean Whitton
  2024-12-10  4:42                               ` chad
@ 2024-12-10 13:10                               ` Óscar Fuentes
  2024-12-10 15:10                                 ` Pip Cet via Emacs development discussions.
  2024-12-10 13:20                               ` Eli Zaretskii
  2024-12-10 14:46                               ` Pip Cet via Emacs development discussions.
  4 siblings, 1 reply; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 13:10 UTC (permalink / raw)
  To: emacs-devel

Xiyue Deng <manphiz@gmail.com> writes:

> If making MPS available in Debian would help Emacs packaging I'm willing
> to work on this (in the coming weeks as igc may not land with the
> upcoming Emacs 30 release so not in a hurry.)

As a Debian user, I value every package that is made available through
its repos, thank you.

However, in the specific case of Emacs/MPS, IMAO distro packaging is not
the best way, because:

* Depending on packaged MPS brings versioning problems, not to mention
  that it would take a long time to have MPS available on a large part
  of the distro ecosystem. We would need the DIY part of README-IGC
  anyway.

* It is very likely that we end up doing some patching to the MPS
  sources to adapt to our specific needs (whether those patches end up
  upstream or not is another question.)

* MPS does a performance-critical job. Using it as a shared object might
  incur a performance penalty. Having it in source form alongside the
  Emacs sources will result in opportunities for optimizations (LTO,
  PGO, ...) that may bring better performance.

* MPS does a correctness-critical job. Depending on multiple external
  sources for such a core component is a recipe for problems (future
  changes by the MPS maintainers, patching by packagers, buggy
  compilers, etc.) We need to keep a close watch on what MPS incarnation
  we use. Better yet, total control.

For those reasons, incorporating MPS into the Emacs sources is the right
thing to do.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 13:10                               ` Óscar Fuentes
@ 2024-12-10 15:10                                 ` Pip Cet via Emacs development discussions.
  2024-12-10 15:37                                   ` Óscar Fuentes
  2024-12-12  4:37                                   ` Xiyue Deng
  0 siblings, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 15:10 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On Tuesday, December 10th, 2024 at 13:10, Óscar Fuentes <ofv@wanadoo.es> wrote:
> Xiyue Deng manphiz@gmail.com writes:
> * It is very likely that we end doing some patching to the MPS sources
> to adapt to our specific needs (if those patches end upstream or not,
> that's another question.)

If that ends up being the case, we'll have to make sure not to use shared libraries which may contain the upstream code. But that's true of all libraries; in the particular case of a Debian package, both the APT versioning schemes and ELF versioning are available for that.

> * MPS does a performance-critical job. Using it as a shared object might
> incur in a performance penalty. Having it in source form alongside the
> Emacs sources will result in opportunities for optimizations (LTO,
> PGO, ...) that may bring better performance.

...and more problems. MPS has made the decision not to work with gcc -O3, only with -O2 or less, and LTO in particular is something MPS cannot reliably support, IIUC.

> * MPS does a correctness-critical job. Depending on multiple external
> sources for such core component is a recipe for problems (future
> changes by the MPS maintainers, patching by packagers, buggy
> compilers, etc.) We need to keep a close watch on what MPS incarnation
> we use. Better yet, total control.

I think the correctness argument goes both ways: shared linking means bugs may be fixed for you automatically, as is routinely the case with libc.

> For those reasons, incorporating MPS into the Emacs sources is the right
> thing to do.

I don't think that's an option, because Emacs should remain capable of switching to GPLv4 if and when that is released, and we don't know whether the MPS license is compatible with such a future document.

So it's either static or dynamic linking; static links have these disadvantages:

* shared libraries on GNU/Linux have versioning, static libs don't, AFAIK
* legally, statically-linked binaries are quite different from dynamically-linked ones
* someone might enable LTO and break MPS (this may be done automatically by the compiler rather than a user error)
* with dynamic linking, there is some hope we could switch from libmps.so to libmps-debug.so without having to recompile Emacs, which would help us diagnose crashes in their actual environment

Pip



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 15:10                                 ` Pip Cet via Emacs development discussions.
@ 2024-12-10 15:37                                   ` Óscar Fuentes
  2024-12-10 15:47                                     ` Pip Cet via Emacs development discussions.
  2024-12-10 17:16                                     ` Eli Zaretskii
  2024-12-12  4:37                                   ` Xiyue Deng
  1 sibling, 2 replies; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 15:37 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>> * MPS does a performance-critical job. Using it as a shared object might
>> incur in a performance penalty. Having it in source form alongside the
>> Emacs sources will result in opportunities for optimizations (LTO,
>> PGO, ...) that may bring better performance.
>
> ...and more problems. MPS has made the decision not to work with gcc
> -O3, only with -O2 or less, and LTO in particular is something MPS
> cannot reliably support, IIUC.

That sounds worrisome. If I understand the implications of what you
wrote, MPS basically depends on the specifics of what gcc does. But
gcc can do something else in future versions... not to mention what
happens if the user wants to use other compilers.

Can you point me to a description of how MPS is related to compiler
optimizations and specifically to LTO?

>> * MPS does a correctness-critical job. Depending on multiple external
>> sources for such core component is a recipe for problems (future
>> changes by the MPS maintainers, patching by packagers, buggy
>> compilers, etc.) We need to keep a close watch on what MPS incarnation
>> we use. Better yet, total control.
>
> I think the correctness argument goes both ways: shared linking means
> bugs may be fixed for you automatically, as is routinely the case with
> libc.

libc is a central piece of any GNU/Linux distribution and therefore well
cared for by the packagers. MPS is not. Fixes to minor packages like MPS
can take *years* to propagate through the distro universe, if at all.

>> For those reasons, incorporating MPS into the Emacs sources is the right
>> thing to do.
>
> I don't think that's an option, because Emacs should remain capable of
> switching to GPLv4 if and when that is released, and we don't know
> whether the MPS license is compatible with such a future document.

Yeah, the licensing point is what I was too afraid to mention :-)



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 15:37                                   ` Óscar Fuentes
@ 2024-12-10 15:47                                     ` Pip Cet via Emacs development discussions.
  2024-12-10 17:16                                     ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 15:47 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On Tuesday, December 10th, 2024 at 15:37, Óscar Fuentes <ofv@wanadoo.es> wrote:
> Pip Cet pipcet@protonmail.com writes:
> 
> > > * MPS does a performance-critical job. Using it as a shared object might
> > > incur in a performance penalty. Having it in source form alongside the
> > > Emacs sources will result in opportunities for optimizations (LTO,
> > > PGO, ...) that may bring better performance.
> > 
> > ...and more problems. MPS has made the decision not to work with gcc
> > -O3, only with -O2 or less, and LTO in particular is something MPS
> > cannot reliably support, IIUC.
> 
> That sounds worrysome. If I understand the implications of what you
> wrote, MPS basically depends on what the specifics of what gcc does. But

No compiler can perform cross-object inlining without LTO, so we're safe there.

> gcc can do something else on future versions... not to mention what
> happens if the user wants to use other compilers.

GC in general depends a lot on the compiler (and the programmer) not misbehaving. It is perfectly legal for a C compiler to scramble a pointer in a register, for example, but all conservative stack marking GC approaches will fail to recognize such a scrambled pointer and crash.
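
A minimal sketch of that failure mode, under simplifying assumptions:
the "scanner" below only tests words for equality with an object's
address (real conservative scanners test whether a word falls inside a
managed block), and the XOR key and all function names are invented
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if any word in WORDS[0..N) equals the
   address of OBJ, which is (roughly) how a conservative scanner
   decides that OBJ is still referenced.  */
static int
conservative_scan_finds (const uintptr_t *words, size_t n, const void *obj)
{
  assert (words != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (words[i] == (uintptr_t) obj)
      return 1;
  return 0;
}

/* A reversible transformation that a compiler (or a programmer) could
   in principle apply to a pointer while it is held in a register.  */
#define SCRAMBLE_KEY ((uintptr_t) 0x5a5a5a5a)

static uintptr_t
scramble (const void *p)
{
  return (uintptr_t) p ^ SCRAMBLE_KEY;
}

static void *
unscramble (uintptr_t w)
{
  return (void *) (w ^ SCRAMBLE_KEY);
}
```

A word holding the plain address is found by the scan; a word holding
the scrambled value is not, even though the program can still recover
the pointer with unscramble and use the object afterwards.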

> Can you point me to a description of how MPS is related to compiler
> optimizations and specifically to LTO?

I'll have a look.  IIRC, setjmp() and the "void *top_of_stack = &top_of_stack" trick failed to properly detect all registers when the entry point was being inlined across objects, and Ravenbrook decided against moving to assembly code for those entry points.
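
For readers unfamiliar with the trick, here is a rough sketch of the
general idea, not the MPS code: the struct, the function names, and
the way the range is computed are all assumptions made for
illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanner state: the address range a stack scan would
   cover.  */
struct stack_range { uintptr_t lo, hi; };

/* The "cold" end of the stack; record it early, e.g. from main.  */
static uintptr_t stack_base;

static void
note_stack_base (void *marker)
{
  assert (marker != NULL);
  stack_base = (uintptr_t) marker;
}

static uintptr_t min_u (uintptr_t a, uintptr_t b) { return a < b ? a : b; }
static uintptr_t max_u (uintptr_t a, uintptr_t b) { return a > b ? a : b; }

/* Compute the range to scan from the current frame.  setjmp saves
   register contents into REGS, which lives on the stack, so pointers
   held only in those registers end up inside the scanned range.  This
   is a convention that works on common ABIs, not something the C
   standard promises.  */
static struct stack_range
current_stack_range (void)
{
  jmp_buf regs;
  void *top_of_stack = &top_of_stack; /* address of a local ~ stack top */
  (void) setjmp (regs);

  uintptr_t regs_lo = (uintptr_t) &regs;
  uintptr_t regs_hi = regs_lo + sizeof regs;
  uintptr_t top = (uintptr_t) top_of_stack;

  struct stack_range r;
  r.lo = min_u (min_u (regs_lo, top), stack_base);
  r.hi = max_u (max_u (regs_hi, top), stack_base);
  return r;
}
```

The sketch relies on current_stack_range having its own frame beyond
its caller's.  If the compiler inlines it into the caller, as LTO can
do across object files, top_of_stack and regs become part of the
caller's frame and the computed range may no longer cover what the
scanner expects.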

Of course there's also the scrambled frame pointer problem, but that's about what the client code does, not just about the MPS code.

> > > * MPS does a correctness-critical job. Depending on multiple external
> > > sources for such core component is a recipe for problems (future
> > > changes by the MPS maintainers, patching by packagers, buggy
> > > compilers, etc.) We need to keep a close watch on what MPS incarnation
> > > we use. Better yet, total control.
> > 
> > I think the correctness argument goes both ways: shared linking means
> > bugs may be fixed for you automatically, as is routinely the case with
> > libc.
> 
> libc is a central piece of any GNU/Linux distribution and therefore much
> cared by the packagers. MPS not so. Fixes on minor packages like MPS can
> take years to propagate through the distro universe, if at all.

Very good point, thank you.

> > > For those reasons, incorporating MPS into the Emacs sources is the right
> > > thing to do.
> > 
> > I don't think that's an option, because Emacs should remain capable of
> > switching to GPLv4 if and when that is released, and we don't know
> > whether the MPS license is compatible with such a future document.
> 
> Yeah, the licensing point is what I was too afraid to mention :-)

Don't get me wrong: if Ravenbrook were to assign copyright to the FSF, including it in Emacs would be TRT, but that's unlikely to happen.

Pip



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 15:37                                   ` Óscar Fuentes
  2024-12-10 15:47                                     ` Pip Cet via Emacs development discussions.
@ 2024-12-10 17:16                                     ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 17:16 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: pipcet, emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Cc: emacs-devel@gnu.org
> Date: Tue, 10 Dec 2024 16:37:28 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> >> * MPS does a performance-critical job. Using it as a shared object might
> >> incur in a performance penalty. Having it in source form alongside the
> >> Emacs sources will result in opportunities for optimizations (LTO,
> >> PGO, ...) that may bring better performance.
> >
> > ...and more problems. MPS has made the decision not to work with gcc
> > -O3, only with -O2 or less, and LTO in particular is something MPS
> > cannot reliably support, IIUC.
> 
> That sounds worrysome. If I understand the implications of what you
> wrote, MPS basically depends on what the specifics of what gcc does. But
> gcc can do something else on future versions... not to mention what
> happens if the user wants to use other compilers.

MPS does quite a few questionable things and depends on several
assumptions that are not easy to uphold.  We have already bumped into
some of them, with signals (like SIGPROF), for example, and don't yet
have a satisfactory solution, at least IMO.  It is hardly surprising
for a library that attempts to literally pull the rug from under the
feet of a running program.  We will probably find other issues as we
continue testing the branch.  That is why it's important for as many
people as possible to test it and report any problems.  That is most
of what is left to do on the branch before we decide it is ready to be
merged (or, unlikely, decide the problems are too much for us to cope
with).

> >> For those reasons, incorporating MPS into the Emacs sources is the right
> >> thing to do.
> >
> > I don't think that's an option, because Emacs should remain capable of
> > switching to GPLv4 if and when that is released, and we don't know
> > whether the MPS license is compatible with such a future document.
> 
> Yeah, the licensing point is what I was too afraid to mention :-)

We could alternatively fork the library and keep it in a separate
repository, under a different but compatible license.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 15:10                                 ` Pip Cet via Emacs development discussions.
  2024-12-10 15:37                                   ` Óscar Fuentes
@ 2024-12-12  4:37                                   ` Xiyue Deng
  1 sibling, 0 replies; 137+ messages in thread
From: Xiyue Deng @ 2024-12-12  4:37 UTC (permalink / raw)
  To: Pip Cet, Óscar Fuentes; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2989 bytes --]

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

> On Tuesday, December 10th, 2024 at 13:10, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> Xiyue Deng manphiz@gmail.com writes:
>> * It is very likely that we end doing some patching to the MPS sources
>> to adapt to our specific needs (if those patches end upstream or not,
>> that's another question.)
>
> If that ends up being the case, we'll have to make sure not to use shared libraries which may contain the upstream code. But that's true of all libraries; in the particular case of a Debian package, both the APT versioning schemes and ELF versioning are available for that.
>

I think it's possible to package a specifically customized variant of
MPS for use by Emacs, which can specify compiler options or even carry
Emacs-specific patches.  It would be good to be able to work with
vanilla MPS to avoid this extra maintenance burden, of course.

>> * MPS does a performance-critical job. Using it as a shared object might
>> incur a performance penalty. Having it in source form alongside the
>> Emacs sources will result in opportunities for optimizations (LTO,
>> PGO, ...) that may bring better performance.
>
> ...and more problems. MPS has made the decision not to work with gcc -O3, only with -O2 or less, and LTO in particular is something MPS cannot reliably support, IIUC.
>
>> * MPS does a correctness-critical job. Depending on multiple external
>> sources for such a core component is a recipe for problems (future
>> changes by the MPS maintainers, patching by packagers, buggy
>> compilers, etc.) We need to keep a close watch on what MPS incarnation
>> we use. Better yet, total control.
>
> I think the correctness argument goes both ways: shared linking means bugs may be fixed for you automatically, as is routinely the case with libc.
>
>> For those reasons, incorporating MPS into the Emacs sources is the right
>> thing to do.
>
> I don't think that's an option, because Emacs should remain capable of switching to GPLv4 if and when that is released, and we don't know whether the MPS license is compatible with such a future document.
>
> So it's either static or dynamic linking; static links have these disadvantages:
>
> * shared libraries on GNU/Linux have versioning, static libs don't, AFAIK
> * legally, statically-linked binaries are quite different from dynamically-linked ones
> * someone might enable LTO and break MPS (this may be done automatically by the compiler rather than a user error)
> * with dynamic linking, there is some hope we could switch from libmps.so to libmps-debug.so without having to recompile Emacs, which would help us diagnose crashes in their actual environment
>

Thanks for the summary!

I would hope using dynamic linking would not be a deal breaker, and the
advantages over a static lib that you listed are good from a distribution
point of view.

> Pip
>

-- 
Regards,
Xiyue Deng



* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10  4:17                             ` Xiyue Deng
                                                 ` (2 preceding siblings ...)
  2024-12-10 13:10                               ` Óscar Fuentes
@ 2024-12-10 13:20                               ` Eli Zaretskii
  2024-12-10 14:46                               ` Pip Cet via Emacs development discussions.
  4 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 13:20 UTC (permalink / raw)
  To: Xiyue Deng; +Cc: ofv, emacs-devel

> From: Xiyue Deng <manphiz@gmail.com>
> Date: Mon, 09 Dec 2024 20:17:54 -0800
> 
> If making MPS available in Debian would help Emacs packaging I'm willing
> to work on this (in the coming weeks as igc may not land with the
> upcoming Emacs 30 release so not in a hurry.)

Thanks in advance.

> I have a few questions regarding the Emacs/igc usage of MPS:
> 
> * Does igc require only mps.{h,c} or more sources from the MPS source
>   package?  It looks like there are many sources and it's autotools
>   build script fails with GCC 14.2 in Debian Trixie due to several
>   "-Werror"s.  It may be easier to just compile and ship the required
>   subset, though it may require providing a custom build script.

I suggest using the detailed instructions under "Building the MPS for
development" in manual/build.txt.  This is what I did, and had no
serious problems, even though I needed to concoct the various *.gmk
Makefiles because my platform was not supported OOTB (GNU/Linux is
supported OOTB).  The reason I suggest that is that an official Debian
distro of MPS had better include the several different builds of the
library ("cool" and "hot"), and also include all the headers that any
program using MPS might need, even if Emacs uses just part of them.
The package should also include the Info manual, IMO.

> * Does igc work with a dynamically linked MPS library?

The MPS Makefiles build only static libraries, not shared libraries.
Since this library implements GC, and Emacs must have some GC, why
does it make sense to build MPS as a shared library?

>   Currently I have
>   seen people suggesting directly compiling the source, which is
>   effectively like using MPS as a static library.  It would be less
>   useful to package a static-only library in Debian because in case of
>   any issues (usually security) updating the library is insufficient and
>   its dependencies would need to be rebuilt as well.  Using a dynamic
>   library would solve this scalability issue, and it would be good to
>   know if igc can work with a dynamically linked MPS.

If you must build a shared library, you are basically on your own.
And doing that is in stark contrast to what you asked above about
headers used only by Emacs.

> * Does igc work with the latest tagged version (release-1.118.0) or only
>   the latest snapshot?  Packaging a tagged version would be easier,
>   though working with a snapshot may also work with a bit of extra
>   effort.

I built the official release, not a snapshot, FWIW.


* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10  4:17                             ` Xiyue Deng
                                                 ` (3 preceding siblings ...)
  2024-12-10 13:20                               ` Eli Zaretskii
@ 2024-12-10 14:46                               ` Pip Cet via Emacs development discussions.
  4 siblings, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 14:46 UTC (permalink / raw)
  To: Xiyue Deng; +Cc: Óscar Fuentes, emacs-devel

On Tuesday, December 10th, 2024 at 04:17, Xiyue Deng <manphiz@gmail.com> wrote:
> Óscar Fuentes ofv@wanadoo.es writes:
> If making MPS available in Debian would help Emacs packaging I'm willing
> to work on this (in the coming weeks as igc may not land with the
> upcoming Emacs 30 release so not in a hurry.)

I think that would be great, even if we decide we cannot make do with an unmodified upstream version of MPS.

> * Does igc require only mps.{h,c} or more sources from the MPS source
> package? It looks like there are many sources and its autotools
> build script fails with GCC 14.2 in Debian Trixie due to several
> "-Werror"s. It may be easier to just compile and ship the required
> subset, though it may require providing a custom build script.

Ravenbrook recommends building the library directly by compiling mps.c, and that's what I usually do. I still ended up having to remove -Werror from the .mk files, at some point...

> * Does igc work with a dynamically linked MPS library?

It definitely does, because that's what we're using on Android. On other systems, statically-linked code may be very slightly faster, but IMHO packaging a statically-linked Emacs+MPS binary is problematic for a few reasons, just as statically linking to libc would be. (It should go without saying that "we always use it" is not sufficient reason for using a statically-linked library)

> Currently I have
> seen people suggesting directly compiling the source, which is
> effectively like using MPS as a static library.

That works, and it's what I do on GNU/Linux, but we should probably change our approach there.

> It would be less
> useful to package a static-only library in Debian because in case of
> any issues (usually security) updating the library is insufficient and
> its dependencies would need to be rebuilt as well. Using a dynamic
> library would solve this scalability issue, and it would be good to
> know if igc can work with a dynamically linked MPS.

It definitely can work, and I'll look into switching my builds over to using dynamic linking.

> * Does igc work with the latest tagged version (release-1.118.0) or only
> the latest snapshot? Packaging a tagged version would be easier,
> though working with a snapshot may also work with a bit of extra
> effort.

It's not quite clear to me yet whether we're going to be able to use unpatched MPS on all architectures (that's somewhat unlikely) or on every architecture except for 32-bit x86 (more likely).

Pip


* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 21:49                           ` Óscar Fuentes
  2024-12-10  4:17                             ` Xiyue Deng
@ 2024-12-10 13:09                             ` Eli Zaretskii
  2024-12-10 13:20                               ` Óscar Fuentes
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 13:09 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Mon, 09 Dec 2024 22:49:13 +0100
> 
> In fact, I'm pretty sure that any experienced autotools hacker can add
> MPS to the Emacs build in no time. The only annoying bit is that some
> MPS headers collide with Emacs'

??? The MPS build instructions in manual/build.txt say to copy to
/usr/include only the headers that begin with "mps", and there are no
such headers in Emacs, AFAICT.  So what kind of collisions did you
see?




* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 13:09                             ` Eli Zaretskii
@ 2024-12-10 13:20                               ` Óscar Fuentes
  2024-12-10 14:41                                 ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 13:20 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Óscar Fuentes <ofv@wanadoo.es>
>> Date: Mon, 09 Dec 2024 22:49:13 +0100
>> 
>> In fact, I'm pretty sure that any experienced autotools hacker can add
>> MPS to the Emacs build in no time. The only annoying bit is that some
>> MPS headers collide with Emacs'
>
> ??? The MPS build instructions in manual/build.txt say to copy to
> /usr/include only the headers that begin with "mps", and there are no
> such headers in Emacs, AFAICT.  So what kind of collisions did you
> see?

I don't recall the details, but passing -I/path/to/mps/code to Emacs'
config script resulted in a failed build because the wrong headers were
picked while compiling certain .c files. That should be quite easy to
replicate, if you are interested.

That problem does not happen if the directory passed to config only
contains mps*.h files.





* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-10 13:20                               ` Óscar Fuentes
@ 2024-12-10 14:41                                 ` Eli Zaretskii
  0 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 14:41 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Tue, 10 Dec 2024 14:20:44 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Óscar Fuentes <ofv@wanadoo.es>
> >> Date: Mon, 09 Dec 2024 22:49:13 +0100
> >> 
> >> In fact, I'm pretty sure that any experienced autotools hacker can add
> >> MPS to the Emacs build in no time. The only annoying bit is that some
> >> MPS headers collide with Emacs'
> >
> > ??? The MPS build instructions in manual/build.txt say to copy to
> > /usr/include only the headers that begin with "mps", and there are no
> > such headers in Emacs, AFAICT.  So what kind of collisions did you
> > see?
> 
> I don't recall the details, but passing -I/path/to/mps/code to Emacs'
> config script resulted in a failed build because the wrong headers were
> picked while compiling certain .c files. That should be quite easy to
> replicate, if you are interested.

If that's what you did, then I understand.  The MPS instructions say
to copy all the mps*.h files into your /usr/include tree, which is not
what you did.




* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
  2024-12-09 21:49                           ` Óscar Fuentes
@ 2024-12-09 23:13                           ` chad
  2024-12-10 12:41                           ` Eli Zaretskii
  2 siblings, 0 replies; 137+ messages in thread
From: chad @ 2024-12-09 23:13 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, pipcet, luangruo, ali_gnu2, emacs-devel,
	Gerd Möllmann, Stefan Monnier

On Mon, Dec 9, 2024 at 4:07 PM Stefan Kangas <stefankangas@gmail.com> wrote:

> [...]
> Taking a look at README-IGC, it seems like we're still missing build
> instructions for Debian.
>

FWIW, I use a somewhat wacky Debian setup, in that it's basically pure
Debian run inside ChromeOS, which adds some mild container
restrictions. In practice, the only impact this has is that my window
system/manager is pre-selected and mostly unchangeable. I switched to
scratch/igc a couple weeks ago, and have noticed no issues. I used some
advice from this list, which was basically:

  git clone https://github.com/Ravenbrook/mps.git
  cd mps/code
  cc -O2 -c mps.c
  ar rvs libmps.a mps.o

make this available; I put it in /usr/local/lib
make the header files available; I ended up doing

  cp mps*.h /usr/local/include

configure emacs-igc with "--with-mps=yes"; I also used
"--enable-checking=yes --enable-check-lisp-object-type=yes", which is
normal practice for me with non-release builds.

I suspect that not all of mps/code/mps*.h need to be copied into
/usr/local/include, but I did the first few one at a time before
breaking out the shotgun.

I've been using this emacs regularly for almost 3 weeks now, but my
usage has been quite light; mostly short Org/text docs, plus some
occasional package updates (and thus byte & native compiling).

I hope that helps,
~Chad


* Re: Merging MPS a.k.a. scratch/igc, yet again
  2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
  2024-12-09 21:49                           ` Óscar Fuentes
  2024-12-09 23:13                           ` chad
@ 2024-12-10 12:41                           ` Eli Zaretskii
  2 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 12:41 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: pipcet, luangruo, ali_gnu2, emacs-devel, gerd.moellmann, monnier

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Mon, 9 Dec 2024 13:06:18 -0800
> Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, 
> 	emacs-devel@gnu.org, Gerd Möllmann <gerd.moellmann@gmail.com>, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > From where I stand, what's left to do on the branch is stability:
> > using the branch, reporting bugs, and fixing them, especially on some
> > rarer platforms (*BSD, for example).  Plus some decisions: do we fork
> > MPS or not, for example.  So it isn't such a distant future.
> 
> In that case, I'd suggest that we start working on getting README-IGC
> into an excellent state.  In August, when I last tried building the
> branch, getting it to build was non-trivial, but I didn't try with the
> latest instructions.

Sure, if the instructions could be improved, this would be good
regardless.

> Once we feel happy that it's reasonably straightforward to follow the
> instructions, I'd suggest that Someone (TM) makes a post to emacs-devel,
> asking people to start seriously testing the branch.  Such a post should
> normally get picked up by Emacs News, Reddit, etc. and hopefully the
> branch will then start seeing wider use.  (Remember to Cc Sacha Chua to
> get it on Emacs News.)

This was already done:

  https://lists.gnu.org/archive/html/emacs-devel/2024-09/msg00257.html

and some people already provide such feedback.  But, of course,
repeating the request for testing and feedback can never do any harm,
and can be posted right now.

> I'm sure that users will be excited to help test igc once they
> understand that we're working seriously on stabilizing it in preparation
> of getting it merged.

Let's hope you are right.




* Re: pdumper on Solaris 10
  2024-12-09 14:39                       ` Eli Zaretskii
  2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
@ 2024-12-10  0:09                         ` Stefan Kangas
  2024-12-10 12:59                           ` Eli Zaretskii
  1 sibling, 1 reply; 137+ messages in thread
From: Stefan Kangas @ 2024-12-10  0:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stefan Kangas <stefankangas@gmail.com>
>> Date: Sun, 8 Dec 2024 23:59:14 -0500
>> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>
>> Assuming that we are 100% sure that MPS will land, then I can agree that
>> making any changes here is basically wasted effort.  Unless, of course,
>> the change would also simplify the MPS work (would it?).
>
> The igc branch already dropped WIDE_EMACS_INT support, so it only
> supports USE_LSB_TAG anyway.

I thought that WIDE_EMACS_INT will remain supported in non-MPS
(i.e. "old GC") builds even after the igc merge?  Am I mistaken?

>> Based on my limited mucking around in the GC, it does interfere somewhat
>> because you do need to understand both configurations, at least on a
>> high level, and once you do you need to mentally filter that stuff out
>> when reading the code.  So I think I'd appreciate the simplification, at
>> least.
>
> The simplification is minuscule at best.  We need to mask some bits,
> either at the LSB end or at MSB end, that's all the difference.  And
> we have macros that hide the differences from most levels.

I agree that it's not a major issue, indeed.  You don't need to look at
this unless you want to understand how we do GC tagging in detail.
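
For readers who have not looked at that part of lisp.h, the difference
can be sketched roughly as follows.  This is only an illustration under
simplifying assumptions (a 64-bit word, a 3-bit tag, and made-up names
such as lsb_make and msb_make); it is not the actual Emacs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; these are not the definitions used
   in Emacs.  A 64-bit word carries a 3-bit type tag.  */
enum { TAG_BITS = 3, WORD_BITS = 64 };

typedef uint64_t word;

/* LSB scheme: the tag lives in the low bits, which are zero in any
   8-byte-aligned address, so the payload must have them clear.  */
static word
lsb_make (word payload, unsigned tag)
{
  return payload | tag;
}

static unsigned
lsb_tag (word w)
{
  return (unsigned) (w & ((1u << TAG_BITS) - 1));
}

static word
lsb_payload (word w)
{
  return w & ~(word) ((1u << TAG_BITS) - 1);
}

/* MSB scheme: the tag lives in the high bits, so the payload must fit
   in the remaining WORD_BITS - TAG_BITS low bits.  */
static word
msb_make (word payload, unsigned tag)
{
  return ((word) tag << (WORD_BITS - TAG_BITS)) | payload;
}

static unsigned
msb_tag (word w)
{
  return (unsigned) (w >> (WORD_BITS - TAG_BITS));
}

static word
msb_payload (word w)
{
  return w & (((word) 1 << (WORD_BITS - TAG_BITS)) - 1);
}
```

Either way, extracting the tag or the payload is a single mask or
shift, which is the part the existing macros hide.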

OTOH, complexity almost always presents itself in small increments that
individually don't look like much.  It's only with the combined effect
of many such small increments that they become a concern; hence the
desire to take similarly small steps towards removing complexity.

>> If the only known drawbacks are stability concerns, we could also
>> consider an intermediate step along these lines:
>>
>> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
>> on master.
>
> That would put the WIDE_EMACS_INT configuration at risk, since that
> configuration will need changes.

That's why I proposed disabling it on master tentatively, with the
option to revert the change if we don't like it.  Setting a flag back to
0 is easy enough.  But making the experiment I proposed might also
demonstrate that we're fine, after all.

OTOH, if we don't make the experiment, we have less data on which to
base our decision.

>> See what issues crop up, if any.  If anything does come up,
>> ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting
>> to look too hairy, revert WIDE_EMACS_INT back to !USE_LSB_TAG.  If
>> nothing too bad comes up, we can then consider removing the associated
>> code in Emacs 32.
>
> My point is that all of that could be avoided entirely, given some
> development decisions which basically drop !USE_LSB_TAG
> configurations.

Is your thinking here that we could merge MPS, wait, and then when it
comes time to remove the old GC, we will get to drop !USE_LSB_TAG for
free?  If yes, couldn't that leave us waiting for a very long time
indeed?

Or are you saying something else?


* Re: pdumper on Solaris 10
  2024-12-10  0:09                         ` pdumper on Solaris 10 Stefan Kangas
@ 2024-12-10 12:59                           ` Eli Zaretskii
  2024-12-10 13:39                             ` Óscar Fuentes
  2024-12-10 15:23                             ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 12:59 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: pipcet, luangruo, ali_gnu2, emacs-devel

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Mon, 9 Dec 2024 19:09:59 -0500
> Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, 
> 	emacs-devel@gnu.org
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Stefan Kangas <stefankangas@gmail.com>
> >> Date: Sun, 8 Dec 2024 23:59:14 -0500
> >> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> >>
> >> Assuming that we are 100% sure that MPS will land, then I can agree that
> >> making any changes here is basically wasted effort.  Unless, of course,
> >> the change would also simplify the MPS work (would it?).
> >
> > The igc branch already dropped WIDE_EMACS_INT support, so it only
> > supports USE_LSB_TAG anyway.
> 
> I thought that WIDE_EMACS_INT will remain supported in non-MPS
> (i.e. "old GC") builds even after the igc merge?  Am I mistaken?

Probably, but who will want to give up igc to get back WIDE_EMACS_INT
(if indeed they are incompatible, which seems to be in dispute)?
I most probably won't.

> OTOH, complexity almost always presents itself in small increments that
> individually don't look like much.

But here we have only a handful of increments, so the sum is also
minuscule.

> >> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
> >> on master.
> >
> > That would put the WIDE_EMACS_INT configuration at risk, since that
> > configuration will need changes.
> 
> That's why I proposed disabling it on master tentatively, with the
> option to revert the change if we don't like it.  Setting a flag back to
> 0 is easy enough.  But making the experiment I proposed might also
> demonstrate that we're fine, after all.

I think we already know that we are "not fine"?  Didn't someone say
that stack scanning is broken?

> > My point is that all of that could be avoided entirely, given some
> > development decisions which basically drop !USE_LSB_TAG
> > configurations.
> 
> Is your thinking here that we could merge MPS, wait, and then when it
> comes time to remove the old GC, we will get to drop !USE_LSB_TAG for
> free?  If yes, couldn't that leave us waiting for a very long time
> indeed?

Maybe so, but why is such a long wait a problem?  GC _works_, and
works well.  There are no pressing problems there, and we've lived
with it for many years virtually without changes.  What's the urge to
make modifications there now, especially when there are chances we
will be dropping this GC at some point?

IMO, our main task here is to develop the application levels of Emacs,
and infrastructure needed to enable such developments.  We should only
invest efforts in stuff like GC and other basics if we see significant
issues, or could envision significant performance gains.  There are no
such issues or gains here, AFAIU.  So diverting our humble resources
to such jobs is a mistake, IMO.


* Re: pdumper on Solaris 10
  2024-12-10 12:59                           ` Eli Zaretskii
@ 2024-12-10 13:39                             ` Óscar Fuentes
  2024-12-10 14:39                               ` Eli Zaretskii
                                                 ` (2 more replies)
  2024-12-10 15:23                             ` Pip Cet via Emacs development discussions.
  1 sibling, 3 replies; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 13:39 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Maybe so, but why is such a long wait a problem?  GC _works_, and
> works well.

Working on certain projects with lsp-mode is a miserable experience due
to all the random pauses.

My perception of the past week or two using igc is that those pauses are
much less jarring, if perceptible at all. I need more time to make a
definitive judgment, though.

As code editing evolves and Emacs is put to more demanding tasks, the
limitations of GC become more obvious (and CPUs are not getting faster
anymore).

Apart from that, I'm convinced that there is quite a bit of evolutionary
pressure exerted by GC on the Elisp package ecosystem: something that
works too slowly or is too bumpy does not attract users and dies. Others
may end up devoting a lot of effort to optimizing GC usage, and when they
finally work "well enough" (for some generous interpretation) most
potential users have already made up their minds (flx.el is a
paradigmatic case) or the package author simply stops working on it,
sometimes without making the first release.

GC also diminishes the benefits of native-comp and other performance
enhancements: no matter how fast you make your Elisp execution engine,
the time taken by GC establishes a hard limit.

But the "stop the world" mode of GC operation makes the user experience
considerably worse even if the total time to perform a task is smaller.
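
As a toy illustration of that last point (a made-up cost model with
hypothetical names, not a measurement of Emacs or of MPS): assume
marking one object costs one time unit, and that an incremental
collector pays some fixed bookkeeping overhead per slice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model: total is all time spent collecting,
   longest is the longest single uninterrupted pause.  */
struct pause_stats
{
  size_t total;
  size_t longest;
};

/* Stop-the-world: all the work happens in one uninterrupted pause.  */
static struct pause_stats
stop_the_world (size_t objects)
{
  struct pause_stats s = { objects, objects };
  return s;
}

/* Incremental: work is split into slices of at most BUDGET units,
   each costing OVERHEAD extra units of bookkeeping.  */
static struct pause_stats
incremental (size_t objects, size_t budget, size_t overhead)
{
  struct pause_stats s = { 0, 0 };
  if (budget == 0)
    return s;                   /* avoid looping forever */
  while (objects > 0)
    {
      size_t slice = objects < budget ? objects : budget;
      size_t pause = slice + overhead;
      s.total += pause;
      if (pause > s.longest)
        s.longest = pause;
      objects -= slice;
    }
  return s;
}
```

With 1000 objects, a budget of 100 and an overhead of 5, the
incremental version does 1050 units of work in total instead of 1000,
but its longest pause is 105 units rather than 1000.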


* Re: pdumper on Solaris 10
  2024-12-10 13:39                             ` Óscar Fuentes
@ 2024-12-10 14:39                               ` Eli Zaretskii
  2024-12-10 15:21                                 ` Óscar Fuentes
  2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
  2024-12-10 18:13                               ` pdumper on Solaris 10 Gerd Möllmann
  2 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 14:39 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Tue, 10 Dec 2024 14:39:54 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Maybe so, but why is such a long wait a problem?  GC _works_, and
> > works well.
> 
> Working on certain projects with lsp-mode is a miserable experience due
> to all the random pauses.

And the changes discussed in this sub-thread will make it
spectacularly faster?

> My perception of the past week or two using igc is that those pauses are
> much less jarring, if perceptible at all. I need more time to make a
> definitive judgment, though.

I was not talking about igc, and its advantages are clear to me.
That's not what this sub-thread is about.

> As code editing evolves and Emacs is put to more demanding tasks, the
> limitations of GC become more obvious (and CPUs are not getting faster
> anymore).
> 
> Apart from that, I'm convinced that there is quite a bit of evolutionary
> pressure exerted by GC on the Elisp package ecosystem: something that
> works too slowly or is too bumpy does not attract users and dies. Others
> may end up devoting a lot of effort to optimizing GC usage, and when they
> finally work "well enough" (for some generous interpretation) most
> potential users have already made up their minds (flx.el is a
> paradigmatic case) or the package author simply stops working on it,
> sometimes without making the first release.
> 
> GC also diminishes the benefits of native-comp and other performance
> enhancements: no matter how fast you make your Elisp execution engine,
> the time taken by GC establishes a hard limit.
> 
> But the "stop the world" mode of GC operation makes the user experience
> considerably worse even if the total time to perform a task is smaller.

All correct, but completely irrelevant to the issue at hand.




* Re: pdumper on Solaris 10
  2024-12-10 14:39                               ` Eli Zaretskii
@ 2024-12-10 15:21                                 ` Óscar Fuentes
  2024-12-10 16:39                                   ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 15:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> All correct, but completely irrelevant to the issue at hand.

I was specifically addressing your "GC works, and works well".

A GC that takes big chunks of time on what is essentially a
single-threaded execution engine and, even more significantly,
introduces pauses that impact the user experience, does not work well.
I would say that it barely works at all, in the sense that it is far
from adequate for the kind of application Emacs is.

I mean, if igc is finally deemed a success, any effort directed at
keeping GC at the expense of anything else would be work invested on a
misfeature, IMHO.


* Re: pdumper on Solaris 10
  2024-12-10 15:21                                 ` Óscar Fuentes
@ 2024-12-10 16:39                                   ` Eli Zaretskii
  0 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 16:39 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Cc: emacs-devel@gnu.org
> Date: Tue, 10 Dec 2024 16:21:13 +0100
> X-Spam-Status: No
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > All correct, but completely irrelevant to the issue at hand.
> 
> I was specifically addressing your "GC works, and works well".
> 
> A GC that takes big chunks of time on what is essentially a
> single-threaded execution engine and, even more significantly,
> introduces pauses that impact the user experience, does not work well.
> I would say that it barely works at all, in the sense that it is far
> from adequate for the kind of application Emacs is.
> 
> I mean, if igc is finally deemed a success, any effort directed at
> keeping GC at the expense of anything else would be work invested on a
> misfeature, IMHO.

This sub-thread was not about GC vs igc, it was about changes in GC
itself that would never come even close to igc.  Everything I wrote
should be assessed from that angle.





* Re: pdumper on Solaris 10
  2024-12-10 13:39                             ` Óscar Fuentes
  2024-12-10 14:39                               ` Eli Zaretskii
@ 2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
  2024-12-10 16:04                                 ` Óscar Fuentes
  2024-12-11  5:27                                 ` Gap buffer problem? Gerd Möllmann
  2024-12-10 18:13                               ` pdumper on Solaris 10 Gerd Möllmann
  2 siblings, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 15:38 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
> Eli Zaretskii eliz@gnu.org writes:
> > Maybe so, but why is such a long wait a problem? GC works, and
> > works well.
> 
> Working on certain projects with lsp-mode is a miserable experience due
> to all the random pauses.

To be fair, part of that may be the gap buffer problem rather than GC.

> My perception of the past week or two using igc is that those pauses are
> much less jarring, if perceptible at all. I need more time to make a
> definitive judgment, though.

If you do, and it's negative, please take into account that MPS offers many tunable parameters, and hasn't been fine-tuned for Emacs yet. Even if the current scratch/igc branch isn't satisfactory by itself, it's very likely it can be improved by changing some numbers.

> But the "stop the world" mode of GC operation makes the user experience
> considerably worse even if the total time to perform a task is smaller.

Of course, these problems are largely fixable, and have been fixed, by such approaches as the fork()-based GC I wrote, which Eli vetoed (I believe the same applies to moving the GC mark bits to their own memory regions, which would have allowed us to interrupt GC on user input). The "don't touch the GC" edict has done a great deal of harm to Emacs; this is relevant because we're now discussing a simplification of the GC code which would help MPS, but is being vetoed (again), while putting effort into making our current code even more complicated by including an impossible code path is being encouraged.

So, no, the current GC doesn't work well, it does cause problems, its code is overly complicated, and simplifications would make switching to MPS a lot easier. All is not well in GC land.

Put drastically, if MPS fails to land, the most likely reason is the capriciously-applied "do not touch the GC" rule.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
@ 2024-12-10 16:04                                 ` Óscar Fuentes
  2024-12-10 17:23                                   ` Eli Zaretskii
  2024-12-11  5:27                                 ` Gap buffer problem? Gerd Möllmann
  1 sibling, 1 reply; 137+ messages in thread
From: Óscar Fuentes @ 2024-12-10 16:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>> My perception of the past week or two using igc is that those pauses are
>> much less jarring, if perceptible at all. I need more time to make a
>> definitive judgment, though.
>
> If you do, and it's negative, please take into account that MPS offers
> many tunable parameters, and hasn't been fine-tuned for Emacs yet.
> Even if the current scratch/igc branch isn't satisfactory by itself,
> it's very likely it can be improved by changing some numbers.

Noted, thanks.

> this is relevant because we're now discussing a simplification of the
> GC code which would help MPS

Those modifications can go on a branch (a fork of scratch/igc). When/if
igc demonstrates its virtues and is considered a considerable improvement
for Emacs, related changes will surely meet less opposition. Then you can
point to that branch and suggest merging it instead of scratch/igc.

> Put drastically, if MPS fails to land, the most likely reason is the
> capriciously-applied "do not touch the GC" rule.

What appears capricious from the outside may be responsible
maintenance from the inside.

Eli and a few others have a very long-term commitment to Emacs and,
as maintainers, consider not degrading stability their principal duty
towards users, which in practice means being almost overly conservative.
And even if I sometimes get irritated by some decisions, knowing that I
can rely on Emacs working (save for very occasional tweaks) is
something that I appreciate very much.

Remember XEmacs?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-10 16:04                                 ` Óscar Fuentes
@ 2024-12-10 17:23                                   ` Eli Zaretskii
  0 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 17:23 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: pipcet, emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Cc: emacs-devel@gnu.org
> Date: Tue, 10 Dec 2024 17:04:39 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > Put drastically, if MPS fails to land, the most likely reason is the
> > capriciously-applied "do not touch the GC" rule.
> 
> What appears capricious from the outside may be responsible
> maintenance from the inside.

More importantly, since some platforms we care about probably won't
support MPS, it could be that the old GC will have to stay with us for
a very long time, alongside MPS.  Keeping that old GC code stable and
reliable is thus very important even if MPS will land (which I
personally hope it will).

Emacs is a very stable platform, and our users rely on us to keep it
stable, even though we sometimes add semi-revolutionary new features
to it.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Gap buffer problem?
  2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
  2024-12-10 16:04                                 ` Óscar Fuentes
@ 2024-12-11  5:27                                 ` Gerd Möllmann
  2024-12-11  8:50                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 14:22                                   ` Eli Zaretskii
  1 sibling, 2 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11  5:27 UTC (permalink / raw)
  To: Pip Cet via Emacs development discussions.; +Cc: Óscar Fuentes, Pip Cet

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> Eli Zaretskii eliz@gnu.org writes:
>> > Maybe so, but why is such a long wait a problem? GC works, and
>> > works well.
>> 
>> Working on certain projects with lsp-mode is a miserable experience due
>> to all the random pauses.
>
> To be fair, part of that may be the gap buffer problem rather than GC.

Could you please tell more about the gap buffer problem?

I've read a little about the tradeoffs between gap buffers, piece
tables, ropes, but I'm wondering if there is something concrete already
known for sure that is a performance problem in Emacs. Maybe a bug that
has been analyzed or something.

(I'm asking because I just recently encountered a performance problem
when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
editing there was so slow that it was absolutely no fun, and that on
an M1 Pro. Haven't investigated the reason.)

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  5:27                                 ` Gap buffer problem? Gerd Möllmann
@ 2024-12-11  8:50                                   ` Pip Cet via Emacs development discussions.
  2024-12-11  9:35                                     ` Gerd Möllmann
  2024-12-11 14:22                                   ` Eli Zaretskii
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11  8:50 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>> Eli Zaretskii eliz@gnu.org writes:
>>> > Maybe so, but why is such a long wait a problem? GC works, and
>>> > works well.
>>>
>>> Working on certain projects with lsp-mode is a miserable experience due
>>> to all the random pauses.
>>
>> To be fair, part of that may be the gap buffer problem rather than GC.
>
> Could you please tell more about the gap buffer problem?

Just anecdotes, I'm afraid.  My problem was a large buffer of test
descriptions for a programming language, and I was running the tests and
modifying the buffer to contain the output for each test in a block
after the test itself. That worked, but running several tests in
parallel, moving back and forth in the buffer to modify text as the
output came in ... not so much.

I also recall discussion somewhere (nullprogram.com, maybe) about
multiple cursors and the gap buffer, and that's also a potential use
case where the gap buffer would make things very slow.

> I've read a little about the tradeoffs between gap buffers, piece
> tables, ropes, but I'm wondering if there is something concrete already
> known for sure that is a performance problem in Emacs. Maybe a bug that
> has been analyzed or something.

I'd be very interested in such a bug. Replacing the gap buffer
assumption is quite hard: IIRC, the main problem is that the regexp code
has been hacked to support gap buffers but not other data structures, so
we'd need to do something about that.

> (I'm asking because I just recently encountered a performance problem
> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
> editing there was so slow that it was absolutely no fun, and that on
> an M1 Pro. Haven't investigated the reason.)

Interesting. It may be worth it to try reproducing that and disabling
modes one by one to find out which one is at fault. I suspect that it's
overlays/the interval tree rather than the gap buffer per se (however,
if we ever replace the gap buffer code, we should make sure its
replacement actually handles buffer text and text properties/intervals
in an integrated manner, rather than storing just buffer text).

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  8:50                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-11  9:35                                     ` Gerd Möllmann
  2024-12-11 11:50                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 12:27                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11  9:35 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>>
>>> On Tuesday, December 10th, 2024 at 13:39, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>>> Eli Zaretskii eliz@gnu.org writes:
>>>> > Maybe so, but why is such a long wait a problem? GC works, and
>>>> > works well.
>>>>
>>>> Working on certain projects with lsp-mode is a miserable experience due
>>>> to all the random pauses.
>>>
>>> To be fair, part of that may be the gap buffer problem rather than GC.
>>
>> Could you please tell more about the gap buffer problem?
>
> Just anecdotes, I'm afraid.  My problem was a large buffer of test
> descriptions for a programming language, and I was running the tests and
> modifying the buffer to contain the output for each test in a block
> after the test itself. That worked, but running several tests in
> parallel, moving back and forth in the buffer to modify text as the
> output came in ... not so much.
>
> I also recall discussion somewhere (nullprogram.com, maybe) about
> multiple cursors and the gap buffer, and that's also a potential use
> case where the gap buffer would make things very slow.

Thanks.

>
>> I've read a little about the tradeoffs between gap buffers, piece
>> tables, ropes, but I'm wondering if there is something concrete already
>> known for sure that is a performance problem in Emacs. Maybe a bug that
>> has been analyzed or something.
>
> I'd be very interested in such a bug. Replacing the gap buffer
> assumption is quite hard: IIRC, the main problem is that the regexp code
> has been hacked to support gap buffers but not other data structures, so
> we'd need to do something about that.
>
>> (I'm asking because I just recently encountered a performance problem
>> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
>> editing there was so slow that it was absolutely no fun, and that on
>> an M1 Pro. Haven't investigated the reason.)
>
> Interesting. It may be worth it to try reproducing that and disabling
> modes one by one to find out which one is at fault. I suspect that it's
> overlays/the interval tree rather than the gap buffer per se (however,

Yeah, maybe I'll investigate that further at some point, not sure. I did
try with VSCode and Zed now, though, for no good reason. They don't have
a problem.

> if we ever replace the gap buffer code, we should make sure its
> replacement actually handles buffer text and text properties/intervals
> in an integrated manner, rather than storing just buffer text).
>
> Pip

And if I may add a wish to the future author: Make whatever you use 
persistent data structures, so that one could think of letting redisplay
run concurrently. Really! :-)



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  9:35                                     ` Gerd Möllmann
@ 2024-12-11 11:50                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 13:22                                         ` Gerd Möllmann
  2024-12-11 12:27                                       ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 11:50 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>> if we ever replace the gap buffer code, we should make sure its
>> replacement actually handles buffer text and text properties/intervals
>> in an integrated manner, rather than storing just buffer text).
>>
>> Pip
>
> And if I may add a wish to the future author: Make whatever you use
> persistent data structures, so that one could think of letting redisplay
> run concurrently. Really! :-)

You won't be surprised to hear I've been playing with some code, so
could I ask you to expand on this point? What precisely does redisplay
require? Full snapshotting or would it be sufficient to have
fine-grained locking?

(However, before anyone gets their hopes and/or fears up, my code
depends on disabling most of the regexp code, and the additional number
of garbage-collected objects is so great that I concluded I'd wait for
MPS to land before resuming work on it. One of the few distinct
advantages of the current gap buffer approach is that it doesn't affect
GC...)

I know virtually nothing about redisplay.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 11:50                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-11 13:22                                         ` Gerd Möllmann
  2024-12-11 14:53                                           ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 13:22 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

[-- Attachment #1: Type: text/plain, Size: 1621 bytes --]

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>> if we ever replace the gap buffer code, we should make sure its
>>> replacement actually handles buffer text and text properties/intervals
>>> in an integrated manner, rather than storing just buffer text).
>>>
>>> Pip
>>
>> And if I may add a wish to the future author: Make whatever you use
>> persistent data structures, so that one could think of letting redisplay
>> run concurrently. Really! :-)
>
> You won't be surprised to hear I've been playing with some code, 

Indeed, I was just thinking to myself "I knew it" :-).
Two thumbs up!

> so could I ask you to expand on this point? What precisely does
> redisplay require? Full snapshotting or would it be sufficient to have
> fine-grained locking?

Maybe it helps if I explain some of the background. Some time
last year I asked myself whether I could make Emacs use more than one
of my many CPU cores without solving the multi-threaded Elisp problem.
The idea was that I could possibly do that by letting redisplay
happen in another thread.

I later realized while thinking about the details, that this undertaking
is an order of magnitude too large for me. Everything taking more than a
few months is. And, in addition, I wouldn't want to do data structures
in C anyway.

So it's history. Won't happen. But, there is an incomplete, terse,
terrible Org file from those times that I kept. I talked a bit about
this with Stefan Monnier and Eli at the time, just FYI. 

[-- Attachment #2: Concurrent redisplay --]
[-- Type: text/x-org, Size: 19475 bytes --]

:PROPERTIES:
:ID:       E5E87FA1-48D1-4753-AAAE-E86FB36F5742
:END:
#+title: Concurrent Redisplay
# -*- mode: org; eval: (auto-fill-mode 1)  -*-
#+STARTUP: content
#+AUTHOR: gerd@gnu.org

* Concurrent Redisplay

Redisplay is currently performed sequentially as part of Emacs'
command loop. The command loop calls =redisplay= to make sure that
changes in buffers are made visible on the screen.

Concurrent redisplay means to change Emacs' architecture, so that
redisplay can be done concurrently with the command loop and running
Elisp.

In this document, I'm trying to get an impression of whether a parallel
redisplay is achievable, from a very high-level perspective at least.
To make thinking about this possible, I make a number of assumptions
and simplifications, which are described in the following.

** Multi-threaded Lisp

This document is in no way concerned with making Elisp multi-threaded:
whether that's possible, and if so how.

Due to demand from others, I'm also considering the case that a
concurrent redisplay can call Lisp.  How this is made possible, I'm
not considering.

** Possible Gains

- Distribute work on more than one CPU core
- Makes it possible to implement advanced display features in the
  future that would be too costly to perform in a sequential
  redisplay.

** Concurrency Architecture

As an architecture that is simple to reason about, I assume that Emacs
will consist of two modules:

- The =main= module consists of command loop and Lisp, and runs in
  one thread.
- The =redisplay= module runs in another thread.

Both modules are isolated from each other, and may not access data
owned by the other module.  Communication between modules only happens
by exchanging non-blocking messages.
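
The message-passing arrangement can be illustrated with a minimal
sketch.  This is Python rather than Emacs C, and everything in it is
an assumption made for illustration: the names =redisplay_loop= and
=run_demo=, the =None= shutdown sentinel, and the reduction of "glyph
production" to splitting text into lines.

```python
import queue
import threading


def redisplay_loop(inbox, outbox):
    """Runs in the redisplay thread; it only sees data sent as messages."""
    while True:
        msg = inbox.get()          # blocks this thread only, never `main`
        if msg is None:            # hypothetical shutdown sentinel
            break
        version, text = msg
        # Stand-in for producing desired glyphs: one entry per line.
        outbox.put((version, text.split("\n")))


def run_demo():
    to_redisplay = queue.Queue()   # unbounded, so posting never blocks
    to_main = queue.Queue()
    worker = threading.Thread(target=redisplay_loop,
                              args=(to_redisplay, to_main))
    worker.start()
    # `main` posts immutable snapshots and carries on without waiting.
    to_redisplay.put_nowait((1, "hello\nworld"))
    to_redisplay.put_nowait((2, "hello\nworld!"))
    to_redisplay.put_nowait(None)
    worker.join()
    results = []
    while True:
        try:
            results.append(to_main.get_nowait())
        except queue.Empty:
            break
    return results
```

Because strings are immutable in Python, passing them in a message
shares nothing mutable between the two threads.  In C the message
payload would have to be designed so that the same holds.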

I could imagine a GUI/TUI backend module in this picture, for good
measure, but won't consider that further.

Random links:

The Problem with Threads
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Plain Threads are the Goto of Modern Computing
https://isocpp.org/blog/2014/12/plain-threads-are-the-goto-of-todays-computing

** Display Model

Concurrent redisplay in the architecture described must work on a
model that it owns.

It is assumed for now, that this model represents a buffer's text plus
a number of properties/variables relevant to redisplay, like faces
that apply to regions of text. See [[*Redisplay Model]].

** Triggering Redisplay

Concurrent redisplay could choose to display at its own whim, or be
triggered by receiving a message from the main module.  It could, for
example, decide to redisplay based on available hardware frame
rates. How this is done is not considered here.

** Display Update

Roughly speaking, current redisplay can be divided into two parts:

- Produce desired glyphs, which describe what the display should look
  like.
- Update the display by comparing current and desired glyphs and
  calling the GUI/TUI backend(s).  Then set the current glyphs to the
  desired glyphs for the next round.
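
As a rough illustration of the second part, here is a Python sketch
that compares the two matrices row by row.  It assumes a glyph matrix
can be modelled as a list of row strings and that the backend is a
callback =write_row=; both are simplifications for illustration, not
how xdisp.c or dispnew.c represent things, and rows that disappear
when the desired matrix is shorter are not handled.

```python
def update_display(current, desired, write_row):
    """Write only the rows that differ, then return the new current matrix."""
    for y, row in enumerate(desired):
        if y >= len(current) or current[y] != row:
            write_row(y, row)      # stand-in for a GUI/TUI backend call
    return list(desired)           # desired becomes current for the next round


current = ["foo", "bar", "baz"]
desired = ["foo", "BAR", "baz", "qux"]
written = []
current = update_display(current, desired,
                         lambda y, row: written.append((y, row)))
```

Only rows 1 and 3 reach the backend here; the unchanged rows cost a
comparison but no output.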

The update part is not considered in the following.  There are several
conceivable ways to implement an update:

- Update in the =redisplay= module
  * Call GUI backend directly
  * Post messages to the =main= module, which calls GUI backend
  * Post messages to a possible GUI module
- Update in the =main= module
  + =Redisplay= posts message containing desired glyphs

This looks like a solvable problem to me. So, for simplicity, I don't
consider it here.  In the following, "redisplay" mainly refers to
producing desired glyphs.

** One Window/Buffer

For simplicity, I only consider one window displaying one buffer.

An interesting, maybe even natural, idea might be to run more than one
redisplay in parallel, one for each window, but that is also not in
scope here.

** Frame-based Redisplay

Also not considered here is the update phase of TTY frames, which
currently requires a view of all windows on a frame at once, which is
commonly called frame-based redisplay.

* Aspects to Consider

The following is a list of things to consider when thinking of making
redisplay concurrent.

** jit-lock

Current redisplay runs =fontification-functions= to ensure that
properties are up to date for the text being displayed.  This will not
be possible in a concurrent redisplay, unless one assumes that Lisp
can be called from multiple threads.

Stefan (and Eli?) thinks we will eventually need to be able to call
some Lisp-ish code from a concurrent redisplay before it can fully
replace the existing synchronous redisplay.

I myself would accept a display that is not always 100% accurate, for
example because parts of the text have not been fontified yet, or
compositions have not been determined.

Instead of calling Lisp from redisplay, one could use background
fontification in the main module together with guesstimates which
regions of text will actually be displayed to make sure that
fontification results are visible as soon as possible.

The redisplay module could also post messages to guide this guessing,
for example at the end of redisplay.

** Hooks and functions

In general, the idea is that =redisplay= posts messages to the =main=
module that certain things have happened.

Hooks/functions run by current redisplay are:
  - window-scroll-functions
  - activate-menubar-hook (redisplay_internal -> prepare_menu_bar ->
    update_menu_bar)
  - update-menubar-hook (update_menu_bar)
  - fontification-functions (jit-lock, and maybe others)
  - ?

Open: what are the expectations of these hook functions about
variables, buffers?

** Caches

Redisplay needs access to face, font and image caches which are stored
on frames (owned by main module).

I propose as an idea to remove the caches from frames and give
ownership of these to concurrent redisplay.  Could be a table frame ->
caches.  Some communication is necessary from the main module to
redisplay for frame changes, clearing caches from Lisp, ....

- does face/font/image code called during redisplay call Lisp?
- can face etc. code be run from another thread?

** Glyph matrices

Glyph matrices are a form of cache, so they should be treated likewise.

Depends on which module does the update phase of redisplay.

** Point and mark

Needed for region highlighting.

Create new model version when changing these.  This requires that
creating new model versions is reasonably fast.
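
One way to picture "a new model version per change" is an immutable
record that is copied with modifications.  The sketch below is Python
and purely illustrative: the =DisplayModel= fields and the
=with_changes= helper are assumed names, and a real model would carry
far more state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplayModel:
    version: int
    text_version: int      # which buffer-text version this model refers to
    point: int
    mark: int
    truncate_lines: bool


def with_changes(model, **changes):
    """Return a new model version; the old one is left untouched."""
    return replace(model, version=model.version + 1, **changes)


m0 = DisplayModel(version=0, text_version=0, point=1, mark=1,
                  truncate_lines=False)
m1 = with_changes(m0, point=10)
```

Creating =m1= copies a handful of fields and shares the buffer text by
reference, which is the property that keeps moving point cheap.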

** window-start computation

Redisplay posts message back to main module containing information
about what is on the display.  Window start/end could be part of that.

** move_it functions

This concerns Lisp functions like vertical-motion.  Rely (relied?)
partly on the current glyph matrix, and otherwise on redisplay functions
that are used without producing glyphs.

- do expect results based on current text, even if not displayed yet.

Open.  No solution in mind that isn't ugly (locking, maybe).

- need display model to operate on.
- Pixel positions in text?

** Mouse highlight

Open. No idea how that is done nowadays.  It used to use the current
matrices, only.

** TTYs

The update phase of redisplay on ttys needs a view of the whole
frame's current and desired glyph matrices for optimization.  This is
done by giving tty frames matrices, and sub-allocating window matrices
from these.

The descriptions above are not affected by this, but it has to be kept
in mind, for the update phase, which is not yet taken into account.

** minibuffer, reading from

Seems to be all the same as other windows, but open.

** Echo area

Open (-> window geometry?)

** Bidi

Eli says:

#+begin_quote
I think this can be removed from the list of issues.  Basically (with
a few caveats, see below, which I don't think change anything in
principle), bidi.c is just a subroutine of set_iterator_to_next, which
implements the non-linear scanning of buffer text needed for bidi
reordering.  It effectively causes set_iterator_to_next to move to the
next character in _visual_ order, not in buffer position order (the
latter would require just incrementing the buffer position).  To do
this, bidi.c needs access to buffer text, and little else.

The caveats I mentioned are:

  . sometimes we need to figure out the base paragraph direction,
    either L2R or R2L (the latter will be displayed with characters
    starting at the right edge of the window instead of the left), in
    this case bidi.c looks back using regexps for the beginning of the
    paragraph, because the Unicode Bidirectional Algorithm mandates
    that the paragraph direction is determined by the first strong
    directional character of the paragraph
  . when the buffer includes display properties, bidi.c treats all
    the characters "covered" by the property as a single neutral
    character, since this is how images and other such stuff needs to
    be handled for display reordering purposes -- this requires
    partial processing of display properties for the single purpose of
    determining whether they are "replacing" or "non-replacing"
    properties, and in the former case to determine at which buffer
    position the display property ends

I don't think these caveats change anything, since again they only
need to access buffer text.

The bidi reordering code maintains a state (struct bidi_it), but it is
a sub-structure of struct it, and lives only as long as the iterator
object lives.
#+end_quote

** Narrowing

Don't remember how that is done.

** Selective display

Open.  Should at long last die.

** Window geometry changes

Open.

** Others?

* Redisplay Model

What is being displayed and how it is displayed depends on

- buffer text
- Properties of the text (overlays, text properties)
- Values of display-relevant variables (=truncate-lines=, ...)

Concurrent redisplay must own such a model, so that no synchronization
is necessary between =main= and =redisplay= module.

** Buffer Text

*** Copying

One could think of making copies of everything that is needed for
redisplay and let concurrent redisplay work on such a model.

I believe this is out of the question, for performance reasons.

Such a copy would have to be made by the main module, and that could
easily cost more than what we do now in sequential redisplay,
especially if we don't exactly know what data redisplay will need
(range of text, for example).

*** Another "Copying" Possibility

Stefan Monnier had another interesting idea that I quote here
#+begin_quote
Note that you can also use the current text representation with
a concurrent redisplay: simply keep a whole copy of the buffer over in
the redisplay side.  Updating that whole copy should usually be quite
efficient thanks to BEG/END_UNCHANGED.
#+end_quote
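
A small Python sketch of how such a mirror might be brought up to
date.  It assumes that =beg_unchanged= and =end_unchanged= are the
numbers of characters at the start and end of the text that did not
change since the last sync, and the function name =sync_mirror= is
made up for this example.

```python
def sync_mirror(mirror, buffer_text, beg_unchanged, end_unchanged):
    """Return the updated mirror, taking only the changed middle from the buffer."""
    old_mid_end = len(mirror) - end_unchanged
    new_mid_end = len(buffer_text) - end_unchanged
    changed = buffer_text[beg_unchanged:new_mid_end]
    # Unchanged prefix and suffix come from the mirror itself.
    return mirror[:beg_unchanged] + changed + mirror[old_mid_end:]


grown = sync_mirror("hello world", "hello brave world", 6, 5)
shrunk = sync_mirror("hello brave world", "hello world", 6, 5)
```

Python string concatenation still copies the whole result, so this
only shows the bookkeeping.  With a mutable buffer on the redisplay
side, only the changed middle would have to cross the module boundary.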

*** Persistence

To avoid copying, let buffer text be represented as a persistent data
structure.

Conceptually, this persistent data structure contains an ordered set
of buffer-text versions.  When the =main= module modifies buffer text,
new versions are created.  When =redisplay= starts, it picks the
youngest version available as buffer text.  It can do so safely because
it is known that any modification in =main= will lead to a new version,
and not modify an existing version.

The "piece table" is an interesting representation for such a
persistent buffer text data structure.  Some later descriptions assume
that buffer text uses a persistent piece table.
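
To make the idea concrete, here is a deliberately naive Python sketch
of a piece table that keeps every version.  All names are
hypothetical, only insertion is shown, and each version copies the
whole tuple of pieces; a serious implementation (fredbuf, for
instance) would keep the pieces in a persistent tree so that versions
share structure.

```python
from typing import NamedTuple, Tuple


class Piece(NamedTuple):
    source: str    # "orig" or "add"
    start: int
    length: int


class Version(NamedTuple):
    pieces: Tuple[Piece, ...]


class PieceTable:
    def __init__(self, text):
        self.orig = text
        self.add = ""              # append-only, so old offsets stay valid
        first = (Piece("orig", 0, len(text)),) if text else ()
        self.versions = [Version(first)]

    def latest(self):
        return self.versions[-1]

    def text(self, version):
        bufs = {"orig": self.orig, "add": self.add}
        return "".join(bufs[p.source][p.start:p.start + p.length]
                       for p in version.pieces)

    def insert(self, pos, s):
        """Create and return a new version; existing versions are not touched."""
        new_piece = Piece("add", len(self.add), len(s))
        self.add += s
        out, offset, placed = [], 0, False
        for p in self.latest().pieces:
            if not placed and offset <= pos <= offset + p.length:
                left = pos - offset
                if left:
                    out.append(Piece(p.source, p.start, left))
                out.append(new_piece)
                if p.length - left:
                    out.append(Piece(p.source, p.start + left,
                                     p.length - left))
                placed = True
            else:
                out.append(p)
            offset += p.length
        if not placed:             # empty table or pos past the end
            out.append(new_piece)
        version = Version(tuple(out))
        self.versions.append(version)
        return version


table = PieceTable("hello world")
v0 = table.latest()
v1 = table.insert(5, ",")
v2 = table.insert(12, "!")
```

A redisplay thread holding =v0= or =v1= keeps seeing exactly the text
of that version, no matter how many insertions =main= performs
afterwards.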

Some links:

An interesting paper about text representations in general:
https://www.cs.unm.edu/~crowley/papers/sds.pdf

Piece tables:
https://www.averylaird.com/programming/the%20text%20editor/2017/09/30/the-piece-table

An implementation of a persistent piece table:
https://github.com/cdacamar/fredbuf

A blog post about VSCode using piece tables:
https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

An implementation of a persistent tree:
https://cglab.ca/~dana/pbst/

** Properties (Overlays + Text Properties)

Properties that are relevant for redisplay are:

  - =face=
  - =invisible=
  - =display=
  - =composition=

Redisplay needs the following information about properties:

  - start + end position
  - property value

Property values can contain constructs that eval Lisp.  Examples:

  - display (=:when=, =(:height FN)=, ...)
  - =mode-line-format= may also contain =:eval=

If concurrent redisplay cannot call Lisp:

The parts of the property values that require evaluating Lisp must be
part of the display model in evaluated form.

Such a model could contain a map =Lisp_Object= -> =value= (at the time the
model version was current), where

  - the key =Lisp_Object= is the part of the original property value
    containing =:eval=, for instance.  It could be the =cons= cell of
    an =(:eval ...)=
  - =value= is the evaluated value
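
A Python sketch of such a map, keyed by object identity in the way an
=eq= hash table would be.  The class name =EvalSnapshot= and the
=evaluate= callback are assumptions for illustration; the point is
only that evaluation happens once, in =main=, when the model version
is created.

```python
class EvalSnapshot:
    """Pre-evaluated results for the (:eval ...) parts of property values."""

    def __init__(self, forms, evaluate):
        # Keep each form alongside its value so its id() stays valid.
        self._entries = {id(f): (f, evaluate(f)) for f in forms}

    def lookup(self, form):
        """Return the value the form had when the snapshot was taken."""
        return self._entries[id(form)][1]


state = {"n": 1}
form = [":eval", "n"]              # stand-in for the cons cell of (:eval n)
snapshot = EvalSnapshot([form], lambda f: state[f[1]])
state["n"] = 2                     # later change in `main`
```

After the change to =state=, =snapshot.lookup(form)= still returns the
value from the time the model version was current.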

Changing properties must create new model versions.

  - adding/removing/changing props -> new model version
  - each piece in a piece table could have a list of applicable props
    for the whole piece.
  - mass changes could be done without producing lots of new model
    versions
    + requires that concurrent redisplay doesn't work on a model that
      is mass-updated, which could require synchronization, which is
      ugly.

Possible optimizations:

  - discard/coalesce old model versions in the background, to reduce
    memory footprint? The main module creates new versions, only.
    Redisplay uses only the latest version.

** Variables

The display model must also contain a snapshot of the values of all
relevant variables at the time of the model version.

Relevant values are:

  - truncate-lines
  - scroll-conservatively
  - window, frame, buffer, global values (window-start, ...)
  - ?
  - todo. make a list

* Persistent Data Structures

Wikipedia:
https://en.wikipedia.org/wiki/Persistent_data_structure

** Terminology

Short summary of the terminology:

- persistent
  + general term encompassing variations below
  + always preserves versions of itself when modified.
  + immutable in the sense that they are not changed in-place.

- partially persistent
  + all versions can be read
  + only newest version can be modified.

- fully persistent
  + all versions can be read
  + every version can be modified.

- confluently persistent
  + fully persistent
  + versions can be merged (melded).

** Links

Kind of a brief overview:
https://academic-accelerator.com/encyclopedia/persistent-data-structure

Irmin: Mergeable ropes
https://inria.hal.science/hal-01099136v1/document
Interesting article:
https://blog.acolyer.org/2015/01/14/mergeable-persistent-data-structures/

Partially and fully persistent DS in C (no merges)
https://github.com/vineeths96/Persistent-Data-Structures

Confluently persistent DS paper
https://arxiv.org/pdf/1301.3388.pdf
https://www.cs.utexas.edu/~ecprice/papers/confluent_swat.pdf

Data visualization with persistent DS
https://www.researchgate.net/publication/258713092_Efficient_Dynamic_Data_Visualization_with_Persistent_Data_Structures

* Redisplay calling Lisp

This is a hypothetical scenario, but Eli and Stefan seem to assume that
it is important to have to make concurrent redisplay acceptable to
users.

- Redisplay calls Lisp to fontify etc.
  + just assuming that is possible in the future
  + as a substitute for storing a snapshot in display model.
  + how calling Lisp from redisplay works on the Lisp side, is
    not yet specified

My conclusions from this:

- Properties must be persistent data structures
  + because no props snapshot in display model
  + because redisplay needs props corresponding to its buffer-text
    version

- Properties must be confluently persistent data structures
  + need to be able to modify prop versions in Lisp
  + need to merge back changes into current versions

- buffer modifications from Lisp either
  + should be prevented (how?)
  + or buffer-text must be confluently persistent
    - merge or discard any buffer-text changes (delete version, if it
      was created).  Probably discard.

- what about if Lisp changes display-relevant variables?
  - unclear

- Other modifications?

** Merging properties

Imagine =fontification-functions= adding properties for font-lock.
These modifications should not be lost once concurrent redisplay has
finished.  That means the properties added to an old version of the
buffer text etc must be merged into newer versions.

- Confluently persistent props require
  + way to merge changes to newer versions
  + consider only merge version n-1 to n
- single prop = (beg end value)
  + position translation
    - know what changed in buffer-texts from n-1 to n
      + wanting translation pos_{n-1} to pos_n
      + piece added in n in front of pos_{n-1} => add length
      + piece deleted in front of pos_{n-1} => subtract length
      + details depend on buffer-text DS
        - buffer-text DS must take into account that translations must
          be possible
    - looks doable
  + value merging
    + assume interval [beg end] in the following (including beg and end)
    + added property
      - [beg_n end_n] may intersect with 0+ props in version n.
      - say first intersection is in [a b], a >= beg, b <= end with
        value val
      - [beg a-1] -> new value
      - [a b] -> value-dependent handling of old/new value
        (merge/discard..., must be defined)
      - [b+1 end] -> either new value or merge with next intersecting
        prop from n
    + changed values
      - treat as remove + add
    + removed props
      - no direct representation in the props of version n-1
      - assume scan whole version n for prop of the same kind
        - could record min/max pos of changes in n-1
        - let [a b val] be "interesting" prop in n (face, ...)
        - if there is no intersecting prop in [a b] in n-1, what does
          that mean?
          - it has been newly added in n compared to n-1
          - it was in n-2 and has been removed in n-1
          - must find out to resolve
            + can we in all cases?

Assuming everything still open can be resolved, this looks doable, but
it is certainly non-trivial.
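
A rough sketch of the position translation and of the "added property"
case, in Python for illustration only.  The edit representation
(=("insert", at, length)= / =("delete", at, length)=) and all function
names are assumptions of this sketch; a real buffer-text DS would
supply its own change records.  Treating an insertion exactly at a
position as "in front of" it is an arbitrary choice here.

```python
def translate_pos(pos, edits):
    """Map a position of buffer-text version n-1 to version n.

    EDITS are applied in order; each AT is valid at the time its edit
    is applied.
    """
    for kind, at, length in edits:
        if kind == "insert":
            if at <= pos:
                pos += length          # text added in front: add length
        elif kind == "delete":
            if at + length <= pos:
                pos -= length          # text deleted in front: subtract
            elif at < pos:
                pos = at               # pos was inside the deleted text
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return pos


def translate_prop(prop, edits):
    """Translate a single prop (beg end value) from version n-1 to n."""
    beg, end, value = prop
    return (translate_pos(beg, edits), translate_pos(end, edits), value)


def merge_added_prop(new, old, combine):
    """Merge added prop NEW with one intersecting prop OLD of version n.

    Assumes beg <= a and b <= end, as in the outline above.  COMBINE
    decides what happens to the values on the overlap [a b].
    """
    beg, end, val = new
    a, b, old_val = old
    out = []
    if beg < a:
        out.append((beg, a - 1, val))            # [beg a-1] -> new value
    out.append((a, b, combine(old_val, val)))    # [a b] -> value-dependent
    if b < end:
        out.append((b + 1, end, val))            # [b+1 end] -> new value
    return out


edits = [("insert", 3, 5), ("delete", 0, 2)]
moved = translate_prop((10, 14, "bold"), edits)
merged = merge_added_prop((5, 20, "new"), (8, 12, "old"),
                          lambda old_val, new_val: new_val)
```

The case with several intersecting props in version n would repeat the
last step for each intersection; that and removed props are left out.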

* Performance/Memory Considerations

The use of persistent data structures will have an impact on both
performance and memory consumption.

How large this impact will be I find impossible to tell, especially on
older hardware.  But keep in mind, that at the time this might be
implemented, current hardware will be old.

* Personal Conclusions

I'm stopping here, despite open questions, because I think I have
reached a sufficient level of gut feeling about the subject.

I'd summarize my thoughts as:

- Concurrent redisplay is feasible, both with and without being able
  to call Lisp from redisplay.
- Changing buffer-text representation using a piece table is a big
  enough bite that it is worth it only if a concurrent redisplay
  comes at some point.
- Whether performance on old hardware will be acceptable, for some
  value of acceptable, I find unpredictable.
- Concurrent redisplay with the ability to call Lisp from redisplay is
  considerably more complex than without being able to call Lisp.  I'd
  say at least 2 times.
- Concurrent redisplay will not happen unless at least 2 or 3 people
  with enough time decide to work on it.

* Random Grab Bag

- make pieces for long lines (max length of piece)
- concurrency -> dump complicated redisplay optimizations?
- pieces provide more detailed information about what text has changed
  (compared to BEG_UNCHANGED and END_UNCHANGED).
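
To illustrate the last point, a very small piece table sketch in
Python.  The class =PieceTable= and its layout (a read-only original
buffer, an append-only add buffer, and a list of pieces) are
assumptions of this sketch, not a proposal for the actual
representation.  Only insertion is shown.

```python
class PieceTable:
    """Toy piece table; a piece is (source, start, length)."""

    def __init__(self, text):
        self.original = text           # never modified
        self.added = ""                # append-only
        self.pieces = [("orig", 0, len(text))] if text else []

    def insert(self, pos, text):
        if not text:
            return
        new_piece = ("add", len(self.added), len(text))
        self.added += text
        offset = 0
        for i, (src, start, length) in enumerate(self.pieces):
            if offset <= pos <= offset + length:
                cut = pos - offset
                left = (src, start, cut)
                right = (src, start + cut, length - cut)
                # Split the piece around the insertion point and drop
                # empty halves.
                self.pieces[i:i + 1] = [p for p in (left, new_piece, right)
                                        if p[2] > 0]
                return
            offset += length
        if pos == offset:              # only reached for an empty table
            self.pieces.append(new_piece)
            return
        raise IndexError(f"position {pos} out of range")

    def text(self):
        bufs = {"orig": self.original, "add": self.added}
        return "".join(bufs[src][start:start + length]
                       for src, start, length in self.pieces)


pt = PieceTable("hello world")
pt.insert(5, ",")
```

After the insertion the piece list names the changed text exactly (the
single "add" piece), whereas two unchanged-region bounds can only say
that something between them changed.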

Zed editor, rasterization on GPU
https://zed.dev/blog/videogame

# end.

[-- Attachment #3: Type: text/plain, Size: 630 bytes --]

It's probably not very helpful, but at least I get the idea of a
concurrent redisplay planted into brains, where it can do its evil work
:-).

>
> (However, before anyone gets their hopes and/or fears up, my code
> depends on disabling most of the regexp code, and the additional number
> of garbage-collected objects is so great that I concluded I'd wait for
> MPS to land before resuming work on it. One of the few distinct
> advantages of the current gap buffer approach is that it doesn't affect
> GC...)
>
> I know virtually nothing about redisplay.
>
> Pip

What I've written is pretty high-level, nothing to worry about.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 13:22                                         ` Gerd Möllmann
@ 2024-12-11 14:53                                           ` Pip Cet via Emacs development discussions.
  2024-12-11 15:33                                             ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 14:53 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>> if we ever replace the gap buffer code, we should make sure its
>>>> replacement actually handles buffer text and text properties/intervals
>>>> in an integrated manner, rather than storing just buffer text).
>>>>
>>>> Pip
>>>
>>> And if I may add a wish to the future author: Make whatever you use
>>> persistent data structures, so that one could think of letting redisplay
>>> run concurrently. Really! :-)
>>
>> You won't be surprised to hear I've been playing with some code,
>
> Indeed, I was just thinking to myself "I knew it" :-).
> Two thumbs up!
>
>> so could I ask you to expand on this point? What precisely does
>> redisplay require? Full snapshotting or would it be sufficient to have
>> fine-grained locking?
>
> Maybe it's helpful when I tell something about the background. Some time
> last year I asked myself if I could make Emacs use more than one of my
> plenty of CPU cores without solving the multi-threaded Elisp problem.
> And the idea was that I could do that, possibly, by letting redisplay
> happen in another thread.

This may be a very stupid idea, but why not use a separate process?
fork() is fast on GNU/Linux, and I suspect on macOS too, and the
redisplay child would receive a consistent snapshot of the data to
inspect and/or modify while coming up with the redisplay instructions,
which it would then send back via a pipe or shared memory to be executed
in the main process.
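
As a toy illustration of that flow (Python rather than C, POSIX only;
the function name and the "draw:" message format are made up for the
example and have nothing to do with real redisplay code):

```python
import os


def redisplay_in_child(state):
    """Fork; the child builds 'instructions' from its snapshot of STATE."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees STATE exactly as it was at the time of fork().
        try:
            os.close(read_fd)
            # This modification stays private to the child.
            state["text"] += " (modified by child)"
            os.write(write_fd, ("draw:" + state["text"]).encode())
        finally:
            os._exit(0)            # never return into the parent's code
    # Parent: keeps running and may change its own copy freely.
    os.close(write_fd)
    state["text"] = "edited after fork"
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:              # EOF: the child closed its end
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


state = {"text": "hello"}
instructions = redisplay_in_child(state)
```

The child works on the text as it was when fork() was called, and
neither side sees the other's later writes.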

I suggested doing something similar for GC (the GC child would perform a
full GC and send back the Lisp_Objects which are definitely unreachable
via a pipe. No, I never figured out how to make that work for weak hash
tables which may resurrect references, I just made all hash tables
strong...), and in that case the pipe seemed sufficient for the amount
of data that was transferred, but I'm not sure how compact (or
otherwise) serialized redisplay "instructions" would be.

One issue I see is that fork() does a lot of housekeeping work in
addition to marking the child's memory as a COW copy of the parent's
memory at the time of the fork(). ISTR you can split that process on
GNU/Linux (probably not Android), so you'd already have a prepared
thread/LWP which wouldn't need to "start up" when you un-share the
memory, but I can't find the relevant manpage right now. However, I have
no real idea just how bad the fork() latency would be (as you point out,
most people have more CPU cores than they can use, so I don't consider
the approximate doubling of CPU usage a problem).

This would deal very nicely with fontification code attempting to modify
data it shouldn't, by ignoring such modifications. It would also deal
with catastrophic failure in the redisplay code, as it's insulated in a
separate process and we could just print a nice message in the main
process rather than crashing all of Emacs.

I'm emphatically not suggesting letting the redisplay child actually
communicate with the X server or equivalent. That would be much more
difficult.

In fact, I think a good way to test this approach would be to use the
tty code, since there's already a standard serialization of redisplay
instructions for tty displays: VT100 escape sequences.
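
A sketch of what such a serialization could look like, in Python for
brevity.  The instruction tuples are invented for this example; only
the escape sequences themselves (CUP, EL, SGR) are the standard ones.

```python
def serialize(instructions):
    """Turn a list of toy redisplay instructions into VT100 bytes."""
    out = []
    for op in instructions:
        kind = op[0]
        if kind == "move":
            _, row, col = op
            out.append(f"\x1b[{row};{col}H")   # CUP: 1-based cursor position
        elif kind == "clear_eol":
            out.append("\x1b[K")               # EL: erase to end of line
        elif kind == "bold":
            out.append("\x1b[1m")              # SGR 1: bold on
        elif kind == "reset":
            out.append("\x1b[0m")              # SGR 0: reset all attributes
        elif kind == "text":
            out.append(op[1])
        else:
            raise ValueError(f"unknown instruction: {kind}")
    return "".join(out).encode("utf-8")


payload = serialize([("move", 1, 1), ("clear_eol",),
                     ("bold",), ("text", "hi"), ("reset",)])
```

The resulting bytes are what the child would write to the pipe; the
parent could pass them to the terminal unchanged.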

> I later realized while thinking about the details, that this undertaking
> is an order of magnitude too large for me. Everything taking more than a
> few months is. And, in addition, I wouldn't want to do data structures
> in C anyway.

I think the VT100 case could be done as a weekend project (those always
end up taking several weeks for me...), but I'm not sure it's worth it
as VT100 redisplay isn't the common use case, and the performance
problems are more visible on GUI terminals.

And, like pretty much all Emacs ideas, this depends on having a better
GC.

(However, I've just experimented with an 8 GB process forking, and it's
much slower than I'd hoped for - about 70 ms.  I wouldn't be surprised
if most of that cost is setting up page tables for the ridiculously
small 4KB page size x86 uses, so it may work a lot better for AArch64
systems such as yours).
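
For anyone who wants to repeat this kind of measurement, a rough
sketch in Python (POSIX only).  This is not the program used for the
number above; the function name is made up, and interpreter overhead
makes the result only a coarse indication.

```python
import os
import time


def time_fork(heap_bytes, page_size=4096):
    """Return the seconds fork() takes in the parent with HEAP_BYTES mapped."""
    ballast = bytearray(heap_bytes)
    # Touch one byte per page so the pages are really mapped; freshly
    # allocated zero memory may otherwise not be backed yet.
    for i in range(0, heap_bytes, page_size):
        ballast[i] = 1
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)                # child: exit immediately
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed


small = time_fork(1 << 20)         # 1 MiB; raise this to probe larger heaps
```

Running it with a few different sizes should show whether the cost
grows with the size of the mapped memory.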

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 14:53                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-11 15:33                                             ` Gerd Möllmann
  2024-12-11 16:58                                               ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 15:33 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>>> Pip Cet <pipcet@protonmail.com> writes:
>>>>> if we ever replace the gap buffer code, we should make sure its
>>>>> replacement actually handles buffer text and text properties/intervals
>>>>> in an integrated manner, rather than storing just buffer text).
>>>>>
>>>>> Pip
>>>>
>>>> And if I may add a wish to the future author: Make whatever you use
>>>> persistent data structures, so that one could think of letting redisplay
>>>> run concurrently. Really! :-)
>>>
>>> You won't be surprised to hear I've been playing with some code,
>>
>> Indeed, I was just thinking to myself "I knew it" :-).
>> Two thumbs up!
>>
>>> so could I ask you to expand on this point? What precisely does
>>> redisplay require? Full snapshotting or would it be sufficient to have
>>> fine-grained locking?
>>
>> Maybe it's helpful when I tell something about the background. Some time
>> last year I asked myself if I could make Emacs use more than one of my
>> plenty of CPU cores without solving the multi-threaded Elisp problem.
>> And the idea was that I could do that, possibly, by letting redisplay
>> happen in another thread.
>
> This may be a very stupid idea, but why not use a separate process?

Not stupid at all. I thought about something similar in a different
context, namely if one could decouple the GUI part of Emacs from the
rest.

Something like that has been done by Eberhard Mattes for OS/2 with the
old redisplay. He had to do that because the whole Presentation Manager
(GUI) in OS/2 would block, for all processes, when an application did not
handle events in time and return to the PM. Something like that.
Eberhard's OS/2 Emacs had one process doing the GUI stuff, and one for
the rest. Both communicated with each other using a defined message
protocol. It worked. Don't remember what he used for process
communication, pipes or something else.

I got stuck with this idea because everything seemed to depend on
everything else nowadays. Redisplay needs to execute Lisp, Font backends
I think, not sure. Some GUIs call redisplay (nsterm). And then I
imagined the licensing issues, and dropped the idea. Although - NS could
really need something done, IMO, which was the reason I thought about
that in the first place. NS is not working for me at least. I always
wonder why nobody else has the same freezing problems that I have.

I think the same dependency problems also crop up with concurrent
redisplay, don't know. Values of variables, faces, jit-lock, and so on.
I think it would be "easier" to handle if one has everything in one
process.

But in principle both could be done. An actor model.

> fork() is fast on GNU/Linux, and I suspect on macOS too, and the
> redisplay child would receive a consistent snapshot of the data to
> inspect and/or modify while coming up with the redisplay instructions,
> which it would then send back via a pipe or shared memory to be executed
> in the main process.
>
> I suggested doing something similar for GC (the GC child would perform a
> full GC and send back the Lisp_Objects which are definitely unreachable
> via a pipe. No, I never figured out how to make that work for weak hash
> tables which may resurrect references, I just made all hash tables
> strong...), and in that case the pipe seemed sufficient for the amount
> of data that was transferred, but I'm not sure how compact (or
> otherwise) serialized redisplay "instructions" would be.
>
> One issue I see is that fork() does a lot of housekeeping work in
> addition to marking the child's memory as a COW copy of the parent's
> memory at the time of the fork(). ISTR you can split that process on
> GNU/Linux (probably not Android), so you'd already have a prepared
> thread/LWP which wouldn't need to "start up" when you un-share the
> memory, but I can't find the relevant manpage right now. However, I have
> no real idea just how bad the fork() latency would be (as you point out,
> most people have more CPU cores than they can use, so I don't consider
> the approximate doubling of CPU usage a problem).
>
> This would deal very nicely with fontification code attempting to modify
> data it shouldn't, by ignoring such modifications. It would also deal
> with catastrophic failure in the redisplay code, as it's insulated in a
> separate process and we could just print a nice message in the main
> process rather than crashing all of Emacs.
>
> I'm emphatically not suggesting letting the redisplay child actually
> communicate with the X server or equivalent. That would be much more
> difficult.
>
> In fact, I think a good way to test this approach would be to use the
> tty code, since there's already a standard serialization of redisplay
> instructions for tty displays: VT100 escape sequences.
>
>> I later realized while thinking about the details, that this undertaking
>> is an order of magnitude too large for me. Everything taking more than a
>> few months is. And, in addition, I wouldn't want to do data structures
>> in C anyway.
>
> I think the VT100 case could be done as a weekend project (those always
> end up taking several weeks for me...), but I'm not sure it's worth it
> as VT100 redisplay isn't the common use case, and the performance
> problems are more visible on GUI terminals.

Yes. In a way, it's already the case that the GUI part of Emacs that I
described above for OS/2, is the terminal emulator, and the protocol is
VT100.

> And, like pretty much all Emacs ideas, this depends on having a better
> GC.
>
> (However, I've just experimented with an 8 GB process forking, and it's
> much slower than I'd hoped for - about 70 ms.  I wouldn't be surprised
> if most of that cost is setting up page tables for the ridiculously
> small 4KB page size x86 uses, so it may work a lot better for AArch64
> systems such as yours).
>
> Pip



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:33                                             ` Gerd Möllmann
@ 2024-12-11 16:58                                               ` Eli Zaretskii
  2024-12-11 17:13                                                 ` Gerd Möllmann
  2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 16:58 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, emacs-devel, ofv

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>  Óscar Fuentes <ofv@wanadoo.es>
> Date: Wed, 11 Dec 2024 16:33:18 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > This may be a very stupid idea, but why not use a separate process?
> 
> Not stupid at all. I thought about something similar in a different
> context, namely if one could decouple the GUI part of Emacs from the
> rest.

If it can be done by two processes, it can also be done by two threads
in the same process.  Right?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 16:58                                               ` Eli Zaretskii
@ 2024-12-11 17:13                                                 ` Gerd Möllmann
  2024-12-11 17:45                                                   ` Robert Pluim
  2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 17:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, emacs-devel, ofv

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>  Óscar Fuentes <ofv@wanadoo.es>
>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>> 
>> Pip Cet <pipcet@protonmail.com> writes:
>> 
>> > This may be a very stupid idea, but why not use a separate process?
>> 
>> Not stupid at all. I thought about something similar in a different
>> context, namely if one could decouple the GUI part of Emacs from the
>> rest.
>
> If it can be done by two processes, it can also be done by two threads
> in the same process.  Right?

Yes, I think so.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:13                                                 ` Gerd Möllmann
@ 2024-12-11 17:45                                                   ` Robert Pluim
  2024-12-11 18:11                                                     ` Gerd Möllmann
  2024-12-11 19:08                                                     ` Eli Zaretskii
  0 siblings, 2 replies; 137+ messages in thread
From: Robert Pluim @ 2024-12-11 17:45 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv

>>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:

    Gerd> Eli Zaretskii <eliz@gnu.org> writes:
    >>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
    >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
    >>> Óscar Fuentes <ofv@wanadoo.es>
    >>> Date: Wed, 11 Dec 2024 16:33:18 +0100
    >>> 
    >>> Pip Cet <pipcet@protonmail.com> writes:
    >>> 
    >>> > This may be a very stupid idea, but why not use a separate process?
    >>> 
    >>> Not stupid at all. I thought about something similar in a different
    >>> context, namely if one could decouple the GUI part of Emacs from the
    >>> rest.
    >> 
    >> If it can be done by two processes, it can also be done by two threads
    >> in the same process.  Right?

    Gerd> Yes, I think so.

But then you have to throw a lock over all the memory in the
non-display thread that might affect redisplay (although come to think
of it, youʼd probably need that even when using fork)

Robert
-- 



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:45                                                   ` Robert Pluim
@ 2024-12-11 18:11                                                     ` Gerd Möllmann
  2024-12-11 19:08                                                     ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 18:11 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Eli Zaretskii, pipcet, emacs-devel, ofv

Robert Pluim <rpluim@gmail.com> writes:

>>>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
>
>     Gerd> Eli Zaretskii <eliz@gnu.org> writes:
>     >>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>     >>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>     >>> Óscar Fuentes <ofv@wanadoo.es>
>     >>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>     >>> 
>     >>> Pip Cet <pipcet@protonmail.com> writes:
>     >>> 
>     >>> > This may be a very stupid idea, but why not use a separate process?
>     >>> 
>     >>> Not stupid at all. I thought about something similar in a different
>     >>> context, namely if one could decouple the GUI part of Emacs from the
>     >>> rest.
>     >> 
>     >> If it can be done by two processes, it can also be done by two threads
>     >> in the same process.  Right?
>
>     Gerd> Yes, I think so.
>
> But then you have to throw a lock over all the memory in the
> non-display thread that might affect redisplay (although come to think
> of it, youʼd probably need that even when using fork)
>
> Robert

Well, it depends. Assume you have a solution that works in a second
process. That solution wouldn't use things in the first process because
it can't. Now move that code of the second process to the first process,
and make two threads out of the two processes, and replace process
communication with inter-thread message passing like in an actor model.
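
A minimal sketch of that arrangement, in Python for illustration.  The
actor function and the message shapes are invented for this example.
The point is only that the redisplay thread touches nothing but the
messages it receives.

```python
import queue
import threading


def redisplay_actor(inbox, outbox):
    """Consume (version, text) messages until None arrives."""
    while True:
        msg = inbox.get()
        if msg is None:            # shutdown request
            break
        version, text = msg
        # The actor only uses what came in the message, never shared state.
        outbox.put((version, f"draw:{text}"))


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=redisplay_actor, args=(inbox, outbox))
worker.start()

# Python strings are immutable, so each message is effectively a snapshot.
inbox.put((1, "hello"))
inbox.put((2, "hello world"))
inbox.put(None)
worker.join()

results = [outbox.get() for _ in range(2)]
```

Replacing the two queues with pipes would give the two-process
variant without changing the actor itself.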



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:45                                                   ` Robert Pluim
  2024-12-11 18:11                                                     ` Gerd Möllmann
@ 2024-12-11 19:08                                                     ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 19:08 UTC (permalink / raw)
  To: Robert Pluim; +Cc: gerd.moellmann, pipcet, emacs-devel, ofv

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,
>   emacs-devel@gnu.org,  ofv@wanadoo.es
> Date: Wed, 11 Dec 2024 18:45:15 +0100
> 
> >>>>> On Wed, 11 Dec 2024 18:13:41 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
> 
>     Gerd> Eli Zaretskii <eliz@gnu.org> writes:
>     >> 
>     >> If it can be done by two processes, it can also be done by two threads
>     >> in the same process.  Right?
> 
>     Gerd> Yes, I think so.
> 
> But then you have to throw a lock over all the memory in the
> non-display thread that might affect redisplay

No, you copy on write.  Exactly like the OS does with a forked process.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 16:58                                               ` Eli Zaretskii
  2024-12-11 17:13                                                 ` Gerd Möllmann
@ 2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
  2024-12-11 19:04                                                   ` Eli Zaretskii
  2024-12-11 19:09                                                   ` Gerd Möllmann
  1 sibling, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 17:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, emacs-devel, ofv

"Eli Zaretskii" <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>  Óscar Fuentes <ofv@wanadoo.es>
>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>> > This may be a very stupid idea, but why not use a separate process?
>>
>> Not stupid at all. I thought about something similar in a different
>> context, namely if one could decouple the GUI part of Emacs from the
>> rest.
>
> If it can be done by two processes, it can also be done by two threads
> in the same process.  Right?

AFAIU: No, not right.

I may have misunderstood, but if the idea is to preserve a consistent
state of all Lisp data and buffer text for redisplay to use, the easiest
way to ensure that consistency is fork().  The other ways, such as
copying all heap objects that might be used by redisplay (and adjusting
all internal pointers in such heap objects to point to the copy rather
than the original data), probably will end up either being a lot slower
or being very specific to the system we're running on.

I know that implementing fork() on Windows is very slow, and I don't
know about a comparable snapshotting mechanism for Windows.

To be honest, though, I'm a bit disappointed that GNU/Linux appears to
make fork() take significant time that is proportional to the size of
the mapped address space, even if it's never COW-faulted in.  I'm pretty
sure that could be avoided (and I hope the Linux kernel avoids doing it
for swapped-out memory, not that anyone still does that).

Concurrent access to Lisp data from several threads requires a locking
mechanism (fine-grained or coarse) for all such data, and possibly
requires rewriting addresses, which means no "ambiguous" references
whatsoever. That's a lot harder than using MPS, which generously allows
for ambiguous references.

It's possible we could have gotten away with concurrent access by the
redisplay machinery if we inhibited GC while the redisplay thread was
busy inspecting our data, but inhibiting MPS GC is a lot harder and
shouldn't be done for ordinary operations.

Oh, and of course mmap() breaks fork()'s snapshotting magic.

The reason I said this depends on a new GC is a bit subtle, by the way:
the old GC does best if we sacrifice a lot of memory and only run it
rarely, which we can usually get away with because RAM is cheap.  With a
fork()-based approach, memory usage comes with a performance penalty for
every fork(), so we need to reduce both memory usage and GC time, which
we can't do with non-incremental GC.

The last reason it's difficult is that MPS isn't optimized for
multi-thread settings: in an ideal world, "scanning" a memory area would
use a secondary mapping of the memory, known only to the scanning code,
so other threads could continue running while an area is being scanned.
With MPS, there is only one mapping, so we need to stop all other
threads while one thread un-mprotect()s a memory area to scan it.

Unless MPS breaks POSIX threads in some spectacular way, fork() should
still work, though.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
@ 2024-12-11 19:04                                                   ` Eli Zaretskii
  2024-12-11 19:54                                                     ` Pip Cet via Emacs development discussions.
  2024-12-11 19:09                                                   ` Gerd Möllmann
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 19:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv

> Date: Wed, 11 Dec 2024 17:41:29 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > If it can be done by two processes, it can also be done by two threads
> > in the same process.  Right?
> 
> AFAIU: No, not right.
> 
> I may have misunderstood, but if the idea is to preserve a consistent
> state of all Lisp data and buffer text for redisplay to use, the easiest
> way to ensure that consistency is fork().  The other ways, such as
> copying all heap objects that might be used by redisplay (and adjusting
> all internal pointers in such heap objects to point to the copy rather
> than the original data), probably will end up either being a lot slower
> or being very specific to the system we're running on.

How do you do the same in a forked process?  The glyph matrices are
not allocated once, they are reallocated constantly.  Are you going to
fork each time?  And if you are, how is it different from copying
stuff lazily within the same process, exactly like the OS does with
forked processes?

> I know that implementing fork() on Windows is very slow, and I don't
> know about a comparable snapshotting mechanism for Windows.

I'm not talking about Windows, I'm talking about Posix systems.

Anyway, the fact that redisplay calls Lisp and Lisp calls back into
redisplay all but kills this idea.  Gerd's document also has other
gotchas.  We didn't just give up easily back when we discussed that.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:04                                                   ` Eli Zaretskii
@ 2024-12-11 19:54                                                     ` Pip Cet via Emacs development discussions.
  2024-12-11 20:26                                                       ` Eli Zaretskii
  2024-12-11 22:07                                                       ` Dmitry Gutov
  0 siblings, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 19:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Wed, 11 Dec 2024 17:41:29 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, emacs-devel@gnu.org, ofv@wanadoo.es
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > If it can be done by two processes, it can also be done by two threads
>> > in the same process.  Right?
>>
>> AFAIU: No, not right.
>>
>> I may have misunderstood, but if the idea is to preserve a consistent
>> state of all Lisp data and buffer text for redisplay to use, the easiest
>> way to ensure that consistency is fork().  The other ways, such as
>> copying all heap objects that might be used by redisplay (and adjusting
>> all internal pointers in such heap objects to point to the copy rather
>> than the original data), probably will end up either being a lot slower
>> or being very specific to the system we're running on.
>
> How do you do the same in a forked process?  The glyph matrices are
> not allocated once, they are reallocated constantly.  Are you going to
> fork each time?

Not necessarily "each time" (meaning once per frame/keystroke), but
quite frequently, yes.

> And if you are, how is it different from copying
> stuff lazily within the same process, exactly like the OS does with
> forked processes?

It is very different indeed:

Copying within a process involves changing the (virtual) addresses that
the copied data is at (unless you use an architecture-specific
implementation of TLS). The beauty of fork() is that the virtual
addresses stay the same, so we don't need to adjust any pointers, which
we cannot do because there are ambiguous references to Lisp data.

IOW, no, you can't lazily create two copies of Lisp data in the same
process. You have to do so eagerly, adjusting any and all pointers (and
only those) in the Lisp data before the new data is read for the first
time (because what you read might be a pointer, and then it needs to be
adjusted). With fork(), you only have to make the copy when the data is
being written to, by either process.

(Of course you can just access all memory through some sort of API that
translates addresses for you, but that would effectively mean we'd be
running Emacs on a virtual machine and simulating fork() on it.)

> Anyway, the fact that redisplay calls Lisp and Lisp calls back into
> redisplay all but kills this idea.  Gerd's document has also other
> gotchas.  We didn't just give up easily back when we discussed that.

I don't see why the redisplay process would not be able to call Lisp;
it's a full Emacs process (with a single thread), except it doesn't have
an FD or socket for the window system, and has an extra pipe to
communicate with the parent process instead.

It's true that the side effects of the called Lisp code won't be visible
to the next redisplay process, but such side effects are perilous
anyway, and avoiding them would seem to me to be a feature, not a bug.

However, if such side effects are desired, we can use IPC to execute
Lisp in the main process (some effort) or simply send a "this redisplay
needs to happen synchronously" message to the main process, which would
kill the current redisplay process and perform a synchronous redisplay
(as not all operating systems support fork() reliably, we'll have to
retain the ability to redisplay synchronously, either way).

But, to be perfectly honest, I'm not sure redisplay is slowing me down
the way traditional GC is.

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:54                                                     ` Pip Cet via Emacs development discussions.
@ 2024-12-11 20:26                                                       ` Eli Zaretskii
  2024-12-11 22:07                                                       ` Dmitry Gutov
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 20:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, emacs-devel, ofv

> Date: Wed, 11 Dec 2024 19:54:07 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org, ofv@wanadoo.es
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > Anyway, the fact that redisplay calls Lisp and Lisp calls back into
> > redisplay all but kills this idea.  Gerd's document has also other
> > gotchas.  We didn't just give up easily back when we discussed that.
> 
> I don't see why the redisplay process would not be able to call Lisp;
> it's a full Emacs process (with a single thread)

So you are going to fork on each redisplay?

And how will you pass back the results of Lisp evaluation, if the
other process meanwhile changes the global state (as it's running
concurrently)?

> except it doesn't have
> an FD or socket for the window system, and has an extra pipe to
> communicate with the parent process instead.

Do you have an estimate of the throughput that such a pipe will need
to handle in order to support GUI display?  What will you send through
the pipe?  If you send only some kind of commands, then the other
process will need to generate the font glyphs in some way -- the same
glyphs that the "redisplay" process already produced.  And if you
intend to send the pixels, that would be too much traffic, I think.
And again, the global state of the receiving process could have
changed, which means any high-level data might be useless (e.g., using
a font that was unloaded).

> It's true that the side effects of the called Lisp code won't be visible
> to the next redisplay process, but such side effects are perilous
> anyway, and avoiding them would seem to me to be a feature, not a bug.

In Emacs, they are a feature, and are expected to work.  You'd be
surprised to see how many packages and how much user code rely on that.

> But, to be perfectly honest, I'm not sure redisplay is slowing me down
> the way traditional GC is.

It's the other way around: the Lisp machine blocks user interaction,
including the UI and display.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:54                                                     ` Pip Cet via Emacs development discussions.
  2024-12-11 20:26                                                       ` Eli Zaretskii
@ 2024-12-11 22:07                                                       ` Dmitry Gutov
  1 sibling, 0 replies; 137+ messages in thread
From: Dmitry Gutov @ 2024-12-11 22:07 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel, ofv

On 11/12/2024 21:54, Pip Cet via Emacs development discussions. wrote:

> It is very different indeed:
> 
> Copying within a process involves changing the (virtual) addresses that
> the copied data is at (unless you use an architecture-specific
> implementation of TLS). The beauty of fork() is that the virtual
> addresses stay the same, so we don't need to adjust any pointers, which
> we cannot do because there are ambiguous references to Lisp data.
> 
> IOW, no, you can't lazily create two copies of Lisp data in the same
> process. You have to do so eagerly, adjusting any and all pointers (and
> only those) in the Lisp data before the new data is read for the first
> time (because what you read might be a pointer, and then it needs to be
> adjusted). With fork(), you only have to make the copy when the data is
> being written to, by either process.
> 
> (Of course you can just access all memory through some sort of API that
> translates addresses for you, but that would effectively mean we'd be
> running Emacs on a virtual machine and simulate fork() on it).

Could one really avoid a global interpreter lock using fork()? It doesn't
sound right: even if you get cheap snapshotting using the underlying
OS's mechanisms, could we guarantee that the snapshot is "consistent"
from the Lisp VM's point of view? Or, if Lisp were not in the picture,
that the data structures are consistent, unless the second
process adheres to the first one's locks anyway?

> But, to be perfectly honest, I'm not sure redisplay is slowing me down
> the way traditional GC is.

IMO redisplay is not the problem most of the time indeed. Sometimes it is,
but I'm not sure parallelizing the rendering is the best answer in those 
scenarios either.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
  2024-12-11 19:04                                                   ` Eli Zaretskii
@ 2024-12-11 19:09                                                   ` Gerd Möllmann
  2024-12-12  8:55                                                     ` Robert Pluim
  1 sibling, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 19:09 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, emacs-devel, ofv

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: "Pip Cet via \"Emacs development discussions.\"" <emacs-devel@gnu.org>,
>>>  Óscar Fuentes <ofv@wanadoo.es>
>>> Date: Wed, 11 Dec 2024 16:33:18 +0100
>>>
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>> > This may be a very stupid idea, but why not use a separate process?
>>>
>>> Not stupid at all. I thought about something similar in a different
>>> context, namely if one could decouple the GUI part of Emacs from the
>>> rest.
>>
>> If it can be done by two processes, it can also be done by two threads
>> in the same process.  Right?
>
> AFAIU: No, not right.
>
> I may have misunderstood, but if the idea is to preserve a consistent
> state of all Lisp data and buffer text for redisplay to use, the easiest
> way to ensure that consistency is fork().  The other ways, such as
> copying all heap objects that might be used by redisplay (and adjusting
> all internal pointers in such heap objects to point to the copy rather
> than the original data), probably will end up either being a lot slower
> or being very specific to the system we're running on.

I may also be misunderstanding, but in principle, I agree with Eli.

Say we have processes A and B communicating with each other. Take the
code of A and move it to B, possibly with some automatic transformations
if A and B have the same source code. Make two threads in the result
process for A and B. Replace inter-process message passing with
inter-thread message passing. Initial message may be "fork" transferring
the world of thread A to thread B.

But I'm also thinking too abstractly sometimes.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 19:09                                                   ` Gerd Möllmann
@ 2024-12-12  8:55                                                     ` Robert Pluim
  2024-12-12 10:14                                                       ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Robert Pluim @ 2024-12-12  8:55 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv

>>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:

    Gerd> I may also be misunderstanding, but in principle, I agree with Eli.

    Gerd> Say we have processes A and B communicating with each other. Take the
    Gerd> code of A and move it to B, possibly with some automatic transformations
    Gerd> if A and B have the same source code. Make two threads in the result
    Gerd> process for A and B. Replace inter-process message passing with
    Gerd> inter-thread message passing. Initial message may be "fork" transferring
    Gerd> the world of thread A to thread B.

Your first sentence is doing a lot of lifting there :-)

In the two process situation, you automatically have two copies of
every object, so you donʼt need to ensure that the processes are not
stepping on each other. In the two thread situation you donʼt have
that guarantee, unless you have *first* ensured (via locking, or COW,
etc) that the object states are consistent. Once youʼve done that,
then you can use messaging to keep things synchronized (or again,
locking etc)

Robert
-- 



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-12  8:55                                                     ` Robert Pluim
@ 2024-12-12 10:14                                                       ` Gerd Möllmann
  0 siblings, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-12 10:14 UTC (permalink / raw)
  To: Robert Pluim; +Cc: Pip Cet, Eli Zaretskii, emacs-devel, ofv

Robert Pluim <rpluim@gmail.com> writes:

>>>>>> On Wed, 11 Dec 2024 20:09:51 +0100, Gerd Möllmann <gerd.moellmann@gmail.com> said:
>
>     Gerd> I may also be misunderstanding, but in principle, I agree with Eli.
>
>     Gerd> Say we have processes A and B communicating with each other. Take the
>     Gerd> code of A and move it to B, possibly with some automatic transformations
>     Gerd> if A and B have the same source code. Make two threads in the result
>     Gerd> process for A and B. Replace inter-process message passing with
>     Gerd> inter-thread message passing. Initial message may be "fork" transferring
>     Gerd> the world of thread A to thread B.
>
> Your first sentence is doing a lot of lifting there :-)
>
> In the two process situation, you automatically have two copies of
> every object, so you donʼt need to ensure that the processes are not
> stepping on each other. In the two thread situation you donʼt have
> that guarantee, unless you have *first* ensured (via locking, or COW,
> etc) that the object states are consistent. Once youʼve done that,
> then you can use messaging to keep things synchronized (or again,
> locking etc)
>
> Robert

Sure, a bit of effort is left to the implementer :-).



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  9:35                                     ` Gerd Möllmann
  2024-12-11 11:50                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-11 12:27                                       ` Pip Cet via Emacs development discussions.
  2024-12-11 13:27                                         ` Gerd Möllmann
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 12:27 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> I also recall discussion somewhere (nullprogram.com, maybe) about
>> multiple cursors and the gap buffer, and that's also a potential use
>> case where the gap buffer would make things very slow.

It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
seems accurate to me.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 12:27                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-11 13:27                                         ` Gerd Möllmann
  2024-12-11 15:06                                           ` Marcus Harnisch
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 13:27 UTC (permalink / raw)
  To: Pip Cet
  Cc: Pip Cet via "Emacs development discussions.",
	Óscar Fuentes

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>>> I also recall discussion somewhere (nullprogram.com, maybe) about
>>> multiple cursors and the gap buffer, and that's also a potential use
>>> case where the gap buffer would make things very slow.
>
> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
> seems accurate to me.
>
> Pip

Thanks! Added to my collection. 



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 13:27                                         ` Gerd Möllmann
@ 2024-12-11 15:06                                           ` Marcus Harnisch
  2024-12-11 22:11                                             ` Dmitry Gutov
  0 siblings, 1 reply; 137+ messages in thread
From: Marcus Harnisch @ 2024-12-11 15:06 UTC (permalink / raw)
  To: emacs-devel

On 11/12/2024 14.27, Gerd Möllmann wrote:
> Pip Cet <pipcet@protonmail.com> writes:
> 
>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>> seems accurate to me.
> 
> Thanks! Added to my collection.

You may be interested in this article, too, which references the blog 
post above:
https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:06                                           ` Marcus Harnisch
@ 2024-12-11 22:11                                             ` Dmitry Gutov
  2024-12-12  3:49                                               ` Gerd Möllmann
  2024-12-12  6:01                                               ` Eli Zaretskii
  0 siblings, 2 replies; 137+ messages in thread
From: Dmitry Gutov @ 2024-12-11 22:11 UTC (permalink / raw)
  To: Marcus Harnisch, emacs-devel

On 11/12/2024 17:06, Marcus Harnisch wrote:
> On 11/12/2024 14.27, Gerd Möllmann wrote:
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>>> seems accurate to me.
>>
>> Thanks! Added to my collection.
> 
> You may be interested in this article, too, which references the blog 
> post above:
> https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/

To quote from the bottom of the article:

   The way I see it, gap buffers are better for searching and memory
   usage, but ropes are better at non-local editing patterns. Despite
   their simplicity, gap buffers can hold their own in the modern world.
   Maybe Emacs was on to something.

This is also my takeaway from reading a number of other texts on the 
subject (not benchmarking personally, though, TBF).



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 22:11                                             ` Dmitry Gutov
@ 2024-12-12  3:49                                               ` Gerd Möllmann
  2024-12-12 19:07                                                 ` Dmitry Gutov
  2024-12-12  6:01                                               ` Eli Zaretskii
  1 sibling, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-12  3:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel

Dmitry Gutov <dmitry@gutov.dev> writes:

> On 11/12/2024 17:06, Marcus Harnisch wrote:
>> On 11/12/2024 14.27, Gerd Möllmann wrote:
>>> Pip Cet <pipcet@protonmail.com> writes:
>>>
>>>> It was nullprogram.com, at https://nullprogram.com/blog/2017/09/07/. The
>>>> title is "Gap Buffers Are Not Optimized for Multiple Cursors", which
>>>> seems accurate to me.
>>>
>>> Thanks! Added to my collection.
>> You may be interested in this article, too, which references the
>> blog post above:
>> https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
>
> To quote from the bottom of the article:
>
>   The way I see it, gap buffers are better for searching and memory
>   usage, but ropes are better at non-local editing patterns. Despite
>   their simplicity, gap buffers can hold their own in the modern world.
>   Maybe Emacs was on to something.
>
> This is also my takeaway from reading a number of other texts on the
> subject (not benchmarking personally, though, TBF).

The Zed editor, which is heavily performance-oriented, decided to use
ropes. They have a number of blog entries that I find interesting,
for example

  https://zed.dev/blog/zed-decoded-rope-sumtree

VSCode uses persistent piece tables

  https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-12  3:49                                               ` Gerd Möllmann
@ 2024-12-12 19:07                                                 ` Dmitry Gutov
  2024-12-12 19:30                                                   ` Eli Zaretskii
  2024-12-12 19:40                                                   ` Gerd Möllmann
  0 siblings, 2 replies; 137+ messages in thread
From: Dmitry Gutov @ 2024-12-12 19:07 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Marcus Harnisch, emacs-devel

On 12/12/2024 05:49, Gerd Möllmann wrote:

> The Zed editor, which is heavily performance-oriented, decided to use
> ropes. They have a number of blog entries that I find interesting,
> for example
> 
>    https://zed.dev/blog/zed-decoded-rope-sumtree

IIUC their goal there was to use a data structure that can do everything.

They also have an ambition to support live collaboration, which we don't 
have anything for, and not for the reasons of performance.

> VSCode uses persistent piece tables
> 
>    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

And this article compares it to the previous "array of strings" implementation.

Both editors' data structures (not ropes) seem to have something that 
can be used like our "newline cache", so if anything I would try to 
understand whether either has an advantage in that area.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-12 19:07                                                 ` Dmitry Gutov
@ 2024-12-12 19:30                                                   ` Eli Zaretskii
  2024-12-12 19:40                                                   ` Gerd Möllmann
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-12 19:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: gerd.moellmann, mh-gmane, emacs-devel

> Date: Thu, 12 Dec 2024 21:07:25 +0200
> Cc: Marcus Harnisch <mh-gmane@online.de>, emacs-devel@gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> >    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
> 
> And this article compared to the previous "array of strings" implementation.
> 
> Both editors' data structures (not ropes) seem to have something that 
> can be used like our "newline cache", so if anything I would try to 
> understand whether either has an advantage in that area.

Our newline cache doesn't really help to solve the problems in the
display engine for which it would be good to find an alternative to
gap buffer.  So I wouldn't waste time on thinking how to replace the
newline cache.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-12 19:07                                                 ` Dmitry Gutov
  2024-12-12 19:30                                                   ` Eli Zaretskii
@ 2024-12-12 19:40                                                   ` Gerd Möllmann
  1 sibling, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-12 19:40 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Marcus Harnisch, emacs-devel

Dmitry Gutov <dmitry@gutov.dev> writes:

> On 12/12/2024 05:49, Gerd Möllmann wrote:
>
>> The Zed editor, which is heavily performance-oriented, decided to use
>> ropes. They have a number of blog entries that I find interesting,
>> for example
>>    https://zed.dev/blog/zed-decoded-rope-sumtree
>
> IIUC their goal there was to use a data structure that can do everything.
>
> They also have an ambition to support live collaboration, which we
> don't have anything for, and not for the reasons of performance.

I don't find it surprising that being fast isn't the only requirement
for a modern editor, TBH. Thing is that from what I observe, Zed seems
to have no problem with long lines. But maybe that's a wrong
observation. I never did anything that required long-line support.

Point I was trying to make is that apparently ropes can support long
lines. One could maybe learn from what they do; Rust sources are
available. Of course, additional data structures, like ones to map
positions to actual text and so on, belong in the picture. It's not just
using ropes, but in this case the rope is the basic data structure, and
additional data structures depend on the properties of that.

>
>> VSCode uses persistent piece tables
>>    https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
>
> And this article compared to the previous "array of strings" implementation.
>

And shows that they are using piece tables, the alternative to ropes, if
one doesn't count the other two which are gap buffer and set of lines
for the moment. And I think they got long-lines support working, too.
And again there are additional data structures involved, and blah blah.

> Both editors' data structures (not ropes) seem to have something that
> can be used like our "newline cache", so if anything I would try to
> understand whether either has an advantage in that area.

For me that would be too narrow a perspective, I must admit. But I won't
do anything in that area anyway, so please ignore me :-).

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 22:11                                             ` Dmitry Gutov
  2024-12-12  3:49                                               ` Gerd Möllmann
@ 2024-12-12  6:01                                               ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-12  6:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: mh-gmane, emacs-devel, Pip Cet

> Date: Thu, 12 Dec 2024 00:11:54 +0200
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> > https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
> 
> To quote from the bottom of the article:
> 
>    The way I see it, gap buffers are better for searching and memory
>    usage, but ropes are better at non-local editing patterns. Despite
>    their simplicity, gap buffers can hold their own in the modern world.
>    Maybe Emacs was on to something.
> 
> This is also my takeaway from reading a number of other texts on the 
> subject (not benchmarking personally, though, TBF).

Yes.  But one important aspect that blog doesn't touch is the
potential effect of changing the buffer text data structure on the
various Emacs display issues.  Some problems in the current display
code that cause slow redisplay in some situations (mainly, very long
lines) cannot really be solved as long as we stay with buffer text
stored as a long C string, with or without the gap.  This important
aspect of Emacs still awaits serious research of possible
alternatives, IMO.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11  5:27                                 ` Gap buffer problem? Gerd Möllmann
  2024-12-11  8:50                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-11 14:22                                   ` Eli Zaretskii
  2024-12-11 15:51                                     ` Gerd Möllmann
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 14:22 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Óscar Fuentes <ofv@wanadoo.es>,  Pip Cet
>  <pipcet@protonmail.com>
> Date: Wed, 11 Dec 2024 06:27:43 +0100
> 
> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
> 
> > To be fair, part of that may be the gap buffer problem rather than GC.
> 
> Could you please tell more about the gap buffer problem?
> 
> I've read a little about the tradeoffs between gap buffers, piece
> tables, ropes, but I'm wondering if there is something concrete already
> known for sure that is a performance problem in Emacs. Maybe a bug that
> has been analyzed or something.
> 
> (I'm asking because I just recently encountered a performance problem
> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
> editing there was so slow that it was absolutely no fun, and that on
> an M1 Pro. Haven't investigated the reason.)

Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
moves point, then inserts a small number of characters, then moves
point far away and again inserts a small number of characters, etc.,
I'd be very surprised if the gap buffer caused significant performance
problems on a modern CPU.
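
To make that access pattern concrete, here is a toy gap buffer in C. It
is only a sketch, not the Emacs implementation: all names (struct
gap_buffer, gb_move_gap, the `moved' counter) are made up for
illustration, and the gap never grows. The point is that an insertion at
the gap costs nothing extra, while an insertion far away first copies
every byte between the old and the new gap position:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct gap_buffer
{
  char *data;        /* text before the gap, the gap, text after it */
  size_t gap_start;  /* index of the first gap byte */
  size_t gap_end;    /* index one past the last gap byte */
  size_t size;       /* total allocated bytes */
  size_t moved;      /* bytes copied by gap moves so far */
};

/* Start with TEXT followed by GAP free bytes.  Returns 0 on success. */
int
gb_init (struct gap_buffer *gb, const char *text, size_t gap)
{
  size_t len = strlen (text);
  gb->data = malloc (len + gap + 1);
  if (!gb->data)
    return -1;
  memcpy (gb->data, text, len);
  gb->gap_start = len;
  gb->gap_end = len + gap;
  gb->size = len + gap;
  gb->moved = 0;
  return 0;
}

size_t
gb_length (const struct gap_buffer *gb)
{
  return gb->size - (gb->gap_end - gb->gap_start);
}

/* Move the gap so that it starts at text position POS. */
void
gb_move_gap (struct gap_buffer *gb, size_t pos)
{
  assert (pos <= gb_length (gb));
  if (pos < gb->gap_start)
    {
      size_t n = gb->gap_start - pos;
      memmove (gb->data + gb->gap_end - n, gb->data + pos, n);
      gb->gap_start -= n;
      gb->gap_end -= n;
      gb->moved += n;
    }
  else if (pos > gb->gap_start)
    {
      size_t n = pos - gb->gap_start;
      memmove (gb->data + gb->gap_start, gb->data + gb->gap_end, n);
      gb->gap_start += n;
      gb->gap_end += n;
      gb->moved += n;
    }
}

/* Insert C at POS.  Returns -1 if POS is invalid or the gap is full. */
int
gb_insert (struct gap_buffer *gb, size_t pos, char c)
{
  if (pos > gb_length (gb) || gb->gap_start == gb->gap_end)
    return -1;
  gb_move_gap (gb, pos);
  gb->data[gb->gap_start++] = c;
  return 0;
}

/* Copy the text into OUT, which must hold gb_length (gb) + 1 bytes. */
void
gb_text (const struct gap_buffer *gb, char *out)
{
  size_t tail = gb->size - gb->gap_end;
  memcpy (out, gb->data, gb->gap_start);
  memcpy (out + gb->gap_start, gb->data + gb->gap_end, tail);
  out[gb->gap_start + tail] = '\0';
}
```

Each far-away insertion adds the distance travelled to `moved', so the
copying only becomes noticeable when that distance is very large and the
jumps are frequent.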

Can you profile that case and post the expanded profile?  I'm always
happy to be wrong about performance bottlenecks, and profiles are good
at proving me wrong.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 14:22                                   ` Eli Zaretskii
@ 2024-12-11 15:51                                     ` Gerd Möllmann
  2024-12-11 17:06                                       ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 15:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Óscar Fuentes <ofv@wanadoo.es>,  Pip Cet
>>  <pipcet@protonmail.com>
>> Date: Wed, 11 Dec 2024 06:27:43 +0100
>> 
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>> 
>> > To be fair, part of that may be the gap buffer problem rather than GC.
>> 
>> Could you please tell more about the gap buffer problem?
>> 
>> I've read a little about the tradeoffs between gap buffers, piece
>> tables, ropes, but I'm wondering if there is something concrete already
>> known for sure that is a performance problem in Emacs. Maybe a bug that
>> has been analyzed or something.
>> 
>> (I'm asking because I just recently encountered a performance problem
>> when adding something to xdisp.c:27339 (with cc-mode, Eglot, Corfu), and
>> editing there was so slow that it was absolutely no fun, and that on
>> an M1 Pro. Haven't investigated the reason.)
>
> Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
> moves point, then inserts a small number of characters, then moves
> point far away and again inserts a small number of characters, etc.,
> I'd be very surprised if the gap buffer caused significant performance
> problems on a modern CPU.
>
> Can you profile that case and post the expanded profile?  I'm always
> happy to be wrong about performance bottlenecks, and profiles are good
> at proving me wrong.

Maybe I'll try to investigate that further at some point. Such things
always tend to be so time consuming...



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 15:51                                     ` Gerd Möllmann
@ 2024-12-11 17:06                                       ` Eli Zaretskii
  2024-12-11 17:15                                         ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 17:06 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel, ofv, pipcet

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: emacs-devel@gnu.org,  ofv@wanadoo.es,  pipcet@protonmail.com
> Date: Wed, 11 Dec 2024 16:51:56 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
> > moves point, then inserts a small number of characters, then moves
> > point far away and again inserts a small number of characters, etc.,
> > I'd be very surprised if the gap buffer caused significant performance
> > problems on a modern CPU.
> >
> > Can you profile that case and post the expanded profile?  I'm always
> > happy to be wrong about performance bottlenecks, and profiles are good
> > at proving me wrong.
> 
> Maybe I'll try to investigate that further at some point. Such things
> always tend to be so time consuming...

I meant profiling with "M-x profiler-start", then run your slow-down
recipe.  That should be easy and should not consume any significant
time.  Analyzing the profile could, but producing it shouldn't.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: Gap buffer problem?
  2024-12-11 17:06                                       ` Eli Zaretskii
@ 2024-12-11 17:15                                         ` Gerd Möllmann
  0 siblings, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-11 17:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, ofv, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: emacs-devel@gnu.org,  ofv@wanadoo.es,  pipcet@protonmail.com
>> Date: Wed, 11 Dec 2024 16:51:56 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Unless you have a huge (and I mean a HUGE) buffer, and some Lisp that
>> > moves point, then inserts a small number of characters, then moves
>> > point far away and again inserts a small number of characters, etc.,
>> > I'd be very surprised if the gap buffer caused significant performance
>> > problems on a modern CPU.
>> >
>> > Can you profile that case and post the expanded profile?  I'm always
>> > happy to be wrong about performance bottlenecks, and profiles are good
>> > at proving me wrong.
>> 
>> Maybe I'll try to investigate that further at some point. Such things
>> always tend to be so time consuming...
>
> I meant profiling with "M-x profiler-start", then run your slow-down
> recipe.  That should be easy and should not consume any significant
> time.  Analyzing the profile could, but producing it shouldn't.

Plus making it reproducible, if it is.




* Re: pdumper on Solaris 10
  2024-12-10 13:39                             ` Óscar Fuentes
  2024-12-10 14:39                               ` Eli Zaretskii
  2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
@ 2024-12-10 18:13                               ` Gerd Möllmann
  2 siblings, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-10 18:13 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

Óscar Fuentes <ofv@wanadoo.es> writes:

> My perception of the past week or two using igc is that those pauses are
> much less jarring, if perceptible at all. I need more time to make a
> definitive judgment, though.

Please make sure not to have --enable-checking=igc_debug and not to have
--with-mps=debug. They are expensive, and I'm not talking about some
dozen percent :-).




* Re: pdumper on Solaris 10
  2024-12-10 12:59                           ` Eli Zaretskii
  2024-12-10 13:39                             ` Óscar Fuentes
@ 2024-12-10 15:23                             ` Pip Cet via Emacs development discussions.
  2024-12-10 17:08                               ` Eli Zaretskii
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 15:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Kangas, luangruo, ali_gnu2, emacs-devel

On Tuesday, December 10th, 2024 at 12:59, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Stefan Kangas <stefankangas@gmail.com>
> > Date: Mon, 9 Dec 2024 19:09:59 -0500
> > Cc: pipcet@protonmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com,
> > emacs-devel@gnu.org
> > 
> > Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > > From: Stefan Kangas <stefankangas@gmail.com>
> > > > Date: Sun, 8 Dec 2024 23:59:14 -0500
> > > > Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> > > > 
> > > > Assuming that we are 100% sure that MPS will land, then I can agree that
> > > > making any changes here is basically wasted effort. Unless, of course,
> > > > the change would also simplify the MPS work (would it?).
> > > 
> > > The igc branch already dropped WIDE_EMACS_INT support, so it only
> > > supports USE_LSB anyway.
> > 
> > I thought that WIDE_EMACS_INT will remain supported in non-MPS
> > (i.e. "old GC") builds even after the igc merge? Am I mistaken?
> 
> Probably, but who will want to give up igc to get back WIDE_EMACS_INT
> (if indeed they are incompatible, which seems to be in disagreement)?

It's !USE_LSB_TAG that's incompatible with MPS, not WIDE_EMACS_INT per se. I don't think anyone suggested that there is a fundamental problem if we force USE_LSB_TAG to 1 and enable WIDE_EMACS_INT.

> > > > Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
> > > > on master.
> > > 
> > > That would put the WIDE_EMACS_INT configuration at risk, since that
> > > configuration will need changes.
> > 
> > That's why I proposed disabling it on master tentatively, with the
> > option to revert the change if we don't like it. Setting a flag back to
> > 0 is easy enough. But making the experiment I proposed might also
> > demonstrate that we're fine, after all.
> 
> I think we already know that we are "not fine"? Didn't someone say
> that stack scanning is broken?

!USE_LSB_TAG && !WIDE_EMACS_INT stack scanning is broken (but doesn't currently happen on actual machines)
USE_LSB_TAG && WIDE_EMACS_INT (currently impossible, but trivial to enable) stack scanning works
USE_LSB_TAG && !WIDE_EMACS_INT stack scanning works (this is the usual case)
!USE_LSB_TAG && WIDE_EMACS_INT stack scanning works (this is Eli's situation)

So following Stefan's suggestion would fix the broken case. I've already reported that I tested this with the patch I posted and it appears to work just fine, with or without MPS.

> > > My point is that all of that could be avoided entirely, given some
> > > development decisions which basically drop !USE_LSB_TAG
> > > configurations.
> > 
> > Is your thinking here that we could merge MPS, wait, and then when it
> > comes time to remove the old GC, we will get to drop !USE_LSB_TAG for
> > free? If yes, couldn't that leave us waiting for a very long time
> > indeed?
> 
> Maybe so, but why is such a long wait a problem? GC works, and
> works well. There are no pressing problems there, and we've lived
> with it for many years virtually without changes. What's the urge to
> make modifications there now, especially when there are chances we
> will be dropping this GC at some point?

The old !USE_LSB_TAG code, which is broken, interferes with GC development, both MPS and non-MPS.

> IMO, our main task here is to develop the application levels of Emacs,
> and infrastructure needed to enable such developments. We should only
> invest efforts in stuff like GC and other basics if we see significant
> issues, or could envision significant performance gains. There are no
> such issues or gains here, AFAIU. So diverting our humble resources
> to such jobs is a mistake, IMO.

Given how many GC developers we have already "lost", simplifying the GC code even a little so people can work with it is worth it, IMHO. And encouraging someone to invest resources into fixing a code path that will never again be used is a much greater mistake.

Pip




* Re: pdumper on Solaris 10
  2024-12-10 15:23                             ` Pip Cet via Emacs development discussions.
@ 2024-12-10 17:08                               ` Eli Zaretskii
  2024-12-10 18:03                                 ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-10 17:08 UTC (permalink / raw)
  To: Pip Cet; +Cc: stefankangas, luangruo, ali_gnu2, emacs-devel

> Date: Tue, 10 Dec 2024 15:23:45 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>, luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> > > I thought that WIDE_EMACS_INT will remain supported in non-MPS
> > > (i.e. "old GC") builds even after the igc merge? Am I mistaken?
> > 
> > Probably, but who will want to give up igc to get back WIDE_EMACS_INT
> > (if indeed they are incompatible, which seems to be in disagreement)?
> 
> It's !USE_LSB_TAG that's incompatible with MPS, not WIDE_EMACS_INT per se. I don't think anyone suggested that there is a fundamental problem if we force USE_LSB_TAG to 1 and enable WIDE_EMACS_INT.

That's not what Gerd says, AFAIU.  But if you are right, then how
about making the WIDE_EMACS_INT configuration on the igc branch use
USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
a build, if that would help.

> > Maybe so, but why is such a long wait a problem? GC works, and
> > works well. There are no pressing problems there, and we've lived
> > with it for many years virtually without changes. What's the urge to
> > make modifications there now, especially when there are chances we
> > will be dropping this GC at some point?
> 
> The old !USE_LSB_TAG code, which is broken, interferes with GC development, both MPS and non-MPS.

That work is on the igc branch.  My objection is against doing that on
master and/or with the "old" GC code.  In the HAVE_MPS branch of the
code, all the arguments I brought up against removing !USE_LSB_TAG are
null and void, and I therefore have no objections to doing that in
those parts of the code.

> > IMO, our main task here is to develop the application levels of Emacs,
> > and infrastructure needed to enable such developments. We should only
> > invest efforts in stuff like GC and other basics if we see significant
> > issues, or could envision significant performance gains. There are no
> > such issues or gains here, AFAIU. So diverting our humble resources
> > to such jobs is a mistake, IMO.
> 
> Given how many GC developers we have already "lost", simplifying the GC code even a little so people can work with it is worth it, IMHO. And encouraging someone to invest resources into fixing a code path that will never again be used is a much greater mistake.

Our perspectives are very different, so let's agree to disagree on
this.




* Re: pdumper on Solaris 10
  2024-12-10 17:08                               ` Eli Zaretskii
@ 2024-12-10 18:03                                 ` Gerd Möllmann
  2024-12-10 19:34                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-10 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, stefankangas, luangruo, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Tue, 10 Dec 2024 15:23:45 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>, luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>
>> > > I thought that WIDE_EMACS_INT will remain supported in non-MPS
>> > > (i.e. "old GC") builds even after the igc merge? Am I mistaken?
>> >
>> > Probably, but who will want to give up igc to get back WIDE_EMACS_INT
>> > (if indeed they are incompatible, which seems to be in disagreement)?
>>
>> It's !USE_LSB_TAG that's incompatible with MPS, not WIDE_EMACS_INT per se. I don't think anyone suggested that there is a fundamental problem if we force USE_LSB_TAG to 1 and enable WIDE_EMACS_INT.
>
> That's not what Gerd says, AFAIU.  But if you are right, then how
> about making the WIDE_EMACS_INT configuration on the igc branch use
> USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
> a build, if that would help.

If a Lisp_Object looks like this

  0                 32                   64
  +------------------+-------------------+
  | tag | pointer    |  ...              |
  +------------------+-------------------+

there is a chance it could be made to work, if ugly. That's USE_LSB_TAG
== 1.

If it looks like this

  0                 32                   64
  +------------------+-------------------+
  |       pointer    |  ...         |tag |
  +------------------+-------------------+

it gets a lot more ugly. That's USE_LSB_TAG == 0.
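
The two layouts above can be sketched in C.  This is only an
illustration, assuming a 32-bit target with a 64-bit Lisp word and 3
tag bits.  The names tag_lsb, tag_msb, low_word and high_word are made
up for this example and are not Emacs functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for the sketch: 3 tag bits in a 64-bit Lisp word.  */
enum { TAG_BITS = 3, WORD_BITS = 64 };

/* LSB tagging: the tag lives in the low bits of the pointer, which
   must therefore be aligned to 1 << TAG_BITS bytes.  */
static uint64_t
tag_lsb (uint32_t ptr, unsigned tag)
{
  assert ((ptr & ((1u << TAG_BITS) - 1)) == 0);
  assert (tag < (1u << TAG_BITS));
  return (uint64_t) ptr | tag;
}

/* MSB tagging: the tag lives in the top bits of the 64-bit word.  */
static uint64_t
tag_msb (uint32_t ptr, unsigned tag)
{
  assert (tag < (1u << TAG_BITS));
  return ((uint64_t) tag << (WORD_BITS - TAG_BITS)) | ptr;
}

/* What a scanner that only looks at 32-bit words would see.  */
static uint32_t low_word (uint64_t v) { return (uint32_t) v; }
static uint32_t high_word (uint64_t v) { return (uint32_t) (v >> 32); }
```

With tag_lsb, one 32-bit half carries both the pointer and the tag and
the other half is zero.  With tag_msb, the pointer and the tag end up
in different 32-bit halves, so a scanner would have to pair the two
halves up again to recover the tag.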




* Re: pdumper on Solaris 10
  2024-12-10 18:03                                 ` Gerd Möllmann
@ 2024-12-10 19:34                                   ` Pip Cet via Emacs development discussions.
  2024-12-10 19:59                                     ` Gerd Möllmann
  2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 19:34 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, luangruo, ali_gnu2, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> Date: Tue, 10 Dec 2024 15:23:45 +0000
>>> From: Pip Cet <pipcet@protonmail.com>
>>> Cc: Stefan Kangas <stefankangas@gmail.com>, luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>>
>>> > > I thought that WIDE_EMACS_INT will remain supported in non-MPS
>>> > > (i.e. "old GC") builds even after the igc merge? Am I mistaken?
>>> >
>>> > Probably, but who will want to give up igc to get back WIDE_EMACS_INT
>>> > (if indeed they are incompatible, which seems to be in disagreement)?
>>>
>>> It's !USE_LSB_TAG that's incompatible with MPS, not WIDE_EMACS_INT
>>> per se. I don't think anyone suggested that there is a fundamental
>>> problem if we force USE_LSB_TAG to 1 and enable WIDE_EMACS_INT.
>>
>> That's not what Gerd says, AFAIU.  But if you are right, then how
>> about making the WIDE_EMACS_INT configuration on the igc branch use
>> USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
>> a build, if that would help.

Thanks for the offer.  I definitely think we should move away from
USE_LSB_TAG=0 as much as possible, and if the only place where such a
change would not be vetoed is scratch/igc + WIDE_EMACS_INT, we can at
least fix it there.  If any issues arise, of course, it will be more
difficult to ascertain whether they were caused by the USE_LSB_TAG
change or the IGC changes themselves.

So I'll push that change in a bit, unless someone objects.

> If a Lisp_Object looks like this
>
>   0                 32                   64
>   +------------------+-------------------+
>   | tag | pointer    |  ...              |
>   +------------------+-------------------+
>
> there is a chance it could be made to work, if ugly. That's USE_LSB_TAG
> == 1.

It does appear to work.  I'm not sure how it is "ugly", to be honest,
since MPS only sees 32-bit words, and that's the tagged pointer and
0. No changes required.

> If it looks like this
>
>   0                 32                   64
>   +------------------+-------------------+
>   |       pointer    |  ...         |tag |
>   +------------------+-------------------+
>
> it gets a lot more ugly. That's USE_LSB_TAG == 0.

Given that gcc likes storing the two 32-bit words of a 64-bit integer in
non-adjacent places on the stack, it would be quite expensive to get
this working.

And if we decided to do that, it would become a lot more complicated to
change our tagging scheme (which we should do, some time after merging
MPS, to speed up EQ by having a "may be EQ to a different object" tag
or, ideally, bit: EQ could then be simplified to

if (x == y)
  return true;
else if (((x|y) & BIT) == 0)
  return false;

<expensive non-inlined code here>)
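
Spelled out as compilable C, the fast path might look roughly like the
sketch below.  Everything here is hypothetical: EXOTIC_BIT, word_t,
eq_sketch and eq_slow are names invented for the illustration, and the
slow path is a toy model (an "exotic" word is treated as EQ to the same
word with the bit cleared), not the real comparison.  The noinline
attribute is the GCC/Clang spelling.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* Hypothetical "may be EQ to a different object" bit.  */
#define EXOTIC_BIT ((word_t) 1)

/* Cold path, kept out of line so it is not duplicated at every call
   site.  Toy model only: ignore the exotic bit when comparing.  */
static bool __attribute__ ((noinline))
eq_slow (word_t x, word_t y)
{
  return (x & ~EXOTIC_BIT) == (y & ~EXOTIC_BIT);
}

static inline bool
eq_sketch (word_t x, word_t y)
{
  if (x == y)
    return true;                /* identical words: always EQ */
  if (((x | y) & EXOTIC_BIT) == 0)
    return false;               /* neither is exotic: cannot be EQ */
  return eq_slow (x, y);        /* rare case, handled out of line */
}
```

The inlined part is two comparisons and a branch; only words carrying
the bit ever reach the out-of-line function.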

Pip





* Re: pdumper on Solaris 10
  2024-12-10 19:34                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-10 19:59                                     ` Gerd Möllmann
  2024-12-10 20:17                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-10 19:59 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, stefankangas, luangruo, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>> If a Lisp_Object looks like this
>>
>>   0                 32                   64
>>   +------------------+-------------------+
>>   | tag | pointer    |  ...              |
>>   +------------------+-------------------+
>>
>> there is a chance it could be made to work, if ugly. That's USE_LSB_TAG
>> == 1.
>
> It does appear to work.  I'm not sure how it is "ugly", to be honest,
> since MPS only sees 32-bit words, and that's the tagged pointer and
> 0. No changes required.

I was just assuming it would end up ugly in some form. But I haven't
thought about it much. WIDE_INT and 32-bits are in an SEP field for
me :-).

>> If it looks like this
>>
>>   0                 32                   64
>>   +------------------+-------------------+
>>   |       pointer    |  ...         |tag |
>>   +------------------+-------------------+
>>
>> it gets a lot more ugly. That's USE_LSB_TAG == 0.
>
> Given that gcc likes storing the two 32-bit words of a 64-bit integer in
> non-adjacent places on the stack, it would be quite expensive to get
> this working.

Yeah, that's for sure. Nightmare.

> And if we decided to do that, it would become a lot more complicated to
> change our tagging scheme (which we should do, some time after merging
> MPS, to speed up EQ by having a "may be EQ to a different object" tag
> or, ideally, bit: EQ could then be simplified to
>
> if (x == y)
>   return true;
> else if (((x|y) & BIT) == 0)
>   return false;
>
> <expensive non-inlined code here>)

Hm, interesting idea. One would have to try it out of course to know,
but from a gut feeling, would you say one would notice a difference?
I don't have an "educated" gut feeling wrt EQ.





* Re: pdumper on Solaris 10
  2024-12-10 19:59                                     ` Gerd Möllmann
@ 2024-12-10 20:17                                       ` Pip Cet via Emacs development discussions.
  2024-12-10 20:34                                         ` Gerd Möllmann
  0 siblings, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-10 20:17 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, luangruo, ali_gnu2, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>>> If a Lisp_Object looks like this
>>>
>>>   0                 32                   64
>>>   +------------------+-------------------+
>>>   | tag | pointer    |  ...              |
>>>   +------------------+-------------------+
>>>
>>> there is a chance it could be made to work, if ugly. That's USE_LSB_TAG
>>> == 1.
>>
>> It does appear to work.  I'm not sure how it is "ugly", to be honest,
>> since MPS only sees 32-bit words, and that's the tagged pointer and
>> 0. No changes required.
>
> I was just assuming it would end up ugly in some form. But I haven't
> thought about it much. WIDE_INT and 32-bits are in an SEP field for
> me :-).
>
>>> If it looks like this
>>>
>>>   0                 32                   64
>>>   +------------------+-------------------+
>>>   |       pointer    |  ...         |tag |
>>>   +------------------+-------------------+
>>>
>>> it gets a lot more ugly. That's USE_LSB_TAG == 0.
>>
>> Given that gcc likes storing the two 32-bit words of a 64-bit integer in
>> non-adjacent places on the stack, it would be quite expensive to get
>> this working.
>
> Yeah, that's for sure. Nightmare.
>
>> And if we decided to do that, it would become a lot more complicated to
>> change our tagging scheme (which we should do, some time after merging
>> MPS, to speed up EQ by having a "may be EQ to a different object" tag
>> or, ideally, bit: EQ could then be simplified to
>>
>> if (x == y)
>>   return true;
>> else if (((x|y) & BIT) == 0)
>>   return false;
>>
>> <expensive non-inlined code here>)
>
> Hm, interesting idea. One would have to try it out of course to know,
> but from a gut feeling, would you say one would notice a difference?
> I don't have an "educated" gut feeling wrt EQ.

My gut feeling is that EQ happens so often that it's worth
micro-optimizing. Andrea has started doing that by using
__builtin_expect, but the assembler code we produce still looks very
inefficient. In particular, we don't even perform a quick exit if the
arguments are BASE_EQ, or attempt to move the cold code into its own
function, which shouldn't be inlined (there are about 2000 locations GDB
thinks correspond to EQ calls in my current Emacs, so that's a lot of
duplicated code).

I was going to suggest a patch to change that...

Of course it's entirely possible that EQ just doesn't matter for
performance.

My entire post-MPS proposal is to have bignums, floats and
symbols-with-position as the "exotic" tags that (may) need special handling in
EQ. That leaves four tags for fixnums, strings, vectorlikes, symbols,
and cons cells, which doesn't work out.

I _think_ the least painful option is to give strings the "treat
specially in EQ" bit, since comparing strings with EQ, while legal, is
rare.

(and, yes, this approach would use Lisp_Type_Unused0 and reduce fixnum
range by one bit).
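
The tag arithmetic can be checked with a few lines of C.  This is a
sketch assuming 3 tag bits and one of them reserved as the "treat
specially in EQ" bit; GCTYPEBITS_SKETCH and plain_tag_count are names
made up for the example.

```c
/* Assumed number of tag bits for the sketch.  */
enum { GCTYPEBITS_SKETCH = 3 };

/* Count the tag values that have none of the bits in EXOTIC_MASK set,
   i.e. the tags left over for types that need no special handling.  */
static unsigned
plain_tag_count (unsigned exotic_mask)
{
  unsigned count = 0;
  for (unsigned tag = 0; tag < (1u << GCTYPEBITS_SKETCH); tag++)
    if ((tag & exotic_mask) == 0)
      count++;
  return count;
}
```

With one bit reserved, plain_tag_count (1u) gives the four tags
mentioned above, one short of the five types listed.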

Pip


* Re: pdumper on Solaris 10
  2024-12-10 20:17                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-10 20:34                                         ` Gerd Möllmann
  0 siblings, 0 replies; 137+ messages in thread
From: Gerd Möllmann @ 2024-12-10 20:34 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, stefankangas, luangruo, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

>> Hm, interesting idea. One would have to try it out of course to know,
>> but from a gut feeling, would you say one would notice a difference?
>> I don't have an "educated" gut feeling wrt EQ.
>
> My gut feeling is that EQ happens so often that it's worth
> micro-optimizing. Andrea has started doing that by using
> __builtin_expect, but the assembler code we produce still looks very
> inefficient. In particular, we don't even perform a quick exit if the
> arguments are BASE_EQ, or attempt to move the cold code into its own
> function, which shouldn't be inlined (there are about 2000 locations GDB
> thinks correspond to EQ calls in my current Emacs, so that's a lot of
> duplicated code).
>
> I was going to suggest a patch to change that...
>
> Of course it's entirely possible that EQ just doesn't matter for
> performance.

The proof is in the pudding, I guess.

> My entire post-MPS proposal is to have bignums, floats and
> symbols-with-position as the "exotic" tags that (may) need special handling in
> EQ. That leaves four tags for fixnums, strings, vectorlikes, symbols,
> and cons cells, which doesn't work out.
>
> I _think_ the least painful option is to give strings the "treat
> specially in EQ" bit, since comparing strings with EQ, while legal, is
> rare.
>
> (and, yes, this approach would use Lisp_Type_Unused0 and reduce fixnum
> range by one bit).

Oops :-).





* Re: pdumper on Solaris 10
  2024-12-10 18:03                                 ` Gerd Möllmann
  2024-12-10 19:34                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 17:43                                     ` Eli Zaretskii
  2024-12-14 14:30                                     ` Eli Zaretskii
  1 sibling, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-11 14:13 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, luangruo, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> Eli Zaretskii <eliz@gnu.org> writes:
>>> That's not what Gerd says, AFAIU.  But if you are right, then how
>>> about making the WIDE_EMACS_INT configuration on the igc branch use
>>> USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
>>> a build, if that would help.
>
> Thanks for the offer.  I definitely think we should move away from
> USE_LSB_TAG=0 as much as possible, and if the only place where such a
> change would not be vetoed is scratch/igc + WIDE_EMACS_INT, we can at
> least fix it there.  If any issues arise, of course, it will be more
> difficult to ascertain whether they were caused by the USE_LSB_TAG
> change or the IGC changes themselves.
>
> So I'll push that change in a bit, unless someone objects.

Just pushed it to the scratch/igc branch. It shouldn't have any effect
on ordinary 64-bit builds; some of the code is to cater to the
hypothetical big-endian 32-bit use case, and technically the x86 MPS
weak pointer "optimization" could bite us again, but recent GCC does not
generate the precise instructions that MPS emulates, so I'll risk it.

As I've just explained, bug reports for the WIDE_EMACS_INT case will be
difficult to deal with, as there are two major changes; let's see what
happens, but I suspect we'll end up having to ask users to build a !MPS
+ USE_LSB_TAG + WIDE_EMACS_INT configuration.

Pip





* Re: pdumper on Solaris 10
  2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-11 17:43                                     ` Eli Zaretskii
  2024-12-14 14:30                                     ` Eli Zaretskii
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-11 17:43 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, stefankangas, luangruo, ali_gnu2, emacs-devel

> Date: Wed, 11 Dec 2024 14:13:11 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, stefankangas@gmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> Pip Cet <pipcet@protonmail.com> writes:
> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >>> That's not what Gerd says, AFAIU.  But if you are right, then how
> >>> about making the WIDE_EMACS_INT configuration on the igc branch use
> >>> USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
> >>> a build, if that would help.
> >
> > Thanks for the offer.  I definitely think we should move away from
> > USE_LSB_TAG=0 as much as possible, and if the only place where such a
> > change would not be vetoed is scratch/igc + WIDE_EMACS_INT, we can at
> > least fix it there.  If any issues arise, of course, it will be more
> > difficult to ascertain whether they were caused by the USE_LSB_TAG
> > change or the IGC changes themselves.
> >
> > So I'll push that change in a bit, unless someone objects.
> 
> Just pushed it to the scratch/igc branch. It shouldn't have any effect
> on ordinary 64-bit builds; some of the code is to cater to the
> hypothetical big-endian 32-bit use case, and technically the x86 MPS
> weak pointer "optimization" could bite us again, but recent GCC does not
> generate the precise instructions that MPS emulates, so I'll risk it.

The "normal" (i.e. without WIDE_EMACS_INT) 32-bit MS-Windows build is
now broken:

    CCLD     temacs.exe
    GEN      ../etc/DOC
  /bin/mkdir -p ../etc
  make -C ../lisp update-subdirs
  make[3]: Entering directory `/d/gnu/git/emacs/feature/lisp'
  make[3]: Leaving directory `/d/gnu/git/emacs/feature/lisp'
  cp -f temacs.exe bootstrap-emacs.exe
  rm -f bootstrap-emacs.pdmp
  ./temacs --batch  -l loadup --temacs=pbootstrap \
	  --bin-dest '/d/usr/bin/' --eln-dest '/d/usr/lib/emacs/31.0.50/'
  Loading loadup.el (source)...
  Dump mode: pbootstrap
  Using load-path (d:/gnu/git/emacs/feature/lisp d:/gnu/git/emacs/feature/lisp/emacs-lisp d:/gnu/git/emacs/feature/lisp/progmodes d:/gnu/git/emacs/feature/lisp/language d:/gnu/git/emacs/feature/lisp/international d:/gnu/git/emacs/feature/lisp/textmodes d:/gnu/git/emacs/feature/lisp/vc)
  Loading emacs-lisp/debug-early...
  Loading emacs-lisp/byte-run...
  Loading emacs-lisp/backquote...
  Loading subr...

  lisp.h:1273: Emacs fatal error: assertion failed: !FIXNUM_OVERFLOW_P (n)

  Backtrace:
  0124c3b2
  Makefile:1018: recipe for target `bootstrap-emacs.pdmp' failed
  make[2]: *** [bootstrap-emacs.pdmp] Error 3

Below I show the backtrace and some data from GDB.

Does such a build work on GNU/Linux?

Let me know if I can provide more data for debugging this.

  Thread 1 hit Breakpoint 1, terminate_due_to_signal (sig=sig@entry=22,
      backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:432
  432     {
  (gdb) bt
  #0  terminate_due_to_signal (sig=sig@entry=22,
      backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:432
  #1  0x00b1b53a in die (
      msg=msg@entry=0x10e4889 <i_fwd+849> "!FIXNUM_OVERFLOW_P (n)",
      file=file@entry=0x10e478d <i_fwd+597> "lisp.h", line=line@entry=1273)
      at alloc.c:8377
  #2  0x00bd7830 in make_fixnum (n=<optimized out>) at lisp.h:1273
  #3  0x00bdaf04 in make_fixnum (n=<optimized out>) at lisp.h:1273
  #4  weak_hash_table_entry (entry=...) at igc.c:4111
  #5  0x00b514e9 in WEAK_HASH_INDEX (h=<optimized out>, idx=<optimized out>)
      at fns.c:5487
  #6  0x00b52c4b in weak_hash_lookup_with_hash (h=h@entry=0xb0542b8,
      key=key@entry=XIL(0xb059b43), hash=hash@entry=make_fixnum(14948))
      at fns.c:5722
  #7  0x00b60f85 in Fputhash (key=XIL(0xb059b43), value=XIL(0xb059cfb),
      table=XIL(0xb0542bd)) at fns.c:6555
  #8  0x00b9bc8a in exec_byte_code (fun=<optimized out>, args_template=771,
      args_template@entry=0, nargs=<optimized out>, nargs@entry=0,
      args=<optimized out>, args@entry=0x0) at lisp.h:791
  #9  0x00b9e082 in Fbyte_code (bytestr=<optimized out>, vector=XIL(0xb059e9d),
      maxdepth=make_fixnum(4)) at bytecode.c:325
  #10 0x00b4be63 in eval_sub (form=form@entry=XIL(0xb059c6b)) at eval.c:2610
  #11 0x00b87f70 in readevalloop (readcharfun=readcharfun@entry=XIL(0x60a0),
      infile0=infile0@entry=0x767f048,
      sourcename=sourcename@entry=XIL(0xb059a34),
      printflag=printflag@entry=false, unibyte=unibyte@entry=XIL(0),
      readfun=readfun@entry=XIL(0), start=start@entry=XIL(0),
      end=<optimized out>, end@entry=XIL(0)) at lread.c:2540
  #12 0x00b889e5 in Fload (file=XIL(0xb0598fc), noerror=XIL(0),
      nomessage=XIL(0), nosuffix=XIL(0), must_suffix=<optimized out>)
      at lisp.h:1226
  #13 0x00b4be1a in eval_sub (form=form@entry=XIL(0xb0598eb)) at eval.c:2618
  #14 0x00b87f70 in readevalloop (readcharfun=readcharfun@entry=XIL(0x60a0),
      infile0=infile0@entry=0x767f638,
      sourcename=sourcename@entry=XIL(0xb04a314),
      printflag=printflag@entry=false, unibyte=unibyte@entry=XIL(0),
      readfun=readfun@entry=XIL(0), start=start@entry=XIL(0),
      end=<optimized out>, end@entry=XIL(0)) at lread.c:2540
  #15 0x00b889e5 in Fload (file=XIL(0xb049f84), noerror=XIL(0),
      nomessage=XIL(0), nosuffix=XIL(0), must_suffix=<optimized out>)
      at lisp.h:1226
  #16 0x00b4be1a in eval_sub (form=form@entry=XIL(0xb049fab)) at eval.c:2618
  #17 0x00b4dd99 in Feval (form=XIL(0xb049fab), lexical=lexical@entry=XIL(0x20))
      at eval.c:2463
  #18 0x00aa69a1 in top_level_2 () at lisp.h:1226
  #19 0x00b4613b in internal_condition_case (
      bfun=bfun@entry=0xaa6943 <top_level_2>, handlers=handlers@entry=XIL(0x60),
      hfun=hfun@entry=0xab0496 <cmd_error>) at eval.c:1618
  #20 0x00aa70b0 in top_level_1 (ignore=XIL(0)) at lisp.h:1226
  #21 0x00b46055 in internal_catch (tag=tag@entry=XIL(0xc540),
      func=func@entry=0xaa7087 <top_level_1>, arg=arg@entry=XIL(0))
      at eval.c:1297
  #22 0x00aa675f in command_loop () at lisp.h:1226
  #23 0x00ab0054 in recursive_edit_1 () at keyboard.c:760
  #24 0x00ab0342 in Frecursive_edit () at keyboard.c:843
  #25 0x00cf4375 in main (argc=<optimized out>, argv=<optimized out>)
      at emacs.c:2646
  (gdb) up
  #1  0x00b1b53a in die (
      msg=msg@entry=0x10e4889 <i_fwd+849> "!FIXNUM_OVERFLOW_P (n)",
      file=file@entry=0x10e478d <i_fwd+597> "lisp.h", line=line@entry=1273)
      at alloc.c:8377
  8377      terminate_due_to_signal (SIGABRT, INT_MAX);
  (gdb)
  #2  0x00bd7830 in make_fixnum (n=<optimized out>) at lisp.h:1273
  1273      eassert (!FIXNUM_OVERFLOW_P (n));
  (gdb)
  #3  0x00bdaf04 in make_fixnum (n=<optimized out>) at lisp.h:1273
  1273      eassert (!FIXNUM_OVERFLOW_P (n));
  (gdb)
  #4  weak_hash_table_entry (entry=...) at igc.c:4111
  4111          return make_fixnum (entry.intptr >> 1);
  (gdb) p entry
  $1 = {
    intptr = 4294967295,
    fixnum = make_fixnum(6)
  }
  (gdb) up
  #5  0x00b514e9 in WEAK_HASH_INDEX (h=<optimized out>, idx=<optimized out>)
      at fns.c:5487
  5487      return XFIXNUM (weak_hash_table_entry (h->strong->index[idx]));
  (gdb) up
  #6  0x00b52c4b in weak_hash_lookup_with_hash (h=h@entry=0xa8542b8,
      key=key@entry=XIL(0xa859b43), hash=hash@entry=make_fixnum(14948))
      at fns.c:5722
  5722      for (ptrdiff_t i = WEAK_HASH_INDEX (h, start_of_bucket);
  (gdb) up
  #7  0x00b60f85 in Fputhash (key=XIL(0xa859b43), value=XIL(0xa859cfb),
      table=XIL(0xa8542bd)) at fns.c:6555
  6555          ptrdiff_t i = weak_hash_lookup_with_hash (wh, key, hash);
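
For reference, the arithmetic behind the failed assertion can be
reproduced in isolation.  The sketch below assumes that a 32-bit build
without WIDE_EMACS_INT spends two bits on the fixnum tag, and that
entry.intptr is an unsigned 32-bit quantity, as the GDB output
suggests.  All names ending in _SKETCH or _sketch are invented for the
example; this says nothing about why the entry holds that value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for a 32-bit build without WIDE_EMACS_INT: two of the 32
   bits are spent on the fixnum tag.  */
enum { INTTYPEBITS_SKETCH = 2 };

#define MOST_POSITIVE_FIXNUM_SKETCH \
  ((int64_t) INT32_MAX >> INTTYPEBITS_SKETCH)
#define MOST_NEGATIVE_FIXNUM_SKETCH (-1 - MOST_POSITIVE_FIXNUM_SKETCH)

/* True if N does not fit in the assumed 32-bit fixnum range.  */
static bool
fixnum_overflow_p_sketch (int64_t n)
{
  return (n < MOST_NEGATIVE_FIXNUM_SKETCH
          || n > MOST_POSITIVE_FIXNUM_SKETCH);
}

/* Mirrors the shape of `entry.intptr >> 1' for an unsigned intptr.  */
static int64_t
shifted_entry_sketch (uint32_t intptr)
{
  return (int64_t) (intptr >> 1);
}
```

Under these assumptions the largest fixnum is 536870911, while
4294967295 >> 1 is 2147483647, so the value passed to make_fixnum is
out of range and the eassert fires.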




* Re: pdumper on Solaris 10
  2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
  2024-12-11 17:43                                     ` Eli Zaretskii
@ 2024-12-14 14:30                                     ` Eli Zaretskii
  2024-12-15 10:55                                       ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-14 14:30 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, emacs-devel

> Date: Wed, 11 Dec 2024 14:13:11 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, stefankangas@gmail.com, luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> Pip Cet <pipcet@protonmail.com> writes:
> > Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >>> That's not what Gerd says, AFAIU.  But if you are right, then how
> >>> about making the WIDE_EMACS_INT configuration on the igc branch use
> >>> USE_LSB_TAG in the HAVE_MPS code branch?  I can volunteer to test such
> >>> a build, if that would help.
> >
> > Thanks for the offer.  I definitely think we should move away from
> > USE_LSB_TAG=0 as much as possible, and if the only place where such a
> > change would not be vetoed is scratch/igc + WIDE_EMACS_INT, we can at
> > least fix it there.  If any issues arise, of course, it will be more
> > difficult to ascertain whether they were caused by the USE_LSB_TAG
> > change or the IGC changes themselves.
> >
> > So I'll push that change in a bit, unless someone objects.
> 
> Just pushed it to the scratch/igc branch. It shouldn't have any effect
> on ordinary 64-bit builds; some of the code is to cater to the
> hypothetical big-endian 32-bit use case, and technically the x86 MPS
> weak pointer "optimization" could bite us again, but recent GCC does not
> generate the precise instructions that MPS emulates, so I'll risk it.
> 
> As I've just explained, bug reports for the WIDE_EMACS_INT case will be
> difficult to deal with, as there are two major changes; let's see what
> happens, but I suspect we'll end up having to ask users to build a !MPS
> + USE_LSB_TAG + WIDE_EMACS_INT configuration.

Thanks, I've now built the igc branch --with-wide-int, and it compiled
cleanly, with the exception of these 2 warnings:

    CC       igc.o
  igc.c: In function 'weak_hash_table_entry':
  igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   4102 |       client = (mps_addr_t)entry.intptr;
	|                ^
  igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   4107 |       client = (mps_addr_t)real_ptr;
	|                ^

The warnings are real, because mps_addr_t is a 'void *', so a 32-bit
data type, whereas entry.intptr is EMACS_UINT, so an unsigned 64-bit
type.

I've started the produced Emacs and scrolled through a long file,
which seemed to work.  However, profiling doesn't work, whereas it did
in the "normal" 32-bit build.  (Note that SIGPROF is emulated on
Windows, so maybe that emulation somehow causes this problem when wide
ints are used with MPS.)

Thanks.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-14 14:30                                     ` Eli Zaretskii
@ 2024-12-15 10:55                                       ` Pip Cet via Emacs development discussions.
  2024-12-15 11:13                                         ` Eli Zaretskii
  0 siblings, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-15 10:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Just pushed it to the scratch/igc branch. It shouldn't have any effect
>> on ordinary 64-bit builds; some of the code is to cater to the
>> hypothetical big-endian 32-bit use case, and technically the x86 MPS
>> weak pointer "optimization" could bite us again, but recent GCC does not
>> generate the precise instructions that MPS emulates, so I'll risk it.
>>
>> As I've just explained, bug reports for the WIDE_EMACS_INT case will be
>> difficult to deal with, as there are two major changes; let's see what
>> happens, but I suspect we'll end up having to ask users to build a !MPS
>> + USE_LSB_TAG + WIDE_EMACS_INT configuration.
>
> Thanks, I've now built the igc branch --with-wide-int, and it compiled
> cleanly, with the exception of these 2 warnings:
>
>     CC       igc.o
>   igc.c: In function 'weak_hash_table_entry':
>   igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>    4102 |       client = (mps_addr_t)entry.intptr;
> 	|                ^
>   igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>    4107 |       client = (mps_addr_t)real_ptr;
> 	|                ^
>
> The warnings are real, because mps_addr_t is a 'void *', so a 32-bit
> data type, whereas entry.intptr is EMACS_UINT, so an unsigned 64-bit
> type.

Oh, sorry for causing those.

The intended behavior is to truncate the integer and use the 32 LSB
bits, which is safe on the machines MPS is ported to, and is expressed
using a cast to mps_addr_t.  So the code behaves correctly, but is
incorrect because it causes a compiler warning.

What's the preferred way of avoiding a compiler warning in this case?
A simple double cast (first to uintptr_t, then to mps_addr_t) should
work, right?

> I've started the produced Emacs and scrolled through a long file,
> which seemed to work.

Thank you for testing!

> However, profiling doesn't work, whereas it did
> in the "normal" 32-bit build.  (Note that SIGPROF is emulated on
> Windows, so maybe that emulation somehow causes this problem when wide
> ints are used with MPS.)

Thanks for letting me know! That certainly sounds like a regression we
should fix.  What kind of problem are we talking about?

If it's not reproducible on GNU/Linux (at first glance, the profiler
works fine in an i686 --with-wide-int configuration, but I don't know
what kind of problem you are experiencing), the next problem is that the
mingw32 (the OSDN kind, not the msys2 kind) toolchain is currently
unavailable (https://dl.osdn.net isn't responding at all right now,
rather than responding with an expired SSL certificate as it sometimes
does).

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-15 10:55                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-15 11:13                                         ` Eli Zaretskii
  2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
  2024-12-17 19:10                                           ` Paul Eggert
  0 siblings, 2 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-15 11:13 UTC (permalink / raw)
  To: Pip Cet, Paul Eggert; +Cc: gerd.moellmann, emacs-devel

> Date: Sun, 15 Dec 2024 10:55:49 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >     CC       igc.o
> >   igc.c: In function 'weak_hash_table_entry':
> >   igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> >    4102 |       client = (mps_addr_t)entry.intptr;
> > 	|                ^
> >   igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> >    4107 |       client = (mps_addr_t)real_ptr;
> > 	|                ^
> >
> > The warnings are real, because mps_addr_t is a 'void *', so a 32-bit
> > data type, whereas entry.intptr is EMACS_UINT, so an unsigned 64-bit
> > type.
> 
> Oh, sorry for causing those.
> 
> The intended behavior is to truncate the integer and use the 32 LSB
> bits, which is safe on the machines MPS is ported to, and is expressed
> using a cast to mps_addr_t.  So the code behaves correctly, but is
> incorrect because it causes a compiler warning.

What about the (hypothetical) case of big-endian systems?

> What's the preferred way of avoiding a compiler warning in this case?
> A simple double cast (first to uintptr_t, then to mps_addr_t) should
> work, right?

I'll defer to Paul (CC'ed), but my personal preference is also to
explicitly reset the ignored bits by bitwise AND.

> > However, profiling doesn't work, whereas it did
> > in the "normal" 32-bit build.  (Note that SIGPROF is emulated on
> > Windows, so maybe that emulation somehow causes this problem when wide
> > ints are used with MPS.)
> 
> Thanks for letting me know! That certainly sounds like a regression we
> should fix.  What kind of problem are we talking about?

The profile is (was) empty.

However, I repeated the test now, and I see that the profile does
work.  I guess yesterday wins my personal record of producing
irreproducible results: this one and the one with
completion-at-point-functions in another discussion.

Sorry for the noise.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-15 11:13                                         ` Eli Zaretskii
@ 2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
  2024-12-15 12:52                                             ` Eli Zaretskii
  2024-12-15 19:54                                             ` John ff
  2024-12-17 19:10                                           ` Paul Eggert
  1 sibling, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-15 12:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, gerd.moellmann, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sun, 15 Dec 2024 10:55:49 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >     CC       igc.o
>> >   igc.c: In function 'weak_hash_table_entry':
>> >   igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>> >    4102 |       client = (mps_addr_t)entry.intptr;
>> > 	|                ^
>> >   igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>> >    4107 |       client = (mps_addr_t)real_ptr;
>> > 	|                ^
>> >
>> > The warnings are real, because mps_addr_t is a 'void *', so a 32-bit
>> > data type, whereas entry.intptr is EMACS_UINT, so an unsigned 64-bit
>> > type.
>>
>> Oh, sorry for causing those.
>>
>> The intended behavior is to truncate the integer and use the 32 LSB
>> bits, which is safe on the machines MPS is ported to, and is expressed
>> using a cast to mps_addr_t.  So the code behaves correctly, but is
>> incorrect because it causes a compiler warning.
>
> What about the (hypothetical) case of big-endian systems?

Not a problem here, but there is some good news concerning the
"hypothetical" part: I'm testing 32-bit builds on a sparc system, so
it's no longer hypothetical (thanks again to the cfarm people for giving
me an account).
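
As to why byte order does not matter for the cast: an integer
conversion in C operates on the value, not on its bytes in memory.  A
minimal sketch, using uint32_t as a stand-in for a 32-bit
pointer-sized integer (the names are hypothetical and none of this is
taken from igc.c):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: keeps the low 32 bits on any byte order.  */
uint32_t
sketch_low32_by_conversion (uint64_t wide)
{
  return (uint32_t) wide;
}

/* Byte copy: takes the first four bytes in memory, so the result
   depends on the host's byte order.  */
uint32_t
sketch_first4_by_memcpy (uint64_t wide)
{
  uint32_t out;
  memcpy (&out, &wide, sizeof out);
  return out;
}

/* Returns nonzero on a little-endian host.  */
int
sketch_host_is_little_endian (void)
{
  uint16_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 1;
}

void
sketch_demo_endian (void)
{
  uint64_t wide = UINT64_C (0x1122334455667788);
  assert (sketch_low32_by_conversion (wide) == UINT32_C (0x55667788));
  assert (sketch_first4_by_memcpy (wide)
          == (sketch_host_is_little_endian ()
              ? UINT32_C (0x55667788) : UINT32_C (0x11223344)));
}
```

Only a byte-wise copy, such as the memcpy variant, would pick up the
wrong half on a big-endian machine.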

The bad news is that while MPS works when Emacs is run normally in a
non-wide-int build, running Emacs in GDB does not work: the siginfo_t
information passed to the SIGSEGV handler isn't preserved.  Currently,
the --with-wide-int build infloops rather than crashing or working, but
I can't attach a debugger, so I'll have to learn how to trigger (and
find) a core dump on this system.

It's possible this problem is an unavoidable vicious cycle: since
small integers are easily confused with pointers on this system, more
objects are "ambiguously" recognized as reachable even though they
aren't actually reachable; that causes more objects to be retained,
which causes more integer values to be treated as potential pointers,
resulting in even more retained objects, until finally we run out of
virtual memory.  At least that would explain the 2 GB core file...
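
A toy model of the ambiguity described above.  This is only a sketch
of conservative (ambiguous) scanning in general, not of how MPS
actually classifies words; the heap bounds, the sample values and the
sketch_* names are made up for illustration, and it assumes the heap
sits at low addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A word is an ambiguous reference if it could be an address inside
   the heap range [heap_base, heap_limit).  */
int
sketch_looks_like_pointer (uintptr_t word, uintptr_t heap_base,
                           uintptr_t heap_limit)
{
  return word >= heap_base && word < heap_limit;
}

/* Counts the words that a conservative scan would have to treat as
   potential references.  */
size_t
sketch_count_ambiguous (const uintptr_t *words, size_t n,
                        uintptr_t heap_base, uintptr_t heap_limit)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    hits += sketch_looks_like_pointer (words[i], heap_base, heap_limit);
  return hits;
}

/* Plain integers that happen to be lying around in scanned memory.  */
const uintptr_t sketch_sample[] = { 42, 0x00020000, 0x00030008, 0x7fff0000 };

void
sketch_demo_scan (void)
{
  /* Heap far away from the integer values: nothing is retained.  */
  assert (sketch_count_ambiguous (sketch_sample, 4,
                                  0x40000000u, 0x50000000u) == 0);
  /* Heap at low addresses: two plain integers now pin objects.  */
  assert (sketch_count_ambiguous (sketch_sample, 4,
                                  0x00010000u, 0x00040000u) == 2);
}
```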

>> What's the preferred way of avoiding a compiler warning in this case?
>> A simple double cast (first to uintptr_t, then to mps_addr_t) should
>> work, right?
>
> I'll defer to Paul (CC'ed), but my personal preference is also to
> explicitly reset the ignored bits by bitwise AND.

Either way sounds good to me, and I expect both ways will result in
future compiler warnings (hopefully, these future compilers will also
have a better way of indicating that a cast from a 64-bit integer to a
32-bit pointer is intended here).

>> > However, profiling doesn't work, whereas it did
>> > in the "normal" 32-bit build.  (Note that SIGPROF is emulated on
>> > Windows, so maybe that emulation somehow causes this problem when wide
>> > ints are used with MPS.)
>>
>> Thanks for letting me know! That certainly sounds like a regression we
>> should fix.  What kind of problem are we talking about?

> However, I repeated the test now, and I see that the profile does
> work.  I guess yesterday wins my personal record of producing
> irreproducible results: this one and the one with
> completion-at-point-functions in another discussion.
>
> Sorry for the noise.

No problem at all, and thank you!

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-15 12:52                                             ` Eli Zaretskii
  2024-12-15 19:54                                             ` John ff
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-15 12:52 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, gerd.moellmann, emacs-devel

> Date: Sun, 15 Dec 2024 12:09:33 +0000
> Cc: Paul Eggert <eggert@cs.ucla.edu>, gerd.moellmann@gmail.com,
>  emacs-devel@gnu.org
> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> 
> The bad news is that while MPS works when Emacs is run normally in a
> non-wide-int build, running Emacs in GDB does not work: the siginfo_t
> information passed to the SIGSEGV handler isn't preserved.

Worth reporting to GDB folks, I presume?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
  2024-12-15 12:52                                             ` Eli Zaretskii
@ 2024-12-15 19:54                                             ` John ff
  1 sibling, 0 replies; 137+ messages in thread
From: John ff @ 2024-12-15 19:54 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, Paul Eggert, gerd.moellmann, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3775 bytes --]




On 15 Dec 2024, 12:23, at 12:23, "Pip Cet via Emacs development discussions." <emacs-devel@gnu.org> wrote:
>"Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> Date: Sun, 15 Dec 2024 10:55:49 +0000
>>> From: Pip Cet <pipcet@protonmail.com>
>>> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org
>>>
>>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>>
>>> >     CC       igc.o
>>> >   igc.c: In function 'weak_hash_table_entry':
>>> >   igc.c:4102:16: warning: cast to pointer from integer of
>different size [-Wint-to-pointer-cast]
>>> >    4102 |       client = (mps_addr_t)entry.intptr;
>>> > 	|                ^
>>> >   igc.c:4107:16: warning: cast to pointer from integer of
>different size [-Wint-to-pointer-cast]
>>> >    4107 |       client = (mps_addr_t)real_ptr;
>>> > 	|                ^
>>> >
>>> > The warnings are real, because mps_addr_t is a 'void *', so a
>32-bit
>>> > data type, whereas entry.intptr is EMACS_UINT, so an unsigned
>64-bit
>>> > type.
>>>
>>> Oh, sorry for causing those.
>>>
>>> The intended behavior is to truncate the integer and use the 32 LSB
>>> bits, which is safe on the machines MPS is ported to, and is
>expressed
>>> using a cast to mps_addr_t.  So the code behaves correctly, but is
>>> incorrect because it causes a compiler warning.
>>
>> What about the (hypothetical) case of big-endian systems?
>
>Not a problem here, but there is some good news concerning the
>"hypothetical" part: I'm testing 32-bit builds on a sparc system, so
>it's no longer hypothetical (thanks again to the cfarm people for
>giving
>me an account).
>
>The bad news is that while MPS works when Emacs is run normally in a
>non-wide-int build, running Emacs in GDB does not work: the siginfo_t
>information passed to the SIGSEGV handler isn't preserved.  Currently,
>the --with-wide-int build infloops rather than crashing or working, but
>I can't attach a debugger, so I'll have to learn how to trigger (and
>find) a core dump on this system.
>
>It's possible this problem is an unavoidable vicious cycle: since
>small integers are easily confused with pointers on this system, more
>objects are "ambiguously" recognized as reachable even though they
>aren't actually reachable; that causes more objects to be retained,
>which causes more integer values to be treated as potential pointers,
>resulting in even more retained objects, until finally we run out of
>virtual memory.  At least that would explain the 2 GB core file...
>
>>> What's the preferred way of avoiding a compiler warning in this
>case?
>>> A simple double cast (first to uintptr_t, then to mps_addr_t) should
>>> work, right?
>>
>> I'll defer to Paul (CC'ed), but my personal preference is also to
>> explicitly reset the ignored bits by bitwise AND.
>
>Either way sounds good to me, and I expect both ways will result in
>future compiler warnings (hopefully, these future compilers will also
>have a better way of indicating that a cast from a 64-bit integer to a
>32-bit pointer is intended here).
>
>>> > However, profiling doesn't work, whereas it did
>>> > in the "normal" 32-bit build.  (Note that SIGPROF is emulated on
>>> > Windows, so maybe that emulation somehow causes this problem when
>wide
>>> > ints are used with MPS.)
>>>
>>> Thanks for letting me know! That certainly sounds like a regression
>we
>>> should fix.  What kind of problem are we talking about?
>
>> However, I repeated the test now, and I see that the profile does
>> work.  I guess yesterday wins my personal record of producing
>> irreproducible results: this one and the one with
>> completion-at-point-functions in another discussion.
>>
>> Sorry for the noise.
>
>No problem at all, and thank you!
>
>Pip

[-- Attachment #2: Type: text/html, Size: 5258 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-15 11:13                                         ` Eli Zaretskii
  2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-17 19:10                                           ` Paul Eggert
  2024-12-17 19:43                                             ` Pip Cet via Emacs development discussions.
  2024-12-17 20:19                                             ` Eli Zaretskii
  1 sibling, 2 replies; 137+ messages in thread
From: Paul Eggert @ 2024-12-17 19:10 UTC (permalink / raw)
  To: Eli Zaretskii, Pip Cet; +Cc: gerd.moellmann, emacs-devel

On 2024-12-15 03:13, Eli Zaretskii wrote:
>> Date: Sun, 15 Dec 2024 10:55:49 +0000
>> From: Pip Cet<pipcet@protonmail.com>
>>
>> "Eli Zaretskii"<eliz@gnu.org> writes:
>>
>>>      CC       igc.o
>>>    igc.c: In function 'weak_hash_table_entry':
>>>    igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>>>     4102 |       client = (mps_addr_t)entry.intptr;
>>> 	|                ^
>>>    igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>>>     4107 |       client = (mps_addr_t)real_ptr;
>>> 	|                ^
...
>> What's the preferred way of avoiding a compiler warning in this case?
>> A simple double cast (first to uintptr_t, then to mps_addr_t) should
>> work, right?
> I'll defer to Paul (CC'ed), but my personal preference is also to
> explicitly reset the ignored bits by bitwise AND.

The usual way I avoid such warnings is a single cast of the pointer to 
uintptr_t (or to intptr_t, if the eventual destination is signed).

There is no need for two casts, or for a bitwise AND, and I usually 
avoid these needless operations as they can be more trouble than they're 
worth: they introduce more hassle for maintainers and more possibilities 
for bugs.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 19:10                                           ` Paul Eggert
@ 2024-12-17 19:43                                             ` Pip Cet via Emacs development discussions.
  2024-12-17 20:00                                               ` Paul Eggert
  2024-12-17 20:19                                             ` Eli Zaretskii
  1 sibling, 1 reply; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-17 19:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

"Paul Eggert" <eggert@cs.ucla.edu> writes:

> On 2024-12-15 03:13, Eli Zaretskii wrote:
>>> Date: Sun, 15 Dec 2024 10:55:49 +0000
>>> From: Pip Cet<pipcet@protonmail.com>
>>>
>>> "Eli Zaretskii"<eliz@gnu.org> writes:
>>>
>>>>      CC       igc.o
>>>>    igc.c: In function 'weak_hash_table_entry':
>>>>    igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>>>>     4102 |       client = (mps_addr_t)entry.intptr;
>>>> 	|                ^
>>>>    igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>>>>     4107 |       client = (mps_addr_t)real_ptr;
>>>> 	|                ^
> ...
>>> What's the preferred way of avoiding a compiler warning in this case?
>>> A simple double cast (first to uintptr_t, then to mps_addr_t) should
>>> work, right?
>> I'll defer to Paul (CC'ed), but my personal preference is also to
>> explicitly reset the ignored bits by bitwise AND.
>
> The usual way I avoid such warnings is a single cast of the pointer to
> uintptr_t (or to intptr_t, if the eventual destination is signed).

> There is no need for two casts, or for a bitwise AND, and I usually

In this case, the eventual destination is a pointer, so I don't see how
a single cast would work.  Changing mps_addr_t to be an integer seems a
bit drastic.

(The change I'm currently using does use an intermediate uintptr_t
variable, but it's still two casts).

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 19:43                                             ` Pip Cet via Emacs development discussions.
@ 2024-12-17 20:00                                               ` Paul Eggert
  0 siblings, 0 replies; 137+ messages in thread
From: Paul Eggert @ 2024-12-17 20:00 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On 2024-12-17 11:43, Pip Cet wrote:
> In this case, the eventual destination is a pointer, so I don't see how
> a single cast would work.

Sorry, didn't know that.

> (The change I'm currently using does use an intermediate uintptr_t
> variable, but it's still two casts).

If the source is a wide integer and there's an intermediate uintptr_t 
variable (which sounds like a good idea), then you should need only one 
cast: from uintptr_t to void * (or whatever pointer type you prefer).

I try to avoid casts in C whenever possible, since they're too powerful. 
So I'd prefer a one-cast to a two-cast approach.
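
A minimal sketch of that one-cast shape, assuming the source is a
64-bit unsigned integer; sketch_addr_t is a hypothetical stand-in for
mps_addr_t, and none of this is the actual change on the branch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mps_addr_t, which is 'void *'.  */
typedef void *sketch_addr_t;

sketch_addr_t
sketch_addr_from_wide (uint64_t wide)
{
  /* Implicit integer conversion: keeps the low bits when uintptr_t is
     narrower than 64 bits, and needs no cast.  */
  uintptr_t narrow = wide;
  /* The only cast: integer of pointer size to pointer.  */
  return (sketch_addr_t) narrow;
}

/* Inverse direction, used here only to build a test value.  */
uint64_t
sketch_wide_from_addr (const void *p)
{
  return (uintptr_t) p;
}

int sketch_object;

void
sketch_demo_cast (void)
{
  uint64_t wide = sketch_wide_from_addr (&sketch_object);
  assert (sketch_addr_from_wide (wide) == (void *) &sketch_object);
}
```

The implicit narrowing in the assignment is the kind of conversion
that only -Wconversion reports, so it stays quiet in a default build.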

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 19:10                                           ` Paul Eggert
  2024-12-17 19:43                                             ` Pip Cet via Emacs development discussions.
@ 2024-12-17 20:19                                             ` Eli Zaretskii
  2024-12-17 21:14                                               ` Paul Eggert
  1 sibling, 1 reply; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-17 20:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pipcet, gerd.moellmann, emacs-devel

> Date: Tue, 17 Dec 2024 11:10:45 -0800
> Cc: gerd.moellmann@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> >>>      CC       igc.o
> >>>    igc.c: In function 'weak_hash_table_entry':
> >>>    igc.c:4102:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> >>>     4102 |       client = (mps_addr_t)entry.intptr;
> >>> 	|                ^
> >>>    igc.c:4107:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> >>>     4107 |       client = (mps_addr_t)real_ptr;
> >>> 	|                ^
> ...
> >> What's the preferred way of avoiding a compiler warning in this case?
> >> A simple double cast (first to uintptr_t, then to mps_addr_t) should
> >> work, right?
> > I'll defer to Paul (CC'ed), but my personal preference is also to
> > explicitly reset the ignored bits by bitwise AND.
> 
> The usual way I avoid such warnings is a single cast of the pointer to 
> uintptr_t (or to intptr_t, if the eventual destination is signed).

But then why doesn't the cast to mps_addr_t work to avoid the warning?
mps_addr_t is 'void *', a 32-bit pointer.  Is this something specific
to pointers, while casting to a 32-bit integer doesn't trigger the
warnings?  If so, what is the rationale for warning about pointers,
but not about integers?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 20:19                                             ` Eli Zaretskii
@ 2024-12-17 21:14                                               ` Paul Eggert
  0 siblings, 0 replies; 137+ messages in thread
From: Paul Eggert @ 2024-12-17 21:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, emacs-devel

On 2024-12-17 12:19, Eli Zaretskii wrote:
> why doesn't the cast to mps_addr_t work to avoid the warning?
> mps_addr_t is 'void *', a 32-bit pointer.  Is this something specific
> to pointers, while casting to a 32-bit integer doesn't trigger the
> warnings?

Yes, the idea is for GCC to diagnose casts between pointers and 
wrong-sized integers.

> what is the rationale for warning about pointers,
> but not about integers?

Warning about integer conversion is controlled by -Wconversion. However, 
that GCC option generates far too many false positives for useful code, 
so I don't recommend it. I just tried -Wconversion on Emacs master and 
got over 10,000 bogus warnings. Changing Emacs to pacify -Wconversion 
would cause more trouble than it'd cure.
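
To make the difference concrete, here is a small sketch with
hypothetical function names.  GCC accepts the first function silently
by default and should diagnose it only under -Wconversion, whereas a
cast from a 64-bit integer to a 32-bit pointer is diagnosed by
-Wint-to-pointer-cast, which is on by default:

```c
#include <assert.h>
#include <stdint.h>

/* Implicit narrowing: well defined for unsigned types (the value is
   reduced modulo 2^32), quiet by default, flagged by -Wconversion.  */
uint32_t
sketch_narrow_implicit (uint64_t v)
{
  return v;
}

/* Same result; the explicit cast documents the intent and is what
   pacifying -Wconversion would require throughout the code base.  */
uint32_t
sketch_narrow_explicit (uint64_t v)
{
  return (uint32_t) v;
}

void
sketch_demo_narrow (void)
{
  assert (sketch_narrow_implicit (UINT64_C (0x100000001)) == 1);
  assert (sketch_narrow_explicit (UINT64_C (0x100000001)) == 1);
}
```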

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  4:59                     ` Stefan Kangas
  2024-12-09 14:39                       ` Eli Zaretskii
@ 2024-12-09 16:21                       ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-09 16:21 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Eli Zaretskii, luangruo, ali_gnu2, emacs-devel

"Stefan Kangas" <stefankangas@gmail.com> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>>> Date: Sun, 08 Dec 2024 17:37:50 +0000
>>> From: Pip Cet <pipcet@protonmail.com>
>>> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
>>>
>>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>>
>>> >> So let's remove it, and switch WIDE_EMACS_INT builds to USE_LSB_TAG?
>>> >
>>> > That'd be a waste of effort.
>>>
>>> It'd be a good investment of effort today, in exchange for making the GC
>>> code significantly easier to understand and maintain in the future. It
>>> would certainly not be without its benefits, so calling it a "waste of
>>> effort" is unfair.
>>
>> I disagree.  We've lived with this GC code for a long time, and I
>> don't see any complications due to !USE_LSB_TAG.  And if we are going to
>> switch to igc at some point, investment in GC is even less sensible.
>
> Assuming that we are 100% sure that mps will land, then I can agree that

Even if mps does land on master, the old GC will remain in place for a
very long time, so I don't think we should declare the old GC a
do-not-touch zone just yet.

> making any changes here is basically wasted effort.  Unless, of course,
> the change would also simplify the mps work (would it?).

I believe it would, yes.

> On the other hand, IIUC, we have some way to go with making the merging
> of the mps branch a guarantee.  While I'm an enthusiastic supporter of
> the great work that's being done on the mps branch, isn't hedging our
> bets prudent until that work is done?
>
> Or am I misunderstanding how close we are to merging the mps branch?

My current understanding is that Eli expressed requirements for how
things like the signalling issue should be fixed. While I have a
solution that appears to work, it doesn't meet these requirements.

>>> If performance and wasted memory aren't issues, then it's a tradeoff
>>> between leaving old code untouched and simplifying it to enable future
>>> development.
>>
>> The existing code doesn't preclude nor interfere with future
>> development.  So yes, leaving working code untouched is the preference
>> here.
>
> Based on my limited mucking around in the GC, it does interfere somewhat
> because you do need to understand both configurations, at least on a
> high level, and once you do you need to mentally filter that stuff out
> when reading the code.  So I think I'd appreciate the simplification, at
> least.

I agree with Stefan here. Also, let's keep in mind that !USE_LSB_TAG in
its original use case is currently broken, and has been for a long time.

> If the only known drawbacks are stability concerns, we could also
> consider an intermediate step along these lines:
>
> Leave the USE_LSB_TAG code as is, but set it to 1 in all configurations
> on master.  See what issues crop up, if any.  If anything does come up,
> ask Pip Cet to fix it (he volunteered, IIUC), and if things are starting
> to look too hairy, revert WIDE_EMACS_INT back to !USE_LSB_TAG.  If
> nothing too bad comes up, we can then consider removing the associated
> code in Emacs 32.

I think that would be a good approach!

I'd just like to add that stability concerns go both ways: it's a good
reason to move the very few remaining users of !USE_LSB_TAG to use the
same code (and experience the same problems) as all other users, rather
than splitting what time we have for GC work between the two code paths.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 18:41                   ` Eli Zaretskii
  2024-12-08 19:15                     ` Gerd Möllmann
  2024-12-09  4:59                     ` Stefan Kangas
@ 2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
  2024-12-17 14:16                       ` Eli Zaretskii
  2024-12-18  0:55                       ` Po Lu
  2 siblings, 2 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-17 13:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

"Eli Zaretskii" <eliz@gnu.org> writes:

>> > Modern x86 CPUs can handle 64-bit values just fine, thank you.
>>
>> Modern x86 CPUs running 32-bit code (x86, not x32) still need two
>> register names for each 64-bit value.  With 8 GPRs, that's a significant
>> problem. So, no, "just fine" isn't accurate here.
>
> I again disagree.  And you forget other registers.

I think this is a perfect example of why discussions with Eli are so
hard. What do you disagree with? Which mysterious "other registers" do
you mean? Why do you think I "forget" about them, with the implication
that I do not understand the x86 architecture?

(Of course I'm not forgetting about other registers: x87 FPU registers
(which mishandle NaN bit patterns), MMX registers (which are unusable
because FPU might be in use), and XMM registers (which do not support
integer comparisons setting flags) are all irrelevant in the context
we're discussing.  I know you know this, which makes your statement even
less appropriate.)
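
To make the register-pair point concrete, here is a minimal sketch (not Emacs code; eq64_via_halves is a made-up name) of roughly what a 64-bit equality test decomposes into when only 32-bit integer operations are available. Each operand is handled as two 32-bit halves, so two 64-bit operands occupy up to four of the eight general-purpose register names if they are all held in registers at once. A real compiler may compare against memory operands instead, so treat this as an illustration rather than a model of generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare two 64-bit values using only
   32-bit comparisons, one per half.  */
static bool
eq64_via_halves (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a;          /* low 32 bits of A */
  uint32_t a_hi = (uint32_t) (a >> 32);  /* high 32 bits of A */
  uint32_t b_lo = (uint32_t) b;          /* low 32 bits of B */
  uint32_t b_hi = (uint32_t) (b >> 32);  /* high 32 bits of B */

  /* Both halves must match for the 64-bit values to be equal.  */
  return a_lo == b_lo && a_hi == b_hi;
}
```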

As for the actual problem, I've continued working on this, and I believe
I've come up with a solution which

1. drops WIDE_EMACS_INT
2. drops !USE_LSB_TAG
3. allows the use of "small bignums" for buffer and string positions
4. makes eq = eql for "small bignums", where appropriate
5. adjusts the hash table code to do the same
6. enables us to introduce further "exotic" objects (non-trivial 'eq')
7. speeds up and simplifies EQ by branching on a tag bit rather than a
global variable
8. possibly (see below) reduces the fixnum range by one bit (invisible to users)

I'll post about it in a separate thread once I've decided on a few minor
issues:

1. what to do about native compilation.  The native compilation code
currently produces incorrect code and obviously it is (or should be) a
priority to fix that before touching the code in other ways.
2. whether to avoid using GMP for "small" bignums.  It would save some
memory. This would require an extra tag which would reduce the fixnum
range by one bit.
3. what to do about most-positive-fixnum and most-negative-fixnum. These
are the only places that "leak" the size of a fixnum to Lisp in the new
code, and it's not clear whether we should return the most positive
fixnum that fits into a Lisp_Object or the most positive "small bignum",
or maybe always return the 62-bit most-positive-fixnum.
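
A rough sketch of the arithmetic behind issue 3, assuming a fixnum is a two's-complement value of a given width in bits. The 62-bit case corresponds to the "62-bit most-positive-fixnum" mentioned above, and the 61-bit case is what losing one bit to an extra tag would give. The helper names are invented for illustration and are not Emacs functions.

```c
#include <stdint.h>

/* Hypothetical helper: largest value of a two's-complement
   integer that is VALUE_BITS wide (2 <= VALUE_BITS <= 63).  */
static int64_t
most_positive (int value_bits)
{
  return (INT64_C (1) << (value_bits - 1)) - 1;
}

/* Hypothetical helper: smallest value for the same width.  */
static int64_t
most_negative (int value_bits)
{
  return -(INT64_C (1) << (value_bits - 1));
}
```

With these definitions, most_positive (62) is 2^61 - 1, most_positive (61) is 2^60 - 1, and most_positive (30) is 2^29 - 1, the last being the range of a 32-bit word that gives up two bits to the tag.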

Pip

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
@ 2024-12-17 14:16                       ` Eli Zaretskii
  2024-12-18  0:55                       ` Po Lu
  1 sibling, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-17 14:16 UTC (permalink / raw)
  To: Pip Cet; +Cc: luangruo, ali_gnu2, emacs-devel

> Date: Tue, 17 Dec 2024 13:12:57 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: luangruo@yahoo.com, ali_gnu2@emvision.com, emacs-devel@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> > Modern x86 CPUs can handle 64-bit values just fine, thank you.
> >>
> >> Modern x86 CPUs running 32-bit code (x86, not x32) still need two
> >> register names for each 64-bit value.  With 8 GPRs, that's a significant
> >> problem. So, no, "just fine" isn't accurate here.
> >
> > I again disagree.  And you forget other registers.
> 
> I think this is a perfect example of why discussions with Eli are so
> hard.

You always have an option not to argue.

> As for the actual problem, I've continued working on this, and I believe
> I've come up with a solution which
> 
> 1. drops WIDE_EMACS_INT
> 2. drops !USE_LSB_TAG
> 3. allows the use of "small bignums" for buffer and string positions
> 4. makes eq = eql for "small bignums", where appropriate
> 5. adjusts the hash table code to do the same
> 6. enables us to introduce further "exotic" objects (non-trivial 'eq')
> 7. speeds up and simplifies EQ by branching on a tag bit rather than a
> global variable
> 8. possibly (see below) reduces the fixnum range by one bit (invisible to users)

I don't like this solution, because it again will cause us a lot of
code churn for very little gain.  How many people use WIDE_EMACS_INT
these days?  I could be the only one.  That code works, and works well
enough.  Certainly well enough for me, and I have yet to hear
complaints from anyone else.  And !USE_LSB_TAG was dropped in the igc
code, so when we merge that, every platform which uses MPS will not
have !USE_LSB_TAG.

So what is left is to remove WIDE_EMACS_INT, which we should do in
some not too distant future, when its last user (yours truly) retires
his 32-bit development environment.  (We should announce this
prominently and enough time in advance, so that in the unlikely case
that there are any other users of that configuration, they will have
ample opportunity to try to convince us to delay or rethink.)

So my suggestion is that you stop wasting your time on the above,
because I'm not interested in considering such significant changes for
so little gain.  What you did on the igc branch regarding
WIDE_EMACS_INT and !USE_LSB_TAG (thanks!) should be enough for us, and
let's turn our attention and resources to more promising directions.

Thanks.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
  2024-12-17 14:16                       ` Eli Zaretskii
@ 2024-12-18  0:55                       ` Po Lu
  2024-12-18  9:24                         ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-12-18  0:55 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> > Modern x86 CPUs can handle 64-bit values just fine, thank you.
>>>
>>> Modern x86 CPUs running 32-bit code (x86, not x32) still need two
>>> register names for each 64-bit value.  With 8 GPRs, that's a significant
>>> problem. So, no, "just fine" isn't accurate here.
>>
>> I again disagree.  And you forget other registers.
>
> I think this is a perfect example of why discussions with Eli are so
> hard. What do you disagree with? Which mysterious "other registers" do
> you mean? Why do you think I "forget" about them, with the implication
> that I do not understand the x86 architecture?

I think it's clear that users of the 32-bit PC architecture are expected
to sacrifice some performance by their choice.  Why cannot the question
whether the tradeoffs are acceptable be reserved to those users?  The
mere existence of the WIDE_EMACS_INT MinGW configuration is evidence
enough that users exist who do not need our arbitrary judgement.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-18  0:55                       ` Po Lu
@ 2024-12-18  9:24                         ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-18  9:24 UTC (permalink / raw)
  To: Po Lu; +Cc: Eli Zaretskii, ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>>>> > Modern x86 CPUs can handle 64-bit values just fine, thank you.
>>>>
>>>> Modern x86 CPUs running 32-bit code (x86, not x32) still need two
>>>> register names for each 64-bit value.  With 8 GPRs, that's a significant
>>>> problem. So, no, "just fine" isn't accurate here.
>>>
>>> I again disagree.  And you forget other registers.
>>
>> I think this is a perfect example of why discussions with Eli are so
>> hard. What do you disagree with? Which mysterious "other registers" do
>> you mean? Why do you think I "forget" about them, with the implication
>> that I do not understand the x86 architecture?
>
> I think it's clear that users of the 32-bit PC architecture are expected
> to sacrifice some performance by their choice.  Why cannot the question
> whether the tradeoffs are acceptable be reserved to those users?  The
> mere existence of the WIDE_EMACS_INT MinGW configuration is evidence
> enough that users exist who do not need our arbitrary judgement.

I have no idea what you're responding to here.  Of course users will
continue to be able to use 32-bit PCs, and there is no "arbitrary
judgement" in anything I proposed.  I propose to remove WIDE_EMACS_INT
(if that is what you're talking about?  I honestly don't know) because
it's no longer necessary to make any tradeoffs there, not because I
think that no one should ever have used it.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 16:49               ` Eli Zaretskii
  2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
@ 2024-12-08 18:47                 ` Pip Cet via Emacs development discussions.
  2024-12-09  1:13                 ` Po Lu
  2 siblings, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-08 18:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: luangruo, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:
>>> > In fact, one of my strongest reservations about the igc branch is that
>>> > it will most probably force me to lose WIDE_EMACS_INT.
>>>
>>> I believe that problem is exclusively due to the fact that
>>> WIDE_EMACS_INT implies USE_LSB=0. Dropping !USE_LSB should allow us to
>>> use WIDE_EMACS_INT normally in MPS builds, I think.
>>
>> No, there's also a built-in assumption in MPS about the size of a
>> word.
>
> That's very vague. If there is an assumption that EMACS_INT ==
> mps_word_t, it would certainly not be built into MPS, which doesn't know
> about EMACS_INT at all.  But as it is, I have no idea where you even
> suspect this "built-in" assumption is made.

FWIW, my MPS branch works fine in this configuration (32-bit x86,
WIDE_EMACS_INT, USE_LSB_TAG) on GNU/Linux.  If there is an issue, it
must be quite subtle. Or specific to mingw32, which would mean it has to
wait until some day, if ever, that toolchain becomes available on the
internet again.

commit 4370e866d8557b55c948e740d119e170338b91fd
Author: Pip Cet <pipcet@protonmail.com>
Date:   Sun Dec 8 17:45:12 2024 +0000

    try enabling MPS for WIDE_EMACS_INT + USE_LSB_TAG builds

diff --git a/src/igc.c b/src/igc.c
index 4589cfd0085..e97277d962c 100644
--- a/src/igc.c
+++ b/src/igc.c
@@ -70,9 +70,6 @@
 #ifndef USE_LSB_TAG
 # error "USE_LSB_TAG required"
 #endif
-#ifdef WIDE_EMACS_INT
-# error "WIDE_EMACS_INT not supported"
-#endif
 #if USE_STACK_LISP_OBJECTS
 # error "USE_STACK_LISP_OBJECTS not supported"
 #endif
diff --git a/src/lisp.h b/src/lisp.h
index d4638fa160c..0541f8f901b 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -280,7 +280,7 @@ #define VAL_MAX (EMACS_INT_MAX >> (GCTYPEBITS - 1))
     b. slower, because it typically requires extra masking.
    So, USE_LSB_TAG is true only on hosts where it might be useful.  */
 DEFINE_GDB_SYMBOL_BEGIN (bool, USE_LSB_TAG)
-#define USE_LSB_TAG (VAL_MAX / 2 < INTPTR_MAX)
+#define USE_LSB_TAG 1
 DEFINE_GDB_SYMBOL_END (USE_LSB_TAG)
 
 /* Mask for the value (as opposed to the type bits) of a Lisp object.  */





^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 16:49               ` Eli Zaretskii
  2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
  2024-12-08 18:47                 ` Pip Cet via Emacs development discussions.
@ 2024-12-09  1:13                 ` Po Lu
  2 siblings, 0 replies; 137+ messages in thread
From: Po Lu @ 2024-12-09  1:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, ali_gnu2, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> As 32-bit systems go away, it will become harder to write Lisp code that
>> works correctly in !WIDE_EMACS_INT 32-bit builds, so we may well have to
>> make WIDE_EMACS_INT the default at some point.
>
> If you are trying to convince me to switch to 64-bit development
> environment, you are wasting your time.  I have my very good reasons,
> and don't plan on doing so any time soon.
>
> And 64-bit Windows supports 32-bit code perfectly for my needs.

Moreover, 32-bit Android systems remain widespread, and many people
build 32-bit Emacs binaries for more optimal memory utilization.  In
fact Android OEMs sometimes install 32-bit operating systems on 64-bit
capable hardware to optimize memory usage, and consequently, speaking of
the demise of 32-bit configurations is a pointless exercise.  And armv7
is not nearly so register-starved as 32-bit x86...

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 13:52         ` Pip Cet via Emacs development discussions.
  2024-12-08 14:52           ` Eli Zaretskii
@ 2024-12-09  1:08           ` Po Lu
  1 sibling, 0 replies; 137+ messages in thread
From: Po Lu @ 2024-12-09  1:08 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> The only platforms that "need" to use !USE_LSB are those that cannot
> guarantee 8-byte alignment for static objects, which is why I asked
> about those.  If those exist, we should have received bug reports
> indicating that !WIDE_EMACS_INT builds don't work on such platforms.
>
> In particular, WIDE_EMACS_INT shouldn't imply !USE_LSB.  That it
> currently does is a very questionable optimization at best (fixnum
> manipulation may be very slightly faster with !USE_LSB, but pointer
> manipulation will be slower and requires extra registers, which is an
> issue on i386).
>
> For example, NILP() would only need to look at a single 32-bit word for
> the WIDE_EMACS_INT + USE_LSB configuration. I strongly suspect that
> effect alone would make WIDE_EMACS_INT + USE_LSB faster than
> WIDE_EMACS_INT + !USE_LSB (of course, the relevant optimization would
> have to be made first).
>
> (Of course, WIDE_EMACS_INT is almost always a bad deal, anyway.  As far
> as I can tell, the justification for its continued existence is that
> some C code assumes buffer positions are fixnums (and, because we expose
> fixnum-ness to Lisp, some broken Lisp code might do that, too). If we
> had implemented fixnums to be transparent, we could simply remove
> WIDE_EMACS_INT, but that mistake has been made...)

Why is WIDE_EMACS_INT a bad deal?  Its effect is just as you describe:
it enables 32-bit systems to access files larger than the standard
fixnum limit on those systems.
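
For reference, the NILP remark in the quoted text can be sketched as follows. This is a simplified model, not the lisp.h definitions: it assumes a 64-bit Lisp word on a 32-bit host, tag bits in the low bits, nil represented as the all-zero word, pointers that fit in the low 32 bits, and fixnums whose two lowest bits are always binary 10. The names lisp_word, make_fixnum, nilp_full and nilp_low32 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lisp_word;   /* hypothetical wide Lisp word */

enum { FIXNUM_TAG = 2 };      /* assumed: low two bits of a fixnum */

/* Assumed encoding: the value sits above a two-bit tag.  */
static lisp_word
make_fixnum (int64_t n)
{
  return ((uint64_t) n << 2) | FIXNUM_TAG;
}

/* Assumed encoding: an 8-byte-aligned 32-bit address plus a tag
   in the three low bits; the high 32 bits stay zero.  */
static lisp_word
make_pointer (uint32_t addr, uint32_t tag)
{
  return (lisp_word) (addr | tag);
}

/* Full test: compare the whole 64-bit word against nil (zero).  */
static bool
nilp_full (lisp_word w)
{
  return w == 0;
}

/* Cheaper test: under the assumptions above, looking at the low
   32 bits is enough, because no fixnum has a zero low word and
   pointer words carry nothing in the high half.  */
static bool
nilp_low32 (lisp_word w)
{
  return (uint32_t) w == 0;
}
```

Under these assumptions the two predicates agree for every representable word, which is the single-word check the quoted paragraph has in mind.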



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
  2024-12-08 13:05       ` Eli Zaretskii
@ 2024-12-09  0:58       ` Po Lu
  2024-12-09  3:28         ` Eli Zaretskii
  2024-12-09  1:01       ` Po Lu
  2 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-12-09  0:58 UTC (permalink / raw)
  To: Pip Cet; +Cc: ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> But while we're talking about rare and unusual systems, !USE_LSB builds
> are currently broken except for the WIDE_EMACS_INT case, because the
> stack scanning code makes no attempt to remove MSB tags.  It may be time
> to simply remove MSB tag support, unless there are systems around that
> actually fail to align static objects to 8-byte boundaries (but such
> systems would have been broken for a while now).

Aren't the MS-DOS builds !USE_LSB?
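
As background for the MSB-tag point quoted above, a minimal sketch of the two tagging schemes on a 32-bit word, assuming three tag bits, 8-byte-aligned addresses, and (for the MSB case) addresses that fit in the low 29 bits. The function names are invented for illustration and do not correspond to the Emacs stack-scanning code.

```c
#include <stdint.h>

enum { TAG_BITS = 3, WORD_BITS = 32 };

/* LSB scheme: the tag occupies the low bits that 8-byte
   alignment leaves free.  */
static uint32_t
tag_lsb (uint32_t addr, uint32_t tag)
{
  return addr | tag;
}

/* MSB scheme: the tag occupies the top bits of the word.  */
static uint32_t
tag_msb (uint32_t addr, uint32_t tag)
{
  return addr | (tag << (WORD_BITS - TAG_BITS));
}

/* Recover the address by clearing the low tag bits.  */
static uint32_t
untag_lsb (uint32_t w)
{
  return w & ~(uint32_t) ((1u << TAG_BITS) - 1);
}

/* Recover the address by clearing the high tag bits.  */
static uint32_t
untag_msb (uint32_t w)
{
  return w & (UINT32_MAX >> TAG_BITS);
}
```

In this model, a scanner that applies only untag_lsb to a word produced by tag_msb gets back a value with the tag still in the top bits, so it would not match the address of the object.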



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  0:58       ` Po Lu
@ 2024-12-09  3:28         ` Eli Zaretskii
  0 siblings, 0 replies; 137+ messages in thread
From: Eli Zaretskii @ 2024-12-09  3:28 UTC (permalink / raw)
  To: Po Lu; +Cc: pipcet, ali_gnu2, emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Mon, 09 Dec 2024 08:58:32 +0800
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > But while we're talking about rare and unusual systems, !USE_LSB builds
> > are currently broken except for the WIDE_EMACS_INT case, because the
> > stack scanning code makes no attempt to remove MSB tags.  It may be time
> > to simply remove MSB tag support, unless there are systems around that
> > actually fail to align static objects to 8-byte boundaries (but such
> > systems would have been broken for a while now).
> 
> Aren't the MS-DOS builds !USE_LSB?

No.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
  2024-12-08 13:05       ` Eli Zaretskii
  2024-12-09  0:58       ` Po Lu
@ 2024-12-09  1:01       ` Po Lu
  2024-12-09 13:11         ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 137+ messages in thread
From: Po Lu @ 2024-12-09  1:01 UTC (permalink / raw)
  To: Pip Cet; +Cc: ali_gnu2, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> "Po Lu" <luangruo@yahoo.com> writes:
>
>> pdumper-dumped binaries appear to crash in an x86 Solaris 10 zone,
>> though I don't really use this configuration and I'm not interested in
>> trying the portable dumper on sparc:
>>
>> core 'core' of 7021:        ../../src/bootstrap-emacs -batch --no-site-file --no-site-lisp -f batc
>>  00007fffaf433dc2 ???????? ()
>>  00007fffaf5eb3d7 ???????? ()
>>  00007fffaf5ec590 ???????? ()
>>  00007fffae3f351a _lwp_kill () + a
>>  00007fffae3981b9 raise () + 19
>>  00000000008baf90 terminate_due_to_signal () + c0
>>  000000000090236e ???????? ()
>>  0000000000902334 deliver_thread_signal () + 74
>>  00000000009023b0 deliver_fatal_thread_signal () + 10
>>  00000000009024ef handle_sigsegv () + 4f
>>  00007fffae3edd16 __sighndlr () + 6
>>  00007fffae3e25e2 call_user_handler () + 252
>>  00007fffae3e280e sigacthandler () + ee
>>  00007fffaf5ea82d ???????? ()
>>  ffffffffffffffff ???????? ()
>>  00000000009c77e7 lisp_align_malloc () + 4d7
>>  00000000009c9dd2 make_float () + 42
>>  00000000009d2e9d init_alloc () + d
>>  00000000008bd373 main () + bb3
>>  00000000006d15ab ???????? ()
>
> FWIW, this issue doesn't appear to happen on a "fresh" Solaris 10
> install, in a qemu virtual machine, on x86.  I used the
> sol-10-u11-ga-x86-dvd.iso image, installed to a new disk, then installed
> OpenCSW and built Emacs from the master branch with and without
> CFLAGS="-m64" (plus the linker path selection). Both builds appear to
> work.

That's a very different configuration from a Solaris 10 zone, which is a
modern Solaris 11 kernel hosting a Solaris 10 userspace with a number of
compatibility libraries loaded into running processes.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: pdumper on Solaris 10
  2024-12-09  1:01       ` Po Lu
@ 2024-12-09 13:11         ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 137+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-09 13:11 UTC (permalink / raw)
  To: Po Lu; +Cc: ali_gnu2, emacs-devel

"Po Lu" <luangruo@yahoo.com> writes:

> That's a very different configuration from a Solaris 10 zone, which is a
> modern Solaris 11 kernel hosting a Solaris 10 userspace with a number of
> compatibility libraries loaded into running processes.

Thanks for explaining. My understanding is cfarm210 is a Solaris 10 zone
(but on sparc64), and the problem doesn't appear there, so it might be
specific to Solaris 10 zones on Solaris 11 on x86. I'm not sure whether
I can build such a zone to try reproducing it.

Pip




^ permalink raw reply	[flat|nested] 137+ messages in thread

end of thread, other threads:[~2024-12-18  9:24 UTC | newest]

Thread overview: 137+ messages
     [not found] <mailman.39.1723910423.12184.emacs-devel@gnu.org>
2024-08-17 22:49 ` Emacs-devel Digest, Vol 246, Issue 17 ali_gnu2
2024-08-18  0:10   ` Po Lu
2024-08-18  0:19     ` Po Lu
2024-08-18  1:15     ` Solaris dldump (was: Pure space) ali_gnu2
2024-08-18  1:25       ` Solaris dldump Po Lu
2024-08-18 22:27         ` Stefan Kangas
2024-08-18 23:56           ` Po Lu
2024-08-19 11:18             ` Eli Zaretskii
2024-08-19 12:09               ` Po Lu
2024-08-19 12:50                 ` Eli Zaretskii
2024-08-19 11:44             ` Pip Cet
2024-08-19 11:57               ` Po Lu
2024-08-19 12:10                 ` Pip Cet
2024-08-19 12:55                   ` Eli Zaretskii
2024-08-19 13:46                     ` Pip Cet
2024-08-19 14:39                       ` Eli Zaretskii
2024-08-19 15:26                         ` Corwin Brust
2024-08-19 15:31                           ` Corwin Brust
2024-08-19 20:51                       ` Stefan Kangas
2024-08-19 20:35             ` Stefan Kangas
2024-12-08 12:17     ` pdumper on Solaris 10 Pip Cet via Emacs development discussions.
2024-12-08 13:05       ` Eli Zaretskii
2024-12-08 13:52         ` Pip Cet via Emacs development discussions.
2024-12-08 14:52           ` Eli Zaretskii
2024-12-08 16:17             ` Pip Cet via Emacs development discussions.
2024-12-08 16:49               ` Eli Zaretskii
2024-12-08 17:37                 ` Pip Cet via Emacs development discussions.
2024-12-08 18:41                   ` Eli Zaretskii
2024-12-08 19:15                     ` Gerd Möllmann
2024-12-08 20:38                       ` Eli Zaretskii
2024-12-09  3:09                         ` Gerd Möllmann
2024-12-09  3:32                           ` Eli Zaretskii
2024-12-09  3:43                             ` Gerd Möllmann
2024-12-09  4:53                               ` Stefan Kangas
2024-12-09  5:26                                 ` Gerd Möllmann
2024-12-09 13:58                                 ` Eli Zaretskii
2024-12-10  0:02                                   ` Po Lu
2024-12-09  9:56                           ` Pip Cet via Emacs development discussions.
2024-12-10  0:04                             ` Po Lu
2024-12-10  3:34                               ` Eli Zaretskii
2024-12-11  1:13                                 ` Po Lu
2024-12-11 11:29                                   ` Pip Cet via Emacs development discussions.
2024-12-09  4:59                     ` Stefan Kangas
2024-12-09 14:39                       ` Eli Zaretskii
2024-12-09 21:06                         ` Merging MPS a.k.a. scratch/igc, yet again Stefan Kangas
2024-12-09 21:49                           ` Óscar Fuentes
2024-12-10  4:17                             ` Xiyue Deng
2024-12-10  4:26                               ` Sean Whitton
2024-12-10  4:42                               ` chad
2024-12-10 13:10                               ` Óscar Fuentes
2024-12-10 15:10                                 ` Pip Cet via Emacs development discussions.
2024-12-10 15:37                                   ` Óscar Fuentes
2024-12-10 15:47                                     ` Pip Cet via Emacs development discussions.
2024-12-10 17:16                                     ` Eli Zaretskii
2024-12-12  4:37                                   ` Xiyue Deng
2024-12-10 13:20                               ` Eli Zaretskii
2024-12-10 14:46                               ` Pip Cet via Emacs development discussions.
2024-12-10 13:09                             ` Eli Zaretskii
2024-12-10 13:20                               ` Óscar Fuentes
2024-12-10 14:41                                 ` Eli Zaretskii
2024-12-09 23:13                           ` chad
2024-12-10 12:41                           ` Eli Zaretskii
2024-12-10  0:09                         ` pdumper on Solaris 10 Stefan Kangas
2024-12-10 12:59                           ` Eli Zaretskii
2024-12-10 13:39                             ` Óscar Fuentes
2024-12-10 14:39                               ` Eli Zaretskii
2024-12-10 15:21                                 ` Óscar Fuentes
2024-12-10 16:39                                   ` Eli Zaretskii
2024-12-10 15:38                               ` Pip Cet via Emacs development discussions.
2024-12-10 16:04                                 ` Óscar Fuentes
2024-12-10 17:23                                   ` Eli Zaretskii
2024-12-11  5:27                                 ` Gap buffer problem? Gerd Möllmann
2024-12-11  8:50                                   ` Pip Cet via Emacs development discussions.
2024-12-11  9:35                                     ` Gerd Möllmann
2024-12-11 11:50                                       ` Pip Cet via Emacs development discussions.
2024-12-11 13:22                                         ` Gerd Möllmann
2024-12-11 14:53                                           ` Pip Cet via Emacs development discussions.
2024-12-11 15:33                                             ` Gerd Möllmann
2024-12-11 16:58                                               ` Eli Zaretskii
2024-12-11 17:13                                                 ` Gerd Möllmann
2024-12-11 17:45                                                   ` Robert Pluim
2024-12-11 18:11                                                     ` Gerd Möllmann
2024-12-11 19:08                                                     ` Eli Zaretskii
2024-12-11 17:41                                                 ` Pip Cet via Emacs development discussions.
2024-12-11 19:04                                                   ` Eli Zaretskii
2024-12-11 19:54                                                     ` Pip Cet via Emacs development discussions.
2024-12-11 20:26                                                       ` Eli Zaretskii
2024-12-11 22:07                                                       ` Dmitry Gutov
2024-12-11 19:09                                                   ` Gerd Möllmann
2024-12-12  8:55                                                     ` Robert Pluim
2024-12-12 10:14                                                       ` Gerd Möllmann
2024-12-11 12:27                                       ` Pip Cet via Emacs development discussions.
2024-12-11 13:27                                         ` Gerd Möllmann
2024-12-11 15:06                                           ` Marcus Harnisch
2024-12-11 22:11                                             ` Dmitry Gutov
2024-12-12  3:49                                               ` Gerd Möllmann
2024-12-12 19:07                                                 ` Dmitry Gutov
2024-12-12 19:30                                                   ` Eli Zaretskii
2024-12-12 19:40                                                   ` Gerd Möllmann
2024-12-12  6:01                                               ` Eli Zaretskii
2024-12-11 14:22                                   ` Eli Zaretskii
2024-12-11 15:51                                     ` Gerd Möllmann
2024-12-11 17:06                                       ` Eli Zaretskii
2024-12-11 17:15                                         ` Gerd Möllmann
2024-12-10 18:13                               ` pdumper on Solaris 10 Gerd Möllmann
2024-12-10 15:23                             ` Pip Cet via Emacs development discussions.
2024-12-10 17:08                               ` Eli Zaretskii
2024-12-10 18:03                                 ` Gerd Möllmann
2024-12-10 19:34                                   ` Pip Cet via Emacs development discussions.
2024-12-10 19:59                                     ` Gerd Möllmann
2024-12-10 20:17                                       ` Pip Cet via Emacs development discussions.
2024-12-10 20:34                                         ` Gerd Möllmann
2024-12-11 14:13                                   ` Pip Cet via Emacs development discussions.
2024-12-11 17:43                                     ` Eli Zaretskii
2024-12-14 14:30                                     ` Eli Zaretskii
2024-12-15 10:55                                       ` Pip Cet via Emacs development discussions.
2024-12-15 11:13                                         ` Eli Zaretskii
2024-12-15 12:09                                           ` Pip Cet via Emacs development discussions.
2024-12-15 12:52                                             ` Eli Zaretskii
2024-12-15 19:54                                             ` John ff
2024-12-17 19:10                                           ` Paul Eggert
2024-12-17 19:43                                             ` Pip Cet via Emacs development discussions.
2024-12-17 20:00                                               ` Paul Eggert
2024-12-17 20:19                                             ` Eli Zaretskii
2024-12-17 21:14                                               ` Paul Eggert
2024-12-09 16:21                       ` Pip Cet via Emacs development discussions.
2024-12-17 13:12                     ` Pip Cet via Emacs development discussions.
2024-12-17 14:16                       ` Eli Zaretskii
2024-12-18  0:55                       ` Po Lu
2024-12-18  9:24                         ` Pip Cet via Emacs development discussions.
2024-12-08 18:47                 ` Pip Cet via Emacs development discussions.
2024-12-09  1:13                 ` Po Lu
2024-12-09  1:08           ` Po Lu
2024-12-09  0:58       ` Po Lu
2024-12-09  3:28         ` Eli Zaretskii
2024-12-09  1:01       ` Po Lu
2024-12-09 13:11         ` Pip Cet via Emacs development discussions.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
