Preferred approach to inclusion of data in ELPA package

unofficial mirror of emacs-devel@gnu.org 

* Preferred approach to inclusion of data in ELPA package
@ 2023-08-17 13:58 Hugo Thunnissen
  2023-08-17 21:14 ` Philip Kaludercic
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-17 13:58 UTC (permalink / raw)
  To: emacs-devel

Hi all,

For my package phpinspect.el I am now looking into the creation of an
index of PHP's built-in functions and classes. I'm considering different
ways of distributing this dataset, but I'm not entirely sure what would
be the preferred way. Finding out the signatures/properties of built-in
functions and classes is straightforward: I can generate valid PHP stubs
for them, which can then be parsed and indexed by my package just like any
other PHP code. What I'm not sure about is what the best way would be to
distribute this data. Options I'm considering are:

1. Distribute the stubs with the package and parse them each time **when 
the package is loaded**.

2. Parse and index the stubs, then serialize the resulting index into a 
gzipped lisp data file that is checked into version control, and is 
loaded **when the package is loaded**. (BTW, should such a .eld file be 
byte compiled for any reason?)

3. Parse and index the stubs, then serialize the resulting index 
**during compile time**. Either by generating lisp code using a macro, 
or by serializing the index into a .eld file. This guarantees the index 
staying up to date with the contents of the stub files whenever the 
package is compiled.

Some more info: I expect the initial dataset to be a file with about 
2000 stub functions and 200something stub classes, but it will grow as 
PHP grows and as phpinspect starts to cover more of PHP's features (for 
example, constant variables may also be included at some point in the 
near future, growing the index by a bit). I guesstimate that it would 
take less than 300ms to parse a set of files like that on most modern 
hardware, but I don't have the benchmarks to back that up yet.

I'm personally leaning towards option 3 and using a macro during compile 
time, but I could be nudged either way. Which approach would be 
preferable and why? Is there a common practice for things like this?

Thanks,

- Hugo


* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-17 13:58 Preferred approach to inclusion of data in ELPA package Hugo Thunnissen
@ 2023-08-17 21:14 ` Philip Kaludercic
  2023-08-19 11:26   ` Hugo Thunnissen
  2023-08-20 13:30 ` sbaugh
  2023-08-20 22:59 ` Dmitry Gutov
  2 siblings, 1 reply; 12+ messages in thread
From: Philip Kaludercic @ 2023-08-17 21:14 UTC (permalink / raw)
  To: Hugo Thunnissen; +Cc: emacs-devel

Hugo Thunnissen <devel@hugot.nl> writes:

> Hi all,
>
> For my package phpinspect.el I am now looking into the creation of an
> index of PHP's built in functions and classes. I'm considering
> different ways of distributing this dataset, but I'm not entirely sure
> what would be the preferred way. Finding out the signatures/properties
> of built in functions and classes is straightforward: I can generate
> valid PHP stubs for them which can then be parsed an indexed by my
> package just like any other PHP code. What I'm not sure about is what
> the best way would be to distribute this data. Options I'm considering
> are:
>
> 1. Distribute the stubs with the package and parse them each time
> **when the package is loaded**.
>
> 2. Parse and index the stubs, then serialize the resulting index into
> a gzipped lisp data file that is checked into version control, and is
> loaded **when the package is loaded**. (BTW, should such a .eld file
> be byte compiled for any reason?)
>
> 3. Parse and index the stubs, then serialize the resulting index
> **during compile time**. Either by generating lisp code using a macro,
> or by serializing the index into a .eld file. This guarantees the
> index staying up to date with the contents of the stub files whenever
> the package is compiled.
>
> Some more info: I expect the initial dataset to be a file with about
> 2000 stub functions and 200something stub classes, but it will grow as
> PHP grows and as phpinspect starts to cover more of PHP's features
> (for example, constant variables may also be included at some point in
> the near future, growing the index by a bit). I guesstimate that it
> would take less than 300ms to parse a set of files like that on most
> modern hardware, but I don't have the benchmarks to back that up yet.
>
> I'm personally leaning towards option 3 and using a macro during
> compile time, but I could be nudged either way. Which approach would
> be preferable and why? Is there a common practice for things like
> this?

Another idea is to have a Makefile generate the file, like the one you
describe in option 2, so that it is generated whenever the package is
built and bundled into a tarball for distribution.  That way you don't
have to store a binary blob in your repository, and you can avoid
burdening the user with additional computations at either compile time
or runtime.

Does the generation require any special functionality/tools/code to be
provided on the device the index is generated on?

> Thanks,
>
> - Hugo




* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-17 21:14 ` Philip Kaludercic
@ 2023-08-19 11:26   ` Hugo Thunnissen
  2023-08-19 15:51     ` Philip Kaludercic
  0 siblings, 1 reply; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-19 11:26 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: emacs-devel

On 8/17/23 23:14, Philip Kaludercic wrote:
>
> Another idea is to have a Makefile generate the file, like the one you
> describe in option 2., that is generate whenever the package is built
> and bundled into a tarball for distribution.  That way you don't have to
> store a binary blob in your repository, and you can avoid burdening the
> user with additional computations at either compile or runtime.
>
> Does the generation require any special functionality/tools/code to be
> provided on the device the index is generated on?

The php function/class stubs are generated with a php script, but I'm 
checking the resulting stubs file into git. The index itself can be 
built with just my package based on the stubs file.

Some more context, as I built and benchmarked a prototype: The
resulting index file is 3.1MB of s-expressions, which when compressed
with gzip becomes a file of 172K (there's a lot of duplicate
symbols/strings in there). Loading this file takes about 30% less time
than building the index from scratch (300ms vs 430-450ms on my laptop
with Core i5-8250U, byte compiled). I suppose this could be further
optimized with a more efficient serialization format, but I don't want
to spend much time on implementing that as I'm working towards an
initial package release.
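
As a rough illustration of why such an index compresses so well, here is a
small sketch in Python. It is not phpinspect's code (the package is Emacs
Lisp): the toy index layout, the JSON encoding and the names
build_toy_index, dump_index and load_index are all assumptions made for
the example. It serializes a highly repetitive index, gzips it, and reads
it back.

```python
import gzip
import json


def build_toy_index(n_functions=2000):
    """Build a hypothetical index: many entries sharing the same few strings."""
    return [
        {
            "name": f"stub_function_{i}",
            "return-type": "string",
            "arguments": [{"name": "value", "type": "mixed"}],
        }
        for i in range(n_functions)
    ]


def dump_index(index):
    """Serialize the index and gzip it; returns (raw_bytes, compressed_bytes)."""
    raw = json.dumps(index).encode("utf-8")
    return raw, gzip.compress(raw)


def load_index(compressed):
    """Decompress and deserialize an index produced by dump_index."""
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


index = build_toy_index()
raw, compressed = dump_index(index)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

The exact ratio depends on the data, but repeated keys and type names are
what gzip removes most effectively.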

How would having a Makefile like you suggest work in practice? Would I 
need to request that the ELPA maintainers add my Makefile to the build 
process of my package somehow? Or is there a standard automated way to 
have Makefiles be executed during an ELPA build?

Also: If the former is the case, is the reduction in load time that this 
brings even significant enough to be worth the bother or should I just 
hold off on this while I look for a more efficient solution?


* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-19 11:26   ` Hugo Thunnissen
@ 2023-08-19 15:51     ` Philip Kaludercic
  2023-08-20 19:24       ` Hugo Thunnissen
  0 siblings, 1 reply; 12+ messages in thread
From: Philip Kaludercic @ 2023-08-19 15:51 UTC (permalink / raw)
  To: Hugo Thunnissen; +Cc: emacs-devel

Hugo Thunnissen <devel@hugot.nl> writes:

> On 8/17/23 23:14, Philip Kaludercic wrote:
>>
>> Another idea is to have a Makefile generate the file, like the one you
>> describe in option 2., that is generate whenever the package is built
>> and bundled into a tarball for distribution.  That way you don't have to
>> store a binary blob in your repository, and you can avoid burdening the
>> user with additional computations at either compile or runtime.
>>
>> Does the generation require any special functionality/tools/code to be
>> provided on the device the index is generated on?
>
> The php function/class stubs are generated with a php script, but I'm
> checking the resulting stubs file into git. The index itself can be
> built with just my package based on the stubs file.

I saw that, and the commit did not look that nice, but I have not
looked into the issue in sufficient detail to say with certainty
whether there is a better solution.

> Some more context, as I built and bench-marked a prototype: The
> resulting index file is 3.1MB of s-expressions  which when compressed
> with gzip becomes a file of 172K (there's a lot of duplicate
> symbols/strings in there). Loading this file takes about 30% less time
> than building the index from scratch (300ms vs 430-450ms on my laptop
> with Core i5-8250U, byte compiled). I suppose this could be further
> optimized with a more efficient serialization format, but I don't want
> to spend much time on implementing that as I'm working towards an
> initial package release.
>
> How would having a Makefile like you suggest work in practice? Would I
> need to request that the ELPA maintainers add my Makefile to the build
> process of my package somehow? Or is there a standard automated way to
> have Makefiles be executed during an ELPA build?

An ELPA package specification can include :make and :shell-command
properties, which are executed on the ELPA build server with restricted
permissions.  If you take a look at
https://git.savannah.gnu.org/cgit/emacs/elpa.git/tree/elpa-packages and
the respective repositories, you will find a few examples.

> Also: If the former is the case, is the reduction in load time that
> this brings even significant enough to be worth the bother or should I
> just hold off on this while I look for a more efficient solution?

I'd say it would be worth it, if the resulting package would be smaller
and would load quicker.  After all, the difference might not be that
significant on your laptop, while for someone else with an older or
slower device, a 30% speedup is pretty significant.




* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-17 13:58 Preferred approach to inclusion of data in ELPA package Hugo Thunnissen
  2023-08-17 21:14 ` Philip Kaludercic
@ 2023-08-20 13:30 ` sbaugh
  2023-08-20 17:47   ` Hugo Thunnissen
  2023-08-20 22:59 ` Dmitry Gutov
  2 siblings, 1 reply; 12+ messages in thread
From: sbaugh @ 2023-08-20 13:30 UTC (permalink / raw)
  To: emacs-devel

Hugo Thunnissen <devel@hugot.nl> writes:
> Hi all,
>
> For my package phpinspect.el I am now looking into the creation of an
> index of PHP's built in functions and classes. I'm considering
> different ways of distributing this dataset, but I'm not entirely sure
> what would be the preferred way. Finding out the signatures/properties
> of built in functions and classes is straightforward: I can generate
> valid PHP stubs for them which can then be parsed an indexed by my
> package just like any other PHP code.

Isn't this data dependent on the version of PHP that the user is using
phpinspect.el with?  So distributing a single canonical "set of stubs"
would be inaccurate.

Is it possible to automatically generate even the set of stubs on the
user's computer, by running PHP?  Doing that operation on the user's
computer, and caching the output so that subsequent loads are fast,
seems like the best option to me.  Then you completely avoid this
problem of distributing data, and you also get behavior which reflects
the version of PHP the user is working with.


* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-20 13:30 ` sbaugh
@ 2023-08-20 17:47   ` Hugo Thunnissen
  2023-08-20 19:43     ` Hugo Thunnissen
  0 siblings, 1 reply; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-20 17:47 UTC (permalink / raw)
  To: sbaugh, emacs-devel

On 8/20/23 15:30, sbaugh@catern.com wrote:
> Hugo Thunnissen<devel@hugot.nl>  writes:
>> Hi all,
>>
>> For my package phpinspect.el I am now looking into the creation of an
>> index of PHP's built in functions and classes. I'm considering
>> different ways of distributing this dataset, but I'm not entirely sure
>> what would be the preferred way. Finding out the signatures/properties
>> of built in functions and classes is straightforward: I can generate
>> valid PHP stubs for them which can then be parsed an indexed by my
>> package just like any other PHP code.
> Isn't this data dependent on the version of PHP that the user is using
> phpinspect.el with?  So distributing a single canonical "set of stubs"
> would be inaccurate.

It is, but I think providing stubs for the latest stable version is
acceptable. PHP is backwards compatible to a very high degree, so even
stubs sourced from a PHP version other than the user's own will be
useful for people.

> Is it possible to automatically generate even the set of stubs on the
> user's computer, by running PHP?  Doing that operation on the user's
> computer, and caching the output so that subsequent loads are fast,
> seems like the best option to me.  Then you completely avoid this
> problem of distributing data, and you also get behavior which reflects
> the version of PHP the user is working with.
>
I want to avoid directly executing PHP in an automated fashion, as 
phpinspect currently does not in any way depend on PHP. And with good 
reason. It is not uncommon for people to run PHP:

- In containers

- In virtual machines (there's a variety of ready-to-go "LAMP/XAMPP" 
virtual machine environments for example)

- On remote systems with network mounted filesystems (nothing like going
live the minute you hit save amirite? ;))

- Maybe, in a future where I get it working: via TRAMP

That being said, there is room for improvement.

One option I'm considering is to make it straightforward for users to 
generate/add their own stubs. This is probably a good idea regardless, 
as even if the user's PHP version matches that of the PHP installation 
that the stubs were sourced from, some users may be using lesser 
known/non-standard php extensions and may want to add stubs for them.

Generating the stubs is quite straightforward, so most of the work would
be to wrap this process in a nice UI within Emacs. Preferably one that
allows the management of multiple different sets of stubs. To give an
idea of the process, this is a section of the current Makefile:

```
# Regenerate the PHP stubs (this step runs the php executable).
./stubs/builtins.php: ./scripts/generate-builtin-stubs.php
	mkdir -p ./stubs/
	php ./scripts/generate-builtin-stubs.php > ./stubs/builtins.php

# Build the gzipped index from the stubs; ./.deps is an order-only prerequisite.
./data/builtin-stubs-index.eld.gz: ./stubs/builtins.php | ./.deps
	mkdir -p ./data/
	$(RUN_EMACS) -l phpinspect-cache -f phpinspect-dump-stub-index
```

In an ideal world, this feature should probably be paired with the 
ability to configure/detect the PHP version and extensions that a 
project requires. Projects that use composer usually state this in their 
composer.json file, but I'd have to decide whether I want to make 
composer a hard requirement for this feature or not.
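
To make the composer.json idea a bit more concrete, here is a minimal
sketch in Python (again only an illustration, phpinspect itself is Emacs
Lisp). Composer lists platform requirements in the "require" section
under the keys "php" and "ext-<name>"; the function name
required_platform and the example file contents are made up for this
sketch.

```python
import json


def required_platform(composer_json_text):
    """Extract the PHP version constraint and the required extensions.

    Composer stores platform requirements in "require" under the keys
    "php" and "ext-<name>"; every other key is an ordinary package.
    """
    require = json.loads(composer_json_text).get("require", {})
    php_constraint = require.get("php")  # None when no constraint is stated
    extensions = sorted(
        key[len("ext-"):] for key in require if key.startswith("ext-")
    )
    return php_constraint, extensions


# Hypothetical project file, used only to exercise the function.
example = """
{
  "require": {
    "php": ">=8.1",
    "ext-mbstring": "*",
    "ext-intl": "*",
    "monolog/monolog": "^3.0"
  }
}
"""
print(required_platform(example))
```

A project without a composer.json, or without a "php" key, yields no
constraint, so some fallback (such as the bundled stubs) would still be
needed.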

Another possibility is to add support for PHPStorm's stubs 
(https://github.com/JetBrains/phpstorm-stub). Their solution has been to 
generate stubs for every version of PHP + lots of extensions and 
distribute them with their IDE (see git branches for every version). I 
think this is a little overkill of a solution though. And I don't like 
the idea of users having to download all of these stubs from somewhere 
just to make this feature of the package functional. Also, would the 
apache2 license of these stubs cause any licensing problems? I imagine 
it would be fine if the stubs are not distributed with the package, but 
IANAL.

I'm open to discussing better solutions and/or implementing them in the 
future. I'm currently working towards a first public ELPA release, so I 
might save large/complex improvements for a version after the first release.


* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-19 15:51     ` Philip Kaludercic
@ 2023-08-20 19:24       ` Hugo Thunnissen
  2023-08-20 20:42         ` Philip Kaludercic
  0 siblings, 1 reply; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-20 19:24 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: emacs-devel


On 8/19/23 17:51, Philip Kaludercic wrote:
> Hugo Thunnissen <devel@hugot.nl> writes:
>
>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>> Another idea is to have a Makefile generate the file, like the one you
>>> describe in option 2., that is generate whenever the package is built
>>> and bundled into a tarball for distribution.  That way you don't have to
>>> store a binary blob in your repository, and you can avoid burdening the
>>> user with additional computations at either compile or runtime.
>>>
>>> Does the generation require any special functionality/tools/code to be
>>> provided on the device the index is generated on?
>> The php function/class stubs are generated with a php script, but I'm
>> checking the resulting stubs file into git. The index itself can be
>> built with just my package based on the stubs file.
> I saw that, and the commit did not look that nice, but I cannot say that
> I have looked into the issue in sufficient detail to say with certainty
> or not that there is no better solution.
>
There are alternatives or improvements to this approach, which I
mentioned in my response to sbaugh
(https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html).
I don't think I'll be able to get around having to distribute, or
having the user download/generate, some kind of index though.
>
>> Also: If the former is the case, is the reduction in load time that
>> this brings even significant enough to be worth the bother or should I
>> just hold off on this while I look for a more efficient solution?
> I'd say it would be worth it, if the resulting package would be smaller
> and would load quicker.  After all, the performance on your laptop might
> not be that significant of a difference, while for someone else with an
> older or slower device, a 30%-speedup is pretty significant.
>
You're right, on less performant systems it could make a more 
significant difference. And good news, after a few improvements the load 
time is now down to ~150ms on my laptop. I also chose to load the data 
when the mode is first initialized, so simply loading my package won't 
cause the index to be loaded with it. The dumping of the index is done 
automatically when it is not present, so it would technically be fine to 
just distribute the PHP stubs with the package instead of the .eld index 
file. This would just make the user wait a little longer the first time 
they use the mode.





* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-20 17:47   ` Hugo Thunnissen
@ 2023-08-20 19:43     ` Hugo Thunnissen
  0 siblings, 0 replies; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-20 19:43 UTC (permalink / raw)
  To: sbaugh, emacs-devel


>
> Another possibility is to add support for PHPStorm's stubs 
> (https://github.com/JetBrains/phpstorm-stub).
Whoops, that should be https://github.com/JetBrains/phpstorm-stubs




* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-20 19:24       ` Hugo Thunnissen
@ 2023-08-20 20:42         ` Philip Kaludercic
  2023-08-29  8:41           ` Hugo Thunnissen
  0 siblings, 1 reply; 12+ messages in thread
From: Philip Kaludercic @ 2023-08-20 20:42 UTC (permalink / raw)
  To: Hugo Thunnissen; +Cc: emacs-devel

Hugo Thunnissen <devel@hugot.nl> writes:

> On 8/19/23 17:51, Philip Kaludercic wrote:
>> Hugo Thunnissen <devel@hugot.nl> writes:
>>
>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>> Another idea is to have a Makefile generate the file, like the one you
>>>> describe in option 2., that is generate whenever the package is built
>>>> and bundled into a tarball for distribution.  That way you don't have to
>>>> store a binary blob in your repository, and you can avoid burdening the
>>>> user with additional computations at either compile or runtime.
>>>>
>>>> Does the generation require any special functionality/tools/code to be
>>>> provided on the device the index is generated on?
>>> The php function/class stubs are generated with a php script, but I'm
>>> checking the resulting stubs file into git. The index itself can be
>>> built with just my package based on the stubs file.
>> I saw that, and the commit did not look that nice, but I cannot say that
>> I have looked into the issue in sufficient detail to say with certainty
>> or not that there is no better solution.
>>
> There are alternatives or improvements to this approach, which I
> mentioned in my response to sbaug
> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
> . I don't think I'll be able to get around having to distribute-, or
> having the user download/generate some kind of index though.

If PHP were available in the ELPA build environment, do you think that
would change anything?

>>> Also: If the former is the case, is the reduction in load time that
>>> this brings even significant enough to be worth the bother or should I
>>> just hold off on this while I look for a more efficient solution?
>> I'd say it would be worth it, if the resulting package would be smaller
>> and would load quicker.  After all, the performance on your laptop might
>> not be that significant of a difference, while for someone else with an
>> older or slower device, a 30%-speedup is pretty significant.
>
> You're right, on less performant systems it could make a more
> significant difference. And good news, after a few improvements the
> load time is now down to ~150ms on my laptop. I also chose to load the
> data when the mode is first initialized, so simply loading my package
> won't cause the index to be loaded with it. The dumping of the index
> is done automatically when it is not present, so it would technically
> be fine to just distribute the PHP stubs with the package instead of
> the .eld index file. This would just make the user wait a little
> longer the first time they use the mode.

Would it be possible to load the information when it is required
(e.g. necessary for completion)?




* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-17 13:58 Preferred approach to inclusion of data in ELPA package Hugo Thunnissen
  2023-08-17 21:14 ` Philip Kaludercic
  2023-08-20 13:30 ` sbaugh
@ 2023-08-20 22:59 ` Dmitry Gutov
  2 siblings, 0 replies; 12+ messages in thread
From: Dmitry Gutov @ 2023-08-20 22:59 UTC (permalink / raw)
  To: Hugo Thunnissen, emacs-devel

On 17/08/2023 16:58, Hugo Thunnissen wrote:
> 1. Distribute the stubs with the package and parse them each time **when 
> the package is loaded**.
> 
> 2. Parse and index the stubs, then serialize the resulting index into a 
> gzipped lisp data file that is checked into version control, and is 
> loaded **when the package is loaded**. (BTW, should such a .eld file be 
> byte compiled for any reason?)
> 
> 3. Parse and index the stubs, then serialize the resulting index 
> **during compile time**. Either by generating lisp code using a macro, 
> or by serializing the index into a .eld file. This guarantees the index 
> staying up to date with the contents of the stub files whenever the 
> package is compiled.

How about parsing them lazily? When the package is loaded and the 
processed info isn't there, you parse the stubs and save the result to a 
file. Next time you see it exists and read/load it.

This can come with some "staleness" mechanics (do the stubs require
frequent updates?), as well as the user's ability to delete them anyway
and force them to be generated anew.

Offhand, I don't see any downsides to this behavior: the installation
will be faster by avoiding this step, the delay when the generation
happens will be localized to the user loading the mode (so they will
be able to guess the reason easily), and it's not like generation during
byte-compilation would proceed any faster?
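
The lazy approach with a staleness check could look roughly like the
following Python sketch. It is an illustration only: parse_stubs is a
stand-in for the real parser, the cache is JSON rather than an .eld
file, and staleness is judged simply by comparing modification times,
which is one possible policy among several.

```python
import json
import os


def parse_stubs(stub_path):
    """Stand-in for the real parser: keeps one entry per 'function' line."""
    with open(stub_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.startswith("function ")]


def load_index(stub_path, cache_path):
    """Return the index, rebuilding the cache if it is missing or stale."""
    stale = (
        not os.path.exists(cache_path)
        or os.path.getmtime(cache_path) < os.path.getmtime(stub_path)
    )
    if stale:
        # First use, deleted cache, or stubs newer than the cache: parse and save.
        index = parse_stubs(stub_path)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(index, f)
        return index
    # Cache is present and at least as new as the stubs: just read it.
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)
```

Deleting the cache file is enough to force regeneration, which matches
the "delete them and generate anew" escape hatch described above.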


* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-20 20:42         ` Philip Kaludercic
@ 2023-08-29  8:41           ` Hugo Thunnissen
  2023-08-30 19:27             ` Philip Kaludercic
  0 siblings, 1 reply; 12+ messages in thread
From: Hugo Thunnissen @ 2023-08-29  8:41 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: emacs-devel

Philip Kaludercic <philipk@posteo.net> writes:

> Hugo Thunnissen <devel@hugot.nl> writes:
>
>> On 8/19/23 17:51, Philip Kaludercic wrote:
>>> Hugo Thunnissen <devel@hugot.nl> writes:
>>>
>>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>>> Another idea is to have a Makefile generate the file, like the one you
>>>>> describe in option 2., that is generate whenever the package is built
>>>>> and bundled into a tarball for distribution.  That way you don't have to
>>>>> store a binary blob in your repository, and you can avoid burdening the
>>>>> user with additional computations at either compile or runtime.
>>>>>
>>>>> Does the generation require any special functionality/tools/code to be
>>>>> provided on the device the index is generated on?
>>>> The php function/class stubs are generated with a php script, but I'm
>>>> checking the resulting stubs file into git. The index itself can be
>>>> built with just my package based on the stubs file.
>>> I saw that, and the commit did not look that nice, but I cannot say that
>>> I have looked into the issue in sufficient detail to say with certainty
>>> or not that there is no better solution.
>>>
>> There are alternatives or improvements to this approach, which I
>> mentioned in my response to sbaug
>> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
>> . I don't think I'll be able to get around having to distribute-, or
>> having the user download/generate some kind of index though.
>
> If PHP were available in the ELPA build environment, do you think that
> would change anything?
>

Theoretically, yes, as I would not have to check the stubs into version
control. Is this a realistic scenario though? To generate stubs, I
currently require a PHP installation with a whole slew of extensions
installed to be able to generate stubs for them. Currently, I only do
this for PHP 8.2, but I will probably end up having to do this for more
versions of PHP. That is a whole lot of packages that would have to be
present in ELPA's build environment. And the list of extensions is not
guaranteed to be static either.

My current idea is to create container images for each version of PHP
and generate stub files by executing the generation scripts in these
containers. To give you an idea of the list of required packages, this
is an example of the installation step in a Dockerfile I'm using:

RUN apt-get update && apt-get -y install \
    php8.2-memcached \
    php-redis \
    php8.2-bcmath \
    php8.2-bz2 \
    php8.2-cli \
    php8.2-common \
    php8.2-curl \
    php8.2-gmp \
    php8.2-intl \
    php-json \
    php8.2-mbstring \
    php8.2-mysql \
    php8.2-odbc \
    php8.2-opcache \
    php8.2-pgsql \
    php8.2-readline \
    php8.2-tidy \
    php8.2-xml \
    php8.2-xsl \
    php8.2-zip \
    php8.2-gd \
    php-bcmath \
    php-apcu \
    php-cli \
    php-imagick \
    php-intl \
    php-xdebug \
    php-amqp

>>>> Also: If the former is the case, is the reduction in load time that
>>>> this brings even significant enough to be worth the bother or should I
>>>> just hold off on this while I look for a more efficient solution?
>>> I'd say it would be worth it, if the resulting package would be smaller
>>> and would load quicker.  After all, the performance on your laptop might
>>> not be that significant of a difference, while for someone else with an
>>> older or slower device, a 30%-speedup is pretty significant.
>>
>> You're right, on less performant systems it could make a more
>> significant difference. And good news, after a few improvements the
>> load time is now down to ~150ms on my laptop. I also chose to load the
>> data when the mode is first initialized, so simply loading my package
>> won't cause the index to be loaded with it. The dumping of the index
>> is done automatically when it is not present, so it would technically
>> be fine to just distribute the PHP stubs with the package instead of
>> the .eld index file. This would just make the user wait a little
>> longer the first time they use the mode.
>
> Would it be possible to load the information when it is required
> (e.g. necessary for completion)?

This is a good idea. Classes within the project's scope are already
parsed/indexed on demand like that, I should apply this to stub classes
as well.

Global/native functions are another story though. Since they're not
namespaced, it's hard to break them into smaller loadable sets. Aside
from that, function completion generally requires the list of functions
to be loaded indiscriminately.  Luckily, the functions only make up a
little over 1/4th of the stubs (~2000 out of ~7300 lines).
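
The split described here (functions loaded as one set, classes loaded on
demand) can be sketched like this in Python. This is not how phpinspect
is implemented; StubIndex, class_loader and the method names are
hypothetical and only show the shape of the idea.

```python
class StubIndex:
    """Toy index: functions are loaded eagerly, classes on first lookup."""

    def __init__(self, function_names, class_loader):
        # Completion needs the whole function list, so keep all of it in memory.
        self.functions = sorted(function_names)
        # class_loader is a hypothetical callable: class name -> parsed class data.
        self._class_loader = class_loader
        self._classes = {}

    def complete_function(self, prefix):
        """Return every known function name starting with prefix."""
        return [name for name in self.functions if name.startswith(prefix)]

    def get_class(self, name):
        """Load a class the first time it is requested, then reuse it."""
        if name not in self._classes:
            self._classes[name] = self._class_loader(name)
        return self._classes[name]
```

With this shape, the cost of parsing a stub class is only paid when
something such as completion or type resolution actually asks for it.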




* Re: Preferred approach to inclusion of data in ELPA package
  2023-08-29  8:41           ` Hugo Thunnissen
@ 2023-08-30 19:27             ` Philip Kaludercic
  0 siblings, 0 replies; 12+ messages in thread
From: Philip Kaludercic @ 2023-08-30 19:27 UTC (permalink / raw)
  To: Hugo Thunnissen; +Cc: emacs-devel

Hugo Thunnissen <devel@hugot.nl> writes:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> Hugo Thunnissen <devel@hugot.nl> writes:
>>
>>> On 8/19/23 17:51, Philip Kaludercic wrote:
>>>> Hugo Thunnissen <devel@hugot.nl> writes:
>>>>
>>>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>>>> Another idea is to have a Makefile generate the file, like the one you
>>>>>> describe in option 2., that is generate whenever the package is built
>>>>>> and bundled into a tarball for distribution.  That way you don't have to
>>>>>> store a binary blob in your repository, and you can avoid burdening the
>>>>>> user with additional computations at either compile or runtime.
>>>>>>
>>>>>> Does the generation require any special functionality/tools/code to be
>>>>>> provided on the device the index is generated on?
>>>>> The php function/class stubs are generated with a php script, but I'm
>>>>> checking the resulting stubs file into git. The index itself can be
>>>>> built with just my package based on the stubs file.
>>>> I saw that, and the commit did not look that nice, but I cannot say that
>>>> I have looked into the issue in sufficient detail to say with certainty
>>>> or not that there is no better solution.
>>>>
>>> There are alternatives or improvements to this approach, which I
>>> mentioned in my response to sbaug
>>> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
>>> . I don't think I'll be able to get around having to distribute-, or
>>> having the user download/generate some kind of index though.
>>
>> If PHP were available in the ELPA build environment, do you think that
>> would change anything?
>>
>
> Theoretically, yes, as I would not have to check the stubs into version
> control. Is this a realistic scenario though? To generate stubs for
> extensions, I currently need a PHP installation with a whole slew of
> them installed. Currently, I only do this for PHP 8.2, but I will
> probably end up having to do this for more versions of PHP. That is a
> whole lot of packages that would have to be present in ELPA's build
> environment. And the list of extensions is not guaranteed to be static
> either.

In that case I underestimated the effort.  While not perfect, the best
approach for now might well be to track the generated file in the Git
repository :/

> My current idea is to create container images for each version of PHP
> and generate stub files by executing the generation scripts in these
> containers. To give you an idea of the list of required packages, this
> is an example of the installation step in a Dockerfile I'm using:
>
> RUN apt-get update && apt-get -y install \
>     php8.2-memcached \
>     php-redis \
>     php8.2-bcmath \
>     php8.2-bz2 \
>     php8.2-cli \
>     php8.2-common \
>     php8.2-curl \
>     php8.2-gmp \
>     php8.2-intl \
>     php-json \
>     php8.2-mbstring \
>     php8.2-mysql \
>     php8.2-odbc \
>     php8.2-opcache \
>     php8.2-pgsql \
>     php8.2-readline \
>     php8.2-tidy \
>     php8.2-xml \
>     php8.2-xsl \
>     php8.2-zip \
>     php8.2-gd \
>     php-bcmath \
>     php-apcu \
>     php-cli \
>     php-imagick \
>     php-intl \
>     php-xdebug \
>     php-amqp
>
>>>>> Also: If the former is the case, is the reduction in load time that
>>>>> this brings even significant enough to be worth the bother or should I
>>>>> just hold off on this while I look for a more efficient solution?
>>>> I'd say it would be worth it, if the resulting package would be smaller
>>>> and would load quicker.  After all, the difference might not be that
>>>> significant on your laptop, while for someone else with an older or
>>>> slower device, a 30% speedup is pretty significant.
>>>
>>> You're right, on less performant systems it could make a more
>>> significant difference. And good news, after a few improvements the
>>> load time is now down to ~150ms on my laptop. I also chose to load the
>>> data when the mode is first initialized, so simply loading my package
>>> won't cause the index to be loaded with it. The dumping of the index
>>> is done automatically when it is not present, so it would technically
>>> be fine to just distribute the PHP stubs with the package instead of
>>> the .eld index file. This would just make the user wait a little
>>> longer the first time they use the mode.
>>
>> Would it be possible to load the information when it is required
>> (e.g. necessary for completion)?
>
> This is a good idea. Classes within the project's scope are already
> parsed/indexed on demand like that, I should apply this to stub classes
> as well.
>
> Global/native functions are another story though. Since they're not
> namespaced, it's hard to break them into smaller loadable sets. Aside
> from that, function completion generally requires the list of functions
> to be loaded indiscriminately.  Luckily, the functions only make up a
> little over a quarter of the stubs (~2000 out of ~7300 lines).
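
One way to picture the on-demand loading discussed above is the following
minimal Emacs Lisp sketch.  It assumes the serialized index is a single
alist of (NAME . SIGNATURE) pairs stored in an .eld file; the file name,
the index layout, and every `my-stub-' identifier are hypothetical and are
not taken from phpinspect.  The index is read from disk the first time it
is requested and is served from a cache afterwards.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: all `my-stub-' names and the alist layout are assumptions
;; made for illustration, not phpinspect's actual API.

(defvar my-stub-index-file
  (expand-file-name "stub-index.eld" user-emacs-directory)
  "Hypothetical location of the serialized stub index.")

(defvar my-stub--function-index nil
  "Cached function index, or nil if it has not been loaded yet.")

(defun my-stub--read-index (file)
  "Read a single Lisp object from FILE and return it."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (read (current-buffer))))

(defun my-stub-function-index ()
  "Return the function index, reading it from disk on first use only."
  (or my-stub--function-index
      (setq my-stub--function-index
            (my-stub--read-index my-stub-index-file))))

(defun my-stub-function-names ()
  "Return the names of all indexed functions, e.g. for completion."
  (mapcar #'car (my-stub-function-index)))
```

With this shape, loading the package itself does not touch the index file;
the first call to `my-stub-function-names' (for instance from a completion
function) pays the read cost once.  Note that an empty index would be
re-read on every call, since nil doubles as the "not loaded" marker here.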




Thread overview: 12+ messages
2023-08-17 13:58 Preferred approach to inclusion of data in ELPA package Hugo Thunnissen
2023-08-17 21:14 ` Philip Kaludercic
2023-08-19 11:26   ` Hugo Thunnissen
2023-08-19 15:51     ` Philip Kaludercic
2023-08-20 19:24       ` Hugo Thunnissen
2023-08-20 20:42         ` Philip Kaludercic
2023-08-29  8:41           ` Hugo Thunnissen
2023-08-30 19:27             ` Philip Kaludercic
2023-08-20 13:30 ` sbaugh
2023-08-20 17:47   ` Hugo Thunnissen
2023-08-20 19:43     ` Hugo Thunnissen
2023-08-20 22:59 ` Dmitry Gutov
