* Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-17 13:58 UTC
To: emacs-devel
Hi all,
For my package phpinspect.el I am now looking into the creation of an
index of PHP's built in functions and classes. I'm considering different
ways of distributing this dataset, but I'm not entirely sure what would
be the preferred way. Finding out the signatures/properties of built in
functions and classes is straightforward: I can generate valid PHP stubs
for them which can then be parsed and indexed by my package just like any
other PHP code. What I'm not sure about is what the best way would be to
distribute this data. Options I'm considering are:
1. Distribute the stubs with the package and parse them each time **when
the package is loaded**.
2. Parse and index the stubs, then serialize the resulting index into a
gzipped lisp data file that is checked into version control, and is
loaded **when the package is loaded**. (BTW, should such a .eld file be
byte compiled for any reason?)
3. Parse and index the stubs, then serialize the resulting index
**during compile time**. Either by generating lisp code using a macro,
or by serializing the index into a .eld file. This guarantees the index
staying up to date with the contents of the stub files whenever the
package is compiled.
Some more info: I expect the initial dataset to be a file with about
2000 stub functions and 200something stub classes, but it will grow as
PHP grows and as phpinspect starts to cover more of PHP's features (for
example, constant variables may also be included at some point in the
near future, growing the index by a bit). I guesstimate that it would
take less than 300ms to parse a set of files like that on most modern
hardware, but I don't have the benchmarks to back that up yet.
I'm personally leaning towards option 3 and using a macro during compile
time, but I could be nudged either way. Which approach would be
preferable and why? Is there a common practice for things like this?
Thanks,
- Hugo
* Re: Preferred approach to inclusion of data in ELPA package
From: Philip Kaludercic @ 2023-08-17 21:14 UTC
To: Hugo Thunnissen; +Cc: emacs-devel
Hugo Thunnissen <devel@hugot.nl> writes:
> Hi all,
>
> For my package phpinspect.el I am now looking into the creation of an
> index of PHP's built in functions and classes. I'm considering
> different ways of distributing this dataset, but I'm not entirely sure
> what would be the preferred way. Finding out the signatures/properties
> of built in functions and classes is straightforward: I can generate
> valid PHP stubs for them which can then be parsed and indexed by my
> package just like any other PHP code. What I'm not sure about is what
> the best way would be to distribute this data. Options I'm considering
> are:
>
> 1. Distribute the stubs with the package and parse them each time
> **when the package is loaded**.
>
> 2. Parse and index the stubs, then serialize the resulting index into
> a gzipped lisp data file that is checked into version control, and is
> loaded **when the package is loaded**. (BTW, should such a .eld file
> be byte compiled for any reason?)
>
> 3. Parse and index the stubs, then serialize the resulting index
> **during compile time**. Either by generating lisp code using a macro,
> or by serializing the index into a .eld file. This guarantees the
> index staying up to date with the contents of the stub files whenever
> the package is compiled.
>
> Some more info: I expect the initial dataset to be a file with about
> 2000 stub functions and 200something stub classes, but it will grow as
> PHP grows and as phpinspect starts to cover more of PHP's features
> (for example, constant variables may also be included at some point in
> the near future, growing the index by a bit). I guesstimate that it
> would take less than 300ms to parse a set of files like that on most
> modern hardware, but I don't have the benchmarks to back that up yet.
>
> I'm personally leaning towards option 3 and using a macro during
> compile time, but I could be nudged either way. Which approach would
> be preferable and why? Is there a common practice for things like
> this?
Another idea is to have a Makefile generate a file like the one you
describe in option 2, that is, generate it whenever the package is built
and bundled into a tarball for distribution. That way you don't have to
store a binary blob in your repository, and you can avoid burdening the
user with additional computations at either compile or runtime.
Does the generation require any special functionality/tools/code to be
provided on the device the index is generated on?
> Thanks,
>
> - Hugo
* Re: Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-19 11:26 UTC
To: Philip Kaludercic; +Cc: emacs-devel
On 8/17/23 23:14, Philip Kaludercic wrote:
>
> Another idea is to have a Makefile generate the file, like the one you
> describe in option 2., that is generate whenever the package is built
> and bundled into a tarball for distribution. That way you don't have to
> store a binary blob in your repository, and you can avoid burdening the
> user with additional computations at either compile or runtime.
>
> Does the generation require any special functionality/tools/code to be
> provided on the device the index is generated on?
The php function/class stubs are generated with a php script, but I'm
checking the resulting stubs file into git. The index itself can be
built with just my package based on the stubs file.
Some more context, as I built and benchmarked a prototype: The
resulting index file is 3.1MB of s-expressions which when compressed
with gzip becomes a file of 172K (there's a lot of duplicate
symbols/strings in there). Loading this file takes about 30% less time
than building the index from scratch (300ms vs 430-450ms on my laptop
with Core i5-8250U, byte compiled). I suppose this could be further
optimized with a more efficient serialization format, but I don't want
to spend much time on implementing that as I'm working towards an
initial package release.
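For illustration, a minimal Emacs Lisp sketch of writing and reading
back such a serialized index. The `my-index-` function names are
hypothetical and not part of phpinspect. The sketch assumes that
`auto-compression-mode` is enabled (the Emacs default) and that a gzip
program is available, in which case a file name ending in .gz should be
compressed and decompressed transparently.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helpers for illustration; not phpinspect's actual API.

(defun my-index-write-eld (object file)
  "Serialize OBJECT to FILE in a form that `read' can load again."
  ;; Make sure a user's print settings cannot truncate the output.
  (let ((print-length nil)
        (print-level nil))
    (with-temp-file file
      (prin1 object (current-buffer)))))

(defun my-index-read-eld (file)
  "Read and return the first Lisp object stored in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (read (current-buffer))))
```

Timing the load could then be done with something like
(benchmark-run 1 (my-index-read-eld "data/builtin-stubs-index.eld.gz")),
which returns the elapsed time together with garbage collection figures.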
How would having a Makefile like you suggest work in practice? Would I
need to request that the ELPA maintainers add my Makefile to the build
process of my package somehow? Or is there a standard automated way to
have Makefiles be executed during an ELPA build?
Also: If the former is the case, is the reduction in load time that this
brings even significant enough to be worth the bother or should I just
hold off on this while I look for a more efficient solution?
* Re: Preferred approach to inclusion of data in ELPA package
From: Philip Kaludercic @ 2023-08-19 15:51 UTC
To: Hugo Thunnissen; +Cc: emacs-devel
Hugo Thunnissen <devel@hugot.nl> writes:
> On 8/17/23 23:14, Philip Kaludercic wrote:
>>
>> Another idea is to have a Makefile generate the file, like the one you
>> describe in option 2., that is generate whenever the package is built
>> and bundled into a tarball for distribution. That way you don't have to
>> store a binary blob in your repository, and you can avoid burdening the
>> user with additional computations at either compile or runtime.
>>
>> Does the generation require any special functionality/tools/code to be
>> provided on the device the index is generated on?
>
> The php function/class stubs are generated with a php script, but I'm
> checking the resulting stubs file into git. The index itself can be
> built with just my package based on the stubs file.
I saw that, and the commit did not look that nice, but I have not
looked into the issue in sufficient detail to say with certainty
whether there is a better solution.
> Some more context, as I built and bench-marked a prototype: The
> resulting index file is 3.1MB of s-expressions which when compressed
> with gzip becomes a file of 172K (there's a lot of duplicate
> symbols/strings in there). Loading this file takes about 30% less time
> than building the index from scratch (300ms vs 430-450ms on my laptop
> with Core i5-8250U, byte compiled). I suppose this could be further
> optimized with a more efficient serialization format, but I don't want
> to spend much time on implementing that as I'm working towards an
> initial package release.
>
> How would having a Makefile like you suggest work in practice? Would I
> need to request that the ELPA maintainers add my Makefile to the build
> process of my package somehow? Or is there a standard automated way to
> have Makefiles be executed during an ELPA build?
An ELPA package specification can include :make and :shell-command
queries, that are executed on the ELPA build server, with restricted
permissions. If you take a look at
https://git.savannah.gnu.org/cgit/emacs/elpa.git/tree/elpa-packages and
their respective repositories, you will find a few examples.
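As a rough illustration, an entry in the elpa-packages file that uses
:make might look like the following. The URL is a placeholder and the
make target name is hypothetical; whether such a target can actually run
under the build server's restrictions is an assumption that would need
checking.

```elisp
;; Hypothetical elpa-packages entry; URL and target are placeholders.
(phpinspect :url "https://example.org/phpinspect.el.git"
            ;; Run `make data/builtin-stubs-index.eld.gz' before the
            ;; tarball is built.
            :make "data/builtin-stubs-index.eld.gz")
```

A :shell-command property takes a shell command string instead of a
make target.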
> Also: If the former is the case, is the reduction in load time that
> this brings even significant enough to be worth the bother or should I
> just hold off on this while I look for a more efficient solution?
I'd say it would be worth it, if the resulting package would be smaller
and would load quicker. After all, the difference might not be that
significant on your laptop, while for someone else with an older or
slower device, a 30% speedup is pretty significant.
* Re: Preferred approach to inclusion of data in ELPA package
From: sbaugh @ 2023-08-20 13:30 UTC
To: emacs-devel
Hugo Thunnissen <devel@hugot.nl> writes:
> Hi all,
>
> For my package phpinspect.el I am now looking into the creation of an
> index of PHP's built in functions and classes. I'm considering
> different ways of distributing this dataset, but I'm not entirely sure
> what would be the preferred way. Finding out the signatures/properties
> of built in functions and classes is straightforward: I can generate
> valid PHP stubs for them which can then be parsed and indexed by my
> package just like any other PHP code.
Isn't this data dependent on the version of PHP that the user is using
phpinspect.el with? So distributing a single canonical "set of stubs"
would be inaccurate.
Is it possible to automatically generate even the set of stubs on the
user's computer, by running PHP? Doing that operation on the user's
computer, and caching the output so that subsequent loads are fast,
seems like the best option to me. Then you completely avoid this
problem of distributing data, and you also get behavior which reflects
the version of PHP the user is working with.
* Re: Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-20 17:47 UTC
To: sbaugh, emacs-devel
On 8/20/23 15:30, sbaugh@catern.com wrote:
> Hugo Thunnissen<devel@hugot.nl> writes:
>> Hi all,
>>
>> For my package phpinspect.el I am now looking into the creation of an
>> index of PHP's built in functions and classes. I'm considering
>> different ways of distributing this dataset, but I'm not entirely sure
>> what would be the preferred way. Finding out the signatures/properties
>> of built in functions and classes is straightforward: I can generate
>> valid PHP stubs for them which can then be parsed and indexed by my
>> package just like any other PHP code.
> Isn't this data dependent on the version of PHP that the user is using
> phpinspect.el with? So distributing a single canonical "set of stubs"
> would be inaccurate.
It is, but I think providing stubs for the latest stable version is
acceptable. PHP is backwards compatible to a very high degree, so even
stubs sourced from a different PHP version than their own will be useful
for people.
> Is it possible to automatically generate even the set of stubs on the
> user's computer, by running PHP? Doing that operation on the user's
> computer, and caching the output so that subsequent loads are fast,
> seems like the best option to me. Then you completely avoid this
> problem of distributing data, and you also get behavior which reflects
> the version of PHP the user is working with.
>
I want to avoid directly executing PHP in an automated fashion, as
phpinspect currently does not in any way depend on PHP. And with good
reason. It is not uncommon for people to run PHP:
- In containers
- In virtual machines (there's a variety of ready-to-go "LAMP/XAMPP"
virtual machine environments for example)
- On remote systems with network mounted filesystems (nothing like going
live the minute you hit save amirite? ;))
- Maybe, in a future where I get it working: via TRAMP
That being said, there is room for improvement.
One option I'm considering is to make it straightforward for users to
generate/add their own stubs. This is probably a good idea regardless,
as even if the user's PHP version matches that of the PHP installation
that the stubs were sourced from, some users may be using lesser
known/non-standard PHP extensions and may want to add stubs for them.
Generating the stubs is quite straightforward, so most of the work would
be to wrap this process in a nice UI within Emacs. Preferably one that
allows the management of multiple different sets of stubs. To give an
idea of the process, this is a section of the current Makefile:
```
./stubs/builtins.php: ./scripts/generate-builtin-stubs.php
	mkdir -p ./stubs/
	php ./scripts/generate-builtin-stubs.php > ./stubs/builtins.php

./data/builtin-stubs-index.eld.gz: ./stubs/builtins.php | ./.deps
	mkdir -p ./data/
	$(RUN_EMACS) -l phpinspect-cache -f phpinspect-dump-stub-index
```
In an ideal world, this feature should probably be paired with the
ability to configure/detect the PHP version and extensions that a
project requires. Projects that use composer usually state this in their
composer.json file, but I'd have to decide whether I want to make
composer a hard requirement for this feature or not.
Another possibility is to add support for PHPStorm's stubs
(https://github.com/JetBrains/phpstorm-stub). Their solution has been to
generate stubs for every version of PHP + lots of extensions and
distribute them with their IDE (see git branches for every version). I
think this is a little overkill of a solution though. And I don't like
the idea of users having to download all of these stubs from somewhere
just to make this feature of the package functional. Also, would the
apache2 license of these stubs cause any licensing problems? I imagine
it would be fine if the stubs are not distributed with the package, but
IANAL.
I'm open to discussing better solutions and/or implementing them in the
future. I'm currently working towards a first public ELPA release, so I
might save large/complex improvements for a version after the first release.
* Re: Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-20 19:24 UTC
To: Philip Kaludercic; +Cc: emacs-devel
On 8/19/23 17:51, Philip Kaludercic wrote:
> Hugo Thunnissen <devel@hugot.nl> writes:
>
>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>> Another idea is to have a Makefile generate the file, like the one you
>>> describe in option 2., that is generate whenever the package is built
>>> and bundled into a tarball for distribution. That way you don't have to
>>> store a binary blob in your repository, and you can avoid burdening the
>>> user with additional computations at either compile or runtime.
>>>
>>> Does the generation require any special functionality/tools/code to be
>>> provided on the device the index is generated on?
>> The php function/class stubs are generated with a php script, but I'm
>> checking the resulting stubs file into git. The index itself can be
>> built with just my package based on the stubs file.
> I saw that, and the commit did not look that nice, but I cannot say that
> I have looked into the issue in sufficient detail to say with certainty
> or not that there is no better solution.
>
There are alternatives or improvements to this approach, which I
mentioned in my response to sbaugh
(https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html).
I don't think I'll be able to get around having to distribute, or
having the user download/generate, some kind of index though.
>
>> Also: If the former is the case, is the reduction in load time that
>> this brings even significant enough to be worth the bother or should I
>> just hold off on this while I look for a more efficient solution?
> I'd say it would be worth it, if the resulting package would be smaller
> and would load quicker. After all, the performance on your laptop might
> not be that significant of a difference, while for someone else with an
> older or slower device, a 30%-speedup is pretty significant.
>
You're right, on less performant systems it could make a more
significant difference. And good news, after a few improvements the load
time is now down to ~150ms on my laptop. I also chose to load the data
when the mode is first initialized, so simply loading my package won't
cause the index to be loaded with it. The dumping of the index is done
automatically when it is not present, so it would technically be fine to
just distribute the PHP stubs with the package instead of the .eld index
file. This would just make the user wait a little longer the first time
they use the mode.
* Re: Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-20 19:43 UTC
To: sbaugh, emacs-devel
>
> Another possibility is to add support for PHPStorm's stubs
> (https://github.com/JetBrains/phpstorm-stub).
Whoops, that should be https://github.com/JetBrains/phpstorm-stubs
* Re: Preferred approach to inclusion of data in ELPA package
From: Philip Kaludercic @ 2023-08-20 20:42 UTC
To: Hugo Thunnissen; +Cc: emacs-devel
Hugo Thunnissen <devel@hugot.nl> writes:
> On 8/19/23 17:51, Philip Kaludercic wrote:
>> Hugo Thunnissen <devel@hugot.nl> writes:
>>
>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>> Another idea is to have a Makefile generate the file, like the one you
>>>> describe in option 2., that is generate whenever the package is built
>>>> and bundled into a tarball for distribution. That way you don't have to
>>>> store a binary blob in your repository, and you can avoid burdening the
>>>> user with additional computations at either compile or runtime.
>>>>
>>>> Does the generation require any special functionality/tools/code to be
>>>> provided on the device the index is generated on?
>>> The php function/class stubs are generated with a php script, but I'm
>>> checking the resulting stubs file into git. The index itself can be
>>> built with just my package based on the stubs file.
>> I saw that, and the commit did not look that nice, but I cannot say that
>> I have looked into the issue in sufficient detail to say with certainty
>> or not that there is no better solution.
>>
> There are alternatives or improvements to this approach, which I
> mentioned in my response to sbaugh
> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
> . I don't think I'll be able to get around having to distribute-, or
> having the user download/generate some kind of index though.
If PHP were available in the ELPA build environment, do you think that
would change anything?
>>> Also: If the former is the case, is the reduction in load time that
>>> this brings even significant enough to be worth the bother or should I
>>> just hold off on this while I look for a more efficient solution?
>> I'd say it would be worth it, if the resulting package would be smaller
>> and would load quicker. After all, the performance on your laptop might
>> not be that significant of a difference, while for someone else with an
>> older or slower device, a 30%-speedup is pretty significant.
>
> You're right, on less performant systems it could make a more
> significant difference. And good news, after a few improvements the
> load time is now down to ~150ms on my laptop. I also chose to load the
> data when the mode is first initialized, so simply loading my package
> won't cause the index to be loaded with it. The dumping of the index
> is done automatically when it is not present, so it would technically
> be fine to just distribute the PHP stubs with the package instead of
> the .eld index file. This would just make the user wait a little
> longer the first time they use the mode.
Would it be possible to load the information when it is required
(e.g. necessary for completion)?
* Re: Preferred approach to inclusion of data in ELPA package
From: Dmitry Gutov @ 2023-08-20 22:59 UTC
To: Hugo Thunnissen, emacs-devel
On 17/08/2023 16:58, Hugo Thunnissen wrote:
> 1. Distribute the stubs with the package and parse them each time **when
> the package is loaded**.
>
> 2. Parse and index the stubs, then serialize the resulting index into a
> gzipped lisp data file that is checked into version control, and is
> loaded **when the package is loaded**. (BTW, should such a .eld file be
> byte compiled for any reason?)
>
> 3. Parse and index the stubs, then serialize the resulting index
> **during compile time**. Either by generating lisp code using a macro,
> or by serializing the index into a .eld file. This guarantees the index
> staying up to date with the contents of the stub files whenever the
> package is compiled.
How about parsing them lazily? When the package is loaded and the
processed info isn't there, you parse the stubs and save the result to a
file. Next time you see it exists and read/load it.
This can come with some "staleness" mechanics (do the stubs require
frequent updates?), as well as the user's ability to delete the saved
result and force it to be generated anew.
Offhand, I don't see any downsides to this behavior: the installation
will be faster by avoiding this step, the delay when the generation
happens will be localized to the user loading the mode (so they will
be able to guess the reason easily), and it's not like generation during
byte-compilation would proceed any faster?
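A minimal sketch of this lazy scheme in Emacs Lisp, under stated
assumptions: all `my-stubs-` names are hypothetical, BUILD-FN stands in
for whatever function parses the stubs, and staleness is judged purely
by file modification time.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical names for illustration; not phpinspect's actual API.

(defun my-stubs-cache-stale-p (stubs-file cache-file)
  "Return non-nil if CACHE-FILE is missing or older than STUBS-FILE."
  (file-newer-than-file-p stubs-file cache-file))

(defun my-stubs-load-index (stubs-file cache-file build-fn)
  "Return the index for STUBS-FILE, reusing CACHE-FILE when it is fresh.
BUILD-FN is called with STUBS-FILE to build the index from scratch."
  (if (my-stubs-cache-stale-p stubs-file cache-file)
      ;; Slow path: parse the stubs, then save the result for next time.
      (let ((index (funcall build-fn stubs-file))
            (print-length nil)
            (print-level nil))
        (make-directory
         (file-name-directory (expand-file-name cache-file)) t)
        (with-temp-file cache-file
          (prin1 index (current-buffer)))
        index)
    ;; Fast path: read the previously saved index.
    (with-temp-buffer
      (insert-file-contents cache-file)
      (goto-char (point-min))
      (read (current-buffer)))))
```

With this shape, deleting the cache file is enough to force regeneration
on the next call.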
* Re: Preferred approach to inclusion of data in ELPA package
From: Hugo Thunnissen @ 2023-08-29 8:41 UTC
To: Philip Kaludercic; +Cc: emacs-devel
Philip Kaludercic <philipk@posteo.net> writes:
> Hugo Thunnissen <devel@hugot.nl> writes:
>
>> On 8/19/23 17:51, Philip Kaludercic wrote:
>>> Hugo Thunnissen <devel@hugot.nl> writes:
>>>
>>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>>> Another idea is to have a Makefile generate the file, like the one you
>>>>> describe in option 2., that is generate whenever the package is built
>>>>> and bundled into a tarball for distribution. That way you don't have to
>>>>> store a binary blob in your repository, and you can avoid burdening the
>>>>> user with additional computations at either compile or runtime.
>>>>>
>>>>> Does the generation require any special functionality/tools/code to be
>>>>> provided on the device the index is generated on?
>>>> The php function/class stubs are generated with a php script, but I'm
>>>> checking the resulting stubs file into git. The index itself can be
>>>> built with just my package based on the stubs file.
>>> I saw that, and the commit did not look that nice, but I cannot say that
>>> I have looked into the issue in sufficient detail to say with certainty
>>> or not that there is no better solution.
>>>
>> There are alternatives or improvements to this approach, which I
>> mentioned in my response to sbaugh
>> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
>> . I don't think I'll be able to get around having to distribute-, or
>> having the user download/generate some kind of index though.
>
> If PHP were available in the ELPA build environment, do you think that
> would change anything?
>
Theoretically, yes, as I would not have to check the stubs into version
control. Is this a realistic scenario though? To generate stubs, I
currently require a PHP installation with a whole slew of extensions
installed to be able to generate stubs for them. Currently, I only do
this for PHP 8.2, but I will probably end up having to do this for more
versions of PHP. That is a whole lot of packages that would have to be
present in ELPA's build environment. And the list of extensions is not
guaranteed to be static either.
My current idea is to create container images for each version of PHP
and generate stub files by executing the generation scripts in these
containers. To give you an idea of the list of required packages, this
is an example of the installation step in a Dockerfile I'm using:
RUN apt-get update && apt-get -y install \
    php8.2-memcached \
    php-redis \
    php8.2-bcmath \
    php8.2-bz2 \
    php8.2-cli \
    php8.2-common \
    php8.2-curl \
    php8.2-gmp \
    php8.2-intl \
    php-json \
    php8.2-mbstring \
    php8.2-mysql \
    php8.2-odbc \
    php8.2-opcache \
    php8.2-pgsql \
    php8.2-readline \
    php8.2-tidy \
    php8.2-xml \
    php8.2-xsl \
    php8.2-zip \
    php8.2-gd \
    php-bcmath \
    php-apcu \
    php-cli \
    php-imagick \
    php-intl \
    php-xdebug \
    php-amqp
>>>> Also: If the former is the case, is the reduction in load time that
>>>> this brings even significant enough to be worth the bother or should I
>>>> just hold off on this while I look for a more efficient solution?
>>> I'd say it would be worth it, if the resulting package would be smaller
>>> and would load quicker. After all, the performance on your laptop might
>>> not be that significant of a difference, while for someone else with an
>>> older or slower device, a 30%-speedup is pretty significant.
>>
>> You're right, on less performant systems it could make a more
>> significant difference. And good news, after a few improvements the
>> load time is now down to ~150ms on my laptop. I also chose to load the
>> data when the mode is first initialized, so simply loading my package
>> won't cause the index to be loaded with it. The dumping of the index
>> is done automatically when it is not present, so it would technically
>> be fine to just distribute the PHP stubs with the package instead of
>> the .eld index file. This would just make the user wait a little
>> longer the first time they use the mode.
>
> Would it be possible to load the information when it is required
> (e.g. necessary for completion)?
This is a good idea. Classes within the project's scope are already
parsed/indexed on demand like that, I should apply this to stub classes
as well.
Global/native functions are another story though. Since they're not
namespaced, it's hard to break them into smaller loadable sets. Aside
from that, function completion generally requires the list of functions
to be loaded indiscriminately. Luckily, the functions only make up a
little over 1/4th of the stubs (~2000 out of ~7300 lines).
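A rough sketch of loading the function list only when completion first
needs it. The names are hypothetical, and LOADER stands in for whatever
function reads the function names from the index; this is not how
phpinspect is actually structured.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical names for illustration; not phpinspect's actual API.

(defvar my-stub-functions--cache nil
  "In-memory list of builtin function names, or nil if not loaded yet.")

(defun my-stub-functions (loader)
  "Return the builtin function names, calling LOADER only on first use."
  (or my-stub-functions--cache
      (setq my-stub-functions--cache (funcall loader))))

(defun my-stub-function-completions (prefix loader)
  "Return all builtin function names that start with PREFIX."
  (all-completions prefix (my-stub-functions loader)))
```

The whole list is still loaded in one go, but only once a completion is
actually requested.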
* Re: Preferred approach to inclusion of data in ELPA package
From: Philip Kaludercic @ 2023-08-30 19:27 UTC
To: Hugo Thunnissen; +Cc: emacs-devel
Hugo Thunnissen <devel@hugot.nl> writes:
> Philip Kaludercic <philipk@posteo.net> writes:
>
>> Hugo Thunnissen <devel@hugot.nl> writes:
>>
>>> On 8/19/23 17:51, Philip Kaludercic wrote:
>>>> Hugo Thunnissen <devel@hugot.nl> writes:
>>>>
>>>>> On 8/17/23 23:14, Philip Kaludercic wrote:
>>>>>> Another idea is to have a Makefile generate the file, like the one you
>>>>>> describe in option 2., that is generate whenever the package is built
>>>>>> and bundled into a tarball for distribution. That way you don't have to
>>>>>> store a binary blob in your repository, and you can avoid burdening the
>>>>>> user with additional computations at either compile or runtime.
>>>>>>
>>>>>> Does the generation require any special functionality/tools/code to be
>>>>>> provided on the device the index is generated on?
>>>>> The php function/class stubs are generated with a php script, but I'm
>>>>> checking the resulting stubs file into git. The index itself can be
>>>>> built with just my package based on the stubs file.
>>>> I saw that, and the commit did not look that nice, but I cannot say that
>>>> I have looked into the issue in sufficient detail to say with certainty
>>>> or not that there is no better solution.
>>>>
>>> There are alternatives or improvements to this approach, which I
>>> mentioned in my response to sbaugh
>>> (https://lists.gnu.org/archive/html/emacs-devel/2023-08/msg00748.html)
>>> . I don't think I'll be able to get around having to distribute-, or
>>> having the user download/generate some kind of index though.
>>
>> If PHP were available in the ELPA build environment, do you think that
>> would change anything?
>>
>
> Theoretically, yes, as I would not have to check the stubs into version
> control. Is this a realistic scenario though? To generate stubs, I
> currently require a PHP installation with a whole slew of extensions
> installed to be able to generate stubs for them. Currently, I only do
> this for PHP 8.2, but I will probably end up having to do this for more
> versions of PHP. That is a whole lot of packages that would have to be
> present in ELPA's build environment. And the list of extensions is not
> guaranteed to be static either.
In that case I had underestimated the effort. While not perfect, it
seems the best approach for now might well be to track the generated
file in the Git repository :/
> My current idea is to create container images for each version of PHP
> and generate stub files by executing the generation scripts in these
> containers. To give you an idea of the list of required packages, this
> is an example of the installation step in a Dockerfile I'm using:
>
> RUN apt-get update && apt-get -y install \
> php8.2-memcached \
> php-redis \
> php8.2-bcmath \
> php8.2-bz2 \
> php8.2-cli \
> php8.2-common \
> php8.2-curl \
> php8.2-gmp \
> php8.2-intl \
> php-json \
> php8.2-mbstring \
> php8.2-mysql \
> php8.2-odbc \
> php8.2-opcache \
> php8.2-pgsql \
> php8.2-readline \
> php8.2-tidy \
> php8.2-xml \
> php8.2-xsl \
> php8.2-zip \
> php8.2-gd \
> php-bcmath \
> php-apcu \
> php-cli \
> php-imagick \
> php-intl \
> php-xdebug \
> php-amqp
>
>>>>> Also: If the former is the case, is the reduction in load time that
>>>>> this brings even significant enough to be worth the bother or should I
>>>>> just hold off on this while I look for a more efficient solution?
>>>> I'd say it would be worth it, if the resulting package would be smaller
>>>> and would load quicker. After all, the performance on your laptop might
>>>> not be that significant of a difference, while for someone else with an
>>>> older or slower device, a 30%-speedup is pretty significant.
>>>
>>> You're right, on less performant systems it could make a more
>>> significant difference. And good news, after a few improvements the
>>> load time is now down to ~150ms on my laptop. I also chose to load the
>>> data when the mode is first initialized, so simply loading my package
>>> won't cause the index to be loaded with it. The dumping of the index
>>> is done automatically when it is not present, so it would technically
>>> be fine to just distribute the PHP stubs with the package instead of
>>> the .eld index file. This would just make the user wait a little
>>> longer the first time they use the mode.
>>
>> Would it be possible to load the information when it is required
>> (e.g. necessary for completion)?
>
> This is a good idea. Classes within the project's scope are already
> parsed/indexed on demand like that, I should apply this to stub classes
> as well.
>
> Global/native functions are another story though. Since they're not
> namespaced, it's hard to break them into smaller loadable sets. Aside
> from that, function completion generally requires the list of functions
> to be loaded indiscriminately. Luckily, the functions only make up a
> little over 1/4th of the stubs (~2000 out of ~7300 lines).