emacs-orgmode@gnu.org archives
* multipage html output
@ 2024-07-03  9:44 Orm Finnendahl
  2024-07-03 10:33 ` Dr. Arne Babenhauserheide
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-03  9:44 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

 after my clunky publishing chain from org to gitbook with multipage
output broke down recently, I finally decided to tackle adding an
export backend for multipage html output to org-export.

It is done now and mainly working. The backend uses all the
functionality of the ox html exporter, only slightly modifying the code
in places where it is necessary for multipage output. In addition I
tried to make it as general as possible to enable adding other
multipage backends (like for md output) easily.

Before sharing it I thought it might be a good idea to think about
integrating it properly/officially into org. I would be willing to
provide the code, docs, patches, etc.

There are a couple of decisions to make (should it be integrated as an
option into the html output backend or should it be a separate backend
altogether?  What options concerning footnotes, toc, etc. should be
provided?  etc...) and this mail is basically asking about how to
proceed.

My questions:

- Is there widespread interest to fully integrate it into org mode?

- If so, whom should I contact, or is it expected that I just go ahead
  and supply merge requests?

I'm a bit hesitant to put in the extra work of fully integrating it
without approval from the maintainers to go ahead.

In case someone wants to take a peek at the current state of the code
you can check out my github repository here:

https://github.com/ormf/ox-html-multipage

Be warned that the code is in constant flux and not finalized. I
still have open questions about the best way to integrate it into the
old export engine, such as whether to add optional args to the
transcoding functions or to use properties in the info channel,
etc... Once it is finalized, the current single
page html export will work exactly as before (it already does, but
while checking it out I am modifying the html templates for the
multipage navigation, toc, etc.)

Hope to hear from you, especially if the maintainers are reading this.

--
Orm



* Re: multipage html output
  2024-07-03  9:44 multipage html output Orm Finnendahl
@ 2024-07-03 10:33 ` Dr. Arne Babenhauserheide
  2024-07-03 10:58 ` Christian Moe
  2024-07-03 21:11 ` Rudolf Adamkovič
  2 siblings, 0 replies; 22+ messages in thread
From: Dr. Arne Babenhauserheide @ 2024-07-03 10:33 UTC (permalink / raw)
  To: emacs-orgmode


Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

> https://github.com/ormf/ox-html-multipage

Do I understand it right, that this exports a single org file into
multiple HTML files in the html subfolder?

In the interest of making it possible to build upon the code, can you
make the license GPL v2.0 *or later*?

Best wishes,
Arne
-- 
To be unpolitical
means to be political
without noticing it.
draketo.de



* Re: multipage html output
  2024-07-03  9:44 multipage html output Orm Finnendahl
  2024-07-03 10:33 ` Dr. Arne Babenhauserheide
@ 2024-07-03 10:58 ` Christian Moe
  2024-07-03 11:05   ` Ihor Radchenko
  2024-07-03 21:11 ` Rudolf Adamkovič
  2 siblings, 1 reply; 22+ messages in thread
From: Christian Moe @ 2024-07-03 10:58 UTC (permalink / raw)
  To: emacs-orgmode


Orm Finnendahl writes:

> Hi,
>
>  after my clunky publishing chain from org to gitbook with multipage
> output broke down recently, I finally decided to tackle adding an
> export backend for multipage html output to org-export.
>
> (... snip ...)
>
> - Is there widespread interest to fully integrate it into org mode?

It would be nice to have.

Conceptually, I'd see it as fitting into org-publish, perhaps, rather
than as an exporter? With org-publish-project-alist as a convenient
place to set up various options?

Yours,
Christian



* Re: multipage html output
  2024-07-03 10:58 ` Christian Moe
@ 2024-07-03 11:05   ` Ihor Radchenko
  2024-07-03 14:34     ` Christian Moe
  2024-07-04  9:50     ` Orm Finnendahl
  0 siblings, 2 replies; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-03 11:05 UTC (permalink / raw)
  To: Christian Moe; +Cc: emacs-orgmode

Christian Moe <mail@christianmoe.com> writes:

>>  after my clunky publishing chain from org to gitbook with multipage
>> output broke down recently, I finally decided to tackle adding an
>> export backend for multipage html output to org-export.
>>
>> (... snip ...)
>>
>> - Is there widespread interest to fully integrate it into org mode?
>
> It would be nice to have.
>
> Conceptually, I'd see it as fitting into org-publish, perhaps, rather
> than as an exporter? With org-publish-project-alist as a convenient
> place to set up various options?

Not really. ox-publish is more about exporting multiple input
.org/non-.org files into outputs.

I'd rather see this kind of feature being a part of ox.el - an option to
export one .org to many smaller files. Currently, we only have an option
to export one .org (or part of it) to a single string/file. (And then,
ox-odt has to try various kludges to make things work as expected with
.odt, which consists of multiple files under the hood).

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: multipage html output
  2024-07-03 11:05   ` Ihor Radchenko
@ 2024-07-03 14:34     ` Christian Moe
  2024-07-04  9:50     ` Orm Finnendahl
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Moe @ 2024-07-03 14:34 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode


Ihor Radchenko writes:

> Christian Moe <mail@christianmoe.com> writes:
>
>>>  after my clunky publishing chain from org to gitbook with multipage
>>> output broke down recently, I finally decided to tackle adding an
>>> export backend for multipage html output to org-export.
>>>
>>> (... snip ...)
>>>
>>> - Is there widespread interest to fully integrate it into org mode?
>>
>> It would be nice to have.
>>
>> Conceptually, I'd see it as fitting into org-publish, perhaps, rather
>> than as an exporter? With org-publish-project-alist as a convenient
>> place to set up various options?
>
> Not really. ox-publish is more about exporting multiple input
> .org/non-.org files into outputs.

I was thinking in terms of purpose: organizing export of multiple
outputs to be published together. It does that with multiple inputs
because, as you say, one-to-one export is the option we currently have.

> I'd rather see this kind of feature being a part of ox.el - an option to
> export one .org to many smaller files. Currently, we only have an option
> to export one .org (or part of it) to a single string/file. (And then,
> ox-odt has to try various kludges to make things work as expected with
> .odt, which consist of multiple files under the hood).

Yes, I suppose the code for multipage export belongs on the ox.el
level. And then one would want to be able to use it out of the box
without necessarily having to configure a publishing project, just
relying on sensible defaults. So I take that back.

(There might be some considerations for ox-publish when using
multipage/chunked export *inside* a publishing project, e.g. regarding
which levels of output to include in a sitemap, but that's for another
day.)

Yours,
Christian



* Re: multipage html output
  2024-07-03  9:44 multipage html output Orm Finnendahl
  2024-07-03 10:33 ` Dr. Arne Babenhauserheide
  2024-07-03 10:58 ` Christian Moe
@ 2024-07-03 21:11 ` Rudolf Adamkovič
  2 siblings, 0 replies; 22+ messages in thread
From: Rudolf Adamkovič @ 2024-07-03 21:11 UTC (permalink / raw)
  To: Orm Finnendahl, emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

> - Is there widespread interest to fully integrate it into org mode?

Definitely. :)

Rudy
-- 
"It is no paradox to say that in our most theoretical moods we may be
nearest to our most practical applications."  --- Alfred North
Whitehead, 1861-1947

Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
http://adamkovic.org



* Re: multipage html output
  2024-07-03 11:05   ` Ihor Radchenko
  2024-07-03 14:34     ` Christian Moe
@ 2024-07-04  9:50     ` Orm Finnendahl
  2024-07-04 11:41       ` Ihor Radchenko
  1 sibling, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-04  9:50 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

On Wednesday, 3 July 2024 at 11:05:39 (+0000), Ihor Radchenko
wrote:
> 
> Not really. ox-publish is more about exporting multiple input
> .org/non-.org files into outputs.
> 
> I'd rather see this kind of feature being a part of ox.el - an option to
> export one .org to many smaller files. Currently, we only have an option
> to export one .org (or part of it) to a single string/file. (And then,
> ox-odt has to try various kludges to make things work as expected with
> .odt, which consist of multiple files under the hood).

 that is/was my intention: Basically only a very small
change to ox.el was necessary to make it work (it's mentioned in the
comment at the top of ox-html-multipage.el in my github repository):

Currently `org-export-as' combines parsing the org document into a
global parse tree with all additional options applied and serializing
that into the final output target format. My code simply splits the
code sections of these tasks into two separate functions, which are
called by org-export-as, `org-export--collect-tree-info' and
`org-export--transcode-headline'. The advantage of this approach is
that it is fully compatible with the prior code, but gives the
necessary flexibility to the backend export code to split up the
global parse tree before serializing.

The multipage html backend (ox-html-multipage.el) takes care of
generating the global parse tree with org-export--headline, divides
that tree into the subtrees of the individual pages, then calls the
serializing function for each of the subtrees and writes the results
to file. Is that along the lines of what you meant?
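
A minimal sketch of the splitting step described above, assuming pages
are cut at a fixed headline level. `my/split-tree-at-level' and
`my/page-titles' are hypothetical helpers for illustration only; they
are not taken from ox.el or from the repository and rely only on
`org-element-parse-buffer' and `org-element-map':

```elisp
(require 'org)
(require 'org-element)

(defun my/split-tree-at-level (tree level)
  "Return the headline subtrees of TREE whose level equals LEVEL.
Hypothetical helper: each returned headline stands for one output page."
  (org-element-map tree 'headline
    (lambda (hl)
      (when (= (org-element-property :level hl) level)
        hl))))

(defun my/page-titles (org-text level)
  "Parse ORG-TEXT and return the titles of its pages split at LEVEL."
  (with-temp-buffer
    (insert org-text)
    (delay-mode-hooks (org-mode))
    (mapcar (lambda (hl) (org-element-property :raw-value hl))
            (my/split-tree-at-level (org-element-parse-buffer) level))))
```

Calling (my/page-titles "* One\n** Sub\n* Two\n" 1) lists the two
top-level headlines; a real backend would serialize each subtree
instead of just reading its title.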

In the meantime I thought about the proposed backend. Maybe it's a
good idea to integrate the single page *and* the multipage backend
into one backend altogether: The backend *always* produces multipage
output, but you can define the level at which the pages are split with
an #+OPTIONS: entry in the org file. Setting the default level to 0 if
the option is not set will generate the exact same output as the old
backend without breaking anything for anybody. I'm quite sure it'll
work and, as I said, it's mainly done and wouldn't require a lot of
work.

What do you think?

--
Orm



* Re: multipage html output
  2024-07-04  9:50     ` Orm Finnendahl
@ 2024-07-04 11:41       ` Ihor Radchenko
  2024-07-04 13:33         ` Orm Finnendahl
  0 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-04 11:41 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

>> I'd rather see this kind of feature being a part of ox.el - an option to
>> export one .org to many smaller files. Currently, we only have an option
>> to export one .org (or part of it) to a single string/file. (And then,
>> ox-odt has to try various kludges to make things work as expected with
>> .odt, which consist of multiple files under the hood).
>
>  that is/was my intention: Basically there was only a very small
> change to ox.el necessary to make it work (it's mentioned in the
> comment on top of ox-multipage-html in my github repository):
>
> Currently `org-export-as' combines parsing the org document into a
> global parse tree with all additional options applied and serializing
> that into the final output target format. My code simply splits the
> code sections of these tasks into two separate functions, which are
> called by org-export-as, `org-export--collect-tree-info' and
> `org-export--transcode-headline'. The advantage of this approach is
> that it is fully compatible with the prior code, but gives the
> necessary flexibility to the backend export code to split up the
> global parse tree before serializing.

This makes sense.

Although, multipage export may imply two different things:
1. An ability to produce multiple pages from parts of the original Org
   file.
2. An ability to produce multiple pages from a single part of an Org file.
   For example, consider an Org document with images exported to
   ODT. The images should be stored alongside XML content file and
   referenced from there. So, export produces multiple files from the
   same document/subtree.
   
Your approach only addresses (1), but not (2).

That said, even having (1) is a welcome improvement.

> The multipage html backend (ox-html-multipage.el) takes care of
> generating the global parse tree with org-export--headline, divides
> that tree into the subtrees of the individual pages, then calls the
> serializing function for each of the subtrees and writes the results
> to file. Is that along the lines of what you meant?

Yes, but we also need to carefully discuss the rules for how the full
parse tree is separated into subtrees. Your proof of concept code
hard-codes these rules.

> In the meantime I thought about the proposed backend. Maybe it's a
> good idea to integrate the single page *and* the multipage backend
> into one backend altogether: The Backend *always* produces multipage
> output, but you can define the level at which the pages are split with
> an #+OPTIONS: entry in the org file. Setting the default level to 0 if the
> option is not set will generate the exact same output as the old
> backend without breaking anything for anybody. I'm quite sure it'll
> work and as I said it's mainly done and wouldn't require a lot of
> work.

1. Most of the existing backends are written to produce a single
   page. So, our design of ox.el part should be able to handle
   those. What you proposed (calling the same backend on pre-split parse
   tree) sounds good in this context.

2. Some backends, as you proposed, may target multipage export from the
   very beginning. So, we need to provide some way for the backend (in
   org-export-define*-backend) to specify that it wants to split the
   original parse tree. I imagine some kind of option with default
   values configured via backend, but optionally overwritten by user
   settings/in-buffer keywords.

3. Your suggestion to add a new export option for splitting based on
   headline level is one idea.

   Another idea is to split out subtrees with :EXPORT_FILE_NAME:
   property.

4. One possible extra feature might be exporting only a part of the
   original Org file to separate pages. Say, only pages with specific
   tag. The whole original Org file is also exported, replacing the
   split-out parts with, for example, links. This will generalize
   "index" pages from ox-publish.

5. We need to consider the rules used to generate export file names.
   Currently, we choose between :EXPORT_FILE_NAME: property,
   #+EXPORT_FILE_NAME: keyword, and the original file name.

   As I see in your code, you also introduced deriving file name from
   the headline title.

6. I can see people flipping between exporting the whole document and
   multipage document. We probably need some kind of easy switch in M-x
   org-export-dispatch to choose how to export.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: multipage html output
  2024-07-04 11:41       ` Ihor Radchenko
@ 2024-07-04 13:33         ` Orm Finnendahl
  2024-07-04 16:20           ` Ihor Radchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-04 13:33 UTC (permalink / raw)
  To: emacs-orgmode

Hi Ihor,

 thanks for your time to study the code and your very valuable input,
much appreciated!

On Thursday, 4 July 2024 at 11:41:35 (+0000), Ihor Radchenko
wrote:
>
> 2. An ability to produce multiple pages from a single part of Org file.
>    For example, consider an Org document with images exported to
>    ODT. The images should be stored alongside XML content file and
>    referenced from there. So, export produces multiple files from the
>    same document/subtree.
>    
> Your approach only addresses (1), but not (2).

Sure. I'm not at all familiar with the peculiarities of other output
backends, but see your point. If you can give any hints or have any
ideas *how* we could find general rules for separating the subtrees,
which cover foreseeable use cases, or devise a flexible mechanism for
doing so, I'd be glad to help setting them up and implementing them. I
definitely agree, the code should be as general as possible while
providing complete backward compatibility.

> 1. Most of the existing backends are written to produce a single
>    page. So, our design of ox.el part should be able to handle
>    those. What you proposed (calling the same backend on pre-split parse
>    tree) sounds good in this context.

Ok.

> 2. Some backends, as you proposed, may target multipage export from the
>    very beginning. So, we need to provide some way for the backend (in
>    org-export-define*-backend) to specify that it wants to split the
>    original parse tree. I imagine some kind of option with default
>    values configured via backend, but optionally overwritten by user
>    settings/in-buffer keywords.

I'll look into that and maybe I can come up with something. I was
hesitant to propose anything as I tried to stay as limited as possible
and not get too deep into changing things. If you have suggestions,
please let me know.

> 3. Your suggestion to add a new export option for splitting based on
>    headline level is one idea.
> 
>    Another idea is to split out subtrees with :EXPORT_FILE_NAME:
>    property.

I'm not sure I fully understand what you mean: Do you mean specifying
different :EXPORT_FILE_NAME: properties throughout the same document
and then export accordingly?

> 4. One possible extra feature might be exporting only a part of the
>    original Org file to separate pages. Say, only pages with specific
>    tag. The whole original Org file is also exported, replacing the
>    split-out parts with, for example, links. This will generalize
>    "index" pages from ox-publish.

Very nice idea! Maybe along these lines: I thought about "Master" org
files which combine different documentations by linking to them in
some sort of top menu that is included on every page of all these
documentations, so that a single documentation can be generated
without having to recompile everything. But for now I'd
prefer to first get it working and then think about such extensions (I
have more ideas for different extensions and "plugins" which could be
useful). It shouldn't be too hard to implement at a later point and
probably also wouldn't need a complete rewrite.

> 5. We need to consider the rules used to generate export file names.
>    Currently, we choose between :EXPORT_FILE_NAME: property,
>    #+EXPORT_FILE_NAME: keyword, and the original file name.
> 
>    As I see in your code, you also introduced deriving file name from
>    the headline title.

Exactly. I wanted to make sure the file names sort correctly, are
unique, and that each name can be related to the section it names at the
directory level. I also thought about making it user-configurable, but
first wanted to implement a working solution.

> 6. I can see people flipping between exporting the whole document and
>    multipage document. We probably need some kind of easy switch in M-x
>    org-export-dispatch to choose how to export.

Sure, that is the disadvantage of my proposal to make everything a
"multipage" document. Another disadvantage is that when the user
chooses to open the final document or display it in a buffer the user
can't choose whether to only open/display one page or every exported
page. In most circumstances it should be advisable to just
open/display the first page. We can also just add a switch between
single-page and multipage, with multipage always just exporting to
file, but that also has disadvantages.

As the code I proposed is encapsulated in the html backend and not
spread all over the place, I will now first go ahead and finalize
the existing code into a fully working setup. AFAICT adapting that to
other needs shouldn't require a complete rewrite. And I might be
around for a while ;-)

--
Orm



* Re: multipage html output
  2024-07-04 13:33         ` Orm Finnendahl
@ 2024-07-04 16:20           ` Ihor Radchenko
  2024-07-07 19:33             ` Orm Finnendahl
  2024-07-07 20:50             ` Orm Finnendahl
  0 siblings, 2 replies; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-04 16:20 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

> Sure. I'm not at all familiar with the peculiarities of other output
> backends, but see your point. If you can give any hints or have any
> ideas *how* we could find general rules for separating the subtrees,
> which cover foreseeable use cases, or devise a flexible mechanism for
> doing so, I'd be glad to help setting them up and implementing them. I
> definitely agree, the code should be as general as possible while
> providing complete backward compatibility.

I think that the easiest would be adding a new option to
`org-export-options-alist' - it is already extendable for individual
backends and allows users to tweak things via in-buffer keywords,
properties, variables, and export options.

The most generic rule would be some kind of function that takes an AST
node as input and returns whether that node should go to a separate
file or not and, if yes, tells (1) which export backend to use to export
that subtree to a file (may as well allow exporting to different
formats, while we are at it); (2) what export parameters are to be
used for that export, (possibly) including the file path.
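
A minimal sketch of such a rule function, under stated assumptions: the
name `my/page-rule', the returned plist keys and the "blog" tag are
hypothetical and only illustrate the shape of the idea; nothing here is
an existing ox.el interface.

```elisp
(require 'org)
(require 'org-element)

(defun my/page-rule (node)
  "Return a plist describing how to page out NODE, or nil to keep it inline.
Hypothetical rule: level-2 headlines tagged \"blog\" become HTML pages
named after their title."
  (when (and (eq (org-element-type node) 'headline)
             (= (org-element-property :level node) 2)
             (member "blog" (org-element-property :tags node)))
    (list :backend 'html
          :export-file-name (org-element-property :raw-value node))))

(defun my/collect-pages (org-text)
  "Parse ORG-TEXT and return the rule results for every paged-out headline."
  (with-temp-buffer
    (insert org-text)
    (delay-mode-hooks (org-mode))
    (org-element-map (org-element-parse-buffer) 'headline #'my/page-rule)))
```

Headlines for which the rule returns nil simply stay in the main
document; the others carry their own backend and file name.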

Then, in addition to the most generic (and most flexible) "rule being an
Elisp function", we can allow some simplified semantics to define rules.

The semantics should probably give a couple of toggles to customize:
(1) which subtrees are selected for export; (2) which export backend is
used; (3) how their file names are generated; (4) (optional) how they are
represented when exporting the whole original file, e.g. whether to put
links to exported files in place of their subtrees; (5) (optional) how
the original file is represented in the exported subtrees, e.g. whether
to put a backlink to the parent file.

The subtree selection may boil down to the usual TAGS matcher (or
function), as described in "11.3.3 Matching tags and properties" section
of the manual. This will cover the previously discussed separation based
on headline level, a tag, or a property.
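
As a rough illustration of selection via a match string, the existing
`org-map-entries' already accepts the TAGS matcher syntax. The wrapper
`my/select-page-headings' below is a hypothetical helper, not a
proposed API:

```elisp
(require 'org)

(defun my/select-page-headings (org-text match)
  "Return the headings in ORG-TEXT selected by the tags/property MATCH string."
  (with-temp-buffer
    (insert org-text)
    (delay-mode-hooks (org-mode))
    ;; `org-get-heading' with four non-nil arguments strips tags, TODO
    ;; keyword, priority cookie and COMMENT marker from the heading text.
    (org-map-entries (lambda () (org-get-heading t t t t)) match)))
```

With a match such as "LEVEL=2+blog" this picks only second-level
headings carrying the blog tag, which is the kind of selector the rules
could reuse.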

The export backend selection may be realized by allowing multiple rules
with each rule defining selection/backend/file name/....

In terms of the value semantics in Elisp, I am thinking about something
re-using backend definition format:

(setq org-export-pages
      '((:selector "LEVEL=2+blog+TODO=DONE"
         :backend html
         ;; Completely remove the exported subtree if the original
         ;; document is being exported.
         :page-transcoder nil
         ;; or :page-transcoder #'org-export-page-as-heading-with-link
         :export-file-name "%{TITLE}-%{page-number}") ;; or some other kind of template syntax

        (:selector a-function-accepting-ast-node
         :source-backend any
         :backend
         (:parent html ;; `org-export-define-derived-backend'-like semantics
          :options-alist
          ;; Do not export private headings in HTML pages.
          ((:exclude-tags "EXCLUDE_TAGS" nil (cons "private" org-export-exclude-tags) split))))

        (:selector "+export_ascii_page"
         :source-backend html ; only use this rule when exporting to html
         :backend
         (:parent ascii
          :translate-alist
          ((template .
            (lambda (contents info)
              (format "Paged out from %s\n%s"
                      (plist-get
                       ;; INFO channel for parent document
                       (plist-get info :page-source)
                       :title)
                      (org-ascii-template contents info)))))))))

>> 2. Some backends, as you proposed, may target multipage export from the
>>    very beginning. So, we need to provide some way for the backend (in
>>    org-export-define*-backend) to specify that it wants to split the
>>    original parse tree. I imagine some kind of option with default
>>    values configured via backend, but optionally overwritten by user
>>    settings/in-buffer keywords.
>
> I'll look into that and maybe I can come up with something. I was
> hesitant to propose anything as I tried to stay as limited as possible
> and not get too deep into changing things. If you have suggestions,
> please let me know.

One way could be simply adding an option like :selector above to the
backend definition. Then, it will be used as default selector:

(setq org-export-pages
  '((:selector default :backend html)) ; export pages to html with default selector
)

or even

(setq org-export-pages
  '((:backend html)) ; export pages to html with default selector
)

or just

;; export using the same target backend as selected in the export menu
(setq org-export-pages t)
;; (setq org-export-pages nil) - existing single page export
;; (setq org-export-pages 'only-pages) - only export pages, ignore original file

>> 3. Your suggestion to add a new export option for splitting based on
>>    headline level is one idea.
>> 
>>    Another idea is to split out subtrees with :EXPORT_FILE_NAME:
>>    property.
>
> I'm not sure I fully understand what you mean: Do you mean specifying
> different :EXPORT_FILE_NAME: properties throughout the same document
> and then export accordingly?

Yes. It is re-using the existing idea with subtree export

13.2 Export Settings

‘EXPORT_FILE_NAME’
     The name of the output file to be generated.  Otherwise, Org
     generates the file name based on the buffer name and the extension
     based on the backend format.

If a subtree has that property set, it is used as output file name.
Since there is usually no reason to set this property unless you also
want to export subtree to individual file, it makes sense to use this as
selector for what to export as pages.

Example:

#+TITLE: Index document

* Emacs notes
** Emacs blog post #1
:PROPERTIES:
:EXPORT_FILE_NAME: my-first-post
:END:
...
** Fleeting note at [2024-06-20 Thu 22:16]
Some notes, no need to export them.

* Personal notes
** Personal blog post #1
:PROPERTIES:
:EXPORT_FILE_NAME: private/personal-post-trial
:END:
...
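
A minimal sketch of using the property as a page selector, assuming
only existing Org functions (`org-map-entries', `org-entry-get');
`my/pages-by-export-file-name' is a hypothetical helper name:

```elisp
(require 'org)

(defun my/pages-by-export-file-name (org-text)
  "Return (HEADING . FILE) pairs for subtrees of ORG-TEXT with EXPORT_FILE_NAME.
Only headings that set the property themselves are returned."
  (with-temp-buffer
    (insert org-text)
    (delay-mode-hooks (org-mode))
    (delq nil
          (org-map-entries
           (lambda ()
             ;; No inheritance: look at the entry's own property drawer.
             (let ((file (org-entry-get (point) "EXPORT_FILE_NAME")))
               (when file
                 (cons (org-get-heading t t t t) file))))))))
```

Run on the example above, this would keep the two blog posts and skip
the fleeting note and the parent headings.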

>> 6. I can see people flipping between exporting the whole document and
>>    multipage document. We probably need some kind of easy switch in M-x
>>    org-export-dispatch to choose how to export.
>
> Sure, that is the disadvantage of my proposal to make everything a
> "multipage" document. Another disadvantage is that when the user
> chooses to open the final document or display it in a buffer the user
> can't choose whether to only open/display one page or every exported
> page. In most circumstances it should be advisable to just
> open/display the first page. We can also just add a switch between
> single-page and multipage, with multipage always just exporting to
> file, but that also has disadvantages.

What to open is a minor detail, really. It can be worked out any moment
we need to. The most sensible default, IMHO, is to open dired in the
containing directory with all the exported pages.

> As the code I proposed is encapsulated in the html backend and not
> spreading all over the place, I will now first go ahead to finalize
> the existing code to a fully working setup. AFAICT adapting that to
> other needs shouldn't require a complete rewrite. And I might be
> around for a while ;-)

I advise against doing this.
While reading your code, I saw that you used some html-specific
functions for modifications in ox.el. If you start by modifying ox.el in
the Org git repo directly, simply doing "make compile" will warn about
instances of using functions not defined in ox.el.
Another advantage of editing ox.el in the Org repository is that
you can run "make test" any time and see if you managed to break Org :)

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: multipage html output
@ 2024-07-06  5:47 Pedro Andres Aranda Gutierrez
  2024-07-06  9:04 ` Orm Finnendahl
  0 siblings, 1 reply; 22+ messages in thread
From: Pedro Andres Aranda Gutierrez @ 2024-07-06  5:47 UTC (permalink / raw)
  To: orm.finnendahl; +Cc: Ihor Radchenko, Org Mode List


Sorry for bumping in, I've been more off than on in the last couple of
weeks...
Just a stupid question: have you considered any marker to force a page
break?
That would make this functionality portable to other exporters like
LaTeX, where you can force a page break with \clearpage or
\cleardoublepage.

(Hopefully) my .2 cents, /PA

-- 
Questions are not there to be answered,
questions are there to be asked
Georg Kreisler

Headaches with a Juju log:
unit-basic-16: 09:17:36 WARNING juju.worker.uniter.operation we should run
a leader-deposed hook here, but we can't yet



* Re: multipage html output
  2024-07-06  5:47 Pedro Andres Aranda Gutierrez
@ 2024-07-06  9:04 ` Orm Finnendahl
  0 siblings, 0 replies; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-06  9:04 UTC (permalink / raw)
  To: Org Mode List

Hi,

On Saturday, 6 July 2024 at 07:47:43 (+0200), Pedro Andres Aranda Gutierrez wrote:
> Sorry for bumping in, I've been more off than on in the last couple of
> weeks...
> Just a stupid question: have you considered any marker to force a page
> break?
> That would make this functionality portable to other exporters like
> LaTeX, where you can force a page break with \clearpage or
> \cleardoublepage.

 although this is of course possible, currently I'm not planning to
implement it.

Regarding html export I see some problems with that idea:

1. It would either open a new can of worms if this page were added
   to the toc, with all sorts of ensuing problems like naming, etc. and
   getting out of sync with the LaTeX document's toc,

or

2. those additional pages don't get added to the toc and are only
   reachable by navigation elements, which I consider suboptimal (and
   you'd still have to name them).

In any case, I'm currently facing many problems concerning the
glorious hairy details and will be glad if I can sort them out in a way
that is general enough to be added to ox. Adding additional
engines to handle page breaks the way you envision should then be
feasible without reinventing the wheel.

--
Orm



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-04 16:20           ` Ihor Radchenko
@ 2024-07-07 19:33             ` Orm Finnendahl
  2024-07-08 15:29               ` Ihor Radchenko
  2024-07-07 20:50             ` Orm Finnendahl
  1 sibling, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-07 19:33 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi,

This is a report on the current state of the html multipage export
backend: I finished most of the heavy lifting and am currently trying
to integrate it with the old backend into a single file.

For now I plan to use a custom menu entry ('m') in the export dialog
rather than an option in the file. The main reason is
that I like to be able to switch between output formats easily without
having to change the document. But that's debatable. I could also
implement it with an option in the document and I'm open to opinions.

For the backend I'm planning to implement the following options
(as custom variables, which can be overridden in the
document):

- org-html-multipage-export-directory

  The directory for the exported files (relative or absolute).

- org-html-multipage-head

  (similar to HTML_HEAD but will be used instead of the HTML_HEAD for
  custom css/js)

- org-html-multipage-front-matter

  A list to specify pages in front of the headlines of the
  document. Possible values are 'title, 'title-toc and 'toc. title-toc
  is a combined page containing the title and the toc. Multiple
  entries are possible.

- org-html-multipage-join-first-subsection

  Boolean: Non-nil means that the first subsection of a section
  without a body will be joined on the section page (recursively). See
  my generated example pages linked below (Chapters 4, 5 and 7 for a
  recursive example)

- org-html-multipage-split

  How to split the document. Possible values are

  'toc for generating a page for each toc entry.
  
  'export-filename for splitting into pages along :EXPORT_FILE_NAME:
  properties. The autogenerated filename mechanism for the other
  options will be overridden in this case.

  A number for the depth to split at (similar to the value for H: or
  toc:). I haven't tested all options yet but will see whether/how it
  works.

- org-html-multipage-open

  Whether and where to open the first page of the document after
  export. Possible values are 'browser 'buffer or nil. (As Ihor
  mentioned this is a minor issue).
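
As a rough illustration, the options listed above might be set like
this. This is only a sketch: the variables are merely planned in this
message and are not part of released Org, and the directory name,
stylesheet name and chosen values are made up for the example.

```elisp
;; Hypothetical configuration; none of these variables exist in released Org.
;; Names are taken from the list above, values are assumptions.
(setq org-html-multipage-export-directory "html/"
      org-html-multipage-head
      "<link rel=\"stylesheet\" href=\"multipage.css\" />"
      ;; One title page followed by one toc page.
      org-html-multipage-front-matter '(title toc)
      org-html-multipage-join-first-subsection t
      ;; One page per toc entry.
      org-html-multipage-split 'toc
      ;; Open the first page in the browser after export.
      org-html-multipage-open 'browser)
```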

This is fairly straightforward for me to implement (it's mostly done
already). Ihor's suggestions are excellent, but IIUC they
target a larger and more general context, which of course is
desirable. I have to study the ideas more thoroughly to see how
difficult/time-consuming they will be to implement. It might be
better to do it in two steps to keep it manageable for me. I'm
pretty sure that the current approach can be adapted to the larger
context easily, so the work is not in vain.

In addition I have a question about the html output layout
structure. Here is an example of a file generated with the current
code with some preliminary layout. It might give an idea about my use
case:

https://www.selma.hfmdk-frankfurt.de/finnendahl/klangsynthesebuch/01_00_00_vorwort.html#orge24571b

Regardless of the colours, the file has a slightly different hierarchy
than the single-page html template of Org mode and is more oriented
towards the layout of present-day documentation, with a (hideable) toc at
the side of every page, rather than the texinfo-oriented layout used by
the Org mode manual. If my code gets accepted/merged into Org, what should
be the default layout shipped with multipage output? FYI: The
visibility of the toc entries is managed by the css and the whole toc
is included on each page (its visibility could be managed with js
as well). Should I rather go for the classic texinfo view?

And now just a short answer to Ihor's remarks.

On Thursday, 04 July 2024 at 16:20:29 (+0000), Ihor Radchenko wrote:
> While reading your code, I saw that you used some html-specific
> functions for modifications in ox.el. If you start by modifying ox.el in
> Org git repo directly, simply doing "make compile" will warn about
> instances of using functions not defined in ox.el.
> Another advantage of editing the ox.el and using Org repository is that
> you can run "make test" any time and see if you managed to break Org :)

Of course. I never intended to corrupt ox.el with html-specific stuff;
that was just preliminary while getting acquainted with the
code. Currently I'm in the process of separating everything and
reducing it to the minimal requirements for change. I'll let you know
when it's done.

--
Orm


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-04 16:20           ` Ihor Radchenko
  2024-07-07 19:33             ` Orm Finnendahl
@ 2024-07-07 20:50             ` Orm Finnendahl
  2024-07-08 15:05               ` Ihor Radchenko
  1 sibling, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-07 20:50 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hi Ihor,

I'm trying to grasp what you are proposing and have some questions to
make sure I've understood (please correct me if I'm wrong):

- Your idea is to add an option to the backend definition called
  org-export-pages which is a plist containing information about the
  way to export the document in case some "multipage" option is chosen
  in the export dialog.

- Am I right that you suggest that all these org-export-pages
  properties can be overwritten in the header of the org file?

- If that is correct I assume multipage export should then be a
  generic option common to different export backends (if defined)
  (something like "export-as-multipage") and the question is how to
  specify that when exporting. Should this option just be listed in
  the export dialog for every export backend which supports it (like
  in my current approach for html) and when choosing it the rules of
  the current definition of org-export-pages in the current context
  are used?

- This implies that the code handling this is done in ox.el like this:

  The export-pages function in ox.el
  
  1. generates the parse-tree
  
  2. extracts the subtrees according to the rules

  3. calls org-export-to-file on the backends for each of them.

  4. optionally also exports the whole document, maybe stripped of
     its exported sections (replaced by links, etc.)
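
As a rough illustration of steps 1 and 2 only (a sketch, not the code
from the repository): the function name `my/org-collect-page-headlines'
is made up, and selecting by headline level is just one assumed rule.

```elisp
(require 'org)
(require 'org-element)

(defun my/org-collect-page-headlines (level)
  "Return the raw titles of all headlines at LEVEL in the current Org buffer.
Hypothetical helper: it only shows how candidate page subtrees could be
picked from the parse tree, it does not export anything."
  ;; Step 1: parse the buffer (headline granularity is enough here).
  (org-element-map (org-element-parse-buffer 'headline) 'headline
    ;; Step 2: keep only the headlines matching the (assumed) rule.
    (lambda (hl)
      (when (= (org-element-property :level hl) level)
        (org-element-property :raw-value hl)))))
```

Called in an Org buffer as (my/org-collect-page-headlines 1), it returns
the titles of the top-level headlines, each of which would then be a
candidate page for step 3.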

If this is the way you suggest it, it doesn't sound too complicated as
most of it is done already.

My only concern is that in this case org-export-pages is not really
backend specific and therefore the place for it semantically shouldn't
be in the definition of the backend, but separate from it.

The backend should just define a general function for exporting a
subtree to a file for the multipage case as this might differ from the
definition for single file output of the complete parse-tree (with the
name of this general multipage export function being the same in all
backends which support multipage output).

This would also imply a mechanism to define different org-export-pages
plists and select from them before exporting by calling a generic
backend-agnostic org-export-to-pages function in ox.el. This is very
elegant but also somewhat different from the current layout of
org-export which is single-page single-backend centered. Hmm...

--
Orm


On Thursday, 04 July 2024 at 16:20:29 (+0000), Ihor Radchenko wrote:
> Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:
> 
> > Sure. I'm not at all familiar with the peculiarities of other output
> > backends, but see your point. If you can give any hints or have any
> > ideas *how* we could find general rules for separating the subtrees,
> > which cover foreseeable use cases, or devise a flexible mechanism for
> > doing so, I'd be glad to help setting them up and implementing them. I
> > definitely agree, the code should be as general as possible while
> > providing complete backward compatibility.
> 
> I think that the easiest would be adding a new option to
> `org-export-options-alist' - it is already extendable for individual
> backends and allows users to tweak things via in-buffer keywords,
> properties, variables, and export options.
> 
> The most generic rule would be some kind of function that takes AST
> node as input and returns whether that node should be going to a separate
> file or not, and if yes, tell (1) which export backend to use to export
> that subtree to a file (may as well allow exporting to different
> formats, while we are at it); (2) what are the export parameters to be
> used for that export, (possibly) including the file path.
> 
> Then, in addition to the most generic (and most flexible) "rule being an
> Elisp function", we can allow some simplified semantics to define rules.
> 
> The semantics should probably give a couple of toggles to customize:
> (1) which subtrees are selected for export; (2) which export backend is
> used (3) how their file names are generated; (4) (optional) how they are
> represented when exporting the whole original file; e.g. whether to put
> links to exported files in place of their subtrees; (5) (optional) how
> the original file is represented in the exported subtrees; e.g. whether
> to put backlink to parent file
> 
> The subtree selection may boil down to the usual TAGS matcher (or
> function), as described in "11.3.3 Matching tags and properties" section
> of the manual. This will cover the previously discussed separation based
> on headline level, a tag, or a property.
> 
> The export backend selection may be realized by allowing multiple rules
> with each rule defining selection/backend/file name/....
> 
> In terms of the value semantics in Elisp, I am thinking about something
> re-using backend definition format:
> 
> (setq org-export-pages
>       ;; A list of rules; each rule is a plist.
>       '((:selector "LEVEL=2+blog+TODO=DONE"
>          :backend html
>          ;; Completely remove the exported subtree if the original
>          ;; document is being exported.
>          :page-transcoder nil
>          ;; or :page-transcoder #'org-export-page-as-heading-with-link
>          ;; "%{...}" stands for some kind of template syntax.
>          :export-file-name "%{TITLE}-%{page-number}")
> 
>         (:selector a-function-accepting-ast-node
>          :source-backend any
>          :backend
>          (:parent html ;; `org-export-define-derived-backend'-like semantics
>           :options-alist
>           ;; Do not export private headings in HTML pages.
>           ((:exclude-tags "EXCLUDE_TAGS" nil
>             (cons "private" org-export-exclude-tags) split))))
> 
>         (:selector "+export_ascii_page"
>          :source-backend html ; only use this rule when exporting to html
>          :backend
>          (:parent ascii
>           :translate-alist
>           ((template .
>             (lambda (contents info)
>               (format "Paged out from %s\n%s"
>                       (plist-get
>                        ;; INFO channel for parent document
>                        (plist-get info :page-source)
>                        :title)
>                       (org-ascii-template contents info)))))))))
> 
> >> 2. Some backends, as you proposed, may target multipage export from the
> >>    very beginning. So, we need to provide some way for the backend (in
> >>    org-export-define*-backend) to specify that it wants to split the
> >>    original parse tree. I imagine some kind of option with default
> >>    values configured via backend, but optionally overwritten by user
> >>    settings/in-buffer keywords.
> >
> > I'll look into that and maybe I can come up with something. I was
> > hesitant to propose anything as I tried to stay as limited as possible
> > and not get too deep into changing things. If you have suggestions,
> > please let me know.
> 
> One way could be simply adding an option like :selector above to the
> backend definition. Then, it will be used as default selector:
> 
> (setq org-export-pages
>   '(:selector default :backend html) ; export pages to html with default selector
> )
> 
> or even
> 
> (setq org-export-pages
>   '(:backend html) ; export pages to html with default selector
> )
> 
> or just
> 
> ;; export using the same target backend as selected in the export menu
> (setq org-export-pages t)
> ;; (setq org-export-pages nil) - existing single page export
> ;; (setq org-export-pages 'only-pages) - only export pages, ignore original file
> 
> >> 3. Your suggestion to add a new export option for splitting based on
> >>    headline level is one idea.
> >> 
> >>    Another idea is to split out subtrees with :EXPORT_FILE_NAME:
> >>    property.
> >
> > I'm not sure I fully understand what you mean: Do you mean specifying
> > different :EXPORT_FILE_NAME: properties throughout the same document
> > and then export accordingly?
> 
> Yes. It is re-using the existing idea with subtree export
> 
> 13.2 Export Settings
> 
> ‘EXPORT_FILE_NAME’
>      The name of the output file to be generated.  Otherwise, Org
>      generates the file name based on the buffer name and the extension
>      based on the backend format.
> 
> If a subtree has that property set, it is used as output file name.
> Since there is usually no reason to set this property unless you also
> want to export subtree to individual file, it makes sense to use this as
> selector for what to export as pages.
> 
> Example:
> 
> #+TITLE: Index document
> 
> * Emacs notes
> ** Emacs blog post #1
> :PROPERTIES:
> :EXPORT_FILE_NAME: my-first-post
> :END:
> ...
> ** Fleeting note at [2024-06-20 Thu 22:16]
> Some notes, no need to export them.
> 
> * Personal notes
> ** Personal blog post #1
> :PROPERTIES:
> :EXPORT_FILE_NAME: private/personal-post-trial
> :END:
> ...
> 
> >> 6. I can see people flipping between exporting the whole document and
> >>    multipage document. We probably need some kind of easy switch in M-x
> >>    org-export-dispatch to choose how to export.
> >
> > Sure, that is the disadvantage of my proposal to make everything a
> > "multipage" document. Another disadvantage is that when the user
> > chooses to open the final document or display it in a buffer the user
> > can't choose whether to only open/display one page or every exported
> > page. In most circumstances it should be advisable to just
> > open/display the first page. We can also just add a switch between
> > single-page and multipage, with multipage always just exporting to
> > file, but that also has disadvantages.
> 
> What to open is a minor detail, really. It can be worked out any moment
> we need to. The most sensible default, IMHO, is to open dired with the
> containing directory with all the exported pages.
> 
> > As the code I proposed is encapsulated in the html backend and not
> > spreading all over the place, I will now first go ahead to finalize
> > the existing code to a fully working setup. AFAICT adapting that to
> > other needs shouldn't require a complete rewrite. And I might be
> > around for a while ;-)
> 
> I advise against doing this.
> While reading your code, I saw that you used some html-specific
> functions for modifications in ox.el. If you start by modifying ox.el in
> Org git repo directly, simply doing "make compile" will warn about
> instances of using functions not defined in ox.el.
> Another advantage of editing the ox.el and using Org repository is that
> you can run "make test" any time and see if you managed to break Org :)
> 
> -- 
> Ihor Radchenko // yantar92,
> Org mode contributor,
> Learn more about Org mode at <https://orgmode.org/>.
> Support Org development at <https://liberapay.com/org-mode>,
> or support my work at <https://liberapay.com/yantar92>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-07 20:50             ` Orm Finnendahl
@ 2024-07-08 15:05               ` Ihor Radchenko
  2024-07-08 15:41                 ` Orm Finnendahl
  0 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-08 15:05 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

>  I'm trying to grasp what you are proposing and have some questions to
> make sure I've understood (please correct me if I'm wrong):

(Just for some context, do not take my ideas as something you must
follow 100% accurately. I am largely brainstorming here. So, feel free
to disagree, propose anything alternative, etc; My main focus in this
discussion is that multipage export should be backend-agnostic if
possible)

> - Your idea is to add an option to the backend definition called
>   org-export-pages which is a plist containing information about the
>   way to export the document in case some "multipage" option is chosen
>   in the export dialog.

Yup. Not an "option" in a sense of variable, but a proper export option
that can be set via (1) variable; (2) backend option plist (in other
words, overridden by backends); (3) in-buffer keyword, locally.

> - Am I right that you suggest that all these org-export-pages
>   properties can be overwritten in the header of the org file?

Yes. But that may be controlled by the backends, as with any other
export option. To illustrate, there is CREATOR option that ox-html
re-defines like the following:

;; Original global definition in ox.el
    (:creator "CREATOR" nil org-export-creator-string)

;; Override inside ox-html.el.  In this example, it uses a backend-specific
;; customization instead of `org-export-creator-string', but anything
;; at all can be overridden.
    (:creator "CREATOR" nil org-html-creator-string)

In both cases, the :creator export option can be set in buffer via,
#+CREATOR: name
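
The same mechanism could carry a multipage option. A minimal sketch,
assuming a hypothetical derived backend `my-html-multipage' and a
hypothetical :multipage-split option with a MULTIPAGE_SPLIT keyword
(neither exists in Org; only `org-export-define-derived-backend' and its
:options-alist keyword are real):

```elisp
(require 'ox-html)

;; Hypothetical default; the variable name is made up for illustration.
(defvar my-html-multipage-split 'toc
  "Default splitting rule for the hypothetical multipage backend.")

(org-export-define-derived-backend 'my-html-multipage 'html
  :options-alist
  ;; Each entry is (PROPERTY KEYWORD OPTION DEFAULT BEHAVIOR),
  ;; the same shape as the :creator entries shown above.
  '((:multipage-split "MULTIPAGE_SPLIT" nil my-html-multipage-split)))
```

With such an entry, the value could come from the variable or from an
in-buffer "#+MULTIPAGE_SPLIT:" line, like any other export option.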

> - If that is correct I assume multipage export should then be a
>   generic option common to different export backends (if defined)
>   (something like "export-as-multipage") and the question is how to
>   specify that when exporting. Should this option just be listed in
>   the export dialog for every export backend which supports it (like
>   in my current approach for html) and when choosing it the rules of
>   the current definition of org-export-pages in the current context
>   are used?

Yes. Something similar to `org-export-visible-only',
`org-export-body-only', etc. These customizations can be toggled
interactively, from `org-export-dispatch'.

A question for the future is whether we want more than just a "t" or "nil"
toggle, but it should not be too hard to generalize if we simply start
from just t/nil.

We might also consider adding MULTIPAGE as an additional argument to the
API function (just like BODY-ONLY, VISIBLE-ONLY, SUBTREEP that we
already use), but that's probably an implementation idea we may or may
not need to use.

> - This implies that the code handling this is done in ox.el like this:
>
>   The export-pages function in ox.el
>   
>   1. generates the parse-tree
>   
>   2. extracts the subtrees according to the rules
>
>   3. calls org-export-to-file on the backends for each of them.
>
>   4. optionally also exports the whole document, maybe stripped from
>      its exported sections (replaced by links, etc.)
>
> If this is the way you suggest it, it doesn't sound too complicated as
> most of it is done already.

Yes, roughly like this.
Ideally, we should simply modify `org-export-as', but handling output
file name may be a bit tricky - it is somewhat awkwardly placed in the
current ox.el API (see the discussion in https://list.orgmode.org/orgmode/25393.61240.135445.401251@gargle.gargle.HOWL/T/#u).

> My only concern is that in this case org-export-pages is not really
> backend specific and therefore the place for it semantically shouldn't
> be in the definition of the backend, but separate from it.

I guess that backends may provide some defaults that make more sense for
those backends only. But otherwise splitting the full AST before
individual page export might be simply handled in ox.el.

> The backend should just define a general function for exporting a
> subtree to a file for the multipage case as this might differ from the
> definition for single file output of the complete parse-tree (with the
> name of this general multipage export function being the same in all
> backends which support multipage output).

All the built-in backends already have such a function. For example,

(defun org-html-export-to-html
    (&optional async subtreep visible-only body-only ext-plist)
                     ^^^^^^^^

If subtree export is good enough to handle multi-page export, we may not
even need to do much. (Although, deriving the file names is currently
hard-coded for subtrees and is not very customizable; see the link I
shared above)
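
A minimal sketch of that idea, assuming pages are marked with an
:EXPORT_FILE_NAME: property as in the earlier example; the function
name `my/org-export-marked-subtrees-to-html' is made up, and links
between the resulting pages are not handled at all:

```elisp
(require 'org)
(require 'ox-html)

(defun my/org-export-marked-subtrees-to-html ()
  "Export every subtree that sets EXPORT_FILE_NAME to its own HTML file.
Hypothetical helper built on the existing subtree export.  Return the
list of file names reported by `org-html-export-to-html'."
  (org-map-entries
   ;; ASYNC is nil, SUBTREEP is t: export the subtree at point.
   (lambda () (org-html-export-to-html nil t))
   ;; Match headings whose EXPORT_FILE_NAME property is non-empty.
   "EXPORT_FILE_NAME={.+}"))
```

Each page is exported in isolation here, which is exactly why file name
derivation and cross-page links need more thought.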

> This would also imply a mechanism to define different org-export-pages
> plists and select from them before exporting by calling a generic
> backend-agnostic org-export-to-pages function in ox.el. This is very
> elegant but also somewhat different from the current layout of
> org-export which is single-page single-backend centered. Hmm...

I do not think that we need to go too deep into this rabbit hole for
now. A simple toggle based on `org-export-dispatch' might be good
enough. It can be easily extended to something like multi-state switch
(t/nil vs. t -> option A -> option B -> nil -> t -> ...).
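
Such a multi-state switch is cheap to implement. A sketch, with a
made-up helper name and made-up state symbols:

```elisp
(defun my/cycle-state (current states)
  "Return the element following CURRENT in STATES, wrapping around.
Hypothetical helper; STATES may contain nil as one of the states."
  (let ((tail (cdr (member current states))))
    (if tail
        (car tail)
      ;; CURRENT is the last state (or not a state at all): start over.
      (car states))))
```

For example, with the states (t option-a option-b nil), calling it
repeatedly steps through t -> option-a -> option-b -> nil -> t.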

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-07 19:33             ` Orm Finnendahl
@ 2024-07-08 15:29               ` Ihor Radchenko
  2024-07-08 19:12                 ` Orm Finnendahl
  0 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-08 15:29 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

> For the backend I'm planning to realize the following options
> (implemented as custom variables, which can be overwritten in the
> document):
>
> - org-html-multipage-export-directory
>
>   The directory for the exported files (relative or absolute).

I am wondering about the reasoning behind not re-using
#+EXPORT_FILE_NAME: here (its directory part) and simply defaulting to
 `default-directory'.

Is there any situation when you need to export the full document
vs. multipage to different places?

> - org-html-multipage-head
>
>   (similar to HTML_HEAD but will be used instead of the HTML_HEAD for
>   custom css/js)

Again, why not use #+HTML_HEAD directly?

> - org-html-multipage-front-matter
>
>   A list to specify pages in front of the headlines of the
>   document. Possible values are 'title, 'title-toc and 'toc. title-toc
>   is a combined page containing the title and the toc. Multiple
>   entries are possible.

This sounds orthogonal to multipage export. Could you please illustrate
what you want to achieve by introducing this option? Maybe there is an
existing feature that can be re-used instead of creating something new?

> - org-html-multipage-join-first-subsection
>
>   Boolean: Non-nil means that the first subsection of a section
>   without a body will be joined on the section page (recursively). See
>   my generated example pages linked below (Chapters 4, 5 and 7 for a
>   recursive example)

Sorry, but I cannot understand anything from there. Could you explain in
words?

> - org-html-multipage-split
>
>   How to split the document. Possible values are
>
>   'toc for generating a page for each toc entry.

May I guess that the previous option has something to do with the
situation when a #+TOC: keyword is in the middle of the text?
   
> In addition I have a question about the html output layout
> structure. Here is an example of a file generated with the current
> code with some preliminary layout. It might give an idea about my use
> case:
>
> https://www.selma.hfmdk-frankfurt.de/finnendahl/klangsynthesebuch/01_00_00_vorwort.html#orge24571b
>
> Regardless of the colours, the file has a slightly different hierarchy
> than the single page html template of ORGMODE and is more oriented
> towards the layout of documentation nowadays with a (hideable) toc at
> the side on every page rather than the texinfo oriented layout used by
> the orgmode manual. If my code gets accepted/merged to org what should
> be the default layout shipped with multipage output? FYI: The
> visibility of the toc entries is managed by the css and the whole toc
> is included on each page (and its visibility could be managed with js
> as well). Should I rather go for the classic texinfo view?

Do I understand correctly that your alternative layout is simply a
question of a custom #+HTML_HEAD? Or is there something more to it?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-08 15:05               ` Ihor Radchenko
@ 2024-07-08 15:41                 ` Orm Finnendahl
  2024-07-08 15:56                   ` Ihor Radchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-08 15:41 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

On Monday, 08 July 2024 at 15:05:58 (+0000), Ihor Radchenko wrote:
> 
> We might also consider adding MULTIPAGE as an additional argument to the
> API function (just like BODY-ONLY, VISIBLE-ONLY, SUBTREEP that we
> already use), but that's probably an implementation idea we may or may
> not need to use.

Currently I set the :multipage property in info, but that's a detail
that can be sorted out later.

> Yes, roughly like this.  Ideally, we should simply modify
> `org-export-as', but handling output file name may be a bit tricky -
> it is somewhat awkwardly placed in the current ox.el API (see the
> discussion in
> https://list.orgmode.org/orgmode/25393.61240.135445.401251@gargle.gargle.HOWL/T/#u).

Today I had a look at ox.el when upgrading my code to
9.8-pre. Unfortunately the code (and behaviour of org-element, etc.)
has changed quite a bit and I had to fix many things.

Especially in org-export-as the parsing of the tree is now done in the
lexical context of a copy of the buffer which makes implementing a
multipage backend even more awkward.

IMHO the code is just the wrong way around: org-export-to-file calls
org-export-as which combines the parsing with generating the output
string. The multipage code has to split that part and that doesn't get
easier when both parts have to be evaluated in the context of
org-export-with-buffer-copy. I'd rather have that turned inside out:
Instead of org-export-as being a part of
org-export-to-file/buffer/etc., its functionality could be at the
top-level and then call org-export-to... appropriately (either for
multipage output, single-page output, buffer-output...). I will handle
it by splitting org-export-as just before the
org-export-with-buffer-copy, but consider it a bit ugly.

> I do not think that we need to go too deep into this rabbit hole for
> now. A simple toggle based on `org-export-dispatch' might be good
> enough. It can be easily extended to something like multi-state switch
> (t/nil vs. t -> option A -> option B -> nil -> t -> ...).

There is something else: A lot of my energy in the multipage backend
went into getting links and footnotes correct. Footnotes aren't a big
deal, but I have no idea how to handle cross document links if
different backends are present (e.g. linking from html to a pdf
document and vice versa ;-) I think this requires quite a bit more
thinking and maybe is unrealistic altogether, but at least the
framework could be changed to be able to tackle that in the distant
future...

--
Orm


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-08 15:41                 ` Orm Finnendahl
@ 2024-07-08 15:56                   ` Ihor Radchenko
  2024-07-08 19:18                     ` Orm Finnendahl
  0 siblings, 1 reply; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-08 15:56 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

>> Yes, roughly like this.  Ideally, we should simply modify
>> `org-export-as', but handling output file name may be a bit tricky -
>> it is somewhat awkwardly placed in the current ox.el API (see the
>> discussion in
>> https://list.orgmode.org/orgmode/25393.61240.135445.401251@gargle.gargle.HOWL/T/#u).
>
> Today I had a look at ox.el when upgrading my code to
> 9.8-pre. Unfortunately the code (and behaviour of org-element, etc.)
> has changed quite a bit and I had to fix many things.
>
> Especially in org-export-as the parsing of the tree is now done in the
> lexical context of a copy of the buffer which makes implementing a
> multipage backend even more awkward.
>
> IMHO the code is just the wrong way around: org-export-to-file calls
> org-export-as which combines the parsing with generating the output
> string. The multipage code has to split that part and that doesn't get
> easier when both parts have to be evaluated in the context of
> org-export-with-buffer-copy. I'd rather have that turned inside out:
> Instead of org-export-as being a part of
> org-export-to-file/buffer/etc., its functionality could be at the
> top-level and then call org-export-to... appropriately (either for
> multipage output, single-page output, buffer-output...). I will handle
> it by splitting org-export-as just before the
> org-export-with-buffer-copy, but consider it a bit ugly.

Or we can make `org-export-as' retain INFO channel when returning the
output. Then, we can make `org-export-to-file' make use of the INFO
channel to decide the file name. This way, there will be no need to
decide the file name before running the parsing.

> There is something else: A lot of my energy in the multipage backend
> went into getting links and footnotes correct. Footnotes aren't a big
> deal, but I have no idea how to handle cross document links if
> different backends are present (e.g. linking from html to a pdf
> document and vice versa ;-) I think this requires quite a bit more
> thinking and maybe is unrealistic altogether, but at least the
> framework could be changed to be able to tackle that in the distant
> future...

Yes, it is an important feature we would need to implement - turning
internal links into external when they no longer point inside the same
document.

Somewhat relevant code: `org-export--update-included-link' and ox-publish.

For links to external pdfs and co, we have discussed what can be done in
https://list.orgmode.org/orgmode/87a5rpoi4c.fsf@localhost/
TL;DR: In latex, \href{file.pdf#anchor} works; In web, anchors should
also work with pdfjs.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: multipage html output
  2024-07-08 15:29               ` Ihor Radchenko
@ 2024-07-08 19:12                 ` Orm Finnendahl
  2024-07-09 17:55                   ` Ihor Radchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-08 19:12 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

On Monday, 8 July 2024 at 15:29:47 (+0000), Ihor Radchenko wrote:
> Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:
> 
> > For the backend I'm planning to realize the following options
> > (implemented as custom variables, which can be overwritten in the
> > document):
> >
> > - org-html-multipage-export-directory
> >
> >   The directory for the exported files (relative or absolute).
> 
> I am wondering about the reasoning behind not re-using
> #+EXPORT_FILE_NAME: here (its directory part) and simply defaulting to
>  `default-directory'.
> 
> Is there any situation when you need to export the full document
> vs. multipage to different places?

Actually, that is what I'm currently doing (and what I need for my
publishing chain): the single-page document is not in the html folder
used for the multipage document. Both files happen to have the same
name, so reusing #+EXPORT_FILE_NAME would not work if I want to
generate the single-page version alongside the multipage version
without changing the document.

> > - org-html-multipage-head
> >
> >   (similar to HTML_HEAD but will be used instead of the HTML_HEAD for
> >   custom css/js)
> 
> Again, why not directly using #+HTML_HEAD?

Same as above: my multipage output uses completely different css and
js, and I think this is unavoidable. All this is just to be able to do
both exports without them interfering with each other.

> > - org-html-multipage-front-matter
> >
> >   A list to specify pages in front of the headlines of the
> >   document. Possible values are 'title, 'title-toc and 'toc. title-toc
> >   is a combined page containing the title and the toc. Multiple
> >   entries are possible.
> 
> This sounds orthogonal to multipage export. May you please illustrate
> what you want to achieve by introducing this option? Maybe there is an
> existing feature that can be re-used instead of creating something new?

Could be. A toc as the first page is needed when you don't want a toc
on the side of each html page, e.g. when using the classical info
layout. It might also be necessary to distinguish between a separate
title page with the author followed by the toc on the next page, a
combined page with title and toc, and no front matter at all because
the title appears on every page. If this is possible with already
existing options, even better. I just think it might be necessary to
distinguish between the needs of the html multipage output and the
needs of LaTeX or single-page output without having to edit the
document (I need that as my publishing chain is going to export info,
html multipage, pdf and html single-page output from the same org
file).

> > - org-html-multipage-join-first-subsection
> >
> >   Boolean: Non-nil means that the first subsection of a section
> >   without a body will be joined on the section page
> >   (recursively). See my generated example pages linked below
> >   (Chapters 4, 5 and 7 for a recursive example)
> 
> Sorry, but I cannot understand anything from there. May you explain in
> words?

Consider a case like this:

* Headline 1
** Headline 2
*** Headline 3
    Text for Headline 3

Without the above option, Headline 1, Headline 2 and Headline 3 would
be on separate pages with Headline 1 and Headline 2 being empty pages
with just the Headline. The option puts all three Headlines and the
Contents of Headline 3 on the same page. See here:

https://www.selma.hfmdk-frankfurt.de/finnendahl/klangsynthesebuch

Chapters 4, 4.8, 5, 5.4 and 6 (two headline levels combined) and
Chapter 7 (three headline levels combined) are examples of joined
headlines; the other (sub)chapters show how chapters containing body
text are handled. It's mainly a matter of style, but in some
situations it doesn't make much sense to me to add content below a
headline just to avoid an empty page in multipage html output.
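
To make the joining rule concrete, here is a toy sketch. It is not the
code from my repository: the headline representation (TITLE BODY
CHILD...) and both function names are invented for the illustration,
and it does not use the org-element AST.

```elisp
;; Hypothetical toy model: a headline is (TITLE BODY CHILD...), where
;; BODY is a string or nil.  PAGES is built in reverse and fixed up at
;; the end.
(defun my/multipage--walk (hl joined join-first pages)
  "Add headline HL and its subtree to PAGES and return PAGES.
When JOINED is non-nil, HL is appended to the most recent page."
  (let ((title (nth 0 hl))
        (body (nth 1 hl))
        (children (nthcdr 2 hl))
        (first t))
    (if joined
        (push title (car pages))        ; stay on the parent's page
      (push (list title) pages))        ; open a new page
    (dolist (child children)
      ;; Only the first child of a headline without body is joined.
      (setq pages (my/multipage--walk
                   child (and join-first first (null body))
                   join-first pages))
      (setq first nil))
    pages))

(defun my/multipage-pages (headlines join-first)
  "Return the pages for HEADLINES, each page being a list of titles."
  (let (pages)
    (dolist (hl headlines)
      (setq pages (my/multipage--walk hl nil join-first pages)))
    (nreverse (mapcar #'reverse pages))))
```

For the three-headline example above, calling my/multipage-pages with
JOIN-FIRST set to t gives a single page holding all three headlines;
with nil it gives three pages, two of them containing only a headline.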

> > - org-html-multipage-split
> >
> >   How to split the document. Possible values are
> >
> >   'toc for generating a page for each toc entry.
> 
> May I guess that the previous option may have something do with
> situation when #+TOC: keyword is in the middle of a text?

No: in the online document linked above, the page splitting follows
the toc (with the exception of the page joining explained above),
meaning that each visible toc entry generates one page. Be aware that
this is not obvious on the online page, as subfolders are folded
automatically using css (folded elements have the class
"toc-hidden"). If you look at the html page source, you can see that
every page contains the full toc to enable other css- or js-based
styling decisions.

> Do I understand correctly that your alternative layout is simply a
> question of custom #+HTML_HEADER? Or is there something more to it?

In my layout the main difference is that the nav left and nav right
elements are part of the page-main-body rather than part of
<content>. I'm not positive this is elegantly manageable with css,
when the navigation is outside the page-main-body.

--
Orm



* Re: multipage html output
  2024-07-08 15:56                   ` Ihor Radchenko
@ 2024-07-08 19:18                     ` Orm Finnendahl
  2024-07-09 18:08                       ` Ihor Radchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Orm Finnendahl @ 2024-07-08 19:18 UTC (permalink / raw)
  To: emacs-orgmode

Hi Ihor,

On Monday, 8 July 2024 at 15:56:48 (+0000), Ihor Radchenko wrote:
> 
> Or we can make `org-export-as' retain INFO channel when returning the
> output. Then, we can make `org-export-to-file' make use of the INFO
> channel to decide the file name. This way, there will be no need to
> decide the file name before running the parsing.

Are you sure that works? org-export-as currently returns a string. It
could in addition return the parse tree in info, plus the smaller
parts which need to be exported, but we should not forget that
org-export-as is a lower-level function called from org-export-to-file
or org-export-to-buffer. But maybe I misunderstand what you mean.

Here is what is needed from my perspective:

1. parse the tree of the whole document

2. split the tree up.

3. call the export backend on each of the split parts to generate the
   string and save it to disk or do whatever is appropriate.

For me, the most natural way would be that a central function
(export-according-to-org-property-list) does the parsing and then
calls the different backend functions to export according to their
rules (the trees being converted in the central function or in backend
code).

If toplevel functions like org-export-to-file use org-export-as, then
org-export-as should only be concerned with generating the string, but
not with reparsing.

Alternatively we can do the conversion to a string in the central
function as now with org-export-as, but there still needs to be a
mechanism to generate the different files for multipage output and
call the export backend on them to save them or whatever. Or what did
you have in mind?

> For links to external pdfs and co, we have discussed what can be done in
> https://list.orgmode.org/orgmode/87a5rpoi4c.fsf@localhost/
> TL;DR: In latex, \href{file.pdf#anchor} works; In web, anchors should
> also work with pdfjs.

Thanks, I'll check that out.

--
Orm



* Re: multipage html output
  2024-07-08 19:12                 ` Orm Finnendahl
@ 2024-07-09 17:55                   ` Ihor Radchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-09 17:55 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

>> Is there any situation when you need to export the full document
>> vs. multipage to different places?
>
> Actually that is what I'm currently doing (and what I need for my
> publishing chain): The single-page document is not in the html folder
> used for the multipage document. Both files happen to have the same
> name so it wouldn't work out, if I want to generate single-page along
> the multipage version, without having to change the document.

If this is the case, users may potentially need similar diverging
settings for single- vs. multi- page documents for almost any given
export option, not just the ones you mentioned.

To address such situations, we may, for example, allow an alternative
"multi" version of each export keyword to act specially when multipage
export is used. Consider an export option #+SAMPLEOPTION. If the
document has only "#+SAMPLEOPTION: value", the exporter will use it
for both normal and multipage export. However, we may allow an
alternative "#+SAMPLEOPTION[multipage]: multipage value" that will be
used instead when defined.

In addition to defining alternative variants of in-buffer settings, we
also need to provide the equivalent feature for custom variables
defining the export options. We can do it by treating the value of such
export-related variables specially - we may allow special values like
[org-export-variants :default default-value :multipage multipage-value]
and provide helper functions like

(org-export-set-option option-name  value) ; :default
(org-export-set-option option-name :multipage value) ; for multipage export only
(org-export-set-option option-name :singlepage value) ; just for singlepage export

(Or there can be some other consistent way to define alternatives;
feel free to brainstorm.)
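
Just to illustrate how such a special value might be resolved when an
option is read (a sketch only: the resolver name is made up, and the
[org-export-variants ...] form is merely the proposal above, not
something Org currently understands):

```elisp
;; Hypothetical resolver for the proposed variant values.
(defun my/export-resolve-variant (value mode)
  "Return the variant of VALUE for MODE (:multipage or :singlepage).
VALUE is either a plain value, returned as is, or a vector of the form
[org-export-variants :default D :multipage M ...]."
  (if (and (vectorp value)
           (> (length value) 0)
           (eq (aref value 0) 'org-export-variants))
      ;; Drop the marker symbol and treat the rest as a plist.
      (let ((plist (cdr (append value nil))))
        (if (plist-member plist mode)
            (plist-get plist mode)
          ;; No dedicated variant for MODE: fall back to :default.
          (plist-get plist :default)))
    value))
```

With this, a value like [org-export-variants :default "single.css"
:multipage "multi.css"] would resolve to "multi.css" for :multipage
and fall back to "single.css" for :singlepage, while ordinary values
keep working unchanged.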

>> > - org-html-multipage-front-matter
>> >
>> >   A list to specify pages in front of the headlines of the
>> >   document. Possible values are 'title, 'title-toc and 'toc. title-toc
>> >   is a combined page containing the title and the toc. Multiple
>> >   entries are possible.
>> 
>> This sounds orthogonal to multipage export. May you please illustrate
>> what you want to achieve by introducing this option? Maybe there is an
>> existing feature that can be re-used instead of creating something new?
>
> Could be: The toc as a first page is needed, when you don't want a toc
> on the side of each html page, e.g. when using the classical info
> layout. And it might be necessary to be able to distinguish between a
> separate title page with author and the toc on the next page (or a
> combined page with title and toc or no front matter at all because the
> title appears on every page). If this is possible with already
> existing options, even better. I just think that it might be necessary
> to be able to distinguish between the needs for html output format
> vs. the needs for LaTex or single-page output without having to edit
> the document (I need that as my publishing chain is going to export
> info, html multipage, pdf output and html single-page output using the
> same org file).

Sorry, but I still do not quite understand. Could you please
illustrate a bit more with some kind of simple example?

>> > - org-html-multipage-join-first-subsection
>> >
>> >   Boolean: Non-nil means that the first subsection of a section
>> >   without a body will be joined on the section page
>> >   (recursively). See my generated example pages linked below
>> >   (Chapters 4, 5 and 7 for a recursive example)
>> 
>> Sorry, but I cannot understand anything from there. May you explain in
>> words?
>
> Consider a case like this:
>
> * Headline 1
> ** Headline 2
> *** Headline 3
>     Text for Headline 3
>
> Without the above option, Headline 1, Headline 2 and Headline 3 would
> be on separate pages with Headline 1 and Headline 2 being empty pages
> with just the Headline. The option puts all three Headlines and the
> Contents of Headline 3 on the same page. See here:

I see. It sounds useful given that your strategy to split the document
into pages is "on each headline on each level".

Conceptually, I see this as one of the possible customizations for
paging strategies. Your `org-html-multipage-join-first-subsection'
simply tells the exporter to split off pages only when there is
non-empty content inside the containing headings.

This also reveals that we may sometimes want more than just to tell how
to split the document. After splitting, we may want to rearrange the
pages differently (maybe even re-order?). In other words, multipage
export may need to:

1. Take document AST
2. Split it into multiple parts
3. Filter the obtained part list (post-process)
4. Perform actual per-page export
...
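
In code, steps 2-4 could be wired together roughly like this. This is
only a sketch with made-up names: the three function arguments stand
for whatever splitting, post-processing and per-page export a backend
provides, and nothing here is existing ox.el API.

```elisp
;; Hypothetical driver for the steps listed above.
(defun my/multipage-export (tree split-fn filter-fn export-fn)
  "Split TREE, post-process the parts and export each part.
SPLIT-FN maps the document tree to a list of parts, FILTER-FN maps
that list to a (possibly rearranged) list of parts, and EXPORT-FN
turns a single part into an output string.  Return the list of
output strings, one per page."
  (let* ((parts (funcall split-fn tree))       ; step 2
         (parts (funcall filter-fn parts)))    ; step 3
    (mapcar export-fn parts)))                 ; step 4
```

Keeping step 3 as a separate function is what would let options like
`org-html-multipage-join-first-subsection' be expressed as one filter
among several instead of being baked into the splitter.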

>> > - org-html-multipage-split
>> >
>> >   How to split the document. Possible values are
>> >
>> >   'toc for generating a page for each toc entry.
>> 
>> May I guess that the previous option may have something do with
>> situation when #+TOC: keyword is in the middle of a text?
>
> No: In the online document of the link above the page splitting
> follows the toc (with the exception of the page joining explained
> above), meaning that each visible toc entry will generate one page. Be
> aware that this is not obvious on the online page as subfolders are
> folded automatically using the css (folded elements have the class
> "toc-hidden"). If you look at the html page source you can see that
> every page contains the full toc to enable other css or js based
> styling decisions.

Sounds reasonable. I guess that the docstring can be improved :)

>> Do I understand correctly that your alternative layout is simply a
>> question of custom #+HTML_HEADER? Or is there something more to it?
>
> In my layout the main difference is that the nav left and nav right
> elements are part of the page-main-body rather than part of
> <content>. I'm not positive this is elegantly manageable with css,
> when the navigation is outside the page-main-body.

Sorry, but I am lost. What do you mean by "content" and what do you mean
by "page-main-body"?

-- 
Ihor Radchenko // yantar92



* Re: multipage html output
  2024-07-08 19:18                     ` Orm Finnendahl
@ 2024-07-09 18:08                       ` Ihor Radchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Ihor Radchenko @ 2024-07-09 18:08 UTC (permalink / raw)
  To: Orm Finnendahl; +Cc: emacs-orgmode

Orm Finnendahl <orm.finnendahl@selma.hfmdk-frankfurt.de> writes:

>> Or we can make `org-export-as' retain INFO channel when returning the
>> output. Then, we can make `org-export-to-file' make use of the INFO
>> channel to decide the file name. This way, there will be no need to
>> decide the file name before running the parsing.
>
> Are you sure that works? org-export-as currently returns a string. It
> could in addition return the parse-tree in info, plus the smaller
> parts which need to be exported, but we should not forget, that
> org-export-as is an inferior function called from org-export-to-file
> or org-export-to-buffer. But maybe I misunderstand what you mean.

That's exactly what I mean.

> Here is what is needed from my perspective:
>
> 1. parse the tree of the whole document
>
> 2. split the tree up.
>
> 3. call the export backend on each of the split parts to generate the
>    string and save it to disk or do whatever is appropriate.
>
> For me the most natural way would be that a central function
> (export-according-to-org-property-list) does the parsing and then call
> the different backend functions to export according to their rules
> (the trees being converted in the central function or in backend
> code).
>
> If toplevel functions like org-export-to-file use org-export-as, than
> org-export-as should only be concerned with generating the string but
> not with reparsing.

Sorry, but I do not understand your concern.

> Alternatively we can do the conversion to a string in the central
> function as now with org-export-as, but there still needs to be a
> mechanism to generate the different files for multipage output and
> call the export backend on them to save them or whatever. Or what did
> you have in mind?

What I have in mind is that `org-export-as' will return a list of
strings + INFO. INFO will contain data about which files to use for
saving the strings. Then, the caller does the saving and whatever is
necessary. If we write to files from `org-export-as' it will be a
massive breaking change in the expected behavior.
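
A minimal sketch of the caller side, assuming the list of strings and
INFO have already been returned. The function name and the
:multipage-output-files key are invented for this example; the real
property name would still have to be decided.

```elisp
;; Hypothetical caller-side helper.  INFO is a plist; the key
;; :multipage-output-files is an assumption of this sketch.
(defun my/write-pages (pages info)
  "Write each string in PAGES to the matching file listed in INFO.
Return the list of files written."
  (let* ((files (plist-get info :multipage-output-files))
         (rest files))
    (unless (= (length pages) (length files))
      (error "Got %d pages but %d file names"
             (length pages) (length files)))
    (dolist (page pages)
      ;; All file system access happens here, in the caller.
      (with-temp-file (pop rest)
        (insert page)))
    files))
```

This way the export function itself stays free of side effects on
disk, and callers that only want the strings (or a buffer) can simply
ignore the file names.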

-- 
Ihor Radchenko // yantar92


