emacs-orgmode@gnu.org archives
* About multilingual documents
@ 2021-05-02 20:20 Juan Manuel Macías
  2021-05-03  6:58 ` Aleksandar Dimitrov
  0 siblings, 1 reply; 18+ messages in thread
From: Juan Manuel Macías @ 2021-05-02 20:20 UTC (permalink / raw)
  To: orgmode

Hi all,

I'm curious to see how other Org users deal with multilingual documents,
that is, documents (for example, philology or linguistics texts)
that contain a significant number of inline quotes in other languages.
Naturally, this matters most for the LaTeX backend, since it is
convenient to enclose these quotes in a \foreignlanguage command to
ensure that LaTeX at least applies the correct hyphenation patterns for
words in other languages.

Luckily, in the latest versions of Babel (the Babel of LaTeX) you don't
need to do this when it comes to languages whose script is different
from Latin (e.g. Greek, languages with Cyrillic, Arabic, Hindi, etc.).
We can, for example, define Russian and Greek as:

#+begin_src latex
\babelprovide[onchar=ids fonts,hyphenrules=russian]{russian}
\babelprovide[onchar=ids fonts,hyphenrules=ancientgreek]{greek}
#+end_src

And also the fonts for both languages:

#+begin_src latex
\babelfont[russian]{rm}{Linux Libertine O}
\babelfont[greek]{rm}{Free Serif}
#+end_src

For Latin-based scripts it is still necessary to enclose the text in the
\foreignlanguage command. And now comes the question: how do Org users
who work on multilingual documents obtain this command when exporting
to LaTeX?

I usually use macros, which always tend to work fine. But lately I have
been testing an alternative markup system using an export filter. The
idea would be something like:

%(lang) lorem ipsum dolor %()

I start from a list of the most used languages:

#+begin_src emacs-lisp
;; Short codes mapped to Babel language names.
(defvar langs '(("en" "english")
		("fr" "french")
		("de" "german")
		("it" "italian")
		("pt" "portuguese")))
#+end_src

And other possible languages that Babel supports can be indicated
explicitly, by prepending "--":

%(fr) ... %()

%(--esperanto) ... %()
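
A minimal sketch of what such a filter might look like; this is not the
code from the attached document. The names my/langs,
my/lang-markup-to-latex and my/lang-markup-filter are hypothetical. It
assumes the LaTeX backend may already have escaped "%" as "\%" by the
time a plain-text filter runs, so the pattern tolerates an optional
backslash:

#+begin_src emacs-lisp
;; Sketch only: `my/langs' and the `my/...' functions are hypothetical names.
(defvar my/langs
  '(("en" . "english")
    ("fr" . "french")
    ("de" . "german")
    ("it" . "italian")
    ("pt" . "portuguese"))
  "Short language codes mapped to Babel language names.")

(defun my/lang-markup-to-latex (text)
  "Turn %(CODE) ... %() spans in TEXT into \\foreignlanguage commands.
A \"--\" prefix, as in %(--esperanto), passes the name through unchanged.
Spans whose code is not in `my/langs' are left untouched."
  (replace-regexp-in-string
   ;; Group 1: optional "--", group 2: language code, group 3: quoted text.
   "\\\\?%(\\(--\\)?\\([[:alpha:]]+\\))[ \t\n]*\\(\\(?:.\\|\n\\)*?\\)[ \t\n]*\\\\?%()"
   (lambda (match)
     (let* ((explicit (match-string 1 match))
            (code (match-string 2 match))
            (body (match-string 3 match))
            (name (if explicit code (cdr (assoc code my/langs)))))
       (if name
           (format "\\foreignlanguage{%s}{%s}" name body)
         match)))
   text t t))

(defun my/lang-markup-filter (text backend _info)
  "Apply the markup conversion to TEXT for LaTeX-derived BACKENDs only."
  (when (org-export-derived-backend-p backend 'latex)
    (my/lang-markup-to-latex text)))

;; Register the filter once the export framework is loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-filter-plain-text-functions
               #'my/lang-markup-filter))
#+end_src

Keeping the string conversion separate from the filter wrapper makes it
possible to try the markup on plain strings without running an export.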

(If someone wants to try it, I attach a small Org document).

Best regards,

Juan Manuel


[-- Attachment #2: test-langs.org --]
[-- Type: application/vnd.lotus-organizer, Size: 2263 bytes --]


* Re: About multilingual documents
  2021-05-02 20:20 About multilingual documents Juan Manuel Macías
@ 2021-05-03  6:58 ` Aleksandar Dimitrov
  2021-05-03 17:47   ` Greg Minshall
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-03  6:58 UTC (permalink / raw)
  To: emacs-orgmode

Hi Juan,

this sounds very interesting to me, as I, too, mostly write in Org
and sometimes write documents in multiple languages, usually with
different varieties of either Latin or Cyrillic.

I have some suggestions:

Apart from the export, one of my biggest gripes is
flyspell. Specifically, the fact that you have to choose one language to
spell check the entire document with. That is insufficient in my case.

I think that the syntax you're suggesting looks good, but I'm not
sure how well it'd fit into org-mode's ecosystem. I had something in
mind that was closer to how org-babel works (it's called *babel*
for a reason, isn't it? :D)

#+begin_src org :lang pl
  … po polsku
#+end_src

#+begin_src org :lang de
  … auf deutsch
#+end_src

This would make use of org-mode's edit special environment function. It
would make it easier to persuade flyspell to do the right thing. You
could, perhaps, add

#+LANGUAGE: en

to the parent document, and then org would take care to set the correct
flyspell language (and the correct macros on LaTeX-export) and change
these parameters in the special environments.
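
A rough sketch of the dictionary-switching half of this idea. It relies
on Org calling a function named org-babel-edit-prep:LANG, when one is
defined, as a source block of that language is opened for editing. The
mapping my/lang-dictionaries and the helper my/dictionary-for-block are
hypothetical, and the dictionary names are assumptions that depend on
the installed spell checker:

#+begin_src emacs-lisp
;; Sketch only: the dictionary names depend on the installed spell checker.
(defvar my/lang-dictionaries
  '(("pl" . "polish")
    ("de" . "german")
    ("en" . "english"))
  "Hypothetical mapping from :lang values to ispell dictionary names.")

(defun my/dictionary-for-block (info)
  "Return the dictionary for the :lang header argument in INFO, or nil.
INFO has the shape returned by `org-babel-get-src-block-info'."
  (let ((lang (cdr (assq :lang (nth 2 info)))))
    (and lang (cdr (assoc (format "%s" lang) my/lang-dictionaries)))))

(defun org-babel-edit-prep:org (info)
  "Select the spell-checking dictionary when editing an Org source block."
  (let ((dictionary (my/dictionary-for-block info)))
    (when dictionary
      (ispell-change-dictionary dictionary)
      (flyspell-mode 1))))
#+end_src

Blocks without a :lang argument, or with a value missing from the
mapping, keep whatever dictionary the edit buffer already uses.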

I'm not 100% sure it should be #+begin_src org, maybe introducing a
different special environment would be better, say #+begin_lang XX where
XX is the ISO-code of said language, or the locale (think en_US
vs. en_GB.)

The drawback, and the clear disadvantage compared to your method, is that
this works well only when the languages are separated by paragraph
breaks.

Therefore, I think our suggestions might be somewhat orthogonal. Yours
could be a shorthand syntax for introducing inline foreign-language
snippets.

What do you think?

Regards,
Aleks


* Re: About multilingual documents
  2021-05-03  6:58 ` Aleksandar Dimitrov
@ 2021-05-03 17:47   ` Greg Minshall
  2021-05-04  7:30     ` Aleksandar Dimitrov
  2021-05-04  8:19     ` Eric S Fraga
  2021-05-03 18:48   ` About multilingual documents Joost Kremers
  2021-05-03 20:33   ` Juan Manuel Macías
  2 siblings, 2 replies; 18+ messages in thread
From: Greg Minshall @ 2021-05-03 17:47 UTC (permalink / raw)
  To: Aleksandar Dimitrov; +Cc: emacs-orgmode

Aleks, et al.,

> Apart from the export, one of my biggest gripes is
> flyspell. Specifically, the fact that you have to choose one language to
> spell check the entire document with. That is insufficient in my case.

in case it's relevant:

i also switch between languages.  but, for me (maybe i'm missing
something?) it means i switch input methods.  so, i've code bound to
(toggle-input-method) that, depending on the input method, changes the
dictionary "for" that input method.  this is not org-specific, but,
rather, works for all my emacs buffers.
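
One way this might be wired up, as a sketch rather than the code
described above: instead of wrapping the key binding, it uses the
standard input-method hooks. The dictionary names, the default
dictionary and all my/... names are assumptions:

#+begin_src emacs-lisp
;; Sketch only: dictionary names are assumptions about the local setup.
(defvar my/input-method-dictionaries
  '(("russian-computer" . "ru_RU")
    ("german-postfix" . "de_DE"))
  "Hypothetical mapping from input method names to ispell dictionaries.")

(defvar my/default-dictionary "en_US"
  "Dictionary used when no input method is active.")

(defun my/dictionary-for-input-method (method)
  "Return the dictionary associated with input METHOD, or the default."
  (or (cdr (assoc method my/input-method-dictionaries))
      my/default-dictionary))

(defun my/dictionary-on-activate ()
  "Switch to the dictionary of the input method that was just activated."
  (ispell-change-dictionary
   (my/dictionary-for-input-method current-input-method)))

(defun my/dictionary-on-deactivate ()
  "Go back to the default dictionary when the input method is turned off."
  (ispell-change-dictionary my/default-dictionary))

(add-hook 'input-method-activate-hook #'my/dictionary-on-activate)
(add-hook 'input-method-deactivate-hook #'my/dictionary-on-deactivate)
#+end_src

The deactivation hook sets the default explicitly rather than reading
current-input-method, which may still hold the old value at that point.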

cheers, Greg



* Re: About multilingual documents
  2021-05-03  6:58 ` Aleksandar Dimitrov
  2021-05-03 17:47   ` Greg Minshall
@ 2021-05-03 18:48   ` Joost Kremers
  2021-05-04  8:00     ` Aleksandar Dimitrov
  2021-05-03 20:33   ` Juan Manuel Macías
  2 siblings, 1 reply; 18+ messages in thread
From: Joost Kremers @ 2021-05-03 18:48 UTC (permalink / raw)
  To: Aleksandar Dimitrov; +Cc: emacs-orgmode


[Not directly related to the OP, but might be useful to know.]

On Mon, May 03 2021, Aleksandar Dimitrov wrote:
> this sounds very interesting to me, as I, too, mostly write in Org
> and, sometimes write documents in multiple languages, usually with
> different varieties of either Latin or Cyrillic.
[...]
> Apart from the export, one of my biggest gripes is
> flyspell. Specifically, the fact that you have to choose one language to
> spell check the entire document with. That is insufficient in my case.

flyspell is basically just ispell, and ispell can be configured with different
backends. One possible backend is hunspell, which allows you to set multiple
dictionaries. So if you regularly use different languages in a buffer, you
should give hunspell a try.
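
A possible configuration, as a sketch: it assumes hunspell is installed
together with dictionaries named en_US and de_DE, which are only
examples.

#+begin_src emacs-lisp
;; Sketch only: "en_US,de_DE" assumes both hunspell dictionaries exist.
(with-eval-after-load 'ispell
  (when (executable-find "hunspell")
    (setq ispell-program-name "hunspell")
    (setq ispell-dictionary "en_US,de_DE")
    ;; The spell-checker parameters must be set up before the
    ;; combined dictionary is registered.
    (ispell-set-spellchecker-params)
    (ispell-hunspell-add-multi-dic "en_US,de_DE")))
#+end_src

With this in place flyspell should accept words from either dictionary
in the same buffer.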

[...]
> The drawback, and the clear disadvantage compared to your method is that
> this works great only when the languages are separated by paragraph
> breaks.

If that is the case, you could also check out the =guess-language= package:
<https://github.com/tmalsburg/guess-language.el>. It tries to detect the
language of the current paragraph and sets the ispell (and hence flyspell)
dictionary accordingly. I use it because I write in three different languages,
but usually don't mix them in one buffer.



-- 
Joost Kremers
Life has its moments



* Re: About multilingual documents
  2021-05-03  6:58 ` Aleksandar Dimitrov
  2021-05-03 17:47   ` Greg Minshall
  2021-05-03 18:48   ` About multilingual documents Joost Kremers
@ 2021-05-03 20:33   ` Juan Manuel Macías
  2021-05-04  1:00     ` Tom Gillespie
  2021-05-04  8:44     ` Aleksandar Dimitrov
  2 siblings, 2 replies; 18+ messages in thread
From: Juan Manuel Macías @ 2021-05-03 20:33 UTC (permalink / raw)
  To: Aleksandar Dimitrov; +Cc: orgmode

Hi Aleksandar,

Thank you very much for your interesting comments. I think your idea of
applying org-babel to (multi) language support is tremendously
suggestive and, of course, more org-centric. I suppose it could be
applied also to languages within the paragraph by inline blocks... I
really liked what you propose.

Well, I admit that my marks are a bit exotic :-D. The main problem I see
is that they are not as robust as Org's own marks, since they are
controlled by an export filter. Doing some further tests, by the way, I
think it would be better to add the filter to
`org-export-filter-plain-text-functions', instead of
`...final-output-functions'. I also see that it would be convenient to
avoid their expansion in verbatim texts, with a `(unless
(org-in-verbatim-emphasis)...)'.

Anyway, I think (in general terms) it would be interesting for Org to
incorporate some multilingual support and the ability to toggle between
languages in a document, and the idea you propose seems to
me to make a lot of sense.

Best regards,

Juan Manuel 


* Re: About multilingual documents
  2021-05-03 20:33   ` Juan Manuel Macías
@ 2021-05-04  1:00     ` Tom Gillespie
  2021-05-04  8:13       ` Aleksandar Dimitrov
  2021-05-04  8:44     ` Aleksandar Dimitrov
  1 sibling, 1 reply; 18+ messages in thread
From: Tom Gillespie @ 2021-05-04  1:00 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Aleksandar Dimitrov, orgmode

I like Aleksandar's solution quite a bit because it also works inline
e.g. as src_org[:lang de]{Meine deutsch ist zher schlect!}. In
principle this means that you could leverage the org-babel and org-src
buffer system to get flyspell results in that language in line as well
(though I don't think transporting overlays into the original buffer
has been implemented). Best!
Tom



* Re: About multilingual documents
  2021-05-03 17:47   ` Greg Minshall
@ 2021-05-04  7:30     ` Aleksandar Dimitrov
  2021-05-04 17:09       ` Maxim Nikulin
  2021-05-04  8:19     ` Eric S Fraga
  1 sibling, 1 reply; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-04  7:30 UTC (permalink / raw)
  To: emacs-orgmode

Hi Greg,

>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>
> in case it's relevant:
>
> i also switch between languages.  but, for me (maybe i'm missing
> something?) it means i switch input methods.  so, i've code bound to
> (toggle-input-method) that, depending on the input method, changes the
> dictionary "for" that input method.  this is not org-specific, but,
> rather, works for all my emacs buffers.

I don't usually switch input methods. Instead I rely on the X-Server's
facilities, including group toggles and XCompose. For example I use
XCompose to write all languages with a Latin alphabet without having to
switch layouts/input methods.

Cheers,
Aleks



* Re: About multilingual documents
  2021-05-03 18:48   ` About multilingual documents Joost Kremers
@ 2021-05-04  8:00     ` Aleksandar Dimitrov
  0 siblings, 0 replies; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-04  8:00 UTC (permalink / raw)
  To: Joost Kremers; +Cc: emacs-orgmode

Hi Joost

> [Not directly related to the OP, but might be useful to know.]
>
> On Mon, May 03 2021, Aleksandar Dimitrov wrote:
>> this sounds very interesting to me, as I, too, mostly write in Org
>> and, sometimes write documents in multiple languages, usually with
>> different varieties of either Latin or Cyrillic.
> [...]
>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>
> flyspell is basically just ispell, and ispell can be configured with different
> backends. One possible backend is hunspell, which allows you to set multiple
> dictionaries. So if you regularly use different languages in a buffer, you
> should give hunspell a try.
>
> [...]
>> The drawback, and the clear disadvantage compared to your method is that
>> this works great only when the languages are separated by paragraph
>> breaks.
>
> If that is the case, you could also check out the =guess-language= package:
> <https://github.com/tmalsburg/guess-language.el>. It tries to detect the
> language of the current paragraph and sets the ispell (and hence flyspell)
> dictionary accordingly. I use it because I write in three different languages,
> but usually don't mix them in one buffer.

Thanks for your hints! =guess-language= seems really cool! I also didn't
know hunspell supported more than one dictionary.

Thanks!
Aleks




* Re: About multilingual documents
  2021-05-04  1:00     ` Tom Gillespie
@ 2021-05-04  8:13       ` Aleksandar Dimitrov
  0 siblings, 0 replies; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-04  8:13 UTC (permalink / raw)
  To: orgmode

> I like Aleksandar's solution quite a bit because it also works inline
> e.g. as src_org[:lang de]{Meine deutsch ist zher schlect!}. In
> principle this means that you could leverage the org-babel and org-src
> buffer system to get flyspell results in that language in line as well
> (though I don't think transporting overlays into the original buffer
> has been implemented). Best!

Oh wow, I'm learning lots of new things today, including inline-babel in
Org. I'm not sure highlighting typos in src-blocks is necessary. I think
it's enough if you can see them while you're editing the block.



* Re: About multilingual documents
  2021-05-03 17:47   ` Greg Minshall
  2021-05-04  7:30     ` Aleksandar Dimitrov
@ 2021-05-04  8:19     ` Eric S Fraga
  2021-05-04  8:29       ` Input methods [was: Re: About multilingual documents] Joost Kremers
  1 sibling, 1 reply; 18+ messages in thread
From: Eric S Fraga @ 2021-05-04  8:19 UTC (permalink / raw)
  To: Greg Minshall; +Cc: emacs-orgmode

On Monday,  3 May 2021 at 20:47, Greg Minshall wrote:
> but, for me (maybe i'm missing something?) it means i switch input
> methods.  

Which is what I do.

So, on this note, hopefully without hijacking the thread, maybe somebody
can tell me: what is the "default" input method, i.e. the one I get when
I start Emacs and haven't changed input methods at all?  I see no way to
get back to it once I have switched to a different one.

-- 
: Eric S Fraga via Emacs 28.0.50, Org release_9.4.5-480-g479a3d



* Input methods [was: Re: About multilingual documents]
  2021-05-04  8:19     ` Eric S Fraga
@ 2021-05-04  8:29       ` Joost Kremers
  2021-05-04  9:36         ` Eric S Fraga
  0 siblings, 1 reply; 18+ messages in thread
From: Joost Kremers @ 2021-05-04  8:29 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: emacs-orgmode


On Tue, May 04 2021, Eric S Fraga wrote:
> So, on this note, without hopefully hijacking the thread, maybe somebody
> can tell me: what is the "default" input method, i.e. the one I get when
> I start Emacs and haven't changed input methods at all?  I see no way to
> get back to it once I have switched to a different one.

It's not really an input method, more like the lack of one. You're probably
using =set-input-method= to change input methods? Check out
=toggle-input-method=. :-)

-- 
Joost Kremers
Life has its moments



* Re: About multilingual documents
  2021-05-03 20:33   ` Juan Manuel Macías
  2021-05-04  1:00     ` Tom Gillespie
@ 2021-05-04  8:44     ` Aleksandar Dimitrov
  2021-05-06 11:11       ` Juan Manuel Macías
  1 sibling, 1 reply; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-04  8:44 UTC (permalink / raw)
  To: org-mode-email

Hi Juan,

> Thank you very much for your interesting comments. I think your idea of
> applying org-babel to (multi) language support is tremendously
> suggestive and, of course, more org-centric. I suppose it could be
> applied also to languages within the paragraph by inline blocks... I
> really liked what you propose.
>
> Well, I admit that my marks are a bit exotic :-D. The main problem I see
> is that they are not as robust as Org's own marks, since they are
> controlled by an export filter. Doing some further tests, by the way, I
> think it would be better to add the filter to
> `org-export-filter-plain-text-functions', instead of
> `...final-output-functions'. I also see that it would be convenient to
> avoid their expansion in verbatim texts, with a `(unless
> (org-in-verbatim-emphasis)...)'.

What I like about =org-edit-special= is that it gives you a dedicated
little environment in a different language (either natural, or
programming language!) This allows me to focus on the task of editing it
really easily.

I must admit that I find the inline org-src notation (which I didn't
know about until now) somewhat jarring, and certainly less pleasant to
read. Perhaps we could use a mechanism similar to
=org-hide-emphasis-markers= to make it more pleasant to read. [1]

> Anyway, I think (in general terms) it would be interesting for Org to
> incorporate some multilingual support and the ability to toggle between
> languages in a document, and the idea you propose seems to
> me to make a lot of sense.

I definitely agree that Org would benefit from more multilingual
support. I'm not very experienced in emacs-lisp but would love to contribute.

One problem I foresee is the translation of locales into LaTeX macros
for either (LaTeX) Babel or Polyglossia (which is what I use). So a
string like "en" or "en_GB" (which is readily understood by
([ai]|hun)spell) would have to be translated to the necessary
macros. For example, for Polyglossia [2] the preamble would read

\setdefaultlanguage[variant=uk]{english}

And then the inline commands would have to be rendered as
\textenglish{…} or \textlang{english}{…} (probably the latter would be easier.)

I forgot what it is for LaTeX-Babel.
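
The locale-to-macro translation for the inline case could start from
something like the following sketch. The table my/locale-languages and
both function names are hypothetical, and the region part of the locale
is simply ignored here, so variants such as en_GB versus en_US would
still need separate handling:

#+begin_src emacs-lisp
;; Sketch only: the table is hypothetical and far from complete.
(defvar my/locale-languages
  '(("en" . "english")
    ("de" . "german")
    ("pl" . "polish")
    ("ru" . "russian"))
  "Language codes mapped to Polyglossia language names.")

(defun my/locale-language (locale)
  "Return the language name for LOCALE such as \"en\" or \"en_GB\", or nil.
Only the part before \"_\" or \"-\" is looked up."
  (cdr (assoc (car (split-string locale "[_-]")) my/locale-languages)))

(defun my/textlang (locale text)
  "Wrap TEXT in a \\textlang command for LOCALE.
Return TEXT unchanged when LOCALE is not in `my/locale-languages'."
  (let ((language (my/locale-language locale)))
    (if language
        (format "\\textlang{%s}{%s}" language text)
      text)))
#+end_src

Returning the text unchanged for unknown locales keeps an export from
failing on a language the table does not cover yet.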

Note that the HTML export backend, too, could (or should) support
declaring multiple languages. [3]

There's a lot of work in there, but I would say that any implementation
effort should focus on one thing first. That could be switching the
dictionary on org-edit-special if a :lang-variable is set, or it could
be re-using what you, Juan, already wrote for LaTeX-Babel
exports. Support for Polyglossia or HTML could come at a later time.

Cheers,
Aleks

[1] https://stackoverflow.com/questions/20309842/how-to-syntax-highlight-for-org-mode-inline-source-code-src-lang/28059832#28059832
[2] https://ftp.rrze.uni-erlangen.de/ctan/macros/unicodetex/latex/polyglossia/polyglossia.pdf
[3] https://www.w3.org/International/questions/qa-html-language-declarations



* Re: Input methods [was: Re: About multilingual documents]
  2021-05-04  8:29       ` Input methods [was: Re: About multilingual documents] Joost Kremers
@ 2021-05-04  9:36         ` Eric S Fraga
  0 siblings, 0 replies; 18+ messages in thread
From: Eric S Fraga @ 2021-05-04  9:36 UTC (permalink / raw)
  To: Joost Kremers; +Cc: emacs-orgmode

On Tuesday,  4 May 2021 at 10:29, Joost Kremers wrote:
> It's not really an input method, more like the lack of one. You're probably
> using =set-input-method= to change input methods? Check out
> =toggle-input-method=. :-)

Ah, interesting.  A lack of input method.  Kind of non-obvious.  But the
documentation for toggle-input-method explains it perfectly.

Thank you!

-- 
: Eric S Fraga via Emacs 28.0.50, Org release_9.4.5-480-g479a3d



* Re: About multilingual documents
@ 2021-05-04 11:43 autofrettage
  0 siblings, 0 replies; 18+ messages in thread
From: autofrettage @ 2021-05-04 11:43 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hi,

I must confess I haven't followed all the nooks and crannies of this subject, but when I browsed through the latest batch of contributions, I noticed that one simple (=crude) workaround hasn't been mentioned: indirect buffers.

If one uses one indirect buffer per language, it should be possible to select a separate flyspell language for each buffer. Jumping between buffers/windows is perhaps less of a hassle than constantly switching spell checking languages.
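
A small sketch of that workaround; the command name my/indirect-buffer-for-language is hypothetical, and the dictionary passed to it must be one the spell checker actually knows:

#+begin_src emacs-lisp
;; Sketch only: DICTIONARY must name a dictionary known to ispell.
(defun my/indirect-buffer-for-language (dictionary)
  "Open an indirect buffer of the current buffer checked with DICTIONARY.
Return the new buffer."
  (interactive "sDictionary: ")
  (let ((buffer (clone-indirect-buffer
                 (format "%s<%s>" (buffer-name) dictionary) nil)))
    (with-current-buffer buffer
      ;; The dictionary choice is local to the indirect buffer.
      (ispell-change-dictionary dictionary)
      (flyspell-mode 1))
    (pop-to-buffer buffer)
    buffer))
#+end_src

Both buffers share the same text, so edits made in one show up in the other; only the spell-checking language differs.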

I suspect the ambitions of the general list member are higher than that, but this workaround could ease the pain for some of us.

Cheers
Rasmus



* Re: About multilingual documents
  2021-05-04  7:30     ` Aleksandar Dimitrov
@ 2021-05-04 17:09       ` Maxim Nikulin
  2021-05-04 18:55         ` Aleksandar Dimitrov
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-05-04 17:09 UTC (permalink / raw)
  To: emacs-orgmode

On 04/05/2021 14:30, Aleksandar Dimitrov wrote:
> 
> I don't usually switch input methods. Instead I rely on the X-Server's
> facilities, including group toggles and XCompose. For example I use
> XCompose to write all languages with a Latin alphabet without having to
> switch layouts/input methods.

You mentioned Cyrillic, and it is inconvenient to switch keyboard layout 
(Xkb group) for any command (C-c ...). Unfortunately keymaps in emacs 
are unaware of keysyms from "base" group when another group is active. 
On the other hand, emacs input method requires special tricks to keep 
emacs window (almost) always with latin keyboard layout while other 
applications rely on xkb.

On 04/05/2021 15:19, Eric S Fraga wrote:
> So, on this note, without hopefully hijacking the thread, maybe somebody
> can tell me: what is the "default" input method, i.e. the one I get when
> I start Emacs and haven't changed input methods at all?

The default input method depends on the locale. E.g. en_US.UTF-8 does not
require anything special.




* Re: About multilingual documents
  2021-05-04 17:09       ` Maxim Nikulin
@ 2021-05-04 18:55         ` Aleksandar Dimitrov
  2021-05-06 16:22           ` Maxim Nikulin
  0 siblings, 1 reply; 18+ messages in thread
From: Aleksandar Dimitrov @ 2021-05-04 18:55 UTC (permalink / raw)
  To: emacs-orgmode

Maxim Nikulin writes:

> On 04/05/2021 14:30, Aleksandar Dimitrov wrote:
>> 
>> I don't usually switch input methods. Instead I rely on the X-Server's
>> facilities, including group toggles and XCompose. For example I use
>> XCompose to write all languages with a Latin alphabet without having to
>> switch layouts/input methods.
>
> You mentioned Cyrillic, and it is inconvenient to switch the keyboard
> layout (Xkb group) for every command (C-c ...). Unfortunately, keymaps in
> Emacs are unaware of keysyms from the "base" group when another group is
> active. On the other hand, using an Emacs input method requires special
> tricks to keep the Emacs window (almost) always on the Latin keyboard
> layout while other applications rely on Xkb.

Yeah, I know the issue, which is why I rely on XCompose for Latin
scripts. For Cyrillic, alas, that is impossible. It means that I
basically can't control Emacs while using a Cyrillic layout, which is a
pity. I have no good workaround.


* Re: About multilingual documents
  2021-05-04  8:44     ` Aleksandar Dimitrov
@ 2021-05-06 11:11       ` Juan Manuel Macías
  0 siblings, 0 replies; 18+ messages in thread
From: Juan Manuel Macías @ 2021-05-06 11:11 UTC (permalink / raw)
  To: Aleksandar Dimitrov; +Cc: orgmode

[-- Attachment #1: Type: text/plain, Size: 4300 bytes --]

Hi Aleksandar,

Aleksandar Dimitrov writes:
> [...]
> I must admit that I find the inline org-src notation (of which I
> didn't know yet) somewhat jarring, and certainly less pleasant to
> read. Perhaps we could use a similar mechanism to
> =org-hide-emphasis-markers= to make it more pleasant to read. [1]

You may be interested in this thread: https://orgmode.org/list/87a6r6avgg.fsf@gmail.com/

> I definitely agree that Org would benefit from more multilingual
> support. I'm not very experienced in emacs-lisp but would love to contribute.
>
> One problem I foresee is the translation of locales into LaTeX macros
> for either (LaTeX)-Babel or Polyglossia (which is what I use.) So a
> string like "en" or "en_UK" (which is readily understood by
> ([ai]|hun)spell) would have to be translated to the necessary
> macros. For example for Polyglossia [2] the preamble would read
>
> \setdefaultlanguage[variant=uk]{english}
>
> And then the inline commands would have to be rendered as
> \textenglish{…} or \textlang{english}{…} (probably the latter would be easier.)

Since I have had some free time these days, I have written this little
snippet, based on your idea. Of course, it is only a 'sketch', or a
'proof of concept'. It has obvious limitations and does not cover all
the features that your idea suggests. Here I only apply the (LaTeX)
Babel environments, but they could easily be replaced by those of
Polyglossia [1], or both could be supported through a defcustom. I have
added two options: `:lang' and `:lang-quotes'. The second one is meant
to be used with the csquotes package. As I have only focused on
exporting to LaTeX, I have not included support for html (or odt), but I
agree with you that some multilingual support would be necessary for
these backends as well. And there's no support for inline blocks either,
as the output of the functions I've added is multiline. Anyway, it is a
very hasty sketch (maybe too hasty ;-)), but if you want to try it, I
attach here a small test document.

The code:

#+begin_src emacs-lisp
  ;; Load ob-org first, so that the redefinition of
  ;; `org-babel-execute:org' below is not overwritten later.
  (require 'ob-org)
  (require 'ox)

  (defun my-lang-org-backend (lang body)
    "Wrap BODY in a Babel `otherlanguage' environment for LANG.
  Only LaTeX-derived backends get the wrapper."
    (cond
     ((org-export-derived-backend-p org-export-current-backend 'latex)
      (format "@@latex:\\begin{otherlanguage}{%s}@@\n%s\n@@latex:\\end{otherlanguage}@@" lang body))
     ;; No multilingual markup yet for html, odt or any other backend:
     ;; return BODY unchanged instead of dropping it.
     (t body)))

  (defun my-lang-csquotes-org-backend (lang body)
    "Like `my-lang-org-backend', but also enable csquotes active quotes."
    (cond
     ((org-export-derived-backend-p org-export-current-backend 'latex)
      (format "@@latex:\\begin{otherlanguage*}{%s}\n\\EnableQuotes@@\n%s\n@@latex:\\end{otherlanguage*}@@" lang body))
     (t body)))

  (defun org-babel-execute:org (body params)
    "Execute a block of Org code, honoring `:lang' and `:lang-quotes'.
  This function is called by `org-babel-execute-src-block'."
    (let ((result-params (split-string (or (cdr (assq :results params)) "")))
          (lang (cdr (assq :lang params)))
          (lang-quotes (cdr (assq :lang-quotes params)))
          ;; Strip the comma escapes Org adds to protected lines.
          (body (org-babel-expand-body:org
                 (replace-regexp-in-string "^," "" body) params)))
      (cond
       (lang
        (my-lang-org-backend lang body))
       (lang-quotes
        (my-lang-csquotes-org-backend lang-quotes body))
       ((member "latex" result-params)
        (org-export-string-as (concat "#+Title: \n" body) 'latex t))
       ((member "html" result-params) (org-export-string-as body 'html t))
       ((member "ascii" result-params) (org-export-string-as body 'ascii t))
       (t body))))
#+end_src
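
The defcustom variant mentioned above could look roughly like this. It is only a sketch: the option and function names are hypothetical, and it assumes that Polyglossia provides an environment named after each language (which holds for most languages, though not all).

#+begin_src emacs-lisp
(defcustom my-lang-latex-package 'babel
  "LaTeX language package targeted by the `:lang' option (hypothetical)."
  :type '(choice (const babel) (const polyglossia))
  :group 'org-export)

(defun my-lang-latex-environment (lang body)
  "Wrap BODY in the LaTeX environment that switches to LANG."
  (pcase my-lang-latex-package
    ;; Assumption: Polyglossia environment named after the language.
    ('polyglossia
     (format "@@latex:\\begin{%s}@@\n%s\n@@latex:\\end{%s}@@" lang body lang))
    ;; Babel, as in the code above.
    (_
     (format "@@latex:\\begin{otherlanguage}{%s}@@\n%s\n@@latex:\\end{otherlanguage}@@"
             lang body))))
#+end_src

The LaTeX branch of my-lang-org-backend would then call my-lang-latex-environment instead of formatting the string itself.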

Best regards,

Juan Manuel

[1] I used Polyglossia for a while, when I migrated to XeTeX and then to
LuaTeX, since Babel at that time did not support those engines. But now
Babel fully supports them and has grown so much that it has surpassed
Polyglossia (IMHO). I recommend taking a look at all the new features
that the current Babel maintainer, Javier Bezos, has added:
http://mirrors.ctan.org/macros/latex/required/babel/base/babel.pdf


[-- Attachment #2: langs-test.org --]
[-- Type: application/vnd.lotus-organizer, Size: 2120 bytes --]

* Re: About multilingual documents
  2021-05-04 18:55         ` Aleksandar Dimitrov
@ 2021-05-06 16:22           ` Maxim Nikulin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxim Nikulin @ 2021-05-06 16:22 UTC (permalink / raw)
  To: emacs-orgmode

On 05/05/2021 01:55, Aleksandar Dimitrov wrote:
> Yeah, I know the issue, which is why I rely on XCompose for Latin
> scripts. For Cyrillic, alas, that is impossible. It means that I
> basically can't control Emacs while using a Cyrillic layout, which is a
> pity. I have no good workaround.

Generally, the idea is to keep a separate layout (Xkb group) per window
and to reset the layout to English whenever the active window is Emacs.
I have not tried the recipes that manage the Xkb group from Emacs itself,
e.g.
https://github.com/lislon/emacs-switch-lang
https://github.com/Mihara/kbd-indicator.el

Another approach is to set a global hotkey and, if Emacs is focused, to
send a special key event that is bound to switching the input method. I
have some links, but the pages are not in English. Personally, I have not
fully polished my setup; however, it works with some limitations. I
started from a bash script calling xdotool, xvkbd, and xprop. Then I
realized that the equivalent C code is not dramatically longer and avoids
struggling with the limitations of such tools.
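
On the Emacs side, such a setup needs little more than a binding for the key that the external helper sends. A sketch, in which the choice of <f19> is purely an assumption:

#+begin_src emacs-lisp
;; <f19> is an arbitrary, assumed choice: any key that the keyboard
;; itself never produces and that Emacs does not already use would do.
;; The external hotkey handler sends it only when Emacs has the focus.
(global-set-key (kbd "<f19>") #'toggle-input-method)
#+end_src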

Tim Cross suggested that I raise the question concerning keymaps on
emacs-devel once more, but I still do not feel ready to discuss the
technical aspects (e.g. hotkey handling in applications that fixed
similar issues several years ago):
https://orgmode.org/list/87r1lnvjh0.fsf@gmail.com




Thread overview: 18+ messages
2021-05-02 20:20 About multilingual documents Juan Manuel Macías
2021-05-03  6:58 ` Aleksandar Dimitrov
2021-05-03 17:47   ` Greg Minshall
2021-05-04  7:30     ` Aleksandar Dimitrov
2021-05-04 17:09       ` Maxim Nikulin
2021-05-04 18:55         ` Aleksandar Dimitrov
2021-05-06 16:22           ` Maxim Nikulin
2021-05-04  8:19     ` Eric S Fraga
2021-05-04  8:29       ` Input methods [was: Re: About multilingual documents] Joost Kremers
2021-05-04  9:36         ` Eric S Fraga
2021-05-03 18:48   ` About multilingual documents Joost Kremers
2021-05-04  8:00     ` Aleksandar Dimitrov
2021-05-03 20:33   ` Juan Manuel Macías
2021-05-04  1:00     ` Tom Gillespie
2021-05-04  8:13       ` Aleksandar Dimitrov
2021-05-04  8:44     ` Aleksandar Dimitrov
2021-05-06 11:11       ` Juan Manuel Macías
  -- strict thread matches above, loose matches on Subject: below --
2021-05-04 11:43 autofrettage
