emacs-orgmode

Re: About multilingual documents


From: Aleksandar Dimitrov
Subject: Re: About multilingual documents
Date: Tue, 04 May 2021 10:44:43 +0200
User-agent: mu4e 1.5.5; emacs 28.0.50

Hi Juan,

> Thank you very much for your interesting comments. I think your idea of
> applying org-babel to (multi) language support is tremendously
> suggestive and, of course, more org-centric. I suppose it could be
> applied also to languages within the paragraph by inline blocks... I
> really liked what you propose.
>
> Well, I admit that my marks are a bit exotic :-D. The main problem I see
> is that they are not as robust as Org's own marks, since they are
> controlled by an export filter. Doing some further tests, by the way, I
> think it would be better to add the filter to
> `org-export-filter-plain-text-functions', instead of
> `...final-output-functions'. I also see that it would be convenient to
> avoid their expansion in verbatim texts, with a `(unless
> (org-in-verbatim-emphasis)...)'.

What I like about =org-edit-special= is that it gives you a dedicated
little environment in a different language (either a natural or a
programming language!). This lets me focus on editing that text
really easily.

I must admit that I find the inline org-src notation (which I didn't
know about yet) somewhat jarring, and certainly less pleasant to read.
Perhaps we could use a mechanism similar to =org-hide-emphasis-markers=
to make it more pleasant to read. [1]

> Anyway, I think (in general terms) it would be interesting for Org to
> incorporate some multilingual support and the ability to toggle between
> languages in a document, and the idea you propose seems to me to
> make a lot of sense.

I definitely agree that Org would benefit from more multilingual
support. I'm not very experienced in emacs-lisp but would love to contribute.

One problem I foresee is the translation of locales into LaTeX macros
for either (LaTeX)-Babel or Polyglossia (which is what I use). A
string like "en" or "en_GB" (which is readily understood by
([ai]|hun)spell) would have to be translated to the necessary
macros. For example, for Polyglossia [2] the preamble would read

\setdefaultlanguage[variant=uk]{english}

And then the inline commands would have to be rendered as
\textenglish{…} or \textlang{english}{…} (probably the latter would be easier.)
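
The translation described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the names
prefixed with my/ are hypothetical, the table lists just a handful of
locales, and variant=american is added next to the variant=uk example
from this message.

#+begin_src emacs-lisp
;; Hypothetical table: locale string -> (Polyglossia language . options).
;; Deliberately tiny; a real table would need many more entries.
(defvar my/locale-polyglossia-alist
  '(("en"    . ("english" . nil))
    ("en_GB" . ("english" . "variant=uk"))
    ("en_US" . ("english" . "variant=american"))
    ("de"    . ("german"  . nil))
    ("fr"    . ("french"  . nil))
    ("pl"    . ("polish"  . nil)))
  "Map locale strings to Polyglossia language names and options.")

(defun my/locale-to-polyglossia (locale)
  "Return (LANGUAGE . OPTIONS) for LOCALE."
  (or (cdr (assoc locale my/locale-polyglossia-alist))
      ;; Fall back from a full locale such as de_AT to its language part.
      (cdr (assoc (car (split-string locale "_"))
                  my/locale-polyglossia-alist))
      (error "No Polyglossia mapping for locale %s" locale)))

(defun my/polyglossia-preamble (locale)
  "Return the \\setdefaultlanguage line for LOCALE."
  (let* ((entry (my/locale-to-polyglossia locale))
         (lang (car entry))
         (opts (cdr entry)))
    (if opts
        (format "\\setdefaultlanguage[%s]{%s}" opts lang)
      (format "\\setdefaultlanguage{%s}" lang))))

(defun my/polyglossia-inline (locale text)
  "Wrap TEXT in \\textlang for the language of LOCALE."
  (format "\\textlang{%s}{%s}"
          (car (my/locale-to-polyglossia locale))
          text))
#+end_src

Using \textlang rather than \textenglish keeps the inline case to a
single format string, which is why the latter form looks easier.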

I forgot what it is for LaTeX-Babel.

Note that the HTML export backend, too, could (or should) support
declaring multiple languages. [3]
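
For the HTML side, a minimal sketch might look like this. It assumes
the same locale strings as above and only swaps the underscore for a
hyphen, since HTML lang attributes use hyphenated tags such as en-GB.
Both function names are hypothetical and nothing here is part of the
existing HTML backend.

#+begin_src emacs-lisp
(defun my/locale-to-html-lang (locale)
  "Convert a locale such as \"en_GB\" to a lang tag such as \"en-GB\"."
  (replace-regexp-in-string "_" "-" locale t t))

(defun my/html-lang-span (locale text)
  "Wrap TEXT in a span that declares LOCALE as its language.
TEXT is assumed to be HTML-escaped already."
  (format "<span lang=\"%s\">%s</span>"
          (my/locale-to-html-lang locale)
          text))
#+end_src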

There's a lot of work in there, but I would say that any implementation
effort should focus on one thing first. That could be switching the
dictionary on org-edit-special if a :lang header argument is set, or it could
be re-using what you, Juan, already wrote for LaTeX-Babel
exports. Support for Polyglossia or HTML could come at a later time.
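
As a starting point for the first option, here is a rough sketch. It
assumes the :lang header argument from the src-block proposal quoted
below, and it assumes that the value is also a dictionary name the
spell checker knows (otherwise ispell-change-dictionary signals an
error). The my/ functions are hypothetical.

#+begin_src emacs-lisp
(require 'org)
(require 'ispell)

(defun my/org-src-block-lang ()
  "Return the :lang header argument of the src block at point, or nil."
  (let ((info (org-babel-get-src-block-info 'no-eval)))
    (when info
      ;; The third element of INFO is the alist of header arguments.
      (let ((value (cdr (assq :lang (nth 2 info)))))
        (and value (format "%s" value))))))

(defun my/org-edit-special-with-dictionary ()
  "Edit the block at point and switch the dictionary to its :lang.
The :lang value is handed to Ispell unchanged."
  (interactive)
  (let ((lang (my/org-src-block-lang)))
    (org-edit-special)
    (when lang
      ;; Without a prefix argument this only affects the current
      ;; buffer, here the edit buffer opened by `org-edit-special'.
      (ispell-change-dictionary lang))))
#+end_src

The language is read before org-edit-special is called because point
moves to the edit buffer afterwards, where the block's header
arguments are no longer at hand.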

Cheers,
Aleks

[1] https://stackoverflow.com/questions/20309842/how-to-syntax-highlight-for-org-mode-inline-source-code-src-lang/28059832#28059832
[2] https://ftp.rrze.uni-erlangen.de/ctan/macros/unicodetex/latex/polyglossia/polyglossia.pdf
[3] https://www.w3.org/International/questions/qa-html-language-declarations


>
> Best regards,
>
> Juan Manuel 
>
> Aleksandar Dimitrov writes:
>
>> Hi Juan,
>>
>> this sounds very interesting to me, as I, too, mostly write in Org
>> and sometimes write documents in multiple languages, usually with
>> different varieties of either Latin or Cyrillic.
>>
>> I have some suggestions:
>>
>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>>
>> I think that the syntax you're suggesting looks good, but I'm not
>> sure how well it'd fit into org-mode's ecosystem. I had something in
>> mind that was closer to how org-babel works (it's called *babel*
>> for a reason, isn't it? :D)
>>
>> #+begin_src org :lang pl
>>   … po polsku
>> #+end_src
>>
>> #+begin_src org :lang de
>>   … auf deutsch
>> #+end_src
>>
>>
>> This would make use of org-mode's edit special environment function. It
>> would make it easier to persuade flyspell to do the right thing. You
>> could, perhaps, add
>>
>> #+LANGUAGE: en
>>
>> to the parent document, and then org would take care to set the correct
>> flyspell language (and the correct macros on LaTeX-export) and change
>> these parameters in the special environments.
>>
>> I'm not 100% sure it should be #+begin_src org, maybe introducing a
>> different special environment would be better, say #+begin_lang XX where
>> XX is the ISO-code of said language, or the locale (think en_US
>> vs. en_GB.)
>>
>> The drawback, and the clear disadvantage compared to your method, is
>> that this works great only when the languages are separated by
>> paragraph breaks.
>>
>> Therefore, I think our suggestions might be somewhat orthogonal. Yours
>> could be a shorthand syntax for introducing inline foreign-language
>> snippets.
>>
>> What do you think?
>>
>> Regards,
>> Aleks
>>
>> Juan Manuel Macías writes:
>>
>>> Hi all,
>>>
>>> I'm curious to see how other Org users deal with multilingual documents,
>>> that is, those documents (for example, philology or linguistics texts)
>>> that contain a significant number of inline quotes in other languages.
>>> Naturally, this makes more sense in the LaTeX backend, since it is
>>> convenient to enclose these quotes in a \foreignlanguage command to
>>> ensure that LaTeX at least applies the correct hyphenation patterns for
>>> words in other languages.
>>>
>>> Luckily, in the latest versions of Babel (the Babel of LaTeX) you don't
>>> need to do this when it comes to languages whose script is different
>>> from Latin (e.g. Greek, languages with Cyrillic, Arabic, Hindi, etc.).
>>> We can, for example, define Russian and Greek as:
>>>
>>> #+begin_src latex
>>> \babelprovide[onchar=ids fonts,hyphenrules=russian]{russian}
>>> \babelprovide[onchar=ids fonts,hyphenrules=ancientgreek]{greek}
>>> #+end_src
>>>
>>> And also the fonts for both languages:
>>>
>>> #+begin_src latex
>>> \babelfont[russian]{rm}{Linux Libertine O}
>>> \babelfont[greek]{rm}{Free Serif}
>>> #+end_src
>>>
>>> For Latin-based scripts it is still necessary to enclose the text in
>>> the \foreignlanguage command. And now comes the question: how do Org
>>> users who work on multilingual documents obtain this command when
>>> exporting to LaTeX?
>>>
>>> I usually use macros, which always tend to work fine. But lately I have
>>> been testing an alternative markup system using an export filter. The
>>> idea would be something like:
>>>
>>> %(lang) lorem ipsum dolor %()
>>>
>>> I start from a list of the most used languages:
>>>
>>> #+begin_src emacs-lisp
>>> ;; Excerpt from a `let' binding list: short code -> Babel language name.
>>> (langs '(("en" "english")
>>>          ("fr" "french")
>>>          ("de" "german")
>>>          ("it" "italian")
>>>          ("pt" "portuguese")))
>>> #+end_src
>>>
>>> And other possible languages that Babel supports can be indicated
>>> explicitly, by prepending "--":
>>>
>>> %(fr) ... %()
>>>
>>> %(--esperanto) ... %()
>>>
>>> (If someone wants to try it, I attach a small Org document).
>>>
>>> Best regards,
>>>
>>> Juan Manuel
>>
>>
>
> -- 



