all messages for Emacs-related lists mirrored at yhetil.org
* UTF-8 characters in comments of a program
@ 2023-10-20 22:53 Heime
  2023-10-21  7:48 ` Eli Zaretskii
  2023-10-21 10:25 ` Emanuel Berg
  0 siblings, 2 replies; 16+ messages in thread
From: Heime @ 2023-10-20 22:53 UTC (permalink / raw)
  To: Heime via Users list for the GNU Emacs text editor


Is it allowed to have UTF-8 characters in comments of a program (elisp, fortran, C, C++, latex) ?





* Re: UTF-8 characters in comments of a program
  2023-10-20 22:53 UTF-8 characters in comments of a program Heime
@ 2023-10-21  7:48 ` Eli Zaretskii
  2023-10-21 10:48   ` Heime
  2023-10-21 13:19   ` Jonathon McKitrick via Users list for the GNU Emacs text editor
  2023-10-21 10:25 ` Emanuel Berg
  1 sibling, 2 replies; 16+ messages in thread
From: Eli Zaretskii @ 2023-10-21  7:48 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Fri, 20 Oct 2023 22:53:38 +0000
> From: Heime <heimeborgia@protonmail.com>
> 
> 
> Is it allowed to have UTF-8 characters in comments of a program (elisp, fortran, C, C++, latex) ?

There's no such thing as "UTF-8 characters".  UTF-8 is an encoding of
the Unicode character set.

If you want to know whether language compilers and interpreters accept
UTF-8 encoded characters, then you will need to consult the
documentation of the relevant compiler.  AFAIK, C/C++ compilers
support this only in recent versions.  For Emacs Lisp, the answer is
YES, as the default encoding of ELisp files is UTF-8.
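
To make the character-versus-encoding distinction concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the original exchange). It takes one Unicode character, é (U+00E9), and shows that UTF-8 and Latin-1 turn it into different bytes while the code point stays the same.

```python
# One Unicode character: LATIN SMALL LETTER E WITH ACUTE.
ch = "é"

# The code point belongs to the character, independent of any encoding.
code_point = ord(ch)

# UTF-8 and Latin-1 are two different byte encodings of that same character.
utf8_bytes = ch.encode("utf-8")      # two bytes
latin1_bytes = ch.encode("latin-1")  # one byte

print(f"U+{code_point:04X}", utf8_bytes.hex(" "), latin1_bytes.hex(" "))
```

The question for a compiler or interpreter is therefore which byte encoding it expects in its source files, not whether it accepts "UTF-8 characters".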




* Re: UTF-8 characters in comments of a program
  2023-10-20 22:53 UTF-8 characters in comments of a program Heime
  2023-10-21  7:48 ` Eli Zaretskii
@ 2023-10-21 10:25 ` Emanuel Berg
  1 sibling, 0 replies; 16+ messages in thread
From: Emanuel Berg @ 2023-10-21 10:25 UTC (permalink / raw)
  To: help-gnu-emacs

Heime wrote:

> Is it allowed to have UTF-8 characters in comments of
> a program (elisp, fortran, C, C++, latex) ?

It is sometimes allowed, depending on what tools are involved
in the build process. But it is safer to stick with ASCII,
which is always sufficient, especially for your examples,
which are all either old or very old by now.
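
If you are unsure whether every tool in a build chain copes with non-ASCII text, one option is to find out where such characters occur first. The following is a rough sketch in Python, not taken from this thread; the helper name `non_ascii_lines` and the sample file are made up for illustration, and the sketch assumes the file is meant to be UTF-8.

```python
import tempfile
from pathlib import Path

def non_ascii_lines(path):
    """Return (line_number, text) pairs for lines containing non-ASCII characters.

    The file is decoded as UTF-8; a UnicodeDecodeError here means the file
    is not valid UTF-8 at all.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.isascii()
    ]

# Demo on a throwaway file with one ASCII-only line and one line with a Δ.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.c"
    sample.write_text(
        "int x = 0; /* plain */\nint y = 1; /* Δ step */\n", encoding="utf-8"
    )
    print(non_ascii_lines(sample))
```

An empty result means the file is pure ASCII, which is also valid UTF-8, so it is safe with either kind of tool.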

Lisp    1958 List Processor/Processing. USA
Fortran 1957 Formula Translation. IBM, John Backus
C       1972 Dennis Ritchie, Bell Labs
C++     1983 "C with classes", Bjarne Stroustrup
LaTeX   1984 document typesetting. Leslie Lamport, SRI, California
Elisp   1985 Emacs Lisp

https://dataswamp.org/~incal/sth/scripts/hist
https://dataswamp.org/~incal/COMP-HIST

-- 
underground experts united
https://dataswamp.org/~incal





* Re: UTF-8 characters in comments of a program
  2023-10-21  7:48 ` Eli Zaretskii
@ 2023-10-21 10:48   ` Heime
  2023-10-21 11:24     ` Eli Zaretskii
  2023-10-21 13:19   ` Jonathon McKitrick via Users list for the GNU Emacs text editor
  1 sibling, 1 reply; 16+ messages in thread
From: Heime @ 2023-10-21 10:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs



------- Original Message -------
On Saturday, October 21st, 2023 at 7:48 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Fri, 20 Oct 2023 22:53:38 +0000
> > From: Heime heimeborgia@protonmail.com
> > 
> > Is it allowed to have UTF-8 characters in comments of a program (elisp, fortran, C, C++, latex) ?
> 
> 
> There's no such thing as "UTF-8 characters". UTF-8 is an encoding of
> the Unicode character set.
> 
> If you want to know whether language compilers and interpreters accept
> UTF-8 encoded characters, then you will need to consult the
> documentation of the relevant compiler. 

> AFAIK, C/C++ compilers support this only in recent versions. 

Would you know if GCC supports this?

> For Emacs Lisp, the answer is YES, as the default encoding of ELisp files is UTF-8.

My difficulty is how to enter them into an Elisp source file.





* Re: UTF-8 characters in comments of a program
  2023-10-21 10:48   ` Heime
@ 2023-10-21 11:24     ` Eli Zaretskii
  2023-10-21 11:36       ` Heime
  2023-10-21 15:54       ` Basile Starynkevitch
  0 siblings, 2 replies; 16+ messages in thread
From: Eli Zaretskii @ 2023-10-21 11:24 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Sat, 21 Oct 2023 10:48:10 +0000
> From: Heime <heimeborgia@protonmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> > AFAIK, C/C++ compilers support this only in recent versions. 
> 
> Would you know if there is support for languages in Gnu GCC ?

Depends on the version, and I don't remember which one started
supporting it, sorry.

> > For Emacs Lisp, the answer is YES, as the default encoding of ELisp files is UTF-8.
> 
> My difficulty is how am I going to introduce them to an elisp source file.

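Just type ? followed by the character, which you can type via "C-x 8 RET"
followed by the character's Unicode name or its codepoint in hex.  (The ?
prefix is Elisp's character-literal syntax; in a comment or a string,
insert the character directly.)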
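
As a rough illustration outside Emacs, the two ways of identifying a character to "C-x 8 RET" (by Unicode name or by hex codepoint) can be sketched in Python with the standard `unicodedata` module. The helper name `char_from_spec` is hypothetical, and this is not a description of how Emacs implements `insert-char`.

```python
import unicodedata

def char_from_spec(spec: str) -> str:
    """Return the character identified by SPEC.

    SPEC is tried first as a Unicode character name and then as a
    hexadecimal code point, e.g. "GREEK SMALL LETTER DELTA" or "3b4".
    """
    try:
        return unicodedata.lookup(spec)
    except KeyError:
        # Not a known character name: treat it as a hex code point.
        return chr(int(spec, 16))

print(char_from_spec("GREEK SMALL LETTER DELTA"))  # looked up by name
print(char_from_spec("3b4"))                       # looked up by hex code point
```

Both calls identify the same character, which is why either form works when you know only one of them.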




* Re: UTF-8 characters in comments of a program
  2023-10-21 11:24     ` Eli Zaretskii
@ 2023-10-21 11:36       ` Heime
  2023-10-21 11:46         ` Eli Zaretskii
  2023-10-21 15:54       ` Basile Starynkevitch
  1 sibling, 1 reply; 16+ messages in thread
From: Heime @ 2023-10-21 11:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs







------- Original Message -------
On Saturday, October 21st, 2023 at 11:24 PM, Eli Zaretskii <eliz@gnu.org> wrote:


> > Date: Sat, 21 Oct 2023 10:48:10 +0000
> > From: Heime heimeborgia@protonmail.com
> > Cc: help-gnu-emacs@gnu.org
> > 
> > > AFAIK, C/C++ compilers support this only in recent versions.
> > 
> > Would you know if there is support for languages in Gnu GCC ?
> 
> 
> Depends on the version, and I don't remember which one started
> supporting it, sorry.
> 
> > > For Emacs Lisp, the answer is YES, as the default encoding of ELisp files is UTF-8.
> > 
> > My difficulty is how am I going to introduce them to an elisp source file.
> 
> 
> Just type ? followed by the character, which you can type via "C-x 8 RET"
> followed by the Unicode codepoint in hex.

It would help a lot if the actual symbol were shown after the name when using "C-x 8 RET".




* Re: UTF-8 characters in comments of a program
  2023-10-21 11:36       ` Heime
@ 2023-10-21 11:46         ` Eli Zaretskii
  2023-10-21 11:51           ` Heime
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2023-10-21 11:46 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Sat, 21 Oct 2023 11:36:44 +0000
> From: Heime <heimeborgia@protonmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> > Just type ? followed by the character, which you can type via "C-x 8 RET"
> > followed by the Unicode codepoint in hex.
> 
> It would help a lot if after the name, the actual symbol is shown when using "C-x 8 RET".

It is shown if you type TAB with an incomplete name.  If the name is
complete, then it is shown in the buffer into which it is inserted.




* Re: UTF-8 characters in comments of a program
  2023-10-21 11:46         ` Eli Zaretskii
@ 2023-10-21 11:51           ` Heime
  2023-10-23 15:39             ` Leo Butler
  0 siblings, 1 reply; 16+ messages in thread
From: Heime @ 2023-10-21 11:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs







------- Original Message -------
On Saturday, October 21st, 2023 at 11:46 PM, Eli Zaretskii <eliz@gnu.org> wrote:


> > Date: Sat, 21 Oct 2023 11:36:44 +0000
> > From: Heime heimeborgia@protonmail.com
> > Cc: help-gnu-emacs@gnu.org
> > 
> > > Just type ? followed by the character, which you can type via "C-x 8 RET"
> > > followed by the Unicode codepoint in hex.
> > 
> > It would help a lot if after the name, the actual symbol is shown when using "C-x 8 RET".
> 
> 
> It is shown if you type TAB with incomplete name. If your name is
> complete, then it is shown in the buffer into which it is inserted.

Thank you.




* Re: UTF-8 characters in comments of a program
  2023-10-21  7:48 ` Eli Zaretskii
  2023-10-21 10:48   ` Heime
@ 2023-10-21 13:19   ` Jonathon McKitrick via Users list for the GNU Emacs text editor
  2023-10-21 13:49     ` Emanuel Berg
  1 sibling, 1 reply; 16+ messages in thread
From: Jonathon McKitrick via Users list for the GNU Emacs text editor @ 2023-10-21 13:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sat, Oct 21, 2023 at 10:48:11AM +0300, Eli Zaretskii wrote:
: > Date: Fri, 20 Oct 2023 22:53:38 +0000
: > From: Heime <heimeborgia@protonmail.com>
: > 
: > 
: > Is it allowed to have UTF-8 characters in comments of a program (elisp, fortran, C, C++, latex) ?
: 
: There's no such thing as "UTF-8 characters".  UTF-8 is an encoding of
: the Unicode character set.
: 
: If you want to know whether language compilers and interpreters accept
: UTF-8 encoded characters, then you will need to consult the
: documentation of the relevant compiler.  AFAIK, C/C++ compilers
: support this only in recent versions.  For Emacs Lisp, the answer is
: YES, as the default encoding of ELisp files is UTF-8.

A few years ago I found a bug in an input form of our web app,
and I thoroughly enjoyed writing unit tests to verify the fix,
including the 'poo' emoji. This was in Scala, BTW.


Jonathon McKitrick
--
'My other computer is your Windows box.'




* Re: UTF-8 characters in comments of a program
  2023-10-21 13:19   ` Jonathon McKitrick via Users list for the GNU Emacs text editor
@ 2023-10-21 13:49     ` Emanuel Berg
  2023-10-22 11:04       ` Heime
  2023-10-22 13:44       ` Eric S Fraga
  0 siblings, 2 replies; 16+ messages in thread
From: Emanuel Berg @ 2023-10-21 13:49 UTC (permalink / raw)
  To: help-gnu-emacs

Jonathon McKitrick via Users list for the GNU Emacs text editor wrote:

>> If you want to know whether language compilers and
>> interpreters accept UTF-8 encoded characters, then you will
>> need to consult the documentation of the relevant compiler.
>> AFAIK, C/C++ compilers support this only in recent
>> versions. For Emacs Lisp, the answer is YES, as the default
>> encoding of ELisp files is UTF-8.
>
> A few years ago I found a bug in an input form of our web
> app, and I thoroughly enjoyed writing unit tests to verify
> the fix, including the 'poo' emoji. This was in Scala, BTW.

In certain applications, notably those that deal with
communication between people, those chars sure have their
place, just as different human languages, not just English,
obviously should be supported.

For example when I talk about countries in my smartphone
Signal app, I like to add their flags after the country names.
It spices things up and looks nice, and besides, everyone loves
flags, right?

But in computer-computer technology and programming, not so
much, IMO. I'm sure modern programming languages that are
designed and implemented today can support them, but what is
the gain, really? Maybe I'm just old-school.

-- 
underground experts united
https://dataswamp.org/~incal





* Re: UTF-8 characters in comments of a program
  2023-10-21 11:24     ` Eli Zaretskii
  2023-10-21 11:36       ` Heime
@ 2023-10-21 15:54       ` Basile Starynkevitch
  1 sibling, 0 replies; 16+ messages in thread
From: Basile Starynkevitch @ 2023-10-21 15:54 UTC (permalink / raw)
  To: help-gnu-emacs


On 10/21/23 13:24, Eli Zaretskii wrote:
>> Date: Sat, 21 Oct 2023 10:48:10 +0000
>> From: Heime <heimeborgia@protonmail.com>
>> Cc: help-gnu-emacs@gnu.org
>>
>>> AFAIK, C/C++ compilers support this only in recent versions.
>> Would you know if there is support for languages in Gnu GCC ?
> Depends on the version, and I don't remember which one started
> supporting it, sorry.


Recent versions (GCC 10 and later, and probably a few versions before)
support UTF-8 in comments and string literals.

>
>>> For Emacs Lisp, the answer is YES, as the default encoding of ELisp files is UTF-8.
>> My difficulty is how am I going to introduce them to an elisp source file.


M-x insert-char

or copy/paste on Linux from a charmap or a browser.

> Just type ? followed by the character, which you can type via "C-x 8 RET"
> followed by the Unicode codepoint in hex.
>

NB. My pet open source software (work in progress) is the RefPerSys 
GPLv3+ inference engine project on https://github.com/RefPerSys/RefPerSys/

-- 
Basile Starynkevitch
  <basile@starynkevitch.net>
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/





* Re: UTF-8 characters in comments of a program
  2023-10-21 13:49     ` Emanuel Berg
@ 2023-10-22 11:04       ` Heime
  2023-10-22 11:30         ` Emanuel Berg
  2023-10-22 13:44       ` Eric S Fraga
  1 sibling, 1 reply; 16+ messages in thread
From: Heime @ 2023-10-22 11:04 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: help-gnu-emacs







------- Original Message -------
On Sunday, October 22nd, 2023 at 1:49 AM, Emanuel Berg <incal@dataswamp.org> wrote:


> Jonathon McKitrick via Users list for the GNU Emacs text editor wrote:
> 
> > > If you want to know whether language compilers and
> > > interpreters accept UTF-8 encoded characters, then you will
> > > need to consult the documentation of the relevant compiler.
> > > AFAIK, C/C++ compilers support this only in recent
> > > versions. For Emacs Lisp, the answer is YES, as the default
> > > encoding of ELisp files is UTF-8.
> > 
> > A few years ago I found a bug in an input form of our web
> > app, and I thoroughly enjoyed writing unit tests to verify
> > the fix, including the 'poo' emoji. This was in Scala, BTW.
> 
> 
> In certain applications, notably those who deal with
> communication between people, those chars sure has their
> place, just like support for different human languages, not
> just English, obviously should be supported.
> 
> For example when I talk about countries in my smartphone
> Signal app, I like to add their flags after the country names.
> It spices things up and look nice and besides everyone loves
> flags, right?
> 
> But in computer-computer technology and programming not so
> much so IMO. I'm sure modern programming languages that are
> designed and implemented today can support them, but what is
> the gain, really? Maybe I'm just old-school.

It is not old-school.  It is western-school, because in some writing systems
it is customary to use ideograms representing concepts or ideas
rather than a specific word in a language.  Examples include Cuneiform,
Egyptian and Anatolian Hieroglyphs, Mayan, Chinese scripts, and Japanese;
the list is not short.
 
> --
> underground experts united
> https://dataswamp.org/~incal




* Re: UTF-8 characters in comments of a program
  2023-10-22 11:04       ` Heime
@ 2023-10-22 11:30         ` Emanuel Berg
  0 siblings, 0 replies; 16+ messages in thread
From: Emanuel Berg @ 2023-10-22 11:30 UTC (permalink / raw)
  To: help-gnu-emacs

Heime wrote:

>> But in computer-computer technology and programming not so
>> much so IMO. I'm sure modern programming languages that are
>> designed and implemented today can support them, but what
>> is the gain, really? Maybe I'm just old-school.
>
> It is not old-school. It is western-school, because some
> writing systems it is customary to introduce ideograms
> representing concepts or ideas rather than a specific word
> in a language. Examples include Cuneiform, Egyptian and
> Anatolian Hieroglyphs, Mayan, Chinese Scripts, Japanese, the
> list is not short.

The computer revolution - the transistor and later the PC -
are all western things. In particular they are American
things. And what made them possible - the industrial
revolution - is a western thing as well, in particular an
English thing. Pretending to program in another language in
2023 won't change that.

-- 
underground experts united
https://dataswamp.org/~incal





* Re: UTF-8 characters in comments of a program
  2023-10-21 13:49     ` Emanuel Berg
  2023-10-22 11:04       ` Heime
@ 2023-10-22 13:44       ` Eric S Fraga
  2023-10-23  5:51         ` Emanuel Berg
  1 sibling, 1 reply; 16+ messages in thread
From: Eric S Fraga @ 2023-10-22 13:44 UTC (permalink / raw)
  To: help-gnu-emacs

On Saturday, 21 Oct 2023 at 15:49, Emanuel Berg wrote:
> I'm sure modern programming languages that are designed and
> implemented today can support them, but what is the gain, really?

The gain can be an increase in readability.  For instance, being able to
use δx or ΔT as variables (e.g. in Julia) instead of delta_x, delta_T or
similar.  Mathematical expressions become more like the actual
mathematics.  Julia, in particular, actually uses Unicode characters for
some of the operations, such as

for i ∈ 1:10

although you can still write

for i in 1:10

if you need to.

And you might be able to imagine being able to define operators such as
· and × for vector arithmetic.

Etc.  US-ASCII is very limiting for expression.
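
The readability point can be tried in other languages as well. Here is a small sketch in Python (not Julia, so the syntax differs from the loop shown above), which also accepts many non-ASCII letters in identifiers. The function `temperature_rise` and its numbers are made up purely for illustration.

```python
def temperature_rise(ΔT_per_step: float, steps: int) -> float:
    """Total temperature change after a number of equal steps."""
    return ΔT_per_step * steps

δx = 0.5                        # a small displacement
ΔT = temperature_rise(1.5, 4)   # total change over four steps

print(δx, ΔT)
```

Python does not let you define new operator symbols such as · or ×, so that part of the argument does not carry over to this sketch.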

Just my 2¢. 😉

-- 
Eric S Fraga via gnus (Emacs 30.0.50 2023-09-14) on Debian 12.1





* Re: UTF-8 characters in comments of a program
  2023-10-22 13:44       ` Eric S Fraga
@ 2023-10-23  5:51         ` Emanuel Berg
  0 siblings, 0 replies; 16+ messages in thread
From: Emanuel Berg @ 2023-10-23  5:51 UTC (permalink / raw)
  To: help-gnu-emacs

Eric S Fraga wrote:

>> I'm sure modern programming languages that are designed and
>> implemented today can support them, but what is the
>> gain, really?
>
> The gain can be an increase in readability. For instance,
> being able to use δx or ΔT as variables (e.g. in Julia)
> instead of delta_x, delta_T or similar.
> Mathematical expressions become more like the actual
> mathematics. Julia, in particular, actually uses Unicode
> characters for some of the operations, such as
>
> for i ∈ 1:10
>
> although you can still write
>
> for i in 1:10
>
> if you need to.
>
> And you might be able to imagine being able to define
> operators such as · and × for vector arithmetic.
>
> Etc. US-ASCII is very limiting for expression.

To me "for i in {1..10}" is as easy or easier to read and much
faster to type.

But if you grew up reading and writing code that supported and
relied upon math and scientific notation in practical
programming, maybe you would like it better; it is possible.

-- 
underground experts united
https://dataswamp.org/~incal





* Re: UTF-8 characters in comments of a program
  2023-10-21 11:51           ` Heime
@ 2023-10-23 15:39             ` Leo Butler
  0 siblings, 0 replies; 16+ messages in thread
From: Leo Butler @ 2023-10-23 15:39 UTC (permalink / raw)
  To: Heime; +Cc: Eli Zaretskii, help-gnu-emacs@gnu.org

On Sat, Oct 21 2023, Heime <heimeborgia@protonmail.com> wrote:

>
> ------- Original Message -------
> On Saturday, October 21st, 2023 at 11:46 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>> > Date: Sat, 21 Oct 2023 11:36:44 +0000
>> > From: Heime heimeborgia@protonmail.com
>> > Cc: help-gnu-emacs@gnu.org
>> > 
>> > > Just type ? followed by the character, which you can type via "C-x 8 RET"
>> > > followed by the Unicode codepoint in hex.
>> > 
>> > It would help a lot if after the name, the actual symbol is shown when using "C-x 8 RET".
>> 
>> 
>> It is shown if you type TAB with incomplete name. If your name is
>> complete, then it is shown in the buffer into which it is inserted.
>
> Thank you.

Since you mentioned LaTeX in your original post, you are perhaps
familiar with how to get (La)TeX to emit ê (\^e). You can use the TeX
input method to do this in Emacs: in a buffer, type C-\ TeX RET (or
C-x RET C-\ TeX RET if another input method is already selected). Then,
most plain TeX commands, like \alpha or \"e, are translated to glyphs
like α or ë.

For more information, see:

(info "(emacs) Input Methods")

One additional tip: if you have a glyph like € and you want to know how
to enter it, put point on top of it and type C-u C-x =.
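
For a quick check outside Emacs, the codepoint and name of a glyph (the two things "C-x 8 RET" accepts) can also be printed with a few lines of Python. This is only a sketch; `char_summary` is a hypothetical helper, not an Emacs command.

```python
import unicodedata

def char_summary(ch: str) -> str:
    """Summarize one character: hex code point, Unicode name, UTF-8 bytes."""
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")
    return f"U+{ord(ch):04X} {name} (UTF-8: {utf8})"

print(char_summary("€"))  # prints: U+20AC EURO SIGN (UTF-8: e2 82 ac)
```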

Leo

