manipulating (capitalize, lower case) unicode bold and italic characters

unofficial mirror of help-gnu-emacs@gnu.org

* manipulating (capitalize, lower case) unicode bold and italic characters
@ 2019-07-07 19:13 Dan Hitt
  2019-07-08  2:29 ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Hitt @ 2019-07-07 19:13 UTC (permalink / raw)
  To: help-gnu-emacs

In emacs, you can use C-x 8 RET and then at the prompt auto-complete a
name like "MATHEMATICAL BOLD SMALL W" (without the quotes) and get a
unicode bold w inserted into your buffer.  You could also auto-complete
a name like "MATHEMATICAL ITALIC CAPITAL W", for example (also without
quotes).

This is extremely useful, because, for example, you can enter a bold face
or italicized comment in a piece of code, and the font will survive exactly
the way you want notwithstanding any font-lock or other rules that may
dictate some other style that you specifically wish to override.

So it's a sort of unqualified good, but it would be very useful to be able
to select a region of text, and convert it to mathematical bold, or
mathematical italic, or fraktur, or other unicode symbols.   (Right now, i
do copy paste for this, and would like something easier.)  It would also be
good to be able to change case easily (and i guess this should work whether
the characters are greek or latin, and work with arabic digits as well).
(And such a means should try to be complete enough to work on the other
unicode goodies, such as mathematical bold script capital R.  If a
transition were impossible, then the character in question should just not
be modified.)

Is there any code out there that does this?

I'm using emacs 24.5.1 on debian 9.

TIA for any info!

dan


* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-07 19:13 manipulating (capitalize, lower case) unicode bold and italic characters Dan Hitt
@ 2019-07-08  2:29 ` Eli Zaretskii
  2019-07-08 13:11   ` Marcin Borkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-07-08  2:29 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Dan Hitt <dan.hitt@gmail.com>
> Date: Sun, 7 Jul 2019 12:13:11 -0700
> 
> So it's a sort of unqualified good, but it would be very useful to be able
> to select a region of text, and convert it to mathematical bold, or
> mathematical italic, or fraktur, or other unicode symbols.

What you call mathematical bold and mathematical italic is actually a
special Unicode block of characters whose intended use is to provide
symbols for mathematical formulae.  Their use as "normal" text is not
something I'd recommend, because the characters' properties are
different from those of normal text.  Nevertheless, converting ASCII
text to use these characters should be a simple matter of replacing
one character by another.
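
A minimal sketch of that character-for-character replacement, for the
bold case only.  The `my-` names are hypothetical, not part of Emacs;
the offsets assume the MATHEMATICAL BOLD capitals start at U+1D400, the
small letters at U+1D41A and the digits at U+1D7CE.  Anything else is
left unchanged.

```elisp
;; Hypothetical helpers (the `my-' names are not part of Emacs).
(defun my-math-bold-char (ch)
  "Return the MATHEMATICAL BOLD counterpart of CH, or CH if it has none."
  (cond ((and (>= ch ?A) (<= ch ?Z)) (+ #x1D400 (- ch ?A)))
        ((and (>= ch ?a) (<= ch ?z)) (+ #x1D41A (- ch ?a)))
        ((and (>= ch ?0) (<= ch ?9)) (+ #x1D7CE (- ch ?0)))
        (t ch)))

(defun my-math-bold-string (string)
  "Return a copy of STRING with ASCII letters and digits made bold."
  (concat (mapcar #'my-math-bold-char string)))

(defun my-math-bold-region (beg end)
  "Replace the text between BEG and END with its mathematical bold form."
  (interactive "r")
  (let ((new (my-math-bold-string
              (buffer-substring-no-properties beg end))))
    (save-excursion
      (goto-char beg)
      (delete-region beg end)
      (insert new))))
```

With a region selected, M-x my-math-bold-region would rewrite it in
place.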

I don't understand what you mean by "fraktur", though.

> It would also be good to be able to change case easily (and i guess
> this should work whether the characters are greek or latin, and work
> with arabic digits as well).

This already does work with any character for which Unicode defines
the upper-case or lower-case pair.  Use M-l and M-u.


* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08  2:29 ` Eli Zaretskii
@ 2019-07-08 13:11   ` Marcin Borkowski
  2019-07-08 13:51     ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Marcin Borkowski @ 2019-07-08 13:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs


On 2019-07-08, at 04:29, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Dan Hitt <dan.hitt@gmail.com>
>> Date: Sun, 7 Jul 2019 12:13:11 -0700
>>
>> So it's a sort of unqualified good, but it would be very useful to be able
>> to select a region of text, and convert it to mathematical bold, or
>> mathematical italic, or fraktur, or other unicode symbols.
>
> What you call mathematical bold and mathematical italic is actually a
> special Unicode block of characters whose intended use is to provide
> symbols for mathematical formulae.  Their use as "normal" text is not
> something I'd recommend, because the characters' properties are
> different from those of normal text.  Nevertheless, converting ASCII
> text to use these characters should be a simple matter of replacing
> one character by another.

Note that this is tricky also because there are "holes" there,
e.g. "mathematical lowercase italic" has no letter "h".
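
One way to cope with such a hole is a small exception table consulted
before the arithmetic.  This is only a sketch with hypothetical `my-`
names; it assumes italic capitals start at U+1D434 and small letters at
U+1D44E, and that U+210E PLANCK CONSTANT is an acceptable stand-in for
the missing italic "h" at U+1D455.

```elisp
;; Hypothetical names; only the italic letters are handled here.
(defvar my-math-italic-exceptions
  '((?h . #x210E))                  ; U+1D455 is a hole; use PLANCK CONSTANT
  "Alist of ASCII characters whose italic form lies outside the block.")

(defun my-math-italic-char (ch)
  "Return the MATHEMATICAL ITALIC counterpart of CH, or CH if it has none."
  (let ((exception (assq ch my-math-italic-exceptions)))
    (cond (exception (cdr exception))
          ((and (>= ch ?A) (<= ch ?Z)) (+ #x1D434 (- ch ?A)))
          ((and (>= ch ?a) (<= ch ?z)) (+ #x1D44E (- ch ?a)))
          ;; Digits and everything else pass through unmodified.
          (t ch))))
```

Other styles with holes could get their own exception alists in the
same way.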

I'd also recommend against this idea.  I'd use e.g. Markdown (possibly
with font-lock) for that.

> I don't understand what you mean by "fraktur", though.

https://en.wikipedia.org/wiki/Fraktur

>> It would also be good to be able to change case easily (and i guess
>> this should work whether the characters are greek or latin, and work
>> with arabic digits as well).
>
> This already does work with any character for which Unicode defines
> the upper-case or lower-case pair.  Use M-l and M-u.

A quick test shows that this won't work in the current state of affairs.

Best

--
Marcin Borkowski
http://mbork.pl




* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 13:11   ` Marcin Borkowski
@ 2019-07-08 13:51     ` Eli Zaretskii
  2019-07-08 17:07       ` Stefan Monnier
  2019-07-08 18:50       ` Dan Hitt
  0 siblings, 2 replies; 9+ messages in thread
From: Eli Zaretskii @ 2019-07-08 13:51 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: help-gnu-emacs@gnu.org
> Date: Mon, 08 Jul 2019 15:11:44 +0200
> 
> > I don't understand what you mean by "fraktur", though.
> 
> https://en.wikipedia.org/wiki/Fraktur

Thanks, I know what Fraktur is, I just don't understand what it has to
do with the issue at hand.  There's no "fraktur" block in Unicode,
AFAIK.

> >> It would also be good to be able to change case easily (and i guess
> >> this should work whether the characters are greek or latin, and work
> >> with arabic digits as well).
> >
> > This already does work with any character for which Unicode defines
> > the upper-case or lower-case pair.  Use M-l and M-u.
> 
> A quick test shows that this won't work in the current state of affairs.

??? Where doesn't it work, and why?




* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 13:51     ` Eli Zaretskii
@ 2019-07-08 17:07       ` Stefan Monnier
  2019-07-08 17:19         ` Eli Zaretskii
  2019-07-08 18:50       ` Dan Hitt
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2019-07-08 17:07 UTC (permalink / raw)
  To: help-gnu-emacs

> Thanks, I know what Fraktur is, I just don't understand what it has to
> do with the issue at hand.  There's no "fraktur" block in Unicode,
> AFAIK.

There's "MATHEMATICAL BOLD FRAKTUR CAPITAL A" and friends.


        Stefan





* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 17:07       ` Stefan Monnier
@ 2019-07-08 17:19         ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2019-07-08 17:19 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 08 Jul 2019 13:07:54 -0400
> 
> > Thanks, I know what Fraktur is, I just don't understand what it has to
> > do with the issue at hand.  There's no "fraktur" block in Unicode,
> > AFAIK.
> 
> There's "MATHEMATICAL BOLD FRAKTUR CAPITAL A" and friends.

Ah, OK.  So from the same block.

Thanks.




* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 13:51     ` Eli Zaretskii
  2019-07-08 17:07       ` Stefan Monnier
@ 2019-07-08 18:50       ` Dan Hitt
  2019-07-08 19:21         ` Eli Zaretskii
  1 sibling, 1 reply; 9+ messages in thread
From: Dan Hitt @ 2019-07-08 18:50 UTC (permalink / raw)
  Cc: help-gnu-emacs

On Mon, Jul 8, 2019 at 6:51 AM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Marcin Borkowski <mbork@mbork.pl>
> > Cc: help-gnu-emacs@gnu.org
> > Date: Mon, 08 Jul 2019 15:11:44 +0200
> >
> ......
>
> > >> It would also be good to be able to change case easily (and i guess
> > >> this should work whether the characters are greek or latin, and work
> > >> with arabic digits as well).
> > >
> > > This already does work with any character for which Unicode defines
> > > the upper-case or lower-case pair.  Use M-l and M-u.
> >
> > A quick test shows that this won't work in the current state of affairs.
>
> ??? Where doesn't it work, and why?
>

Thanks Eli, Marcin, and Stefan for your help.

Here's an example of what doesn't work, i think.

If you enter a mathematical italic small w (0x1D464) and a mathematical
italic capital w (0x1D44A) and do 'describe-char' for each, the small one
has the Lowercase general category, while the capital has the Uppercase
general category.  I did not know about the concept of 'case pair' in
unicode, so i guess it is possible that even though emacs knows one is
Lowercase and one is Uppercase, it is possible that it does not know that
they are in a pair.  (How would i find out, from emacs?)  The commands
downcase-region and upcase-region do not work on them, at least for me
(emacs 24.5.1, on debian 9).  (I guess what i'd want to do in that case is
keep some kind of file in .emacs.d that defines custom case-pairs.  ??)

Regarding markdown, i do like markdown, and i'm glad there's a markdown
mode.  However, for commenting code, i'm not sure how it would fit in: the
code file would have to be in two modes?  And also, even though i might be
able to convince emacs to recognize the markdown, cat would not, nor would
vi or other programs.   (I don't want to be too inauthentic here, because
normally i don't use vi for anything, but i did read some sample files with
vi, and it can pick up the unicode stuff without missing a beat.  Likewise
cat and more also have no issue with it, at least using the xfce4-terminal
in debian.)

Thanks again everybody for your help, and for educating me about case pairs
in unicode!!

dan


* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 18:50       ` Dan Hitt
@ 2019-07-08 19:21         ` Eli Zaretskii
  2019-07-08 22:32           ` Dan Hitt
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2019-07-08 19:21 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Dan Hitt <dan.hitt@gmail.com>
> Date: Mon, 8 Jul 2019 11:50:07 -0700
> Cc: help-gnu-emacs@gnu.org
> 
> If you enter a mathematical italic small w (0x1D464) and a mathematical
> italic capital w (0x1D44A) and do 'describe-char' for each, the small one
> has the Lowercase general category, while the capital has the Uppercase
> general category.  I did not know about the concept of 'case pair' in
> unicode, so i guess it is possible that even though emacs knows one is
> Lowercase and one is Uppercase, it is possible that it does not know that
> they are in a pair.

They are not a letter-case pair because Unicode doesn't say they are.
The fact that a character has general category lowercase doesn't yet
imply that it has a defined upper-case variant, these are two separate
attributes.  Emacs defines its case pairs according to what it finds
in the Unicode Character Database.
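
The two attributes can be inspected separately with
`get-char-code-property'.  A small sketch (the helper name `my-case-info'
is hypothetical):

```elisp
;; Hypothetical helper; `get-char-code-property' itself is part of Emacs.
(defun my-case-info (ch)
  "Return a list (GENERAL-CATEGORY UPPERCASE LOWERCASE) for character CH.
UPPERCASE and LOWERCASE are the simple case mappings recorded for CH,
or nil when none is recorded."
  (list (get-char-code-property ch 'general-category)
        (get-char-code-property ch 'uppercase)
        (get-char-code-property ch 'lowercase)))

;; Compare an ASCII letter with MATHEMATICAL ITALIC SMALL W (U+1D464):
;; (my-case-info ?a)
;; (my-case-info #x1D464)
```

For U+1D464 the general category should come back as Ll while the
uppercase entry is empty.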

Of course, you can teach Emacs about the letter-case pairs yourself,
like this:

  (let ((tbl (standard-case-table)))
    (set-downcase-syntax ?𝑊 ?𝑤 tbl)
    (set-upcase-syntax ?𝑤 ?𝑊 tbl))

> (How would i find out, from emacs?)

By looking at the table returned by standard-case-table, for example:

  (aref (standard-case-table) ?A)
    => 97

but

  (aref (standard-case-table) ?𝑊)
    => nil

> The commands downcase-region and upcase-region do not work on them

They won't work on letters that have no case-pairs.

Once again: I do NOT recommend using the characters from the
Mathematical Alphanumeric Symbols block for writing English text,
that's not their purpose.


* Re: manipulating (capitalize, lower case) unicode bold and italic characters
  2019-07-08 19:21         ` Eli Zaretskii
@ 2019-07-08 22:32           ` Dan Hitt
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Hitt @ 2019-07-08 22:32 UTC (permalink / raw)
  To: help-gnu-emacs

On Mon, Jul 8, 2019 at 12:30 PM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Dan Hitt <dan.hitt@gmail.com>
> > Date: Mon, 8 Jul 2019 11:50:07 -0700
> > Cc: help-gnu-emacs@gnu.org
> >
> > If you enter a mathematical italic small w (0x1D464) and a mathematical
> > italic capital w (0x1D44A) and do 'describe-char' for each, the small one
> > has the Lowercase general category, while the capital has the Uppercase
> > general category.  I did not know about the concept of 'case pair' in
> > unicode, so i guess it is possible that even though emacs knows one is
> > Lowercase and one is Uppercase, it is possible that it does not know that
> > they are in a pair.
>
> They are not a letter-case pair because Unicode doesn't say they are.
> The fact that a character has general category lowercase doesn't yet
> imply that it has a defined upper-case variant, these are two separate
> attributes.  Emacs defines its case pairs according to what it finds
> in the Unicode Character Database.
>
> Of course, you can teach Emacs about the letter-case pairs yourself,
> like this:
>
>   (let ((tbl (standard-case-table)))
>     (set-downcase-syntax ?𝑊 ?𝑤 tbl)
>     (set-upcase-syntax ?𝑤 ?𝑊 tbl))
>

It looks like the set-upcase-syntax function takes the same argument order
as set-downcase-syntax, so it would be
   (set-upcase-syntax ?𝑊 ?𝑤 tbl)
but otherwise this works perfectly, so thanks very much.
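
For the whole italic alphabet, something along these lines might save
typing.  It is only a sketch: `my-add-math-italic-case-pairs' is a
made-up name, and it uses `set-case-syntax-pair', which takes the
upper-case character first and sets both directions at once (it also
marks both characters as word constituents in the standard syntax
table).  The "h" hole at U+1D455 is skipped, so capital italic H stays
unpaired.

```elisp
;; Hypothetical helper; `set-case-syntax-pair' is part of Emacs.
(defun my-add-math-italic-case-pairs (table)
  "Register MATHEMATICAL ITALIC capital/small letters as case pairs in TABLE.
Return TABLE."
  (dotimes (i 26)
    (let ((uc (+ #x1D434 i))            ; italic capitals start at U+1D434
          (lc (+ #x1D44E i)))           ; italic small letters at U+1D44E
      ;; U+1D455 (where italic small h would be) is not assigned.
      (unless (= lc #x1D455)
        (set-case-syntax-pair uc lc table))))
  table)

;; To apply it globally, e.g. from an init file:
;; (my-add-math-italic-case-pairs (standard-case-table))
```

Passing (copy-case-table (standard-case-table)) instead and using
`with-case-table' would let you try it out without changing the
standard case table.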

>
> > (How would i find out, from emacs?)
>
> By looking at the table returned by standard-case-table, for example:
>
>   (aref (standard-case-table) ?A)
>     => 97
>
> but
>
>   (aref (standard-case-table) ?𝑊)
>     => nil
>

> > The commands downcase-region and upcase-region do not work on them
>
> They won't work on letters that have no case-pairs.
>
> Once again: I do NOT recommend using the characters from the
> Mathematical Alphanumeric Symbols block for writing English text,
> that's not their purpose.
>

Well, i'm sympathetic to that view, and i think i can understand the
motivation.  For example, a piece of software might scan a file and decide
that anything in the MAS block should be parsed into a 'formula' and maybe
build up an index of such formulas.   (Although, even in this case being
able to upcase and downcase easily is useful, as statements like  '𝑤 ∈ 𝑊'
are common in mathematics.)  If i just bold-face some English text it would
confuse any such software.  So i understand there's a powerful argument to
not use the MAS block for formatting.

Thus i'm very interested in any alternatives that offer comparable
advantages to the MAS block (e.g., no markup lying around anywhere, can be
used in comments and as variables in code, persistent, immune to font-lock,
etc).

Thanks again for your help, code, and explanations!  :)

dan

