From: Tobias Getzner <tobias.getzner@gmx.de>
To: Aaron Ecay <aaronecay@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [BUG] Mark-up handling chokes on Unicode white-space
Date: Wed, 24 Sep 2014 09:34:25 +0200
Message-ID: <1411544065.2146.1.camel@gmx.de>
In-Reply-To: <87ppemnqxy.fsf@gmail.com>
Hi Aaron,
On Tue, 2014-09-23 at 14:15 -0400, Aaron Ecay wrote:
> org-emphasis-regexp-components is known to be a wart. You can search
> for posts on the mailing list. Some people are trying to figure out how
> to get rid of it. (You can search in particular for Nicolas Goaziou’s
> posts...) Here’s one thread where you can see the lay of the land:
> <http://mid.gmane.org/87zjl6ktu2.fsf@gmail.com>.
Thank you for the background info!
> All that to say, the longer-term solution is to figure out some radically
> different approach. In the meantime though, if you can provide a list of
> characters (by unicode name and/or code point) that you think should be
> added to that variable, someone might be able to add them.
I guess the straightforward way of defining white-space would be just
using the set of characters with the Unicode property WSpace=Y, and
this would be what «[:space:]», «\s», etc., should be expected to match
on Unicode-based locales. I’m supplying a list of code-points below,
for convenience.
I agree though that defining what counts as «white space» within the
confines of org-mode is putting the cart before the horse. I’ll try to
ascertain whether the Emacs implementation of «[:space:]» really only
matches 8-bit space characters, and if so I’ll see whether I can poke
someone on the Emacs bug tracker about this.
Best regards,
T.
──────────────────────────────────────────────────────────────────────
List of Unicode white-space
Below is the list of characters with the property White_Space set,
taken from the Unicode 7.0.0 character database. This includes
line-breaking white-space such as «line feed». If these are not
relevant, one can use the subset of space separators (Zs; these do not
include control characters such as Tab) and control chars (Cc).
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
──────────────────────────────────────────────────────────────────────
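For anyone who wants to experiment with the list above outside Emacs, here is a minimal Python sketch that turns the listed code-point ranges into an explicit regular-expression character class. The names parse_ranges, char_class, WHITE_SPACE and WHITE_SPACE_RE are made up for this illustration, and the approach (spelling out the class by hand) is only one possible way to do it, not what org-mode or Emacs does.

```python
import re

# White_Space entries exactly as listed above (Unicode 7.0.0).
WHITE_SPACE_RANGES = """
0009..000D
0020
0085
00A0
1680
2000..200A
2028
2029
202F
205F
3000
"""

def parse_ranges(text):
    """Return a sorted list of (first, last) code-point pairs."""
    pairs = []
    for entry in text.split():
        first, _, last = entry.partition("..")
        # A single code point is treated as a range of length one.
        pairs.append((int(first, 16), int(last or first, 16)))
    return sorted(pairs)

def char_class(pairs):
    """Build a regex character class from the code-point pairs."""
    parts = []
    for first, last in pairs:
        if first == last:
            parts.append("\\u%04X" % first)
        else:
            parts.append("\\u%04X-\\u%04X" % (first, last))
    return "[" + "".join(parts) + "]"

PAIRS = parse_ranges(WHITE_SPACE_RANGES)
# Set of all individual code points covered by the ranges.
WHITE_SPACE = {cp for first, last in PAIRS for cp in range(first, last + 1)}
# Compiled pattern matching exactly one listed white-space character.
WHITE_SPACE_RE = re.compile(char_class(PAIRS))
```

With this in place, WHITE_SPACE_RE.fullmatch("\u00a0") matches NO-BREAK SPACE, while a character that is not on the list, such as U+200B ZERO WIDTH SPACE, is rejected.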
Thread overview:
2014-09-23 12:44 [BUG] Mark-up handling chokes on unicode whitespace Tobias Getzner
2014-09-23 17:03 ` Aaron Ecay
2014-09-23 17:44 ` Tobias Getzner
2014-09-23 18:15 ` Aaron Ecay
2014-09-24 7:34 ` Tobias Getzner [this message]