From: Mark Shoulson <mark@kli.org>
To: emacs-orgmode@gnu.org
Subject: Re: Smart Quotes Exporting
Date: Fri, 15 Jun 2012 16:20:43 +0000 (UTC)
Message-ID: <loom.20120615T171057-967@post.gmane.org>
In-Reply-To: 874nqgeke6.fsf@gmail.com
Nicolas Goaziou <n.goaziou <at> gmail.com> writes:
>
> Hello,
>
> Mark Shoulson <mark <at> kli.org> writes:
>
> >> ASCII exporter also handle UTF-8. So it's good to have there too.
> >
> > Really? I would have thought ASCII meant ASCII, as in 7-bit clean
> > text.
>
> org-e-ascii.el (as old org-ascii.el) handles ASCII, Latin1 and UTF-8
> encodings.
I noticed that after writing my response. The name just threw me a little.
Yes, that exporter needs to handle it too.
> > It looked to me like your solution would essentially boil down to "do
> > string handling when there's a string, otherwise recur down and find
> > the strings," which essentially means apply it to all the
> > strings... and there were already functions out there applying things
> > to strings, so this can just ride along with them. Here, let's look
> > at your suggestion and see if we can find what I missed:
> >
....
> > So, if it's a string, use the regexps (if they can be smart enough to look
> > at beginning and end of the string, which they can--though I haven't been
> > using the :post-blank property so presumably something is amiss), and if
> > it isn't a string, recur down until you get to a string... Ah, but only
> > if it's in org-element-recursive-objects.
>
> You're missing an important part: the regexps cannot be smart enough for
> quotes at the beginning or the end of the string. There, you must look
> outside the string. Hence:
Well, wait; regexps can make some pretty darn good guesses at the beginnings
and ends of strings. Quotations don't normally end in spaces (at least in the
conventions used with ""; French typography is different, but if you're
putting spaces around your quotes you have worse problems, such as line
breaks, to worry about). So if a string ends in space(s) followed by a quote,
that quote is very likely an open-quote for something that comes after.
Conversely, if a string starts with a quote followed by some spaces, it's
very likely a close-quote for what came before.
That isn't the whole story: beginning-of-string followed by a quote, then
punctuation, and then spaces is also a close-quote, and so on. There is a lot
of fine-tuning to do. But even what I currently have was able to handle your

  Caesar said, "/Alea Jacta est./"

example. Yes, there are edge cases which this won't catch, and it remains to
be seen how pervasive and annoying those are. It may be that repeated
tweaking of the regexps will handle enough of the ordinary cases. It may be
that after a few rounds of regexp-hacking someone will finally decide that
regexp-hacking just won't handle enough of the important cases. But I think
even as it stands now we'd probably handle 80-90% of the normal situations,
which really is as much as we can reasonably hope for.
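
To make the edge heuristics above concrete, here is a minimal sketch. It is
written in Python rather than Emacs Lisp purely for illustration, it is not
the patch itself, and the name smarten_quotes and the exact regexps are
assumptions made up for this example. It handles only double quotes and looks
only at the string it is given, never at neighbouring objects.

```python
import re

OPEN, CLOSE = "\u201c", "\u201d"  # left and right double quotation marks


def smarten_quotes(text):
    """Guess open/close for each straight double quote in one plain string.

    Hypothetical helper for illustration only; it sees just this string,
    not the objects before or after it.
    """
    # Edge rule: quote at the very start, then optional punctuation, then
    # whitespace or end of string -> closes something that came before.
    text = re.sub(r'\A"(?=[.,;:!?]*(?:\s|\Z))', CLOSE, text)
    # Edge rule: whitespace then a quote at the very end -> opens something
    # that comes after.
    text = re.sub(r'(?<=\s)"\Z', OPEN, text)
    # Interior rule: quote at the start of the string, or right after
    # whitespace or an opening bracket -> opening quote.
    text = re.sub(r'(?:(?<=[\s(\[])|\A)"', OPEN, text)
    # Whatever is left is treated as a closing quote.
    return text.replace('"', CLOSE)


if __name__ == "__main__":
    # The two plain strings on either side of the /.../ object in the
    # Caesar example.
    print(smarten_quotes('Caesar said, "'))
    print(smarten_quotes('"'))
```

Note that a string consisting of a lone quote is guessed to be a close-quote
here. That is right for the Caesar example, but it would be wrong for a quote
that opens directly onto an emphasized object, which is exactly the kind of
edge case in question.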
Could I trouble someone to apply my patch, try it out for yourself, and see
just how bad/good the performance is? It seems to work okay for the cases
I've been trying, but maybe my dataset isn't robust enough. Let's give it a
test and see how many actual cases in common usage it gets wrong, and then
see how much can be fixed by tuning the regexps.
>
> > ] 1. If it has a quote as its first or last position, check for
> > ] objects before or after the string to guess its status. An
> > ] object never starts with a white space, but you may have to
> > ] check :post-blank property in order to know if previous object
> > ] had white spaces at its end.
>
> But you can only do that from the element containing the string, not
> from the string itself.
The case where a quote both sits at the edge of a string (i.e. at the border
of some element, formatting, etc.) *and* does not have whitespace next to it
(allowing for intervening punctuation) does not seem to be a normal
occurrence to me. If I'm wrong, how common *is* it?
>
> > So the issue with the current state is that it
> > would wind up applying to too much? (it would hit code and verbatim
> > elements, for example, and that would be wrong.)
>
> No, you are not applying it too much (verbatim elements don't contain
> plain-text objects) but your function hasn't got access to enough
> information to be useful.
The on-screen version, of course, will have to be smarter and check for
the "face" formatting to make sure it doesn't happen in comments or verbatims;
I am pretty sure it does not do that yet.
> > wait, called on the top-level parsed tree object, recursively doing
> > its thing before(?) the transcoders of the individual objects get to
> > it.
>
> That's called a parse tree filter. That should be a possibility
> indeed. The function would be applied on the parse tree and would
> replace strings within elements containing plain text (that is
> paragraph, verse-block and table-row types). parse tree filters are
> applied very early in the export process.
>
> Another option would be to integrate it into
> `org-element-normalize-contents', but I think the previous way is
> better.
Maybe. I know it sounds like I'm fixated on the plain-text solution, but I'm
not convinced the envisioned problems are more than theoretical, or that they
will cause an unacceptable amount of error (keeping in mind that some error
*is* acceptable and unavoidable).
> > The on-screen one would still use the plain-string computation, as you
> > said, since the full parse isn't available.
>
> Yes.
>
> > It would also need to be tweaked not to act on verbatim/comment text,
> > etc.
>
> Yes. You may want to use `org-element-at-point' and `org-element-type'
> to tell if you're somewhere smart quotes are allowed (in table,
> table-row, paragraph, verse-block elements).
Probably. I think I saw some other package make these decisions by peeking at
the formatting and seeing if it is set in comment-face or something, but
checking the element at point is presumably more sensible.
~mark