emacs-orgmode@gnu.org archives
* Bug: text export and multi-word link descriptions with line breaks
From: Mathias Bauer @ 2014-04-03 14:28 UTC
  To: emacs-orgmode

Dear Maintainers,

I just stumbled over Org's plain text export and how it works on
links with descriptions consisting of multiple words and line
breaks between them.  I'm running Org stable version 8.2.5h.

Org source (spaces at the end of lines 1 and 2 don't matter):

--------------------snip--------------------
"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
--------------------snip--------------------

Text export result:

--------------------snip--------------------
"OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz


[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991

[RFC 2440] https://tools.ietf.org/html/rfc2440

[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991
--------------------snip--------------------

These multiple references look quite bad.  Is it possible to
"normalize" the descriptions in some way *before* checking them
for uniqueness and output them thereafter?

Thanks for considering this issue.

Kind regards
Mathias

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Nicolas Goaziou @ 2014-04-03 15:25 UTC
  To: emacs-orgmode

Hello,

Mathias Bauer <mbauer@gmx.org> writes:

> These multiple references look quite bad.  Is it possible to
> "normalize" the descriptions in some way *before* checking them
> for uniqueness and output them thereafter?
>
> Thanks for considering this issue.

Could you be more explicit? What looks quite bad? What did you
expect instead? How is it related to line breaks in the descriptions?


Regards,

-- 
Nicolas Goaziou

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Mathias Bauer @ 2014-04-03 16:30 UTC
  To: emacs-orgmode

Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):

> Could you be more explicit? What looks quite bad? What did
> you expect instead? How is it related to line breaks in the
> descriptions?

Ok, let's go into more details.  See the Org source text:

1. There are three distinct links, and each of them appears twice.
   Both occurrences of a link have the same target.

2. Both "[...][RFC 2440]" links fit on one line each, whereas the
   first occurrences of "[...][RFC 4880]" and "[...][RFC 1991]"
   have a newline in their description.  The two pairs are in fact
   "[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
   "[...][RFC\n1991]" and "[...][RFC 1991]".

So, now let's examine the Org text export:

The final reference part - the five links below the paragraph -
lists [RFC 4880] and [RFC 1991] twice each, but [RFC 2440] only
once.

This is, at least, inconsistent.

The point is that Org apparently treats "[...][RFC 4880]" and
"[...][RFC\n4880]" as two different links internally and lists
both of them in the reference part.  Only for this listing is the
\n removed.  This is what I called "normalization" in my first
post.

To the human eye, however, these two forms look identical, so the
repeated entries come as a surprise.
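
The behaviour described above can be reproduced with a small model.
This is only a sketch in Python of what the report suggests is
happening, not Org's actual exporter code (which is Emacs Lisp); the
function name and the sample data are made up for illustration.  It
drops duplicates by comparing the raw descriptions and cleans the
whitespace only when printing, which yields the same five entries as
in the export above.

```python
import re

# Links from the report as (target, raw description) pairs, in source order.
# "\n" marks a description that was wrapped across two lines.
links = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]


def references_dedup_then_clean(links):
    """Hypothetical model: compare raw descriptions, clean only for display."""
    seen = set()
    lines = []
    for target, description in links:
        key = (description, target)  # raw description, newline included
        if key in seen:
            continue
        seen.add(key)
        # The newline is turned into a space only when the entry is printed,
        # so "RFC\n4880" and "RFC 4880" both survive the duplicate check.
        shown = re.sub(r"\s+", " ", description)
        lines.append(f"[{shown}] {target}")
    return lines


for line in references_dedup_then_clean(links):
    print(line)
```

With this ordering of the two operations, only the second [RFC 2440]
is recognized as a duplicate, because it is the only link whose two
descriptions are identical character for character.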

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines and leading and trailing spaces, and replace any
   occurrence of "[ \t]+" in the interior by a single space.
   (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them.  (Seems ok to me.)

3. Below the paragraph list the tuples as "[description] target"
   in the order of occurrence in the original text.  (Also seems
   ok to me.)
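
The three steps above can be sketched as follows.  This is an
illustration in Python under stated assumptions, not a patch for Org:
the function names and sample data are hypothetical, and a newline
inside a description is treated as ordinary whitespace (so "RFC\n4880"
becomes "RFC 4880" rather than "RFC4880"), which is one reading of
step 1.

```python
def normalize_description(description):
    """Step 1: trim the ends and collapse every whitespace run to one space."""
    # str.split() without arguments splits on spaces, tabs and newlines
    # and discards leading and trailing whitespace.
    return " ".join(description.split())


def reference_list(links):
    """Steps 2 and 3: drop duplicate tuples, keep order of first occurrence."""
    seen = set()
    lines = []
    for target, description in links:
        key = (normalize_description(description), target)
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"[{key[0]}] {target}")
    return lines


# Sample data modelled on the report; "\n" marks a wrapped description.
links = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]

for line in reference_list(links):
    print(line)
```

Because the description is normalized before the tuple is checked for
duplicates, each of the three links is listed exactly once.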

I hope this makes this issue a little bit more clear now.

Kind regards,
Mathias

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Nicolas Goaziou @ 2014-04-03 20:54 UTC
  To: emacs-orgmode

Mathias Bauer <mbauer@gmx.org> writes:

> I expect Org to do the following steps while parsing the source
> text:
>
> 1. "Normalize" or clean the link description, i.e. remove any
>    newlines and leading and trailing spaces, and replace any
>    occurrence of "[ \t]+" in the interior by a single space.
>    (To be done.)
>
> 2. Check the tuple (description,target) for duplicates and drop
>    them.  (Seems ok to me.)
>
> 3. Below the paragraph list the tuples as "[description] target"
>    in the order of occurrence in the original text.  (Also seems
>    ok to me.)
>
> I hope this makes this issue a little bit more clear now.

Indeed. I missed the duplicate links. This should be fixed.

Thank you for the report.


Regards,

-- 
Nicolas Goaziou
