unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#9286: fill-paragraph destroys URLs
@ 2011-08-11 21:25 jidanni
  2011-08-20 19:58 ` Chong Yidong
  2011-08-20 20:04 ` jidanni
  0 siblings, 2 replies; 7+ messages in thread
From: jidanni @ 2011-08-11 21:25 UTC (permalink / raw)
  To: 9286; +Cc: handa, yamaoka, jidanni

Gentlemen, watch as emacs' fill-paragraph hatefully victimizes this
http://goo.gl/rThbu URL below while leaving the others unscathed.

$ cat a.txt
CD 定義:
http://goo.gl/rThbu
國內代表性網站:
http://smcj.net/
http://ragii.com/ 另參:
http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/
Our membership target is a Taiwan audience at this time.
煩請通知社團管理員您真的是否確定要加入,以免 spam.

$ LC_CTYPE=zh_TW.UTF-8 emacs a.txt
M-q
CD 定義: http://goo.gl/rThbu國內代表性網站: http://smcj.net/
http://ragii.com/ 另參: http://www.facebook.com/tg.taiwan
http://www.flickr.com/groups/tg-taiwan/ Our membership target is a
Taiwan audience at this time. 煩請通知社團管理員您真的是否確定要加入,以
免 spam.

Emacs _just assumes_ it is OK to ram 'u' into '國' if it crosses a newline.

Allow us to hit M-q on

u 國 u 國 u 國 u 國 u 國 u 國
u 國 u 國 u 國 u 國 u 國 u 國

We come up with:

u 國 u 國 u 國 u 國 u 國 u 國u 國 u 國 u 國 u 國 u 國 u 國

No kidding, deep in one's essays, emacs is secretly destroying certain URLs as we speak.

My point is if emacs is brazen enough to squeeze out a newline, then it
should be brazen enough to squeeze out a space. But better yet don't be
brazen enough at all.

Further experiments
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國
but
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
becomes:
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國國 國 國 國 國 國 國 國 國 國 國 國 國
國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國 國

OK have it your way, but at least don't join syntax of u into syntax of
Chinese... P.S., don't send me a fix just for me. I'm reporting a bug
not asking for help.






* bug#9286: fill-paragraph destroys URLs
  2011-08-11 21:25 bug#9286: fill-paragraph destroys URLs jidanni
@ 2011-08-20 19:58 ` Chong Yidong
  2019-10-09 22:30   ` Lars Ingebrigtsen
  2011-08-20 20:04 ` jidanni
  1 sibling, 1 reply; 7+ messages in thread
From: Chong Yidong @ 2011-08-20 19:58 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 9286, jidanni

If I am decoding the jidanni-speak correctly, his complaint is that doing
M-q on a buffer containing

asdf
國

turns the text into

asdf國

instead of what he wants:

asdf 國


This is because line joining does not include a space if *either*
character on each side of the newline has the ?| (line-breakable)
category and an entry in fill-nospace-between-words-table.  To get the
behavior jidanni wants, we could change it so that *both* the characters
must have this property; see attached patch.

But I am not sure this is TRT in general.  Handa-san, could you weigh in
with an opinion?  Adding a space seems more or less correct to me, but I
am no expert.


*** lisp/textmodes/fill.el	2011-07-16 20:05:54 +0000
--- lisp/textmodes/fill.el	2011-08-20 19:52:41 +0000
***************
*** 482,491 ****
  	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
  	  (let ((prev (char-before (match-beginning 0)))
  		(next (following-char)))
! 	    (if (and (or (aref (char-category-set next) ?|)
! 			 (aref (char-category-set prev) ?|))
! 		     (or (aref fill-nospace-between-words-table next)
! 			 (aref fill-nospace-between-words-table prev)))
  		(delete-char -1))))))
  
    (goto-char from)
--- 482,491 ----
  	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
  	  (let ((prev (char-before (match-beginning 0)))
  		(next (following-char)))
! 	    (if (and (aref (char-category-set next) ?|)
! 		     (aref (char-category-set prev) ?|)
! 		     (aref fill-nospace-between-words-table next)
! 		     (aref fill-nospace-between-words-table prev))
  		(delete-char -1))))))
  
    (goto-char from)
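
The two conditions in the patch above can be compared in isolation. This is a minimal sketch, not code from fill.el: the function names `my/join-without-space-either-p` and `my/join-without-space-both-p` are hypothetical, and it assumes `fill-nospace-between-words-table` is already loaded (fill.el is normally preloaded). It evaluates both rules on the characters around the newline in the "asdf" / "國" example.

```elisp
;; Hypothetical helper names (my/...); these are not part of fill.el.

(defun my/join-without-space-either-p (prev next)
  "Pre-patch rule: non-nil if either PREV or NEXT qualifies.
When this returns non-nil, the lines are joined with no space."
  (and (or (aref (char-category-set next) ?|)
           (aref (char-category-set prev) ?|))
       (or (aref fill-nospace-between-words-table next)
           (aref fill-nospace-between-words-table prev))))

(defun my/join-without-space-both-p (prev next)
  "Patched rule: non-nil only if both PREV and NEXT qualify."
  (and (aref (char-category-set next) ?|)
       (aref (char-category-set prev) ?|)
       (aref fill-nospace-between-words-table next)
       (aref fill-nospace-between-words-table prev)))

;; PREV is the last character before the newline, NEXT the first after it.
(list (my/join-without-space-either-p ?f ?國)
      (my/join-without-space-both-p ?f ?國))
```

Going by the behaviour reported in this thread, the first element should be non-nil (the space is dropped today) and the second nil (the patch would keep the space). Evaluating the forms in `M-x ielm` is one way to check this on a given Emacs build.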







* bug#9286: fill-paragraph destroys URLs
  2011-08-11 21:25 bug#9286: fill-paragraph destroys URLs jidanni
  2011-08-20 19:58 ` Chong Yidong
@ 2011-08-20 20:04 ` jidanni
  1 sibling, 0 replies; 7+ messages in thread
From: jidanni @ 2011-08-20 20:04 UTC (permalink / raw)
  To: cyd; +Cc: 9286

CY> If I am decoding the jidanni-speak correctly
Yes correct.






* bug#9286: fill-paragraph destroys URLs
  2011-08-20 19:58 ` Chong Yidong
@ 2019-10-09 22:30   ` Lars Ingebrigtsen
  2019-10-10  7:43     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-09 22:30 UTC (permalink / raw)
  To: Chong Yidong; +Cc: 34463, jidanni, Kenichi Handa, 9286

Chong Yidong <cyd@stupidchicken.com> writes:

> If I am decoding the jidanni-speak correctly, his complaint is that doing
> M-q on a buffer containing
>
> asdf
> 國
>
> turns the text into
>
> asdf國
>
> instead of what he wants:
>
> asdf 國
>
> This is because line joining does not include a space if *either*
> character on each side of the newline has the ?| (line-breakable)
> category and an entry in fill-nospace-between-words-table.  To get the
> behavior jidanni wants, we could change it so that *both* the characters
> must have this property; see attached patch.
>
> But I am not sure this is TRT in general.  Handa-san, could you weigh in
> with an opinion?  Adding a space seems more or less correct to me, but I
> am no expert.

This problem is still present in Emacs 27.  This patch, from 2011, was
never applied.  I think Chong's proposal sounds logical, but like him,
I'm (ahem) no expert.

> *** lisp/textmodes/fill.el	2011-07-16 20:05:54 +0000
> --- lisp/textmodes/fill.el	2011-08-20 19:52:41 +0000
> ***************
> *** 482,491 ****
>   	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
>   	  (let ((prev (char-before (match-beginning 0)))
>   		(next (following-char)))
> ! 	    (if (and (or (aref (char-category-set next) ?|)
> ! 			 (aref (char-category-set prev) ?|))
> ! 		     (or (aref fill-nospace-between-words-table next)
> ! 			 (aref fill-nospace-between-words-table prev)))
>   		(delete-char -1))))))
>
>     (goto-char from)
> --- 482,491 ----
>   	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
>   	  (let ((prev (char-before (match-beginning 0)))
>   		(next (following-char)))
> ! 	    (if (and (aref (char-category-set next) ?|)
> ! 		     (aref (char-category-set prev) ?|)
> ! 		     (aref fill-nospace-between-words-table next)
> ! 		     (aref fill-nospace-between-words-table prev))
>   		(delete-char -1))))))
>
>     (goto-char from)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#9286: fill-paragraph destroys URLs
  2019-10-09 22:30   ` Lars Ingebrigtsen
@ 2019-10-10  7:43     ` Eli Zaretskii
  2019-10-11  6:58       ` bug#34463: " Lars Ingebrigtsen
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2019-10-10  7:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 9286, cyd, handa, jidanni, 34463

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Thu, 10 Oct 2019 00:30:54 +0200
> Cc: 34463@debbugs.gnu.org, jidanni@jidanni.org, Kenichi Handa <handa@m17n.org>,
>  9286@debbugs.gnu.org
> 
> > This is because line joining does not include a space if *either*
> > character on each side of the newline has the ?| (line-breakable)
> > category and an entry in fill-nospace-between-words-table.  To get the
> > behavior jidanni wants, we could change it so that *both* the characters
> > must have this property; see attached patch.
> >
> > But I am not sure this is TRT in general.  Handa-san, could you weigh in
> > with an opinion?  Adding a space seems more or less correct to me, but I
> > am no expert.
> 
> This problem is still present in Emacs 27.  This patch, from 2011, was
> never applied.  I think Chong's proposal sounds logical, but like him,
> I'm (ahem) no expert.

Since Kenichi didn't respond, I think we should study what the Unicode
Line-breaking Algorithm has to say about that.  Can you look there for
relevant guidance?  We don't yet implement the complete algorithm, but
some of what they say could nevertheless be used to resolve this
issue.

Thanks.






* bug#34463: bug#9286: fill-paragraph destroys URLs
  2019-10-10  7:43     ` Eli Zaretskii
@ 2019-10-11  6:58       ` Lars Ingebrigtsen
  2019-11-23 14:00         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 7+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-11  6:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 9286, cyd, handa, jidanni, 34463

Eli Zaretskii <eliz@gnu.org> writes:

> Since Kenichi didn't respond, I think we should study what the Unicode
> Line-breaking Algorithm has to say about that.  Can you look there for
> relevant guidance?  We don't yet implement the complete algorithm, but
> some of what they say could nevertheless be used to resolve this
> issue.

That would be this:

https://unicode.org/reports/tr14/

I have just skimmed it, but I can't see that it says anything helpful
about filling/folding lines.

If I read it correctly, then it's perfectly allowed to line-break

asdf國

into

asdf
國

But it doesn't say what software should do when filling

asdf
國

Presumably filling that into

asdf國

would be correct in many circumstances, but as Dan said, if it's really

http://google.com
國

then filling that into 

http://google.com國

is most likely wrong.  So if we want to be cautious, then applying
Chong's patch seems to be the right thing:  Adding the space will lead
to things working more of the time, while the downside is that somebody
might prefer 

asdf國

visually.  I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#34463: bug#9286: fill-paragraph destroys URLs
  2019-10-11  6:58       ` bug#34463: " Lars Ingebrigtsen
@ 2019-11-23 14:00         ` Lars Ingebrigtsen
  0 siblings, 0 replies; 7+ messages in thread
From: Lars Ingebrigtsen @ 2019-11-23 14:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 9286, cyd, jidanni, 34463, handa

Lars Ingebrigtsen <larsi@gnus.org> writes:

> That would be this:
>
> https://unicode.org/reports/tr14/
>
> I have just skimmed it, but I can't see that it says anything helpful
> about filling/folding lines.

Ah, this is all moot -- in Emacs 26, the
fill-separate-heterogeneous-words-with-space variable was introduced,
which gives the behaviour that Dan wants (and is similar to Chong's
patch, only guarded by that variable).

So I'm closing this bug report.
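
For readers who want the space-preserving behaviour, a minimal configuration sketch follows. It assumes Emacs 26.1 or later and an init file such as ~/.emacs.d/init.el (the path is only an example); the variable name is the one given above.

```elisp
;; Keep a space when filling joins words of different kinds,
;; for example an ASCII URL followed by a CJK character.
;; Assumes Emacs 26.1 or later; the variable is nil by default.
(setq fill-separate-heterogeneous-words-with-space t)
```

The same option can also be set through `M-x customize-variable`. With it enabled, M-q on the "asdf" / "國" example is expected to produce "asdf 國" rather than "asdf國".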

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
