unofficial mirror of emacs-devel@gnu.org 
* GSoC project "Hyphenation"?
From: Tim Landscheidt @ 2012-03-27 16:01 UTC
  To: emacs-devel

Hi,

time and time again I have searched for "Emacs" and "hyphen-
ation", and so few results came up that I looked up "hy-
phenation" again to make sure that I hadn't misspelled it.
It seems that it is not a feature often asked for as the
typical workflow of text processing in Emacs usually in-
volves TeX or something similar, but I do find myself often
in need to hyphenate texts like mails or output of console
programs.  With Google Summer of Code around, I'd like to
propose the following idea "Hyphenation in GNU Emacs":

1. Research, define and qualify "use cases"

   Where in the Emacs world could hyphenation be used, where
   must it not be and how would it be used in a typical
   workflow?  For example, in TeX documents or program
   sources, automatic hyphenation is probably only useful in
   comments if at all.  In text modes, paragraphs are writ-
   ten, filled, edited, refilled, killed, yanked, etc.  In
   HTML and other languages, it might be useful to add soft
   hyphens to individual or all words.  In all modes, it
   might be handy to show possible hyphenations for the word
   at point.

     These use cases can be ordered according to their (pos-
   itive) effect on user productivity and difficulty of im-
   plementation.  At this stage the mentor would decide
   which of these use cases would have to be implemented as
   part of this project.

2. Research and define a high-level interface and syntax

   Based on the use cases, how would the user specify the
   hyphenation "locale" wanted?  How does that relate to
   other language-specific customizations?  How would edit-
   ing and filling functions query the hyphenation of a par-
   ticular word?  How would automatically hyphenated words
   be marked up in buffers and on disk?

3. Implement a dummy backend and set up tests

   Compile a list of hyphenated words from free sources and
   implement a backend that uses them.  Set up a test suite
   that compares the results generated by other backends
   with this.

4. Implement the frontend

   This involves amending the editing and filling functions
   so that the use cases identified in 1. can be fulfilled
   with the limited word list of the dummy backend.  This
   would also serve as the mid-term evaluation point.

5. Identify possible backends, their (legal) compatibility
   with GNU Emacs and implement them

   5.1. One of the most often used algorithms is the one de-
        veloped by Franklin Mark Liang and implemented for
        TeX.  While there are implementations even in GNU
        Emacs Lisp, the licence of the accompanying pattern
        files is often a topic of discussion so that for ex-
        ample Apache FOP outsourced them to a separate pack-
        age.

        a) Work out with FSF whether and how pattern files
           can be included in which form.  As groff does
           this, I am confident that this path can be fol-
           lowed.  Port/review and adjust an implementation
           of Liang's algorithm and enhance the Emacs build
           system by targets that import the pattern files
           and convert them to GNU Emacs Lisp.

        b) If they cannot be included, define a user inter-
           face with sensible defaults that point to their
           location elsewhere.  Candidates are installations
           of (La)TeX and the aforementioned "FOP XML Hy-
           phenation Patterns".  Implement a reader.

   5.2. There are other backends that implement other algo-
        rithms or clad Liang's in a different form.  Re-
        search whether they are popular (enough) and option-
        ally implement a connector.  If 5.1. is legally fea-
        sible, this would be an add-on.

6. Test the system and fix the bugs.

   Completion criteria would be that:

   - at least the use cases selected by the mentor in
     1. would be implemented with a non-dummy backend,

   - the source is documented to a degree that a third per-
     son who is familiar with hyphenation/the chosen algo-
     rithm understands the code so that it can be main-
     tained, and

   - no existing functionality has been broken :-).
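
For orientation, the pattern-matching core of the Liang approach
mentioned in 5.1 might look roughly like the sketch below.  It is
only an illustration: the `liang-demo-' names and the tiny pattern
list are made up for this example, the patterns are hand-picked
rather than read from a pattern file, and a real backend would use
a trie instead of trying every substring.

```elisp
;; Sketch of Liang-style pattern hyphenation.  All `liang-demo-' names
;; and the pattern list are hypothetical, chosen for illustration.

(defvar liang-demo-patterns
  '("hy3ph" "he2n" "hena4" "hen5at" "1na" "n2at" "1tio" "2io" "o2n")
  "Tiny hand-picked pattern list; real pattern files hold thousands.")

(defun liang-demo-parse (pattern)
  "Return (LETTERS . WEIGHTS) for PATTERN such as \"hy3ph\".
WEIGHTS has one slot before each letter and one after the last."
  (let* ((letters (replace-regexp-in-string "[0-9]" "" pattern))
         (weights (make-vector (1+ (length letters)) 0))
         (slot 0))
    (dolist (ch (string-to-list pattern))
      (if (<= ?0 ch ?9)
          (aset weights slot (- ch ?0))
        (setq slot (1+ slot))))
    (cons letters weights)))

(defun liang-demo-hyphenate (word &optional left-min right-min)
  "Return WORD split into a list of pieces at the breaks found.
LEFT-MIN and RIGHT-MIN (defaults 2 and 3) are the minimum numbers
of letters before the first and after the last break."
  (let* ((left-min (or left-min 2))
         (right-min (or right-min 3))
         (table (make-hash-table :test 'equal))
         (dotted (concat "." (downcase word) "."))
         (len (length dotted))
         (scores (make-vector (1+ len) 0))
         (pieces '())
         (start 0))
    ;; Index the patterns by their letters.
    (dolist (p liang-demo-patterns)
      (let ((parsed (liang-demo-parse p)))
        (puthash (car parsed) (cdr parsed) table)))
    ;; Try every substring; keep the maximum weight seen at each slot.
    (dotimes (i len)
      (let ((j (1+ i)))
        (while (<= j len)
          (let ((weights (gethash (substring dotted i j) table)))
            (when weights
              (dotimes (k (length weights))
                (aset scores (+ i k)
                      (max (aref scores (+ i k)) (aref weights k))))))
          (setq j (1+ j)))))
    ;; scores[k] sits before dotted[k], i.e. before word[k-1];
    ;; an odd score marks a permitted break.
    (let ((pos left-min))
      (while (<= pos (- (length word) right-min))
        (when (= 1 (% (aref scores (1+ pos)) 2))
          (push (substring word start pos) pieces)
          (setq start pos))
        (setq pos (1+ pos))))
    (push (substring word start) pieces)
    (nreverse pieces)))

(liang-demo-hyphenate "hyphenation")    ; => ("hy" "phen" "ation")
```

Competing patterns are resolved by taking the maximum weight at
each position, so an even weight such as the 2 in "he2n" can veto
a break that a lower odd weight would otherwise allow.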

As the project is aimed at users and Emacs developers appar-
ently didn't bother enough about hyphenation to implement it
themselves :-), I'd plan to code the project in the early
stages as a separate package that would advise the relevant
core functions so that it could be tested by users running a
regular release, and only integrate it in the regular code
late in the game.

  Comments or sentiments?

Tim





* Re: GSoC project "Hyphenation"?
From: Deniz Dogan @ 2012-03-27 17:48 UTC
  To: emacs-devel

On 2012-03-27 18:01, Tim Landscheidt wrote:
> Hi,
>
> time and time again I have searched for "Emacs" and "hyphen-
> ation", and so few results came up that I looked up "hy-
> phenation" again to make sure that I hadn't misspelled it.
> It seems that it is not a feature often asked for as the
> typical workflow of text processing in Emacs usually in-
> volves TeX or something similar, but I do find myself often
> in need to hyphenate texts like mails or output of console
> programs.  With Google Summer of Code around, I'd like to
> propose the following idea "Hyphenation in GNU Emacs":
>

Hi, Tim

Without having read your e-mail in its entirety, I think a proper 
implementation in Emacs should not interfere with things like searching 
for and replacing text.  E.g. if I load your e-mail in Emacs and I want 
to find all occurrences of the word "hyphenation" in it, I expect to be 
able to use isearch to find even those occurrences where hyphenation is 
taking place.

Now /that/ would be cool!

Cheers,
Deniz




* Re: GSoC project "Hyphenation"?
From: Eli Zaretskii @ 2012-03-27 18:04 UTC
  To: Deniz Dogan; +Cc: emacs-devel

> Date: Tue, 27 Mar 2012 19:48:18 +0200
> From: Deniz Dogan <deniz@dogan.se>
> 
> Without having read your e-mail in its entirety, I think a proper 
> implementation in Emacs should not interfere with things like searching 
> for and replacing text.

That should be no problem if the implementation uses a `display' text
property to "insert" the hyphen without actually touching the buffer
contents.
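
A rough sketch of that idea follows.  The helper name
`demo-soft-hyphen-at' is made up for illustration, and a real
implementation would also have to remove the property again when
the text is edited or refilled.

```elisp
;; Sketch only: `demo-soft-hyphen-at' is a hypothetical helper.
(defun demo-soft-hyphen-at (pos)
  "Display a hyphen and a line break before the character at POS.
Only the `display' property of the preceding character is changed;
the buffer text itself stays as it was."
  (with-silent-modifications
    (put-text-property (1- pos) pos 'display
                       (concat (string (char-before pos)) "-\n"))))

;; Usage: break "hyphenation" visually after "hyphen".
(with-temp-buffer
  (insert "hyphenation")
  (demo-soft-hyphen-at 7)
  (goto-char (point-min))
  ;; The search still succeeds because the buffer text is unchanged.
  (search-forward "hyphenation" nil t))
```

Searching works on the buffer text rather than on what is
displayed, which is why the word is still found as a whole.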




* Re: GSoC project "Hyphenation"?
From: Stefan Monnier @ 2012-03-27 18:40 UTC
  To: emacs-devel

> time and time again I have searched for "Emacs" and "hyphen-
> ation", and so few results came up that I looked up "hy-
> phenation" again to make sure that I hadn't misspelled it.

I guess there's a combination of reasons for that:
- Emacs doesn't support proportional fonts very well.
- monospaced justified text is rarely used.
- => justified text is rarely used in Emacs.
- hyphenation is rarely needed for non-justified text.

E.g. without hyphenation, your email would not have been much
less pleasantly balanced.  There are only 3 paragraphs where your
hyphenation lets Emacs's filling algorithm give noticeably better
results, but by tweaking down the fill-column on a case-by-case
basis (from 60 to 57, for example), you can easily get back
something reasonably close (aesthetically) to the
hyphenated version.

Of course, refined filling code could get even better results
when combined with hyphenation, but my point is simply that I'm
not sure the extra work needed for hyphenation is worth the
trouble at this point.  Especially since "naive" hyphenation like
you've done in your email is not great either: a typographer
would scream at some of your paragraphs where all lines are
hyphenated (already, hyphenating two successive lines is
generally considered bad style).
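
To make the "successive hyphenated lines" criterion concrete, here
is a small sketch that measures it.  The function name is made up,
and it naively treats every line ending in "-" as hyphenated, so an
explicit hyphen at the end of a line is counted as well.

```elisp
;; Sketch only: `demo-max-hyphenated-run' is a hypothetical helper.
(defun demo-max-hyphenated-run (text)
  "Return the longest run of consecutive lines in TEXT ending in a hyphen."
  (let ((run 0)
        (best 0))
    (dolist (line (split-string text "\n"))
      (if (string-match-p "-[ \t]*\\'" line)
          (setq run (1+ run)
                best (max best run))
        (setq run 0)))
    best))

(demo-max-hyphenated-run "foo-\nbar-\nbaz\nqux-\n")    ; => 2
```

A filling function could use such a count as a penalty and prefer
a layout with a shorter run.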

>   Comments or sentiments?

It's the first time I hear not just a request but even any
mention of the concept of hyphenation support for Emacs, so
I indeed wouldn't consider it high-priority.  But by all means,
do scratch that itch,


        Stefan




* Re: GSoC project "Hyphenation"?
From: Miles Bader @ 2012-03-28  1:01 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
> I guess there's a combination of reasons for that:
> - Emacs doesn't support proportional fonts very well.
> - monospaced justified text is rarely used.
> - => justified text is rarely used in Emacs.
> - hyphenation is rarely needed for non-justified text.
>
> E.g. without hyphenation, your email would not have been much
> less pleasantly balanced.  There are only 3 paragraphs where your
> hyphenation lets Emacs's filling algorithm give noticeably better
> results, but by tweaking down the fill-column on a case-by-case
> basis (from 60 to 57, for example), you can easily get back
> something reasonably close (aesthetically) to the
> hyphenated version.

One thing that I find I do sort of want in practice, is simply more
intelligent treatment of _explicit_ hyphens, as filling often yields
awkward results when long hyphenated words
("moggle-crested-snurd-radler") occur in a paragraph.  It'd be nice if
both filling and display-time word-wrapping would be willing to break
after hyphens, and not insert whitespace after a hyphen when filling.

That would presumably be a lot simpler than full balls-to-the-wall
hyphenation.  Hmm, maybe not enough for a full GSoC project tho...
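
As a sketch of the simpler variant, the break opportunities inside
an explicitly hyphenated word can be found without any pattern
data.  The function name is made up, and hooking this into filling
or display-time word wrapping is not shown.

```elisp
;; Sketch only: `demo-hyphen-chunks' is a hypothetical helper.
(defun demo-hyphen-chunks (word)
  "Split WORD after each explicit hyphen, keeping the hyphens.
Every boundary between two chunks is a place where a line could be
broken without inserting whitespace."
  (let ((chunks '())
        (start 0))
    ;; Stop at a trailing hyphen: nothing follows it to break before.
    (while (and (string-match "-+" word start)
                (< (match-end 0) (length word)))
      (push (substring word start (match-end 0)) chunks)
      (setq start (match-end 0)))
    (push (substring word start) chunks)
    (nreverse chunks)))

(demo-hyphen-chunks "moggle-crested-snurd-radler")
;; => ("moggle-" "crested-" "snurd-" "radler")
```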

-Miles

-- 
Neighbor, n. One whom we are commanded to love as ourselves, and who does all
he knows how to make us disobedient.




* Re: GSoC project "Hyphenation"?
From: Stefan Monnier @ 2012-03-28 12:47 UTC
  To: Miles Bader; +Cc: emacs-devel

> One thing that I find I do sort of want in practice, is simply more
> intelligent treatment of _explicit_ hyphens, as filling often yields
> awkward results when long hyphenated words
> ("moggle-crested-snurd-radler") occur in a paragraph.

Agreed that being able to split long composed words would be nice.
[ BTW, does that mean you're working on a moggle-crested-snurd-radler?
  I'd kill to see it.  ]


        Stefan




* GSoC project "Hyphenation"?
From: hector @ 2016-12-23  1:09 UTC
  To: emacs-devel


Hi.

I did the same and I came upon this post.

I wrote a little program in ELISP to do it.
Currently it works but I have to fix some things: patterns should not match
at the end of the word.

Since my purpose was not to hyphenate mails or the output of console
programs, I didn't write anything to integrate it with the available
filling or searching functions.

It just takes a word and returns a list of word "slices".

But now I'm thinking that this is a general task, not specific to
Emacs or TeX. Shouldn't it be a system library?

To try it:
M-: (load-patterns "FILENAME.DIC")
M-x ly:hyphenate-region

On Tue, Mar 27, 2012 at 04:01:30PM +0000, Tim Landscheidt wrote:
> Hi,
> 
> time and time again I have searched for "Emacs" and "hyphen-
> ation", and so few results came up that I looked up "hy-
> phenation" again to make sure that I hadn't misspelled it.
> It seems that it is not a feature often asked for as the
> typical workflow of text processing in Emacs usually in-
> volves TeX or something similar, but I do find myself often
> in need to hyphenate texts like mails or output of console
> programs.  With Google Summer of Code around, I'd like to
> propose the following idea "Hyphenation in GNU Emacs":
> 

[-- Attachment #2: hyphenate.el --]

;; hyphenate.el - build and manage pattern trie
;; Copyright Héctor Lahoz 2016
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.
;;
;; this program is based on the work of Franklin M. Liang
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(eval-when-compile (require 'cl))

; (optimize (safety 0)) ;; uncomment for production

(defstruct ptrie:node
  children ;; CAR - no match; CDR - match (next position)
  (char nil :read-only t)
  (final nil))

(defvar pattern-trie (make-ptrie:node :char ?\s :children '(nil . nil))
  "Root of the patterns trie")

(defun ptrie:print-trie (n path)
  "Print the tree recursively"
  (let ((path_ (concat path (make-string 1 (ptrie:node-char n)))))
    (if (null (cdr (ptrie:node-children n)))
	(progn
	  (princ path_)
	  (princ " - ")
	  (princ (ptrie:node-final n))
	  (princ "\n"))
      (ptrie:print-trie (cdr (ptrie:node-children n)) path_))
    (when (car (ptrie:node-children n))
      (ptrie:print-trie (car (ptrie:node-children n)) path))))

(defun ptrie:print-node (n)
  "Return a string describing node N, for debugging."
  ;; Build a fresh string with `format'.  Modifying a string literal
  ;; with `aset' changes the constant stored in the function itself,
  ;; so the change would persist between calls.
  (let ((children (ptrie:node-children n)))
    (format "Node: %c:%s"
	    (ptrie:node-char n)
	    (if (null children)
		"no children"
	      (format "%c-%c"
		      (if (car children)
			  (ptrie:node-char (car children))
			?\s)
		      (if (cdr children)
			  (ptrie:node-char (cdr children))
			?\s))))))

(defun ptrie:find-next-char (node char &optional create)
  "Returns the node corresponding to CHAR. Add a new node when CREATE is t
and requested node doesn't exist"
  (let ((prev node)
	n
	new
	(set-prev-link 'setcdr))
    (setq n (cdr (ptrie:node-children prev)))
      
    (while (and n  ;; works too when (null node-children)
		(> char (ptrie:node-char n)))
      (setq prev n)
      (setq set-prev-link 'setcar)
      (setq n (car (ptrie:node-children n))))
    (when (or (null n)
	      (/= char (ptrie:node-char n)))
      (if (null create)
	  (setq n nil)
	(setq new (make-ptrie:node :char char
			     :children (cons n nil)))
	(when (null (ptrie:node-children prev))
	  (setf (ptrie:node-children prev) '(nil . nil)))
	(funcall set-prev-link (ptrie:node-children prev) new)
	(setq n new)))
    n))

(defun find-pattern (trie p)
  "Return pattern indicated by P starting at TRIE or nil if not found"
  (let ((n trie))
    (dotimes (i (length p) (ptrie:node-final n))
      (when (null (setq n (ptrie:find-next-char n (aref p i))))
	(return nil)))))

(defun add-pattern (trie p)
  "Add pattern P to trie TRIE"
  (let ((pnw (pat-nw p))
	(n trie)
	char)
    
    (dotimes (i (length pnw))
      (setq char (aref pnw i))
      (setq n (ptrie:find-next-char n char t)))
    (setf (ptrie:node-final n) p)))

(defun pat-nw (str)
  "Remove weight digits from STR"
  (let ((ret nil)
	(char nil)
	(char-str nil)
	(l (length str)))
    (do ((i (- l 1) (1- i))) ((< i 0))
	(setq char (aref str i))
	(setq char-str (substring-no-properties str i (1+ i)))
	(if (not (string-match "[[:digit:]]" char-str))
	    (push char ret)))
    (concat ret)))

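(defun read-pattern (buf)
  "Return the pattern on the current line of BUF and move to the next line.
Return nil at the end of the buffer or on an empty line."
  (with-current-buffer buf
    (unless (eobp)
      ;; Take the line without its newline, so a last line that lacks
      ;; a trailing newline is read whole and the loop still ends.
      (let ((pat (buffer-substring-no-properties (point)
						 (line-end-position))))
	(forward-line 1)
	(unless (equal pat "")
	  pat)))))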

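(defun load-patterns (file)
  "Read hyphenation patterns from FILE, one per line, into `pattern-trie'."
  (let ((hyphen-patterns (find-file-read-only file))
	pat)
    ;; start at the top in case the file was already being visited
    (goto-char (point-min))
    (while (setq pat (read-pattern hyphen-patterns))
      (add-pattern pattern-trie pat))))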

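;; A function rather than a macro: the macro version called `eval' on
;; its argument form at expansion time, when the caller's variables
;; are not bound yet.
(defun digitp (c)
  "Return non-nil if character C is an ASCII digit."
  (<= ?0 c ?9))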

;; TODO optimise
(defun ly:hyphenate-word (word)
  "Returns WORD with hyphens added"
  (let* (s-word
	pat
	weight
	ret
	p-found
	(hpos 0)
	;; add markers at beginning and end
	(delim-word (concat "." word "."))
	(hyphen-weights (make-vector (length delim-word) 0)))
    (dotimes (anchor (length delim-word))
      (setq s-word (substring delim-word anchor))
      (do ((end 1 (1+ end))) ((> end (length s-word)))
	(when (setq pat (find-pattern pattern-trie (substring s-word 0 end)))
	  ;; store weights
	  (setq hpos 0)
	  (dotimes (pos (length pat))
	    (if (not (digitp (aref pat pos)))
		(setq hpos (1+ hpos))
	      (setq weight (- (aref pat pos) ?0))
	      (when (> weight (aref hyphen-weights (+ anchor hpos)))
		(aset hyphen-weights (+ anchor hpos) weight)))))))

    (dotimes (i (length word))
      ;; avoid hyphens before the word (when i == 0)
      ;; e.g. pattern "1de" matches the word "de" so it produces " -- de"
      ;; perhaps we should modify the preceding algorithm, not to include
      ;; them in the first place
      (when (and (> i 0)
		 (= (% (aref hyphen-weights (1+ i)) 2) 1))
	(push " -- " ret))
      (push (aref word i) ret))
    (mapconcat (lambda (s)
		 (if (stringp s)
		     s
		   (string s)))
	       (nreverse ret)
	       "")))

(defun ly:hyphenate-region (beg end)
  "Add lilypond centered hyphens to every word in the region"
  (interactive "r")
  (save-excursion
    (goto-char beg)
    (search-forward "{" (line-beginning-position 2) t)
    (let ((end (copy-marker end))
	  word-beg)
      (while (< (point) end)
	(skip-chars-forward "^a-zA-Záéíóúñäëöüß") ;; find next word
	(setq word-beg (point))
	(forward-word)
	(insert	(ly:hyphenate-word 
		 (prog1
		     (buffer-substring-no-properties word-beg (point))
		   (delete-region word-beg (point)))))))))


Thread overview: 7+ messages
2012-03-27 16:01 GSoC project "Hyphenation"? Tim Landscheidt
2012-03-27 17:48 ` Deniz Dogan
2012-03-27 18:04   ` Eli Zaretskii
2012-03-27 18:40 ` Stefan Monnier
2012-03-28  1:01   ` Miles Bader
2012-03-28 12:47     ` Stefan Monnier
2016-12-23  1:09 ` hector
