From mboxrd@z Thu Jan 1 00:00:00 1970
From: hector
Newsgroups: gmane.emacs.devel
Subject: GSoC project "Hyphenation"?
Date: Fri, 23 Dec 2016 02:09:36 +0100
Message-ID: <20161223010936.GA2877@workstation>
To: emacs-devel@gnu.org
User-Agent: Mutt/1.5.20 (2009-06-14)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf"
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:210746

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi.

I did the same and I came upon this post. I wrote a little program in
Elisp to do it. It currently works, but I still have to fix some things:
patterns should not match at the end of the word.

Since my purpose was not to hyphenate mails or console output, I didn't
write anything to integrate it with the available filling or searching
functions. It just takes a word and returns a list of word "slices".

But now I'm thinking that this is a general task, not specific to Emacs
or TeX. Shouldn't it be a system library?

To try it:

M-: (load-patterns "FILENAME.DIC")
M-x ly:hyphenate-region

On Tue, Mar 27, 2012 at 04:01:30PM +0000, Tim Landscheidt wrote:
> Hi,
>
> time and time again I have searched for "Emacs" and "hyphen-
> ation", and so little results came up that I looked up "hy-
> phenation" again to make sure that I hadn't misspelled it.
> It seems that it is not a feature often asked for as the
> typical workflow of text processing in Emacs usually in-
> volves TeX or something similar, but I do find myself often
> in need to hyphenate texts like mails or output of console
> programs.
> With Google Summer of Code around, I'd like to
> propose the following idea "Hyphenation in GNU Emacs":
>

--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="hyphenate.el"
Content-Transfer-Encoding: 8bit

;; hyphenate.el - build and manage pattern trie
;; Copyright Héctor Lahoz 2016
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see .
;;
;; this program is based on the work of Franklin M. Liang
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(require 'cl-lib)
; (optimize (safety 0)) ;; uncomment for production

(cl-defstruct ptrie:node
  children ;; CAR - no match; CDR - match (next position)
  (char nil :read-only t)
  (final nil))

(defvar pattern-trie (make-ptrie:node :char ?\s :children (cons nil nil))
  "Root of the patterns trie")

(defun ptrie:print-trie (n path)
  "Print the tree recursively"
  (let ((path_ (concat path (string (ptrie:node-char n)))))
    (if (null (cdr (ptrie:node-children n)))
        (progn
          (princ path_)
          (princ " - ")
          (princ (ptrie:node-final n))
          (princ "\n"))
      (ptrie:print-trie (cdr (ptrie:node-children n)) path_))
    (when (car (ptrie:node-children n))
      (ptrie:print-trie (car (ptrie:node-children n)) path))))

(defun ptrie:print-node (n)
  "Return a string describing node N, for debugging"
  ;; Build fresh strings on every call: a string literal is a constant
  ;; shared between calls, so changing it with `aset' leaks into the next one.
  (let ((children (ptrie:node-children n)))
    (concat (format "Node: %c:" (ptrie:node-char n))
            (if (null children)
                "no children"
              ;; char of each link, or a space when the link is empty
              (format "%c-%c"
                      (if (car children) (ptrie:node-char (car children)) ?\s)
                      (if (cdr children) (ptrie:node-char (cdr children)) ?\s))))))

(defun ptrie:find-next-char (node char &optional create)
  "Return the node below NODE corresponding to CHAR, or nil if there is none.
Add a new node when CREATE is non-nil and the requested node doesn't exist"
  (let ((prev node)
        n new
        (set-prev-link #'setcdr))
    (setq n (cdr (ptrie:node-children prev)))
    (while (and n ;; works too when (null node-children)
                (> char (ptrie:node-char n)))
      (setq prev n)
      (setq set-prev-link #'setcar)
      (setq n (car (ptrie:node-children n))))
    (when (or (null n)
              (/= char (ptrie:node-char n)))
      (if (null create)
          (setq n nil)
        (setq new (make-ptrie:node :char char :children (cons n nil)))
        (when (null (ptrie:node-children prev))
          ;; a fresh cons: a quoted '(nil . nil) would be one shared constant
          (setf (ptrie:node-children prev) (cons nil nil)))
        (funcall set-prev-link (ptrie:node-children prev) new)
        (setq n new)))
    n))

(defun find-pattern (trie p)
  "Return the pattern stored for the letters P starting at TRIE, or nil if not found"
  (let ((n trie))
    (cl-dotimes (i (length p) (ptrie:node-final n))
      (when (null (setq n (ptrie:find-next-char n (aref p i))))
        (cl-return nil)))))

(defun add-pattern (trie p)
  "Add pattern P to trie TRIE"
  (let ((pnw (pat-nw p))
        (n trie)
        char)
    (dotimes (i (length pnw))
      (setq char (aref pnw i))
      (setq n (ptrie:find-next-char n char t)))
    (setf (ptrie:node-final n) p)))

(defun pat-nw (str)
  "Remove weight digits from STR"
  (replace-regexp-in-string "[[:digit:]]" "" str))

(defun read-pattern ()
  "Return the pattern on the current line and move to the next line.
Return nil at an empty line or at the end of the buffer"
  (let ((pat (buffer-substring-no-properties
              (line-beginning-position) (line-end-position))))
    (forward-line 1)
    (if (equal pat "") nil pat)))

(defun load-patterns (file)
  "Add the patterns in FILE, one per line, to `pattern-trie'"
  (let (pat)
    ;; read the file in a temporary buffer instead of visiting it
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      (while (setq pat (read-pattern))
        (add-pattern pattern-trie pat)))))

;; A function, not a macro: a macro calling `eval' on its argument would
;; run at expansion time, when the caller's variables are not bound yet.
(defun digitp (c)
  "True if C is a digit"
  (and (>= c ?0) (<= c ?9)))

;; TODO optimise
(defun ly:hyphenate-word (word)
  "Return WORD with \" -- \" inserted at the hyphenation points"
  (let* (s-word pat weight ret
         (hpos 0)
         ;; add markers at beginning and end
         (delim-word (concat "." word "."))
         ;; weight K belongs to the gap before character K of DELIM-WORD;
         ;; the extra slot covers a pattern ending in a digit at the very end
         (hyphen-weights (make-vector (1+ (length delim-word)) 0)))
    (dotimes (anchor (length delim-word))
      (setq s-word (substring delim-word anchor))
      (cl-do ((end 1 (1+ end)))
          ((> end (length s-word)))
        (when (setq pat (find-pattern pattern-trie (substring s-word 0 end)))
          ;; store weights
          (setq hpos 0)
          (dotimes (pos (length pat))
            (if (not (digitp (aref pat pos)))
                (setq hpos (1+ hpos))
              (setq weight (- (aref pat pos) ?0))
              (when (> weight (aref hyphen-weights (+ anchor hpos)))
                (aset hyphen-weights (+ anchor hpos) weight)))))))
    (dotimes (i (length word))
      ;; avoid hyphens before word (when i == 0)
      ;; e.g. pattern "1de" matches the word "de" so it would produce " -- de"
      ;; perhaps we should modify the preceding algorithm, not to include
      ;; them in the first place
      (when (and (> i 0)
                 (= (% (aref hyphen-weights (1+ i)) 2) 1))
        (push " -- " ret))
      (push (aref word i) ret))
    (mapconcat (lambda (s) (if (stringp s) s (string s)))
               (nreverse ret) "")))

(defun ly:hyphenate-region (beg end)
  "Add lilypond centered hyphens to every word in the region"
  (interactive "r")
  (save-excursion
    (goto-char beg)
    (search-forward "{" (line-beginning-position 2) t)
    (let ((end (copy-marker end))
          word-beg)
      (while (< (point) end)
        (skip-chars-forward "^a-zA-Záéíóúñäëöüß" end) ;; find next word
        (when (< (point) end)
          (setq word-beg (point))
          (forward-word)
          (insert (ly:hyphenate-word
                   (prog1 (buffer-substring-no-properties word-beg (point))
                     (delete-region word-beg (point))))))))))

--J2SCkAp4GZ/dPZZf--
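
A note on the weight-merging step used by ly:hyphenate-word. The trie only
speeds up pattern lookup; the hyphenation decision itself comes from keeping,
for each gap between letters, the largest digit that any matching pattern puts
there, and then breaking at the gaps whose final weight is odd. The sketch
below shows that step on its own, with a plain list instead of the trie. It is
an illustration under assumptions, not the attached implementation: the
demo-* names are hypothetical, the pattern list is sample data chosen for this
example rather than anything read from a .DIC file, and every pattern is
assumed to contain at least one letter.

```elisp
(require 'cl-lib)

(defun demo-pattern-letters (pat)
  "Return PAT with its weight digits removed."
  (replace-regexp-in-string "[0-9]" "" pat))

(defun demo-apply-pattern (pat start weights)
  "Merge the digits of PAT, matched at index START, into WEIGHTS.
Slot K of WEIGHTS is the weight of the gap before character K."
  (let ((hpos 0))
    (dotimes (pos (length pat))
      (let ((c (aref pat pos)))
        (if (and (>= c ?0) (<= c ?9))
            ;; a digit: keep the larger of the old and the new weight
            (aset weights (+ start hpos)
                  (max (- c ?0) (aref weights (+ start hpos))))
          ;; a letter: move on to the next gap
          (setq hpos (1+ hpos)))))))

(defun demo-hyphenate (word patterns)
  "Return WORD with \"-\" at every gap whose merged weight is odd.
PATTERNS is a plain list of Liang-style pattern strings."
  (let* ((delim (concat "." word "."))
         (weights (make-vector (1+ (length delim)) 0))
         (out nil))
    (dolist (pat patterns)
      (let ((letters (demo-pattern-letters pat))
            (start 0))
        ;; apply the pattern at every place its letters occur
        (while (setq start (cl-search letters delim :start2 start))
          (demo-apply-pattern pat start weights)
          (setq start (1+ start)))))
    (dotimes (i (length word))
      ;; slot i+1 is the gap before word[i]; never break before the word
      (when (and (> i 0) (= (% (aref weights (1+ i)) 2) 1))
        (push ?- out))
      (push (aref word i) out))
    (concat (nreverse out))))

(demo-hyphenate "hyphenation"
                '("hy3ph" "he2n" "hena4" "hen5at" "1na"
                  "n2at" "1tio" "2io" "o2n"))
;; => "hy-phen-ation"
```

In this run the gap between "n" and "a" gets weight 2 from "he2n" on one side
and weight 5 from "hen5at" on the other side of the same letter; only the
second gap ends up odd, so the break falls after "phen". The even weights from
"hena4" and "2io" are what suppress the breaks that "1tio" and "1na" would
otherwise allow.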