From: Tim Landscheidt
Newsgroups: gmane.emacs.devel
Subject: GSoC project "Hyphenation"?
Date: Tue, 27 Mar 2012 16:01:30 +0000
To: emacs-devel@gnu.org

Hi,

time and time again I have searched for "Emacs" and
"hyphenation", and so few results came up that I looked up
"hyphenation" again to make sure that I hadn't misspelled it.
It seems that it is not a feature often asked for, as the
typical workflow of text processing in Emacs usually involves
TeX or something similar, but I often find myself needing to
hyphenate texts like mails or the output of console programs.

With Google Summer of Code around, I'd like to propose the
following idea, "Hyphenation in GNU Emacs":

1. Research, define and qualify "use cases"

   Where in the Emacs world could hyphenation be used, where
   must it not be used, and how would it be used in a typical
   workflow?  For example, in TeX documents or program sources,
   automatic hyphenation is probably only useful in comments,
   if at all.  In text modes, paragraphs are written, filled,
   edited, refilled, killed, yanked, etc.  In HTML and other
   languages, it might be useful to add soft hyphens to
   individual or all words.
   In all modes, it might be handy to show possible
   hyphenations for the word at point.

   These use cases can be ordered according to their (positive)
   effect on user productivity and their difficulty of
   implementation.  At this stage the mentor would decide which
   of these use cases would have to be implemented as part of
   this project.

2. Research and define a high-level interface and syntax

   Based on the use cases, how would the user specify the
   hyphenation "locale" wanted?  How does that relate to other
   language-specific customizations?  How would editing and
   filling functions query the hyphenation of a particular
   word?  How would automatically hyphenated words be marked up
   in buffers and on disk?

3. Implement a dummy backend and set up tests

   Compile a list of hyphenated words from free sources and
   implement a backend that uses them.  Set up a test suite
   that compares the results generated by other backends with
   those of this dummy backend.

4. Implement the frontend

   This involves amending the editing and filling functions so
   that the use cases identified in 1. can be fulfilled with
   the limited word list of the dummy backend.  This would also
   serve as the mid-term evaluation point.

5. Identify possible backends, their (legal) compatibility
   with GNU Emacs and implement them

   5.1. One of the most often used algorithms is the one
        developed by Franklin Mark Liang and implemented for
        TeX.  While there are implementations even in GNU Emacs
        Lisp, the licence of the accompanying pattern files is
        often a topic of discussion, so that for example Apache
        FOP outsourced them to a separate package.

        a) Work out with FSF whether and how pattern files can
           be included, and in which form.  As groff does
           this, I am confident that this path can be followed.
           Port/review and adjust an implementation of Liang's
           algorithm and extend the Emacs build system with
           targets that import the pattern files and convert
           them to GNU Emacs Lisp.
        b) If they cannot be included, define a user interface
           with sensible defaults that point to their location
           elsewhere.  Candidates are installations of (La)TeX
           and the aforementioned "FOP XML Hyphenation
           Patterns".  Implement a reader.

   5.2. There are other backends that implement other
        algorithms or wrap Liang's in a different form.
        Research whether they are popular (enough) and
        optionally implement a connector.  If 5.1. is legally
        feasible, this would be an add-on.

6. Test the system and fix the bugs.

Completion criteria would be that:

- at least the use cases selected by the mentor in 1. would be
  implemented with a non-dummy backend,
- the source is documented to a degree that a third party who
  is familiar with hyphenation/the chosen algorithm understands
  the code so that it can be maintained, and
- no existing functionality has been broken :-).

As the project is aimed at users, and Emacs developers
apparently didn't care enough about hyphenation to implement it
themselves :-), I'd plan to code the project in the early
stages as a separate package that would advise the relevant
core functions, so that it could be tested by users running a
regular release, and only integrate it into the regular code
late in the game.

Comments or sentiments?

Tim
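
To make steps 3 and 5.1 a bit more concrete, here is a minimal
sketch of Liang-style pattern matching together with a one-word
"dummy backend" to compare against.  It is written in Python
rather than GNU Emacs Lisp purely for brevity.  Everything named
here is hypothetical: parse_pattern, hyphenate, TOY_PATTERNS and
WORD_LIST are illustrative names, the four patterns are invented
for this single word and are not taken from any real pattern
file, and the minimum fragment lengths (2 letters before the
first break, 3 after the last) are an assumption mirroring TeX's
usual English settings.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    weights[i] is the score for the gap just before letter i;
    the final entry is the gap after the last letter.
    """
    letters = []
    weights = [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of WORD between permitted break points."""
    table = dict(parse_pattern(p) for p in patterns)
    # Dots mark the word boundaries so patterns can anchor to them.
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)

    # Try every substring; keep the highest weight seen for each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    pos = start + offset
                    scores[pos] = max(scores[pos], weight)

    # An odd score permits a break, an even score forbids it.
    # The gap before word[k] is scores[k + 1] because of the leading dot.
    pieces = []
    last = 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


# Invented patterns, just enough to handle the one example word.
TOY_PATTERNS = ["hy3ph", "hen5at", "1na", "e2n"]

# Dummy backend in the sense of step 3: a plain list of known answers.
WORD_LIST = {"hyphenation": ["hy", "phen", "ation"]}

if __name__ == "__main__":
    print("-".join(hyphenate("hyphenation", TOY_PATTERNS)))  # hy-phen-ation
```

Note how "1na" alone would allow "hyphe-nation", and the even
weight in "e2n" overrides it; that interplay of odd and even
weights is the core of the algorithm.  A test suite as described
in step 3 would loop over WORD_LIST and compare each entry with
what the pattern backend returns.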