From: Yair F
Newsgroups: gmane.emacs.devel
Subject: Re: Composing Hebrew diacriticals
Date: Fri, 7 May 2010 13:00:16 +0300
References: <83mxwlw2c0.fsf@gnu.org>
To: Kenichi Handa
Cc: emacs-devel@gnu.org

On Fri, May 7, 2010 at 9:23 AM, Kenichi Handa wrote:
> If what composed are only diacritical marks, and they are
> placed on any base characters, it is better to set that kind
> of list only for hebrew diacriticals for efficiency.  So,
> the code will be something like this:
>
> (let ((hebrew-diacritals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
>       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
>   (dolist (elt hebrew-diacritals-list)
>     (set-char-table-range composition-function-table elt
>       (list (vector regexp 1 'font-shape-gstring)))))
>
> Here "1" is for moving back one character to check matching
> with REGEXP.
>
>>> There are some restrictions on which characters are allowed to be composed.
>
> If that restrictions are more rigid, regexp should vary for
> each diacritical mark.

This is the composition regexp (I added whitespace and comments for
readability):

\\(
    [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  ;; base
    [\u05BC\u05BF]?                ;; 0-1 marks of 1st class (dagesh)
    [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
    [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
    \u05D5                         ;; base
    \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
    [\u05B0-\u05BB\u05C7]?         ;; 0-1 marks of extended 3rd class (niqud)
    [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
    \u05E9                         ;; base
    \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
    [\u05C1\u05C2]?                ;; 0-1 marks of 2nd class (shin dot)
    [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
    [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\)

What would be the best way in this case? In the most extreme case
there are 6 marks attached to a base character.
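
For concreteness, here is a minimal sketch of how the regexp above could be assembled as a real Emacs Lisp string (without the whitespace and comments) and checked against sample clusters. The name `my-hebrew-cluster-regexp` is hypothetical and used only for illustration, and the sketch deliberately does not touch `composition-function-table`. It only shows that the pattern accepts a shin carrying several marks and rejects a shin dot on another base.

```elisp
;; Sketch only: `my-hebrew-cluster-regexp' is a hypothetical name,
;; not something defined by Emacs.
(defconst my-hebrew-cluster-regexp
  (let ((niqud  "[\u05B0-\u05B9\u05BB\u05C7]?")  ; 3rd class, 0-1 marks
        (accent "[\u0591-\u05AF\u05BD]*"))        ; 4th class, any number
    (concat
     "\\("
     ;; Any base letter except vav and shin.
     "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
     "[\u05BC\u05BF]?" niqud accent
     "\\|"
     ;; Vav: extended 3rd class.
     "\u05D5" "\u05BC?" "[\u05B0-\u05BB\u05C7]?" accent
     "\\|"
     ;; Shin: dagesh is treated as optional here, as the "0-1" comment
     ;; indicates, followed by an optional shin/sin dot.
     "\u05E9" "\u05BC?" "[\u05C1\u05C2]?" niqud accent
     "\\)"))
  "Regexp matching one Hebrew base letter plus its diacritical marks.")

;; Shin + dagesh + shin dot + qamats + one accent: the whole string matches.
(string-match-p (concat "\\`" my-hebrew-cluster-regexp "\\'")
                "\u05E9\u05BC\u05C1\u05B8\u0591")   ; => 0

;; Alef followed by a shin dot: not a valid cluster as a whole.
(string-match-p (concat "\\`" my-hebrew-cluster-regexp "\\'")
                "\u05D0\u05C1")                     ; => nil
```

Anchoring with `\\`` and `\\'` is only for the check; a pattern stored in a composition rule would not be anchored this way.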