From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Yair F <yair.f.lists@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Composing Hebrew diacriticals
Date: Fri, 7 May 2010 13:00:16 +0300
Message-ID: <p2hba5bff411005070300lbc01fb06k9a753cf629b1b4c0@mail.gmail.com>
References: <83mxwlw2c0.fsf@gnu.org> <tl7ocgvkr6n.fsf@m17n.org>
	<loom.20100506T165338-12@post.gmane.org> <tl7pr18fsfu.fsf@m17n.org>
	<x2hba5bff411005062141rbcadbcd5va8b1ead65f40aef8@mail.gmail.com>
	<tl7mxwcfcxg.fsf@m17n.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1273240876 25338 80.91.229.12 (7 May 2010 14:01:16 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Fri, 7 May 2010 14:01:16 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Kenichi Handa <handa@m17n.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri May 07 16:01:13 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1OAO73-00043E-Pe
	for ged-emacs-devel@m.gmane.org; Fri, 07 May 2010 16:01:10 +0200
Original-Received: from localhost ([127.0.0.1]:41903 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1OAO72-0005GF-U2
	for ged-emacs-devel@m.gmane.org; Fri, 07 May 2010 10:01:09 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1OANYz-0002GV-In
	for emacs-devel@gnu.org; Fri, 07 May 2010 09:25:58 -0400
Original-Received: from [140.186.70.92] (port=47582 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OANYq-0006Kb-UK
	for emacs-devel@gnu.org; Fri, 07 May 2010 09:25:56 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <yair.f.lists@gmail.com>) id 1OAKLx-0001fI-Rp
	for emacs-devel@gnu.org; Fri, 07 May 2010 06:00:21 -0400
Original-Received: from mail-wy0-f169.google.com ([74.125.82.169]:46910)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <yair.f.lists@gmail.com>) id 1OAKLx-0001et-L0
	for emacs-devel@gnu.org; Fri, 07 May 2010 06:00:17 -0400
Original-Received: by wyb40 with SMTP id 40so340172wyb.0
	for <emacs-devel@gnu.org>; Fri, 07 May 2010 03:00:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=hPxsvHGXbDqK22KQWoHOTnCTbrFa9gAKk6ZCi9seZzo=;
	b=dX7tcMg7h+LxeEKABEOwg/KWBCzOrro6YwvLRftO0K0UJt35wJg/SRRALV49h5/SA9
	7d8igQJDfzfvpPFTOFB1Z+IAagxsWuJcNVes0FB/e0xx8rFVI/NJejb6yQdtsZSmaIP3
	nCQ+y7CKEEY8OpvCI9li0k+Rjw41r1wC/EW1M=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=w6ZRwxmx0pmhjqToklgNd98CpKMRMs+kT0q5lFq/l+Y40Ago51Pl7JZMIIALffh0JC
	MARGnigXQQAgPdtFW8f8Y2RGrGLQfK9pxueWzGakCVo8CCIfDvmqj82KdBno9fu1G3Va
	0MKJRQh4TgvlRBJZMJxof8oYbndTB7MvzyVC0=
Original-Received: by 10.227.136.139 with SMTP id r11mr5511374wbt.129.1273226416647; 
	Fri, 07 May 2010 03:00:16 -0700 (PDT)
Original-Received: by 10.216.177.204 with HTTP; Fri, 7 May 2010 03:00:16 -0700 (PDT)
In-Reply-To: <tl7mxwcfcxg.fsf@m17n.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:124615
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/124615>

On Fri, May 7, 2010 at 9:23 AM, Kenichi Handa <handa@m17n.org> wrote:

> If what composed are only diacritical marks, and they are
> placed on any base characters, it is better to set that kind
> of list only for hebrew diacriticals for efficiency.  So,
> the code will be something like this:
>
> (let ((hebrew-diacritals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
>       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]))
>   (dolist (elt hebrew-diacritals-list)
>     (set-char-table-range elt
>       (list (vector regexp 1 'font-shape-gstring)))))
>
> Here "1" is for moving back one character to check matching
> with REGEXP.
>
>>> There are some restrictions on which characters are allowed to be composed.
>
> If that restrictions are more rigid, regexp should vary for
> each diacritical mark.

This is the composition regexp; I added whitespace and comments for
readability:

\\(
  [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  ;; base
  [\u05BC\u05BF]?                ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
  \u05D5                         ;; base
  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05BB\u05C7]?         ;; 0-1 marks of extended 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
  \u05E9                         ;; base
  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
  [\u05C1\u05C2]?                ;; 0-1 marks of 2nd class (shin dot)
  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\)

What would be the best way to handle this case?
In the most extreme case there are 6 marks attached to a single base
character.
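
In case it helps to see what the pattern accepts, here is a rough sketch
that checks it outside Emacs. It is only a stand-in: it assumes Python's
`re` module in verbose mode rather than Emacs regexps (so the groups are
written unescaped), it treats the dagesh after shin as optional, as the
comment says, and the names CLUSTER and is_cluster are made up for this
example, not anything in Emacs.

```python
import re

# Python translation of the composition pattern (an assumption: this is not
# Emacs regexp syntax; re.VERBOSE supplies the whitespace-and-comments layout).
CLUSTER = re.compile(
    r"""
    (?:
        [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  # base
        [\u05BC\u05BF]?                # 0-1 marks of 1st class (dagesh)
        [\u05B0-\u05B9\u05BB\u05C7]?   # 0-1 marks of 3rd class (niqud)
        [\u0591-\u05AF\u05BD]*         # marks of 4th class
      |
        \u05D5                         # base (vav)
        \u05BC?                        # 0-1 marks of 1st class (dagesh)
        [\u05B0-\u05BB\u05C7]?         # 0-1 marks of extended 3rd class
        [\u0591-\u05AF\u05BD]*         # marks of 4th class
      |
        \u05E9                         # base (shin)
        \u05BC?                        # 0-1 marks of 1st class (dagesh)
        [\u05C1\u05C2]?                # 0-1 marks of 2nd class (shin dot)
        [\u05B0-\u05B9\u05BB\u05C7]?   # 0-1 marks of 3rd class (niqud)
        [\u0591-\u05AF\u05BD]*         # marks of 4th class
    )
    """,
    re.VERBOSE,
)


def is_cluster(text: str) -> bool:
    """Return True if TEXT is exactly one base character plus allowed marks."""
    return CLUSTER.fullmatch(text) is not None


# The extreme case: shin + dagesh + shin dot + niqud + three 4th-class marks.
print(is_cluster("\u05E9\u05BC\u05C1\u05B8\u0591\u0592\u05BD"))  # True
# A shin dot on alef is rejected.
print(is_cluster("\u05D0\u05C1"))                                # False
```

The same check also shows why vav needs its own branch: U+05BA is accepted
after vav but rejected after the other base characters.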