From: Jean-Christophe Helary
Newsgroups: gmane.emacs.help
Subject: Re: Emacs as a translator's tool
Date: Mon, 8 Jun 2020 21:34:54 +0900
To: Emanuel Berg
Cc: help-gnu-emacs@gnu.org

> On Jun 8, 2020, at 21:15, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
>
>> One thing tho, M. Helary said something about the
>> chopping up of the input into segments, my
>> intuition tells me they are shorter (the input more
>> segmentized) than what you get with
>> `forward-sentence' and `backward-sentence'. (My
>> intuition also tells me backward-sentence is
>> (forward-sentence -1) ...)
>>
>> Maybe `sentence-end' already has been configured
>> somewhere to get the most restrictive definition,
>> i.e., here, with the purpose of getting the
>> shortest possible segments that still make sense...
>
> Unless... unless the DB is really fined
> tuned already.
> Then we should do our own segmentation
> rules, we should get the exact same as they (OmegaT
> or whoever has it) uses...

CAT segmentation rules are defined by the SRX standard. They are
basically a set of cascading regex rules (break/don't break).

It is possible to fine-tune a translation by modifying a rule set before
or during the translation.

-- 
Jean-Christophe Helary @brandelune
http://mac4translators.blogspot.com
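
The cascading break/don't-break idea mentioned above can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the `Rule` tuple, the `RULES` list and the `segment` function are hypothetical names, the two rules are made up for the example and are not taken from OmegaT or from any real SRX file, and the sketch assumes that the first rule matching at a given position decides whether to break there.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    break_here: bool  # True = break rule, False = exception (don't break)
    before: str       # regex that must match the text ending at the position
    after: str        # regex that must match the text starting at the position


# Hypothetical rule set, for illustration only (not from a real SRX file).
# Order matters: exceptions are listed before the general break rule.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # no break after these abbreviations
    Rule(True, r"[.?!]+", r"\s+"),                 # break after sentence-final punctuation
]


def segment(text: str, rules: List[Rule] = RULES) -> List[str]:
    """Split text at every position where the first matching rule is a break rule."""
    compiled = [
        (r.break_here, re.compile(r"(?:" + r.before + r")\Z"), re.compile(r.after))
        for r in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head = text[:pos]
        for break_here, before, after in compiled:
            if before.search(head) and after.match(text, pos):
                if break_here:
                    segments.append(text[start:pos])
                    start = pos
                break  # assumed cascade: the first matching rule wins
    segments.append(text[start:])
    return segments


if __name__ == "__main__":
    sample = "Dr. Smith arrived. He sat down! Why? Nobody knows."
    for s in segment(sample):
        print(repr(s.strip()))
```

Whitespace after a break stays at the start of the following segment, so joining the segments gives back the original text. Modifying the rule set changes the segmentation: dropping the abbreviation exception, for instance, makes the same input break after "Dr." as well.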