From mboxrd@z Thu Jan  1 00:00:00 1970
From: Emanuel Berg via Users list for the GNU Emacs text editor
Newsgroups: gmane.emacs.help
Subject: Re: Emacs as a translator's tool
Date: Tue, 09 Jun 2020 23:13:31 +0200
Message-ID: <878sgwkkn8.fsf@ebih.ebihd>
References: <871rn35lqc.fsf@mbork.pl> <87zh9r45ad.fsf@mbork.pl>
 <87h7vz2m5g.fsf@ebih.ebihd> <87d06k4rmg.fsf@mbork.pl>
 <87eeqzmanl.fsf@ebih.ebihd> <877dwmoboq.fsf@mbork.pl>
 <87bllypckg.fsf@ebih.ebihd> <87tuzpmnuo.fsf@mbork.pl>
 <87bllu4lx0.fsf@ebih.ebihd> <87blluxfcq.fsf@mbork.pl>
 <1rmqrrvn.fsf@ebih.ebihd> <87o8ptydil.fsf@ebih.ebihd>
 <87ftb5ycrz.fsf@ebih.ebihd>
Reply-To: Emanuel Berg
To: help-gnu-emacs@gnu.org

Jean-Christophe Helary wrote:

> CAT segmentation rules are defined by the SRX
> standard. They are basically a set of cascading
> regex rules (break/don't break).

OK? Do

    (sentence-end)

to get, maybe,

    "\\([.?!…‽][]\"'”’)}]*\\($\\|[  ]\\)\\|[。.?!]+\\)[   ]*"

Because "All paragraph boundaries also end sentences" ...

    paragraph-separate is a variable defined in ‘paragraphs.el’.
    Its value is "[ ]*$"

    This variable is safe as a file local variable if its value
    satisfies the predicate ‘stringp’.

    Documentation:
    Regexp for beginning of a line that separates paragraphs.
    If you change this, you may have to change ‘paragraph-start’ also.

So we'll just change it to comply with the SRX standard :)

> It is possible to fine-tune a translation by
> modifying a rule set ...

It is possible but with Emacs, even more so :)

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal
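
PS. A minimal sketch of the "cascading break/don't break" idea quoted above, in Emacs Lisp. It is not an SRX implementation and it does not touch `sentence-end' or `paragraph-separate'; it only shows how an ordered rule list could decide whether a position is a segment boundary. All `my-srx-*' names and the example rules are made up for illustration.

```elisp
;; Sketch only.  All my-srx-* names are hypothetical; nothing here is
;; part of Emacs or of the SRX specification.

(defvar my-srx-rules
  '((nil "\\b\\(?:Mr\\|Mrs\\|Dr\\)\\." "[ \t]")    ; exception: don't break
    (t   "[.?!]"                       "[ \t\n]")) ; general rule: break
  "List of (BREAK BEFORE AFTER) rules, tried in order; first match wins.
BEFORE must match text ending at the candidate position, AFTER must
match text starting there.")

(defun my-srx-break-p ()
  "Return non-nil if the first rule matching at point is a break rule."
  (catch 'decided
    (dolist (rule my-srx-rules)
      (when (and (looking-back (nth 1 rule) (line-beginning-position))
                 (looking-at (nth 2 rule)))
        (throw 'decided (nth 0 rule))))
    ;; No rule matched at all: not a boundary.
    nil))

(defun my-srx-forward-segment ()
  "Move point forward to the next segment break, or to end of buffer."
  (interactive)
  (while (and (not (eobp))
              (progn (forward-char 1)
                     (not (my-srx-break-p))))))
```

With point at the start of a buffer containing "Dr. Smith came. He left.", M-x my-srx-forward-segment should skip the period after "Dr" (the exception rule matches first) and stop right after "came.". Fine-tuning then means editing `my-srx-rules', with exceptions placed before the general rule.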