From: Emanuel Berg via Users list for the GNU Emacs text editor
Newsgroups: gmane.emacs.help
Subject: Re: Emacs as a translator's tool
Date: Wed, 10 Jun 2020 17:33:18 +0200
Message-ID: <878sgv3phd.fsf@ebih.ebihd>
To: help-gnu-emacs@gnu.org

Jean-Christophe Helary wrote:

>> The Emacs sentence format and associated functions
>> certainly makes sense to use but with tweaked
>> settings to get in particular shorter segments
>> I think would benefit both searching the DB and
>> getting better results.
>
> Sure, but it's not trivial to find "natural"
> subsegments with the tools at hand.
> That's why I hoped someone already did it :(

Well, obviously someone did!

It isn't trivial, no, and regexps will only get you so
far. Here one should get proper parsing IMO.

But let's postpone this for now...

>> So yes, where do you get the database?
>
> In most of the cases, it's something the translator
> build from her own translations.

Really? That's cool, then you can have your own style
from day 1, and the more you do it, the stronger you
get...

Any suggestions as to the format of the database?

Just because a database can be as simple as a text
file [1] doesn't mean it always should be.

[1] https://dataswamp.org/~incal/bike/TIRE

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal