From: Juri Linkov
Newsgroups: gmane.comp.gnu.gettext.bugs,gmane.emacs.devel
Subject: Re: Emacs i18n
Date: Thu, 21 Mar 2019 23:45:31 +0200
Organization: LINKOV.NET
Message-ID: <87y357q4uc.fsf@mail.linkov.net>
References: <25076895.mA2g9mTHSI@omega> <87h8bx5ijn.fsf@mail.linkov.net>
To: Richard Stallman
Cc: bug-gettext-mXXj517/zsQ@public.gmane.org, emacs-devel-mXXj517/zsQ@public.gmane.org
In-Reply-To: (Richard Stallman's message of "Wed, 20 Mar 2019 22:14:35 -0400")
List-Id: Bug reports for GNU gettext

> > Indeed, a complete implementation of all Russian morphological rules
> > takes ~1600 lines of dense Perl code:
> >
> > http://www.linkov.net/files/nlp/Lingua-RU-Inflect.pm
> >
> > I can't imagine how to include all these rules to gettext.
>
> I agree with you about that.  What I propose is something else.
>
> 1. I do not propose implementing them all.  Only some -- whichever ones
> we think are worth while.
>
> 2. I do not propose putting any of this in gettext.
> What I propose would be Emacs code that operates on the strings that
> come from gettext.

The misconception in your proposal is the assumption that inflection is
purely algorithmic, whereas inflection in Russian is actually
dictionary-based (in addition to the algorithms that process words from
the dictionary).  That is, to be able to inflect a word you need a large
dictionary of all words, where each word has at least the following
lexical properties:

- part of speech
- noun grammatical gender: masculine, feminine, neuter
- noun animacy: animate, inanimate
- inflection type

And the main parameters that influence the declension are:

- grammatical case (one of the 6 basic cases: nominative, genitive,
  dative, accusative, instrumental, prepositional, plus some additional
  ones)
- number: singular and plural.  Dual is not a grammatical number; it
  only influences the choice of case for words after numerals:

  for 1    - nominative case, singular
  for 2..4 - genitive case, singular
  for 5..  - genitive case, plural

An additional problem is that there are many exceptions: some words have
an additional form called the "count form":

https://en.wikipedia.org/wiki/Russian_declension#Count_form

For instance, the correct form is "5 байт" (5 byte), even though the
grammatical rule requires the genitive plural for most other words.
For bytes the genitive plural is incorrect: "5 байтов" (5 bytes).

Such exceptions are marked in the dictionary with a special property
that has different values:

- mandatory: only the count form is allowed for such units of measure
  as amperes, watts, volts, bits, bytes, etc.
- optional: both forms are accepted for such units as angstroms, gauss,
  (kilo)grams, decibels, carats, microns, ohms, röntgen, etc.
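
To make the point about dictionary lookup concrete, here is a minimal
sketch in Python.  It is not taken from the Perl module linked above.
The names Noun, LEXICON, numeral_category and forms_after_numeral are
hypothetical, and the three-entry lexicon is only an illustration (the
"gram" and "file" entries are my own examples of an optional count form
and of a word without one).  The numeral table above covers small
numbers; the sketch assumes the usual extension by last digits, the same
one that the Plural-Forms expression in Russian PO files encodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Noun:
    """Dictionary entry holding only the forms needed after a numeral."""
    nom_sg: str                       # nominative singular
    gen_sg: str                       # genitive singular
    gen_pl: str                       # genitive plural
    count_form: Optional[str] = None  # special "count form", if any
    count_form_mandatory: bool = False


# Hypothetical three-word lexicon; a real one needs every word.
LEXICON = {
    "byte": Noun("байт", "байта", "байтов",
                 count_form="байт", count_form_mandatory=True),
    "gram": Noun("грамм", "грамма", "граммов",
                 count_form="грамм", count_form_mandatory=False),
    "file": Noun("файл", "файла", "файлов"),
}


def numeral_category(n: int) -> str:
    """Pick the case/number slot required after the numeral n.

    Assumption: selection by last digits, as in the Russian
    Plural-Forms expression (11..14 behave like 5, 21 like 1).
    """
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return "nom_sg"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "gen_sg"
    return "gen_pl"


def forms_after_numeral(key: str, n: int) -> List[str]:
    """Return the acceptable forms of LEXICON[key] after the numeral n."""
    noun = LEXICON[key]
    category = numeral_category(n)
    if category == "nom_sg":
        return [noun.nom_sg]
    if category == "gen_sg":
        return [noun.gen_sg]
    # Genitive plural slot: this is where the count form may take over.
    if noun.count_form is None:
        return [noun.gen_pl]
    if noun.count_form_mandatory:
        return [noun.count_form]
    return [noun.gen_pl, noun.count_form]
```

The numeral rule itself is three lines; everything that distinguishes
"byte" from "file" comes from the lexicon entry, and no rule applied to
the string alone could recover it.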