unofficial mirror of emacs-devel@gnu.org 
* German-postfix fix for Quail
       [not found]     ` <B38610C6-933F-4F40-A82D-5FABFE1D5397@tzi.org>
@ 2008-08-21 23:43       ` David Reitter
  2008-08-22  1:25         ` Carsten Bormann
  0 siblings, 1 reply; 4+ messages in thread
From: David Reitter @ 2008-08-21 23:43 UTC (permalink / raw)
  To: Carsten Bormann; +Cc: Emacs-Devel devel


Cc'ing emacs-devel.  For the patch, see below.

On 21 Aug 2008, at 19:19, Carsten Bormann wrote:

> On Aug 22 2008, at 00:09, David Reitter wrote:
>
>> seems sensible. Have you checked your new rules with a corpus?
>
>
> Well it worked well for my everyday typing so far :-)
>
> I just did, in a limited way (a bug in the "wortschatz" website  
> doesn't let me search for umlauts and DWDS only allows wildcards at  
> the beginning *or* end of words).  The only occurrence that I could  
> not immediately rule out as an obvious typo was:
>
> "Casa Scanteü" in Bukarest
>
> which doesn't even google (so probably also a typo).
>
> Do you have access to better corpora?


Yes, generally speaking.

The 34-million-word Frankfurter Rundschau corpus, for example, brings up
a few correctly spelled hapax words, all of which are compound nouns:

Diaüberblendschau
Guerillaüberfall
Klimaübereinkommen
Klimaüberwachungs
Yogaübung
Kameraüberwachung

This is for aü only.
For eü, we get this at morpheme boundaries: "geöffenten", for  
instance.  I find a lot of legitimate eü (mostly at word boundaries in  
compounds).

At least for aü, I also get a LOT of misspelled words (like
"Aüßerungen"), more than the correctly spelled ones (most of which,
as I said, have frequency=1).

"qü" does not occur.

I presume Quail has some provisions for typing something like  
"Yogaübung" even with your patch?
For "eü", perhaps we should consider holding off.
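
A corpus check like the one described above can be sketched in a few lines. This is only an illustration: the function name `count_sequences` and the tiny sample list are assumptions made for the example, with the sample words taken from this message rather than from the Frankfurter Rundschau corpus itself.

```python
from collections import Counter

def count_sequences(tokens, sequences=("aü", "eü", "qü")):
    """Count tokens containing each character sequence (case-insensitive).

    Returns the counts plus the matching tokens, so that hits can be
    inspected by hand and sorted into typos and legitimate compounds.
    """
    counts = Counter()
    examples = {seq: [] for seq in sequences}
    for token in tokens:
        lowered = token.lower()
        for seq in sequences:
            if seq in lowered:
                counts[seq] += 1
                examples[seq].append(token)
    return counts, examples

# Hypothetical sample; a real run would stream tokens from a corpus file.
sample = ["Diaüberblendschau", "Yogaübung", "Aüßerungen",
          "Kameraüberwachung", "Steuer"]
counts, examples = count_sequences(sample)
for seq in ("aü", "eü", "qü"):
    print(seq, counts[seq], examples[seq])
```

The counts alone do not separate typos such as "Aüßerungen" from legitimate compounds such as "Yogaübung"; that still requires reading the matches.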

- D




Begin forwarded message:

> From: Carsten Bormann <cabo@tzi.org>
> Date: 21 August 2008 17:24:55 EDT
> To: Carsten Bormann <cabo@tzi.org>
> Cc: David Reitter <david.reitter@gmail.com>
> Subject: Re: German-postfix fix
>
> Aargh, wrong version (it pays to do another C-x v =), should be:
>
>
> --- latin-post.el.~1.31.~	2008-05-07 05:37:06.000000000 +0200
> +++ latin-post.el	2008-08-21 23:22:20.000000000 +0200
> @@ -1085,7 +1085,7 @@
> aee -> ae
> oe  -> ö
> oee -> oe
> -ue  -> ü
> +ue  -> ü (not after a/e/q)
> uee -> ue
> sz  -> ß
> szz -> sz
> @@ -1108,6 +1108,13 @@
>  ("UEE" ["UE"])
>  ("uee" ["ue"])
>  ("szz" ["sz"])
> +
> + ("eue" ["eue"])
> + ("Eue" ["Eue"])
> + ("aue" ["aue"])
> + ("Aue" ["Aue"])
> + ("que" ["que"])
> + ("Que" ["Que"])
>  )
>
> (quail-define-package
>
>
> Sorry, Carsten
>
> On Aug 21 2008, at 23:21, Carsten Bormann wrote:
>
>> David,
>>
>> would it be in your purview to apply this important little patch?
>> (I believe you speak German, so you should be able to understand  
>> why "Steür" and "Qülle" and "Maür" make no sense :-)
>>
>> Gruesse, Carsten
>>
>> --- latin-post.el.~1.31.~	2008-05-07 05:37:06.000000000 +0200
>> +++ latin-post.el	2008-08-21 23:16:29.000000000 +0200
>> @@ -1085,7 +1085,7 @@
>> aee -> ae
>> oe  -> ö
>> oee -> oe
>> -ue  -> ü
>> +ue  -> ü (not after a/e/q)
>> uee -> ue
>> sz  -> ß
>> szz -> sz
>> @@ -1108,6 +1108,13 @@
>> ("UEE" ["UE"])
>> ("uee" ["ue"])
>> ("szz" ["sz"])
>> +
>> + ("eue" ["eue"])
>> + ("Eue" ["eue"])
>> + ("aue" ["aue"])
>> + ("Aue" ["Aue"])
>> + ("que" ["que"])
>> + ("Que" ["Que"])
>> )
>>
>> (quail-define-package
>



* Re: German-postfix fix for Quail
  2008-08-21 23:43       ` German-postfix fix for Quail David Reitter
@ 2008-08-22  1:25         ` Carsten Bormann
  2008-08-27 21:31           ` David Reitter
  2008-08-28 16:44           ` Stefan Monnier
  0 siblings, 2 replies; 4+ messages in thread
From: Carsten Bormann @ 2008-08-22  1:25 UTC (permalink / raw)
  To: David Reitter; +Cc: Carsten Bormann, Emacs-Devel devel

On Aug 22 2008, at 01:43, David Reitter wrote:

> For "eü", perhaps we should consider holding off.

But "eue" also happens to be the most likely of the combinations in  
German, with highly common words/forms like euer, neue, Steuer, teuer  
(333 occurrences of these in a random ~1M Gutenberg text)!
(These quite common words were what motivated me to make these little  
enhancements.)

Since "geübt" (and combinations like "ausgeübt") are actually  
relatively common word forms, I have added the rule

  ("ge" ["ge"])

below.  I believe anything else is well past the point of diminishing
returns.

> I presume Quail has some provisions for typing something like  
> "Yogaübung" even with your patch?

To type Yogaübung, reüssieren, etc., the typist has to do something
that breaks the input sequence before typing the ü; e.g., when you see
"Yogaue", backspace twice, and type ue again, you naturally get the
right result.
Gymnastics like this are nothing new with german-postfix; e.g.,
Europäer has a similar problem today (it gets turned into Europaer if
you don't pay attention, which is even slightly worse).
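
The effect of the added rules can be illustrated with a simplified model. This is a sketch, not Quail's actual implementation: it assumes that the longest rule key matching the upcoming keystrokes wins at each position, the helper `translate` is hypothetical, and the rule tables hold only the lowercase base rules plus the entries from the patch.

```python
# Subset of the existing german-postfix rules (lowercase only).
BASE_RULES = {
    "ae": "ä", "aee": "ae",
    "oe": "ö", "oee": "oe",
    "ue": "ü", "uee": "ue",
    "sz": "ß", "szz": "sz",
}

# Rules added by the patch. Each maps a key sequence to itself, so the
# keystrokes are consumed before the "ue" -> "ü" rule can fire.
PATCH_RULES = {
    "ge": "ge",
    "eue": "eue", "Eue": "Eue",
    "aue": "aue", "Aue": "Aue",
    "que": "que", "Que": "Que",
}

ALL_RULES = {**BASE_RULES, **PATCH_RULES}

def translate(keys, rules):
    """Translate keystrokes, preferring the longest matching rule key."""
    out = []
    i = 0
    max_len = max(len(key) for key in rules)
    while i < len(keys):
        for n in range(max_len, 0, -1):
            chunk = keys[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            # No rule starts here; pass the keystroke through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)

for typed in ("Steuer", "Quelle", "Mauer", "geuebt", "Yogauebung"):
    print(typed, translate(typed, BASE_RULES), translate(typed, ALL_RULES))
```

Under this model the old rules turn "Steuer" into "Steür", while the patched rules leave it alone. "geuebt" still becomes "geübt" because the "ge" rule consumes the "e" before "eue" can match, and "Yogauebung" stays unconverted, which is the case that needs the backspace workaround.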

Gruesse, Carsten

--- latin-post.el.~1.31.~	2008-05-07 05:37:06.000000000 +0200
+++ latin-post.el	2008-08-22 02:52:11.000000000 +0200
@@ -1085,7 +1085,7 @@
  aee -> ae
  oe  -> ö
  oee -> oe
-ue  -> ü
+ue  -> ü (not after a/e/q)
  uee -> ue
  sz  -> ß
  szz -> sz
@@ -1108,6 +1108,14 @@
   ("UEE" ["UE"])
   ("uee" ["ue"])
   ("szz" ["sz"])
+
+ ("ge" ["ge"])
+ ("eue" ["eue"])
+ ("Eue" ["Eue"])
+ ("aue" ["aue"])
+ ("Aue" ["Aue"])
+ ("que" ["que"])
+ ("Que" ["Que"])
   )

  (quail-define-package






* Re: German-postfix fix for Quail
  2008-08-22  1:25         ` Carsten Bormann
@ 2008-08-27 21:31           ` David Reitter
  2008-08-28 16:44           ` Stefan Monnier
  1 sibling, 0 replies; 4+ messages in thread
From: David Reitter @ 2008-08-27 21:31 UTC (permalink / raw)
  To: Emacs-Devel devel, ntakahas; +Cc: Carsten Bormann


On 21 Aug 2008, at 21:25, Carsten Bormann wrote:

> On Aug 22 2008, at 01:43, David Reitter wrote:
>
>> For "eü", perhaps we should consider holding off.
>
> But "eue" also happens to be the most likely of the combinations in  
> German, with highly common words/forms like euer, neue, Steuer,  
> teuer (333 occurrences of these in a random ~1M Gutenberg text)!
> (These quite common words were what motivated me to make these  
> little enhancements.)
>
> Since "geübt" (and combinations like "ausgeübt") are actually  
> relatively common word forms, I have added the rule
>
> ("ge" ["ge"])
> below.  I believe anything else is well past the point of
> diminishing returns.

Since I haven't heard from anyone, and I don't know whether to classify
this as a fix or as a feature, I'll be committing Carsten's patch in two
days.
Cc'ing the Quail author.

I presume this is small enough not to require a consent form.



--- latin-post.el.~1.31.~	2008-05-07 05:37:06.000000000 +0200
+++ latin-post.el	2008-08-22 02:52:11.000000000 +0200
@@ -1085,7 +1085,7 @@
aee -> ae
oe  -> ö
oee -> oe
-ue  -> ü
+ue  -> ü (not after a/e/q)
uee -> ue
sz  -> ß
szz -> sz
@@ -1108,6 +1108,14 @@
  ("UEE" ["UE"])
  ("uee" ["ue"])
  ("szz" ["sz"])
+
+ ("ge" ["ge"])
+ ("eue" ["eue"])
+ ("Eue" ["Eue"])
+ ("aue" ["aue"])
+ ("Aue" ["Aue"])
+ ("que" ["que"])
+ ("Que" ["Que"])
  )

(quail-define-package




* Re: German-postfix fix for Quail
  2008-08-22  1:25         ` Carsten Bormann
  2008-08-27 21:31           ` David Reitter
@ 2008-08-28 16:44           ` Stefan Monnier
  1 sibling, 0 replies; 4+ messages in thread
From: Stefan Monnier @ 2008-08-28 16:44 UTC (permalink / raw)
  To: Carsten Bormann; +Cc: David Reitter, Emacs-Devel devel

> + ("ge" ["ge"])

This needs a comment.


        Stefan




