changing word boundaries

unofficial mirror of help-gnu-emacs@gnu.org

* changing word boundaries
@ 2009-10-18 16:27 Ernest Adrogué
  2009-10-18 19:24 ` Peter Dyballa
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-18 16:27 UTC (permalink / raw)
  To: help-gnu-emacs

Hi there,

The Catalan language has a ligature consisting of one
"l" character, followed by a middle dot ("·"), followed
by another "l". See here for more details:
http://en.wikipedia.org/wiki/L·l#Catalan

Is there a way to make emacs aware of this, so that it
doesn't treat a word containing "l·l" as two separate
words?

Thanks.

PS. Please CC me, if you reply to this.

-- 
Ernest


* Re: changing word boundaries
  2009-10-18 16:27 changing word boundaries Ernest Adrogué
@ 2009-10-18 19:24 ` Peter Dyballa
  2009-10-18 21:19   ` Ernest Adrogué
  2009-10-18 21:08 ` Andreas Politz
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Peter Dyballa @ 2009-10-18 19:24 UTC (permalink / raw)
  To: Ernest Adrogué; +Cc: help-gnu-emacs


On 18.10.2009 at 18:27, Ernest Adrogué wrote:

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?


How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
U+0140. The problem is that · is a word constituent only between two
l's; in many other cases it's a multiplication sign, a comma, a name
separator, some kind of bullet sign...

--
Greetings

   Pete

The human animal differs from the lesser primates in his passion for  
lists of "Ten Best."
				– H. Allen Smith








* Re: changing word boundaries
  2009-10-18 16:27 changing word boundaries Ernest Adrogué
  2009-10-18 19:24 ` Peter Dyballa
@ 2009-10-18 21:08 ` Andreas Politz
  2009-10-20  0:06   ` Ernest Adrogué
       [not found]   ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
  2009-10-18 21:09 ` Andreas Politz
       [not found] ` <mailman.9065.1255893858.2239.help-gnu-emacs@gnu.org>
  3 siblings, 2 replies; 11+ messages in thread
From: Andreas Politz @ 2009-10-18 21:08 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: Ernest Adrogué

Ernest Adrogué <eadrogue@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
>
> Thanks.
>
> PS. Please CC me, if you reply to this.



You could use dynamic syntax-tables via font-lock.

(add-hook 'text-mode-hook
          (lambda nil
            (set (make-variable-buffer-local
                  'parse-sexp-lookup-properties) t)
            ;; get font-lock started
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            (add-to-list
             (make-variable-buffer-local
              'font-lock-syntactic-keywords)
             ;; let ! between 2*a have word syntax
             '("a\\(!\\)a" 1 "w"))))


Replace `a' and `!' with your characters and it'll work,
hopefully.
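
On Emacs 24.3 or later, where `font-lock-syntactic-keywords' is
obsolete in favour of `syntax-propertize-function', a rough and
untested sketch of the same idea with the actual Catalan characters
might look like the following. Matching both upper- and lower-case
`l' is an assumption, not part of the recipe above, and whether plain
word motion triggers the propertizing on its own may depend on the
Emacs version, so treat it as a starting point only.

```elisp
(add-hook 'text-mode-hook
          (lambda ()
            ;; Let syntax-based commands honour `syntax-table' text properties.
            (setq-local parse-sexp-lookup-properties t)
            ;; Give `·' word syntax only when it sits between two l's.
            (setq-local syntax-propertize-function
                        (syntax-propertize-rules
                         ("[lL]\\(·\\)[lL]" (1 "w"))))))
```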

-ap






* Re: changing word boundaries
  2009-10-18 19:24 ` Peter Dyballa
@ 2009-10-18 21:19   ` Ernest Adrogué
  0 siblings, 0 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-18 21:19 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: help-gnu-emacs

Hello,

18/10/09 @ 21:24 (+0200), thus spake Peter Dyballa:
> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

That seems the way to go, yes. Unfortunately, everybody still
uses the middle dot; spell-checkers, for example, think ŀ is
a misspelling.

Cheers.

-- 
Ernest





* Re: changing word boundaries
  2009-10-18 21:08 ` Andreas Politz
@ 2009-10-20  0:06   ` Ernest Adrogué
       [not found]   ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-20  0:06 UTC (permalink / raw)
  To: help-gnu-emacs

18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
> You could use dynamic syntax-tables via font-lock.
> 
> (add-hook 'text-mode-hook
>           (lambda nil
>             (set (make-variable-buffer-local
>                   'parse-sexp-lookup-properties) t)
>             ;; get font-lock started
>             (unless font-lock-defaults
>               (setq font-lock-defaults '(nil t)))
>             (add-to-list
>              (make-variable-buffer-local
>               'font-lock-syntactic-keywords)
>              ;; let ! between 2*a have word syntax
>              '("a\\(!\\)a" 1 "w"))))
> 
> 
> Replace `a' and `!' with your characters and it'll work,
> hopefully.

It does what I wanted. :)
Thanks!

Ernest





* Re: changing word boundaries
       [not found] <mailman.9059.1255887881.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:09 ` Dave Love
  2009-11-08 17:07   ` Ernest Adrogué
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:09 UTC (permalink / raw)
  To: Ernest Adrogué; +Cc: help-gnu-emacs

Ernest Adrogué <eadrogue@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?

[You're probably not really interested in word boundaries, just word
constituents.  For an illustration of the difference, see variable
`word-combining-categories' and what capitalized-words-mode does in
Emacs 23.]

You should define a Catalan language environment to be used in ca_ES
locales.  (I'm surprised I didn't do it, as there's a relevant input
method.)  It should set the base syntax of · to word, and set a suitable
default input method.  The existing one, `catalan-prefix', should
presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
and maybe needs other fixes.

The environment would be something like this (untested), which is
probably better than trying to use categories.  [The default Latin-1
character set is overridden in, say, ca_ES.UTF-8.]

  (push '("ca" . "Catalan") locale-language-names)

  (set-language-info-alist
   "Catalan" '((tutorial . "TUTORIAL.es")	; maybe...
  	    (charset iso-8859-1)
  	    (coding-system iso-latin-1 iso-latin-9)
  	    (coding-priority iso-latin-1)
  	    (input-method . "catalan-prefix")
  	    (nonascii-translation . iso-8859-1)
  	    (unibyte-display . iso-latin-1)
  	    (setup-function
  	     . (lambda ()
  		 (modify-syntax-entry ?· "w" (standard-syntax-table))))
  	    (exit-function
  	     . (lambda ()
  		 (modify-syntax-entry ?· "_" (standard-syntax-table))))
  	    ;; Fixme:
  	    ;; (sample-text . "Spanish (Español)	¡Hola!")
  	    (documentation . "\
  This language environment uses the Latin-1 character set, sets
  the default input method to \"catalan-prefix\", and sets the
  syntax of `·' to word.  It selects the Spanish tutorial, in the
  absence of a Catalan translation."))
   '("European"))

You could make a bug report if you have more luck than me with reports
about stuff I worked on.


* Re: changing word boundaries
       [not found]   ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:10     ` Dave Love
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:10 UTC (permalink / raw)
  To: Ernest Adrogué; +Cc: help-gnu-emacs

Ernest Adrogué <eadrogue@gmx.net> writes:

> 18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
>> You could use dynamic syntax-tables via font-lock.
>> 
>> (add-hook 'text-mode-hook
>>           (lambda nil
>>             (set (make-variable-buffer-local
>>                   'parse-sexp-lookup-properties) t)
>>             ;; get font-lock started
>>             (unless font-lock-defaults
>>               (setq font-lock-defaults '(nil t)))
>>             (add-to-list
>>              (make-variable-buffer-local
>>               'font-lock-syntactic-keywords)
>>              ;; let ! between 2*a have word syntax
>>              '("a\\(!\\)a" 1 "w"))))
>> 
>> 
>> Replace `a' and `!' with your characters and it'll work,
>> hopefully.
>
> It does what I wanted. :)

Well, it's a pretty odd way to do it.  If you really only want to use
the ligature in Text mode -- and not programming language comments, for
instance -- just amend `text-mode-syntax-table'.
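
For instance, a one-line sketch for the init file.  Note the
trade-off: this makes every `·' a word constituent in Text mode (and
in modes that share or inherit its syntax table), not only the ones
that sit between two l's.

```elisp
;; Treat MIDDLE DOT (U+00B7) as a word constituent in Text mode.
(modify-syntax-entry ?· "w" text-mode-syntax-table)
```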





* Re: changing word boundaries
       [not found] ` <mailman.9065.1255893858.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:15   ` Dave Love
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:15 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...

It may be mis-used, but U+00B7 is MIDDLE DOT (punctuation).  BULLET is
U+2022 and the mathematical DOT OPERATOR is U+22C5.  It surely doesn't
really matter in this context anyhow.  A lot of character syntaxes have
long been wrong in Emacs anyhow.
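
To see which syntax class a given buffer currently assigns to U+00B7,
something like the following can be evaluated there with M-: (a small
sketch; the result depends on the Emacs version and on the major
mode's syntax table, so no particular value is claimed here).

```elisp
;; Returns the syntax class designator as a one-character string,
;; e.g. "w" for word, "_" for symbol, "." for punctuation.
(string (char-syntax ?·))
```

`C-u C-x =' with point on the character shows the same information
along with its Unicode name.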


* Re: changing word boundaries
  2009-11-01 19:09 ` Dave Love
@ 2009-11-08 17:07   ` Ernest Adrogué
  2009-11-11 14:57     ` Kevin Rodgers
  0 siblings, 1 reply; 11+ messages in thread
From: Ernest Adrogué @ 2009-11-08 17:07 UTC (permalink / raw)
  To: Dave Love; +Cc: help-gnu-emacs

 1/11/09 @ 19:09 (+0000), thus spake Dave Love:
> Ernest Adrogué <eadrogue@gmx.net> writes:
> 
> > Hi there,
> >
> > The Catalan language has a ligature consisting in one
> > "l" character, followed by a middle dot ("·"), followed
> > by another "l". See here for more details:
> > http://en.wikipedia.org/wiki/L·l#Catalan
> >
> > Is there a way to make emacs aware of this, so that it
> > doesn't treat a word containing "l·l" as two separate
> > words?
> 
> [You're probably not really interested in word boundaries, just word
> constituents.  For an illustration of the difference, see variable
> `word-combining-categories' and what capitalized-words-mode does in
> Emacs 23.]
> 
> You should define a Catalan language environment to be used in ca_ES
> locales.  (I'm surprised I didn't do it, as there's a relevant input
> method.)  It should set the base syntax of · to word, and set a suitable
> default input method.  The existing one, `catalan-prefix', should
> presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
> and maybe needs other fixes.
> 
> The environment would be something like this (untested), which is
> probably better than trying to use categories.  [The default Latin-1
> character set is overridden in, say, ca_ES.UTF-8.]
>   
>   (push '("ca" . "Catalan") locale-language-names)
> 
>   (set-language-info-alist
>    "Catalan" '((tutorial . "TUTORIAL.es")	; maybe...
>   	    (charset iso-8859-1)
>   	    (coding-system iso-latin-1 iso-latin-9)
>   	    (coding-priority iso-latin-1)
>   	    (input-method . "catalan-prefix")
>   	    (nonascii-translation . iso-8859-1)
>   	    (unibyte-display . iso-latin-1)
>   	    (setup-function
>   	     . (lambda ()
>   		 (modify-syntax-entry ?· "w" (standard-syntax-table))))
>   	    (exit-function
>   	     . (lambda ()
>   		 (modify-syntax-entry ?· "_" (standard-syntax-table))))
>   	    ;; Fixme:
>   	    ;; (sample-text . "Spanish (Español)	¡Hola!")
>   	    (documentation . "\
>   This language environment uses the Latin-1 character set, sets
>   the default input method to \"catalan-prefix\", and sets the
>   syntax of `·' to word.  It selects the Spanish tutorial, in the
>   absence of a Catalan translation."))
>    '("European"))

Thanks a lot. Have you got any idea of where this should be
put in order to be loaded automatically at start-up?
I tried in init.el, and in a file in the "language" directory
in /usr/share/emacs/23.1/lisp/ to no avail.
It says that there's "no match", when I try to set the language
environment to Catalan interactively.

> You could make a bug report if you have more luck than me with reports
> about stuff I worked on.

I will try, once I get it to work :)

Cheers,

Ernest





* Re: changing word boundaries
  2009-11-08 17:07   ` Ernest Adrogué
@ 2009-11-11 14:57     ` Kevin Rodgers
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Rodgers @ 2009-11-11 14:57 UTC (permalink / raw)
  To: help-gnu-emacs

Ernest Adrogué wrote:
>  1/11/09 @ 19:09 (+0000), thus spake Dave Love:
>> Ernest Adrogué <eadrogue@gmx.net> writes:
>>
>>> Hi there,
>>>
>>> The Catalan language has a ligature consisting in one
>>> "l" character, followed by a middle dot ("·"), followed
>>> by another "l". See here for more details:
>>> http://en.wikipedia.org/wiki/L·l#Catalan
>>>
>>> Is there a way to make emacs aware of this, so that it
>>> doesn't treat a word containing "l·l" as two separate
>>> words?
>> [You're probably not really interested in word boundaries, just word
>> constituents.  For an illustration of the difference, see variable
>> `word-combining-categories' and what capitalized-words-mode does in
>> Emacs 23.]
>>
>> You should define a Catalan language environment to be used in ca_ES
>> locales.  (I'm surprised I didn't do it, as there's a relevant input
>> method.)  It should set the base syntax of · to word, and set a suitable
>> default input method.  The existing one, `catalan-prefix', should
>> presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
>> and maybe needs other fixes.
>>
>> The environment would be something like this (untested), which is
>> probably better than trying to use categories.  [The default Latin-1
>> character set is overridden in, say, ca_ES.UTF-8.]
>>   
>>   (push '("ca" . "Catalan") locale-language-names)
>>
>>   (set-language-info-alist
>>    "Catalan" '((tutorial . "TUTORIAL.es")	; maybe...
>>   	    (charset iso-8859-1)
>>   	    (coding-system iso-latin-1 iso-latin-9)
>>   	    (coding-priority iso-latin-1)
>>   	    (input-method . "catalan-prefix")
>>   	    (nonascii-translation . iso-8859-1)
>>   	    (unibyte-display . iso-latin-1)
>>   	    (setup-function
>>   	     . (lambda ()
>>   		 (modify-syntax-entry ?· "w" (standard-syntax-table))))
>>   	    (exit-function
>>   	     . (lambda ()
>>   		 (modify-syntax-entry ?· "_" (standard-syntax-table))))
>>   	    ;; Fixme:
>>   	    ;; (sample-text . "Spanish (Español)	¡Hola!")
>>   	    (documentation . "\
>>   This language environment uses the Latin-1 character set, sets
>>   the default input method to \"catalan-prefix\", and sets the
>>   syntax of `·' to word.  It selects the Spanish tutorial, in the
>>   absence of a Catalan translation."))
>>    '("European"))
> 
> Thanks a lot. Have you got any idea of where this should be
> put in order to be loaded automatically at start-up?

1. C-x C-f ~/.emacs

2. M-x find-library RET default.el

3. M-x find-library RET site-start.el

> I tried in init.el, and in a file in the "language" directory
> in /usr/share/emacs/23.1/lisp/ to no avail.
> It says that there's "no match", when I try to set the language
> environment to Catalan interactively.
> 
>> You could make a bug report if you have more luck than me with reports
>> about stuff I worked on.
> 
> I will try, once I get it to work :)
> 
> Cheers,
> 
> Ernest
> 
> 
> 


-- 
Kevin Rodgers
Denver, Colorado, USA




