* changing word boundaries
@ 2009-10-18 16:27 Ernest Adrogué
2009-10-18 19:24 ` Peter Dyballa
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-18 16:27 UTC (permalink / raw)
To: help-gnu-emacs
Hi there,
The Catalan language has a ligature consisting of one
"l" character, followed by a middle dot ("·"), followed
by another "l". See here for more details:
http://en.wikipedia.org/wiki/L·l#Catalan
Is there a way to make emacs aware of this, so that it
doesn't treat a word containing "l·l" as two separate
words?
Thanks.
PS. Please CC me, if you reply to this.
--
Ernest
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: changing word boundaries
2009-10-18 16:27 Ernest Adrogué
@ 2009-10-18 19:24 ` Peter Dyballa
2009-10-18 21:19 ` Ernest Adrogué
2009-10-18 21:08 ` Andreas Politz
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Peter Dyballa @ 2009-10-18 19:24 UTC (permalink / raw)
To: Ernest Adrogué; +Cc: help-gnu-emacs
On 18.10.2009 at 18:27, Ernest Adrogué wrote:
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
U+0140. The problem is that · only between two l becomes a word
constituent and in so many other cases it's a multiplication sign, a
comma, a name separator, some kind of bullet sign...
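If converting existing text were acceptable, the substitution could be sketched roughly as below. This is only an illustration, not something from the thread: the command name `my-catalan-use-l-with-middle-dot' is made up, and it assumes the buffer is multibyte so that U+013F and U+0140 can be inserted.

```elisp
;; Hypothetical helper (name invented for illustration): rewrite
;; "l·l" as "ŀl" and "L·L" as "ĿL" in the active region.
(defun my-catalan-use-l-with-middle-dot (beg end)
  "Replace l + MIDDLE DOT + l with U+0140/U+013F + l between BEG and END."
  (interactive "r")
  (save-excursion
    (let ((case-fold-search nil)          ; keep l and L distinct
          (end-marker (copy-marker end))) ; region shrinks as we replace
      (goto-char beg)
      (while (re-search-forward "\\([lL]\\)\u00B7\\([lL]\\)" end-marker t)
        (replace-match
         (concat (if (string= (match-string 1) "l") "\u0140" "\u013F")
                 (match-string 2))
         t t))                            ; fixed case, literal replacement
      (set-marker end-marker nil))))
```

Select a region and run M-x my-catalan-use-l-with-middle-dot. Text converted this way no longer contains U+00B7, so tools that expect the middle dot will see a different spelling.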
--
Greetings
Pete
The human animal differs from the lesser primates in his passion for
lists of "Ten Best."
– H. Allen Smith
* Re: changing word boundaries
2009-10-18 16:27 Ernest Adrogué
2009-10-18 19:24 ` Peter Dyballa
@ 2009-10-18 21:08 ` Andreas Politz
2009-10-20 0:06 ` Ernest Adrogué
[not found] ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
2009-10-18 21:09 ` Andreas Politz
[not found] ` <mailman.9065.1255893858.2239.help-gnu-emacs@gnu.org>
3 siblings, 2 replies; 11+ messages in thread
From: Andreas Politz @ 2009-10-18 21:08 UTC (permalink / raw)
To: help-gnu-emacs; +Cc: Ernest Adrogué
Ernest Adrogué <eadrogue@gmx.net> writes:
> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
>
> Thanks.
>
> PS. Please CC me, if you reply to this.
You could use dynamic syntax-tables via font-lock.
(add-hook 'text-mode-hook
          (lambda nil
            (set (make-variable-buffer-local
                  'parse-sexp-lookup-properties) t)
            ;; get font-lock started
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            (add-to-list
             (make-variable-buffer-local
              'font-lock-syntactic-keywords)
             ;; let ! between 2*a have word syntax
             '("a\\(!\\)a" 1 "w"))))
Replace `a' and `!' with your characters and it'll work,
hopefully.
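For readers on later Emacs versions: `font-lock-syntactic-keywords' was declared obsolete in Emacs 24.1 in favour of `syntax-propertize-function'. A rough equivalent of the hook above, with the Catalan characters already filled in, might look like the following. It is an untested sketch; the function name `my-catalan-middle-dot-syntax' is invented, and `setq-local' assumes Emacs 24.3 or newer.

```elisp
;; Hypothetical setup function (name invented for illustration).
(defun my-catalan-middle-dot-syntax ()
  "Give MIDDLE DOT word syntax only when it sits between two l's."
  ;; Make word motion and friends honour `syntax-table' text properties.
  (setq-local parse-sexp-lookup-properties t)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ;; Group 1 is the dot; "w" is the word-constituent class.
               ("[lL]\\(\u00B7\\)[lL]" (1 "w")))))

(add-hook 'text-mode-hook #'my-catalan-middle-dot-syntax)
```

As with the original, a middle dot anywhere else keeps its normal syntax, which is the point of doing it with text properties rather than in the syntax table.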
-ap
* Re: changing word boundaries
2009-10-18 19:24 ` Peter Dyballa
@ 2009-10-18 21:19 ` Ernest Adrogué
0 siblings, 0 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-18 21:19 UTC (permalink / raw)
To: Peter Dyballa; +Cc: help-gnu-emacs
Hallo,
18/10/09 @ 21:24 (+0200), thus spake Peter Dyballa:
> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...
Seems the way to go, yes. Unfortunately, everybody still
uses the middle dot, for example, spell-checkers think ŀ is
a misspelling.
Cheers.
--
Ernest
* Re: changing word boundaries
2009-10-18 21:08 ` Andreas Politz
@ 2009-10-20 0:06 ` Ernest Adrogué
[not found] ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 11+ messages in thread
From: Ernest Adrogué @ 2009-10-20 0:06 UTC (permalink / raw)
To: help-gnu-emacs
18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
> You could use dynamic syntax-tables via font-lock.
>
> (add-hook 'text-mode-hook
>           (lambda nil
>             (set (make-variable-buffer-local
>                   'parse-sexp-lookup-properties) t)
>             ;; get font-lock started
>             (unless font-lock-defaults
>               (setq font-lock-defaults '(nil t)))
>             (add-to-list
>              (make-variable-buffer-local
>               'font-lock-syntactic-keywords)
>              ;; let ! between 2*a have word syntax
>              '("a\\(!\\)a" 1 "w"))))
>
>
> Replace `a' and `!' with your characters and it'll work,
> hopefully.
It does what I wanted. :)
Thanks!
Ernest
* Re: changing word boundaries
[not found] <mailman.9059.1255887881.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:09 ` Dave Love
2009-11-08 17:07 ` Ernest Adrogué
0 siblings, 1 reply; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:09 UTC (permalink / raw)
To: Ernest Adrogué; +Cc: help-gnu-emacs
Ernest Adrogué <eadrogue@gmx.net> writes:
> Hi there,
>
> The Catalan language has a ligature consisting in one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
[You're probably not really interested in word boundaries, just word
constituents. For an illustration of the difference, see variable
`word-combining-categories' and what capitalized-words-mode does in
Emacs 23.]
You should define a Catalan language environment to be used in ca_ES
locales. (I'm surprised I didn't do it, as there's a relevant input
method.) It should set the base syntax of · to word, and set a suitable
default input method. The existing one, `catalan-prefix', should
presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
and maybe needs other fixes.
The environment would be something like this (untested), which is
probably better than trying to use categories. [The default Latin-1
character set is overridden in, say, ca_ES.UTF-8.]
(push '("ca" . "Catalan") locale-language-names)

(set-language-info-alist
 "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
             (charset iso-8859-1)
             (coding-system iso-latin-1 iso-latin-9)
             (coding-priority iso-latin-1)
             (input-method . "catalan-prefix")
             (nonascii-translation . iso-8859-1)
             (unibyte-display . iso-latin-1)
             (setup-function
              . (lambda ()
                  (modify-syntax-entry ?· "w" (standard-syntax-table))))
             (exit-function
              . (lambda ()
                  (modify-syntax-entry ?· "_" (standard-syntax-table))))
             ;; Fixme:
             ;; (sample-text . "Spanish (Español) ¡Hola!")
             (documentation . "\
This language environment uses the Latin-1 character set, sets
the default input method to \"catalan-prefix\", and sets the
syntax of `·' to word. It selects the Spanish tutorial, in the
absence of a Catalan translation."))
 '("European"))
You could make a bug report if you have more luck than me with reports
about stuff I worked on.
* Re: changing word boundaries
[not found] ` <mailman.9139.1255997204.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:10 ` Dave Love
0 siblings, 0 replies; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:10 UTC (permalink / raw)
To: Ernest Adrogué; +Cc: help-gnu-emacs
Ernest Adrogué <eadrogue@gmx.net> writes:
> 18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
>> You could use dynamic syntax-tables via font-lock.
>>
>> (add-hook 'text-mode-hook
>>           (lambda nil
>>             (set (make-variable-buffer-local
>>                   'parse-sexp-lookup-properties) t)
>>             ;; get font-lock started
>>             (unless font-lock-defaults
>>               (setq font-lock-defaults '(nil t)))
>>             (add-to-list
>>              (make-variable-buffer-local
>>               'font-lock-syntactic-keywords)
>>              ;; let ! between 2*a have word syntax
>>              '("a\\(!\\)a" 1 "w"))))
>>
>>
>> Replace `a' and `!' with your characters and it'll work,
>> hopefully.
>
> It does what I wanted. :)
Well, it's a pretty odd way to do it. If you really only want to use
the ligature in Text mode -- and not programming language comments, for
instance -- just amend `text-mode-syntax-table'.
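Amending the table could be as small as the sketch below. The second form is only a quick sanity check and should evaluate to t if the dot is now treated as a word constituent.

```elisp
;; Give MIDDLE DOT (U+00B7) word syntax in Text mode and in modes
;; whose syntax tables inherit from `text-mode-syntax-table'.
(modify-syntax-entry ?\u00B7 "w" text-mode-syntax-table)

;; Sanity check: a single forward-word should cross the whole word.
(with-temp-buffer
  (text-mode)
  (insert "col\u00B7lecci\u00F3")   ; "col·lecció"
  (goto-char (point-min))
  (forward-word 1)
  (eobp))
```

Unlike the font-lock approach, this makes every middle dot in such buffers a word constituent, not only one between two l's.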
* Re: changing word boundaries
[not found] ` <mailman.9065.1255893858.2239.help-gnu-emacs@gnu.org>
@ 2009-11-01 19:15 ` Dave Love
0 siblings, 0 replies; 11+ messages in thread
From: Dave Love @ 2009-11-01 19:15 UTC (permalink / raw)
To: help-gnu-emacs
Peter Dyballa <Peter_Dyballa@Web.DE> writes:
> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · only between two l becomes a word
> constituent and in so many other cases it's a multiplication sign, a
> comma, a name separator, some kind of bullet sign...
It may be mis-used, but U+00B7 is MIDDLE DOT (punctuation). BULLET is
U+2022 and the mathematical DOT OPERATOR is U+22C5. It surely doesn't
really matter in this context anyhow. A lot of character syntaxes have
long been wrong in Emacs anyhow.
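To see what a given Emacs actually thinks of these characters, something like the following sketch can be evaluated. It prints the code point, the Unicode name, and the syntax class from the standard syntax table to *Messages*; the result for the syntax class depends on the Emacs version, so none is claimed here.

```elisp
;; Report name and standard syntax class for the characters discussed:
;; U+00B7, U+2022, U+22C5 and U+0140.
(with-syntax-table (standard-syntax-table)
  (dolist (ch '(?\u00B7 ?\u2022 ?\u22C5 ?\u0140))
    (message "U+%04X %s syntax=%c"
             ch
             (get-char-code-property ch 'name)
             (char-syntax ch))))
```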
* Re: changing word boundaries
2009-11-01 19:09 ` changing word boundaries Dave Love
@ 2009-11-08 17:07 ` Ernest Adrogué
2009-11-11 14:57 ` Kevin Rodgers
0 siblings, 1 reply; 11+ messages in thread
From: Ernest Adrogué @ 2009-11-08 17:07 UTC (permalink / raw)
To: Dave Love; +Cc: help-gnu-emacs
1/11/09 @ 19:09 (+0000), thus spake Dave Love:
> Ernest Adrogué <eadrogue@gmx.net> writes:
>
> > Hi there,
> >
> > The Catalan language has a ligature consisting in one
> > "l" character, followed by a middle dot ("·"), followed
> > by another "l". See here for more details:
> > http://en.wikipedia.org/wiki/L·l#Catalan
> >
> > Is there a way to make emacs aware of this, so that it
> > doesn't treat a word containing "l·l" as two separate
> > words?
>
> [You're probably not really interested in word boundaries, just word
> constituents. For an illustration of the difference, see variable
> `word-combining-categories' and what capitalized-words-mode does in
> Emacs 23.]
>
> You should define a Catalan language environment to be used in ca_ES
> locales. (I'm surprised I didn't do it, as there's a relevant input
> method.) It should set the base syntax of · to word, and set a suitable
> default input method. The existing one, `catalan-prefix', should
> presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
> and maybe needs other fixes.
>
> The environment would be something like this (untested), which is
> probably better than trying to use categories. [The default Latin-1
> character set is overridden in, say, ca_ES.UTF-8.]
>
> (push '("ca" . "Catalan") locale-language-names)
>
> (set-language-info-alist
>  "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
>              (charset iso-8859-1)
>              (coding-system iso-latin-1 iso-latin-9)
>              (coding-priority iso-latin-1)
>              (input-method . "catalan-prefix")
>              (nonascii-translation . iso-8859-1)
>              (unibyte-display . iso-latin-1)
>              (setup-function
>               . (lambda ()
>                   (modify-syntax-entry ?· "w" (standard-syntax-table))))
>              (exit-function
>               . (lambda ()
>                   (modify-syntax-entry ?· "_" (standard-syntax-table))))
>              ;; Fixme:
>              ;; (sample-text . "Spanish (Español) ¡Hola!")
>              (documentation . "\
> This language environment uses the Latin-1 character set, sets
> the default input method to \"catalan-prefix\", and sets the
> syntax of `·' to word. It selects the Spanish tutorial, in the
> absence of a Catalan translation."))
>  '("European"))
Thanks a lot. Have you got any idea of where this should be
put in order to be loaded automatically at start-up?
I tried in init.el, and in a file in the "language" directory
in /usr/share/emacs/23.1/lisp/ to no avail.
It says that there's "no match", when I try to set the language
environment to Catalan interactively.
> You could make a bug report if you have more luck than me with reports
> about stuff I worked on.
I will try, once I get it to work :)
Cheers,
Ernest
* Re: changing word boundaries
2009-11-08 17:07 ` Ernest Adrogué
@ 2009-11-11 14:57 ` Kevin Rodgers
0 siblings, 0 replies; 11+ messages in thread
From: Kevin Rodgers @ 2009-11-11 14:57 UTC (permalink / raw)
To: help-gnu-emacs
Ernest Adrogué wrote:
> 1/11/09 @ 19:09 (+0000), thus spake Dave Love:
>> Ernest Adrogué <eadrogue@gmx.net> writes:
>>
>>> Hi there,
>>>
>>> The Catalan language has a ligature consisting in one
>>> "l" character, followed by a middle dot ("·"), followed
>>> by another "l". See here for more details:
>>> http://en.wikipedia.org/wiki/L·l#Catalan
>>>
>>> Is there a way to make emacs aware of this, so that it
>>> doesn't treat a word containing "l·l" as two separate
>>> words?
>> [You're probably not really interested in word boundaries, just word
>> constituents. For an illustration of the difference, see variable
>> `word-combining-categories' and what capitalized-words-mode does in
>> Emacs 23.]
>>
>> You should define a Catalan language environment to be used in ca_ES
>> locales. (I'm surprised I didn't do it, as there's a relevant input
>> method.) It should set the base syntax of · to word, and set a suitable
>> default input method. The existing one, `catalan-prefix', should
>> presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
>> and maybe needs other fixes.
>>
>> The environment would be something like this (untested), which is
>> probably better than trying to use categories. [The default Latin-1
>> character set is overridden in, say, ca_ES.UTF-8.]
>>
>> (push '("ca" . "Catalan") locale-language-names)
>>
>> (set-language-info-alist
>>  "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
>>              (charset iso-8859-1)
>>              (coding-system iso-latin-1 iso-latin-9)
>>              (coding-priority iso-latin-1)
>>              (input-method . "catalan-prefix")
>>              (nonascii-translation . iso-8859-1)
>>              (unibyte-display . iso-latin-1)
>>              (setup-function
>>               . (lambda ()
>>                   (modify-syntax-entry ?· "w" (standard-syntax-table))))
>>              (exit-function
>>               . (lambda ()
>>                   (modify-syntax-entry ?· "_" (standard-syntax-table))))
>>              ;; Fixme:
>>              ;; (sample-text . "Spanish (Español) ¡Hola!")
>>              (documentation . "\
>> This language environment uses the Latin-1 character set, sets
>> the default input method to \"catalan-prefix\", and sets the
>> syntax of `·' to word. It selects the Spanish tutorial, in the
>> absence of a Catalan translation."))
>>  '("European"))
>
> Thanks a lot. Have you got any idea of where this should be
> put in order to be loaded automatically at start-up?
1. C-x C-f ~/.emacs
2. M-x find-library RET default.el
3. M-x find-library RET site-start.el
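Whichever of those files is used, the tail end of it might look like the sketch below. It assumes the (push ...) and (set-language-info-alist "Catalan" ...) forms quoted above come earlier in the same file; the guard and its message are additions for illustration, not part of the original code.

```elisp
;; Assumes the "Catalan" definition quoted above was evaluated earlier
;; in this file.  Select the environment only if it is really defined.
(if (assoc "Catalan" language-info-alist)
    (set-language-environment "Catalan")
  (message "Catalan language environment is not defined; check for load errors"))
```

If the message appears, starting Emacs with --debug-init may show whether the definition itself signalled an error while loading.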
> I tried in init.el, and in a file in the "language" directory
> in /usr/share/emacs/23.1/lisp/ to no avail.
> It says that there's "no match", when I try to set the language
> environment to Catalan interactively.
>
>> You could make a bug report if you have more luck than me with reports
>> about stuff I worked on.
>
> I will try, once I get it to work :)
>
> Cheers,
>
> Ernest
>
>
>
--
Kevin Rodgers
Denver, Colorado, USA