* Re: changing word boundaries
From: Dave Love @ 2009-11-01 19:09 UTC
To: Ernest Adrogué; +Cc: help-gnu-emacs

Ernest Adrogué <eadrogue@gmx.net> writes:

> Hi there,
>
> The Catalan language has a ligature consisting of one
> "l" character, followed by a middle dot ("·"), followed
> by another "l". See here for more details:
> http://en.wikipedia.org/wiki/L·l#Catalan
>
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?

[You're probably not really interested in word boundaries, just word
constituents. For an illustration of the difference, see variable
`word-combining-categories' and what capitalized-words-mode does in
Emacs 23.]

You should define a Catalan language environment to be used in ca_ES
locales. (I'm surprised I didn't do it, as there's a relevant input
method.) It should set the base syntax of · to word, and set a suitable
default input method. The existing one, `catalan-prefix', should
presumably bind `~.' to `·', as in latin-prefix; it doesn't currently,
and maybe needs other fixes.

The environment would be something like this (untested), which is
probably better than trying to use categories. [The default Latin-1
character set is overridden in, say, ca_ES.UTF-8.]

(push '("ca" . "Catalan") locale-language-names)

(set-language-info-alist
 "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
             (charset iso-8859-1)
             (coding-system iso-latin-1 iso-latin-9)
             (coding-priority iso-latin-1)
             (input-method . "catalan-prefix")
             (nonascii-translation . iso-8859-1)
             (unibyte-display . iso-latin-1)
             (setup-function
              . (lambda ()
                  (modify-syntax-entry ?· "w" (standard-syntax-table))))
             (exit-function
              . (lambda ()
                  (modify-syntax-entry ?· "_" (standard-syntax-table))))
             ;; Fixme:
             ;; (sample-text . "Spanish (Español) ¡Hola!")
             (documentation . "\
This language environment uses the Latin-1 character set, sets
the default input method to \"catalan-prefix\", and sets the
syntax of `·' to word. It selects the Spanish tutorial, in the
absence of a Catalan translation."))
 '("European"))

You could make a bug report if you have more luck than me with reports
about stuff I worked on.

* Re: changing word boundaries
From: Ernest Adrogué @ 2009-11-08 17:07 UTC
To: Dave Love; +Cc: help-gnu-emacs

1/11/09 @ 19:09 (+0000), thus spake Dave Love:
> Ernest Adrogué <eadrogue@gmx.net> writes:
> [...]
> You should define a Catalan language environment to be used in ca_ES
> locales. [...]
>
> The environment would be something like this (untested), which is
> probably better than trying to use categories. [The default Latin-1
> character set is overridden in, say, ca_ES.UTF-8.]
>
> (push '("ca" . "Catalan") locale-language-names)
>
> (set-language-info-alist
>  "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
> [...]
>  '("European"))

Thanks a lot. Have you got any idea of where this should be
put in order to be loaded automatically at start-up?

I tried in init.el, and in a file in the "language" directory
in /usr/share/emacs/23.1/lisp/ to no avail.
It says that there's "no match" when I try to set the language
environment to Catalan interactively.

> You could make a bug report if you have more luck than me with reports
> about stuff I worked on.

I will try, once I get it to work :)

Cheers,
Ernest

* Re: changing word boundaries
From: Kevin Rodgers @ 2009-11-11 14:57 UTC
To: help-gnu-emacs

Ernest Adrogué wrote:
> 1/11/09 @ 19:09 (+0000), thus spake Dave Love:
>> [...]
>> (push '("ca" . "Catalan") locale-language-names)
>>
>> (set-language-info-alist
>>  "Catalan" '((tutorial . "TUTORIAL.es") ; maybe...
>> [...]
>
> Thanks a lot. Have you got any idea of where this should be
> put in order to be loaded automatically at start-up?

1. C-x C-f ~/.emacs
2. M-x find-library RET default.el
3. M-x find-library RET site-start.el

> I tried in init.el, and in a file in the "language" directory
> in /usr/share/emacs/23.1/lisp/ to no avail.
> It says that there's "no match" when I try to set the language
> environment to Catalan interactively.
> [...]

-- 
Kevin Rodgers
Denver, Colorado, USA

* changing word boundaries
From: Ernest Adrogué @ 2009-10-18 16:27 UTC
To: help-gnu-emacs

Hi there,

The Catalan language has a ligature consisting of one
"l" character, followed by a middle dot ("·"), followed
by another "l". See here for more details:
http://en.wikipedia.org/wiki/L·l#Catalan

Is there a way to make emacs aware of this, so that it
doesn't treat a word containing "l·l" as two separate
words?

Thanks.

PS. Please CC me if you reply to this.

-- 
Ernest

* Re: changing word boundaries
From: Peter Dyballa @ 2009-10-18 19:24 UTC
To: Ernest Adrogué; +Cc: help-gnu-emacs

On 18.10.2009 at 18:27, Ernest Adrogué wrote:

> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?

How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
U+0140. The problem is that · is a word constituent only between two
l's; in so many other cases it's a multiplication sign, a comma, a
name separator, some kind of bullet sign...

-- 
Greetings

  Pete

The human animal differs from the lesser primates in his passion for
lists of "Ten Best."
                                  – H. Allen Smith

* Re: changing word boundaries
From: Ernest Adrogué @ 2009-10-18 21:19 UTC
To: Peter Dyballa; +Cc: help-gnu-emacs

Hello,

18/10/09 @ 21:24 (+0200), thus spake Peter Dyballa:
> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · is a word constituent only between two
> l's; in so many other cases it's a multiplication sign, a comma, a
> name separator, some kind of bullet sign...

Seems the way to go, yes. Unfortunately, everybody still uses the
middle dot; for example, spell-checkers think ŀ is a misspelling.

Cheers.

-- 
Ernest

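Adopting Peter's suggestion means converting existing text between the two spellings, and Ernest's spell-checker problem suggests converting back again. The following is a minimal sketch, assuming the two l's always share the same case (l·l or L·L); both function names are hypothetical and not part of Emacs.

```elisp
;; Hypothetical helpers, not part of Emacs.  Characters are written as
;; \u escapes so the file's coding system does not matter:
;;   U+00B7 MIDDLE DOT
;;   U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
;;   U+013F LATIN CAPITAL LETTER L WITH MIDDLE DOT

(defun my-catalan-to-precomposed (string)
  "Return STRING with l + MIDDLE DOT + l spelled as U+0140 followed by l.
The upper-case form L + MIDDLE DOT + L becomes U+013F followed by L."
  (let ((case-fold-search nil))         ; keep l and L distinct
    (replace-regexp-in-string
     "L\u00B7L" "\u013FL"
     (replace-regexp-in-string "l\u00B7l" "\u0140l" string t t)
     t t)))

(defun my-catalan-from-precomposed (string)
  "Return STRING with U+0140 and U+013F spelled as l or L plus MIDDLE DOT."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "\u013F" "L\u00B7"
     (replace-regexp-in-string "\u0140" "l\u00B7" string t t)
     t t)))
```

For example, `(my-catalan-to-precomposed "col·lecció")` returns "coŀlecció", and `my-catalan-from-precomposed` turns it back into "col·lecció". A middle dot that is not between two l's, as in "5·3", is left alone.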
* Re: changing word boundaries
From: Andreas Politz @ 2009-10-18 21:08 UTC
To: help-gnu-emacs; +Cc: Ernest Adrogué

Ernest Adrogué <eadrogue@gmx.net> writes:

> Hi there,
> [...]
> Is there a way to make emacs aware of this, so that it
> doesn't treat a word containing "l·l" as two separate
> words?
> [...]

You could use dynamic syntax-tables via font-lock.

(add-hook 'text-mode-hook
          (lambda nil
            (set (make-variable-buffer-local
                  'parse-sexp-lookup-properties) t)
            ;; get font-lock started
            (unless font-lock-defaults
              (setq font-lock-defaults '(nil t)))
            (add-to-list
             (make-variable-buffer-local
              'font-lock-syntactic-keywords)
             ;; let ! between 2*a have word syntax
             '("a\\(!\\)a" 1 "w"))))

Replace `a' and `!' with your characters and it'll work,
hopefully.

-ap

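The `font-lock-syntactic-keywords' variable used in that approach has been obsolete since Emacs 24.1 in favor of `syntax-propertize-function'. A minimal sketch of the same idea for newer Emacsen follows; it uses `setq-local', so it assumes Emacs 24.3 or later, and `my-catalan-l-dot-l-setup' is a hypothetical name. As in Andreas's version, only a middle dot sitting between two l's gets word syntax (via `syntax-table' text properties), so the dot in "5·3" keeps its normal syntax.

```elisp
;; Sketch, assuming Emacs 24.3+.  `syntax-propertize-rules' builds a
;; function that puts `syntax-table' text properties on each match;
;; `syntax-propertize' also sets `parse-sexp-lookup-properties' in the
;; buffer so that motion commands honour those properties.
(defun my-catalan-l-dot-l-setup ()
  "Give MIDDLE DOT (U+00B7) word syntax between two l's in this buffer."
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ;; Group 1 is the dot; the string "w" (word constituent)
               ;; is converted with `string-to-syntax'.
               ("[lL]\\(\u00B7\\)[lL]" (1 "w")))))

(add-hook 'text-mode-hook #'my-catalan-l-dot-l-setup)
```

Code such as font-lock and `syntax-ppss' calls `syntax-propertize' on demand; in a buffer where nothing has done so yet, `(syntax-propertize (point-max))' applies the properties explicitly.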
* Re: changing word boundaries
From: Ernest Adrogué @ 2009-10-20 0:06 UTC
To: help-gnu-emacs

18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
> You could use dynamic syntax-tables via font-lock.
> [...]
> Replace `a' and `!' with your characters and it'll work,
> hopefully.

It does what I wanted. :)

Thanks!

Ernest

* Re: changing word boundaries
From: Dave Love @ 2009-11-01 19:10 UTC
To: Ernest Adrogué; +Cc: help-gnu-emacs

Ernest Adrogué <eadrogue@gmx.net> writes:

> 18/10/09 @ 23:08 (+0200), thus spake Andreas Politz:
>> You could use dynamic syntax-tables via font-lock.
>> [...]
>> Replace `a' and `!' with your characters and it'll work,
>> hopefully.
>
> It does what I wanted. :)

Well, it's a pretty odd way to do it. If you really only want to use
the ligature in Text mode -- and not programming language comments,
for instance -- just amend `text-mode-syntax-table'.

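Dave's simpler alternative amounts to a single `modify-syntax-entry' call. A minimal sketch for an init file follows, assuming text-mode.el is already loaded (it is preloaded in standard builds). The trade-off, as Peter pointed out, is that every middle dot in Text-mode buffers becomes a word constituent, not only the one in l·l. Modes derived from Text mode normally see the change too, because their syntax tables inherit from `text-mode-syntax-table', unless they set the entry themselves.

```elisp
;; Sketch for an init file.  `?\u00B7' is MIDDLE DOT, written as an
;; escape so the init file's coding system does not matter.  Only
;; `text-mode-syntax-table' is changed; the standard syntax table
;; (used by Fundamental mode and many programming modes) is not.
(modify-syntax-entry ?\u00B7 "w" text-mode-syntax-table)
```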
* Re: changing word boundaries
From: Dave Love @ 2009-11-01 19:15 UTC
To: help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> How about using ŀ? It's LATIN SMALL LETTER L WITH MIDDLE DOT at
> U+0140. The problem is that · is a word constituent only between two
> l's; in so many other cases it's a multiplication sign, a comma, a
> name separator, some kind of bullet sign...

It may be misused, but U+00B7 is MIDDLE DOT (punctuation). BULLET is
U+2022 and the mathematical DOT OPERATOR is U+22C5. It surely doesn't
really matter in this context anyhow. A lot of character syntaxes have
long been wrong in Emacs.

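These code points can be checked from inside Emacs: `C-u C-x =' on a character describes it, and the same Unicode data is available to Lisp through `get-char-code-property'. A small sketch follows; the function name `my-describe-dots' is hypothetical.

```elisp
;; Return the code point, Unicode name and general category of each of
;; the three similar-looking dots discussed in this thread.
(defun my-describe-dots ()
  (mapcar (lambda (c)
            (list (format "U+%04X" c)
                  (get-char-code-property c 'name)
                  (get-char-code-property c 'general-category)))
          '(#xB7 #x2022 #x22C5)))

(my-describe-dots)
;; => (("U+00B7" "MIDDLE DOT" Po)
;;     ("U+2022" "BULLET" Po)
;;     ("U+22C5" "DOT OPERATOR" Sm))
```

The general category Po is "Punctuation, other" and Sm is "Symbol, math", which matches the classification given above.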