searching for non ascii characters

all messages for Emacs-related lists mirrored at yhetil.org

* searching for non ascii characters
@ 2005-08-02 20:27 Radomir Hejl
  2005-08-02 20:55 ` Peter Dyballa
       [not found] ` <mailman.2370.1123016502.20277.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Radomir Hejl @ 2005-08-02 20:27 UTC


Hello,
When in text mode, I usually use an input method. I am able to find any
character with C-s. After saving the file and reading it back from disk,
non-ASCII characters cannot be found. When I do C-u C-x = on a non-ASCII
character before saving, I see the charset latin-iso8859-2. Doing
C-u C-x = after saving usually shows the charset mule-unicode-0100-24ff
or latin-iso8859-1.

So now I can only search successfully for ASCII characters. What should I
change in Emacs so that searching works?
I already asked in comp.emacs but got no response.

Thanks, Radek.


* Re: searching for non ascii characters
  2005-08-02 20:27 searching for non ascii characters Radomir Hejl
@ 2005-08-02 20:55 ` Peter Dyballa
       [not found] ` <mailman.2370.1123016502.20277.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Dyballa @ 2005-08-02 20:55 UTC
  Cc: help-gnu-emacs


On 02.08.2005 at 22:27, Radomir Hejl wrote:

> Hello,
> When in text mode, I usually use an input method. I am able to find any
> character with C-s. After saving the file and reading it back from disk,
> non-ASCII characters cannot be found. When I do C-u C-x = on a non-ASCII
> character before saving, I see the charset latin-iso8859-2. Doing
> C-u C-x = after saving usually shows the charset mule-unicode-0100-24ff
> or latin-iso8859-1.
>
> So now I can only search successfully for ASCII characters. What should I
> change in Emacs so that searching works?
>
>

Put a line like

	;;; -*- mode: Text; coding: iso-8859-2; -*-

in the file's header. Possibly a

	(prefer-coding-system 'iso-latin-2-unix)

is already OK. The environment variable LC_CTYPE is important: GNU
Emacs sets a few things based on it. In particular,
default-buffer-file-coding-system is derived from it. Then there's
file-coding-system-alist ...
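A minimal .emacs sketch combining the settings mentioned above, assuming
that Latin-2 (ISO 8859-2) is the encoding of all your text files. The
*.txt pattern passed to modify-coding-system-alist is an illustrative
assumption, not something taken from this thread:

```elisp
;; .emacs sketch, assuming Latin-2 (ISO 8859-2) is your usual encoding.
;; Set the language environment first: it resets the coding-system
;; priorities, so the prefer-coding-system call after it is not undone.
(set-language-environment "Latin-2")
(prefer-coding-system 'iso-latin-2-unix)

;; Illustrative assumption: read and write *.txt files as Latin-2 instead
;; of guessing from their contents (adds an entry to
;; file-coding-system-alist).
(modify-coding-system-alist 'file "\\.txt\\'" 'iso-latin-2-unix)
```

Restart Emacs (or evaluate the forms) after changing .emacs so that the
new settings actually take effect.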

--
Greetings

   Pete

Bake Pizza not war!



* Re: searching for non ascii characters
       [not found] ` <mailman.2370.1123016502.20277.help-gnu-emacs@gnu.org>
@ 2005-08-03 13:28   ` rahed
  2005-08-03 14:09     ` Peter Dyballa
       [not found]     ` <mailman.2456.1123078766.20277.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 6+ messages in thread
From: rahed @ 2005-08-03 13:28 UTC


Peter Dyballa <Peter_Dyballa@Web.DE> writes:

>> with C-s. After saving the file and reading it back from disk,
>> non-ASCII characters cannot be found. When I do C-u C-x = on a
>> non-ASCII character before
>
> Put a line like
>
> 	;;; -*- mode: Text; coding: iso-8859-2; -*-
>
> in the file's header. Possibly a

I put the line as the first line of my file. The symptoms are unchanged.

> 	(prefer-coding-system   'iso-latin-2-unix)
>
> is already OK. The environment variable LC_CTYPE is important: GNU
> Emacs sets a few things based on it. In particular,
> default-buffer-file-coding-system is derived from it. Then there's
> file-coding-system-alist ...

I also put (prefer-coding-system 'iso-latin-2-unix) in my .emacs
(previously I had cp1250). The charsets are still the same as if I hadn't
changed anything.

Character listing after writing the file and reading it back from disk:

  character: š (01210241, 331937, 0x510a1)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 33 33
     syntax: word
   category: l:Latin  
buffer code: 0x9C 0xF4 0xA1 0xA1
  file code: B9 (encoded by coding system iso-latin-2-unix)
       font: -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-80-iso10646-1


Thank you, any hints appreciated.
Radek


* Re: searching for non ascii characters
  2005-08-03 13:28   ` rahed
@ 2005-08-03 14:09     ` Peter Dyballa
       [not found]     ` <mailman.2456.1123078766.20277.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Dyballa @ 2005-08-03 14:09 UTC
  Cc: help-gnu-emacs


On 03.08.2005 at 15:28, rahed@cwazy.co.uk wrote:

>   character: š (01210241, 331937, 0x510a1)
>     charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
>  code point: 33 33
>      syntax: word
>    category: l:Latin
> buffer code: 0x9C 0xF4 0xA1 0xA1
>   file code: B9 (encoded by coding system iso-latin-2-unix)
>        font: -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-80-iso10646-1
>

My own test file with ISO 8859-2 encoding has this in GNU Emacs 23:

         character: š (0541, 353, 0x161)
preferred charset: iso-8859-2 (ISO/IEC 8859/2)
        code point: 0xB9
            syntax: w 	which means: word
          category: j:Japanese   l:Latin
       buffer code: 0xC5 0xA1
         file code: 0xB9 (encoded by coding system iso-latin-2-unix)
           display: by this font (glyph code)
    -B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1 (0x161)

and this in GNU Emacs 22 and 21.3:

   character: š (04471, 2361, 0x939, U+0161)
     charset: [latin-iso8859-2]
	     (Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101.)
  code point: [57]
      syntax: w 	which means: word
    category: l:Latin
buffer code: 0x82 0xB9
   file code: 0xB9 (encoded by coding system iso-latin-2-unix)
     display: by this font (glyph code)
    -B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO8859-2 (0xB9)

Both use the right charset and encoding. If you close that file and
open it again, and it has '-*- coding: iso-8859-2; -*-' in its first
line, Emacs should switch to that coding, unless there is a block of
local (file) variables at the file's end that says something different,
or unless something pins it to a specific coding system. Did you restart
Emacs after changing .emacs? Can you check the variable's state (C-h v
on the variable from .emacs, in a freshly launched Emacs)? If it differs
from what you set, then either that statement is not executed, or it
occurs more than once and the value gets reset somewhere after this
line ... What does the end of your file look like?
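To answer the questions above about the state of the relevant variables,
one could evaluate a form like the following with M-: while the problem
file's buffer is current (buffer-file-coding-system is buffer-local, so
evaluating it in *scratch* would show the wrong value). This is only a
diagnostic sketch; the variable names are those of Emacs 21/22, and
default-buffer-file-coding-system was made obsolete in later Emacsen:

```elisp
;; Diagnostic sketch: evaluate with M- : in the buffer visiting the file.
(message "file: %S  default: %S  lang-env: %S  LC_CTYPE: %S  alist: %S"
         buffer-file-coding-system          ; coding used for this buffer
         default-buffer-file-coding-system  ; default for new buffers
         current-language-environment
         (getenv "LC_CTYPE")
         ;; Coding that file-coding-system-alist and friends would pick
         ;; for this file name (nil if nothing matches).  Guarded because
         ;; buffers that visit no file have no buffer-file-name.
         (and buffer-file-name
              (find-operation-coding-system 'insert-file-contents
                                            buffer-file-name)))
```

The result appears in the echo area and in the *Messages* buffer.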

The last thing I can think of is the use of fontsets instead of fonts.
What is your status?

Your file has the correct byte, 0xB9, at the position of LATIN SMALL
LETTER S WITH CARON. So it is presumably still correctly encoded. To see
it as ISO/IEC 8859-2, you can use revert-buffer-with-coding-system
(C-x RET r CODING-SYSTEM RET). Use M-x list-coding-systems to see what
your system has.
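The point that the byte on disk is correct and only its interpretation
differs can be checked directly. A small sketch to evaluate form by form
with C-x C-e in *scratch*, using the standard decode-coding-string and
encode-coding-string functions on the single byte 0xB9 (octal 271):

```elisp
;; The same byte 0xB9 decoded with two different coding systems.
(decode-coding-string "\271" 'iso-latin-2) ; LATIN SMALL LETTER S WITH CARON
(decode-coding-string "\271" 'iso-latin-1) ; SUPERSCRIPT ONE

;; Reverse direction: encoding s with caron for a Latin-2 file
;; gives back the single byte 0xB9 ("\271").
(encode-coding-string (decode-coding-string "\271" 'iso-latin-2)
                      'iso-latin-2)
```

So as long as the file is decoded with a Latin-2 coding system, the
character comes back as s with caron; decoded as Latin-1, the very same
byte shows up as a different character.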

--
Greetings

   Pete

Windows is a bit like Beaujolais nouveau: with every new vintage you know
it will be disgusting, but you take some anyway, out of masochism.



* Re: searching for non ascii characters
       [not found]     ` <mailman.2456.1123078766.20277.help-gnu-emacs@gnu.org>
@ 2005-08-03 14:52       ` rahed
  2005-08-03 15:11         ` Peter Dyballa
  0 siblings, 1 reply; 6+ messages in thread
From: rahed @ 2005-08-03 14:52 UTC


Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> Both use the right charset and encoding. If you close that file and
> open it again, and it has '-*- coding: iso-8859-2; -*-' in its first
> line, Emacs should switch to that coding, unless there is a block of
> local (file) variables at the file's end that says something different,
> or unless something pins it to a specific coding system. Did you restart
> Emacs after changing .emacs? Can you check the variable's state (C-h v
> on the variable from .emacs, in a freshly launched Emacs)? If it differs
> from what you set, then either that statement is not executed, or it
> occurs more than once and the value gets reset somewhere after this
> line ... What does the end of your file look like?

From C-h v, my prefer-coding-system's value is iso-latin-2. My test file
now has only two lines: the relevant header and the non-ASCII characters.

>
> The last thing I can think of is the use of fontsets instead of
> fonts. What is your status?

I'm not sure what you mean by status, but M-x list-fontsets shows:
Fontset: -*-*-*-*-*-*-*-*-*-*-*-*-fontset-default
Fontset: -*-courier new-normal-r-*-*-13-*-*-*-c-*-fontset-standard

> Your file has the correct byte, 0xB9, at the position of LATIN SMALL
> LETTER S WITH CARON. So it is presumably still correctly encoded. To see
> it as ISO/IEC 8859-2, you can use revert-buffer-with-coding-system
> (C-x RET r CODING-SYSTEM RET). Use M-x list-coding-systems to see what
> your system has.

I don't think my Emacs has a revert-buffer-with-coding-system function
(checked with M-x apropos). I can only use revert-buffer, which does not
change the coding system. I use 21.3 on Windows XP.

Radek


* Re: searching for non ascii characters
  2005-08-03 14:52       ` rahed
@ 2005-08-03 15:11         ` Peter Dyballa
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Dyballa @ 2005-08-03 15:11 UTC
  Cc: help-gnu-emacs

On 03.08.2005 at 16:52, rahed@cwazy.co.uk wrote:

> I don't think my Emacs has a revert-buffer-with-coding-system function
> (checked with M-x apropos). I can only use revert-buffer, which does not
> change the coding system. I use 21.3 on Windows XP.
>

Could be; my 21.3.50 came from CVS ... probably 21.4 isn't any better in
this respect. I don't use Windows, but if Unices have ready-built Emacsen
from CVS, there should be some for Windows too.
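Until then, a stand-in for Emacs 21.3 can be sketched by binding
coding-system-for-read around revert-buffer, since that variable overrides
the coding system Emacs would otherwise choose when reading a file. The
function name my-revert-buffer-with-coding-system is hypothetical, and
this is a sketch, not the Emacs 22 implementation:

```elisp
;; Hypothetical stand-in for revert-buffer-with-coding-system
;; (which 21.3 lacks).  Put it in .emacs and call it with M-x.
(defun my-revert-buffer-with-coding-system (coding-system)
  "Re-read the current buffer's file, decoding it with CODING-SYSTEM."
  (interactive "zCoding system for re-reading the file: ")
  (check-coding-system coding-system)   ; signal an error for bad names
  ;; coding-system-for-read takes precedence over coding cookies and
  ;; file-coding-system-alist while the file is being read.
  (let ((coding-system-for-read coding-system))
    ;; IGNORE-AUTO = t: read the visited file, not its auto-save file.
    ;; NOCONFIRM = nil: ask for confirmation before reverting.
    (revert-buffer t nil)))
```

Usage: M-x my-revert-buffer-with-coding-system RET iso-latin-2-unix RET.
The generic prefix C-x RET c (universal-coding-system-argument), which
21.3 does have, should achieve the same without new code:
C-x RET c iso-latin-2-unix RET M-x revert-buffer RET.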

--
Greetings

   Pete

Mac OS X is like a wigwam: no fences, no gates, but an apache inside.


end of thread [2005-08-03 15:11 UTC]

Thread overview: 6+ messages
2005-08-02 20:27 searching for non ascii characters Radomir Hejl
2005-08-02 20:55 ` Peter Dyballa
     [not found] ` <mailman.2370.1123016502.20277.help-gnu-emacs@gnu.org>
2005-08-03 13:28   ` rahed
2005-08-03 14:09     ` Peter Dyballa
     [not found]     ` <mailman.2456.1123078766.20277.help-gnu-emacs@gnu.org>
2005-08-03 14:52       ` rahed
2005-08-03 15:11         ` Peter Dyballa

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
