* Emacspeak and UTF-8 -- possible?
@ 2007-08-03 19:24 cmr.Pent
  2007-08-07 10:41 ` Tim X
  0 siblings, 1 reply; 19+ messages in thread
From: cmr.Pent @ 2007-08-03 19:24 UTC
  To: help-gnu-emacs

I'm having the following problem: when I run Emacspeak, I cannot input
or load Cyrillic characters. My Emacs version is 21.4, my language
environment is UTF-8, and in plain Emacs Cyrillic works just fine.
When I invoke Emacs using the "emacspeak" command, all non-Latin
characters turn into umlauts.

The funny thing is that even on the gnu.org site (I use w3m) some Unicode
symbols (like quotation marks) turn into umlauts. I suspect
that the problem is Emacspeak's inability to work with multibyte
characters, but I'm not sure because of my poor understanding of Emacs's
character coding internals.

I'm really only an occasional user of Emacspeak, but nevertheless I'd
be glad to know if there is a way to overcome the problem. For now it
would be OK if Cyrillic characters were displayed properly even if they
are not read aloud at all or are mistakenly read as umlauts.

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-03 19:24 Emacspeak and UTF-8 -- possible? cmr.Pent
@ 2007-08-07 10:41 ` Tim X
  2007-08-08  9:09   ` cmr.Pent
  2007-08-08 14:56   ` Stefan Monnier
  0 siblings, 2 replies; 19+ messages in thread
From: Tim X @ 2007-08-07 10:41 UTC
  To: help-gnu-emacs

"cmr.Pent@gmail.com" <cmr.Pent@gmail.com> writes:

> I'm having the following problem: when I run emacspeak, I cannot input
> or load cyrillic characters. My emacs version is 21.4, my language
> environment is UTF-8, and in plain emacs cyrillics work just fine.
> When I invoke emacs using "emacspeak" command, all non-latin
> characters turn into umlauts.
>
> The funny thing is that even on gnu.org site (I use w3m) some unicode
> symbols (like quotation marks) turn into umlauts. I have a suspicion
> that the problem is in emacspeaks' inability to work with multibyte
> characters but I'm not sure because of my poor understanding of emacs'
> character coding internals.
>
> I'm really only a occasional user of emacspeak, but nevertheless I'd
> be glad to know if there is a way to overcome the problem. For now it
> would be Ok if cyrillics are displayed properly even if they are not
> read aloud at all or are mistakenly read as umlauts.
>

Emacspeak AFAIK doesn't support multibyte characters. The problem is that many
speech synthesisers, particularly older hardware-based ones like the DECtalk,
don't understand UTF-8 encoded text. If you send them a multibyte character,
they either lock up, speak garbage or do something else unexpected.

There have been some branches that have forked off the original Emacspeak
code, but I'm not sure what their status is or where you could find out more
(Google?). There is also an Emacspeak mailing list.

One thing you could try, assuming your TTS synth handles multibyte characters,
is to ensure the variable emacspeak-unibyte is not set to t.

e.g. 

,----[ C-h v emacspeak-unibyte RET ]
| emacspeak-unibyte is a variable defined in `emacspeak-setup.el'.
| Its value is t
| 
| 
| Documentation:
| Set this to nil before starting  emacspeak
| if you are running in a multibyte enabled environment.
| 
| [back]
`----

This may or may not help, just a guess really.

Tim

-- 
tcross (at) rapttech dot com dot au

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-07 10:41 ` Tim X
@ 2007-08-08  9:09   ` cmr.Pent
  2007-08-08 14:56   ` Stefan Monnier
  1 sibling, 0 replies; 19+ messages in thread
From: cmr.Pent @ 2007-08-08  9:09 UTC
  To: help-gnu-emacs

Thanks, I was able to solve the problem. The solution was to hack the Debian
Emacspeak startup shell script and emacspeak-setup.el. After that,
Latin characters are read very nicely (I use the eflite engine), and
Cyrillic characters are displayed (and can be input) just fine as well.

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-07 10:41 ` Tim X
  2007-08-08  9:09   ` cmr.Pent
@ 2007-08-08 14:56   ` Stefan Monnier
  2007-08-08 23:44     ` Robert D. Crawford
  2007-08-10  5:27     ` Emacspeak and UTF-8 -- possible? Tim X
  1 sibling, 2 replies; 19+ messages in thread
From: Stefan Monnier @ 2007-08-08 14:56 UTC
  To: help-gnu-emacs

> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
> that many speech synthesises, particularly older hardware based ones like
> the dectalk, don't understand UTF-8 character sets.  If you send them
> a multibyte character, they either lock up, speak garbage or do something
> else unexpected.

That's not a good reason to prevent display of any other char.


        Stefan

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-08 14:56   ` Stefan Monnier
@ 2007-08-08 23:44     ` Robert D. Crawford
  2007-08-09 17:44       ` Stefan Monnier
  2007-08-10  5:39       ` Tim X
  2007-08-10  5:27     ` Emacspeak and UTF-8 -- possible? Tim X
  1 sibling, 2 replies; 19+ messages in thread
From: Robert D. Crawford @ 2007-08-08 23:44 UTC
  To: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>> that many speech synthesises, particularly older hardware based ones like
>> the dectalk, don't understand UTF-8 character sets.  If you send them
>> a multibyte character, they either lock up, speak garbage or do something
>> else unexpected.
>
> That's not a good reason to prevent display of any other char.

Does any other char mean UTF-8?  If this is the case, wouldn't you agree
that it is better to not have UTF-8 support than to not be able to use
the computer because your speech synth locks up unexpectedly and often?

rdc
-- 
Robert D. Crawford                                      rdc1x@comcast.net

You will attract cultured and artistic people to your home.

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-08 23:44     ` Robert D. Crawford
@ 2007-08-09 17:44       ` Stefan Monnier
  2007-08-10  7:40         ` Tim X
  2007-08-10  5:39       ` Tim X
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2007-08-09 17:44 UTC
  To: help-gnu-emacs

>>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>>> that many speech synthesises, particularly older hardware based ones like
>>> the dectalk, don't understand UTF-8 character sets.  If you send them
>>> a multibyte character, they either lock up, speak garbage or do something
>>> else unexpected.
>> 
>> That's not a good reason to prevent display of any other char.

> Does any other char mean UTF-8?  If this is the case, wouldn't you agree
> that it is better to not have UTF-8 support than to not be able to use
> the computer because your speech synth locks up unexpectedly and often?

No, I'm saying that the place where they placed the check to filter out
unwanted chars is wrong.  They should have Emacs accept any random encoding
as always, and then encode/filter the text they send to the
underlying process.

Emacs constantly encodes and decodes text between different encodings.  E.g. if
you visit a latin-1 file, it gets decoded into Emacs's internal
representation, and when you save it, it gets re-encoded into latin-1
(unless you've decided to change the file's encoding in which case it may
be reencoded in any other coding-system).

So if the speech process only understands latin-1, they should simply set
the coding-system used for that process accordingly and everything should
just work.  They may encounter difficulties finding the proper coding-system
that handles unencodable chars (e.g. cyrillic chars with
a latin-1 coding-system) in the way they want (e.g. drop the char
altogether or replace it with a "?" or some other special char), but people
on emacs-devel@gnu.org will be happy to help resolve those.
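
As a rough illustration of encoding at the process boundary, here is a
minimal sketch in Python rather than Emacs Lisp. It is not Emacspeak code:
the function name encode_for_synth and the sample text are made up for this
example, and latin-1 is only assumed to be the synthesiser's encoding. The
text kept by the editor stays Unicode; only the bytes sent to the speech
process are narrowed, with unencodable characters replaced by "?".

```python
def encode_for_synth(text: str, encoding: str = "latin-1") -> bytes:
    """Encode text for a speech process that only understands `encoding`.

    Characters the target encoding cannot represent are replaced with "?"
    instead of raising an error, so the process never receives bytes it
    cannot handle.
    """
    return text.encode(encoding, errors="replace")


# The editor-side text stays fully Unicode; only the copy that is sent
# to the (hypothetical) speech process is narrowed.
buffer_text = "café привет"
wire_bytes = encode_for_synth(buffer_text)
print(buffer_text)  # unchanged, still contains the Cyrillic word
print(wire_bytes)   # latin-1 bytes, Cyrillic letters replaced by "?"
```

Passing errors="ignore" instead of errors="replace" would drop the
unencodable characters altogether, which is the other behaviour mentioned
above.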


        Stefan

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-08 14:56   ` Stefan Monnier
  2007-08-08 23:44     ` Robert D. Crawford
@ 2007-08-10  5:27     ` Tim X
  2007-08-11  4:02       ` Stefan Monnier
  1 sibling, 1 reply; 19+ messages in thread
From: Tim X @ 2007-08-10  5:27 UTC
  To: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>> that many speech synthesises, particularly older hardware based ones like
>> the dectalk, don't understand UTF-8 character sets.  If you send them
>> a multibyte character, they either lock up, speak garbage or do something
>> else unexpected.
>
> That's not a good reason to prevent display of any other char.
>
>
>         Stefan

Nobody said it was a good reason, it's just the reason.
I'd suggest you need to have familiarity with how it works and its
internals and history before you can make a judgement.

Given your contributions and long involvement with Emacs,
I'm surprised to see you make such an uninformed
statement which contributes so little.


Tim

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-08 23:44     ` Robert D. Crawford
  2007-08-09 17:44       ` Stefan Monnier
@ 2007-08-10  5:39       ` Tim X
  2007-08-10  9:23         ` grep-find question (Is it a bug of GunWin32 version of "grep") brianjiang
  1 sibling, 1 reply; 19+ messages in thread
From: Tim X @ 2007-08-10  5:39 UTC
  To: help-gnu-emacs

"Robert D. Crawford" <rdc1x@comcast.net> writes:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>>> that many speech synthesises, particularly older hardware based ones like
>>> the dectalk, don't understand UTF-8 character sets.  If you send them
>>> a multibyte character, they either lock up, speak garbage or do something
>>> else unexpected.
>>
>> That's not a good reason to prevent display of any other char.
>
> Does any other char mean UTF-8?  If this is the case, wouldn't you agree
> that it is better to not have UTF-8 support than to not be able to use
> the computer because your speech synth locks up unexpectedly and often?
>

The problem is a lack of familiarity with how emacspeak achieves what 
it does so simply. As you would know, making emacspeak support other 
encodings would involve a complete re-write of its internals - in fact,
a whole new translation layer would be required in order to enable full
support of emacs' supported character encodings *and* support both 
TTS engines that do and do not support encodings other than ASCII. 
I suspect that in order to keep it as flexible as it is now with
respect to how little is required to support various add-on
packages, a totally new architecture would be required. This is
also likely to introduce additional processing overhead which could
easily degrade the real-time responsiveness to the point where the
system is not usable, particularly when using some of the free TTS
engines, which already are only just acceptable with respect to
responsiveness.

Stefan makes some valuable contributions, especially to this group, but
in this instance, his opinion is not based on anything of substance 
and contributes nothing of relevance. 

Tim

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-09 17:44       ` Stefan Monnier
@ 2007-08-10  7:40         ` Tim X
  0 siblings, 0 replies; 19+ messages in thread
From: Tim X @ 2007-08-10  7:40 UTC
  To: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>>>> that many speech synthesises, particularly older hardware based ones like
>>>> the dectalk, don't understand UTF-8 character sets.  If you send them
>>>> a multibyte character, they either lock up, speak garbage or do something
>>>> else unexpected.
>>> 
>>> That's not a good reason to prevent display of any other char.
>
>> Does any other char mean UTF-8?  If this is the case, wouldn't you agree
>> that it is better to not have UTF-8 support than to not be able to use
>> the computer because your speech synth locks up unexpectedly and often?
>
> No, I'm saying that the place where they placed the check to filter out
> unwanted chars is wrong.  They should have Emacs accept any random encoding
> as always, and then encode/filter the text they send to the
> underlying process.
>

Yes, the way emacspeak handles it is wrong given emacs' current
internals and how it deals with the issue. But you have totally
overlooked the fact that emacspeak was designed and developed 
before emacs had this capability. If you were implementing 
emacspeak today, then this is likely how you would do it. 

> Emacs constantly encodes decodes text between different encodings.  E.g. If
> you visit a latin-1 file, it gets decoded into Emacs's internal
> representation, and when you save it, it gets re-encoded into latin-1
> (unless you've decided to change the file's encoding in which case it may
> be reencoded in any other coding-system).

Yes, and how many years has it taken to get this working well 
and reliably? This is not a criticism, just pointing out that 
this wasn't a trivial change. Likewise, it is not a trivial 
change with emacspeak, which is the largest and possibly most 
complex of all the add-on emacs packages I've seen.
>
> So if the speech process only understands latin-1, they should simply set
> the coding-system used for that process accordingly and everything should
> just work.  They may encounter difficulties finding the proper coding-system
> that handles unencodable chars (e.g. cyrillic chars with
> a latin-1 coding-system) in the way they want (e.g. drop the char
> altogether or replace it with a "?" or some other special char), but people
> on emacs-devel@gnu.org will be happy to help resolve those.

Again, I generally agree and this is along the lines of previous
discussions on the topic amongst emacspeak users. However, your 
description makes it sound like all that needs to be done is a couple
of minor changes. This is not the case. The design of emacspeak 
was done back when essentially all you had to worry about was ASCII
characters and at the time, you essentially had one decent quality
hardware synthesiser which could only handle the basic ASCII character set. I
don't think it even handled extended ASCII.

There were decisions made, which in hindsight were probably 
incorrect. For example, emacspeak does a fair amount of processing
of characters prior to sending them to the speech device - in fact,
it sends them to an intermediate layer written in TCL which does 
further processing. Originally, a lot of the internal processing within 
the elisp part of emacspeak was not modular or done in a single 
location that would make it easy to change. There are also a number
of other issues about how to process these characters, what to translate
and what to translate to, determining when to translate and when not
to and how to control all of this to get the best results while keeping
the whole system as responsive as possible. 

However, the main issue I have with your analysis is that you 
obviously don't understand what emacspeak does and how it 
works. It is not simply a screen reader that just sends 
the text as it appears on the screen to a TTS engine.
Emacspeak adds a lot of contextual information, which is one of its 
strengths and what makes it so much more than a 'dumb' screen reader.
Other systems, like speechd, handle character encodings better in this
respect, but speechd is simpler in design and has the advantage of being designed
after emacs had itself incorporated support for various encodings.
It also has the advantage of using a speech interface that has also been
designed with support for multiple character encodings.

As emacs' own handling of character encodings has matured, work
has been going on to refactor emacspeak code to make the necessary
changes to support various encodings easier. Over the last couple 
of years, TTS synthesisers have improved and a growing number 
now support UTF-8 and other encodings. The TCL interface layer has
now got support for handling different character encodings etc. So, in
many ways, some of the required basics are in place to make the 
necessary changes, but it is still a major undertaking. So major 
in fact that everyone who has previously started looking at this has
generally decided it was too much work for just one person.

You also need to realise that for the majority of emacspeak users,
what is displayed on the screen is irrelevant. In fact, I know of
a number of emacspeak users who don't even use a screen at all. 
Remember that emacspeak is a specialised add-on with a targeted
audience, not a general purpose emacs package. The fact it limits
the range of characters that can be displayed (and even entered) to
what can be turned into speech by the TTS engine was not an issue 
for most users and has only relatively recently become an issue 
because of the evolution of both emacs and available TTS engines.

Until demand is sufficient that enough people are prepared to actually
do the work needed to change how emacspeak works, nothing will change.
Making sweeping generalised statements about how it is wrong achieves
nothing, contributes even less, and totally fails to recognise
the hard and very innovative work of the author in not only providing
the first really functional interface on Linux for blind and VI users,
but also in demonstrating radically different approaches to computer
interfaces for those requiring assistive technology.

Tim

* grep-find question (Is it a bug of  GunWin32 version of "grep")
  2007-08-10  5:39       ` Tim X
@ 2007-08-10  9:23         ` brianjiang
  2007-08-10 14:07           ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: brianjiang @ 2007-08-10  9:23 UTC
  To: help-gnu-emacs

I tried to use grep-find today (on Windows XP), so I downgraded the
GnuWin32 versions of grep and findutils. But it seems that the "-i"
("--ignore-case") option doesn't work correctly.

When I didn't use the "-i" option, the search succeeded:
=====================================
D:\WiKi>find . -type f -exec grep -nH RS17 {} ";"
./MyBase.muse:116: - RS17:

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nH RS17
./MyBase.muse:116: - RS17:


Then when I added the "-i" option, every search failed:
==========================================
D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17

D:\WiKi>find . -type f -exec grep -nHi RS17 {} ";"

D:\WiKi>find . -type f -exec grep -nHi rs17 {} ";"


When I tried the "mingw" version of these tools, the "-i" option worked
well:
===================================================
Brian@BRIANJIANG /d/WiKi
$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:116: - RS17:

Brian@BRIANJIANG /d/WiKi
$ find . -type f -print0 | xargs -0 -e grep -nHi rs17
./MyBase.muse:116: - RS17:


Is this a bug in the GnuWin32 version of the "grep" program?
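
One way to cross-check a suspicious "-i" result without relying on any
particular grep binary is a small case-insensitive scan in Python. This is
only a sketch: the function name grep_ignore_case is invented here, the
pattern is interpreted as a Python regular expression rather than a grep
one, and files are assumed to be text that can be read as UTF-8.

```python
import re
from pathlib import Path


def grep_ignore_case(pattern: str, root: str = "."):
    """Return (path, line_number, line) for every case-insensitive match."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files that are not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

Calling grep_ignore_case("rs17", ".") from the directory being searched
returns the same kind of path, line number and line text that
"grep -nHi rs17" prints, so the two results can be compared directly.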


Regards,
Brian

* Re: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-10  9:23         ` grep-find question (Is it a bug of GunWin32 version of "grep") brianjiang
@ 2007-08-10 14:07           ` Eli Zaretskii
  2007-08-11  5:55             ` brianjiang
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2007-08-10 14:07 UTC
  To: help-gnu-emacs

> Date: Fri, 10 Aug 2007 17:23:32 +0800
> From: <brianjiang@gdnt.com.cn>
> 
> Then when I added the "-i" option, all the searching failed:
> ======================================-===
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17
> 
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17
> 
> D:\WiKi>find . -type f -exec grep -nHi RS17 {} ";"
> 
> D:\WiKi>find . -type f -exec grep -nHi rs17 {} ";"

I cannot reproduce this with the GnuWin32 port of Grep 2.5.1 (from
grep-2.5.1a-bin.zip on the GnuWin32 site).  What version do you have
on your machine?

If you have the same version as I do, maybe you have some problem with
pcre.dll, the regexp library on which Grep depends (like if some other
package you installed overwrote the version of pcre.dll that came with
Grep 2.5.1)?

> And I try the "mingw" version of these tools, the "-i" version works
> well:

What is the "mingw" version? Where did you get the binaries?  Do you
mean the MSYS version, perhaps?

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-10  5:27     ` Emacspeak and UTF-8 -- possible? Tim X
@ 2007-08-11  4:02       ` Stefan Monnier
  2007-08-13 21:47         ` Raman
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2007-08-11  4:02 UTC
  To: help-gnu-emacs

>>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>>> that many speech synthesises, particularly older hardware based ones like
>>> the dectalk, don't understand UTF-8 character sets.  If you send them
>>> a multibyte character, they either lock up, speak garbage or do something
>>> else unexpected.
>> 
>> That's not a good reason to prevent display of any other char.

> Nobody said it was a good reason, its just the reason.  I'd suggest you
> need to have familiarity with how it works and its internals and history
> before you can make a judgement,

> Given your contributions and long involvement with Emacs I'm surprised to
> see you make such an uninformed statement which contributes so little.

I'm sorry if I offended you.  I didn't mean to say that Emacspeak is crap or
stupid, really.  I just felt like it was useful to point out that the
justification that the limitation is due to the speech synthesis process's
own limitations was not quite correct.

The real reason is that Emacspeak has not been adjusted to the way Emacs
handles encoding, so the limitation is a result of historical circumstances
coupled with a lack of manpower/expertise/motivation to rework the code in
order to lift this restriction.  Nothing to be ashamed of here.

Again, I'm sorry if I sounded like I was criticizing Emacspeak,


        Stefan


PS: How ironic that Emacspeak provides "the first really functional
    interface on Linux for [blind and] VI users".

* RE: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-10 14:07           ` Eli Zaretskii
@ 2007-08-11  5:55             ` brianjiang
  2007-08-11  7:06               ` brianjiang
  2007-08-11 10:06               ` Eli Zaretskii
  0 siblings, 2 replies; 19+ messages in thread
From: brianjiang @ 2007-08-11  5:55 UTC
  To: eliz, help-gnu-emacs

I use the GnuWin32 port of Grep 2.5.1 too (installed using grep-2.5.1a-2-setup.exe)
------------------------------------------------------------------------------------
D:\WiKi>grep -V
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


And I cannot find pcre.dll on my computer. Instead, I find pcre3.dll (installed by GnuWin32):
---------------------------------------------------------------------------------------------------
D:\WiKi>which pcre3.dll
C:/Program Files/GnuWin32/bin/pcre3.dll

The command path has no problem:
----------------------------------------------------------
D:\WiKi>which grep
C:/Program Files/GnuWin32/bin/grep.EXE

D:\WiKi>which find
C:/Program Files/GnuWin32/bin/find.EXE

D:\WiKi>which xargs
C:/Program Files/GnuWin32/bin/xargs.EXE


And I found that when the text in the file is lowercase, e.g., "rs17", I can use the "-i" option to find it successfully, e.g., "-i RS17".
But if the text in the file is uppercase, e.g., "RS17", then neither "-i rs17" nor "-i RS17" can find it :( See below:
(Have you tried that?)
-------------------------------------------------------------------------------------------------------------------------------
D:\WiKi>
D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nH RS17
./MyBase.muse:116: - RS17:

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nH rs17
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17


Yes, when I say "mingw" version, I mean the MSYS version. I always mix up these two terms :(
The MSYS version of grep is 2.4.2:
------------------------------------------------------------------------
$ grep -V
grep (GNU grep) 2.4.2

Copyright 1988, 1992-1999, 2000 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And it can find the thing correctly (3 matches):
------------------------------------------------
Brian@BRIANJIANG /D/WiKi
$ find . -type f -print0 | xargs -0 -e grep -nHi rs17
./MyBase.muse:116: - RS17:
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

Brian@BRIANJIANG /D/WiKi
$ find . -type f -print0 | xargs -0 -e grep -nHi RS17
./MyBase.muse:116: - RS17:
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17


I currently use the MSYS version for my Emacs and it works well.


Regards,
Brian


-----Original Message-----
From: help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org [mailto:help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org] On Behalf Of Eli Zaretskii
Sent: August 10, 2007 22:07
To: help-gnu-emacs@gnu.org
Subject: Re: grep-find question (Is it a bug of GunWin32 version of "grep")

> Date: Fri, 10 Aug 2007 17:23:32 +0800
> From: <brianjiang@gdnt.com.cn>
> 
> Then when I added the "-i" option, all the searching failed:
> ======================================-===
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17
> 
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17
> 
> D:\WiKi>find . -type f -exec grep -nHi RS17 {} ";"
> 
> D:\WiKi>find . -type f -exec grep -nHi rs17 {} ";"

I cannot reproduce this with the GnuWin32 port of Grep 2.5.1 (from grep-2.5.1a-bin.zip on the GnuWin32 site).  What version do you have on your machine?

If you have the same version as I do, maybe you have some problem with pcre.dll, the regexp library on which Grep depends (like if some other package you installed overwrote the version of pcre.dll that came with Grep 2.5.1)?

> And I try the "mingw" version of these tools, the "-i" version works
> well:

What is the "mingw" version? where did you get the binaries?  Do you mean the MSYS version, perhaps?

* RE: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-11  5:55             ` brianjiang
@ 2007-08-11  7:06               ` brianjiang
  2007-08-11 10:06               ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: brianjiang @ 2007-08-11  7:06 UTC
  To: eliz, help-gnu-emacs

 Just reformat my previous mail....

-----Original Message-----
From: help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org [mailto:help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org] On Behalf Of brianjiang@gdnt.com.cn
Sent: 2007年8月11日 13:55
To: eliz@gnu.org; help-gnu-emacs@gnu.org
Subject: RE: grep-find question (Is it a bug of GunWin32 version of "grep")

I use the GnuWin32 of Grep 2.5.1 too (install using grep-2.5.1a-2-setup.exe)
------------------------------------------------------------------------------------------------------------------------------------
D:\WiKi>grep -V
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


And I cannot find the pcre.dll in my computer. Instead, I find pcre3.dll (installed by GnuWin32):
---------------------------------------------------------------------------------------------------
D:\WiKi>which pcre3.dll
C:/Program Files/GnuWin32/bin/pcre3.dll

The command path has no problem:
----------------------------------------------------------
D:\WiKi>which grep
C:/Program Files/GnuWin32/bin/grep.EXE

D:\WiKi>which find
C:/Program Files/GnuWin32/bin/find.EXE

D:\WiKi>which xargs
C:/Program Files/GnuWin32/bin/xargs.EXE


And I found if when text in the file is lowcase, e.g., "rs17", then I can use "-i" option to find it successfully. e.g., "-i RS17".
But if the text in the file is uppercase, e.g., "RS17", then neither "-i rs17" nor "-i RS17" can found it :( See below: 
(Have you tried that?)
-------------------------------------------------------------------------------------------------------------------------------
D:\WiKi>
D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17 
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17 
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nH RS17
./MyBase.muse:116: - RS17:

D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nH rs17 
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17


Yes, when I say "mingw" version, I means MSYS version. I always mess up these two term :( The MSYS version of grep is 2.4.2:
------------------------------------------------------------------------
$ grep -V
grep (GNU grep) 2.4.2

Copyright 1988, 1992-1999, 2000 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And it can find the thing correctly (3 matches):
------------------------------------------------
Brian@BRIANJIANG /D/WiKi
$ find . -type f -print0 | xargs -0 -e grep -nHi rs17
./MyBase.muse:116: - RS17:
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17

Brian@BRIANJIANG /D/WiKi
$ find . -type f -print0 | xargs -0 -e grep -nHi RS17
./MyBase.muse:116: - RS17:
./MyBase.muse:284:$ find . -type f -exec grep -nHi rs17 {} NUL ";"
./MyBase.muse:285:$ find . -type f -print0 | xargs -0 -e grep -nHi rs17


I currently use the MSYS version for my Emacs and it works well.


Regards,
Brian


-----Original Message-----
From: help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org [mailto:help-gnu-emacs-bounces+brianjiang=gdnt.com.cn@gnu.org] On Behalf Of Eli Zaretskii
Sent: August 10, 2007 22:07
To: help-gnu-emacs@gnu.org
Subject: Re: grep-find question (Is it a bug of GunWin32 version of "grep")

> Date: Fri, 10 Aug 2007 17:23:32 +0800
> From: <brianjiang@gdnt.com.cn>
> 
> Then when I added the "-i" option, all the searching failed:
> ======================================-===
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi RS17
> 
> D:\WiKi>find . -type f -print0 | xargs -0 -e grep -nHi rs17
> 
> D:\WiKi>find . -type f -exec grep -nHi RS17 {} ";"
> 
> D:\WiKi>find . -type f -exec grep -nHi rs17 {} ";"

I cannot reproduce this with the GnuWin32 port of Grep 2.5.1 (from grep-2.5.1a-bin.zip on the GnuWin32 site).  What version do you have on your machine?

If you have the same version as I do, maybe you have some problem with pcre.dll, the regexp library on which Grep depends (like if some other package you installed overwrote the version of pcre.dll that came with Grep 2.5.1)?

> And I try the "mingw" version of these tools, the "-i" version works
> well:

What is the "mingw" version? where did you get the binaries?  Do you mean the MSYS version, perhaps?



_______________________________________________
help-gnu-emacs mailing list
help-gnu-emacs@gnu.org
http://lists.gnu.org/mailman/listinfo/help-gnu-emacs



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-11  5:55             ` brianjiang
  2007-08-11  7:06               ` brianjiang
@ 2007-08-11 10:06               ` Eli Zaretskii
  2007-08-12  9:58                 ` brianjiang
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2007-08-11 10:06 UTC (permalink / raw
  To: help-gnu-emacs

> Date: Sat, 11 Aug 2007 13:55:17 +0800
> From: <brianjiang@gdnt.com.cn>
> 
> And I cannot find the pcre.dll in my computer. Instead, I find pcre3.dll (installed by GnuWin32):
> ---------------------------------------------------------------------------------------------------
> D:\WiKi>which pcre3.dll
> C:/Program Files/GnuWin32/bin/pcre3.dll

Strange.  I installed Grep from a zip file, not with a self-installing
setup program, and I clearly see only pcre.dll in the
grep-2.5.1a-dep.zip archive I still have on my machine.

What is the time stamp of pcre3.dll on your system?

Also, does it help to reinstall Grep?

> And I found that when the text in the file is lowercase, e.g., "rs17", I can use the "-i" option to find it successfully, e.g., "-i RS17".
> But if the text in the file is uppercase, e.g., "RS17", then neither "-i rs17" nor "-i RS17" can find it :( See below:
> (Have you tried that?)

Yes, I've tried that, and it works as I expect: it finds text
case-insensitively no matter if I specify the search string in
uppercase or lowercase.

> I currently use the MSYS version for my Emacs and it works well.

FWIW, I don't recommend this.  MSYS ports are meant for one purpose
only: to be able to build MinGW ports of other tools.  For that
purpose, they sometimes tweak the command-line arguments in order to
allow running those commands from Unix shell scripts.  In particular,
they convert Windows file names with drive letters into pseudo-Posix
file names that start with a forward slash.  If a command-line
argument looks like a file name, but really isn't, this conversion
could have devastating effect on the Grep command.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-11 10:06               ` Eli Zaretskii
@ 2007-08-12  9:58                 ` brianjiang
  2007-08-12 18:59                   ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: brianjiang @ 2007-08-12  9:58 UTC (permalink / raw
  To: eliz, help-gnu-emacs

 
Hi Eli,

Where did you download the package? I downloaded it from
http://gnuwin32.sourceforge.net/packages/grep.htm.

I have uninstalled and reinstalled it, but the problem still exists. I
also tried the ZIP package, and the result is the same. I cannot find
pcre.dll in these packages; only pcre3.dll is there.

My pcre3.dll is shipped with both the SETUP package and the ZIP package.
Its timestamp is as follows:

-------------------------------------------------------
 Directory of C:\Program Files\GnuWin32\bin

2007-08-12  17:41    <DIR>          .
2007-08-12  17:41    <DIR>          ..
2007-04-03  03:40                33 egrep
2007-04-03  03:40                33 fgrep
2005-04-19  02:53           160,256 find.exe
2007-06-23  03:35           122,368 grep.exe
2004-03-17  04:37           898,048 libiconv2.dll
2005-05-07  03:52           103,424 libintl3.dll
2005-04-19  02:53           113,664 locate.exe
2007-03-17  17:56           140,288 pcre3.dll
2006-10-10  13:48                30 rgrep
2005-03-18  06:38             8,228 updatedb
2005-05-01  17:13            61,440 which.exe
2005-04-19  02:53            32,768 xargs.exe
              12 File(s)      1,640,580 bytes
               2 Dir(s)   2,093,121,536 bytes free


> Yes, I've tried that, and it works as I expect: it finds text
> case-insensitively no matter if I specify the search string in
>  uppercase or lowercase.

Actually, I mean it has a problem when the text string in the file is
uppercase. In my example, the file "MyBase.muse" contains the string
"RS17". When I search with "-i rs17" or "-i RS17", grep does not find
"RS17" in that file. But the same file also contains the string "rs17",
and both "-i rs17" and "-i RS17" find that one successfully. Very strange.
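
The symptom can be checked in isolation with a small reproduction. This
is only a sketch, assuming a POSIX-style shell (such as the MSYS one)
with mktemp and some grep on the PATH; the file name sample.txt and its
contents are made up for illustration and are not taken from
MyBase.muse.

```shell
# Create a scratch file that contains the string in both cases.
tmpdir=$(mktemp -d)
printf ' - RS17:\nsome rs17 text\nunrelated line\n' > "$tmpdir/sample.txt"

# With -i, the case of the pattern should not matter:
# -c prints the number of matching lines instead of the lines themselves.
lower_hits=$(grep -ci rs17 "$tmpdir/sample.txt")
upper_hits=$(grep -ci RS17 "$tmpdir/sample.txt")

echo "pattern rs17: $lower_hits matching lines"
echo "pattern RS17: $upper_hits matching lines"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

A grep that handles "-i" correctly should report the same count for both
patterns, covering the uppercase and the lowercase line. A lower count
would correspond to the behaviour I see with the GnuWin32 build.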


Regards,
Brian




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: grep-find question (Is it a bug of GunWin32 version of "grep")
  2007-08-12  9:58                 ` brianjiang
@ 2007-08-12 18:59                   ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2007-08-12 18:59 UTC (permalink / raw
  To: help-gnu-emacs

> Date: Sun, 12 Aug 2007 17:58:48 +0800
> From: <brianjiang@gdnt.com.cn>
> 
> > Yes, I've tried that, and it works as I expect: it finds text
> > case-insensitively no matter if I specify the search string in
> >  uppercase or lowercase.
> 
> Actually, I mean it has problem when the text string in the file is
> uppercase.

For me, it works both with upper- and lower-case text in the file.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-11  4:02       ` Stefan Monnier
@ 2007-08-13 21:47         ` Raman
  2007-08-14 18:28           ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Raman @ 2007-08-13 21:47 UTC (permalink / raw
  To: help-gnu-emacs

>
>
> PS: How ironic that Emacspeak provides "the first really functional
>     interface on Linux for [blind and] VI users".

;-) It actually was a first in helping blind users run VI
independently on a workstation console --- thanks to the wonders
of eterm.el in Emacs 19.26 --- which for the first time made
curses-based apps runnable in an Emacs Term.

As for the character coding issues --- I forcibly set
EMACS_UNIBYTE to T in the Emacspeak setup files a few years ago
to make it clear that the UTF-8 piece needed work. Doing anything
else would just cause bizarre/rude surprises halfway through
one's work.

Now, when you have a system like Emacs itself which attracts a
large number of developers, it is possible to enable something as
complex as character encoding and translation and over time debug
all of the issues --- we've seen this happen in the case of
mainline Emacs over the last 8 years. However, Emacspeak does not
have this luxury, and pretending that multibyte support works
when it doesn't would only lead to large amounts of frustration
and support questions of the form "why doesn't xxx ..." on the
mailing list.

For the record, it should now be possible to add multibyte
support by carefully binding buffer-encoding in the scratch
buffer used to transform text and by setting process-encoding for
that buffer before streaming out the text to the TTS process. But
doing this will require:

A) Manpower cycles (ramanpower cycles are vanishingly small for
   this)
B) Ability to test --- TTS engines, multibyte texts, and
   someone who uses it
C) Early users who are sufficiently knowledgeable to be able
   to bear the pain, identify problems, and define
   solutions

--Raman

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Emacspeak and UTF-8 -- possible?
  2007-08-13 21:47         ` Raman
@ 2007-08-14 18:28           ` Stefan Monnier
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2007-08-14 18:28 UTC (permalink / raw
  To: help-gnu-emacs

> A)    Manpower Cycles(ramanpower cycles are vanishingly small for this)
> B)      Ability to test --- TTS engines, multibyte texts, and
>         someone who uses it
> C)      Early users who are sufficiently knowledgeable to be able
>         to bear the pain, identify problems, and and define
>         solutions

Maybe a first step will be to make it a customizable option, marked as
"please help test&debug it if you can".

If it ever starts to appear mildly usable, you can then change its default
to "enabled", and document how to turn it off in as many places as you can
think of.

If you ever intend to start this transition, better start early and better
let other people find&fix the bugs for you ;-)

Unibyte sessions in Emacs are becoming more and more problematic, so you'll
probably have to make the transition at some point (there's no indication
we'll ever remove support for unibyte *buffers and strings*, which have some
very important uses, but it's difficult to improve the support for multibyte
sessions without introducing minor problems in unibyte *sessions*).


        Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2007-08-03 19:24 Emacspeak and UTF-8 -- possible? cmr.Pent
2007-08-07 10:41 ` Tim X
2007-08-08  9:09   ` cmr.Pent
2007-08-08 14:56   ` Stefan Monnier
2007-08-08 23:44     ` Robert D. Crawford
2007-08-09 17:44       ` Stefan Monnier
2007-08-10  7:40         ` Tim X
2007-08-10  5:39       ` Tim X
2007-08-10  9:23         ` grep-find question (Is it a bug of GunWin32 version of "grep") brianjiang
2007-08-10 14:07           ` Eli Zaretskii
2007-08-11  5:55             ` brianjiang
2007-08-11  7:06               ` brianjiang
2007-08-11 10:06               ` Eli Zaretskii
2007-08-12  9:58                 ` brianjiang
2007-08-12 18:59                   ` Eli Zaretskii
2007-08-10  5:27     ` Emacspeak and UTF-8 -- possible? Tim X
2007-08-11  4:02       ` Stefan Monnier
2007-08-13 21:47         ` Raman
2007-08-14 18:28           ` Stefan Monnier
