From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Tim X <tcross@dev.null>
Newsgroups: gmane.emacs.help
Subject: Re: Emacspeak and UTF-8 -- possible?
Date: Fri, 10 Aug 2007 17:40:10 +1000
Organization: Posted via Supernews, http://www.supernews.com
Message-ID: <87ir7n7s85.fsf@lion.rapttech.com.au>
References: <1186169058.124273.9230@w3g2000hsg.googlegroups.com>
	<87sl6viq3s.fsf@lion.rapttech.com.au>
	<jwvodhigjoa.fsf-monnier+gnu.emacs.help@gnu.org>
	<87tzr9vbf7.fsf@comcast.net>
	<jwvir7od31b.fsf-monnier+gnu.emacs.help@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1186739731 14085 80.91.229.12 (10 Aug 2007 09:55:31 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 10 Aug 2007 09:55:31 +0000 (UTC)
To: help-gnu-emacs@gnu.org
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Fri Aug 10 11:55:28 2007
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1IJRDL-0000wp-9q
	for geh-help-gnu-emacs@m.gmane.org; Fri, 10 Aug 2007 11:55:27 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1IJRDK-0001PZ-Rv
	for geh-help-gnu-emacs@m.gmane.org; Fri, 10 Aug 2007 05:55:26 -0400
Original-Path: shelby.stanford.edu!headwall.stanford.edu!newshub.sdsu.edu!tethys.csu.net!nntp.csufresno.edu!sn-xt-sjc-03!sn-xt-sjc-09!sn-post-sjc-01!supernews.com!corp.supernews.com!not-for-mail
Original-Newsgroups: gnu.emacs.help
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1.50 (gnu/linux)
Cancel-Lock: sha1:gg+lZpL1iHsASZQDIK1HnE/2Zuw=
Original-X-Complaints-To: abuse@supernews.com
Original-Lines: 111
Original-Xref: shelby.stanford.edu gnu.emacs.help:150885
X-Mailman-Approved-At: Fri, 10 Aug 2007 05:55:08 -0400
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
Xref: news.gmane.org gmane.emacs.help:46462
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/46462>

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>>> Emacspeak AFAIK doesn't support multi-byte characters.  The problem is
>>>> that many speech synthesisers, particularly older hardware based ones like
>>>> the dectalk, don't understand UTF-8 character sets.  If you send them
>>>> a multibyte character, they either lock up, speak garbage or do something
>>>> else unexpected.
>>> 
>>> That's not a good reason to prevent display of any other char.
>
>> Does any other char mean UTF-8?  If this is the case, wouldn't you agree
>> that it is better to not have UTF-8 support than to not be able to use
>> the computer because your speech synth locks up unexpectedly and often?
>
> No, I'm saying that the place where they placed the check to filter out
> unwanted chars is wrong.  They should have Emacs accept any random encoding
> as always, and then encode/filter the text they send to the
> underlying process.
>

Yes, the way emacspeak handles it is wrong given emacs' current
internals and how it deals with the issue. But you have totally
overlooked the fact that emacspeak was designed and developed 
before emacs had this capability. If you were implementing 
emacspeak today, then this is likely how you would do it. 

> Emacs constantly encodes and decodes text between different encodings.  E.g. if
> you visit a latin-1 file, it gets decoded into Emacs's internal
> representation, and when you save it, it gets re-encoded into latin-1
> (unless you've decided to change the file's encoding in which case it may
> be reencoded in any other coding-system).

Yes, and how many years has it taken to get this working well 
and reliably? This is not a criticism, just pointing out that 
this wasn't a trivial change. Likewise, it is not a trivial 
change with emacspeak, which is the largest and possibly most 
complex of all the add-on emacs packages I've seen.

>
> So if the speech process only understands latin-1, they should simply set
> the coding-system used for that process accordingly and everything should
> just work.  They may encounter difficulties finding the proper coding-system
> that handles unencodable chars (e.g. cyrillic chars with
> a latin-1 coding-system) in the way they want (e.g. drop the char
> altogether or replace it with a "?" or some other special char), but people
> on emacs-devel@gnu.org will be happy to help resolve those.

Again, I generally agree and this is along the lines of previous
discussions on the topic amongst emacspeak users. However, your 
description makes it sound like all that needs to be done is a couple
of minor changes. This is not the case. The design of emacspeak 
was done back when essentially all you had to worry about was ascii 
characters and at the time, you essentially had one decent quality 
hardware synthesiser which could only handle the basic ascii character set. I
don't think it even handled extended ascii. 
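
To make the "drop the char or replace it with a ?" choice concrete,
here is a minimal sketch in Python. It is not emacspeak code. The
function name encode_for_synth and the idea of a latin-1-only synth are
assumptions for illustration only; the point is just that the policy is
applied at the boundary to the speech process, not in the editor.

```python
def encode_for_synth(text: str, policy: str = "replace") -> bytes:
    """Encode text for a hypothetical synthesiser that only accepts latin-1.

    policy="replace": each unencodable character becomes "?".
    policy="ignore":  each unencodable character is dropped.
    """
    if policy not in ("replace", "ignore"):
        raise ValueError("policy must be 'replace' or 'ignore'")
    # str.encode applies the chosen error handler to every character
    # that has no latin-1 representation (e.g. cyrillic).
    return text.encode("latin-1", errors=policy)


if __name__ == "__main__":
    sample = "caf\u00e9 \u0434\u0430"   # latin-1 "e acute" plus two cyrillic letters
    print(encode_for_synth(sample, "replace"))  # b'caf\xe9 ??'
    print(encode_for_synth(sample, "ignore"))   # b'caf\xe9 '
```

Even in this toy form you still have to decide which policy the user
wants, and that decision is where the real work starts.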

There were decisions made, which in hindsight were probably 
incorrect. For example, emacspeak does a fair amount of processing
of characters prior to sending them to the speech device - in fact,
it sends them to an intermediate layer written in TCL which does 
further processing. Originally, a lot of the internal processing within 
the elisp part of emacspeak was not modular or done in a single 
location that would make it easy to change. There are also a number
of other issues about how to process these characters, what to translate
and what to translate to, determining when to translate and when not
to and how to control all of this to get the best results while keeping
the whole system as responsive as possible. 
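
As a rough illustration of the "what to translate and what to translate
to" question, here is a small Python sketch for an ASCII-only synth.
Everything in it is hypothetical: the SPOKEN table, its entries and the
function name are made up for the example and do not reflect what
emacspeak or its TCL layer actually does.

```python
import unicodedata

# Hypothetical table of characters that should be spoken as words.
SPOKEN = {
    "\u20ac": " euro ",    # euro sign
    "\u2192": " arrow ",   # rightwards arrow
}


def to_speakable_ascii(text: str, unknown: str = " ") -> str:
    """Reduce text to ASCII for a synth that accepts nothing else."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)            # already safe to send
        elif ch in SPOKEN:
            out.append(SPOKEN[ch])    # speak the symbol as a word
        else:
            # Decompose accented letters and keep only the ASCII base,
            # so "e acute" is sent as a plain "e".
            base = "".join(
                c for c in unicodedata.normalize("NFKD", ch) if ord(c) < 128
            )
            out.append(base if base else unknown)
    return "".join(out)


if __name__ == "__main__":
    print(to_speakable_ascii("caf\u00e9"))
    print(to_speakable_ascii("5\u20ac"))
```

Note that every branch is a judgement call (is stripping the accent
acceptable? should an unknown character be silent or announced?), and
each one runs for every character spoken, which is why responsiveness
comes into it.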

However, the main issue I have with your analysis is that you 
obviously don't understand what emacspeak does and how it 
works. It is not simply a screen reader that just sends 
the text as it appears on the screen to a TTS engine.
Emacspeak adds a lot of contextual information, which is one of its 
strengths and what makes it so much more than a 'dumb' screen reader.
Other systems, like speechd, handle character encodings better in this
respect, but speechd is simpler in design and has the advantage of being
designed after emacs had itself incorporated support for various encodings.
It also has the advantage of using a speech interface that was itself
designed with support for multiple character encodings.

As emacs' own handling of character encodings has matured, work
has been going on to refactor emacspeak code to make the necessary
changes to support various encodings easier. Over the last couple 
of years, TTS synthesisers have improved and a growing number 
now support UTF-8 and other encodings. The TCL interface layer has
now got support for handling different character encodings etc. So, in
many ways, some of the required basics are in place to make the 
necessary changes, but it is still a major undertaking. So major 
in fact that everyone who has previously started looking at this has
generally decided it was too much work for just one person.

You also need to realise that for the majority of emacspeak users,
what is displayed on the screen is irrelevant. In fact, I know of
a number of emacspeak users who don't even use a screen at all. 
Remember that emacspeak is a specialised add-on with a targeted
audience, not a general purpose emacs package. The fact it limits
the range of characters that can be displayed (and even entered) to
what can be turned into speech by the TTS engine was not an issue 
for most users and has only relatively recently become an issue 
because of the evolution of both emacs and available TTS engines.

Until demand is sufficient that enough people are prepared to actually
do the work needed to change how emacspeak works, nothing will change.
Making sweeping generalised statements about how it is wrong achieves
nothing, contributes even less, and totally fails to recognise
the hard and very innovative work of the author in not only providing
the first really functional interface on Linux for blind and VI users,
but also in demonstrating radically different approaches to computer
interfaces for those requiring assistive technology.

Tim