From: Tim X
Newsgroups: gmane.emacs.help
Subject: Re: Emacspeak and UTF-8 -- possible?
Date: Fri, 10 Aug 2007 17:40:10 +1000
Message-ID: <87ir7n7s85.fsf@lion.rapttech.com.au>
References: <1186169058.124273.9230@w3g2000hsg.googlegroups.com> <87sl6viq3s.fsf@lion.rapttech.com.au> <87tzr9vbf7.fsf@comcast.net>
To: help-gnu-emacs@gnu.org

Stefan Monnier writes:

>>>> Emacspeak AFAIK doesn't support multi-byte characters. The problem is
>>>> that many speech synthesisers, particularly older hardware based ones like
>>>> the dectalk, don't understand UTF-8 character sets. If you send them
>>>> a multibyte character, they either lock up, speak garbage or do something
>>>> else unexpected.
>>>
>>> That's not a good reason to prevent display of any other char.
>
>> Does any other char mean UTF-8? If this is the case, wouldn't you agree
>> that it is better to not have UTF-8 support than to not be able to use
>> the computer because your speech synth locks up unexpectedly and often?
>
> No, I'm saying that the place where they placed the check to filter out
> unwanted chars is wrong. They should have Emacs accept any random encoding
> as always, and then encode/filter the text they send to the
> underlying process.

Yes, the way emacspeak handles it is wrong given emacs' current
internals and how it deals with the issue. But you have totally
overlooked the fact that emacspeak was designed and developed before
emacs had this capability. If you were implementing emacspeak today,
then this is likely how you would do it.

> Emacs constantly encodes and decodes text between different encodings. E.g. If
> you visit a latin-1 file, it gets decoded into Emacs's internal
> representation, and when you save it, it gets re-encoded into latin-1
> (unless you've decided to change the file's encoding in which case it may
> be reencoded in any other coding-system).

Yes, and how many years has it taken to get this working well and
reliably? This is not a criticism, just pointing out that this wasn't a
trivial change.
Likewise, it is not a trivial change with emacspeak, which is the
largest and possibly most complex of all the add-on emacs packages I've
seen.

> So if the speech process only understands latin-1, they should simply set
> the coding-system used for that process accordingly and everything should
> just work. They may encounter difficulties finding the proper coding-system
> that handles unencodable chars (e.g. cyrillic chars with
> a latin-1 coding-system) in the way they want (e.g. drop the char
> altogether or replace it with a "?" or some other special char), but people
> on emacs-devel@gnu.org will be happy to help resolve those.

Again, I generally agree and this is along the lines of previous
discussions on the topic amongst emacspeak users. However, your
description makes it sound like all that needs to be done is a couple of
minor changes. This is not the case.

The design of emacspeak was done back when essentially all you had to
worry about was ascii characters. At the time, you essentially had one
decent quality hardware synthesiser, which could only handle the basic
ascii character set. I don't think it even handled extended ascii.

Decisions were made which, in hindsight, were probably incorrect. For
example, emacspeak does a fair amount of processing of characters prior
to sending them to the speech device. In fact, it sends them to an
intermediate layer written in TCL which does further processing.
Originally, a lot of the internal processing within the elisp part of
emacspeak was not modular, nor done in a single location that would make
it easy to change. There are also a number of other issues: how to
process these characters, what to translate and what to translate to,
when to translate and when not to, and how to control all of this to get
the best results while keeping the whole system as responsive as
possible.
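
As a rough illustration of the "encode/filter the text they send to the underlying process" idea quoted above, here is a minimal sketch. It is not emacspeak code, and it is in Python rather than elisp or TCL purely for brevity. The function name `encode_for_synth` and the sample string are made up for this example; the only real API used is Python's built-in `str.encode` with its standard `replace` and `ignore` error handlers.

```python
def encode_for_synth(text: str, encoding: str = "latin-1", drop: bool = False) -> bytes:
    """Encode text for a speech process that only understands `encoding`.

    Hypothetical helper for illustration only. Characters that cannot be
    represented in `encoding` are either replaced with "?" (the default)
    or dropped altogether (drop=True), the two options mentioned in the
    quoted text.
    """
    errors = "ignore" if drop else "replace"
    return text.encode(encoding, errors=errors)


if __name__ == "__main__":
    sample = "café Москва"  # latin-1 text followed by cyrillic text
    print(encode_for_synth(sample))             # cyrillic chars become "?"
    print(encode_for_synth(sample, drop=True))  # cyrillic chars are dropped
```

The point of the sketch is only that the filtering happens at the boundary to the speech process, so the editor itself can still hold and display any character. It says nothing about the harder questions raised above, such as what a given character should be translated to for speech.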

However, the main issue I have with your analysis is that you obviously
don't understand what emacspeak does and how it works. It is not simply
a screen reader that just sends the text as it appears on the screen to
a TTS engine. Emacspeak adds a lot of contextual information, which is
one of its strengths and what makes it so much more than a 'dumb' screen
reader. Other systems, like speechd, handle character encodings better
in this respect, but speechd is simpler in design and has the advantage
of being designed after emacs had itself incorporated support for
various encodings. It also has the advantage of using a speech interface
that was itself designed with support for multiple character encodings.

As emacs' own handling of character encodings has matured, work has been
going on to refactor emacspeak code to make the necessary changes to
support various encodings easier. Over the last couple of years, TTS
synthesisers have improved and a growing number now support UTF-8 and
other encodings. The TCL interface layer now has support for handling
different character encodings etc. So, in many ways, some of the
required basics are in place to make the necessary changes, but it is
still a major undertaking. So major, in fact, that everyone who has
previously started looking at this has generally decided it was too much
work for just one person.

You also need to realise that for the majority of emacspeak users, what
is displayed on the screen is irrelevant. In fact, I know of a number of
emacspeak users who don't even use a screen at all. Remember that
emacspeak is a specialised add-on with a targeted audience, not a
general purpose emacs package. The fact that it limits the range of
characters that can be displayed (and even entered) to what can be
turned into speech by the TTS engine was not an issue for most users. It
has only relatively recently become an issue because of the evolution of
both emacs and available TTS engines.

Until demand is sufficient that enough people are prepared to actually
do the work needed to change how emacspeak works, nothing will change.
Making sweeping generalised statements about how it is wrong achieves
nothing and contributes even less. It also totally fails to recognise
the hard and very innovative work of the author, not only in providing
the first really functional interface on Linux for blind and VI users,
but also in demonstrating radically different approaches to computer
interfaces for those requiring assistive technology.

Tim