Creating a coding system

* Creating a coding system
@ 2014-12-20  9:05 David Kastrup
  2014-12-20 10:20 ` Thien-Thi Nguyen
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20  9:05 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 87 bytes --]


Ok, what am I doing wrong here?  Why does decode-coding-string not do
anything here?


[-- Attachment #2: midi.el --]
[-- Type: application/emacs-lisp, Size: 3835 bytes --]

[-- Attachment #3: Type: text/plain, Size: 19 bytes --]


-- 
David Kastrup

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20  9:05 Creating a coding system David Kastrup
@ 2014-12-20 10:20 ` Thien-Thi Nguyen
  2014-12-20 10:42   ` David Kastrup
  2014-12-20 13:51 ` Andreas Schwab
  2014-12-23  8:59 ` K. Handa
  2 siblings, 1 reply; 33+ messages in thread
From: Thien-Thi Nguyen @ 2014-12-20 10:20 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 378 bytes --]

() David Kastrup <dak@gnu.org>
() Sat, 20 Dec 2014 10:05:39 +0100

     :mnemonic ?M

I'm surprised you didn't opt for ‘?♬’ (or similar).

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


* Re: Creating a coding system
  2014-12-20 10:20 ` Thien-Thi Nguyen
@ 2014-12-20 10:42   ` David Kastrup
  0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 10:42 UTC (permalink / raw)
  To: emacs-devel

Thien-Thi Nguyen <ttn@gnu.org> writes:

> () David Kastrup <dak@gnu.org>
> () Sat, 20 Dec 2014 10:05:39 +0100
>
>      :mnemonic ?M
>
> I'm surprised you didn't opt for ‘?♬’ (or similar).

No need to worry about your image of me: I did try a few Multi_key #
combinations to find a single notehead here, but I was not immediately
successful, and since this encoding would never actually be used as a
buffer encoding (only as a process encoding), I decided not to waste
more time on it right now.

-- 
David Kastrup


* Re: Creating a coding system
  2014-12-20  9:05 Creating a coding system David Kastrup
  2014-12-20 10:20 ` Thien-Thi Nguyen
@ 2014-12-20 13:51 ` Andreas Schwab
  2014-12-20 14:19   ` David Kastrup
  2014-12-23  8:59 ` K. Handa
  2 siblings, 1 reply; 33+ messages in thread
From: Andreas Schwab @ 2014-12-20 13:51 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup <dak@gnu.org> writes:

> Ok, what am I doing wrong here?  Why does decode-coding-string not do
> anything here?
>
> (define-translation-table 'midi-decode-table
>   (make-translation-table-from-alist
>    (mapcar
>     (lambda (p)
>       (cons (car p) (string-to-vector (cdr p))))
>     '(([144 0] . "c,,,,")

The translation table operates on the decoded contents.

> (prin1 (decode-coding-string "\220\040" 'midi))

(aref (string-make-multibyte "\220\040") 0) => 4194192

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
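
[Editor's illustration, not part of the original message.]  The two
`aref' results quoted in this exchange differ because one string is
unibyte and the other has been converted to multibyte.  A minimal sketch
of that difference follows; `my-byte-views' is a hypothetical helper
name, and `string-to-multibyte' is used here on the assumption that it
maps the byte to the same raw-byte character as the
`string-make-multibyte' call quoted above.

```elisp
;; Hypothetical helper: show how one byte looks in a unibyte string
;; versus the same string converted to multibyte representation.
(defun my-byte-views (unibyte)
  "Return (MULTIBYTE-P FIRST-ELEMENT FIRST-ELEMENT-AFTER-CONVERSION).
UNIBYTE is expected to be a non-empty unibyte string."
  (list (multibyte-string-p unibyte)
        ;; In a unibyte string, `aref' yields the byte value itself.
        (aref unibyte 0)
        ;; After conversion, a byte >= 128 becomes a raw-byte character
        ;; whose code lies outside the Unicode range.
        (aref (string-to-multibyte unibyte) 0)))

(my-byte-views "\220\040")   ; => (nil 144 4194192)
```

A translation table keyed on 144 would therefore match the first form
but not the raw-byte character 4194192.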




* Re: Creating a coding system
  2014-12-20 13:51 ` Andreas Schwab
@ 2014-12-20 14:19   ` David Kastrup
  2014-12-20 14:50     ` Eli Zaretskii
                       ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 14:19 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Ok, what am I doing wrong here?  Why does decode-coding-string not do
>> anything here?
>>
>> (define-translation-table 'midi-decode-table
>>   (make-translation-table-from-alist
>>    (mapcar
>>     (lambda (p)
>>       (cons (car p) (string-to-vector (cdr p))))
>>     '(([144 0] . "c,,,,")
>
> The translation table operates on the decoded contents.

Why would something be decoded twice in a row?

>> (prin1 (decode-coding-string "\220\040" 'midi))

> (aref (string-make-multibyte "\220\040") 0) => 4194192

(aref "\220\040" 0) => 144

But admittedly,

    (decode-coding-string (string 144 64) 'midi) => "e'"

which does not make a whole lot of sense to me since my use case

(make-serial-process :port "/dev/snd/midiC1D0"
		     :speed nil
		     :coding 'midi
		     :noquery t)

does absolutely nothing, and that should be decoding from _bytes_ to
_characters_ rather than _characters_ (including some raw-character code
page) to _characters_.

I am missing the big picture here in some manner.  Does decoding not
start from a byte stream but rather from an emacs-utf-8 encoded version
of a byte stream?  That does not seem to make sense to me.

-- 
David Kastrup




* Re: Creating a coding system
  2014-12-20 14:19   ` David Kastrup
@ 2014-12-20 14:50     ` Eli Zaretskii
  2014-12-20 15:56     ` Thien-Thi Nguyen
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 14:50 UTC (permalink / raw)
  To: David Kastrup; +Cc: schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 15:19:18 +0100
> Cc: emacs-devel@gnu.org
> 
> Does decoding not start from a byte stream but rather from an
> emacs-utf-8 encoded version of a byte stream?

If you decode a unibyte string or buffer, then the former; otherwise
the latter.  IOW, you can decode either version, and both should work.

(Caveat: I don't really understand what you are trying to do -- my
fault, not yours -- so I hope I understood this question correctly.)




* Re: Creating a coding system
  2014-12-20 14:19   ` David Kastrup
  2014-12-20 14:50     ` Eli Zaretskii
@ 2014-12-20 15:56     ` Thien-Thi Nguyen
  2014-12-20 16:11     ` Andreas Schwab
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Thien-Thi Nguyen @ 2014-12-20 15:56 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1.1: Type: text/plain, Size: 434 bytes --]

() David Kastrup <dak@gnu.org>
() Sat, 20 Dec 2014 15:19:18 +0100

   I am missing the big picture here in some manner.  Does
   decoding not start from a byte stream but rather from an
   emacs-utf-8 encoded version of a byte stream?  That does not
   seem to make sense to me.

Have you tried specifying ‘:coding-type raw-text’?  I see in
src/coding.c line 5339 (from a mid-October pre-Git tree) the
nice number 1:

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: nice-number-1.c --]
[-- Type: text/x-csrc, Size: 131 bytes --]

static void
decode_coding_raw_text (struct coding_system *coding)
{
  bool eol_dos = [...]

  coding->chars_at_source = 1;
  ...
}

[-- Attachment #1.3: Type: text/plain, Size: 736 bytes --]

It is the only instance in src/*.c of that literal value being
assigned to that struct member.  A truly singular hint?  :-D

I imagine that aside from ‘raw-text’, decoding is multi-layered
due to the presence of ‘undecided’ (which requires guesswork,
heuristics, and (maybe) backtracking) and that the design tries
to move data from bytes to characters as soon as possible, to
reduce downstream complexity and for cohesion w/ the rest of
Emacs.  But, that's merely ignorant speculation...

-- 
Thien-Thi Nguyen
   GPG key: 4C807502
   (if you're human and you know it)
      read my lisp: (responsep (questions 'technical)
                               (not (via 'mailing-list)))
                     => nil

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


* Re: Creating a coding system
  2014-12-20 14:19   ` David Kastrup
  2014-12-20 14:50     ` Eli Zaretskii
  2014-12-20 15:56     ` Thien-Thi Nguyen
@ 2014-12-20 16:11     ` Andreas Schwab
  2014-12-20 16:14     ` Andreas Schwab
  2014-12-20 16:21     ` Stephen J. Turnbull
  4 siblings, 0 replies; 33+ messages in thread
From: Andreas Schwab @ 2014-12-20 16:11 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup <dak@gnu.org> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> David Kastrup <dak@gnu.org> writes:
>>
>>> Ok, what am I doing wrong here?  Why does decode-coding-string not do
>>> anything here?
>>>
>>> (define-translation-table 'midi-decode-table
>>>   (make-translation-table-from-alist
>>>    (mapcar
>>>     (lambda (p)
>>>       (cons (car p) (string-to-vector (cdr p))))
>>>     '(([144 0] . "c,,,,")
>>
>> The translation table operates on the decoded contents.
>
> Why would something be decoded twice in a row?

   During decoding, the translation table's translations are applied to
the characters that result from ordinary decoding.  If a coding system

   During encoding, the translation table's translations are applied to
the characters in the buffer, and the result of translation is actually
encoded.  If a coding system has property `:encode-translation-table',

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Creating a coding system
  2014-12-20 14:19   ` David Kastrup
                       ` (2 preceding siblings ...)
  2014-12-20 16:11     ` Andreas Schwab
@ 2014-12-20 16:14     ` Andreas Schwab
  2014-12-20 16:43       ` David Kastrup
  2014-12-20 16:21     ` Stephen J. Turnbull
  4 siblings, 1 reply; 33+ messages in thread
From: Andreas Schwab @ 2014-12-20 16:14 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup <dak@gnu.org> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> David Kastrup <dak@gnu.org> writes:
>>
>>> Ok, what am I doing wrong here?  Why does decode-coding-string not do
>>> anything here?
>>>
>>> (define-translation-table 'midi-decode-table
>>>   (make-translation-table-from-alist
>>>    (mapcar
>>>     (lambda (p)
>>>       (cons (car p) (string-to-vector (cdr p))))
>>>     '(([144 0] . "c,,,,")
>>
>> The translation table operates on the decoded contents.
>
> Why would something be decoded twice in a row?

You don't know the characters before decoding.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: Creating a coding system
  2014-12-20 14:19   ` David Kastrup
                       ` (3 preceding siblings ...)
  2014-12-20 16:14     ` Andreas Schwab
@ 2014-12-20 16:21     ` Stephen J. Turnbull
  2014-12-20 16:52       ` David Kastrup
  4 siblings, 1 reply; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-20 16:21 UTC (permalink / raw)
  To: David Kastrup; +Cc: Andreas Schwab, emacs-devel

David Kastrup writes:

 > > The translation table operates on the decoded contents.
 > 
 > Why would something be decoded twice in a row?

Who can give a complete description?  I can't, but sometimes it's
useful, just as sometimes tr(1) is useful.

 > I am missing the big picture here in some manner.  Does decoding not
 > start from a byte stream but rather from an emacs-utf-8 encoded version
 > of a byte stream?

Either (see Eli's reply).  But the translation table is an add-on, not
part of the bytes-to-character coding system itself.  XEmacs's coding
system stuff is quite different (don't start, I'm trying to be helpful
in lieu of a real expert), so I can't be sure, but your definition

(define-coding-system 'midi
  "This converts Midi note-on events to note names"
  :mnemonic ?M
  :coding-type 'charset
  :eol-type 'unix
  :decode-translation-table 'midi-decode-table
  :mime-text-unsuitable t)

doesn't match the idioms used throughout the rest of the :coding-type
'charset coding systems.  Specifically, I suspect the :charset-list
property is required for anything useful to happen.  Also, I suspect
:coding-type 'charset requires a unibyte charset.  I'm not sure it can
be made to work for you, as your "characters" seem to be multibyte.

Try adding :charset-list '(latin-1) (or maybe '(raw-text)) to your
definition.  That should just translate bytes to internal
representation of the characters with codes 0-255, and then your
translation table should DTRT.  If that doesn't do the trick, you'll
have to ask Ken Handa most likely.


* Re: Creating a coding system
  2014-12-20 16:14     ` Andreas Schwab
@ 2014-12-20 16:43       ` David Kastrup
  2014-12-20 16:53         ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 16:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel

Andreas Schwab <schwab@linux-m68k.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>
>>> David Kastrup <dak@gnu.org> writes:
>>>
>>>> Ok, what am I doing wrong here?  Why does decode-coding-string not do
>>>> anything here?
>>>>
>>>> (define-translation-table 'midi-decode-table
>>>>   (make-translation-table-from-alist
>>>>    (mapcar
>>>>     (lambda (p)
>>>>       (cons (car p) (string-to-vector (cdr p))))
>>>>     '(([144 0] . "c,,,,")
>>>
>>> The translation table operates on the decoded contents.
>>
>> Why would something be decoded twice in a row?
>
> You don't know the characters before decoding.

That's an "Andreas" answer: an insider joke comprehensible only once you
already know the answer.  It's not even a hint.

-- 
David Kastrup




* Re: Creating a coding system
  2014-12-20 16:21     ` Stephen J. Turnbull
@ 2014-12-20 16:52       ` David Kastrup
  2014-12-20 18:34         ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 16:52 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Andreas Schwab, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> David Kastrup writes:
>
>  > > The translation table operates on the decoded contents.
>  > 
>  > Why would something be decoded twice in a row?
>
> Who can give a complete description?  I can't, but sometimes it's
> useful, just as sometimes tr(1) is useful.
>
>  > I am missing the big picture here in some manner.  Does decoding not
>  > start from a byte stream but rather from an emacs-utf-8 encoded version
>  > of a byte stream?
>
> Either (see Eli's reply).  But the translation table is an add-on, not
> part of the bytes-to-character coding system itself.

Huh.  Then maybe I am not doing this right.  I want to use a "process
coding system" to efficiently sift through Midi data and leave it in
some useful form (a more versatile form would likely be the Lisp reader
equivalent of MidiXML).

> XEmacs's coding system stuff is quite different (don't start, I'm
> trying to be helpful in lieu of a real expert), so I can't be sure,
> but your definition
>
> (define-coding-system 'midi
>   "This converts Midi note-on events to note names"
>   :mnemonic ?M
>   :coding-type 'charset
>   :eol-type 'unix
>   :decode-translation-table 'midi-decode-table
>   :mime-text-unsuitable t)
>
> doesn't match the idioms used throughout the rest of the :coding-type
> 'charset coding systems.

I'd have no idea.  The DOC string for define-coding-system lists the
various possibilities, but does not bother mentioning the actual
_meaning_ of the setting.

> Specifically, I suspect the :charset-list property is required for
> anything useful to happen.

Again, the DOC string for define-coding-system does not bother
mentioning what this is supposed to be for.  And the Elisp manual only
bothers with using coding systems rather than defining them.

> Also, I suspect :coding-type 'charset requires a unibyte charset.

I have no idea.

> I'm not sure it can be made to work for you, as your "characters" seem
> to be multibyte.

I have no idea.  The Midi device I open with make-serial-process
obviously delivers bytes.

> Try adding :charset-list '(latin-1) (or maybe '(raw-text)) to your
> definition.  That should just translate bytes to internal
> representation of the characters with codes 0-255, and then your
> translation table should DTRT.  If that doesn't do the trick, you'll
> have to ask Ken Handa most likely.

It would seem that Handa-san is currently the only documentation for the
coding system details Emacs has.  Some more redundancy might be a good
idea.

-- 
David Kastrup


* Re: Creating a coding system
  2014-12-20 16:43       ` David Kastrup
@ 2014-12-20 16:53         ` Eli Zaretskii
  2014-12-20 17:38           ` David Kastrup
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 16:53 UTC (permalink / raw)
  To: David Kastrup; +Cc: schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 17:43:40 +0100
> Cc: emacs-devel@gnu.org
> 
> Andreas Schwab <schwab@linux-m68k.org> writes:
> 
> >>> The translation table operates on the decoded contents.
> >>
> >> Why would something be decoded twice in a row?
> >
> > You don't know the characters before decoding.
> 
> That's an "Andreas" answer: an insider joke comprehensible only once you
> already know the answer.  It's not even a hint.

:-)

Let me try being a bit more helpful.  The translation table is a
char-table that converts one character into another.  Since these
tables convert _characters_, you need to have a character before you
apply the table.  To have a character, you need to decode the byte
stream into characters.

You can find more details on this in the node "Translation of
Characters" in the ELisp manual.
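
[Editor's illustration, not part of the original message.]  The two
stages described above, decoding bytes into characters and only then
applying a translation table, can be sketched by hand.  This is a sketch
under assumptions, not how the coding-system machinery is implemented:
`my-decode-then-translate' is a hypothetical helper, Latin-1 is chosen
only because it maps each byte to the character with the same code, and
the table entries are arbitrary.

```elisp
;; Hypothetical helper: stage 1 turns bytes into characters, stage 2
;; maps characters to characters with a translation table.
(defun my-decode-then-translate (bytes table)
  "Decode unibyte BYTES as Latin-1, then apply translation TABLE."
  (with-temp-buffer
    ;; Stage 1: bytes -> characters (byte N becomes character N).
    (insert (decode-coding-string bytes 'iso-latin-1-unix))
    ;; Stage 2: characters -> characters.
    (translate-region (point-min) (point-max) table)
    (buffer-string)))

;; Arbitrary example table: character #x90 -> ?N, character #x40 -> ?e.
(my-decode-then-translate
 "\220\100"
 (make-translation-table-from-alist '((#x90 . ?N) (#x40 . ?e))))
```

The table never sees the raw bytes; it only sees whatever characters
the first stage produced.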


* Re: Creating a coding system
  2014-12-20 16:53         ` Eli Zaretskii
@ 2014-12-20 17:38           ` David Kastrup
  2014-12-20 18:31             ` Stephen J. Turnbull
  2014-12-20 18:35             ` Eli Zaretskii
  0 siblings, 2 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 17:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Sat, 20 Dec 2014 17:43:40 +0100
>> Cc: emacs-devel@gnu.org
>> 
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>> 
>> >>> The translation table operates on the decoded contents.
>> >>
>> >> Why would something be decoded twice in a row?
>> >
>> > You don't know the characters before decoding.
>> 
>> That's an "Andreas" answer: an insider joke comprehensible only once you
>> already know the answer.  It's not even a hint.
>
> :-)
>
> Let me try being a bit more helpful.  The translation table is a
> char-table that converts one character into another.  Since these
> tables convert _characters_, you need to have a character before you
> apply the table.

Ok, so the table is not actually a "decoding-table" but rather a table
applied after decoding.  So basically it provides the same sort of
functionality/level as

    `:post-read-conversion'

    VALUE must be a function to call after some text is inserted and
    decoded by the coding system itself and before any functions in
    `after-insert-functions' are called.  This function is passed one
    argument; the number of characters in the text to convert, with
    point at the start of the text.  The function should leave point
    the same, and return the new character count.

does, but employing a different mechanism.

> To have a character, you need to decode the byte stream into
> characters.
>
> You can find more details on this in the node "Translation of
> Characters" in the ELisp manual.

Huh.  I'll look again, but I _did_ look in the manual previously.  Maybe
a coding system is not even the right way to do this, but at least it
seems like a builtin mechanism and consequently fast.  But at least the
decoding table does not appear to be able to _drop_ unwanted bytes: if
my TO vector is empty for any sequence, the table construction fails,
apparently when constructing the (unneeded) reverse table.

-- 
David Kastrup




* Re: Creating a coding system
  2014-12-20 17:38           ` David Kastrup
@ 2014-12-20 18:31             ` Stephen J. Turnbull
  2014-12-20 18:40               ` David Kastrup
  2014-12-20 18:35             ` Eli Zaretskii
  1 sibling, 1 reply; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-20 18:31 UTC (permalink / raw)
  To: David Kastrup; +Cc: Eli Zaretskii, schwab, emacs-devel

David Kastrup writes:

 > Maybe a coding system is not even the right way to do this, but at
 > least it seems like a builtin mechanism and consequently fast.

I think a coding system is the right way to do it, but you probably
need to construct the coding system a different way.  You might need
to write a little CCL (Aidan Kehoe can probably help if Handa-san is
unavailable).  There's some CCL documentation in the XEmacs manuals,
but it's probably not much more than a Texinfoization of the comments
in the files defining CCL.


* Re: Creating a coding system
  2014-12-20 16:52       ` David Kastrup
@ 2014-12-20 18:34         ` Eli Zaretskii
  2014-12-20 18:42           ` David Kastrup
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 18:34 UTC (permalink / raw)
  To: David Kastrup; +Cc: stephen, schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 17:52:29 +0100
> Cc: Andreas Schwab <schwab@linux-m68k.org>, emacs-devel@gnu.org
> 
> >  > I am missing the big picture here in some manner.  Does decoding not
> >  > start from a byte stream but rather from an emacs-utf-8 encoded version
> >  > of a byte stream?
> >
> > Either (see Eli's reply).  But the translation table is an add-on, not
> > part of the bytes-to-character coding system itself.
> 
> Huh.  Then maybe I am not doing this right.  I want to use a "process
> coding system" to efficiently sift through Midi data and leave it in
> some useful form (a more versatile form would likely be the Lisp reader
> equivalent of MidiXML).

May I suggest that you describe your problem in more detail?  I
couldn't glean that from the snippet that started this thread.  (I know
nothing about Midi.)  Maybe if we knew what it is that you are trying
to accomplish with your coding-system, we could be of more assistance.

> It would seem that Handa-san is currently the only documentation for the
> coding system details Emacs has.  Some more redundancy might be a good
> idea.

We already have that:


  32.10 Coding Systems
  ====================

  When Emacs reads or writes a file, and when Emacs sends text to a
  subprocess or receives text from a subprocess, it normally performs
  character code conversion and end-of-line conversion as specified by a
  particular "coding system".

     How to define a coding system is an arcane matter, and is not
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  documented here.
  ^^^^^^^^^^^^^^^^




* Re: Creating a coding system
  2014-12-20 17:38           ` David Kastrup
  2014-12-20 18:31             ` Stephen J. Turnbull
@ 2014-12-20 18:35             ` Eli Zaretskii
  2014-12-20 19:06               ` David Kastrup
  1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 18:35 UTC (permalink / raw)
  To: David Kastrup; +Cc: schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 18:38:23 +0100
> 
> Huh.  I'll look again, but I _did_ look in the manual previously.  Maybe
> a coding system is not even the right way to do this

Well, as I wrote elsewhere, I don't really understand what "this" is,
so please consider elaborating.




* Re: Creating a coding system
  2014-12-20 18:31             ` Stephen J. Turnbull
@ 2014-12-20 18:40               ` David Kastrup
  0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 18:40 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: Eli Zaretskii, schwab, emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> David Kastrup writes:
>
>  > Maybe a coding system is not even the right way to do this, but at
>  > least it seems like a builtin mechanism and consequently fast.
>
> I think a coding system is the right way to do it, but you probably
> need to construct the coding system a different way.  You might need
> to write a little CCL (Aidan Kehoe can probably help if Handa-san is
> unavailable).  There's some CCL documentation in the XEmacs manuals,
> but it's probably not much more than a Texinfoization of the comments
> in the files defining CCL.

Oh wow.  I thought CCL was Emacs-only, and it seemed like it was
deprecated anyway.  But CCL seems better suited for converting Midi in a
more complete manner, particularly where numeric parameters are
concerned.

-- 
David Kastrup




* Re: Creating a coding system
  2014-12-20 18:34         ` Eli Zaretskii
@ 2014-12-20 18:42           ` David Kastrup
  0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 18:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: stephen, schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>
>> It would seem that Handa-san is currently the only documentation for
>> the coding system details Emacs has.  Some more redundancy might be a
>> good idea.
>
> We already have that:
>
>
>   32.10 Coding Systems
>   ====================
>
>   When Emacs reads or writes a file, and when Emacs sends text to a
>   subprocess or receives text from a subprocess, it normally performs
>   character code conversion and end-of-line conversion as specified by a
>   particular "coding system".
>
>      How to define a coding system is an arcane matter, and is not
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   documented here.
>   ^^^^^^^^^^^^^^^^

I did not mean that we need more redundancy for the statement "It would
seem that Handa-san is currently the only documentation".

-- 
David Kastrup




* Re: Creating a coding system
  2014-12-20 18:35             ` Eli Zaretskii
@ 2014-12-20 19:06               ` David Kastrup
  2014-12-20 20:02                 ` Eli Zaretskii
  2014-12-21  5:54                 ` Stephen J. Turnbull
  0 siblings, 2 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 19:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 18:38:23 +0100
>> 
>> Huh.  I'll look again, but I _did_ look in the manual previously.  Maybe
>> a coding system is not even the right way to do this
>
> Well, as I wrote elsewhere, I don't really understand what "this" is,
> so please consider elaborating.

Midi is a real-time byte-oriented protocol conveying musical data and is
primarily used for hooking up Midi-code generating keyboards with
Midi-playing devices (originally, a serial protocol at something like
31250 Baud over galvanically separated current-loop).

<URL:http://www.gweep.net/~prefect/eng/reference/protocol/midispec.html>
contains some useful information.  The raw midi messages, as read from a
midi device, contain information like "key-on on channel #9 with pitch
#83 and velocity #112".  Command bytes are #128 to #255, data bytes are
#0 to #127 and most commands come with a fixed amount of data anyway.
So resynchronization for lost bytes is usually immediate.
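
[Editor's illustration, not part of the original message.]  The framing
described above (command bytes 128..255, data bytes 0..127) can be
sketched as a small scanner for key-on messages.  The helper name
`my-midi-note-ons' is hypothetical; the sketch assumes key-on commands
are the bytes #x90..#x9F with the channel in the low four bits, which is
consistent with byte 144 being used for key-on elsewhere in this thread,
and it ignores every other message type.

```elisp
;; Hypothetical helper: pick key-on messages out of a raw Midi byte
;; sequence, skipping anything else.
(defun my-midi-note-ons (bytes)
  "Return a list of (CHANNEL PITCH VELOCITY) for key-on messages in BYTES.
BYTES is a unibyte string or a list of integers in the range 0..255."
  (let ((data (append bytes nil))       ; string or list -> fresh list
        events)
    (while data
      (let ((status (pop data)))
        ;; A key-on command has high nibble #x9 and is followed by two
        ;; data bytes, each below 128.
        (when (and (= (logand status #xF0) #x90)
                   (cdr data)
                   (< (car data) 128)
                   (< (cadr data) 128))
          (push (list (logand status #x0F) (pop data) (pop data))
                events))))
    (nreverse events)))

;; "key-on on channel #9 with pitch #83 and velocity #112":
(my-midi-note-ons '(#x99 83 112))   ; => ((9 83 112))
```

Because a data byte can never look like a command byte, a stray or
truncated message is simply skipped and scanning resumes at the next
command byte.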

There are also Midi _files_: those contain additional data, most
particularly an encoding of the _timing_ of events (because as opposed
to ordinary written text, the timing of events is crucial for Midi).
Those are, more or less, the preferred storage format for "Midi
sequencers" which are able to produce or reproduce a timed stream of raw
Midi events.

There is some age-old application called "rumor" that converts Midi
messages into input for the music typesetter LilyPond, and rumor can be
hooked up with some Emacs mode.  All that is pretty arcane and not all
that well-supported.

Since Emacs can use make-serial-process for opening a raw Midi port, the
idea was to bypass the additional executable and create an application
useful for directly interfacing with Midi keyboards so that one can at
least key in pitches faster than by using a computer keyboard.

Part of the reason this may be faster than a computer keyboard is that
the typical musician looks at a score, employs some cognitive
processing, names the resulting pitch, translates it into LilyPond's
representation of a pitch (see
<URL:http://www.lilypond.org/doc/v2.19/Documentation/notation/writing-pitches>)
and then types it in using multiple keypresses.

With a Midi keyboard, typing the pitch entails looking at the score and
typing a single key (or a whole sequence of keys) using pretrained
visual/motoric connections.

Now the first iteration would be just using the pitches from a Midi
keyboard.  The second iteration would then try working with durations,
either by putting time stamps on the events as they trundle in, or by
reading from a "cooked" Midi stream already containing encoded time
stamps.

Conversion of time stamps to musical durations, particularly when the
time stamps have not been generated by a notation program but by a human
player, is non-trivial and would hopefully end up as an incremental
process where a human corrects durations and Emacs improves its overall
guesswork.

But time stamps are for another time.  Right now I'd be glad just to
convert the pitches from the keyboard into LilyPond's pitch notation in
order to make key entry faster.
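
[Editor's illustration, not part of the original message.]  The pitch
conversion itself is plain arithmetic and can be sketched without any
coding system.  The names `my-midi-note-names' and
`my-midi-pitch-to-lilypond' are hypothetical, and spelling every black
key as a sharp with LilyPond's default note names is an assumption.  The
octave offset is chosen to agree with the two data points in this
thread: pitch 0 is "c,,,," and pitch 64 is "e'".

```elisp
;; Hypothetical table: note names for the twelve pitch classes, with
;; black keys spelled as sharps (an assumption).
(defconst my-midi-note-names
  ["c" "cis" "d" "dis" "e" "f" "fis" "g" "gis" "a" "ais" "b"])

(defun my-midi-pitch-to-lilypond (pitch)
  "Return an absolute LilyPond pitch name for Midi PITCH (0..127)."
  (let ((name (aref my-midi-note-names (% pitch 12)))
        ;; Pitches 48..59 carry no octave mark; each octave above adds
        ;; an apostrophe, each octave below adds a comma.
        (octave (- (/ pitch 12) 4)))
    (concat name
            (if (>= octave 0)
                (make-string octave ?\')
              (make-string (- octave) ?,)))))

(my-midi-pitch-to-lilypond 0)    ; => "c,,,,"
(my-midi-pitch-to-lilypond 64)   ; => "e'"
```

Choosing between sharps and flats depends on the key of the piece, so a
real entry mode would probably need a configurable spelling.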

-- 
David Kastrup


* Re: Creating a coding system
  2014-12-20 19:06               ` David Kastrup
@ 2014-12-20 20:02                 ` Eli Zaretskii
  2014-12-20 20:11                   ` David Kastrup
  2014-12-21  5:54                 ` Stephen J. Turnbull
  1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 20:02 UTC (permalink / raw)
  To: David Kastrup; +Cc: schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 20:06:45 +0100
> 
> Now the first iteration would be just using the pitches from a Midi
> keyboard.  The second iteration would then try working with durations,
> either by putting time stamps on the events as they trundle in, or by
> reading from a "cooked" Midi stream already containing encoded time
> stamps.
> 
> Conversion of time stamps to musical durations, particularly when the
> time stamps have not been generated by a notation program but by a human
> player, is non-trivial and would hopefully end up as an incremental
> process where a human corrects durations and Emacs improves its overall
> guesswork.
> 
> But time stamps are for another time.  Right now I'd be glad just to
> convert the pitches from the keyboard into LilyPond's pitch notation in
> order to make key entry faster.

I might be mistaken, but this doesn't look to me like a job for a
coding-system.  You are talking about parsing input into some abstract
notation, then generating a representation of that input in a
different language.  This is sufficiently different from converting
characters from one encoding to another that you should perhaps look
at cedet/semantic/ stuff instead.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 20:02                 ` Eli Zaretskii
@ 2014-12-20 20:11                   ` David Kastrup
  2014-12-20 20:45                     ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 20:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 20:06:45 +0100
>> 
>> Now the first iteration would be just using the pitches from a Midi
>> keyboard.  The second iteration would then try working with durations.
>> Either by putting time stamps on the events as they trundle in, or by
>> reading from a "cooked" Midi stream already containing encoded time
>> stamps.
>> 
>> Conversion of time stamps to musical durations, particularly when the
>> time stamps have not been generated by a notation program but by a human
>> player, is non-trivial and would hopefully end up as an incremental
>> process where a human corrects durations and Emacs improves its overall
>> guesswork.
>> 
>> But time stamps are for another time.  Right now I'd be glad just to
>> convert the pitches from the keyboard into LilyPond's pitch notation in
>> order to make key entry faster.
>
> I might be mistaken, but this doesn't look to me like a job for a
> coding-system.  You are talking about parsing input into some abstract
> notation,

"parsing input" is sort of bombastic for interpreting a binary
representation consisting of isolated minimal words.

> then generating a representation of that input in a different
> language.  This is sufficiently different from converting characters
> from one encoding to another that you should perhaps look at
> cedet/semantic/ stuff instead.

Uh, there is no grammar involved here, no context, most certainly not a
push-down stack or something.  There is nothing that would warrant using
a parser.  And we are talking about real-time processing needs,
basically an input method using a different device than the keyboard.

cedet/semantic seem like much more of a mismatch, both in functionality
and response time, than using a process encoding here.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 20:11                   ` David Kastrup
@ 2014-12-20 20:45                     ` Eli Zaretskii
  2014-12-20 21:15                       ` David Kastrup
  2014-12-21 21:25                       ` David Kastrup
  0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 20:45 UTC (permalink / raw)
  To: David Kastrup; +Cc: schwab, emacs-devel

> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 21:11:49 +0100
> 
> > I might be mistaken, but this doesn't look to me like a job for a
> > coding-system.  You are talking about parsing input into some abstract
> > notation,
> 
> "parsing input" is sort of bombastic for interpreting a binary
> representation consisting of isolated minimal words.

Yes, but coding-systems machinery is not a general-purpose bytestream
conversion facility.  It was designed and implemented specifically for
converting between known families of encodings.  You might be able to
tweak it enough to do what you want, eventually, but it doesn't look
like a piece of cake to me.  Programming in CCL is like writing
assembly code in a restricted machine language, hardly something well
suited to converting one complex bytestream into another.

> > then generating a representation of that input in a different
> > language.  This is sufficiently different from converting characters
> > from one encoding to another that you should perhaps look at
> > cedet/semantic/ stuff instead.
> 
> Uh, there is no grammar involved here, no context, most certainly not a
> push-down stack or something.

But there's definitely some kind of "lexing", no?  You are talking
about sequences of symbols, not about letters from some alphabet.  If
you try representing each sequence as an encoding of a letter, won't
you get an enormously large alphabet?

Then again, I might be dead wrong.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 20:45                     ` Eli Zaretskii
@ 2014-12-20 21:15                       ` David Kastrup
  2014-12-21 19:46                         ` David Kastrup
  2014-12-21 21:25                       ` David Kastrup
  1 sibling, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 21:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>> 
>> > I might be mistaken, but this doesn't look to me like a job for a
>> > coding-system.  You are talking about parsing input into some abstract
>> > notation,
>> 
>> "parsing input" is sort of bombastic for interpreting a binary
>> representation consisting of isolated minimal words.
>
> Yes, but coding-systems machinery is not a general-purpose bytestream
> conversion facility.  It was designed and implemented specifically for
> converting between known families of encodings.  You might be able to
> tweak it enough to do what you want, eventually, but it doesn't look
> like a piece of cake to me.  Programming in CCL is like writing
> assembly code in a restricted machine language, hardly something well
> suited to converting one complex bytestream into another.

Uh, CCL is _exactly_ suited to converting one complex bytestream into
another.  It's overkill for converting regular character sets to other
regular character sets, which is probably the reason it is being phased out.
But for this task it seems a reasonable match.

>> Uh, there is no grammar involved here, no context, most certainly not
>> a push-down stack or something.
>
> But there's definitely some kind of "lexing", no?

No.

> You are talking about sequences of symbols, not about letters from
> some alphabet.

No, Midi contains nothing like symbols.  Just codes with byte or word
sized parameters.  Converting the codes would be straightforward, but
converting the parameters as well would make the tables too large.  CCL
looks like it can come to the rescue for producing Lisp expressions with
the full parameters for _one_ approach.

> If you try representing each sequence as an encoding of a letter,
> won't you get an enormously large alphabet?

Which is exactly why CCL, which can do calculations like divide by 10
with remainder, will be able to save a lot of space if one wants to
arrive at decimal constants in a human-readable rendering of the
parameters.
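
To illustrate the arithmetic only (this is ordinary Elisp, not CCL, and
`my-byte-to-decimal-string' is a made-up name): repeated division by 10
with remainder yields the decimal digits of a parameter byte without any
lookup table.

```elisp
(defun my-byte-to-decimal-string (n)
  "Render the non-negative integer N in decimal using only `/' and `%'."
  (let ((digits nil))
    ;; Peel off the lowest digit each round; pushing onto the list
    ;; leaves the most significant digit at the front.
    (while (progn
             (push (+ ?0 (% n 10)) digits)
             (> (setq n (/ n 10)) 0)))
    (apply #'string digits)))

(my-byte-to-decimal-string 144)   ; => "144"
```

A CCL version would presumably do the same divisions on registers and
write each digit as it goes.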

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 19:06               ` David Kastrup
  2014-12-20 20:02                 ` Eli Zaretskii
@ 2014-12-21  5:54                 ` Stephen J. Turnbull
  1 sibling, 0 replies; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-21  5:54 UTC (permalink / raw)
  To: David Kastrup; +Cc: Eli Zaretskii, schwab, emacs-devel

David Kastrup writes:

 > Since Emacs can use make-serial-process for opening a raw Midi port, the
 > idea was to bypass the additional executable and create an application
 > useful for directly interfacing with Midi keyboards so that one can at
 > least key in pitches faster than by using a computer keyboard.

This is a great application, one that shows the power of Emacs in a
multidimensional way.  Hats off to you once again, David!

 > The second iteration would then try working with durations.  Either
 > by putting time stamps on the events as they trundle in,

This might need to be done at a lower level than coding systems, in
the event processing code itself.  But I imagine it's quite feasible.
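
A rough sketch of what time stamping in a process filter could look
like, assuming the port is already open as a process.  The variable and
function names are invented for this example:

```elisp
(defvar my-midi-events nil
  "Hypothetical log of (TIMESTAMP . BYTES) pairs, newest first.")

(defun my-midi-timestamp-filter (_process bytes)
  "Record BYTES together with the time at which they arrived."
  (push (cons (float-time) bytes) my-midi-events))

;; Assumed usage with an already opened serial process PROC:
;;   (set-process-filter proc #'my-midi-timestamp-filter)
```

The stamp is the time the filter ran, not the time the key was pressed,
so it is only as precise as Emacs's own event handling.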




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 21:15                       ` David Kastrup
@ 2014-12-21 19:46                         ` David Kastrup
  0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-21 19:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

David Kastrup <dak@gnu.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: David Kastrup <dak@gnu.org>
>>> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
>>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>>> 
>>> > I might be mistaken, but this doesn't look to me like a job for a
>>> > coding-system.  You are talking about parsing input into some abstract
>>> > notation,
>>> 
>>> "parsing input" is sort of bombastic for interpreting a binary
>>> representation consisting of isolated minimal words.
>>
>> Yes, but coding-systems machinery is not a general-purpose bytestream
>> conversion facility.  It was designed and implemented specifically for
>> converting between known families of encodings.  You might be able to
>> tweak it enough to do what you want, eventually, but it doesn't look
>> like a piece of cake to me.  Programming in CCL is like writing
>> assembly code in a restricted machine language, hardly something well
>> suited to converting one complex bytestream into another.
>
> Uh, CCL is _exactly_ suited to converting one complex bytestream into
> another.  It's overkill for converting regular character sets to other
> regular character sets, which is probably the reason it is being phased out.
> But for this task it seems a reasonable match.
>
>>> Uh, there is no grammar involved here, no context, most certainly not
>>> a push-down stack or something.
>>
>> But there's definitely some kind of "lexing", no?
>
> No.
>
>> You are talking about sequences of symbols, not about letters from
>> some alphabet.
>
> No, Midi contains nothing like symbols.  Just codes with byte or word
> sized parameters.  Converting the codes would be straightforward, but
> converting the parameters as well would make the tables too large.  CCL
> looks like it can come to the rescue for producing Lisp expressions with
> the full parameters for _one_ approach.
>
>> If you try representing each sequence as an encoding of a letter,
>> won't you get an enormously large alphabet?
>
> Which is exactly why CCL, which can do calculations like divide by 10
> with remainder, will be able to save a lot of space if one wants to
> arrive at decimal constants in a human-readable rendering of the
> parameters.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20 20:45                     ` Eli Zaretskii
  2014-12-20 21:15                       ` David Kastrup
@ 2014-12-21 21:25                       ` David Kastrup
  1 sibling, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-21 21:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org,  emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>> 
>> > I might be mistaken, but this doesn't look to me like a job for a
>> > coding-system.  You are talking about parsing input into some abstract
>> > notation,
>> 
>> "parsing input" is sort of bombastic for interpreting a binary
>> representation consisting of isolated minimal words.
>
> Yes, but coding-systems machinery is not a general-purpose bytestream
> conversion facility.  It was designed and implemented specifically for
> converting between known families of encodings.  You might be able to
> tweak it enough to do what you want, eventually, but it doesn't look
> like a piece of cake to me.  Programming in CCL is like writing
> assembly code in a restricted machine language, hardly something well
> suited to converting one complex bytestream into another.
>
>> > then generating a representation of that input in a different
>> > language.  This is sufficiently different from converting characters
>> > from one encoding to another that you should perhaps look at
>> > cedet/semantic/ stuff instead.
>> 
>> Uh, there is no grammar involved here, no context, most certainly not a
>> push-down stack or something.
>
> But there's definitely some kind of "lexing", no?  You are talking
> about sequences of symbols, not about letters from some alphabet.  If
> you try representing each sequence as an encoding of a letter, won't
> you get an enormously large alphabet?
>
> Then again, I might be dead wrong.

(define-ccl-program midi-notenames
  '(9
    ((read r0)
     (if ((r0 & 240) == 144)
	 ((read r0)
	  (r0 //= 12)
	  (branch r7 "c" "des" "d" "es" "e" "f" "fis" "g" "as" "a" "bes" "b")
	  (branch r0 ",,,," ",,," ",," "," " " "'" "''" "'''" "''''" "'''''" "''''''"))))))

(define-ccl-program no-conversion
  '(1 ((read r0)(write r0))))

(define-coding-system 'midi-notenames
  "This converts Midi note-on events to note names"
  :mnemonic ?M
  :coding-type 'ccl
  :charset-list '(iso-8859-1)
  :eol-type 'unix
  :ccl-decoder 'midi-notenames
  :ccl-encoder 'no-conversion)

This actually seems to do the trick for the original stuff I'd been
wrestling with.  I still have to figure out how to parameterize stuff
without redefining CCL programs all the time.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-20  9:05 Creating a coding system David Kastrup
  2014-12-20 10:20 ` Thien-Thi Nguyen
  2014-12-20 13:51 ` Andreas Schwab
@ 2014-12-23  8:59 ` K. Handa
  2014-12-23  9:25   ` David Kastrup
  2 siblings, 1 reply; 33+ messages in thread
From: K. Handa @ 2014-12-23  8:59 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

Hi, sorry for the late response.

In article <87ppbeitcs.fsf@fencepost.gnu.org>, David Kastrup <dak@gnu.org> writes:
> Ok, what am I doing wrong here?  Why does decode-coding-string not do
> anything here?

> (define-translation-table 'midi-decode-table
>   (make-translation-table-from-alist
>    (mapcar
>     (lambda (p)
>       (cons (car p) (string-to-vector (cdr p))))
>     '(([144 0] . "c,,,,")
[...]
> (define-coding-system 'midi
>   "This converts Midi note-on events to note names"
>   :mnemonic ?M
>   :coding-type 'charset
>   :eol-type 'unix
>   :decode-translation-table 'midi-decode-table
>   :mime-text-unsuitable t)

Please add
  :charset-list '(iso-8859-1)
to the arguments of define-coding-system.

The translation table of a coding system works AFTER byte
sequences are decoded into char sequences by the basic
decoding routine which is specified by :coding-type (and the
other additional attributes).  As it seems that you are
expecting that the basic decoding routine decodes the byte
144 to the character 144, using the following set is good:
  :coding-type 'charset
  :charset-list '(iso-8859-1)

The other method is to use CCL (i.e :coding-type 'ccl), but,
if the combination of the charset decoding and translation
table works, it's faster than running CCL code.  If you need
arithmetic or conditional operation, you have to use CCL, or
:post-read-conversion.
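
A small check of the first step described above, using the stock
iso-8859-1 coding system rather than the midi one from the original
mail:

```elisp
;; Decode two raw bytes (a note-on status byte and a note number)
;; with the stock iso-8859-1 coding system.
(let ((decoded (decode-coding-string (unibyte-string 144 60) 'iso-8859-1)))
  ;; Each byte becomes the character with the same code, which is
  ;; what a translation table keyed on [144 60] needs to see.
  (append decoded nil))   ; => (144 60)
```

Without a :charset-list the basic decoder has no charset to produce
those characters from, so the translation table never gets a match.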

---
K. Handa
handa@gnu.org

PS.  Should I read the other mails of this thread?  I'm very
sorry for this lazy attitude, but I don't have time to
read all emacs-devel mails.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-23  8:59 ` K. Handa
@ 2014-12-23  9:25   ` David Kastrup
  2014-12-24 15:06     ` K. Handa
  0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-23  9:25 UTC (permalink / raw)
  To: K. Handa; +Cc: emacs-devel

handa@gnu.org (K. Handa) writes:

> Hi, sorry for the late response.
>
> In article <87ppbeitcs.fsf@fencepost.gnu.org>, David Kastrup <dak@gnu.org> writes:
>> Ok, what am I doing wrong here?  Why does decode-coding-string not do
>> anything here?
>
>> (define-translation-table 'midi-decode-table
>>   (make-translation-table-from-alist
>>    (mapcar
>>     (lambda (p)
>>       (cons (car p) (string-to-vector (cdr p))))
>>     '(([144 0] . "c,,,,")
> [...]
>> (define-coding-system 'midi
>>   "This converts Midi note-on events to note names"
>>   :mnemonic ?M
>>   :coding-type 'charset
>>   :eol-type 'unix
>>   :decode-translation-table 'midi-decode-table
>>   :mime-text-unsuitable t)
>
> Please add
>   :charset-list '(iso-8859-1)
> to the arguments of define-coding-system.
>
> The translation table of a coding system works AFTER byte
> sequences are decoded into char sequences by the basic
> decoding routine which is specified by :coding-type (and the
> other additional attributes).  As it seems that you are
> expecting that the basic decoding routine decodes the byte
> 144 to the character 144, using the following set is good:
>   :coding-type 'charset
>   :charset-list '(iso-8859-1)

It's one of the things I got to work.

> The other method is to use CCL (i.e :coding-type 'ccl), but,
> if the combination of the charset decoding and translation
> table works, it's faster than running CCL code.

The translation table is not happy about translating things to nothing.
Apparently that makes the calculation of the reverse translation go
wrong.

> If you need arithmetic or conditional operation, you have to use CCL,
> or :post-read-conversion.

> PS.  Should I read the other mails of this thread?  I'm very
> sorry for this lazy attitude, but I don't have time to
> read all emacs-devel mails.

At the current point it would appear that I manage working with CCL.  It
was a big puzzler that data gets lost unless the CCL program is written
as a _loop_, and it's annoying that the documentation just mentions
(loop statement ...)
as a construct without bothering to point out that the loop will not
actually loop.  "loop" apparently just places a label you can jump to
using (repeat) or its ilk.

So my first approaches lost data when it was arriving fast, and it took
about a day to figure out why that was.
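
For illustration, a minimal pass-through program in the shape that
keeps running, assuming nothing beyond what the define-ccl-program doc
string lists.  The name `my-ccl-copy-loop' is made up:

```elisp
(require 'ccl)

(define-ccl-program my-ccl-copy-loop
  '(1                       ; output buffer magnification
    ((loop                  ; only marks a place to jump back to
      (read r0)             ; fetch the next input byte
      (write r0)            ; emit it unchanged
      (repeat)))))          ; the explicit jump that makes it iterate
```

Leaving out the (repeat) gives a program that handles one unit of input
and then falls off the end, which matches the data loss described
above.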

If you have time to spare on that topic, I'd rather you spend it on
putting some more info in the Elisp manual or at least the
define-ccl-program and define-coding-system doc strings.  At the current
point of time, it is quite opaque what the :coding-type specification in
define-coding-system does, it is not clear how CCL code is being run
under which conditions, the attribute :valids (apparently part of ccl
charset-type) is not documented at all.

And the following afterthought in define-ccl-program is quite opaque as
well:

    TRANSLATE :=
            (translate-character REG(table) REG(charset) REG(codepoint))
            | (translate-character SYMBOL REG(charset) REG(codepoint))
            ;; SYMBOL must refer to a table defined by `define-translation-table'.
    LOOKUP :=
            (lookup-character SYMBOL REG(charset) REG(codepoint))
            | (lookup-integer SYMBOL REG(integer))
            ;; SYMBOL refers to a table defined by `define-translation-hash-table'.
    MAP :=
         (iterate-multiple-map REG REG MAP-IDs)
         | (map-multiple REG REG (MAP-SET))
         | (map-single REG REG MAP-ID)
    MAP-IDs := MAP-ID ...
    MAP-SET := MAP-IDs | (MAP-IDs) MAP-SET
    MAP-ID := integer

It's not clear what is input and output for TRANSLATE and LOOKUP, the
operation itself can only be guessed by looking at the _data_ structures
given to `define-translation-hash-table' and `define-translation-table',
and there is absolutely no guessing what the MAP operations are.  And
looking at the code in src/ccl.c is creating rather more than less
confusion as the mapping stuff is really complex and only the mechanisms
(if at all) are documented rather than the purpose.

There is also nothing in the DOC string of `define-coding-system' or the
Elisp manual that would help in guessing what kind of options to choose
for what kind of task.

The purpose/definition of coding-type emacs-mule (particularly post
Emacs-23) is not given, or what the various options with coding-type
iso-2022 are.  It is not clear when one would use coding-type raw-text
and when utf-8 (and how does utf-8 relate to emacs-mule?).

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-23  9:25   ` David Kastrup
@ 2014-12-24 15:06     ` K. Handa
  2014-12-25  6:39       ` David Kastrup
  0 siblings, 1 reply; 33+ messages in thread
From: K. Handa @ 2014-12-24 15:06 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

In article <87h9wmd8ey.fsf@fencepost.gnu.org>, David Kastrup <dak@gnu.org> writes:
> > The other method is to use CCL (i.e :coding-type 'ccl), but,
> > if the combination of the charset decoding and translation
> > table works, it's faster than running CCL code.

> The translation table is not happy about translating things to nothing.
> Apparently that makes the calculation of the reverse translation go
> wrong.

Ah, perhaps make-translation-table-from-alist should have
optional argument to suppress building of reverse map, say
no-reverse-map.  Just for decoding, we don't need a reverse
map.

> If you have time to spare on that topic, I'd rather you spend it on
> putting some more info in the Elisp manual or at least the
> define-ccl-program and define-coding-system doc strings.

Ok, I'll work on it.


> And the following afterthought in define-ccl-program is quite opaque as
> well:

>     TRANSLATE :=
[...]
>     LOOKUP :=
[...]
>     MAP :=

They were introduced to support Unicode handling in Emacs 22
and prior (i.e. pre-Unicode emacsen).  I think we can get
rid of them (or at least make them more simple and usable).

By the way, long ago, I had a plan to convert restricted
Elisp code to CCL and hide those cryptic CCL programs from
users.  Unfortunately, I got busy with other work and
couldn't realize that plan.  :-(

> There is also nothing in the DOC string of `define-coding-system' or the
> Elisp manual that would help in guessing what kind of options to choose
> for what kind of task.

> The purpose/definition of coding-type emacs-mule (particularly post
> Emacs-23) is not given, or what the various options with coding-type
> iso-2022 are.  It is not clear when one would use coding-type raw-text
> and when utf-8 (and how does utf-8 relate to emacs-mule?).

I'll try to document them.

---
K.Handa
handa@gnu.org



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-24 15:06     ` K. Handa
@ 2014-12-25  6:39       ` David Kastrup
  2014-12-29 14:11         ` K. Handa
  0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-25  6:39 UTC (permalink / raw)
  To: K. Handa; +Cc: emacs-devel

handa@gnu.org (K. Handa) writes:

> In article <87h9wmd8ey.fsf@fencepost.gnu.org>, David Kastrup <dak@gnu.org> writes:
>> > The other method is to use CCL (i.e :coding-type 'ccl), but,
>> > if the combination of the charset decoding and translation
>> > table works, it's faster than running CCL code.
>
>> The translation table is not happy about translating things to nothing.
>> Apparently that makes the calculation of the reverse translation go
>> wrong.
>
> Ah, perhaps make-translation-table-from-alist should have
> optional argument to suppress building of reverse map, say
> no-reverse-map.  Just for decoding, we don't need a reverse
> map.

Whether we need it or not, empty productions should just be allowed and
ignored for reverse translation.  Of course, they are not compatible
with lossless conversions, but neither are many other rules.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-25  6:39       ` David Kastrup
@ 2014-12-29 14:11         ` K. Handa
  2014-12-29 14:25           ` David Kastrup
  0 siblings, 1 reply; 33+ messages in thread
From: K. Handa @ 2014-12-29 14:11 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

In article <877fxgb5ct.fsf@fencepost.gnu.org>, David Kastrup <dak@gnu.org> writes:

> Whether we need it or not, empty productions should just be allowed and
> ignored for reverse translation.  Of course, they are not compatible
> with lossless conversions, but neither are many other rules.

I agree with that change and just installed a patch.  I also
gave more information in the docstring of
define-coding-system.

---
K. Handa
handa@gnu.org



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Creating a coding system
  2014-12-29 14:11         ` K. Handa
@ 2014-12-29 14:25           ` David Kastrup
  0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-29 14:25 UTC (permalink / raw)
  To: K. Handa; +Cc: emacs-devel

handa@gnu.org (K. Handa) writes:

> In article <877fxgb5ct.fsf@fencepost.gnu.org>, David Kastrup
> <dak@gnu.org> writes:
>
>> Whether we need it or not, empty productions should just be allowed and
>> ignored for reverse translation.  Of course, they are not compatible
>> with lossless conversions, but neither are many other rules.
>
> I agree with that change and just installed a patch.  I also
> gave more information in the docstring of
> define-coding-system.

Thanks.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-12-29 14:25 UTC | newest]

Thread overview: 33+ messages
2014-12-20  9:05 Creating a coding system David Kastrup
2014-12-20 10:20 ` Thien-Thi Nguyen
2014-12-20 10:42   ` David Kastrup
2014-12-20 13:51 ` Andreas Schwab
2014-12-20 14:19   ` David Kastrup
2014-12-20 14:50     ` Eli Zaretskii
2014-12-20 15:56     ` Thien-Thi Nguyen
2014-12-20 16:11     ` Andreas Schwab
2014-12-20 16:14     ` Andreas Schwab
2014-12-20 16:43       ` David Kastrup
2014-12-20 16:53         ` Eli Zaretskii
2014-12-20 17:38           ` David Kastrup
2014-12-20 18:31             ` Stephen J. Turnbull
2014-12-20 18:40               ` David Kastrup
2014-12-20 18:35             ` Eli Zaretskii
2014-12-20 19:06               ` David Kastrup
2014-12-20 20:02                 ` Eli Zaretskii
2014-12-20 20:11                   ` David Kastrup
2014-12-20 20:45                     ` Eli Zaretskii
2014-12-20 21:15                       ` David Kastrup
2014-12-21 19:46                         ` David Kastrup
2014-12-21 21:25                       ` David Kastrup
2014-12-21  5:54                 ` Stephen J. Turnbull
2014-12-20 16:21     ` Stephen J. Turnbull
2014-12-20 16:52       ` David Kastrup
2014-12-20 18:34         ` Eli Zaretskii
2014-12-20 18:42           ` David Kastrup
2014-12-23  8:59 ` K. Handa
2014-12-23  9:25   ` David Kastrup
2014-12-24 15:06     ` K. Handa
2014-12-25  6:39       ` David Kastrup
2014-12-29 14:11         ` K. Handa
2014-12-29 14:25           ` David Kastrup
