* Re: Creating a coding system
2014-12-20 14:19 ` David Kastrup
@ 2014-12-20 14:50 ` Eli Zaretskii
2014-12-20 15:56 ` Thien-Thi Nguyen
` (3 subsequent siblings)
4 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 14:50 UTC (permalink / raw)
To: David Kastrup; +Cc: schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 15:19:18 +0100
> Cc: emacs-devel@gnu.org
>
> Does decoding not start from a byte stream but rather from an
> emacs-utf-8 encoded version of a byte stream?
If you decode a unibyte string or buffer, then the former; otherwise
the latter. IOW, you can decode either version, and both should work.
(Caveat: I don't really understand what you are trying to do -- my
fault, not yours -- so I hope I understood this question correctly.)
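
A minimal sketch of the unibyte case, for illustration only: the helper
name my-decode-demo is made up, and utf-8 merely stands in for whatever
coding system is being applied.  Decoding takes a unibyte string (raw
bytes) and yields a multibyte string (characters).

```elisp
(defun my-decode-demo ()
  "Decode two raw bytes as UTF-8 and report what changed.
Hypothetical helper, not part of Emacs."
  (let* ((bytes (unibyte-string #xC3 #xA9))  ; the UTF-8 bytes of U+00E9
         (text (decode-coding-string bytes 'utf-8)))
    (list (multibyte-string-p bytes)         ; still bytes before decoding
          (length bytes)
          (multibyte-string-p text)          ; characters after decoding
          (length text))))
```

Evaluating (my-decode-demo) shows the length dropping from two bytes to
one character.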
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 14:19 ` David Kastrup
2014-12-20 14:50 ` Eli Zaretskii
@ 2014-12-20 15:56 ` Thien-Thi Nguyen
2014-12-20 16:11 ` Andreas Schwab
` (2 subsequent siblings)
4 siblings, 0 replies; 33+ messages in thread
From: Thien-Thi Nguyen @ 2014-12-20 15:56 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 434 bytes --]
() David Kastrup <dak@gnu.org>
() Sat, 20 Dec 2014 15:19:18 +0100
I am missing the big picture here in some manner. Does
decoding not start from a byte stream but rather from an
emacs-utf-8 encoded version of a byte stream? That does not
seem to make sense to me.
Have you tried specifying ‘:coding-type raw-text’? I see in
src/coding.c line 5339 (from a mid-October pre-Git tree) the
nice number 1:
[-- Attachment #1.2: nice-number-1.c --]
[-- Type: text/x-csrc, Size: 131 bytes --]
static void
decode_coding_raw_text (struct coding_system *coding)
{
  bool eol_dos = [...]
  coding->chars_at_source = 1;
  ...
}
[-- Attachment #1.3: Type: text/plain, Size: 736 bytes --]
It is the only instance in src/*.c of that literal value being
assigned to that struct member. A truly singular hint? :-D
I imagine that aside from ‘raw-text’, decoding is multi-layered
due to the presence of ‘undecided’ (which requires guesswork,
heuristics, and (maybe) backtracking) and that the design tries
to move data from bytes to characters as soon as possible, to
reduce downstream complexity and for cohesion w/ the rest of
Emacs. But, that's merely ignorant speculation...
--
Thien-Thi Nguyen
GPG key: 4C807502
(if you're human and you know it)
read my lisp: (responsep (questions 'technical)
(not (via 'mailing-list)))
=> nil
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 14:19 ` David Kastrup
2014-12-20 14:50 ` Eli Zaretskii
2014-12-20 15:56 ` Thien-Thi Nguyen
@ 2014-12-20 16:11 ` Andreas Schwab
2014-12-20 16:14 ` Andreas Schwab
2014-12-20 16:21 ` Stephen J. Turnbull
4 siblings, 0 replies; 33+ messages in thread
From: Andreas Schwab @ 2014-12-20 16:11 UTC (permalink / raw)
To: David Kastrup; +Cc: emacs-devel
David Kastrup <dak@gnu.org> writes:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> David Kastrup <dak@gnu.org> writes:
>>
>>> Ok, what am I doing wrong here? Why does decode-coding-string not do
>>> anything here?
>>>
>>> (define-translation-table 'midi-decode-table
>>> (make-translation-table-from-alist
>>> (mapcar
>>> (lambda (p)
>>> (cons (car p) (string-to-vector (cdr p))))
>>> '(([144 0] . "c,,,,")
>>
>> The translation table operates on the decoded contents.
>
> Why would something be decoded twice in a row?
During decoding, the translation table's translations are applied to
the characters that result from ordinary decoding. If a coding system
During encoding, the translation table's translations are applied to
the characters in the buffer, and the result of translation is actually
encoded. If a coding system has property `:encode-translation-table',
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 14:19 ` David Kastrup
` (2 preceding siblings ...)
2014-12-20 16:11 ` Andreas Schwab
@ 2014-12-20 16:14 ` Andreas Schwab
2014-12-20 16:43 ` David Kastrup
2014-12-20 16:21 ` Stephen J. Turnbull
4 siblings, 1 reply; 33+ messages in thread
From: Andreas Schwab @ 2014-12-20 16:14 UTC (permalink / raw)
To: David Kastrup; +Cc: emacs-devel
David Kastrup <dak@gnu.org> writes:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> David Kastrup <dak@gnu.org> writes:
>>
>>> Ok, what am I doing wrong here? Why does decode-coding-string not do
>>> anything here?
>>>
>>> (define-translation-table 'midi-decode-table
>>> (make-translation-table-from-alist
>>> (mapcar
>>> (lambda (p)
>>> (cons (car p) (string-to-vector (cdr p))))
>>> '(([144 0] . "c,,,,")
>>
>> The translation table operates on the decoded contents.
>
> Why would something be decoded twice in a row?
You don't know the characters before decoding.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 16:14 ` Andreas Schwab
@ 2014-12-20 16:43 ` David Kastrup
2014-12-20 16:53 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 16:43 UTC (permalink / raw)
To: Andreas Schwab; +Cc: emacs-devel
Andreas Schwab <schwab@linux-m68k.org> writes:
> David Kastrup <dak@gnu.org> writes:
>
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>
>>> David Kastrup <dak@gnu.org> writes:
>>>
>>>> Ok, what am I doing wrong here? Why does decode-coding-string not do
>>>> anything here?
>>>>
>>>> (define-translation-table 'midi-decode-table
>>>> (make-translation-table-from-alist
>>>> (mapcar
>>>> (lambda (p)
>>>> (cons (car p) (string-to-vector (cdr p))))
>>>> '(([144 0] . "c,,,,")
>>>
>>> The translation table operates on the decoded contents.
>>
>> Why would something be decoded twice in a row?
>
> You don't know the characters before decoding.
That's an "Andreas" answer: an insider joke comprehensible only once you
already know the answer. It's not even a hint.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 16:43 ` David Kastrup
@ 2014-12-20 16:53 ` Eli Zaretskii
2014-12-20 17:38 ` David Kastrup
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 16:53 UTC (permalink / raw)
To: David Kastrup; +Cc: schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 17:43:40 +0100
> Cc: emacs-devel@gnu.org
>
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
> >>> The translation table operates on the decoded contents.
> >>
> >> Why would something be decoded twice in a row?
> >
> > You don't know the characters before decoding.
>
> That's an "Andreas" answer: an insider joke comprehensible only once you
> already know the answer. It's not even a hint.
:-)
Let me try being a bit more helpful. The translation table is a
char-table that converts one character into another. Since these
tables convert _characters_, you need to have a character before you
apply the table. To have a character, you need to decode the byte
stream into characters.
You can find more details on this in the node "Translation of
Characters" in the ELisp manual.
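
A small sketch of that point, assuming a made-up table and helper
(my-demo-table and my-translate-demo are illustration names only): the
table maps characters to characters, so it can only be applied to text
that already consists of characters, such as buffer contents.

```elisp
(defvar my-demo-table
  (make-translation-table-from-alist '((?a . ?b)))
  "Hypothetical translation table mapping the character a to b.")

(defun my-translate-demo (string)
  "Return STRING with `my-demo-table' applied to each character."
  (with-temp-buffer
    (insert string)
    ;; translate-region accepts a translation table (a char-table).
    (translate-region (point-min) (point-max) my-demo-table)
    (buffer-string)))
```

Characters without an entry in the table are left unchanged.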
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 16:53 ` Eli Zaretskii
@ 2014-12-20 17:38 ` David Kastrup
2014-12-20 18:31 ` Stephen J. Turnbull
2014-12-20 18:35 ` Eli Zaretskii
0 siblings, 2 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 17:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>> Date: Sat, 20 Dec 2014 17:43:40 +0100
>> Cc: emacs-devel@gnu.org
>>
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>
>> >>> The translation table operates on the decoded contents.
>> >>
>> >> Why would something be decoded twice in a row?
>> >
>> > You don't know the characters before decoding.
>>
>> That's an "Andreas" answer: an insider joke comprehensible only once you
>> already know the answer. It's not even a hint.
>
> :-)
>
> Let me try being a bit more helpful. The translation table is a
> char-table that converts one character into another. Since these
> tables convert _characters_, you need to have a character before you
> apply the table.
Ok, so the table is not actually a "decoding-table" but rather a table
applied after decoding. So basically it provides the same sort of
functionality/level as
    `:post-read-conversion'

    VALUE must be a function to call after some text is inserted and
    decoded by the coding system itself and before any functions in
    `after-insert-functions' are called.  This function is passed one
    argument; the number of characters in the text to convert, with
    point at the start of the text.  The function should leave point
    the same, and return the new character count.
does, but employing a different mechanism.
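
For comparison, a function meeting the contract quoted from that
docstring might look like the sketch below.  The name
my-post-read-upcase and the choice of upcasing are assumptions made
purely for illustration.

```elisp
(defun my-post-read-upcase (len)
  "Upcase the LEN characters following point.
Leave point where it was and return the new character count, as the
`:post-read-conversion' contract asks.  Hypothetical example."
  (let ((start (point)))
    (save-excursion
      (upcase-region start (+ start len)))
    ;; Upcasing here does not change the number of characters.
    len))
```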
> To have a character, you need to decode the byte stream into
> characters.
>
> You can find more details on this in the node "Translation of
> Characters" in the ELisp manual.
Huh. I'll look again, but I _did_ look in the manual previously. Maybe
a coding system is not even the right way to do this, but at least it
seems like a builtin mechanism and consequently fast. But at least the
decoding table does not appear to be able to _drop_ unwanted bytes: if
my TO vector is empty for any sequence, the table construction fails,
apparently when constructing the (unneeded) reverse table.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 17:38 ` David Kastrup
@ 2014-12-20 18:31 ` Stephen J. Turnbull
2014-12-20 18:40 ` David Kastrup
2014-12-20 18:35 ` Eli Zaretskii
1 sibling, 1 reply; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-20 18:31 UTC (permalink / raw)
To: David Kastrup; +Cc: Eli Zaretskii, schwab, emacs-devel
David Kastrup writes:
> Maybe a coding system is not even the right way to do this, but at
> least it seems like a builtin mechanism and consequently fast.
I think a coding system is the right way to do it, but you probably
need to construct the coding system a different way. You might need
to write a little CCL (Aidan Kehoe can probably help if Handa-san is
unavailable). There's some CCL documentation in the XEmacs manuals,
but it's probably not much more than a Texinfoization of the comments
in the files defining CCL.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 18:31 ` Stephen J. Turnbull
@ 2014-12-20 18:40 ` David Kastrup
0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 18:40 UTC (permalink / raw)
To: Stephen J. Turnbull; +Cc: Eli Zaretskii, schwab, emacs-devel
"Stephen J. Turnbull" <stephen@xemacs.org> writes:
> David Kastrup writes:
>
> > Maybe a coding system is not even the right way to do this, but at
> > least it seems like a builtin mechanism and consequently fast.
>
> I think a coding system is the right way to do it, but you probably
> need to construct the coding system a different way. You might need
> to write a little CCL (Aidan Kehoe can probably help if Handa-san is
> unavailable). There's some CCL documentation in the XEmacs manuals,
> but it's probably not much more than a Texinfoization of the comments
> in the files defining CCL.
Oh wow. I thought CCL was Emacs-only, and it seemed like it was
deprecated anyway. But CCL seems better suited for converting Midi in a
more complete manner, particularly where numeric parameters are
concerned.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 17:38 ` David Kastrup
2014-12-20 18:31 ` Stephen J. Turnbull
@ 2014-12-20 18:35 ` Eli Zaretskii
2014-12-20 19:06 ` David Kastrup
1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 18:35 UTC (permalink / raw)
To: David Kastrup; +Cc: schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 18:38:23 +0100
>
> Huh. I'll look again, but I _did_ look in the manual previously. Maybe
> a coding system is not even the right way to do this
Well, as I wrote elsewhere, I don't really understand what "this" is,
so please consider elaborating.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 18:35 ` Eli Zaretskii
@ 2014-12-20 19:06 ` David Kastrup
2014-12-20 20:02 ` Eli Zaretskii
2014-12-21 5:54 ` Stephen J. Turnbull
0 siblings, 2 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 19:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 18:38:23 +0100
>>
>> Huh. I'll look again, but I _did_ look in the manual previously. Maybe
>> a coding system is not even the right way to do this
>
> Well, as I wrote elsewhere, I don't really understand what "this" is,
> so please consider elaborating.
Midi is a real-time byte-oriented protocol conveying musical data and is
primarily used for hooking up Midi-code generating keyboards with
Midi-playing devices (originally, a serial protocol at something like
31250 Baud over galvanically separated current-loop).
<URL:http://www.gweep.net/~prefect/eng/reference/protocol/midispec.html>
contains some useful information. The raw midi messages, as read from a
midi device, contain information like "key-on on channel #9 with pitch
#83 and velocity #112". Command bytes are #128 to #255, data bytes are
#0 to #127 and most commands come with a fixed amount of data anyway.
So resynchronization for lost bytes is usually immediate.
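
To make the byte layout concrete, here is a rough Emacs Lisp sketch.
The function names are invented for illustration.  It assumes the
usual wire layout in which a note-on status byte has the high nibble
#x9 and carries the channel in the low nibble, counted from 0; whether
"channel #9" above is counted from 0 or from 1 is left open.

```elisp
(defun my-midi-status-byte-p (byte)
  "Return non-nil if BYTE is a command byte, i.e. in the range 128..255."
  (>= byte 128))

(defun my-midi-parse-note-on (bytes)
  "Parse BYTES, a list (STATUS PITCH VELOCITY), as a note-on message.
Return a plist, or nil if BYTES is not a well-formed note-on.
Hypothetical helper, not an existing Emacs function."
  (let ((status (nth 0 bytes))
        (pitch (nth 1 bytes))
        (velocity (nth 2 bytes)))
    (when (and status pitch velocity
               (= (logand status #xF0) #x90)          ; note-on command
               (not (my-midi-status-byte-p pitch))    ; data bytes are 0..127
               (not (my-midi-status-byte-p velocity)))
      (list :channel (logand status #x0F)
            :pitch pitch
            :velocity velocity))))
```

A command byte showing up where a data byte is expected makes the
parse fail, which is the resynchronization property described above.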
There are also Midi _files_: those contain additional data, most
particularly an encoding of the _timing_ of events (because as opposed
to ordinary written text, the timing of events is crucial for Midi).
Those are, more or less, the preferred storage format for "Midi
sequencers" which are able to produce or reproduce a timed stream of raw
Midi events.
There is some age-old application called "rumor" that converts Midi
messages into input for the music typesetter LilyPond, and rumor can be
hooked up with some Emacs mode. All that is pretty arcane and not all
that well-supported.
Since Emacs can use make-serial-process for opening a raw Midi port, the
idea was to bypass the additional executable and create an application
useful for directly interfacing with Midi keyboards so that one can at
least key in pitches faster than by using a computer keyboard.
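
Roughly, and only as a sketch: the names my-midi-bytes, my-midi-filter
and my-midi-open are invented, the device file name is up to the
caller, and passing :speed nil (leave the port unconfigured) is an
assumption about what a raw Midi device needs.

```elisp
(defvar my-midi-bytes nil
  "Bytes received so far, most recent first (hypothetical demo state).")

(defun my-midi-filter (_process string)
  "Process filter: push each byte of the unibyte STRING onto `my-midi-bytes'."
  (dolist (byte (append string nil))
    (push byte my-midi-bytes)))

(defun my-midi-open (port)
  "Open PORT, a device file name, as a raw byte stream.
Incoming bytes are handed to `my-midi-filter'."
  (make-serial-process
   :name "my-midi"
   :port port
   :speed nil            ; nil: do not configure the port
   :coding 'binary       ; no decoding, no EOL conversion
   :noquery t
   :filter #'my-midi-filter))
```

With :coding 'binary the filter sees the bytes exactly as they arrive,
so any interpretation of commands and data is left to Lisp.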
Part of the reason this may be faster than a computer keyboard is that
the typical musician looks at a score, employs some cognitive
processing, names the resulting pitch, translates it into LilyPond's
representation of a pitch (see
<URL:http://www.lilypond.org/doc/v2.19/Documentation/notation/writing-pitches>)
and then types it in using multiple keypresses.
With a Midi keyboard, typing the pitch entails looking at the score and
typing a single key (or a whole sequence of keys) using pretrained
visual/motoric connections.
Now the first iteration would be just using the pitches from a Midi
keyboard. The second iteration would then try working with durations.
Either by putting time stamps on the events as they trundle in, or by
reading from a "cooked" Midi stream already containing encoded time
stamps.
Conversion of time stamps to musical durations, particularly when the
time stamps have not been generated by a notation program but by a human
player, is non-trivial and would hopefully end up as an incremental
process where a human corrects durations and Emacs improves its overall
guesswork.
But time stamps are for another time. Right now I'd be glad just to
convert the pitches from the keyboard into LilyPond's pitch notation in
order to make key entry faster.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 19:06 ` David Kastrup
@ 2014-12-20 20:02 ` Eli Zaretskii
2014-12-20 20:11 ` David Kastrup
2014-12-21 5:54 ` Stephen J. Turnbull
1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 20:02 UTC (permalink / raw)
To: David Kastrup; +Cc: schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 20:06:45 +0100
>
> Now the first iteration would be just using the pitches from a Midi
> keyboard. The second iteration would then try working with durations.
> Either by putting time stamps on the events as they trundle in, or by
> reading from a "cooked" Midi stream already containing encoded time
> stamps.
>
> Conversion of time stamps to musical durations, particularly when the
> time stamps have not been generated by a notation program but by a human
> player, is non-trivial and would hopefully end up as an incremental
> process where a human corrects durations and Emacs improves its overall
> guesswork.
>
> But time stamps are for another time. Right now I'd be glad just to
> convert the pitches from the keyboard into LilyPond's pitch notation in
> order to make key entry faster.
I might be mistaken, but this doesn't look to me like a job for a
coding-system. You are talking about parsing input into some abstract
notation, then generating a representation of that input in a
different language. This is sufficiently different from converting
characters from one encoding to another that you should perhaps look
at cedet/semantic/ stuff instead.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 20:02 ` Eli Zaretskii
@ 2014-12-20 20:11 ` David Kastrup
2014-12-20 20:45 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 20:11 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 20:06:45 +0100
>>
>> Now the first iteration would be just using the pitches from a Midi
>> keyboard. The second iteration would then try working with durations.
>> Either by putting time stamps on the events as they trundle in, or by
>> reading from a "cooked" Midi stream already containing encoded time
>> stamps.
>>
>> Conversion of time stamps to musical durations, particularly when the
>> time stamps have not been generated by a notation program but by a human
>> player, is non-trivial and would hopefully end up as an incremental
>> process where a human corrects durations and Emacs improves its overall
>> guesswork.
>>
>> But time stamps are for another time. Right now I'd be glad just to
>> convert the pitches from the keyboard into LilyPond's pitch notation in
>> order to make key entry faster.
>
> I might be mistaken, but this doesn't look to me like a job for a
> coding-system. You are talking about parsing input into some abstract
> notation,
"parsing input" is sort of bombastic for interpreting a binary
representation consisting of isolated minimal words.
> then generating a representation of that input in a different
> language. This is sufficiently different from converting characters
> from one encoding to another that you should perhaps look at
> cedet/semantic/ stuff instead.
Uh, there is no grammar involved here, no context, most certainly not a
push-down stack or something. There is nothing that would warrant using
a parser. And we are talking about real-time processing needs,
basically an input method using a different device than the keyboard.
cedet/semantic seem like much more of a mismatch, both in functionality
and response time, than using a process encoding here.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 20:11 ` David Kastrup
@ 2014-12-20 20:45 ` Eli Zaretskii
2014-12-20 21:15 ` David Kastrup
2014-12-21 21:25 ` David Kastrup
0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 20:45 UTC (permalink / raw)
To: David Kastrup; +Cc: schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
> Date: Sat, 20 Dec 2014 21:11:49 +0100
>
> > I might be mistaken, but this doesn't look to me like a job for a
> > coding-system. You are talking about parsing input into some abstract
> > notation,
>
> "parsing input" is sort of bombastic for interpreting a binary
> representation consisting of isolated minimal words.
Yes, but coding-systems machinery is not a general-purpose bytestream
conversion facility. It was designed and implemented specifically for
converting between known families of encodings. You might be able to
tweak it enough to do what you want, eventually, but it doesn't look
like a piece of cake to me. Programming in CCL is like writing
assembly code in a restricted machine language, hardly something well
suited to converting one complex bytestream into another.
> > then generating a representation of that input in a different
> > language. This is sufficiently different from converting characters
> > from one encoding to another that you should perhaps look at
> > cedet/semantic/ stuff instead.
>
> Uh, there is no grammar involved here, no context, most certainly not a
> push-down stack or something.
But there's definitely some kind of "lexing", no? You are talking
about sequences of symbols, not about letters from some alphabet. If
you try representing each sequence as an encoding of a letter, won't
you get an enormously large alphabet?
Then again, I might be dead wrong.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 20:45 ` Eli Zaretskii
@ 2014-12-20 21:15 ` David Kastrup
2014-12-21 19:46 ` David Kastrup
2014-12-21 21:25 ` David Kastrup
1 sibling, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 21:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>>
>> > I might be mistaken, but this doesn't look to me like a job for a
>> > coding-system. You are talking about parsing input into some abstract
>> > notation,
>>
>> "parsing input" is sort of bombastic for interpreting a binary
>> representation consisting of isolated minimal words.
>
> Yes, but coding-systems machinery is not a general-purpose bytestream
> conversion facility. It was designed and implemented specifically for
> converting between known families of encodings. You might be able to
> tweak it enough to do what you want, eventually, but it doesn't look
> like a piece of cake to me. Programming in CCL is like writing
> assembly code in a restricted machine language, hardly something well
> suited to converting one complex bytestream into another.
Uh, CCL is _exactly_ suited to converting one complex bytestream into
another. It's overkill for converting regular character set to other
regular character sets which is probably the reason it is phased out.
But for this task it seems a reasonable match.
>> Uh, there is no grammar involved here, no context, most certainly not
>> a push-down stack or something.
>
> But there's definitely some kind of "lexing", no?
No.
> You are talking about sequences of symbols, not about letters from
> some alphabet.
No, Midi contains nothing like symbols. Just codes with byte or word
sized parameters. Converting the codes would be straightforward, but
converting the parameters as well would make the tables too large. CCL
looks like it can come to the rescue for producing Lisp expressions with
the full parameters for _one_ approach.
> If you try representing each sequence as an encoding of a letter,
> won't you get an enormously large alphabet?
Which is exactly why CCL, which can do calculations like divide by 10
with remainder, will be able to save a lot of space if one wants to
arrive at decimal constants in a human-readable rendering of the
parameters.
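
The arithmetic in question, written in plain Emacs Lisp rather than CCL
as a sketch (my-decimal-digits is a made-up name): repeated division by
10 with remainder yields the decimal digits of a parameter without any
table of precomputed strings.

```elisp
(defun my-decimal-digits (n)
  "Return the decimal digits of the non-negative integer N as a string.
Uses only integer division and remainder.  Hypothetical helper."
  (let ((digits nil))
    (while (progn
             (push (+ ?0 (% n 10)) digits)   ; least significant digit first
             (setq n (/ n 10))
             (> n 0)))
    ;; Pushing reversed the order, so the list is now most significant first.
    (apply #'string digits)))
```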
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 21:15 ` David Kastrup
@ 2014-12-21 19:46 ` David Kastrup
0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-21 19:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
David Kastrup <dak@gnu.org> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: David Kastrup <dak@gnu.org>
>>> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>>>
>>> > I might be mistaken, but this doesn't look to me like a job for a
>>> > coding-system. You are talking about parsing input into some abstract
>>> > notation,
>>>
>>> "parsing input" is sort of bombastic for interpreting a binary
>>> representation consisting of isolated minimal words.
>>
>> Yes, but coding-systems machinery is not a general-purpose bytestream
>> conversion facility. It was designed and implemented specifically for
>> converting between known families of encodings. You might be able to
>> tweak it enough to do what you want, eventually, but it doesn't look
>> like a piece of cake to me. Programming in CCL is like writing
>> assembly code in a restricted machine language, hardly something well
>> suited to converting one complex bytestream into another.
>
> Uh, CCL is _exactly_ suited to converting one complex bytestream into
> another. It's overkill for converting regular character set to other
> regular character sets which is probably the reason it is phased out.
> But for this task it seems a reasonable match.
>
>>> Uh, there is no grammar involved here, no context, most certainly not
>>> a push-down stack or something.
>>
>> But there's definitely some kind of "lexing", no?
>
> No.
>
>> You are talking about sequences of symbols, not about letters from
>> some alphabet.
>
> No, Midi contains nothing like symbols. Just codes with byte or word
> sized parameters. Converting the codes would be straightforward, but
> converting the parameters as well would make the tables too large. CCL
> looks like it can come to the rescue for producing Lisp expressions with
> the full parameters for _one_ approach.
>
>> If you try representing each sequence as an encoding of a letter,
>> won't you get an enormously large alphabet?
>
> Which is exactly why CCL, which can do calculations like divide by 10
> with remainder, will be able to save a lot of space if one wants to
> arrive at decimal constants in a human-readable rendering of the
> parameters.
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 20:45 ` Eli Zaretskii
2014-12-20 21:15 ` David Kastrup
@ 2014-12-21 21:25 ` David Kastrup
1 sibling, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-21 21:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>> Cc: schwab@linux-m68k.org, emacs-devel@gnu.org
>> Date: Sat, 20 Dec 2014 21:11:49 +0100
>>
>> > I might be mistaken, but this doesn't look to me like a job for a
>> > coding-system. You are talking about parsing input into some abstract
>> > notation,
>>
>> "parsing input" is sort of bombastic for interpreting a binary
>> representation consisting of isolated minimal words.
>
> Yes, but coding-systems machinery is not a general-purpose bytestream
> conversion facility. It was designed and implemented specifically for
> converting between known families of encodings. You might be able to
> tweak it enough to do what you want, eventually, but it doesn't look
> like a piece of cake to me. Programming in CCL is like writing
> assembly code in a restricted machine language, hardly something well
> suited to converting one complex bytestream into another.
>
>> > then generating a representation of that input in a different
>> > language. This is sufficiently different from converting characters
>> > from one encoding to another that you should perhaps look at
>> > cedet/semantic/ stuff instead.
>>
>> Uh, there is no grammar involved here, no context, most certainly not a
>> push-down stack or something.
>
> But there's definitely some kind of "lexing", no? You are talking
> about sequences of symbols, not about letters from some alphabet. If
> you try representing each sequence as an encoding of a letter, won't
> you get an enormously large alphabet?
>
> Then again, I might be dead wrong.
(define-ccl-program midi-notenames
  '(9
    ((read r0)
     ;; Status bytes 144..159 (#x90..#x9F) are note-on commands.
     (if ((r0 & 240) == 144)
         ((read r0)                     ; the pitch byte
          ;; r0 becomes pitch / 12, r7 receives pitch % 12.
          (r0 //= 12)
          (branch r7 "c" "des" "d" "es" "e" "f" "fis" "g" "as" "a" "bes" "b")
          (branch r0 ",,,," ",,," ",," "," " " "'" "''" "'''" "''''" "'''''" "''''''"))))))

(define-ccl-program no-conversion
  '(1 ((read r0) (write r0))))

(define-coding-system 'midi-notenames
  "This converts Midi note-on events to note names"
  :mnemonic ?M
  :coding-type 'ccl
  :charset-list '(iso-8859-1)
  :eol-type 'unix
  :ccl-decoder 'midi-notenames
  :ccl-encoder 'no-conversion)
This actually seems to do the trick for the original stuff I'd been
wrestling with. I still have to figure out how to parameterize stuff
without redefining CCL programs all the time.
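
For reference, the same pitch arithmetic in plain Emacs Lisp, as a
sketch that mirrors the two branch tables of the CCL program (the names
my-note-names, my-octave-marks and my-midi-pitch-to-name are invented
here):

```elisp
(defconst my-note-names
  ["c" "des" "d" "es" "e" "f" "fis" "g" "as" "a" "bes" "b"]
  "Note names indexed by pitch modulo 12.")

(defconst my-octave-marks
  [",,,," ",,," ",," "," " " "'" "''" "'''" "''''" "'''''" "''''''"]
  "Octave marks indexed by pitch divided by 12.")

(defun my-midi-pitch-to-name (pitch)
  "Return the note name for the Midi PITCH, an integer in 0..127.
Hypothetical helper mirroring the CCL branch tables."
  (concat (aref my-note-names (% pitch 12))
          (aref my-octave-marks (/ pitch 12))))
```

Pitch 60 comes out as c', which is middle C in LilyPond's absolute
pitch notation.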
--
David Kastrup
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 19:06 ` David Kastrup
2014-12-20 20:02 ` Eli Zaretskii
@ 2014-12-21 5:54 ` Stephen J. Turnbull
1 sibling, 0 replies; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-21 5:54 UTC (permalink / raw)
To: David Kastrup; +Cc: Eli Zaretskii, schwab, emacs-devel
David Kastrup writes:
> Since Emacs can use make-serial-process for opening a raw Midi port, the
> idea was to bypass the additional executable and create an application
> useful for directly interfacing with Midi keyboards so that one can at
> least key in pitches faster than by using a computer keyboard.
This is a great application, one that shows the power of Emacs in a
multidimensional way. Hats off to you once again, David!
> The second iteration would then try working with durations. Either
> by putting time stamps on the events as they trundle in,
This might need to be done at a lower level than coding systems, in
the event processing code itself. But I imagine it's quite feasible.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Creating a coding system
2014-12-20 14:19 ` David Kastrup
` (3 preceding siblings ...)
2014-12-20 16:14 ` Andreas Schwab
@ 2014-12-20 16:21 ` Stephen J. Turnbull
2014-12-20 16:52 ` David Kastrup
4 siblings, 1 reply; 33+ messages in thread
From: Stephen J. Turnbull @ 2014-12-20 16:21 UTC (permalink / raw)
To: David Kastrup; +Cc: Andreas Schwab, emacs-devel
David Kastrup writes:
> > The translation table operates on the decoded contents.
>
> Why would something be decoded twice in a row?
Who can give a complete description? I can't, but sometimes it's
useful, just as sometimes tr(1) is useful.
> I am missing the big picture here in some manner. Does decoding not
> start from a byte stream but rather from an emacs-utf-8 encoded version
> of a byte stream?
Either (see Eli's reply). But the translation table is an add-on, not
part of the bytes-to-character coding system itself. XEmacs's coding
system stuff is quite different (don't start, I'm trying to be helpful
in lieu of a real expert), so I can't be sure, but your definition
  (define-coding-system 'midi
    "This converts Midi note-on events to note names"
    :mnemonic ?M
    :coding-type 'charset
    :eol-type 'unix
    :decode-translation-table 'midi-decode-table
    :mime-text-unsuitable t)
doesn't match the idioms used throughout the rest of the :coding-type
'charset coding systems. Specifically, I suspect the :charset-list
property is required for anything useful to happen. Also, I suspect
:coding-type 'charset requires a unibyte charset. I'm not sure it can
be made to work for you, as your "characters" seem to be multibyte.
Try adding :charset-list '(latin-1) (or maybe '(raw-text)) to your
definition. That should just translate bytes to internal
representation of the characters with codes 0-255, and then your
translation table should DTRT. If that doesn't do the trick, you'll
have to ask Ken Handa most likely.
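To illustrate the "bytes to characters with codes 0-255" step, here is
a sketch that decodes a unibyte string with the stock iso-latin-1-unix
coding system and then does the note lookup in plain Lisp instead of a
translation table. The helper names and the pitch spellings are
assumptions, and running status is deliberately ignored.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helper names; not part of Emacs or of the original post.

(defconst my-midi-pitch-names
  ["c" "cis" "d" "dis" "e" "f" "fis" "g" "gis" "a" "ais" "b"]
  "Assumed spelling of the twelve pitch classes.")

(defun my-midi-note-ons (bytes)
  "Return names of the notes switched on in unibyte string BYTES.
Only complete three-byte note-on messages are recognized; running
status and every other message type are skipped."
  (let ((chars (string-to-list
                ;; Each byte N becomes the character with code N.
                (decode-coding-string bytes 'iso-latin-1-unix)))
        (names nil))
    (while chars
      (let ((status (car chars))
            (note (nth 1 chars))
            (velocity (nth 2 chars)))
        (if (and note velocity
                 (= (logand status #xf0) #x90) ; note-on, any channel
                 (< note #x80)
                 (< velocity #x80)
                 (> velocity 0))               ; velocity 0 acts as note-off
            (progn
              (push (aref my-midi-pitch-names (% note 12)) names)
              (setq chars (nthcdr 3 chars)))
          (setq chars (cdr chars)))))
    (nreverse names)))
```

Doing the lookup in Lisp sidesteps the open question of whether a
translation table can match multi-byte events at all.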
* Re: Creating a coding system
2014-12-20 16:21 ` Stephen J. Turnbull
@ 2014-12-20 16:52 ` David Kastrup
2014-12-20 18:34 ` Eli Zaretskii
0 siblings, 1 reply; 33+ messages in thread
From: David Kastrup @ 2014-12-20 16:52 UTC (permalink / raw)
To: Stephen J. Turnbull; +Cc: Andreas Schwab, emacs-devel
"Stephen J. Turnbull" <stephen@xemacs.org> writes:
> David Kastrup writes:
>
> > > The translation table operates on the decoded contents.
> >
> > Why would something be decoded twice in a row?
>
> Who can give a complete description? I can't, but sometimes it's
> useful, just as sometimes tr(1) is useful.
>
> > I am missing the big picture here in some manner. Does decoding not
> > start from a byte stream but rather from an emacs-utf-8 encoded version
> > of a byte stream?
>
> Either (see Eli's reply). But the translation table is an add-on, not
> part of the bytes-to-character coding system itself.
Huh. Then maybe I am not doing this right. I want to use a "process
coding system" to efficiently sift through Midi data and leave some
useful form (a more versatile form would likely be the Lisp reader
equivalent of MidiXML).
> XEmacs's coding system stuff is quite different (don't start, I'm
> trying to be helpful in lieu of a real expert), so I can't be sure,
> but your definition
>
> (define-coding-system 'midi
> "This converts Midi note-on events to note names"
> :mnemonic ?M
> :coding-type 'charset
> :eol-type 'unix
> :decode-translation-table 'midi-decode-table
> :mime-text-unsuitable t)
>
> doesn't match the idioms used throughout the rest of the :coding-type
> 'charset coding systems.
I'd have no idea. The DOC string for define-coding-system lists the
various possibilities, but does not bother mentioning the actual
_meaning_ of the setting.
> Specifically, I suspect the :charset-list property is required for
> anything useful to happen.
Again, the DOC string for define-coding-system does not bother
mentioning what this is supposed to be for. And the Elisp manual only
bothers with using coding systems rather than defining them.
> Also, I suspect :coding-type 'charset requires a unibyte charset.
I have no idea.
> I'm not sure it can be made to work for you, as your "characters" seem
> to be multibyte.
I have no idea. The Midi device I open with make-serial-process
obviously delivers bytes.
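For reference, a sketch of opening such a device so that the filter
receives the bytes unconverted. The function name my-midi-open is
hypothetical, and passing nil for :speed assumes the device needs no
serial line configuration.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical wrapper; the device file name is supplied by the caller.

(defun my-midi-open (port filter)
  "Open the raw MIDI device PORT and pass incoming bytes to FILTER.
PORT is a device file name; FILTER is a process filter function."
  (make-serial-process
   :port port
   :speed nil       ; assumption: leave the line unconfigured
   :coding 'binary  ; no character or end-of-line conversion
   :noquery t
   :filter filter))
```

With :coding 'binary the strings handed to FILTER are unibyte, so
aref on them returns the byte values 0-255 directly.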
> Try adding :charset-list '(latin-1) (or maybe '(raw-text)) to your
> definition. That should just translate bytes to internal
> representation of the characters with codes 0-255, and then your
> translation table should DTRT. If that doesn't do the trick, you'll
> have to ask Ken Handa most likely.
It would seem that Handa-san is currently the only documentation for the
coding system details Emacs has. Some more redundancy might be a good
idea.
--
David Kastrup
* Re: Creating a coding system
2014-12-20 16:52 ` David Kastrup
@ 2014-12-20 18:34 ` Eli Zaretskii
2014-12-20 18:42 ` David Kastrup
0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-12-20 18:34 UTC (permalink / raw)
To: David Kastrup; +Cc: stephen, schwab, emacs-devel
> From: David Kastrup <dak@gnu.org>
> Date: Sat, 20 Dec 2014 17:52:29 +0100
> Cc: Andreas Schwab <schwab@linux-m68k.org>, emacs-devel@gnu.org
>
> > > I am missing the big picture here in some manner. Does decoding not
> > > start from a byte stream but rather from an emacs-utf-8 encoded version
> > > of a byte stream?
> >
> > Either (see Eli's reply). But the translation table is an add-on, not
> > part of the bytes-to-character coding system itself.
>
> Huh. Then maybe I am not doing this right. I want to use a "process
> coding system" to efficiently sift through Midi data and leave some
> useful form (a more versatile form would likely be the Lisp reader
> equivalent of MidiXML).
May I suggest that you describe your problem in more detail? I couldn't
glean it from the snippet that started this thread. (I know nothing
about Midi.) Maybe if we knew what it is that you are trying to
accomplish with your coding system, we could be of more assistance.
> It would seem that Handa-san is currently the only documentation for the
> coding system details Emacs has. Some more redundancy might be a good
> idea.
We already have that:
32.10 Coding Systems
====================
When Emacs reads or writes a file, and when Emacs sends text to a
subprocess or receives text from a subprocess, it normally performs
character code conversion and end-of-line conversion as specified by a
particular "coding system".
How to define a coding system is an arcane matter, and is not
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
documented here.
^^^^^^^^^^^^^^^^
* Re: Creating a coding system
2014-12-20 18:34 ` Eli Zaretskii
@ 2014-12-20 18:42 ` David Kastrup
0 siblings, 0 replies; 33+ messages in thread
From: David Kastrup @ 2014-12-20 18:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: stephen, schwab, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: David Kastrup <dak@gnu.org>
>
>> It would seem that Handa-san is currently the only documentation for
>> the coding system details Emacs has. Some more redundancy might be a
>> good idea.
>
> We already have that:
>
>
> 32.10 Coding Systems
> ====================
>
> When Emacs reads or writes a file, and when Emacs sends text to a
> subprocess or receives text from a subprocess, it normally performs
> character code conversion and end-of-line conversion as specified by a
> particular "coding system".
>
> How to define a coding system is an arcane matter, and is not
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> documented here.
> ^^^^^^^^^^^^^^^^
I did not mean that we need more redundancy for the statement "It would
seem that Handa-san is currently the only documentation".
--
David Kastrup