unofficial mirror of help-gnu-emacs@gnu.org
* compute ISBN-10, char-to-int?
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2019-09-05  1:32 UTC (permalink / raw)
  To: help-gnu-emacs; +Cc: emacs-devel

Another quality release from
uXu and THE SECRET EMPIRE. This time there sure
was a long gap since the previous one, if
anyone remembers that one... hm... anyway,
perhaps _the next one_ will be ISBN-13?
Or wasn't that NASA's unlucky number?
Typing this in 2017, when the US astronauts,
since abandoning their space shuttle project,
have been reduced to mere *passengers* onboard
Russian Soyuz craft? You know what I'm saying?

Anyway, questions:

that `char-to-int' looks a bit strange...

?

;;; -*- lexical-binding: t -*-

;; This file: http://user.it.uu.se/~embe8573/emacs-init/isbn-new.el
;;            https://dataswamp.org/~incal/emacs-init/isbn-new.el

;; Old ISBN stuff, partially still in use: (?)
;;   https://dataswamp.org/~incal/emacs-init/isbn.el

;; NOTE: This isn't a replacement of the old
;;       stuff URLd above, this is an all new
;;       little project to compute ISBN
;;       checksums with Elisp!

;; here is how the ISBN-10 stuff works:
;;   https://dataswamp.org/~incal/books/isbn.txt

(require 'cl-lib)

(defun char-to-int (c)
  (string-to-number (char-to-string c) ))
;; test:
;; (char-to-int ?0)

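(defun checksum-isbn-10 (isbn)
  "Return the ISBN-10 check digit for ISBN, a string.
Hyphens are ignored and only the first nine digits are used.
The result is an integer 0-9, or the string \"X\" for ten."
  (let* ((isbn-list      (string-to-list isbn))
         (isbn-numbers   (remove ?- isbn-list))
         (isbn-numbers-9 (cl-subseq isbn-numbers 0 9))
         (isbn-ints      (cl-map 'list #'char-to-int isbn-numbers-9))
         (sum            0))
    ;; Weight the nine digits 10, 9, ..., 2 and add them up.
    (cl-loop for e in isbn-ints
             for i downfrom 10
             do (cl-incf sum (* e i)))
    (let ((checksum (mod (- 11 (mod sum 11)) 11)))
      (if (= 10 checksum) "X" checksum))))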


;; 9 tests from [1]:
;;
;; (checksum-isbn-10 "91-7054-940-0")  ; 0 (#o0, #x0, ?\C-@)
;; (checksum-isbn-10 "0-201-53992-6")  ; 6 (#o6, #x6, ?\C-f)
;; (checksum-isbn-10 "91-85668-01-X")  ; "X"
;; (checksum-isbn-10 "91-7089-710-7")  ; 7 (#o7, #x7, ?\C-g)
;; (checksum-isbn-10 "9177988515")     ; 5 (#o5, #x5, ?\C-e)
;; (checksum-isbn-10 "0312168144")     ; 4 (#o4, #x4, ?\C-d)
;; (checksum-isbn-10 "1-4012-0622-0")  ; 0 (#o0, #x0, ?\C-@)
;; (checksum-isbn-10 "91-510-6483-9")  ; 9 (#o11, #x9, ?\C-i)
;; (checksum-isbn-10 "91-88930-23-8")  ; 8 (#o10, #x8, ?\C-h)
;;
;;
;; [1] https://dataswamp.org/~incal/books/books.bib

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal





* Re: compute ISBN-10, char-to-int?
From: tomas @ 2019-09-05  6:44 UTC (permalink / raw)
  To: help-gnu-emacs

On Thu, Sep 05, 2019 at 03:32:55AM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
> Another quality release from
> uXu and THE SECRET EMPIRE [...]

> Anyway, questions:
> 
> that `char-to-int' looks a bit strange...

[...]

> (defun char-to-int (c)
>   (string-to-number (char-to-string c) ))

It looks a bit roundabout, sure. But how would you do it without
making any assumptions about the underlying encoding?

If you have no scruples, and since chars in Emacs Lisp are simply
integers, and since encoding is almost-UTF-8 which is basically
ASCII, you could do

(defun char-to-int (c)
  (- c ?0))

but...

  - it's ugly
  - I don't know if that addresses your question

It would be faster, yes. But that will start to count once we
have ISBN-4294967296 or something. By then, you'll have a faster
computer, too ;-)

Cheers
-- t
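
The two definitions from the message above can be put side by side.
This is only a rough sketch: the names `char-to-int-via-string' and
`char-to-int-via-subtraction' are made up for the comparison and do not
appear in isbn-new.el. The point is that both agree on ?0 through ?9
and differ on everything else.

```elisp
(require 'cl-lib)

;; Hypothetical name: the original, string-based conversion.
(defun char-to-int-via-string (c)
  (string-to-number (char-to-string c)))

;; Hypothetical name: the arithmetic conversion suggested above.
(defun char-to-int-via-subtraction (c)
  (- c ?0))

;; Both agree on the ten decimal digit characters.
(cl-loop for c from ?0 to ?9
         always (= (char-to-int-via-string c)
                   (char-to-int-via-subtraction c)))  ; => t

;; They disagree on non-digits: `string-to-number' yields 0 for
;; text that is not a number, while the subtraction yields the
;; distance between the two character codes.
(char-to-int-via-string ?a)       ; => 0
(char-to-int-via-subtraction ?a)  ; => 49
```

Neither variant rejects bad input, so a caller that cares about stray
characters in an ISBN would still have to check for them separately.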


* Re: compute ISBN-10, char-to-int?
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2019-09-05 17:28 UTC (permalink / raw)
  To: help-gnu-emacs

tomas wrote:

> It looks a bit roundabout, sure. But how
> would you do it without making any
> assumptions about the underlying encoding?
>
> If you have no scruples, and since chars in
> Emacs Lisp are simply integers, and since
> encoding is almost-UTF-8 which is basically
> ASCII, you could do
>
> (defun char-to-int (c)
>   (- c ?0))
>
> but...
>
>   - it's ugly

Why, I think it's great! New version:
  <https://dataswamp.org/~incal/emacs-init/isbn-new.el>

>   - I don't know if that addresses your
>     question

Me neither... wait, what was my
question exactly?

Yeah, why isn't there a "char-to-int" in
vanilla Emacs already? Should be a pretty
standard thing, right?

> It would be faster, yes. But that will start
> to count once we have ISBN-4294967296 or
> something. By then, you'll have a faster
> computer, too ;-)

... I'm not following? :)

That isn't a valid ISBN-10; its last digit is 6,
but the checksum (check digit) is 3:

  (mod (- 11 (mod (+ (* 4 10)
                     (* 2  9)
                     (* 9  8)
                     (* 4  7)
                     (* 9  6)
                     (* 6  5)
                     (* 7  4)
                     (* 2  3)
                     (* 9  2))
                  11)) 11) ; 3

  (checksum-isbn-10 "4294967296") ; 3

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal
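
The same arithmetic can be turned into a validity test for a complete
ISBN-10. What follows is a minimal sketch, not code from isbn-new.el:
the name `isbn-10-valid-p' is hypothetical. It relies on the fact that
if the check digit is included with weight 1, a valid number's weighted
sum is divisible by 11.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of isbn-new.el.
(defun isbn-10-valid-p (isbn)
  "Return non-nil if ISBN, a string, is a well-formed ISBN-10.
Hyphens are ignored.  The final character may be X, meaning 10."
  (let ((chars (remove ?- (string-to-list (upcase isbn)))))
    (and (= 10 (length chars))
         ;; The first nine must be digits; the last a digit or X.
         (cl-every (lambda (c) (<= ?0 c ?9)) (butlast chars))
         (let ((final (car (last chars))))
           (or (<= ?0 final ?9) (= final ?X)))
         ;; Weights run 10, 9, ..., 1; a valid number sums to 0 mod 11.
         (zerop (mod (cl-loop for c in chars
                              for w downfrom 10
                              sum (* w (if (= c ?X) 10 (- c ?0))))
                     11)))))

(isbn-10-valid-p "0-201-53992-6")  ; => t
(isbn-10-valid-p "91-85668-01-X")  ; => t
(isbn-10-valid-p "4294967296")     ; => nil
```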





* Re: compute ISBN-10, char-to-int?
From: Eli Zaretskii @ 2019-09-05 18:08 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Thu, 5 Sep 2019 08:44:37 +0200
> From: <tomas@tuxteam.de>
> 
> [...] since chars in Emacs Lisp are simply integers, and since
> encoding is almost-UTF-8 which is basically ASCII [...]

Actually, encoding is not relevant here.  We are not talking about how
characters are stored in buffers and strings, we are talking about the
characters themselves.  A character is represented by an integer whose
value is that character's Unicode codepoint.
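
A few forms that can be evaluated to see this, offered as an
illustration only; they use nothing beyond standard Emacs Lisp:

```elisp
;; A character literal reads as a plain integer.
(integerp ?0)        ; => t
(= ?0 48)            ; => t   (U+0030 DIGIT ZERO)
(= ?A #x41)          ; => t   (U+0041 LATIN CAPITAL LETTER A)

;; Conversely, an integer in the valid range is a character.
(characterp 48)      ; => t
(string 48 49 50)    ; => "012"

;; So digit arithmetic needs no conversion step at all.
(- ?7 ?0)            ; => 7
(+ 7 ?0)             ; => 55, i.e. ?7
```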




* Re: compute ISBN-10, char-to-int?
From: tomas @ 2019-09-05 19:03 UTC (permalink / raw)
  To: help-gnu-emacs

On Thu, Sep 05, 2019 at 09:08:55PM +0300, Eli Zaretskii wrote:
> > Date: Thu, 5 Sep 2019 08:44:37 +0200
> > From: <tomas@tuxteam.de>
> > 
> > [...] since chars in Emacs Lisp are simply integers, and since
> > encoding is almost-UTF-8 which is basically ASCII [...]
> 
> Actually, encoding is not relevant here.  We are not talking about how
> characters are stored in buffers and strings, we are talking about the
> characters themselves.  A character is represented by an integer whose
> value is that character's Unicode codepoint.

Well, if it's always Unicode code point, then we can make the above
"official".

Thanks, Eli
-- t


* Re: compute ISBN-10, char-to-int?
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2019-09-05 19:16 UTC (permalink / raw)
  To: help-gnu-emacs

tomas wrote:

> Well, if it's always Unicode code point, then
> we can make the above "official".

Am I normal or orthogonal? I still don't
understand why this isn't already in Emacs.
Perhaps in some prominent ELPA library that one
should have by now but hasn't, and everybody
else has it, like in school but with
bubble gum instead?

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal





* Re: compute ISBN-10, char-to-int?
From: Eli Zaretskii @ 2019-09-06  6:58 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Thu, 05 Sep 2019 21:16:06 +0200
> From: Emanuel Berg via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org>
> 
> I still don't understand why this isn't already in Emacs.

IMO, it isn't important enough to be in Emacs.  That one particular
application needs it is not yet a sign it should be in core.




* Re: compute ISBN-10, char-to-int?
From: tomas @ 2019-09-06  8:32 UTC (permalink / raw)
  To: help-gnu-emacs

On Fri, Sep 06, 2019 at 09:58:28AM +0300, Eli Zaretskii wrote:
> > Date: Thu, 05 Sep 2019 21:16:06 +0200
> > From: Emanuel Berg via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org>
> > 
> > I still don't understand why this isn't already in Emacs.
> 
> IMO, it isn't important enough to be in Emacs.  That one particular
> application needs it is not yet a sign it should be in core.

And -- hey. It's a (short) one-liner, after all!

Cheers
-- t


* Re: compute ISBN-10, char-to-int?
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2019-09-06 15:07 UTC (permalink / raw)
  To: help-gnu-emacs

tomas wrote:

> And -- hey. It's a (short) one-liner,
> after all!

Conversions between types and representations
are the building blocks of the universe.

You might as well have to spill milk over your
chemistry book so as to have H2O in it.

Milk is 87% water:

   https://www.dairyherd.com/article/whole-lot-water-goes-milk

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal





* Re: compute ISBN-10, char-to-int?
From: Eli Zaretskii @ 2019-09-06 18:02 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Fri, 06 Sep 2019 17:07:26 +0200
> From: Emanuel Berg via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org>
> 
> Conversions between types and representations
> are the building blocks of the universe.

But what we are discussing here is not a conversion.  Characters in
Emacs _are_ integers.




* Re: compute ISBN-10, char-to-int?
From: Stefan Monnier @ 2019-09-07 17:13 UTC (permalink / raw)
  To: help-gnu-emacs

>> > [...] since chars in Emacs Lisp are simply integers, and since
>> > encoding is almost-UTF-8 which is basically ASCII [...]
>> Actually, encoding is not relevant here.  We are not talking about how
>> characters are stored in buffers and strings, we are talking about the
>> characters themselves.  A character is represented by an integer whose
>> value is that character's Unicode codepoint.
> Well, if it's always Unicode code point, then we can make the above
> "official".

(- c ?0) and (+ n ?0) work not just with Unicode code points, but with
code points in any character set that is sane enough to put the digits
from 0 to 9 consecutively in this order.  That's the case in ASCII,
EBCDIC, and all other charsets I know.

I think the main threat could come from a charset where digits appear
multiple times (e.g. some kind of iso-2022 system where some of the
sub-charsets also include digits), in which case (+ n ?0) would still
work but (- c ?0) could occasionally fail.  But, I don't know of such
a charset either.


        Stefan
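
One case worth spelling out: Unicode itself contains several sets of
decimal digits, so a character can be a digit without lying in the
?0 to ?9 range. The following is a sketch under that assumption, for
code that wants a guard before subtracting. `digit-char-to-int' is a
hypothetical name; `get-char-code-property' and its
`decimal-digit-value' property are standard Emacs.

```elisp
;; Hypothetical helper: subtract only after checking the range.
(defun digit-char-to-int (c)
  "Return the value of the ASCII digit character C, else signal an error."
  (if (<= ?0 c ?9)
      (- c ?0)
    (error "Not an ASCII digit: %c" c)))

(digit-char-to-int ?3)                                 ; => 3

;; U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit, but it is
;; far away from ?0, so plain subtraction gives a meaningless number.
(- ?\u0663 ?0)                                         ; => 1587

;; The Unicode character database knows its digit value.
(get-char-code-property ?\u0663 'decimal-digit-value)  ; => 3
```

For ISBN input, which uses only ASCII digits, hyphens and X, the range
check alone is probably all that is needed.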





* Re: compute ISBN-10, char-to-int?
From: Eli Zaretskii @ 2019-09-07 17:20 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 07 Sep 2019 13:13:45 -0400
> 
> I think the main threat could come from a charset where digits appear
> multiple times (e.g. some kind of iso-2022 system where some of the
> sub-charsets also include digits), in which case (+ n ?0) would still
> work but (- c ?0) could occasionally fail.  But, I don't know of such
> a charset either.

All true, but not really relevant to the issue at hand, because in
Emacs characters are always Unicode codepoints.




* Re: compute ISBN-10, char-to-int?
From: Stefan Monnier @ 2019-09-07 17:45 UTC (permalink / raw)
  To: help-gnu-emacs

> All true, but not really relevant to the issue at hand, because in
> Emacs characters are always Unicode codepoints.

That hasn't always been the case, tho: they started as basically ASCII,
and then switched to some iso-2022 system (in Emacs-20) before getting
to the current unicode (in Emacs-23).

I must admit that it seems highly unlikely it will change in the
foreseeable future, but it's always a possibility (Unicode has its
shortcomings, so it's possible that we'll be using something else next
century).


        Stefan





* Re: compute ISBN-10, char-to-int?
From: Eli Zaretskii @ 2019-09-07 18:30 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 07 Sep 2019 13:45:35 -0400
> 
> I must admit that it seems highly unlikely it will change in the
> foreseeable future, but it's always a possibility

More than highly unlikely, I'd say.




* Re: compute ISBN-10, char-to-int?
From: Stefan Monnier @ 2019-09-07 19:01 UTC (permalink / raw)
  To: help-gnu-emacs

>> I must admit that it seems highly unlikely it will change in the
>> foreseeable future, but it's always a possibility
> More than highly unlikely, I'd say.

Agreed.  But even for those paranoid enough to worry about that,
(+ n ?0) and (- c ?0) should be safe ways to convert between a digit
character and its numerical value.


        Stefan





* Re: compute ISBN-10, char-to-int?
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2019-09-09 17:47 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier wrote:

> (- c ?0) and (+ n ?0) work not just with
> Unicode code points, but with code points in
> any character set that is sane enough to put
> the digits from 0 to 9 consecutively in this
> order. That's the case in ASCII, EBCDIC, and
> all other charsets I know.

EBCDIC = Extended Binary-Coded Decimal Interchange Code

http://www.barrcentral.com/help/3270/appendix_b._ascii_and_ebcdic_tables.htm?sa=X&ved=2ahUKEwiTptvPpcTkAhWqpIsKHVtRCRgQ9QF6BAgLEAI
    
Seems to have something to do with IBM...

-- 
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal





* Re: compute ISBN-10, char-to-int?
From: Perry Smith @ 2019-09-10  1:20 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: help-gnu-emacs

> On Sep 9, 2019, at 12:47 PM, Emanuel Berg via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
> 
> Stefan Monnier wrote:
> 
>> (- c ?0) and (+ n ?0) work not just with
>> Unicode code points, but with code points in
>> any character set that is sane enough to put
>> the digits from 0 to 9 consecutively in this
>> order. That's the case in ASCII, EBCDIC, and
>> all other charsets I know.
> 
> EBCDIC = Extended Binary-Coded Decimal Interchange Code
> 
> http://www.barrcentral.com/help/3270/appendix_b._ascii_and_ebcdic_tables.htm?sa=X&ved=2ahUKEwiTptvPpcTkAhWqpIsKHVtRCRgQ9QF6BAgLEAI
> 
> Seems to have something to do with IBM...

IBM has the concept of “code pages”… I think that concept is not unique to IBM.  Examples would be all of the ISO-8859-nn code pages where there is one (more or less) for each country or region.  (And then you get into terminal specific code pages and all sorts of fun mental illnesses).

I wrote a web app talking to an old IBM system and to my shock… EBCDIC also has code pages — roughly one per region or country.

