unofficial mirror of guile-user@gnu.org 
* Unicode numeric value
@ 2018-12-16  4:31 Freeman Gilmore
  2018-12-16  6:11 ` John Cowan
  2018-12-16  8:14 ` Mark H Weaver
  0 siblings, 2 replies; 7+ messages in thread
From: Freeman Gilmore @ 2018-12-16  4:31 UTC (permalink / raw)
  To: guile-user

I am looking for a procedure that will read the numeric value, field 8, of
a Unicode numeric character.   Has anyone written this procedure, or does
anyone know where I can find it?

Thank you,

ƒg



* Re: Unicode numeric value
  2018-12-16  4:31 Unicode numeric value Freeman Gilmore
@ 2018-12-16  6:11 ` John Cowan
  2018-12-16  8:14 ` Mark H Weaver
  1 sibling, 0 replies; 7+ messages in thread
From: John Cowan @ 2018-12-16  6:11 UTC (permalink / raw)
  To: Freeman Gilmore; +Cc: guile-user


I can't help (yet) if you need the full numeric value.  But if the decimal
digit value is enough for you, then the attached source file should work
(from Chibi).

On Sat, Dec 15, 2018 at 11:32 PM Freeman Gilmore <freeman.gilmore@gmail.com>
wrote:

> I am looking for a procedure that will read the numeric value, field 8, of
> a Unicode numeric character.   Has anyone written this procedure, or does
> anyone know where I can find it?
>
> Thank you,
>
> ƒg
>

[-- Attachment #2: digit-value.scm --]
[-- Type: application/octet-stream, Size: 2777 bytes --]


(cond-expand
 (full-unicode
  (define zeros
    '#(#\x0030                ;DIGIT ZERO
       #\x0660                ;ARABIC-INDIC DIGIT ZERO
       #\x06F0                ;EXTENDED ARABIC-INDIC DIGIT ZERO
       #\x07C0                ;NKO DIGIT ZERO
       #\x0966                ;DEVANAGARI DIGIT ZERO
       #\x09E6                ;BENGALI DIGIT ZERO
       #\x0A66                ;GURMUKHI DIGIT ZERO
       #\x0AE6                ;GUJARATI DIGIT ZERO
       #\x0B66                ;ORIYA DIGIT ZERO
       #\x0BE6                ;TAMIL DIGIT ZERO
       #\x0C66                ;TELUGU DIGIT ZERO
       #\x0CE6                ;KANNADA DIGIT ZERO
       #\x0D66                ;MALAYALAM DIGIT ZERO
       #\x0E50                ;THAI DIGIT ZERO
       #\x0ED0                ;LAO DIGIT ZERO
       #\x0F20                ;TIBETAN DIGIT ZERO
       #\x1040                ;MYANMAR DIGIT ZERO
       #\x1090                ;MYANMAR SHAN DIGIT ZERO
       #\x17E0                ;KHMER DIGIT ZERO
       #\x1810                ;MONGOLIAN DIGIT ZERO
       #\x1946                ;LIMBU DIGIT ZERO
       #\x19D0                ;NEW TAI LUE DIGIT ZERO
       #\x1A80                ;TAI THAM HORA DIGIT ZERO
       #\x1A90                ;TAI THAM THAM DIGIT ZERO
       #\x1B50                ;BALINESE DIGIT ZERO
       #\x1BB0                ;SUNDANESE DIGIT ZERO
       #\x1C40                ;LEPCHA DIGIT ZERO
       #\x1C50                ;OL CHIKI DIGIT ZERO
       #\xA620                ;VAI DIGIT ZERO
       #\xA8D0                ;SAURASHTRA DIGIT ZERO
       #\xA900                ;KAYAH LI DIGIT ZERO
       #\xA9D0                ;JAVANESE DIGIT ZERO
       #\xAA50                ;CHAM DIGIT ZERO
       #\xABF0                ;MEETEI MAYEK DIGIT ZERO
       #\xFF10                ;FULLWIDTH DIGIT ZERO
       #\x104A0               ;OSMANYA DIGIT ZERO
       #\x11066               ;BRAHMI DIGIT ZERO
       #\x1D7CE               ;MATHEMATICAL BOLD DIGIT ZERO
       #\x1D7D8               ;MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO
       #\x1D7E2               ;MATHEMATICAL SANS-SERIF DIGIT ZERO
       #\x1D7EC               ;MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO
       #\x1D7F6               ;MATHEMATICAL MONOSPACE DIGIT ZERO
       )))
 (else
  (define zeros #(#\0))))

(define (digit-value ch)
  (let ((n (char->integer ch)))
    (let lp ((lo 0) (hi (- (vector-length zeros) 1)))
      (if (> lo hi)
          #f
          (let* ((mid (+ lo (quotient (- hi lo) 2)))
                 (mid-zero (char->integer (vector-ref zeros mid))))
            (cond
             ((<= mid-zero n (+ mid-zero 9))
              (- n mid-zero))
             ((< n mid-zero)
              (lp lo (- mid 1)))
             (else
              (lp (+ mid 1) hi))))))))
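
As a quick check of the binary search above, here is a minimal sketch that uses a reduced, hypothetical three-entry `zeros` table (the full table behaves the same way). It relies on the table being sorted by code point and on each script's ten digits being contiguous, so it yields decimal digit values only, not the full numeric value field (a fraction such as U+00BD is not covered).

```scheme
;; Reduced table for illustration only; it must stay sorted by code point.
(define zeros
  (vector (integer->char #x0030)     ; DIGIT ZERO
          (integer->char #x0660)     ; ARABIC-INDIC DIGIT ZERO
          (integer->char #xFF10)))   ; FULLWIDTH DIGIT ZERO

;; Same binary search as in the attachment: find the zero whose
;; ten-character block contains CH, or return #f.
(define (digit-value ch)
  (let ((n (char->integer ch)))
    (let lp ((lo 0) (hi (- (vector-length zeros) 1)))
      (if (> lo hi)
          #f
          (let* ((mid (+ lo (quotient (- hi lo) 2)))
                 (mid-zero (char->integer (vector-ref zeros mid))))
            (cond
             ((<= mid-zero n (+ mid-zero 9))
              (- n mid-zero))
             ((< n mid-zero)
              (lp lo (- mid 1)))
             (else
              (lp (+ mid 1) hi))))))))

(display (digit-value #\7)) (newline)                     ; ASCII digit
(display (digit-value (integer->char #x0665))) (newline)  ; ARABIC-INDIC DIGIT FIVE
(display (digit-value #\a)) (newline)                     ; not a digit
```

Characters outside every ten-character block fall through the search and return `#f`.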


* Re: Unicode numeric value
  2018-12-16  4:31 Unicode numeric value Freeman Gilmore
  2018-12-16  6:11 ` John Cowan
@ 2018-12-16  8:14 ` Mark H Weaver
  2018-12-16 11:24   ` Freeman Gilmore
  1 sibling, 1 reply; 7+ messages in thread
From: Mark H Weaver @ 2018-12-16  8:14 UTC (permalink / raw)
  To: Freeman Gilmore; +Cc: guile-user

Freeman Gilmore <freeman.gilmore@gmail.com> writes:

> I am looking for a procedure that will read the numeric value, field 8, of
> a Unicode numeric character.   Has anyone written this procedure, or does
> anyone know where I can find it?

The 'r7rs-wip' branch of the Guile git repository contains a procedure
that does this, with a lookup table derived from Unicode 6.3.0.

  https://git.savannah.gnu.org/cgit/guile.git/tree/module/scheme/char.scm?h=r7rs-wip

The file is written as an R7RS library form, which won't work on current
releases of Guile, but for now you could simply extract the
'digit-value' procedure from it, provided that you preserve the
copyright notice.

      Mark




* Re: Unicode numeric value
  2018-12-16  8:14 ` Mark H Weaver
@ 2018-12-16 11:24   ` Freeman Gilmore
  2018-12-17 18:42     ` Mark H Weaver
  0 siblings, 1 reply; 7+ messages in thread
From: Freeman Gilmore @ 2018-12-16 11:24 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

On Sun, Dec 16, 2018 at 3:15 AM Mark H Weaver <mhw@netris.org> wrote:

> Freeman Gilmore <freeman.gilmore@gmail.com> writes:
>
> > I am looking for a procedure that will read the numeric value, field 8, of
> > a Unicode numeric character.   Has anyone written this procedure, or does
> > anyone know where I can find it?
>
> The 'r7rs-wip' branch of the Guile git repository contains a procedure
> that does this, with a lookup table derived from Unicode 6.3.0.
>
>
> https://git.savannah.gnu.org/cgit/guile.git/tree/module/scheme/char.scm?h=r7rs-wip
>
> The file is written as an R7RS library form, which won't work on current
> releases of Guile, but for now you could simply extract the
> 'digit-value' procedure from it, provided that you preserve the
> copyright notice.
>
>       Mark
>
Thank you Mark:

That is only half the battle; let me explain.    I do not want to read the
standard Unicode table.   I want to directly read field 8 of a numeric
character in the Private Use Area of Unicode.

This is not part of Scheme.    The other half: I need to figure out how to
put the numeric values in field 8 for the characters I want to use.
Thank you,



* Re: Unicode numeric value
  2018-12-16 11:24   ` Freeman Gilmore
@ 2018-12-17 18:42     ` Mark H Weaver
  2018-12-17 18:52       ` Mark H Weaver
  2019-01-05  1:07       ` Freeman Gilmore
  0 siblings, 2 replies; 7+ messages in thread
From: Mark H Weaver @ 2018-12-17 18:42 UTC (permalink / raw)
  To: Freeman Gilmore; +Cc: guile-user

Hi,

Freeman Gilmore <freeman.gilmore@gmail.com> writes:

> On Sun, Dec 16, 2018 at 3:15 AM Mark H Weaver <mhw@netris.org> wrote:
>
>  Freeman Gilmore <freeman.gilmore@gmail.com> writes:
>
>  > I am looking for a procedure that will read the numeric value, field 8, of
>  > a Unicode numeric character.   Has anyone written this procedure, or does
>  > anyone know where I can find it?
>
>  The 'r7rs-wip' branch of the Guile git repository contains a procedure
>  that does this, with a lookup table derived from Unicode 6.3.0.
>
>    https://git.savannah.gnu.org/cgit/guile.git/tree/module/scheme/char.scm?h=r7rs-wip
>
>  The file is written as an R7RS library form, which won't work on current
>  releases of Guile, but for now you could simply extract the
>  'digit-value' procedure from it, provided that you preserve the
>  copyright notice.
>
>        Mark
>
> Thank you Mark:
>
> That is only half the battle; let me explain.  I do not want to read
> the standard Unicode table.  I want to directly read field 8 of a
> numeric character in the Private Use Area of Unicode.
>
> This is not part of Scheme.  The other half: I need to figure out how
> to put the numeric values in field 8 for the characters I want to use.

If the mapping from code points to numeric values is static, then you
could simply modify the lookup table in the code I suggested above.

If the mapping is dynamic, then you'll need a different strategy.  One
simple approach would be to use a hash table mapping from characters to
digit values:

  (define digit-value-table (make-hash-table))
  
  (define (set-digit-value! char value)
    (hashv-set! digit-value-table char value))

  (define (digit-value char)
    (hashv-ref digit-value-table char #f))
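
A short usage sketch of the hash-table approach, repeated here so it runs on its own in Guile. The assignment of U+E000..U+E009 to the values 0..9 and of U+E010 to 1/2 is purely hypothetical; substitute whatever private-use characters you actually use. Because the table can hold any Scheme object, it is not limited to decimal digits.

```scheme
;; make-hash-table, hashv-set! and hashv-ref are Guile core procedures.
(define digit-value-table (make-hash-table))

(define (set-digit-value! char value)
  (hashv-set! digit-value-table char value))

(define (digit-value char)
  (hashv-ref digit-value-table char #f))

;; Hypothetical mapping: U+E000..U+E009 stand for the digits 0..9.
(let loop ((i 0))
  (when (< i 10)
    (set-digit-value! (integer->char (+ #xE000 i)) i)
    (loop (+ i 1))))

;; Hypothetical mapping: U+E010 stands for the exact rational 1/2.
(set-digit-value! (integer->char #xE010) 1/2)

(display (digit-value (integer->char #xE007))) (newline)  ; registered digit
(display (digit-value (integer->char #xE010))) (newline)  ; registered fraction
(display (digit-value #\a)) (newline)                     ; unregistered character
```

Entries can be changed at any time by calling `set-digit-value!` again with a new value for the same character.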

If the range of relevant code points is small enough, another approach
would be to use a vector:

  (define private-code-point-start #xE000)
  (define private-code-point-end   #xF900)

  (define (code-point-in-range? cp)
    (<= private-code-point-start
        cp
        private-code-point-end))

  (define digit-value-table
    (make-vector (- private-code-point-end
                    private-code-point-start)
                 #f))

  (define (set-digit-value! char value)
    (let ((cp (char->integer char)))
      (unless (code-point-in-range? cp)
        (error "set-digit-value!: code point out of range:" cp))
      (vector-set! digit-value-table
                   (- cp private-code-point-start)
                   value)))

  (define (digit-value char)
    (let ((cp (char->integer char)))
      (and (code-point-in-range? cp)
           (vector-ref digit-value-table
                       (- cp private-code-point-start)))))

For a more compact representation, you could use a SRFI-4 homogeneous
numeric vector instead, although you'd need to designate a special
numeric value to represent "not a digit".
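
For readers unfamiliar with SRFI-4, here is a minimal sketch of that idea in Guile using a `u8vector` (a vector whose elements are all unsigned 8-bit integers). The sentinel 255 for "not a digit" and the restriction of values to 0..254 are assumptions made for this example, as is treating `private-code-point-end` as an exclusive bound.

```scheme
(use-modules (srfi srfi-4))

(define private-code-point-start #xE000)
(define private-code-point-end   #xF900)   ; exclusive upper bound
(define not-a-digit 255)                    ; assumed sentinel value

;; One byte per code point, every slot initially "not a digit".
(define digit-value-table
  (make-u8vector (- private-code-point-end private-code-point-start)
                 not-a-digit))

(define (code-point-in-range? cp)
  (and (>= cp private-code-point-start)
       (< cp private-code-point-end)))

(define (set-digit-value! char value)
  (let ((cp (char->integer char)))
    (unless (code-point-in-range? cp)
      (error "set-digit-value!: code point out of range:" cp))
    ;; A u8vector can only hold exact integers 0..255; 255 is reserved.
    (unless (and (integer? value) (exact? value) (<= 0 value 254))
      (error "set-digit-value!: value must be an exact integer in 0..254:" value))
    (u8vector-set! digit-value-table (- cp private-code-point-start) value)))

(define (digit-value char)
  (let ((cp (char->integer char)))
    (and (code-point-in-range? cp)
         (let ((v (u8vector-ref digit-value-table
                                (- cp private-code-point-start))))
           ;; Translate the sentinel back into #f for callers.
           (and (not (= v not-a-digit)) v)))))

(set-digit-value! (integer->char #xE003) 3)
(display (digit-value (integer->char #xE003))) (newline)  ; registered
(display (digit-value (integer->char #xE004))) (newline)  ; in range, unset
```

The trade-off is that this table cannot store non-integer values such as fractions, which the ordinary vector and the hash table can.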

    Regards,
      Mark




* Re: Unicode numeric value
  2018-12-17 18:42     ` Mark H Weaver
@ 2018-12-17 18:52       ` Mark H Weaver
  2019-01-05  1:07       ` Freeman Gilmore
  1 sibling, 0 replies; 7+ messages in thread
From: Mark H Weaver @ 2018-12-17 18:52 UTC (permalink / raw)
  To: Freeman Gilmore; +Cc: guile-user

Mark H Weaver <mhw@netris.org> writes:

> If the range of relevant code points is small enough, another approach
> would be to use a vector:
>
>   (define private-code-point-start #xE000)
>   (define private-code-point-end   #xF900)
>
>   (define (code-point-in-range? cp)
>     (<= private-code-point-start
>         cp
>         private-code-point-end))

Sorry, the definition above is incorrect.  It should be:

   (define (code-point-in-range? cp)
     (<= private-code-point-start
         cp
         (- private-code-point-end 1)))
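
Putting this correction together with the vector-based code from the previous message gives the following self-contained sketch. The two sample assignments at the end are hypothetical and only exercise the boundaries: U+F8FF is the last slot in the table, while U+F900 is just past it and must yield `#f` rather than an out-of-range vector access.

```scheme
(define private-code-point-start #xE000)
(define private-code-point-end   #xF900)   ; one past the last valid code point

;; Corrected check: the last valid code point is (- end 1).
(define (code-point-in-range? cp)
  (<= private-code-point-start
      cp
      (- private-code-point-end 1)))

(define digit-value-table
  (make-vector (- private-code-point-end
                  private-code-point-start)
               #f))

(define (set-digit-value! char value)
  (let ((cp (char->integer char)))
    (unless (code-point-in-range? cp)
      (error "set-digit-value!: code point out of range:" cp))
    (vector-set! digit-value-table
                 (- cp private-code-point-start)
                 value)))

(define (digit-value char)
  (let ((cp (char->integer char)))
    (and (code-point-in-range? cp)
         (vector-ref digit-value-table
                     (- cp private-code-point-start)))))

;; Hypothetical assignments at both ends of the range.
(set-digit-value! (integer->char #xE000) 0)
(set-digit-value! (integer->char #xF8FF) 9)

(display (digit-value (integer->char #xF8FF))) (newline)  ; last valid slot
(display (digit-value (integer->char #xF900))) (newline)  ; just out of range
```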

     Mark




* Re: Unicode numeric value
  2018-12-17 18:42     ` Mark H Weaver
  2018-12-17 18:52       ` Mark H Weaver
@ 2019-01-05  1:07       ` Freeman Gilmore
  1 sibling, 0 replies; 7+ messages in thread
From: Freeman Gilmore @ 2019-01-05  1:07 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Mark:

I have been away and am just getting back to email.   Thank you for replying.

So it looks like the library is just a lookup table.   I thought it was
more complicated than that, reading from a Unicode data file.    The hash
table may be better and more portable.   I could also change the numeric
value for the given code points as needed.    I do not know what a "SRFI-4
homogeneous numeric vector" is, but I did google it; there is a lot there.

I am too new to this, but I did copy the hash table that you made for me and
added the correction. Thanks for your time here.   I will probably be using
it (if I learn enough).

You all have a good year.
ƒg

