emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* oc-basic: CSL-JSON year as number vs. string (nativecomp?)
@ 2022-06-18  6:34 David Lukeš
  2022-06-19  1:44 ` Ihor Radchenko
  0 siblings, 1 reply; 12+ messages in thread
From: David Lukeš @ 2022-06-18  6:34 UTC (permalink / raw)
  To: emacs-orgmode

Hi all,

I've run into more problems with CSL-JSON support in oc-basic
(previously: https://list.orgmode.org/CAEPTPExcZKGAm3v-brzezfCwMM4h3hQtOq+89Qg+5ULJq1K4Yw@mail.gmail.com/).

I recently started to get errors like the following:

Error during redisplay: (jit-lock-function 544) signaled
(wrong-type-argument "Argument is not a string or a secondary string:
2007")

This patch makes them go away:

diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index a937f7513..9e00310a4 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -189,7 +189,7 @@ Return a hash table with citation references as
keys and fields alist as values.
                                 (cons 'year
                                       (cond
                                        ((consp date)
-                                        (caar date))
+                                        (number-to-string (caar date)))
                                        ((stringp date)
                                         (replace-regexp-in-string
                                           (rx

In this case, date is an array of numbers, so (caar date) is a number
(the publication year). Converting it to a string is the obvious fix.

Not sure why I haven't run into this error earlier, but I switched to
Emacs 28 somewhat recently, so nativecomp may be the problem here? It
sure seems plausible it wouldn't like a number where a string is
expected.

Best,

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-18  6:34 oc-basic: CSL-JSON year as number vs. string (nativecomp?) David Lukeš
@ 2022-06-19  1:44 ` Ihor Radchenko
  2022-06-19  1:55   ` Bruce D'Arcus
  0 siblings, 1 reply; 12+ messages in thread
From: Ihor Radchenko @ 2022-06-19  1:44 UTC (permalink / raw)
  To: David Lukeš; +Cc: emacs-orgmode

David Lukeš <dafydd.lukes@gmail.com> writes:

> I recently started to get errors like the following:
>
> Error during redisplay: (jit-lock-function 544) signaled
> (wrong-type-argument "Argument is not a string or a secondary string:
> 2007")
>
> This patch makes them go away:
>
> -                                        (caar date))
> +                                        (number-to-string (caar date)))

Can you provide an example json file demonstrating the problem?
I suspect that multiple json formats may be available in the wild. Some
parsed as a list of strings and some parsed as a list of numbers.

You patch may fix the former but break the latter.

Best,
Ihor


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-19  1:44 ` Ihor Radchenko
@ 2022-06-19  1:55   ` Bruce D'Arcus
  2022-06-19  3:31     ` David Lukeš
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce D'Arcus @ 2022-06-19  1:55 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: David Lukeš, org-mode-email

On Sat, Jun 18, 2022 at 9:43 PM Ihor Radchenko <yantar92@gmail.com> wrote:
>
> David Lukeš <dafydd.lukes@gmail.com> writes:
>
> > I recently started to get errors like the following:
> >
> > Error during redisplay: (jit-lock-function 544) signaled
> > (wrong-type-argument "Argument is not a string or a secondary string:
> > 2007")
> >
> > This patch makes them go away:
> >
> > -                                        (caar date))
> > +                                        (number-to-string (caar date)))
>
> Can you provide an example json file demonstrating the problem?
> I suspect that multiple json formats may be available in the wild. Some
> parsed as a list of strings and some parsed as a list of numbers.

The JSON schema allows either:

https://github.com/citation-style-language/schema/blob/5b8bbc824e026959417757d4ce4012a26b10e637/schemas/input/csl-data.json#L512

Bruce


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-19  1:55   ` Bruce D'Arcus
@ 2022-06-19  3:31     ` David Lukeš
  2022-06-19 13:39       ` Bruce D'Arcus
  2022-06-20 12:04       ` Ihor Radchenko
  0 siblings, 2 replies; 12+ messages in thread
From: David Lukeš @ 2022-06-19  3:31 UTC (permalink / raw)
  To: Bruce D'Arcus; +Cc: Ihor Radchenko, org-mode-email

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

> I suspect that multiple json formats may be available in the wild. Some
> parsed as a list of strings and some parsed as a list of numbers.

> The JSON schema allows either:

Ah, thanks for looking this up! So (format "%s" (caar date)) instead
of (number-to-string (caar date))?

(That was actually my initial solution, purely out of being defensive
and trying to make sure it doesn't break in yet a different way should
other things than numbers turn up in date-parts, even nil or such.
Then I thought it was too hamfisted and didn't have the time to make a
case for being so defensive here. But since it's needed even just to
be *compliant*, the case seems quite clear now.)

> Can you provide an example json file demonstrating the problem?

Sure, I'm attaching a short sample.

Best,

David

[-- Attachment #2: sample.json --]
[-- Type: application/json, Size: 1403 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-19  3:31     ` David Lukeš
@ 2022-06-19 13:39       ` Bruce D'Arcus
  2022-06-20 12:06         ` Ihor Radchenko
  2022-06-20 12:04       ` Ihor Radchenko
  1 sibling, 1 reply; 12+ messages in thread
From: Bruce D'Arcus @ 2022-06-19 13:39 UTC (permalink / raw)
  To: David Lukeš; +Cc: Ihor Radchenko, org-mode-email

On Sat, Jun 18, 2022 at 11:31 PM David Lukeš <dafydd.lukes@gmail.com> wrote:
>
> > I suspect that multiple json formats may be available in the wild. Some
> > parsed as a list of strings and some parsed as a list of numbers.
>
> > The JSON schema allows either:
>
> Ah, thanks for looking this up!

I created CSL, and have helped develop the data schema.

BTW, just to look forward, it's likely we'll change the date property
to prefer an EDTF string; same as biblatex does now.

Bruce


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-19  3:31     ` David Lukeš
  2022-06-19 13:39       ` Bruce D'Arcus
@ 2022-06-20 12:04       ` Ihor Radchenko
  1 sibling, 0 replies; 12+ messages in thread
From: Ihor Radchenko @ 2022-06-20 12:04 UTC (permalink / raw)
  To: David Lukeš; +Cc: Bruce D'Arcus, org-mode-email

David Lukeš <dafydd.lukes@gmail.com> writes:

>> The JSON schema allows either:
>
> Ah, thanks for looking this up! So (format "%s" (caar date)) instead
> of (number-to-string (caar date))?
>
> (That was actually my initial solution, purely out of being defensive
> and trying to make sure it doesn't break in yet a different way should
> other things than numbers turn up in date-parts, even nil or such.
> Then I thought it was too hamfisted and didn't have the time to make a
> case for being so defensive here. But since it's needed even just to
> be *compliant*, the case seems quite clear now.)

I'd prefer an explicit cond here.
format "%s" may silently work on malformed json files and will mask
issues with bibliography from the user. I personally hate when it
happens and it is often easy to miss issues with downloaded bibliography
entries.

>> Can you provide an example json file demonstrating the problem?
>
> Sure, I'm attaching a short sample.

Thanks! Would you mind creating a patch and possibly supplying a test
that will make sure that the example file and similar are correctly
parsed?

Best,
Ihor


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-19 13:39       ` Bruce D'Arcus
@ 2022-06-20 12:06         ` Ihor Radchenko
  2022-06-20 13:11           ` Bruce D'Arcus
  0 siblings, 1 reply; 12+ messages in thread
From: Ihor Radchenko @ 2022-06-20 12:06 UTC (permalink / raw)
  To: Bruce D'Arcus; +Cc: David Lukeš, org-mode-email

"Bruce D'Arcus" <bdarcus@gmail.com> writes:

> I created CSL, and have helped develop the data schema.
>
> BTW, just to look forward, it's likely we'll change the date property
> to prefer an EDTF string; same as biblatex does now.

Have you ever checked if oc-basic parser complies with the schema?
It is clear that is not for year field, but I am wondering about other
fields.

Best,
Ihor


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-20 12:06         ` Ihor Radchenko
@ 2022-06-20 13:11           ` Bruce D'Arcus
  2022-06-20 14:13             ` David Lukeš
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce D'Arcus @ 2022-06-20 13:11 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: David Lukeš, org-mode-email

On Mon, Jun 20, 2022 at 8:04 AM Ihor Radchenko <yantar92@gmail.com> wrote:
>
> "Bruce D'Arcus" <bdarcus@gmail.com> writes:
>
> > I created CSL, and have helped develop the data schema.
> >
> > BTW, just to look forward, it's likely we'll change the date property
> > to prefer an EDTF string; same as biblatex does now.
>
> Have you ever checked if oc-basic parser complies with the schema?
> It is clear that is not for year field, but I am wondering about other
> fields.

I had not. But the schema doesn't have a lot of rules.

Aside from dates, the other structured object definition is for
(author etc) names.

oc-basic ignores the properties beyond "family" and "given" names,
most of which probably don't matter for this purpose, except maybe
"literal" (for organizational authors).

Bruce


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-20 13:11           ` Bruce D'Arcus
@ 2022-06-20 14:13             ` David Lukeš
  2022-06-20 14:24               ` Bruce D'Arcus
  2022-06-21  3:28               ` Ihor Radchenko
  0 siblings, 2 replies; 12+ messages in thread
From: David Lukeš @ 2022-06-20 14:13 UTC (permalink / raw)
  To: Bruce D'Arcus; +Cc: Ihor Radchenko, org-mode-email

[-- Attachment #1: Type: text/plain, Size: 1217 bytes --]

> I created CSL, and have helped develop the data schema.

Well, that's what I call a reliable source :) I suppose this is as
good a time as any to say thanks for all that!

> it's likely we'll change the date property to prefer an
> EDTF string

Will that be stored in the `raw` or `literal` field? In that case, the
current implementation should work with (not too wild) EDTF strings.
If not, code will have to be added to extract the value from the
appropriate field.

> I'd prefer an explicit cond here. format "%s" may silently
> work on malformed json files and will mask issues with
> bibliography from the user.

Sure, good idea :)

> Would you mind creating a patch and possibly supplying a test
> that will make sure that the example file and similar are correctly
> parsed?

I'm attaching a patch (just the patch for now, no commit message), let
me know what you think, in particular the error message. What would be
an acceptable way to wrap the format string?

As for tests -- if `oc-basic.el` already had some, I'd try to see if I
can come up with something by analogy. But as it stands, I know too
little about Elisp, nothing about Elisp testing, and have too little
spare time, sorry :(

Best,

David

[-- Attachment #2: oc-basic.el.patch --]
[-- Type: text/x-patch, Size: 1177 bytes --]

diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index a937f75..f10b95b 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -189,7 +189,14 @@ Return a hash table with citation references as keys and fields alist as values.
                                 (cons 'year
                                       (cond
                                        ((consp date)
-                                        (caar date))
+                                         (let ((year (caar date)))
+                                           (cond
+                                             ((numberp year) (number-to-string year))
+                                             ((stringp year) year)
+                                             (t
+                                               (error
+                                                 "First element of CSL-JSON date-parts should be a number or string, got %s: %S"
+                                                 (type-of year) year)))))
                                        ((stringp date)
                                         (replace-regexp-in-string
                                           (rx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-20 14:13             ` David Lukeš
@ 2022-06-20 14:24               ` Bruce D'Arcus
  2022-06-21  3:28               ` Ihor Radchenko
  1 sibling, 0 replies; 12+ messages in thread
From: Bruce D'Arcus @ 2022-06-20 14:24 UTC (permalink / raw)
  To: David Lukeš; +Cc: Ihor Radchenko, org-mode-email

On Mon, Jun 20, 2022 at 10:13 AM David Lukeš <dafydd.lukes@gmail.com> wrote:
...

> > it's likely we'll change the date property to prefer an
> > EDTF string
>
> Will that be stored in the `raw` or `literal` field? In that case, the
> current implementation should work with (not too wild) EDTF strings.
> If not, code will have to be added to extract the value from the
> appropriate field.

I need to emphasize a major caveat: this is subject to change,
discussion, etc., and would take time regardless to see in the wild.
So you shouldn't do anything with this information ATM; I just mention
it because it may be relevant in the future.

And of course, we welcome feedback.

But currently, the draft schema for the next major release allows this
as an option (this is an EDTF date range):

"issued": "2020-07/2020-08",

E.g. a (preferred) EDTF string, OR the current date object.

In general, I should add, there are some competing priorities here.
The CSL JSON was first created for the citeproc-js project, whose
initial and primary consumer is Zotero, which embeds that data in word
processing documents.

In that case, machines are the only consumers.

But it's since become more widely used, including in pandoc, and now
in org. So the planned move to EDTF is in part to balance those
priorities.

But as I say, we still need feedback from the different constituencies.

Bruce


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-20 14:13             ` David Lukeš
  2022-06-20 14:24               ` Bruce D'Arcus
@ 2022-06-21  3:28               ` Ihor Radchenko
  2022-06-21 12:02                 ` David Lukeš
  1 sibling, 1 reply; 12+ messages in thread
From: Ihor Radchenko @ 2022-06-21  3:28 UTC (permalink / raw)
  To: David Lukeš; +Cc: Bruce D'Arcus, org-mode-email

David Lukeš <dafydd.lukes@gmail.com> writes:

>> Would you mind creating a patch and possibly supplying a test
>> that will make sure that the example file and similar are correctly
>> parsed?
>
> I'm attaching a patch (just the patch for now, no commit message), let
> me know what you think, in particular the error message. What would be
> an acceptable way to wrap the format string?

LGTM! There is no need to wrap the format string.

> As for tests -- if `oc-basic.el` already had some, I'd try to see if I
> can come up with something by analogy. But as it stands, I know too
> little about Elisp, nothing about Elisp testing, and have too little
> spare time, sorry :(

No problem. Your patch is already helpful.

Best,
Ihor


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: oc-basic: CSL-JSON year as number vs. string (nativecomp?)
  2022-06-21  3:28               ` Ihor Radchenko
@ 2022-06-21 12:02                 ` David Lukeš
  0 siblings, 0 replies; 12+ messages in thread
From: David Lukeš @ 2022-06-21 12:02 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Bruce D'Arcus, org-mode-email

Bruce: Understood, I was just thinking forward :) Thank you for the
details. As far as I'm concerned, EDTF in CSL JSON sounds like a good
idea (but it's not like my opinion should particularly matter).

Ihor: Thank you for the feedback! I'll send a proper patch then.

Best,

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-21 12:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-18  6:34 oc-basic: CSL-JSON year as number vs. string (nativecomp?) David Lukeš
2022-06-19  1:44 ` Ihor Radchenko
2022-06-19  1:55   ` Bruce D'Arcus
2022-06-19  3:31     ` David Lukeš
2022-06-19 13:39       ` Bruce D'Arcus
2022-06-20 12:06         ` Ihor Radchenko
2022-06-20 13:11           ` Bruce D'Arcus
2022-06-20 14:13             ` David Lukeš
2022-06-20 14:24               ` Bruce D'Arcus
2022-06-21  3:28               ` Ihor Radchenko
2022-06-21 12:02                 ` David Lukeš
2022-06-20 12:04       ` Ihor Radchenko

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).