unofficial mirror of bug-guile@gnu.org 
* [bug #31680] R6RS string literal intraline whitespace removal
@ 2010-11-17 12:15 Göran Weinholt
  2010-11-18 15:05 ` Mike Gran
  0 siblings, 1 reply; 10+ messages in thread
From: Göran Weinholt @ 2010-11-17 12:15 UTC (permalink / raw)
  To: Göran Weinholt, bug-guile


URL:
  <http://savannah.gnu.org/bugs/?31680>

                 Summary: R6RS string literal intraline whitespace removal
                 Project: Guile
            Submitted by: weinholt
            Submitted on: Wed Nov 17 13:15:37 2010
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

In R6RS subsection 4.2.7 there is this escape sequence:

 \<intraline whitespace><line ending>
       <intraline whitespace> : nothing

This escape sequence isn't working. Both the examples below should return
"foobar":

GNU Guile 1.9.13.58-b98d5a
Copyright (C) 1995-2010 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (import (rnrs))
scheme@(guile-user)> (read (open-string-input-port "#!r6rs \"foo\\\n 
bar\""))
$1 = "foo  bar"
scheme@(guile-user)> (read (open-string-input-port "#!r6rs \"foo\\  \n 
bar\""))
ERROR: In procedure scm_lreadr:
ERROR: #<unknown port>:1:10: illegal character in escape sequence: #\space





    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?31680>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/





* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-17 12:15 [bug #31680] R6RS string literal intraline whitespace removal Göran Weinholt
@ 2010-11-18 15:05 ` Mike Gran
  2010-11-18 18:56   ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Gran @ 2010-11-18 15:05 UTC (permalink / raw)
  To: Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #1, bug #31680 (project guile):

Yeah it is true that this isn't implemented and should be.

Ludo, Andy, I did in fact write an implementation of this in the r6rs-strings
tree, but I never committed it to master.

The commit was cff6adf899dbc0336c7c017d52504f8138c89b3d, and the part of the
commit specific to this bug is all in read.c, in scm_read_string and
SCM_READ_SPACE_LINE_SPACE.

I didn't commit it because, at the time, I took exception to the R6RS spec
itself.

It states that \<intraline-whitespace><line-ending><intraline-whitespace>
expands to nothing.  And then it calls out that whitespace and line-ending are
any characters in the Unicode whitespace and line ending families!

I guarantee that no one will ever put something like <U+005C "backslash">
<U+205F "medium mathematical space"> <U+2028 "line separator"> <U+2006 "6 per
em space"> in the middle of a source code file.  It will never, ever happen.
But writing the code to allow for all those cases led to an ugly patch.




* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-18 15:05 ` Mike Gran
@ 2010-11-18 18:56   ` Andy Wingo
  2010-11-18 19:45     ` Göran Weinholt
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Wingo @ 2010-11-18 18:56 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #2, bug #31680 (project guile):

Interesting issue. I have no idea what that particular escape sequence is
for. Göran, do you use it? What for?


* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-18 18:56   ` Andy Wingo
@ 2010-11-18 19:45     ` Göran Weinholt
  2010-11-18 20:24       ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Göran Weinholt @ 2010-11-18 19:45 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #3, bug #31680 (project guile):

I've used it for multiline strings, like so:

  (define modp-group1-p
    (string->number
     "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1\
      29024E088A67CC74020BBEA63B139B22514A08798E3404DD\
      EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245\
      E485B576625E7EC6F44C42E9A63A3620FFFFFFFFFFFFFFFF"
     16))

or like so:

        "My name is Ozymandias, king of kings:\n\
         Look on my works, ye Mighty, and despair!"

I think any implementation of multiline strings should be prepared to handle
line endings from other operating systems. But surely you should not need to
check for U+205F and so on explicitly; isn't it enough to see that they belong
to the right char-general-category?
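
A minimal sketch of the category-based check suggested above, assuming Guile's
built-in char-general-category. The procedure names intraline-whitespace? and
line-ending-char? are hypothetical and not part of Guile or of any patch
mentioned in this thread. Note that the category lookup only covers intraline
whitespace (tab, or category Zs); the R6RS line endings do not share a single
category, so they are listed explicitly, and the two-character forms (CR LF,
CR NEL) would additionally need one character of lookahead.

```scheme
;; Hypothetical helper: R6RS <intraline whitespace> is a character
;; tabulation or any character whose general category is Zs.
(define (intraline-whitespace? ch)
  (or (char=? ch #\tab)
      (eq? (char-general-category ch) 'Zs)))

;; Hypothetical helper: the single-character R6RS line endings
;; (linefeed, carriage return, next line U+0085, line separator U+2028).
(define (line-ending-char? ch)
  (and (memv ch (list #\newline
                      #\return
                      (integer->char #x0085)
                      (integer->char #x2028)))
       #t))

(intraline-whitespace? (integer->char #x205F)) ; medium mathematical space
(intraline-whitespace? #\a)
(line-ending-char? (integer->char #x2028))     ; line separator
```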


* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-18 19:45     ` Göran Weinholt
@ 2010-11-18 20:24       ` Andy Wingo
  2010-11-18 21:51         ` Mike Gran
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Wingo @ 2010-11-18 20:24 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #4, bug #31680 (project guile):

On Thu 18 Nov 2010 20:45, Göran Weinholt <INVALID.NOREPLY@gnu.org> writes:

>   (define modp-group1-p
>     (string->number
>      "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1\
>       29024E088A67CC74020BBEA63B139B22514A08798E3404DD\
>       EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245\
>       E485B576625E7EC6F44C42E9A63A3620FFFFFFFFFFFFFFFF"
>      16))

Sounds legit. 

> surely you should not need to check for U+205F and so on explicitly;
> isn't it enough to see that they belong to the right
> char-general-category?

I think the deal is that all (?) of the other escapes can be dealt with
via the equivalent of a `case' expression. This one requires a property
lookup. It's not as nice.

Also note the thread at
http://lists.r6rs.org/pipermail/r6rs-discuss/2010-November/006146.html.

Is there a use case for allowing intraline spaces before the newline?
Disallowing that would eliminate a state in the parser.

Andy



* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-18 20:24       ` Andy Wingo
@ 2010-11-18 21:51         ` Mike Gran
  2010-11-19 15:24           ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Gran @ 2010-11-18 21:51 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #5, bug #31680 (project guile):

> I think the deal is that all (?) of the other escapes can be 
> dealt with via the equivalent of a `case' expression. This 
> one requires a property lookup. It's not as nice. 

> Also note the thread at 
> http://lists.r6rs.org/pipermail/r6rs-discuss/2010-November/006146.html. 

> Is there a use case for allowing intraline spaces before
> the newline? Disallowing that would eliminate a state
> in the parser. 

I don't think there is a valid use case for allowing that
initial intraline space.  If (according to the discussion
on the R6RS list) the intention was to work around broken
editors, those bugs should be fixed in the editors, and
not become workarounds enshrined in a language spec.



* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-18 21:51         ` Mike Gran
@ 2010-11-19 15:24           ` Andy Wingo
  2010-11-19 17:57             ` Mike Gran
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Wingo @ 2010-11-19 15:24 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #6, bug #31680 (project guile):

OK, why don't we implement the escape sequence \<line ending><intraline
whitespace>* -> nothing, then? Does that sound right to you, Mike?
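
The simplified rule proposed here can be sketched in Scheme as follows. This
is an illustration only, not the code in read.c: the procedure names
skip-intraline-whitespace and read-string-body are hypothetical, only a plain
linefeed is treated as a line ending, and every escape other than \n and the
escaped newline simply stands for the escaped character.

```scheme
;; Hypothetical helper: consume tabs and category-Zs characters from
;; PORT, stopping at the first character that is neither (or at EOF).
(define (skip-intraline-whitespace port)
  (let ((ch (peek-char port)))
    (when (and (char? ch)
               (or (char=? ch #\tab)
                   (eq? (char-general-category ch) 'Zs)))
      (read-char port)
      (skip-intraline-whitespace port))))

;; Hypothetical reader for the body of a string literal.  Assumes the
;; opening double quote has already been consumed.
(define (read-string-body port)
  (let loop ((acc '()))
    (let ((ch (read-char port)))
      (cond
       ((eof-object? ch) (error "end of file in string literal"))
       ((char=? ch #\") (list->string (reverse acc)))
       ((char=? ch #\\)
        (let ((next (read-char port)))
          (cond
           ((eof-object? next) (error "end of file in escape sequence"))
           ;; \<linefeed><intraline whitespace>* contributes nothing.
           ((char=? next #\newline)
            (skip-intraline-whitespace port)
            (loop acc))
           ((char=? next #\n) (loop (cons #\newline acc)))
           ;; Simplification: any other escaped char stands for itself.
           (else (loop (cons next acc))))))
       (else (loop (cons ch acc)))))))

;; Input text is: foo, backslash, linefeed, three spaces, bar, quote.
(call-with-input-string "foo\\\n   bar\"" read-string-body)
```

Dropping the whitespace before the line ending, as discussed above, means the
escape is recognised as soon as the backslash is followed by a line ending; no
extra state is needed to scan ahead past spaces first.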


* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-19 15:24           ` Andy Wingo
@ 2010-11-19 17:57             ` Mike Gran
  2011-01-21  7:35               ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Gran @ 2010-11-19 17:57 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #7, bug #31680 (project guile):

> OK, why don't we implement the escape sequence
> \<line ending><intraline whitespace>* -> nothing,
> then? Does that sound right to you, Mike?

Sounds great.  I'll commit something this weekend if you
don't get to it first.




* [bug #31680] R6RS string literal intraline whitespace removal
  2010-11-19 17:57             ` Mike Gran
@ 2011-01-21  7:35               ` Andy Wingo
  2011-01-21  8:22                 ` Andy Wingo
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Wingo @ 2011-01-21  7:35 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Follow-up Comment #8, bug #31680 (project guile):

I looked at implementing this, and it seems that Guile already handles the
sequence \<LF> as being nothing; e.g.

"foo\
 bar"
=> "foo bar"

This appears to have been the case since time immemorial, or 1996 at least.
So we will probably have to add a reader option for this, unfortunately...


* [bug #31680] R6RS string literal intraline whitespace removal
  2011-01-21  7:35               ` Andy Wingo
@ 2011-01-21  8:22                 ` Andy Wingo
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Wingo @ 2011-01-21  8:22 UTC (permalink / raw)
  To: Andy Wingo, Göran Weinholt, Mike Gran, bug-guile


Update of bug #31680 (project guile):

                  Status:                    None => Fixed                  
             Open/Closed:                    Open => Closed                 

    _______________________________________________________

Follow-up Comment #9:

Fixed in git.  You have to enable a reader option:

(read-enable 'hungry-eol-escapes)
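
A short usage sketch, assuming a Guile build that includes this fix. It reads
the same string literal before and after enabling the option;
call-with-input-string and read-disable are standard Guile procedures, and the
variable names before and after are only for illustration. Read options set
with read-enable are global, so the sketch switches the option off again at
the end.

```scheme
;; The text handed to the reader is: "foo\<LF>  bar"
(define literal "\"foo\\\n  bar\"")

;; Default behaviour: only the escaped newline is dropped, so the
;; two leading spaces of the continuation line stay in the string.
(define before (call-with-input-string literal read))

;; With the option enabled, the intraline whitespace after the
;; escaped newline is consumed as well.
(read-enable 'hungry-eol-escapes)
(define after (call-with-input-string literal read))

;; Restore the default so later reads are unaffected.
(read-disable 'hungry-eol-escapes)

(write before) (newline)
(write after) (newline)
```

With the option enabled, the second read should give "foobar", which is what
the original report asked for.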

