unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* always O_BINARY?
@ 2013-02-24 11:39 Andy Wingo
  2013-02-24 12:08 ` Neil Jerram
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Andy Wingo @ 2013-02-24 11:39 UTC (permalink / raw)
  To: guile-devel

Hi,

Just thinking aloud here -- Windows has this O_BINARY thing that
translates CRLF to LF when reading, and LF to CRLF when writing.  It
seems to me to be a useless thing.  We already have our own i/o
abstractions and should deal with CRLF vs LF in Scheme, I think:

  The (newline) function can write CRLF
  The ~% format directive should DTRT
  read-line should DTRT

And since all of our hackers have been on POSIX systems, we're used to
there being no O_BINARY/O_TEXT distinction.

So, what do you think about always adding O_BINARY to files that Guile
opens?

Regards,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: always O_BINARY?
  2013-02-24 11:39 always O_BINARY? Andy Wingo
@ 2013-02-24 12:08 ` Neil Jerram
  2013-02-24 16:55 ` Mike Gran
  2013-02-24 21:17 ` Ludovic Courtès
  2 siblings, 0 replies; 25+ messages in thread
From: Neil Jerram @ 2013-02-24 12:08 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> writes:

> Hi,
>
> Just thinking aloud here -- Windows has this O_BINARY thing that
> translates CRLF to LF when reading, and LF to CRLF when writing.  It
> seems to me to be a useless thing.  We already have our own i/o
> abstractions and should deal with CRLF vs LF in Scheme, I think:
>
>   The (newline) function can write CRLF
>   The ~% format directive should DTRT
>   read-line should DTRT
>
> And since all of our hackers have been on POSIX systems, we're used to
> there being no O_BINARY/O_TEXT distinction.
>
> So, what do you think about always adding O_BINARY to files that Guile
> opens?

Maybe look at what Emacs on Windows does?  I would guess it has the same
question, and probably the same answer as you've suggested.

       Neil



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: always O_BINARY?
  2013-02-24 11:39 always O_BINARY? Andy Wingo
  2013-02-24 12:08 ` Neil Jerram
@ 2013-02-24 16:55 ` Mike Gran
  2013-02-24 21:17 ` Ludovic Courtès
  2 siblings, 0 replies; 25+ messages in thread
From: Mike Gran @ 2013-02-24 16:55 UTC (permalink / raw)
  To: Andy Wingo, guile-devel

> From: Andy Wingo <wingo@pobox.com>
> So, what do you think about always adding O_BINARY to files that Guile
> opens?

Lilypond, Gnucash, Denemo, Autogen and Emacs all run on Windows
to varying degrees.  As does Gnome Games.  If it doesn't break
any of them, then it might be okay.  In an ideal world, there would
be a cross-platform build bot that runs 'make check' on each of
these things so that one could know if a change was going to 
break something.
 
But, for what it is worth, I think it is a bad idea.
 
If you imagine a program that uses autoconf... One way to deal with
the rapid churn of API in Guile is to check for the presence or
absence of a function.  Most of our API changes could be detected
in a configure script by checking to see if a procedure is present
or absent.
 
This would be something else entirely.  To deal with this in an
autoconf sense, one would have to write a test that actually reads
and writes a file.
 
-Mike



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: always O_BINARY?
  2013-02-24 11:39 always O_BINARY? Andy Wingo
  2013-02-24 12:08 ` Neil Jerram
  2013-02-24 16:55 ` Mike Gran
@ 2013-02-24 21:17 ` Ludovic Courtès
  2013-02-28  3:24   ` Adding new information to scm_t_port (was Re: always O_BINARY?) Mark H Weaver
  2 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2013-02-24 21:17 UTC (permalink / raw)
  To: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> Just thinking aloud here -- Windows has this O_BINARY thing that
> translates CRLF to LF when reading, and LF to CRLF when writing.  It
> seems to me to be a useless thing.  We already have our own i/o
> abstractions and should deal with CRLF vs LF in Scheme, I think:

Yes.

>   The (newline) function can write CRLF
>   The ~% format directive should DTRT
>   read-line should DTRT

IMO the correct abstraction here is transcoders à la R6RS.  The problem
is that scm_t_port doesn’t have any slot to specify the EOL style, but
it would need one.

> So, what do you think about always adding O_BINARY to files that Guile
> opens?

Yes, but only when there’s a per-port EOL style, since otherwise we’d
just remove functionality, no?

Ludo’.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-24 21:17 ` Ludovic Courtès
@ 2013-02-28  3:24   ` Mark H Weaver
  2013-02-28  8:53     ` Andy Wingo
  0 siblings, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2013-02-28  3:24 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:
> Andy Wingo <wingo@pobox.com> skribis:
>
>>   The (newline) function can write CRLF
>>   The ~% format directive should DTRT
>>   read-line should DTRT
>
> IMO the correct abstraction here is transcoders à la R6RS.

Agreed.

> The problem is that scm_t_port doesn’t have any slot to specify the
> EOL style, but it would need one.

I think it's important that we find a way to add new information to
scm_t_port in 2.0.  We also need this to properly fix the BOM issue.

Here's a proposal: let's slightly redefine the meaning of 'input_cd' and
'output_cd'.  Users are already unable to use these, because in the
common case (UTF-8) they are both -1.

Instead of having 'input_cd' and 'output_cd' point directly to the
platform's iconv_t structures, let's have them point to our own internal
structure(s) that hold the needed transcoder state.  This could include
things like the state for internally-implement encoding(s) (e.g. UTF-8
BOM handling), EOL style, and iconv_t pointer(s) if appropriate.

What do you think?

    Mark



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-28  3:24   ` Adding new information to scm_t_port (was Re: always O_BINARY?) Mark H Weaver
@ 2013-02-28  8:53     ` Andy Wingo
  2013-02-28 11:04       ` Ludovic Courtès
  2013-03-05 18:55       ` Mark H Weaver
  0 siblings, 2 replies; 25+ messages in thread
From: Andy Wingo @ 2013-02-28  8:53 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Thu 28 Feb 2013 04:24, Mark H Weaver <mhw@netris.org> writes:

> Instead of having 'input_cd' and 'output_cd' point directly to the
> platform's iconv_t structures, let's have them point to our own internal
> structure(s) that hold the needed transcoder state.  This could include
> things like the state for internally-implement encoding(s) (e.g. UTF-8
> BOM handling), EOL style, and iconv_t pointer(s) if appropriate.

Great idea.

I would call it a generic "port-private" data structure; it is not
limited to transcoder state.

  struct scm_t_port_private;
  #define SCM_I_PORT_PRIVATE ((struct scm_t_port_private*)((ptob)->input_cd))

MHO :)

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-28  8:53     ` Andy Wingo
@ 2013-02-28 11:04       ` Ludovic Courtès
  2013-02-28 13:09         ` Andy Wingo
  2013-03-05 18:55       ` Mark H Weaver
  1 sibling, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2013-02-28 11:04 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Mark H Weaver, guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Thu 28 Feb 2013 04:24, Mark H Weaver <mhw@netris.org> writes:
>
>> Instead of having 'input_cd' and 'output_cd' point directly to the
>> platform's iconv_t structures, let's have them point to our own internal
>> structure(s) that hold the needed transcoder state.  This could include
>> things like the state for internally-implement encoding(s) (e.g. UTF-8
>> BOM handling), EOL style, and iconv_t pointer(s) if appropriate.
>
> Great idea.
>
> I would call it a generic "port-private" data structure; it is not
> limited to transcoder state.
>
>   struct scm_t_port_private;
>   #define SCM_I_PORT_PRIVATE ((struct scm_t_port_private*)((ptob)->input_cd))

Sounds good to me, perhaps with ‘input_cd’ changed to ‘internal’ or
similar.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-28 11:04       ` Ludovic Courtès
@ 2013-02-28 13:09         ` Andy Wingo
  2013-03-01  9:03           ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Wingo @ 2013-02-28 13:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mark H Weaver, guile-devel

On Thu 28 Feb 2013 12:04, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> On Thu 28 Feb 2013 04:24, Mark H Weaver <mhw@netris.org> writes:
>>
>>> Instead of having 'input_cd' and 'output_cd' point directly to the
>>> platform's iconv_t structures, let's have them point to our own internal
>>> structure(s) that hold the needed transcoder state.  This could include
>>> things like the state for internally-implement encoding(s) (e.g. UTF-8
>>> BOM handling), EOL style, and iconv_t pointer(s) if appropriate.
>>
>> Great idea.
>>
>> I would call it a generic "port-private" data structure; it is not
>> limited to transcoder state.
>>
>>   struct scm_t_port_private;
>>   #define SCM_I_PORT_PRIVATE ((struct scm_t_port_private*)((ptob)->input_cd))
>
> Sounds good to me, perhaps with ‘input_cd’ changed to ‘internal’ or
> similar.

That would be an API break; probably best to keep it with the same name,
no?  Though I guess it doesn't matter in this particular case...

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-28 13:09         ` Andy Wingo
@ 2013-03-01  9:03           ` Ludovic Courtès
  0 siblings, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2013-03-01  9:03 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Mark H Weaver, guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Thu 28 Feb 2013 12:04, ludo@gnu.org (Ludovic Courtès) writes:
>
>> Andy Wingo <wingo@pobox.com> skribis:
>>
>>> On Thu 28 Feb 2013 04:24, Mark H Weaver <mhw@netris.org> writes:
>>>
>>>> Instead of having 'input_cd' and 'output_cd' point directly to the
>>>> platform's iconv_t structures, let's have them point to our own internal
>>>> structure(s) that hold the needed transcoder state.  This could include
>>>> things like the state for internally-implement encoding(s) (e.g. UTF-8
>>>> BOM handling), EOL style, and iconv_t pointer(s) if appropriate.
>>>
>>> Great idea.
>>>
>>> I would call it a generic "port-private" data structure; it is not
>>> limited to transcoder state.
>>>
>>>   struct scm_t_port_private;
>>>   #define SCM_I_PORT_PRIVATE ((struct scm_t_port_private*)((ptob)->input_cd))
>>
>> Sounds good to me, perhaps with ‘input_cd’ changed to ‘internal’ or
>> similar.
>
> That would be an API break; probably best to keep it with the same name,
> no?  Though I guess it doesn't matter in this particular case...

I’d say it doesn’t matter in this case because I can hardly imagine
someone doing something useful with ‘input_cd’ outside of Guile.

Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Adding new information to scm_t_port (was Re: always O_BINARY?)
  2013-02-28  8:53     ` Andy Wingo
  2013-02-28 11:04       ` Ludovic Courtès
@ 2013-03-05 18:55       ` Mark H Weaver
  2013-03-27 20:00         ` [PATCH] Add private port structure, and move iconv descriptors there Mark H Weaver
  1 sibling, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2013-03-05 18:55 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Andy Wingo <wingo@pobox.com> writes:

> On Thu 28 Feb 2013 04:24, Mark H Weaver <mhw@netris.org> writes:
>
>> Instead of having 'input_cd' and 'output_cd' point directly to the
>> platform's iconv_t structures, let's have them point to our own internal
>> structure(s) that hold the needed transcoder state.  This could include
>> things like the state for internally-implement encoding(s) (e.g. UTF-8
>> BOM handling), EOL style, and iconv_t pointer(s) if appropriate.
>
> Great idea.
>
> I would call it a generic "port-private" data structure; it is not
> limited to transcoder state.
>
>   struct scm_t_port_private;
>   #define SCM_I_PORT_PRIVATE ((struct scm_t_port_private*)((ptob)->input_cd))

Sounds good to me.  I'll include something like this in my next revision
of BOM-handling patches, which I hope to post soon.

    Thanks,
      Mark



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-05 18:55       ` Mark H Weaver
@ 2013-03-27 20:00         ` Mark H Weaver
  2013-03-27 20:28           ` Ludovic Courtès
  2013-03-27 20:51           ` Andy Wingo
  0 siblings, 2 replies; 25+ messages in thread
From: Mark H Weaver @ 2013-03-27 20:00 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 315 bytes --]

Hello all,

Here's a patch to add a private port structure.  I moved both input_cd
and output_cd there.  I plan to create more patches on top of this very
soon (BOM handling, more efficient per-port read options, maybe better
EOF handling), so a prompt review would be very helpful.

What do you think?

    Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: [PATCH] Add private port structure, and move the iconv descriptors there --]
[-- Type: text/x-diff, Size: 12442 bytes --]

From 47aadc79423200b226d7592ea6bbeb788f1c5bef Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Wed, 27 Mar 2013 15:41:45 -0400
Subject: [PATCH] Add private port structure, and move the iconv descriptors
 there.

* libguile/private-ports.h: New file.

* libguile/Makefile.am (noinst_HEADERS): Add private-ports.h.

* libguile/ports.h (scm_t_port): Replace 'input_cd' and 'output_cd'
  fields with 'internal' and 'reserved_for_future_use'.

* libguile/ports.c: Include private-ports.h.
  (finalize_port, scm_i_remove_port, get_iconv_codepoint): Access
  'input_cd' and 'output_cd' via 'internal' pointer.
  (scm_new_port_table_entry): Initialize new internal fields.
  Access 'input_cd' and 'output_cd' via 'internal' pointer.
  (get_codepoint): Use 'internal->encoding_type' to detect if the
  encoders are initialized and which encoding method to use.
  (scm_i_set_port_encoding_x): Set the 'internal->encoding_type' field
  appropriately.  Access 'input_cd' and 'output_cd' via 'internal'
  pointer.

* libguile/print.c: Include private-ports.h.
  (display_string_using_iconv): Access 'input_cd' and 'output_cd' via
  'internal' pointer.
  (display_string): Use 'internal->encoding_type' to detect if the
  encoders are initialized and which encoding method to use.
---
 libguile/Makefile.am     |    2 +-
 libguile/ports.c         |   79 +++++++++++++++++++++++++++++++---------------
 libguile/ports.h         |    7 ++--
 libguile/print.c         |   26 +++++++++------
 libguile/private-ports.h |   41 ++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 39 deletions(-)
 create mode 100644 libguile/private-ports.h

diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index 4b1f96b..6eaca76 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -455,7 +455,7 @@ noinst_HEADERS = conv-integer.i.c conv-uinteger.i.c		\
                  srfi-14.i.c					\
                  quicksort.i.c                                  \
                  win32-uname.h					\
-		 private-gc.h private-options.h
+		 private-gc.h private-options.h private-ports.h
 
 # vm instructions
 noinst_HEADERS += vm-engine.c vm-i-system.c vm-i-scheme.c vm-i-loader.c
diff --git a/libguile/ports.c b/libguile/ports.c
index 8737a76..45e770d 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -60,6 +60,8 @@
 #include "libguile/fluids.h"
 #include "libguile/eq.h"
 
+#include "libguile/private-ports.h"
+
 #ifdef HAVE_STRING_H
 #include <string.h>
 #endif
@@ -589,10 +591,10 @@ finalize_port (void *ptr, void *data)
 
 	  entry = SCM_PTAB_ENTRY (port);
 
-	  if (entry->input_cd != (iconv_t) -1)
-	    iconv_close (entry->input_cd);
-	  if (entry->output_cd != (iconv_t) -1)
-	    iconv_close (entry->output_cd);
+	  if (entry->internal->input_cd != (iconv_t) -1)
+	    iconv_close (entry->internal->input_cd);
+	  if (entry->internal->output_cd != (iconv_t) -1)
+	    iconv_close (entry->internal->output_cd);
 
 	  SCM_SETSTREAM (port, 0);
 	  SCM_CLR_PORT_OPEN_FLAG (port);
@@ -629,9 +631,15 @@ scm_new_port_table_entry (scm_t_bits tag)
   enc = scm_i_default_port_encoding ();
   entry->encoding = enc ? scm_gc_strdup (enc, "port") : NULL;
 
+  /* Initialize private fields.  */
+  entry->reserved_for_future_use = NULL;
+  entry->internal = (struct scm_t_port_private *)
+    scm_gc_calloc (sizeof (struct scm_t_port_private), "port-private");
+  entry->internal->encoding_type = SCM_PORT_ENCODING_TYPE_UNINITIALIZED;
+
   /* The conversion descriptors will be opened lazily.  */
-  entry->input_cd = (iconv_t) -1;
-  entry->output_cd = (iconv_t) -1;
+  entry->internal->input_cd = (iconv_t) -1;
+  entry->internal->output_cd = (iconv_t) -1;
 
   entry->ilseq_handler = scm_i_default_port_conversion_handler ();
 
@@ -686,18 +694,20 @@ scm_i_remove_port (SCM port)
   p->putback_buf = NULL;
   p->putback_buf_size = 0;
 
-  if (p->input_cd != (iconv_t) -1)
+  if (p->internal->input_cd != (iconv_t) -1)
     {
-      iconv_close (p->input_cd);
-      p->input_cd = (iconv_t) -1;
+      iconv_close (p->internal->input_cd);
+      p->internal->input_cd = (iconv_t) -1;
     }
   
-  if (p->output_cd != (iconv_t) -1)
+  if (p->internal->output_cd != (iconv_t) -1)
     {
-      iconv_close (p->output_cd);
-      p->output_cd = (iconv_t) -1;
+      iconv_close (p->internal->output_cd);
+      p->internal->output_cd = (iconv_t) -1;
     }
 
+  p->internal = NULL;
+
   SCM_SETPTAB_ENTRY (port, 0);
 
   scm_hashq_remove_x (scm_i_port_weak_hash, port);
@@ -1297,12 +1307,14 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
 		     char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
 {
   scm_t_port *pt;
+  struct scm_t_port_private *pti;
   int err, byte_read;
   size_t bytes_consumed, output_size;
   char *output;
   scm_t_uint8 utf8_buf[SCM_MBCHAR_BUF_SIZE];
 
   pt = SCM_PTAB_ENTRY (port);
+  pti = pt->internal;
 
   for (output_size = 0, output = (char *) utf8_buf,
 	 bytes_consumed = 0, err = 0;
@@ -1332,7 +1344,7 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
       input_left = bytes_consumed + 1;
       output_left = sizeof (utf8_buf);
 
-      done = iconv (pt->input_cd, &input, &input_left,
+      done = iconv (pti->input_cd, &input, &input_left,
 		    &output, &output_left);
       if (done == (size_t) -1)
 	{
@@ -1369,15 +1381,22 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
   int err;
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
 
-  if (pt->input_cd == (iconv_t) -1)
+  if (pt->internal->encoding_type == SCM_PORT_ENCODING_TYPE_UNINITIALIZED)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
-  /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->input_cd == (iconv_t) -1)
-    err = get_utf8_codepoint (port, codepoint, (scm_t_uint8 *) buf, len);
-  else
-    err = get_iconv_codepoint (port, codepoint, buf, len);
+  switch (pt->internal->encoding_type)
+    {
+    case SCM_PORT_ENCODING_TYPE_ICONV:
+      err = get_iconv_codepoint (port, codepoint, buf, len);
+      break;
+    case SCM_PORT_ENCODING_TYPE_UTF8:
+      err = get_utf8_codepoint (port, codepoint, (scm_t_uint8 *) buf, len);
+      break;
+    case SCM_PORT_ENCODING_TYPE_UNINITIALIZED:
+    default:
+      scm_syserror ("get_codepoint");
+    }
 
   if (SCM_LIKELY (err == 0))
     update_port_lf (*codepoint, port);
@@ -2208,13 +2227,16 @@ void
 scm_i_set_port_encoding_x (SCM port, const char *encoding)
 {
   scm_t_port *pt;
+  struct scm_t_port_private *pti;
   iconv_t new_input_cd, new_output_cd;
+  enum scm_t_port_encoding_type new_encoding_type;
 
   new_input_cd = (iconv_t) -1;
   new_output_cd = (iconv_t) -1;
 
   /* Set the character encoding for this port.  */
   pt = SCM_PTAB_ENTRY (port);
+  pti = pt->internal;
 
   if (encoding == NULL)
     encoding = "ISO-8859-1";
@@ -2225,8 +2247,12 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
   /* If ENCODING is UTF-8, then no conversion descriptor is opened
      because we do I/O ourselves.  This saves 100+ KiB for each
      descriptor.  */
-  if (strcmp (encoding, "UTF-8"))
+  if (!strcmp (encoding, "UTF-8"))
+    new_encoding_type = SCM_PORT_ENCODING_TYPE_UTF8;
+  else
     {
+      new_encoding_type = SCM_PORT_ENCODING_TYPE_ICONV;
+
       if (SCM_CELL_WORD_0 (port) & SCM_RDNG)
 	{
 	  /* Open an input iconv conversion descriptor, from ENCODING
@@ -2251,13 +2277,14 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
 	}
     }
 
-  if (pt->input_cd != (iconv_t) -1)
-    iconv_close (pt->input_cd);
-  if (pt->output_cd != (iconv_t) -1)
-    iconv_close (pt->output_cd);
+  if (pti->input_cd != (iconv_t) -1)
+    iconv_close (pti->input_cd);
+  if (pti->output_cd != (iconv_t) -1)
+    iconv_close (pti->output_cd);
 
-  pt->input_cd = new_input_cd;
-  pt->output_cd = new_output_cd;
+  pti->input_cd = new_input_cd;
+  pti->output_cd = new_output_cd;
+  pti->encoding_type = new_encoding_type;
 
   return;
 
diff --git a/libguile/ports.h b/libguile/ports.h
index d4d59b7..8aa29ee 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -43,6 +43,8 @@ typedef enum scm_t_port_rw_active {
   SCM_PORT_WRITE = 2
 } scm_t_port_rw_active;
 
+struct scm_t_port_private;
+
 /* C representation of a Scheme port.  */
 
 typedef struct 
@@ -112,9 +114,8 @@ typedef struct
   unsigned char *putback_buf;
   size_t putback_buf_size;        /* allocated size of putback_buf.  */
 
-  /* input/output iconv conversion descriptors */
-  void *input_cd;
-  void *output_cd;
+  struct scm_t_port_private *internal;
+  void *reserved_for_future_use;
 } scm_t_port;
 
 
diff --git a/libguile/print.c b/libguile/print.c
index 647eed8..51f9f01 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -56,6 +56,7 @@
 #include "libguile/print.h"
 
 #include "libguile/private-options.h"
+#include "libguile/private-ports.h"
 
 \f
 
@@ -880,8 +881,10 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
 {
   size_t printed;
   scm_t_port *pt;
+  struct scm_t_port_private *pti;
 
   pt = SCM_PTAB_ENTRY (port);
+  pti = pt->internal;
 
   printed = 0;
 
@@ -910,7 +913,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
       output = encoded_output;
       output_left = sizeof (encoded_output);
 
-      done = iconv (pt->output_cd, &input, &input_left,
+      done = iconv (pti->output_cd, &input, &input_left,
 		    &output, &output_left);
 
       output_len = sizeof (encoded_output) - output_left;
@@ -920,7 +923,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
           int errno_save = errno;
 
 	  /* Reset the `iconv' state.  */
-	  iconv (pt->output_cd, NULL, NULL, NULL, NULL);
+	  iconv (pti->output_cd, NULL, NULL, NULL, NULL);
 
 	  /* Print the OUTPUT_LEN bytes successfully converted.  */
 	  scm_lfwrite (encoded_output, output_len, port);
@@ -984,16 +987,21 @@ display_string (const void *str, int narrow_p,
 
   pt = SCM_PTAB_ENTRY (port);
 
-  if (pt->output_cd == (iconv_t) -1)
+  if (pt->internal->encoding_type == SCM_PORT_ENCODING_TYPE_UNINITIALIZED)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
-  /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->output_cd == (iconv_t) -1)
-    return display_string_as_utf8 (str, narrow_p, len, port);
-  else
-    return display_string_using_iconv (str, narrow_p, len,
-				       port, strategy);
+  switch (pt->internal->encoding_type)
+    {
+    case SCM_PORT_ENCODING_TYPE_ICONV:
+      return display_string_using_iconv (str, narrow_p, len,
+                                         port, strategy);
+    case SCM_PORT_ENCODING_TYPE_UTF8:
+      return display_string_as_utf8 (str, narrow_p, len, port);
+    case SCM_PORT_ENCODING_TYPE_UNINITIALIZED:
+    default:
+      scm_syserror ("display_string");
+    }
 }
 
 /* Attempt to display CH to PORT according to STRATEGY.  Return non-zero
diff --git a/libguile/private-ports.h b/libguile/private-ports.h
new file mode 100644
index 0000000..369c4a5
--- /dev/null
+++ b/libguile/private-ports.h
@@ -0,0 +1,41 @@
+/*
+ * private-ports.h - private declarations for ports.
+ * 
+ * Copyright (C) 2013 Free Software Foundation, Inc.
+ * 
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef SCM_PRIVATE_PORTS
+#define SCM_PRIVATE_PORTS
+
+#include "libguile/_scm.h"
+#include "libguile/ports.h"
+
+enum scm_t_port_encoding_type {
+  SCM_PORT_ENCODING_TYPE_UNINITIALIZED = 0,
+  SCM_PORT_ENCODING_TYPE_ICONV = 1,
+  SCM_PORT_ENCODING_TYPE_UTF8 = 2
+};
+
+struct scm_t_port_private
+{
+  enum scm_t_port_encoding_type encoding_type;
+  void *input_cd;
+  void *output_cd;
+};
+
+#endif
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-27 20:00         ` [PATCH] Add private port structure, and move iconv descriptors there Mark H Weaver
@ 2013-03-27 20:28           ` Ludovic Courtès
  2013-03-27 20:51           ` Andy Wingo
  1 sibling, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2013-03-27 20:28 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Hi!

Mark H Weaver <mhw@netris.org> skribis:

> Here's a patch to add a private port structure.  I moved both input_cd
> and output_cd there.  I plan to create more patches on top of this very
> soon (BOM handling, more efficient per-port read options, maybe better
> EOF handling), so a prompt review would be very helpful.

Looks good to me, and definitely an improvement.

Nitpicking:

> +struct scm_t_port_private;

Please add a short comment above.

> +  struct scm_t_port_private *internal;
> +  void *reserved_for_future_use;
>  } scm_t_port;

Likewise.

> +enum scm_t_port_encoding_type {

Likewise.

Also, brace on the next line.

I’d be tempted to remove ‘_t’ from the name since it’s a tag.

> +struct scm_t_port_private

Likewise.

> +  enum scm_t_port_encoding_type encoding_type;
> +  void *input_cd;
> +  void *output_cd;
> +};

Ideally a comment saying what the fields represent.

Thanks!

Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-27 20:00         ` [PATCH] Add private port structure, and move iconv descriptors there Mark H Weaver
  2013-03-27 20:28           ` Ludovic Courtès
@ 2013-03-27 20:51           ` Andy Wingo
  2013-03-27 21:11             ` Mark H Weaver
                               ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Andy Wingo @ 2013-03-27 20:51 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

Hi,

On Wed 27 Mar 2013 21:00, Mark H Weaver <mhw@netris.org> writes:

> index 8737a76..45e770d 100644
> --- a/libguile/ports.c
> +++ b/libguile/ports.c
> @@ -60,6 +60,8 @@
>  #include "libguile/fluids.h"
>  #include "libguile/eq.h"
>  
> +#include "libguile/private-ports.h"
> +
>  #ifdef HAVE_STRING_H
>  #include <string.h>
>  #endif
> @@ -589,10 +591,10 @@ finalize_port (void *ptr, void *data)
>  
>  	  entry = SCM_PTAB_ENTRY (port);
>  
> -	  if (entry->input_cd != (iconv_t) -1)
> -	    iconv_close (entry->input_cd);
> -	  if (entry->output_cd != (iconv_t) -1)
> -	    iconv_close (entry->output_cd);
> +	  if (entry->internal->input_cd != (iconv_t) -1)
> +	    iconv_close (entry->internal->input_cd);
> +	  if (entry->internal->output_cd != (iconv_t) -1)
> +	    iconv_close (entry->internal->output_cd);
>  
>  	  SCM_SETSTREAM (port, 0);
>  	  SCM_CLR_PORT_OPEN_FLAG (port);

I would appreciate a symmetry between the name of the internal member
and the name of the header.  WDYT about "internal" in both?

I also think that "ports-internal.h" is a better name than
"internal-ports.h".

> +  entry->internal = (struct scm_t_port_private *)
> +    scm_gc_calloc (sizeof (struct scm_t_port_private), "port-private");

Would be nice to have a macro to allocate structs, with the correct
type.  Like

#define scm_gc_typed_calloc(t) (((t)*) scm_gc_calloc (sizeof (t), #t))

> @@ -1297,12 +1307,14 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
>  		     char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
>  {
>    scm_t_port *pt;
> +  struct scm_t_port_private *pti;

Indeed here the name belies your intention ;) Seems you want to call the
type scm_t_port_internal.  (Though, do we need _t_ in struct tags?
Seems redundant, given that struct tags are a separate namespace.
Dunno)

> @@ -1332,7 +1344,7 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
>        input_left = bytes_consumed + 1;
>        output_left = sizeof (utf8_buf);
>  
> -      done = iconv (pt->input_cd, &input, &input_left,
> +      done = iconv (pti->input_cd, &input, &input_left,
>  		    &output, &output_left);

Would be nice to fix up tabs here

> @@ -1369,15 +1381,22 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
>    int err;
>    scm_t_port *pt = SCM_PTAB_ENTRY (port);
>  
> -  if (pt->input_cd == (iconv_t) -1)
> +  if (pt->internal->encoding_type == SCM_PORT_ENCODING_TYPE_UNINITIALIZED)

Here I would appreciate consonance with the SCM_PORT_ENCODING_MODE in
`master'.

> +  struct scm_t_port_private *pti;
>    iconv_t new_input_cd, new_output_cd;
> +  enum scm_t_port_encoding_type new_encoding_type;

In general we use typedefs for enums in guile, fwiw

So currently this patch is not OK for me, because of the interaction
with master.  Can you take a look at porting
6c98257f2ead0855f218369ea7f9a823cdb9727e?

This work you are doing will not be easy to merge forward.  If you are
willing to take care of that, I think I will be OK with something going
into stable-2.0 after this review is over.  If not, I would prefer some
other solution (for example one of the 3 or 4 BOM-related patches I made
that you rejected ;-)

Thanks,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-27 20:51           ` Andy Wingo
@ 2013-03-27 21:11             ` Mark H Weaver
  2013-03-31  7:52             ` [PATCH] Add internal-only port structure; " Mark H Weaver
  2013-03-31 19:44             ` [PATCH] Add private port structure, and " Mark H Weaver
  2 siblings, 0 replies; 25+ messages in thread
From: Mark H Weaver @ 2013-03-27 21:11 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Hi Andy,

Thanks for the quick review.  I agree with you on all points.
I'll work on a new patch that's more harmonious with master.

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-03-27 20:51           ` Andy Wingo
  2013-03-27 21:11             ` Mark H Weaver
@ 2013-03-31  7:52             ` Mark H Weaver
  2013-03-31 13:20               ` Ludovic Courtès
  2013-04-01 18:57               ` Andy Wingo
  2013-03-31 19:44             ` [PATCH] Add private port structure, and " Mark H Weaver
  2 siblings, 2 replies; 25+ messages in thread
From: Mark H Weaver @ 2013-03-31  7:52 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

Here's a new start at adding an internal-only port structure to 2.0.

I've come to the conclusion that it is not safe to modify 'scm_t_port'
in 2.0 at all; not even to change the member names.  In brief, the
reason has to do with the C11 standard definition of "compatible types",
which ties into the strict aliasing rules.  Section 6.2.7 of C11 spells
out what it means for two structures declared in separate translation
units to be compatible, and among other things their member names must
be the same.

So this patch leaves 'scm_t_port' completely unchanged, but 'input_cd'
is repurposed to be a pointer to the internal-only structure, and
'output_cd' becomes unused.  However, this ugly detail is evident in
only two places: in 'scm_new_port_table_entry' where the 'scm_t_port' is
allocated, and in the new 'SCM_INTERNAL_PTAB_ENTRY' macro that converts
a port (SCM value) to a (scm_t_port_internal *).

The idea is that by accessing the public and private port structures via
the two macros 'SCM_PTAB_ENTRY' and 'SCM_INTERNAL_PTAB_ENTRY', we give
ourselves plenty of flexibility to change the way these structures are
accessed in 2.2, while keeping most of the code identical.

This patch is more limited in scope than the last one.  It does not
attempt to deal with the SCM_PORT_ENCODING_MODE.  That will be in a
subsequent patch.  This patch only adds the internal structure and moves
the iconv descriptors there.

Is this a reasonable start?  Any suggestions before I proceed?

     Thanks,
       Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: [PATCH 1/2] Add 'scm_gc_typed_calloc' macro --]
[-- Type: text/x-diff, Size: 745 bytes --]

From aeef3dc501dd598e5b3dea3fc5d1552a032d3917 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 30 Mar 2013 23:09:01 -0400
Subject: [PATCH 1/2] Add 'scm_gc_typed_calloc' macro.

* libguile/gc.h (scm_gc_typed_calloc): New macro.
---
 libguile/gc.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/libguile/gc.h b/libguile/gc.h
index 9f00e01..c6f7a4b 100644
--- a/libguile/gc.h
+++ b/libguile/gc.h
@@ -207,6 +207,7 @@ SCM_API char *scm_gc_strdup (const char *str, const char *what)
 SCM_API char *scm_gc_strndup (const char *str, size_t n, const char *what)
   SCM_MALLOC;
 
+#define scm_gc_typed_calloc(t) ((t *) scm_gc_calloc (sizeof (t), #t))
 
 #ifdef BUILDING_LIBGUILE
 #include "libguile/bdw-gc.h"
-- 
1.7.10.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: [PATCH 2/2] Add internal-only port structure and move iconv descriptors there --]
[-- Type: text/x-diff, Size: 12406 bytes --]

From e11e069a6ae7da1f7a280e8c1936a61bc9119625 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 30 Mar 2013 22:34:56 -0400
Subject: [PATCH 2/2] Add internal-only port structure and move iconv
 descriptors there.

* libguile/ports-internal.h: New file.

* libguile/Makefile.am (noinst_HEADERS): Add ports-internal.h.

* libguile/ports.h (scm_t_port): Add a comment mentioning that the
  'input_cd' and 'output_cd' fields of the public structure are no
  longer what they seem to be.

* libguile/ports.c: Include ports-internal.h.

  (finalize_port, scm_i_remove_port, get_iconv_codepoint, get_codepoint,
  scm_i_set_port_encoding_x): Access 'input_cd' and 'output_cd' via the
  new internal port structure.

  (scm_new_port_table_entry): Allocate and initialize the internal port
  structure.

* libguile/print.c: Include ports-internal.h.

  (display_string_using_iconv, display_string): Access 'input_cd' and
  'output_cd' via 'internal' pointer.
---
 libguile/Makefile.am      |    2 +-
 libguile/ports-internal.h |   38 ++++++++++++++++++++++++++
 libguile/ports.c          |   65 +++++++++++++++++++++++++++------------------
 libguile/ports.h          |   19 ++++++++++---
 libguile/print.c          |   15 ++++++-----
 5 files changed, 103 insertions(+), 36 deletions(-)
 create mode 100644 libguile/ports-internal.h

diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index 4b1f96b..0c4b6ce 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -455,7 +455,7 @@ noinst_HEADERS = conv-integer.i.c conv-uinteger.i.c		\
                  srfi-14.i.c					\
                  quicksort.i.c                                  \
                  win32-uname.h					\
-		 private-gc.h private-options.h
+		 private-gc.h private-options.h ports-internal.h
 
 # vm instructions
 noinst_HEADERS += vm-engine.c vm-i-system.c vm-i-scheme.c vm-i-loader.c
diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
new file mode 100644
index 0000000..b4daff3
--- /dev/null
+++ b/libguile/ports-internal.h
@@ -0,0 +1,38 @@
+/*
+ * ports-internal.h - internal-only declarations for ports.
+ *
+ * Copyright (C) 2013 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef SCM_PORTS_INTERNAL
+#define SCM_PORTS_INTERNAL
+
+#include "libguile/_scm.h"
+#include "libguile/ports.h"
+
+typedef struct scm_t_port_internal
+{
+  /* input/output iconv conversion descriptors */
+  void *input_cd;
+  void *output_cd;
+} scm_t_port_internal;
+
+#define SCM_INTERNAL_PTAB_ENTRY(x)                              \
+  ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
+
+#endif
diff --git a/libguile/ports.c b/libguile/ports.c
index 8737a76..86db4a0 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -55,6 +55,7 @@
 #include "libguile/mallocs.h"
 #include "libguile/validate.h"
 #include "libguile/ports.h"
+#include "libguile/ports-internal.h"
 #include "libguile/vectors.h"
 #include "libguile/weaks.h"
 #include "libguile/fluids.h"
@@ -576,7 +577,7 @@ finalize_port (void *ptr, void *data)
 	register_finalizer_for_port (port);
       else
 	{
-	  scm_t_port *entry;
+          scm_t_port_internal *pti;
 
 	  port_type = SCM_TC2PTOBNUM (SCM_CELL_TYPE (port));
 	  if (port_type >= scm_numptob)
@@ -587,12 +588,12 @@ finalize_port (void *ptr, void *data)
 	       is for explicit `close-port' by user.  */
 	    scm_ptobs[port_type].free (port);
 
-	  entry = SCM_PTAB_ENTRY (port);
+	  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-	  if (entry->input_cd != (iconv_t) -1)
-	    iconv_close (entry->input_cd);
-	  if (entry->output_cd != (iconv_t) -1)
-	    iconv_close (entry->output_cd);
+	  if (pti->input_cd != (iconv_t) -1)
+	    iconv_close (pti->input_cd);
+	  if (pti->output_cd != (iconv_t) -1)
+	    iconv_close (pti->output_cd);
 
 	  SCM_SETSTREAM (port, 0);
 	  SCM_CLR_PORT_OPEN_FLAG (port);
@@ -617,7 +618,8 @@ scm_new_port_table_entry (scm_t_bits tag)
    */
   
   SCM z = scm_cons (SCM_EOL, SCM_EOL);
-  scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), "port");
+  scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
+  scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
   const char *enc;
 
   entry->file_name = SCM_BOOL_F;
@@ -630,11 +632,17 @@ scm_new_port_table_entry (scm_t_bits tag)
   entry->encoding = enc ? scm_gc_strdup (enc, "port") : NULL;
 
   /* The conversion descriptors will be opened lazily.  */
-  entry->input_cd = (iconv_t) -1;
-  entry->output_cd = (iconv_t) -1;
+  pti->input_cd = (iconv_t) -1;
+  pti->output_cd = (iconv_t) -1;
 
   entry->ilseq_handler = scm_i_default_port_conversion_handler ();
 
+  /* XXX These fields are not what they seem.  They have been
+     repurposed, but cannot safely be renamed in 2.0 without breaking
+     ABI compatibility.  This will be cleaned up in 2.2.  */
+  entry->input_cd = pti;   /* XXX pointer to the internal port structure */
+  entry->output_cd = NULL; /* XXX unused */
+
   SCM_SET_CELL_TYPE (z, tag);
   SCM_SETPTAB_ENTRY (z, entry);
 
@@ -678,24 +686,26 @@ scm_i_remove_port (SCM port)
 #define FUNC_NAME "scm_remove_port"
 {
   scm_t_port *p;
+  scm_t_port_internal *pti;
 
   scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
 
   p = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
   scm_port_non_buffer (p);
   p->putback_buf = NULL;
   p->putback_buf_size = 0;
 
-  if (p->input_cd != (iconv_t) -1)
+  if (pti->input_cd != (iconv_t) -1)
     {
-      iconv_close (p->input_cd);
-      p->input_cd = (iconv_t) -1;
+      iconv_close (pti->input_cd);
+      pti->input_cd = (iconv_t) -1;
     }
   
-  if (p->output_cd != (iconv_t) -1)
+  if (pti->output_cd != (iconv_t) -1)
     {
-      iconv_close (p->output_cd);
-      p->output_cd = (iconv_t) -1;
+      iconv_close (pti->output_cd);
+      pti->output_cd = (iconv_t) -1;
     }
 
   SCM_SETPTAB_ENTRY (port, 0);
@@ -1296,13 +1306,13 @@ static int
 get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
 		     char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
 {
-  scm_t_port *pt;
+  scm_t_port_internal *pti;
   int err, byte_read;
   size_t bytes_consumed, output_size;
   char *output;
   scm_t_uint8 utf8_buf[SCM_MBCHAR_BUF_SIZE];
 
-  pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   for (output_size = 0, output = (char *) utf8_buf,
 	 bytes_consumed = 0, err = 0;
@@ -1332,7 +1342,7 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
       input_left = bytes_consumed + 1;
       output_left = sizeof (utf8_buf);
 
-      done = iconv (pt->input_cd, &input, &input_left,
+      done = iconv (pti->input_cd, &input, &input_left,
 		    &output, &output_left);
       if (done == (size_t) -1)
 	{
@@ -1368,13 +1378,14 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
 {
   int err;
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
+  scm_t_port_internal *pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pt->input_cd == (iconv_t) -1)
+  if (pti->input_cd == (iconv_t) -1)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
   /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->input_cd == (iconv_t) -1)
+  if (pti->input_cd == (iconv_t) -1)
     err = get_utf8_codepoint (port, codepoint, (scm_t_uint8 *) buf, len);
   else
     err = get_iconv_codepoint (port, codepoint, buf, len);
@@ -2208,6 +2219,7 @@ void
 scm_i_set_port_encoding_x (SCM port, const char *encoding)
 {
   scm_t_port *pt;
+  scm_t_port_internal *pti;
   iconv_t new_input_cd, new_output_cd;
 
   new_input_cd = (iconv_t) -1;
@@ -2215,6 +2227,7 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
 
   /* Set the character encoding for this port.  */
   pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   if (encoding == NULL)
     encoding = "ISO-8859-1";
@@ -2251,13 +2264,13 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
 	}
     }
 
-  if (pt->input_cd != (iconv_t) -1)
-    iconv_close (pt->input_cd);
-  if (pt->output_cd != (iconv_t) -1)
-    iconv_close (pt->output_cd);
+  if (pti->input_cd != (iconv_t) -1)
+    iconv_close (pti->input_cd);
+  if (pti->output_cd != (iconv_t) -1)
+    iconv_close (pti->output_cd);
 
-  pt->input_cd = new_input_cd;
-  pt->output_cd = new_output_cd;
+  pti->input_cd = new_input_cd;
+  pti->output_cd = new_output_cd;
 
   return;
 
diff --git a/libguile/ports.h b/libguile/ports.h
index d4d59b7..27d59a2 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -112,9 +112,22 @@ typedef struct
   unsigned char *putback_buf;
   size_t putback_buf_size;        /* allocated size of putback_buf.  */
 
-  /* input/output iconv conversion descriptors */
-  void *input_cd;
-  void *output_cd;
+  /* IMPORTANT: 'input_cd' and 'output_cd' used to be pointers to the
+     input and output iconv descriptors, but those have been moved to
+     the internal-only port structure defined in ports-internal.h.
+
+     Given that we must preserve ABI compatibility in 2.0, we cannot
+     safely change this public structure without running afoul of C
+     strict aliasing rules.  We cannot even change the field names.
+
+     To work around this, in this public structure, 'input_cd' has been
+     repurposed to be a pointer to the internal port structure (see
+     ports-internal.h), and 'output_cd' is now unused.
+
+     This will be cleaned up in 2.2.  */
+
+  void *input_cd;   /* XXX actually a pointer to scm_t_port_internal */
+  void *output_cd;  /* XXX actually unused */
 } scm_t_port;
 
 
diff --git a/libguile/print.c b/libguile/print.c
index 647eed8..62d7267 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -45,6 +45,7 @@
 #include "libguile/alist.h"
 #include "libguile/struct.h"
 #include "libguile/ports.h"
+#include "libguile/ports-internal.h"
 #include "libguile/root.h"
 #include "libguile/strings.h"
 #include "libguile/strports.h"
@@ -879,9 +880,9 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
 			    scm_t_string_failed_conversion_handler strategy)
 {
   size_t printed;
-  scm_t_port *pt;
+  scm_t_port_internal *pti;
 
-  pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   printed = 0;
 
@@ -910,7 +911,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
       output = encoded_output;
       output_left = sizeof (encoded_output);
 
-      done = iconv (pt->output_cd, &input, &input_left,
+      done = iconv (pti->output_cd, &input, &input_left,
 		    &output, &output_left);
 
       output_len = sizeof (encoded_output) - output_left;
@@ -920,7 +921,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
           int errno_save = errno;
 
 	  /* Reset the `iconv' state.  */
-	  iconv (pt->output_cd, NULL, NULL, NULL, NULL);
+	  iconv (pti->output_cd, NULL, NULL, NULL, NULL);
 
 	  /* Print the OUTPUT_LEN bytes successfully converted.  */
 	  scm_lfwrite (encoded_output, output_len, port);
@@ -981,15 +982,17 @@ display_string (const void *str, int narrow_p,
 
 {
   scm_t_port *pt;
+  scm_t_port_internal *pti;
 
   pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pt->output_cd == (iconv_t) -1)
+  if (pti->output_cd == (iconv_t) -1)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
   /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->output_cd == (iconv_t) -1)
+  if (pti->output_cd == (iconv_t) -1)
     return display_string_as_utf8 (str, narrow_p, len, port);
   else
     return display_string_using_iconv (str, narrow_p, len,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-03-31  7:52             ` [PATCH] Add internal-only port structure; " Mark H Weaver
@ 2013-03-31 13:20               ` Ludovic Courtès
  2013-03-31 15:23                 ` Mark H Weaver
  2013-04-01 18:57               ` Andy Wingo
  1 sibling, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2013-03-31 13:20 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> I've come to the conclusion that it is not safe to modify 'scm_t_port'
> in 2.0 at all; not even to change the member names.  In brief, the
> reason has to do with the C11 standard definition of "compatible types",
> which ties into the strict aliasing rules.  Section 6.2.7 of C11 spells
> out what it means for two structures declared in separate translation
> units to be compatible, and among other things their member names must
> be the same.

I can’t imagine how changing the *name* of a member could change
something to the structure’s layout in practice.

I would be in favor of keeping the names ‘internal’ and ‘reserved’ from
your previous patch, but if you’re really convinced that this member
name thing has a practical effect, fine with me.

> Is this a reasonable start?  Any suggestions before I proceed?

Looks good to me!

> +#define scm_gc_typed_calloc(t) ((t *) scm_gc_calloc (sizeof (t), #t))

Not really convinced by this, but hey.  Ideally, this would need to go
in the manual too.

> +typedef struct scm_t_port_internal
> +{
> +  /* input/output iconv conversion descriptors */
> +  void *input_cd;
> +  void *output_cd;
> +} scm_t_port_internal;

Please define the struct tag and typedef separately.  Also, I’d remove
‘_t’ from the struct tag, as discussed before.

Thanks for your patience & thoroughness!  :-)

Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-03-31 13:20               ` Ludovic Courtès
@ 2013-03-31 15:23                 ` Mark H Weaver
  2013-03-31 22:06                   ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2013-03-31 15:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, guile-devel

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> I've come to the conclusion that it is not safe to modify 'scm_t_port'
>> in 2.0 at all; not even to change the member names.  In brief, the
>> reason has to do with the C11 standard definition of "compatible types",
>> which ties into the strict aliasing rules.  Section 6.2.7 of C11 spells
>> out what it means for two structures declared in separate translation
>> units to be compatible, and among other things their member names must
>> be the same.
>
> I can’t imagine how changing the *name* of a member could change
> something to the structure’s layout in practice.

It doesn't change the structure's layout.  However, it could cause
link-time optimization to break our code.  Basically, when linking
together objects built with the old scm_t_port and the new scm_t_port,
it's feasible that the compiler could compare the two structures by tag
name, member names, etc, to determine if they are "compatible types"
according to section 6.2.7.  If they are not compatible types, then the
linker is allowed to assume that pointers to these two distinct types
cannot alias each other, and make optimizations appropriately.  Of
course this is a simplification of the detailed rules (and TBH I haven't
yet taken the time to fully understand the rules myself), but from
reading C11, I can see a potential for real problems in practice.

>> +#define scm_gc_typed_calloc(t) ((t *) scm_gc_calloc (sizeof (t), #t))
>
> Not really convinced by this, but hey.  Ideally, this would need to go
> in the manual too.

Maybe talk to Andy about it?  It was his suggestion.

>> +typedef struct scm_t_port_internal
>> +{
>> +  /* input/output iconv conversion descriptors */
>> +  void *input_cd;
>> +  void *output_cd;
>> +} scm_t_port_internal;
>
> Please define the struct tag and typedef separately.  Also, I’d remove
> ‘_t’ from the struct tag, as discussed before.

Okay.  I was just following the conventions used in current Guile code,
e.g. in ports.h, but I'll change it as you suggest.

      Thanks,
        Mark



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-27 20:51           ` Andy Wingo
  2013-03-27 21:11             ` Mark H Weaver
  2013-03-31  7:52             ` [PATCH] Add internal-only port structure; " Mark H Weaver
@ 2013-03-31 19:44             ` Mark H Weaver
  2013-03-31 22:08               ` Ludovic Courtès
  2013-04-01 19:04               ` Andy Wingo
  2 siblings, 2 replies; 25+ messages in thread
From: Mark H Weaver @ 2013-03-31 19:44 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 331 bytes --]

Andy Wingo <wingo@pobox.com> writes:
> Can you take a look at porting
> 6c98257f2ead0855f218369ea7f9a823cdb9727e?

Okay, here's an updated patch set that's essentially a backport of that
commit from master, but with 'encoding_mode' and 'iconv_descriptors' put
in the internal-only port structure.

What do you think?

      Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: [PATCH 1/3] Add 'scm_gc_typed_calloc' macro --]
[-- Type: text/x-diff, Size: 745 bytes --]

From aeef3dc501dd598e5b3dea3fc5d1552a032d3917 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 30 Mar 2013 23:09:01 -0400
Subject: [PATCH 1/3] Add 'scm_gc_typed_calloc' macro.

* libguile/gc.h (scm_gc_typed_calloc): New macro.
---
 libguile/gc.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/libguile/gc.h b/libguile/gc.h
index 9f00e01..c6f7a4b 100644
--- a/libguile/gc.h
+++ b/libguile/gc.h
@@ -207,6 +207,7 @@ SCM_API char *scm_gc_strdup (const char *str, const char *what)
 SCM_API char *scm_gc_strndup (const char *str, size_t n, const char *what)
   SCM_MALLOC;
 
+#define scm_gc_typed_calloc(t) ((t *) scm_gc_calloc (sizeof (t), #t))
 
 #ifdef BUILDING_LIBGUILE
 #include "libguile/bdw-gc.h"
-- 
1.7.10.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: [PATCH 2/3] Add internal-only port structure; move iconv descriptors there --]
[-- Type: text/x-diff, Size: 12436 bytes --]

From 9eb48ea34c3f22f178cb1d889b93911283395a91 Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sat, 30 Mar 2013 22:34:56 -0400
Subject: [PATCH 2/3] Add internal-only port structure and move iconv
 descriptors there.

* libguile/ports-internal.h: New file.

* libguile/Makefile.am (noinst_HEADERS): Add ports-internal.h.

* libguile/ports.h (scm_t_port): Add a comment mentioning that the
  'input_cd' and 'output_cd' fields of the public structure are no
  longer what they seem to be.

* libguile/ports.c: Include ports-internal.h.

  (finalize_port, scm_i_remove_port, get_iconv_codepoint, get_codepoint,
  scm_i_set_port_encoding_x): Access 'input_cd' and 'output_cd' via the
  new internal port structure.

  (scm_new_port_table_entry): Allocate and initialize the internal port
  structure.

* libguile/print.c: Include ports-internal.h.

  (display_string_using_iconv, display_string): Access 'input_cd' and
  'output_cd' via 'internal' pointer.
---
 libguile/Makefile.am      |    2 +-
 libguile/ports-internal.h |   40 ++++++++++++++++++++++++++++
 libguile/ports.c          |   65 +++++++++++++++++++++++++++------------------
 libguile/ports.h          |   19 ++++++++++---
 libguile/print.c          |   15 ++++++-----
 5 files changed, 105 insertions(+), 36 deletions(-)
 create mode 100644 libguile/ports-internal.h

diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index 4b1f96b..0c4b6ce 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -455,7 +455,7 @@ noinst_HEADERS = conv-integer.i.c conv-uinteger.i.c		\
                  srfi-14.i.c					\
                  quicksort.i.c                                  \
                  win32-uname.h					\
-		 private-gc.h private-options.h
+		 private-gc.h private-options.h ports-internal.h
 
 # vm instructions
 noinst_HEADERS += vm-engine.c vm-i-system.c vm-i-scheme.c vm-i-loader.c
diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
new file mode 100644
index 0000000..eba91c8
--- /dev/null
+++ b/libguile/ports-internal.h
@@ -0,0 +1,40 @@
+/*
+ * ports-internal.h - internal-only declarations for ports.
+ *
+ * Copyright (C) 2013 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef SCM_PORTS_INTERNAL
+#define SCM_PORTS_INTERNAL
+
+#include "libguile/_scm.h"
+#include "libguile/ports.h"
+
+struct scm_port_internal
+{
+  /* input/output iconv conversion descriptors */
+  void *input_cd;
+  void *output_cd;
+};
+
+typedef struct scm_port_internal scm_t_port_internal;
+
+#define SCM_INTERNAL_PTAB_ENTRY(x)                              \
+  ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
+
+#endif
diff --git a/libguile/ports.c b/libguile/ports.c
index 8737a76..86db4a0 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -55,6 +55,7 @@
 #include "libguile/mallocs.h"
 #include "libguile/validate.h"
 #include "libguile/ports.h"
+#include "libguile/ports-internal.h"
 #include "libguile/vectors.h"
 #include "libguile/weaks.h"
 #include "libguile/fluids.h"
@@ -576,7 +577,7 @@ finalize_port (void *ptr, void *data)
 	register_finalizer_for_port (port);
       else
 	{
-	  scm_t_port *entry;
+          scm_t_port_internal *pti;
 
 	  port_type = SCM_TC2PTOBNUM (SCM_CELL_TYPE (port));
 	  if (port_type >= scm_numptob)
@@ -587,12 +588,12 @@ finalize_port (void *ptr, void *data)
 	       is for explicit `close-port' by user.  */
 	    scm_ptobs[port_type].free (port);
 
-	  entry = SCM_PTAB_ENTRY (port);
+	  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-	  if (entry->input_cd != (iconv_t) -1)
-	    iconv_close (entry->input_cd);
-	  if (entry->output_cd != (iconv_t) -1)
-	    iconv_close (entry->output_cd);
+	  if (pti->input_cd != (iconv_t) -1)
+	    iconv_close (pti->input_cd);
+	  if (pti->output_cd != (iconv_t) -1)
+	    iconv_close (pti->output_cd);
 
 	  SCM_SETSTREAM (port, 0);
 	  SCM_CLR_PORT_OPEN_FLAG (port);
@@ -617,7 +618,8 @@ scm_new_port_table_entry (scm_t_bits tag)
    */
   
   SCM z = scm_cons (SCM_EOL, SCM_EOL);
-  scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), "port");
+  scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
+  scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
   const char *enc;
 
   entry->file_name = SCM_BOOL_F;
@@ -630,11 +632,17 @@ scm_new_port_table_entry (scm_t_bits tag)
   entry->encoding = enc ? scm_gc_strdup (enc, "port") : NULL;
 
   /* The conversion descriptors will be opened lazily.  */
-  entry->input_cd = (iconv_t) -1;
-  entry->output_cd = (iconv_t) -1;
+  pti->input_cd = (iconv_t) -1;
+  pti->output_cd = (iconv_t) -1;
 
   entry->ilseq_handler = scm_i_default_port_conversion_handler ();
 
+  /* XXX These fields are not what they seem.  They have been
+     repurposed, but cannot safely be renamed in 2.0 without breaking
+     ABI compatibility.  This will be cleaned up in 2.2.  */
+  entry->input_cd = pti;   /* XXX pointer to the internal port structure */
+  entry->output_cd = NULL; /* XXX unused */
+
   SCM_SET_CELL_TYPE (z, tag);
   SCM_SETPTAB_ENTRY (z, entry);
 
@@ -678,24 +686,26 @@ scm_i_remove_port (SCM port)
 #define FUNC_NAME "scm_remove_port"
 {
   scm_t_port *p;
+  scm_t_port_internal *pti;
 
   scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
 
   p = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
   scm_port_non_buffer (p);
   p->putback_buf = NULL;
   p->putback_buf_size = 0;
 
-  if (p->input_cd != (iconv_t) -1)
+  if (pti->input_cd != (iconv_t) -1)
     {
-      iconv_close (p->input_cd);
-      p->input_cd = (iconv_t) -1;
+      iconv_close (pti->input_cd);
+      pti->input_cd = (iconv_t) -1;
     }
   
-  if (p->output_cd != (iconv_t) -1)
+  if (pti->output_cd != (iconv_t) -1)
     {
-      iconv_close (p->output_cd);
-      p->output_cd = (iconv_t) -1;
+      iconv_close (pti->output_cd);
+      pti->output_cd = (iconv_t) -1;
     }
 
   SCM_SETPTAB_ENTRY (port, 0);
@@ -1296,13 +1306,13 @@ static int
 get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
 		     char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
 {
-  scm_t_port *pt;
+  scm_t_port_internal *pti;
   int err, byte_read;
   size_t bytes_consumed, output_size;
   char *output;
   scm_t_uint8 utf8_buf[SCM_MBCHAR_BUF_SIZE];
 
-  pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   for (output_size = 0, output = (char *) utf8_buf,
 	 bytes_consumed = 0, err = 0;
@@ -1332,7 +1342,7 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
       input_left = bytes_consumed + 1;
       output_left = sizeof (utf8_buf);
 
-      done = iconv (pt->input_cd, &input, &input_left,
+      done = iconv (pti->input_cd, &input, &input_left,
 		    &output, &output_left);
       if (done == (size_t) -1)
 	{
@@ -1368,13 +1378,14 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
 {
   int err;
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
+  scm_t_port_internal *pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pt->input_cd == (iconv_t) -1)
+  if (pti->input_cd == (iconv_t) -1)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
   /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->input_cd == (iconv_t) -1)
+  if (pti->input_cd == (iconv_t) -1)
     err = get_utf8_codepoint (port, codepoint, (scm_t_uint8 *) buf, len);
   else
     err = get_iconv_codepoint (port, codepoint, buf, len);
@@ -2208,6 +2219,7 @@ void
 scm_i_set_port_encoding_x (SCM port, const char *encoding)
 {
   scm_t_port *pt;
+  scm_t_port_internal *pti;
   iconv_t new_input_cd, new_output_cd;
 
   new_input_cd = (iconv_t) -1;
@@ -2215,6 +2227,7 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
 
   /* Set the character encoding for this port.  */
   pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   if (encoding == NULL)
     encoding = "ISO-8859-1";
@@ -2251,13 +2264,13 @@ scm_i_set_port_encoding_x (SCM port, const char *encoding)
 	}
     }
 
-  if (pt->input_cd != (iconv_t) -1)
-    iconv_close (pt->input_cd);
-  if (pt->output_cd != (iconv_t) -1)
-    iconv_close (pt->output_cd);
+  if (pti->input_cd != (iconv_t) -1)
+    iconv_close (pti->input_cd);
+  if (pti->output_cd != (iconv_t) -1)
+    iconv_close (pti->output_cd);
 
-  pt->input_cd = new_input_cd;
-  pt->output_cd = new_output_cd;
+  pti->input_cd = new_input_cd;
+  pti->output_cd = new_output_cd;
 
   return;
 
diff --git a/libguile/ports.h b/libguile/ports.h
index d4d59b7..6791a5c 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -112,9 +112,22 @@ typedef struct
   unsigned char *putback_buf;
   size_t putback_buf_size;        /* allocated size of putback_buf.  */
 
-  /* input/output iconv conversion descriptors */
-  void *input_cd;
-  void *output_cd;
+  /* IMPORTANT: 'input_cd' and 'output_cd' used to be pointers to the
+     input and output iconv descriptors, but those have been moved to
+     the internal-only port structure defined in ports-internal.h.
+
+     Given that we must preserve ABI compatibility in 2.0, we cannot
+     safely change this public structure without running afoul of C
+     strict aliasing rules.  We cannot even change the member names.
+
+     To work around this, in this public structure, 'input_cd' has been
+     repurposed to be a pointer to the internal port structure (see
+     ports-internal.h), and 'output_cd' is now unused.
+
+     This will be cleaned up in 2.2.  */
+
+  void *input_cd;   /* XXX actually a pointer to scm_t_port_internal */
+  void *output_cd;  /* XXX actually unused */
 } scm_t_port;
 
 
diff --git a/libguile/print.c b/libguile/print.c
index 647eed8..62d7267 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -45,6 +45,7 @@
 #include "libguile/alist.h"
 #include "libguile/struct.h"
 #include "libguile/ports.h"
+#include "libguile/ports-internal.h"
 #include "libguile/root.h"
 #include "libguile/strings.h"
 #include "libguile/strports.h"
@@ -879,9 +880,9 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
 			    scm_t_string_failed_conversion_handler strategy)
 {
   size_t printed;
-  scm_t_port *pt;
+  scm_t_port_internal *pti;
 
-  pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
   printed = 0;
 
@@ -910,7 +911,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
       output = encoded_output;
       output_left = sizeof (encoded_output);
 
-      done = iconv (pt->output_cd, &input, &input_left,
+      done = iconv (pti->output_cd, &input, &input_left,
 		    &output, &output_left);
 
       output_len = sizeof (encoded_output) - output_left;
@@ -920,7 +921,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
           int errno_save = errno;
 
 	  /* Reset the `iconv' state.  */
-	  iconv (pt->output_cd, NULL, NULL, NULL, NULL);
+	  iconv (pti->output_cd, NULL, NULL, NULL, NULL);
 
 	  /* Print the OUTPUT_LEN bytes successfully converted.  */
 	  scm_lfwrite (encoded_output, output_len, port);
@@ -981,15 +982,17 @@ display_string (const void *str, int narrow_p,
 
 {
   scm_t_port *pt;
+  scm_t_port_internal *pti;
 
   pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pt->output_cd == (iconv_t) -1)
+  if (pti->output_cd == (iconv_t) -1)
     /* Initialize the conversion descriptors, if needed.  */
     scm_i_set_port_encoding_x (port, pt->encoding);
 
   /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pt->output_cd == (iconv_t) -1)
+  if (pti->output_cd == (iconv_t) -1)
     return display_string_as_utf8 (str, narrow_p, len, port);
   else
     return display_string_using_iconv (str, narrow_p, len,
-- 
1.7.10.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: [PATCH 3/3] Refactor port encoding modes: utf-8 and iconv --]
[-- Type: text/x-diff, Size: 15402 bytes --]

From b5304dfe27075784532a73ae2a587612ddc717fd Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Sun, 31 Mar 2013 15:23:32 -0400
Subject: [PATCH 3/3] Refactor port encoding modes: utf-8 and iconv

Based on 6c98257f2ead0855f218369ea7f9a823cdb9727e by Andy Wingo.

* libguile/ports-internal.h (struct scm_port_internal): Add a flag
  for the port encoding mode: UTF8 or iconv.  The iconv descriptors
  are now in a separate structure so that we can avoid attaching
  finalizers to the ports themselves in the future.
  (enum scm_port_encoding_mode): New enum.
  (struct scm_iconv_descriptors): New struct.
  (scm_i_port_iconv_descriptors): Add prototype.

* libguile/ports.c (finalize_port): Don't close iconv descriptors here.
  (scm_new_port_table_entry): Adapt to the iconv descriptors being
  moved.  Initialize 'encoding_mode'.
  (scm_i_remove_port): Adapt to call 'close_iconv_descriptors'.
  (close_iconv_descriptors): New static function.
  (get_iconv_codepoint): Use 'scm_i_port_iconv_descriptors'.
  (get_codepoint): Check the port 'encoding_mode'.
  (finalize_iconv_descriptors, open_iconv_descriptors,
  close_iconv_descriptors, scm_i_port_iconv_descriptors): New static
  functions.
  (scm_i_set_port_encoding_x): Adapt to iconv descriptors being moved
  to separate structure, to set the 'encoding_mode' flag, and to use
  'open_iconv_descriptors' and 'close_iconv_descriptors'.

* libguile/print.c (display_string_using_iconv): Use
  'scm_i_port_iconv_descriptors'.
  (display_string): Use 'encoding_mode' flag.
---
 libguile/ports-internal.h |   22 ++++-
 libguile/ports.c          |  229 ++++++++++++++++++++++++++++-----------------
 libguile/print.c          |   17 +---
 3 files changed, 169 insertions(+), 99 deletions(-)

diff --git a/libguile/ports-internal.h b/libguile/ports-internal.h
index eba91c8..d52eab2 100644
--- a/libguile/ports-internal.h
+++ b/libguile/ports-internal.h
@@ -25,16 +25,36 @@
 #include "libguile/_scm.h"
 #include "libguile/ports.h"
 
-struct scm_port_internal
+enum scm_port_encoding_mode {
+  SCM_PORT_ENCODING_MODE_UTF8,
+  SCM_PORT_ENCODING_MODE_ICONV
+};
+
+typedef enum scm_port_encoding_mode scm_t_port_encoding_mode;
+
+/* This is a separate object so that only those ports that use iconv
+   cause finalizers to be registered (FIXME: although currently in 2.0
+   finalizers are always registered for ports anyway).  */
+struct scm_iconv_descriptors
 {
   /* input/output iconv conversion descriptors */
   void *input_cd;
   void *output_cd;
 };
 
+typedef struct scm_iconv_descriptors scm_t_iconv_descriptors;
+
+struct scm_port_internal
+{
+  scm_t_port_encoding_mode encoding_mode;
+  scm_t_iconv_descriptors *iconv_descriptors;
+};
+
 typedef struct scm_port_internal scm_t_port_internal;
 
 #define SCM_INTERNAL_PTAB_ENTRY(x)                              \
   ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
 
+SCM_INTERNAL scm_t_iconv_descriptors *scm_i_port_iconv_descriptors (SCM port);
+
 #endif
diff --git a/libguile/ports.c b/libguile/ports.c
index 86db4a0..3c67e67 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -555,8 +555,8 @@ static void finalize_port (void *, void *);
 static SCM_C_INLINE_KEYWORD void
 register_finalizer_for_port (SCM port)
 {
-  /* Register a finalizer for PORT so that its iconv CDs get freed and
-     optionally its type's `free' function gets called.  */
+  /* Register a finalizer for PORT so that its
+     type's `free' function gets called.  */
   scm_i_set_finalizer (SCM2PTR (port), finalize_port, NULL);
 }
 
@@ -577,8 +577,6 @@ finalize_port (void *ptr, void *data)
 	register_finalizer_for_port (port);
       else
 	{
-          scm_t_port_internal *pti;
-
 	  port_type = SCM_TC2PTOBNUM (SCM_CELL_TYPE (port));
 	  if (port_type >= scm_numptob)
 	    abort ();
@@ -588,13 +586,6 @@ finalize_port (void *ptr, void *data)
 	       is for explicit `close-port' by user.  */
 	    scm_ptobs[port_type].free (port);
 
-	  pti = SCM_INTERNAL_PTAB_ENTRY (port);
-
-	  if (pti->input_cd != (iconv_t) -1)
-	    iconv_close (pti->input_cd);
-	  if (pti->output_cd != (iconv_t) -1)
-	    iconv_close (pti->output_cd);
-
 	  SCM_SETSTREAM (port, 0);
 	  SCM_CLR_PORT_OPEN_FLAG (port);
 
@@ -620,7 +611,7 @@ scm_new_port_table_entry (scm_t_bits tag)
   SCM z = scm_cons (SCM_EOL, SCM_EOL);
   scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
   scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
-  const char *enc;
+  const char *encoding;
 
   entry->file_name = SCM_BOOL_F;
   entry->rw_active = SCM_PORT_NEITHER;
@@ -628,14 +619,14 @@ scm_new_port_table_entry (scm_t_bits tag)
 
   /* Initialize this port with the thread's current default
      encoding.  */
-  enc = scm_i_default_port_encoding ();
-  entry->encoding = enc ? scm_gc_strdup (enc, "port") : NULL;
-
-  /* The conversion descriptors will be opened lazily.  */
-  pti->input_cd = (iconv_t) -1;
-  pti->output_cd = (iconv_t) -1;
-
+  encoding = scm_i_default_port_encoding ();
   entry->ilseq_handler = scm_i_default_port_conversion_handler ();
+  entry->encoding = encoding ? scm_gc_strdup (encoding, "port") : NULL;
+  if (encoding && strcmp (encoding, "UTF-8") == 0)
+    pti->encoding_mode = SCM_PORT_ENCODING_MODE_UTF8;
+  else
+    pti->encoding_mode = SCM_PORT_ENCODING_MODE_ICONV;
+  pti->iconv_descriptors = NULL;
 
   /* XXX These fields are not what they seem.  They have been
      repurposed, but cannot safely be renamed in 2.0 without breaking
@@ -681,6 +672,8 @@ scm_add_to_port_table (SCM port)
 
 /* Remove a port from the table and destroy it.  */
 
+static void close_iconv_descriptors (scm_t_iconv_descriptors *id);
+
 static void
 scm_i_remove_port (SCM port)
 #define FUNC_NAME "scm_remove_port"
@@ -696,16 +689,10 @@ scm_i_remove_port (SCM port)
   p->putback_buf = NULL;
   p->putback_buf_size = 0;
 
-  if (pti->input_cd != (iconv_t) -1)
+  if (pti->iconv_descriptors)
     {
-      iconv_close (pti->input_cd);
-      pti->input_cd = (iconv_t) -1;
-    }
-  
-  if (pti->output_cd != (iconv_t) -1)
-    {
-      iconv_close (pti->output_cd);
-      pti->output_cd = (iconv_t) -1;
+      close_iconv_descriptors (pti->iconv_descriptors);
+      pti->iconv_descriptors = NULL;
     }
 
   SCM_SETPTAB_ENTRY (port, 0);
@@ -1306,13 +1293,13 @@ static int
 get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
 		     char buf[SCM_MBCHAR_BUF_SIZE], size_t *len)
 {
-  scm_t_port_internal *pti;
+  scm_t_iconv_descriptors *id;
   int err, byte_read;
   size_t bytes_consumed, output_size;
   char *output;
   scm_t_uint8 utf8_buf[SCM_MBCHAR_BUF_SIZE];
 
-  pti = SCM_INTERNAL_PTAB_ENTRY (port);
+  id = scm_i_port_iconv_descriptors (port);
 
   for (output_size = 0, output = (char *) utf8_buf,
 	 bytes_consumed = 0, err = 0;
@@ -1342,8 +1329,7 @@ get_iconv_codepoint (SCM port, scm_t_wchar *codepoint,
       input_left = bytes_consumed + 1;
       output_left = sizeof (utf8_buf);
 
-      done = iconv (pti->input_cd, &input, &input_left,
-		    &output, &output_left);
+      done = iconv (id->input_cd, &input, &input_left, &output, &output_left);
       if (done == (size_t) -1)
 	{
 	  err = errno;
@@ -1380,12 +1366,7 @@ get_codepoint (SCM port, scm_t_wchar *codepoint,
   scm_t_port *pt = SCM_PTAB_ENTRY (port);
   scm_t_port_internal *pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pti->input_cd == (iconv_t) -1)
-    /* Initialize the conversion descriptors, if needed.  */
-    scm_i_set_port_encoding_x (port, pt->encoding);
-
-  /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pti->input_cd == (iconv_t) -1)
+  if (pti->encoding_mode == SCM_PORT_ENCODING_MODE_UTF8)
     err = get_utf8_codepoint (port, codepoint, (scm_t_uint8 *) buf, len);
   else
     err = get_iconv_codepoint (port, codepoint, buf, len);
@@ -2215,73 +2196,149 @@ scm_i_default_port_encoding (void)
     }
 }
 
+static void
+finalize_iconv_descriptors (GC_PTR ptr, GC_PTR data)
+{
+  close_iconv_descriptors (ptr);
+}
+
+static scm_t_iconv_descriptors *
+open_iconv_descriptors (const char *encoding, int reading, int writing)
+{
+  scm_t_iconv_descriptors *id;
+  iconv_t input_cd, output_cd;
+
+  input_cd = (iconv_t) -1;
+  output_cd = (iconv_t) -1;
+  if (reading)
+    {
+      /* Open an input iconv conversion descriptor, from ENCODING
+         to UTF-8.  We choose UTF-8, not UTF-32, because iconv
+         implementations can typically convert from anything to
+         UTF-8, but not to UTF-32 (see
+         <http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>).  */
+
+      /* Assume opening an iconv descriptor causes about 16 KB of
+         allocation.  */
+      scm_gc_register_allocation (16 * 1024);
+
+      input_cd = iconv_open ("UTF-8", encoding);
+      if (input_cd == (iconv_t) -1)
+        goto invalid_encoding;
+    }
+
+  if (writing)
+    {
+      /* Assume opening an iconv descriptor causes about 16 KB of
+         allocation.  */
+      scm_gc_register_allocation (16 * 1024);
+
+      output_cd = iconv_open (encoding, "UTF-8");
+      if (output_cd == (iconv_t) -1)
+        {
+          if (input_cd != (iconv_t) -1)
+            iconv_close (input_cd);
+          goto invalid_encoding;
+        }
+    }
+
+  id = scm_gc_malloc_pointerless (sizeof (*id), "iconv descriptors");
+  id->input_cd = input_cd;
+  id->output_cd = output_cd;
+
+  {
+    GC_finalization_proc prev_finalizer;
+    GC_PTR prev_finalization_data;
+
+    /* Register a finalizer to close the descriptors.  */
+    GC_REGISTER_FINALIZER_NO_ORDER (id, finalize_iconv_descriptors, 0,
+                                    &prev_finalizer, &prev_finalization_data);
+  }
+
+  return id;
+ 
+ invalid_encoding:
+  {
+    SCM err;
+    err = scm_from_locale_string (encoding);
+    scm_misc_error ("open_iconv_descriptors",
+                    "invalid or unknown character encoding ~s",
+                    scm_list_1 (err));
+  }
+}
+
+static void
+close_iconv_descriptors (scm_t_iconv_descriptors *id)
+{
+  if (id->input_cd != (iconv_t) -1)
+    iconv_close (id->input_cd);
+  if (id->output_cd != (iconv_t) -1)
+    iconv_close (id->output_cd);
+  id->input_cd = (void *) -1;
+  id->output_cd = (void *) -1;
+}
+
+scm_t_iconv_descriptors *
+scm_i_port_iconv_descriptors (SCM port)
+{
+  scm_t_port *pt;
+  scm_t_port_internal *pti;
+
+  pt = SCM_PTAB_ENTRY (port);
+  pti = SCM_INTERNAL_PTAB_ENTRY (port);
+
+  assert (pti->encoding_mode == SCM_PORT_ENCODING_MODE_ICONV);
+
+  if (!pti->iconv_descriptors)
+    {
+      if (!pt->encoding)
+        pt->encoding = "ISO-8859-1";
+      pti->iconv_descriptors =
+        open_iconv_descriptors (pt->encoding,
+                                SCM_INPUT_PORT_P (port),
+                                SCM_OUTPUT_PORT_P (port));
+    }
+
+  return pti->iconv_descriptors;
+}
+
 void
 scm_i_set_port_encoding_x (SCM port, const char *encoding)
 {
   scm_t_port *pt;
   scm_t_port_internal *pti;
-  iconv_t new_input_cd, new_output_cd;
-
-  new_input_cd = (iconv_t) -1;
-  new_output_cd = (iconv_t) -1;
+  scm_t_iconv_descriptors *prev;
 
   /* Set the character encoding for this port.  */
   pt = SCM_PTAB_ENTRY (port);
   pti = SCM_INTERNAL_PTAB_ENTRY (port);
+  prev = pti->iconv_descriptors;
 
   if (encoding == NULL)
     encoding = "ISO-8859-1";
 
-  if (pt->encoding != encoding)
-    pt->encoding = scm_gc_strdup (encoding, "port");
-
   /* If ENCODING is UTF-8, then no conversion descriptor is opened
      because we do I/O ourselves.  This saves 100+ KiB for each
      descriptor.  */
-  if (strcmp (encoding, "UTF-8"))
+  if (strcmp (encoding, "UTF-8") == 0)
     {
-      if (SCM_CELL_WORD_0 (port) & SCM_RDNG)
-	{
-	  /* Open an input iconv conversion descriptor, from ENCODING
-	     to UTF-8.  We choose UTF-8, not UTF-32, because iconv
-	     implementations can typically convert from anything to
-	     UTF-8, but not to UTF-32 (see
-	     <http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>).  */
-	  new_input_cd = iconv_open ("UTF-8", encoding);
-	  if (new_input_cd == (iconv_t) -1)
-	    goto invalid_encoding;
-	}
-
-      if (SCM_CELL_WORD_0 (port) & SCM_WRTNG)
-	{
-	  new_output_cd = iconv_open (encoding, "UTF-8");
-	  if (new_output_cd == (iconv_t) -1)
-	    {
-	      if (new_input_cd != (iconv_t) -1)
-		iconv_close (new_input_cd);
-	      goto invalid_encoding;
-	    }
-	}
+      pt->encoding = "UTF-8";
+      pti->encoding_mode = SCM_PORT_ENCODING_MODE_UTF8;
+      pti->iconv_descriptors = NULL;
+    }
+  else
+    {
+      /* Open descriptors before mutating the port. */
+      pti->iconv_descriptors =
+        open_iconv_descriptors (encoding,
+                                SCM_INPUT_PORT_P (port),
+                                SCM_OUTPUT_PORT_P (port));
+      pt->encoding = scm_gc_strdup (encoding, "port");
+      pti->encoding_mode = SCM_PORT_ENCODING_MODE_ICONV;
     }
 
-  if (pti->input_cd != (iconv_t) -1)
-    iconv_close (pti->input_cd);
-  if (pti->output_cd != (iconv_t) -1)
-    iconv_close (pti->output_cd);
-
-  pti->input_cd = new_input_cd;
-  pti->output_cd = new_output_cd;
-
-  return;
-
- invalid_encoding:
-  {
-    SCM err;
-    err = scm_from_locale_string (encoding);
-    scm_misc_error ("scm_i_set_port_encoding_x",
-		    "invalid or unknown character encoding ~s",
-		    scm_list_1 (err));
-  }
+  if (prev)
+    close_iconv_descriptors (prev);
 }
 
 SCM_DEFINE (scm_port_encoding, "port-encoding", 1, 0, 0,
diff --git a/libguile/print.c b/libguile/print.c
index 62d7267..3469d71 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -880,9 +880,9 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
 			    scm_t_string_failed_conversion_handler strategy)
 {
   size_t printed;
-  scm_t_port_internal *pti;
+  scm_t_iconv_descriptors *id;
 
-  pti = SCM_INTERNAL_PTAB_ENTRY (port);
+  id = scm_i_port_iconv_descriptors (port);
 
   printed = 0;
 
@@ -911,7 +911,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
       output = encoded_output;
       output_left = sizeof (encoded_output);
 
-      done = iconv (pti->output_cd, &input, &input_left,
+      done = iconv (id->output_cd, &input, &input_left,
 		    &output, &output_left);
 
       output_len = sizeof (encoded_output) - output_left;
@@ -921,7 +921,7 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
           int errno_save = errno;
 
 	  /* Reset the `iconv' state.  */
-	  iconv (pti->output_cd, NULL, NULL, NULL, NULL);
+	  iconv (id->output_cd, NULL, NULL, NULL, NULL);
 
 	  /* Print the OUTPUT_LEN bytes successfully converted.  */
 	  scm_lfwrite (encoded_output, output_len, port);
@@ -981,18 +981,11 @@ display_string (const void *str, int narrow_p,
 		scm_t_string_failed_conversion_handler strategy)
 
 {
-  scm_t_port *pt;
   scm_t_port_internal *pti;
 
-  pt = SCM_PTAB_ENTRY (port);
   pti = SCM_INTERNAL_PTAB_ENTRY (port);
 
-  if (pti->output_cd == (iconv_t) -1)
-    /* Initialize the conversion descriptors, if needed.  */
-    scm_i_set_port_encoding_x (port, pt->encoding);
-
-  /* FIXME: In 2.1, add a flag to determine whether a port is UTF-8.  */
-  if (pti->output_cd == (iconv_t) -1)
+  if (pti->encoding_mode == SCM_PORT_ENCODING_MODE_UTF8)
     return display_string_as_utf8 (str, narrow_p, len, port);
   else
     return display_string_using_iconv (str, narrow_p, len,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-03-31 15:23                 ` Mark H Weaver
@ 2013-03-31 22:06                   ` Ludovic Courtès
  0 siblings, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2013-03-31 22:06 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> I've come to the conclusion that it is not safe to modify 'scm_t_port'
>>> in 2.0 at all; not even to change the member names.  In brief, the
>>> reason has to do with the C11 standard definition of "compatible types",
>>> which ties into the strict aliasing rules.  Section 6.2.7 of C11 spells
>>> out what it means for two structures declared in separate translation
>>> units to be compatible, and among other things their member names must
>>> be the same.
>>
>> I can’t imagine how changing the *name* of a member could change
>> something to the structure’s layout in practice.
>
> It doesn't change the structure's layout.  However, it could cause
> link-time optimization to break our code.

Ah right, makes sense.

>>> +#define scm_gc_typed_calloc(t) ((t *) scm_gc_calloc (sizeof (t), #t))
>>
>> Not really convinced by this, but hey.  Ideally, this would need to go
>> in the manual too.
>
> Maybe talk to Andy about it?  It was his suggestion.

No big deal.

Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-31 19:44             ` [PATCH] Add private port structure, and " Mark H Weaver
@ 2013-03-31 22:08               ` Ludovic Courtès
  2013-04-01 19:04               ` Andy Wingo
  1 sibling, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2013-03-31 22:08 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> Andy Wingo <wingo@pobox.com> writes:
>> Can you take a look at porting
>> 6c98257f2ead0855f218369ea7f9a823cdb9727e?
>
> Okay, here's an updated patch set that's essentially a backport of that
> commit from master, but with 'encoding_mode' and 'iconv_descriptors' put
> in the internal-only port structure.

I’m not familiar with how things are done in ‘master’.  From a 2.0
perspective, it looks good to me.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-03-31  7:52             ` [PATCH] Add internal-only port structure; " Mark H Weaver
  2013-03-31 13:20               ` Ludovic Courtès
@ 2013-04-01 18:57               ` Andy Wingo
  2013-04-01 20:03                 ` Mark H Weaver
  1 sibling, 1 reply; 25+ messages in thread
From: Andy Wingo @ 2013-04-01 18:57 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

LGTM if these are addressed:

On Sun 31 Mar 2013 09:52, Mark H Weaver <mhw@netris.org> writes:

> +#define SCM_INTERNAL_PTAB_ENTRY(x)                              \
> +  ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
> +

SCM_PORT_GET_INTERNAL(x) ?  PTAB is a historical name (port table; there
is no more port table.)

>    SCM z = scm_cons (SCM_EOL, SCM_EOL);
> -  scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), "port");
> +  scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
> +  scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
>    const char *enc;

How about allocating a struct { scm_t_port a; scm_t_port_internal b; }
and get the pointers from there?

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add private port structure, and move iconv descriptors there
  2013-03-31 19:44             ` [PATCH] Add private port structure, and " Mark H Weaver
  2013-03-31 22:08               ` Ludovic Courtès
@ 2013-04-01 19:04               ` Andy Wingo
  1 sibling, 0 replies; 25+ messages in thread
From: Andy Wingo @ 2013-04-01 19:04 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Sun 31 Mar 2013 21:44, Mark H Weaver <mhw@netris.org> writes:

> +  {
> +    GC_finalization_proc prev_finalizer;
> +    GC_PTR prev_finalization_data;
> +
> +    /* Register a finalizer to close the descriptors.  */
> +    GC_REGISTER_FINALIZER_NO_ORDER (id, finalize_iconv_descriptors, 0,
> +                                    &prev_finalizer, &prev_finalization_data);
> +  }

Here we really want 6978c673393a960d7caf604b8c72ff2b5fe0f4ec; committed
as 75ba64d6797f5857cc9885eb753126119a8c8b68 in stable-2.0, but probably
needs to be manually reapplied here.

Otherwise looks great to me.
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-04-01 18:57               ` Andy Wingo
@ 2013-04-01 20:03                 ` Mark H Weaver
  2013-04-01 20:54                   ` Andy Wingo
  0 siblings, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2013-04-01 20:03 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:

> LGTM if these are addressed:
>
> On Sun 31 Mar 2013 09:52, Mark H Weaver <mhw@netris.org> writes:
>
>> +#define SCM_INTERNAL_PTAB_ENTRY(x)                              \
>> +  ((scm_t_port_internal *) (SCM_PTAB_ENTRY(x)->input_cd))
>> +
>
> SCM_PORT_GET_INTERNAL(x) ?  PTAB is a historical name (port table; there
> is no more port table.)

Good idea, will do!

>>    SCM z = scm_cons (SCM_EOL, SCM_EOL);
>> -  scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), "port");
>> +  scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
>> +  scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
>>    const char *enc;
>
> How about allocating a struct { scm_t_port a; scm_t_port_internal b; }
> and get the pointers from there?

I was hoping to do something like this in master, but having thought
about it some more, I don't see how to do this without running afoul of
the strict aliasing rules.  I confess that I don't yet fully understand
the rules, but it sounds like we're not allowed to access that combined
struct you propose above except via that a pointer to that combined
struct.  So says Marcus Brinkmann:

  http://thread.gmane.org/gmane.os.hurd.l4/1519

In other words, we'd need to modify SCM_PTAB_ENTRY to access it through
that combined struct, which in turn means that we'd have to export the
complete definition of 'scm_t_port_internal' to the user, and that would
defeat our original goal.

I suspect that Marcus is correct.  Section 6.5, clause 7, of C11 states:

   An object shall have its stored value accessed only by an lvalue
   expression that has one of the following types:
   
   * a type compatible with the effective type of the object,
   
   * a qualified version of a type compatible with the effective type of
     the object,
   
   * a type that is the signed or unsigned type corresponding to the
     effective type of the object,
   
   * a type that is the signed or unsigned type corresponding to a
     qualified version of the effective type of the object,
   
   * an aggregate or union type that includes one of the aforementioned
     types among its members (including, recursively, a member of a
     subaggregate or contained union), or
   
   * a character type.

Where an 'object' is defined as "region of data storage in the execution
environment, the contents of which can represent values" (section 3.15).
So entire structs are considered objects.

As I interpret this, a 'scm_t_port' object can be accessed via your
combined struct above, but not vice versa.  Maybe I'm missing something,
but I see nothing above that clearly permits your combined struct to be
accessed via a bare 'scm_t_port *'.

Remember that the purpose of all this is to prevent the compiler from
having to constantly refetch things from memory every time anything is
written to memory.  Toward that end, it is crucial to minimize the
number of legal aliases, and there's every reason to believe that
compilers will become increasingly aggressive about this.

Anyway, if it turns out that this is doable somehow, we can easily
change it later, since all accesses to the internal structure are made
via the 'SCM_PORT_GET_INTERNAL' macro.

     Thanks!
       Mark



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-04-01 20:03                 ` Mark H Weaver
@ 2013-04-01 20:54                   ` Andy Wingo
  2013-04-01 21:04                     ` Andy Wingo
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Wingo @ 2013-04-01 20:54 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Mon 01 Apr 2013 22:03, Mark H Weaver <mhw@netris.org> writes:

>>>    SCM z = scm_cons (SCM_EOL, SCM_EOL);
>>> -  scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), "port");
>>> +  scm_t_port *entry = scm_gc_typed_calloc (scm_t_port);
>>> +  scm_t_port_internal *pti = scm_gc_typed_calloc (scm_t_port_internal);
>>>    const char *enc;
>>
>> How about allocating a struct { scm_t_port a; scm_t_port_internal b; }
>> and get the pointers from there?
>
> I was hoping to do something like this in master, but having thought
> about it some more, I don't see how to do this without running afoul of
> the strict aliasing rules.

Though we already violate them in innumerable ways in Guile, this is not
one of those cases:

  struct foo { a a; b b; } *foo;
  foo = malloc (sizeof (*foo));
  a* a = &bar.a;
  b* b = &bar.b;

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Add internal-only port structure; move iconv descriptors there
  2013-04-01 20:54                   ` Andy Wingo
@ 2013-04-01 21:04                     ` Andy Wingo
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Wingo @ 2013-04-01 21:04 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel

On Mon 01 Apr 2013 22:54, Andy Wingo <wingo@pobox.com> writes:

> Though we already violate them in innumerable ways in Guile, this is not
> one of those cases:
>
>   struct foo { a a; b b; } *foo;
>   foo = malloc (sizeof (*foo));
>   a* a = &bar.a;
>   b* b = &bar.b;

To be clear, I was suggesting to allocate some struct type that is never
again referenced; not to cast.

But these comments shouldn't hold back your patches.  We can fix things
later if needed.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-04-01 21:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-24 11:39 always O_BINARY? Andy Wingo
2013-02-24 12:08 ` Neil Jerram
2013-02-24 16:55 ` Mike Gran
2013-02-24 21:17 ` Ludovic Courtès
2013-02-28  3:24   ` Adding new information to scm_t_port (was Re: always O_BINARY?) Mark H Weaver
2013-02-28  8:53     ` Andy Wingo
2013-02-28 11:04       ` Ludovic Courtès
2013-02-28 13:09         ` Andy Wingo
2013-03-01  9:03           ` Ludovic Courtès
2013-03-05 18:55       ` Mark H Weaver
2013-03-27 20:00         ` [PATCH] Add private port structure, and move iconv descriptors there Mark H Weaver
2013-03-27 20:28           ` Ludovic Courtès
2013-03-27 20:51           ` Andy Wingo
2013-03-27 21:11             ` Mark H Weaver
2013-03-31  7:52             ` [PATCH] Add internal-only port structure; " Mark H Weaver
2013-03-31 13:20               ` Ludovic Courtès
2013-03-31 15:23                 ` Mark H Weaver
2013-03-31 22:06                   ` Ludovic Courtès
2013-04-01 18:57               ` Andy Wingo
2013-04-01 20:03                 ` Mark H Weaver
2013-04-01 20:54                   ` Andy Wingo
2013-04-01 21:04                     ` Andy Wingo
2013-03-31 19:44             ` [PATCH] Add private port structure, and " Mark H Weaver
2013-03-31 22:08               ` Ludovic Courtès
2013-04-01 19:04               ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).