unofficial mirror of guile-user@gnu.org 
* Guile hangs for minutes when many files are opened (1.8.7)
@ 2011-06-14  7:21 rixed
  2011-06-21 13:02 ` rixed
  2011-07-04  9:33 ` rixed
  0 siblings, 2 replies; 8+ messages in thread
From: rixed @ 2011-06-14  7:21 UTC (permalink / raw)
  To: guile-user

Hello happy hackers!

I'm using guile (1.8.7) to extend an application (written in C) that
writes to many open file descriptors (usually 5000 files open at
once).

Among other things, guile listens on a socket to serve user commands
(a kind of embedded REPL).

When many files are open, guile can be very slow (i.e. up to 2 minutes)
to answer connects and/or commands sent to this REPL; but if I disable
the part of the application that opens and writes to the many files,
then everything runs smoothly.

strace showed me that when pausing, guile is waiting for a lock. There
are a few other guile threads that perform minor activity that may
also use a port (for instance to write a few lines to stdout), but
apart from that guile should be mostly sleeping.

So I checked libguile/ports.c, looking for what could be affected by
the huge number of open files, and stumbled upon the port table copy
into a temp port vector in scm_c_port_for_each(), which may hold
scm_i_port_table_mutex for some time if there are as many ports as
open file descriptors; but I wonder whether there is an associated
port for a file descriptor that's never used by nor even communicated
to guile? Also, I do not call this function myself.

Also, scm_flush_all_ports() iterates over all ports, but the same
remarks as above apply (I don't call this, although I've found that
really_cleanup_for_exit() does call it, and yet the 5k file descriptors
should be unknown to guile).

Do you have any idea of what could cause these pauses?





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-14  7:21 Guile hangs for minutes when many files are opened (1.8.7) rixed
@ 2011-06-21 13:02 ` rixed
  2011-06-21 21:59   ` Ludovic Courtès
  2011-07-04  9:33 ` rixed
  1 sibling, 1 reply; 8+ messages in thread
From: rixed @ 2011-06-21 13:02 UTC (permalink / raw)
  To: guile-user

Ok, the problem is straightforward: fport_input_waiting is using select
with a predefined SELECT_SET_SIZE of 1024. Although guile does not see
the many files used by the application, it may end up with a file
descriptor whose number is bigger than that. Then the select will
block forever.

Increasing SELECT_SET_SIZE will not help, since the fd set must not be
bigger than FD_SETSIZE (= 1024 on my box) anyway.
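To illustrate the limit described above, here is a minimal sketch, not
the Guile code itself. POSIX leaves FD_SET undefined for descriptors
greater than or equal to FD_SETSIZE, whereas poll takes the descriptor
number as a plain integer and has no such ceiling. The helper names
fd_fits_in_fd_set and input_waiting are hypothetical and chosen only
for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: 1 if FD may legally be passed to FD_SET,
   0 otherwise.  POSIX makes FD_SET undefined for fd >= FD_SETSIZE. */
static int
fd_fits_in_fd_set (int fd)
{
  return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: non-blocking readiness check based on poll.
   Returns 1 if FD has input waiting, 0 if not, -1 on error (errno is
   set by poll).  Works for any valid descriptor number. */
static int
input_waiting (int fd)
{
  struct pollfd pfd;

  assert (fd >= 0);   /* poll silently ignores negative descriptors */

  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  /* A timeout of 0 makes poll return immediately. */
  if (poll (&pfd, 1, 0) < 0)
    return -1;
  return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A caller that must keep using select could test fd_fits_in_fd_set
first and report an error instead of building an invalid set.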

I will try to use the alternative (IOCTL).
Nobody's planned to replace select by ppoll yet? :)
If I end up doing it, would you be interested in a patch?





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-21 13:02 ` rixed
@ 2011-06-21 21:59   ` Ludovic Courtès
  2011-06-22  8:34     ` rixed
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2011-06-21 21:59 UTC (permalink / raw)
  To: guile-user

Hi Cédric,

rixed@happyleptic.org writes:

> Ok, the problem is straightforward: fport_input_waiting is using select
> with a predefined SELECT_SET_SIZE of 1024. Although guile does not see
> the many files used by the application, it may end up with a file
> descriptor whose number is bigger than that. Then the select will
> block forever.

Oooh, interesting bug.  :-)

> Nobody's planned to replace select by ppoll yet? :)

Guile 2.0 has (ice-9 poll), but it relies on some features not available
in 1.8.

Thanks,
Ludo’.





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-21 21:59   ` Ludovic Courtès
@ 2011-06-22  8:34     ` rixed
  2011-06-22 12:08       ` rixed
  2011-06-22 14:31       ` Ludovic Courtès
  0 siblings, 2 replies; 8+ messages in thread
From: rixed @ 2011-06-22  8:34 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 144 bytes --]

Attached is a patch against 1-8-8 that seems to fix the problem (I'm
going to test it more thoroughly).

What do you think of the approach?

[-- Attachment #2: guile.diff --]
[-- Type: text/x-diff, Size: 2787 bytes --]

diff --git a/configure.in b/configure.in
index 217ac83..ea26b5c 100644
--- a/configure.in
+++ b/configure.in
@@ -661,7 +661,7 @@ AC_CHECK_HEADERS([complex.h fenv.h io.h libc.h limits.h malloc.h memory.h proces
 regex.h rxposix.h rx/rxposix.h sys/dir.h sys/ioctl.h sys/select.h \
 sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \
 sys/utime.h time.h unistd.h utime.h pwd.h grp.h sys/utsname.h \
-direct.h strings.h machine/fpu.h])
+poll.h direct.h strings.h machine/fpu.h])
 
 # "complex double" is new in C99, and "complex" is only a keyword if
 # <complex.h> is included
@@ -755,7 +755,7 @@ AC_CHECK_HEADERS([assert.h crt_externs.h])
 #   isblank - available as a GNU extension or in C99
 #   _NSGetEnviron - Darwin specific
 #
-AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid fesetround ftime ftruncate fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice pipe _pipe readdir_r readdir64_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex truncate unsetenv isblank _NSGetEnviron strncasecmp])
+AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid fesetround ftime ftruncate fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice pipe _pipe readdir_r readdir64_r readlink rename rmdir poll select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex truncate unsetenv isblank _NSGetEnviron strncasecmp])
 
 # Reasons for testing:
 #   netdb.h - not in mingw
diff --git a/libguile/fports.c b/libguile/fports.c
index 007ee3f..c807122 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -46,7 +46,9 @@
 #ifdef HAVE_STRUCT_STAT_ST_BLKSIZE
 #include <sys/stat.h>
 #endif
-
+#ifdef HAVE_POLL_H
+#include <poll.h>
+#endif
 #include <errno.h>
 #include <sys/types.h>
 
@@ -485,7 +487,14 @@ scm_fdes_to_port (int fdes, char *mode, SCM name)
 static int
 fport_input_waiting (SCM port)
 {
-#ifdef HAVE_SELECT
+#ifdef HAVE_POLL
+  int fdes = SCM_FSTREAM (port)->fdes;
+  struct pollfd pollfd = { fdes, POLLIN, 0 };
+  if (poll(&pollfd, 1, 0) < 0)
+    scm_syserror ("fport_input_waiting");
+  return pollfd.revents & POLLIN ? 1 : 0;
+
+#elif defined(HAVE_SELECT)
   int fdes = SCM_FSTREAM (port)->fdes;
   struct timeval timeout;
   SELECT_TYPE read_set;


* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-22  8:34     ` rixed
@ 2011-06-22 12:08       ` rixed
  2011-06-22 14:31       ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: rixed @ 2011-06-22 12:08 UTC (permalink / raw)
  To: guile-user

-[ Wed, Jun 22, 2011 at 10:34:42AM +0200, rixed@happyleptic.org ]----
> Attached is a patch against 1-8-8 that seems to fix the problem (I'm
> going to test it more thoroughly).

Works for me.





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-22  8:34     ` rixed
  2011-06-22 12:08       ` rixed
@ 2011-06-22 14:31       ` Ludovic Courtès
  2011-06-22 15:56         ` rixed
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2011-06-22 14:31 UTC (permalink / raw)
  To: guile-user

rixed@happyleptic.org writes:

> Attached is a patch against 1-8-8 that seems to fix the problem (I'm
> going to test it more thoroughly).
>
> What do you think of the approach?

Simple and efficient.  :-)

We could/should apply it, but I must admit I feel little motivation to
push another 1.8 release.

Thanks,
Ludo'.





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-22 14:31       ` Ludovic Courtès
@ 2011-06-22 15:56         ` rixed
  0 siblings, 0 replies; 8+ messages in thread
From: rixed @ 2011-06-22 15:56 UTC (permalink / raw)
  To: guile-user

-[ Wed, Jun 22, 2011 at 04:31:06PM +0200, Ludovic Courtès ]----
> We could/should apply it, but I must admit I feel little motivation to
> push another 1.8 release.

I can understand.
BTW, the same patch applies to master as well.





* Re: Guile hangs for minutes when many files are opened (1.8.7)
  2011-06-14  7:21 Guile hangs for minutes when many files are opened (1.8.7) rixed
  2011-06-21 13:02 ` rixed
@ 2011-07-04  9:33 ` rixed
  1 sibling, 0 replies; 8+ messages in thread
From: rixed @ 2011-07-04  9:33 UTC (permalink / raw)
  To: guile-user

[-- Attachment #1: Type: text/plain, Size: 318 bytes --]

Well, the bug is more annoying than I previously thought, since guile
uses select on various ports internally.

The attached patch fixes all occurrences of such uses.
The scm_std_poll function API is ugly, though.

I'd really like this issue to be addressed in the current guile branch,
either with this patch or a better one.
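
To make the calling convention concrete, here is a standalone sketch of
the same idea outside of Guile: slot 0 of the pollfd array is reserved
for a wakeup pipe, and the caller fills in slots 1 and up. The function
name poll_with_wakeup and its wakeup_fd parameter are hypothetical; the
patch itself takes the pipe from the current thread and also leaves and
re-enters guile mode around the call, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical standalone variant of the reserved-slot convention.
   FDS[0] is overwritten with WAKEUP_FD; the caller owns FDS[1] up to
   FDS[NFDS - 1].  Returns the number of caller descriptors that are
   ready, 0 on timeout, or -1 with errno set.  If only the wakeup pipe
   fired, one byte is drained and the call fails with EINTR. */
static int
poll_with_wakeup (int wakeup_fd, struct pollfd *fds, nfds_t nfds,
                  int timeout_ms)
{
  int res;

  assert (nfds >= 1);   /* slot 0 must exist */

  fds[0].fd = wakeup_fd;
  fds[0].events = POLLIN;
  fds[0].revents = 0;

  res = poll (fds, nfds, timeout_ms);

  if (res > 0 && (fds[0].revents & POLLIN))
    {
      char dummy;

      /* Drain one wakeup byte; a failed read is ignored on purpose. */
      if (read (wakeup_fd, &dummy, 1) < 0)
        {
        }

      /* Hide the reserved slot from the caller's result. */
      fds[0].revents = 0;
      res -= 1;
      if (res == 0)
        {
          errno = EINTR;
          return -1;
        }
    }
  return res;
}
```

A caller watching a single descriptor therefore declares an array of
two entries, sets fds[1].fd and fds[1].events, and passes 2 as the
count, which is the off-by-one that makes the API awkward.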


[-- Attachment #2: select.patch --]
[-- Type: text/x-diff, Size: 6266 bytes --]

diff --git a/configure.in b/configure.in
index 217ac83..ea26b5c 100644
--- a/configure.in
+++ b/configure.in
@@ -661,7 +661,7 @@ AC_CHECK_HEADERS([complex.h fenv.h io.h libc.h limits.h malloc.h memory.h proces
 regex.h rxposix.h rx/rxposix.h sys/dir.h sys/ioctl.h sys/select.h \
 sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \
 sys/utime.h time.h unistd.h utime.h pwd.h grp.h sys/utsname.h \
-direct.h strings.h machine/fpu.h])
+poll.h direct.h strings.h machine/fpu.h])
 
 # "complex double" is new in C99, and "complex" is only a keyword if
 # <complex.h> is included
@@ -755,7 +755,7 @@ AC_CHECK_HEADERS([assert.h crt_externs.h])
 #   isblank - available as a GNU extension or in C99
 #   _NSGetEnviron - Darwin specific
 #
-AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid fesetround ftime ftruncate fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice pipe _pipe readdir_r readdir64_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex truncate unsetenv isblank _NSGetEnviron strncasecmp])
+AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid fesetround ftime ftruncate fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice pipe _pipe readdir_r readdir64_r readlink rename rmdir poll select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex truncate unsetenv isblank _NSGetEnviron strncasecmp])
 
 # Reasons for testing:
 #   netdb.h - not in mingw
diff --git a/libguile/fports.c b/libguile/fports.c
index 007ee3f..8e16a55 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -46,7 +46,9 @@
 #ifdef HAVE_STRUCT_STAT_ST_BLKSIZE
 #include <sys/stat.h>
 #endif
-
+#ifdef HAVE_POLL_H
+#include <poll.h>
+#endif
 #include <errno.h>
 #include <sys/types.h>
 
@@ -485,7 +487,14 @@ scm_fdes_to_port (int fdes, char *mode, SCM name)
 static int
 fport_input_waiting (SCM port)
 {
-#ifdef HAVE_SELECT
+#ifdef HAVE_POLL
+  int fdes = SCM_FSTREAM (port)->fdes;
+  struct pollfd pollfd = { fdes, POLLIN, 0 };
+  if (poll(&pollfd, 1, 0) < 0)
+    scm_syserror ("fport_input_waiting");
+  return pollfd.revents & POLLIN ? 1 : 0;
+
+#elif defined(HAVE_SELECT)
   int fdes = SCM_FSTREAM (port)->fdes;
   struct timeval timeout;
   SELECT_TYPE read_set;
@@ -566,7 +575,6 @@ fport_wait_for_input (SCM port)
   if (!fport_input_waiting (port))
     {
       int n;
-      SELECT_TYPE readfds;
       int flags = fcntl (fdes, F_GETFL);
 
       if (flags == -1)
@@ -574,9 +582,17 @@ fport_wait_for_input (SCM port)
       if (!(flags & O_NONBLOCK))
 	do
 	  {
+#if HAVE_POLL
+	    struct pollfd pollfds[2];
+	    pollfds[1].fd = fdes;
+	    pollfds[1].events = POLLIN;
+	    n = scm_std_poll (2, pollfds, -1);
+#else
+	    SELECT_TYPE readfds;
 	    FD_ZERO (&readfds);
 	    FD_SET (fdes, &readfds);
 	    n = scm_std_select (fdes + 1, &readfds, NULL, NULL, NULL);
+#endif
 	  }
 	while (n == -1 && errno == EINTR);
     }
diff --git a/libguile/iselect.h b/libguile/iselect.h
index b23a641..459e690 100644
--- a/libguile/iselect.h
+++ b/libguile/iselect.h
@@ -57,6 +57,15 @@
 
 #endif /* no FD_SET */
 
+#if HAVE_POLL
+
+#include <poll.h>
+
+SCM_API int scm_std_poll (int fds,
+		        struct pollfd *pollfds,
+			    int timeout_ms);
+#endif
+
 SCM_API int scm_std_select (int fds,
 			    SELECT_TYPE *rfds,
 			    SELECT_TYPE *wfds,
diff --git a/libguile/socket.c b/libguile/socket.c
index cb954f4..9d9efb6 100644
--- a/libguile/socket.c
+++ b/libguile/socket.c
@@ -1328,7 +1328,6 @@ SCM_DEFINE (scm_accept, "accept", 1, 0, 0,
   int newfd;
   SCM address;
   SCM newsock;
-  SELECT_TYPE readfds, exceptfds;
   socklen_t addr_size = MAX_ADDR_SIZE;
   scm_t_max_sockaddr addr;
 
@@ -1336,6 +1335,16 @@ SCM_DEFINE (scm_accept, "accept", 1, 0, 0,
   SCM_VALIDATE_OPFPORT (1, sock);
   fd = SCM_FPORT_FDES (sock);
 
+#if HAVE_POLL
+  struct pollfd pollfds[2];
+  pollfds[1].fd = fd;
+  pollfds[1].events = POLLIN | POLLHUP;
+
+  /* Block until something happens on FD, leaving guile mode while
+     waiting.  */
+  selected = scm_std_poll (2, pollfds, -1);
+#else
+  SELECT_TYPE readfds, exceptfds;
   FD_ZERO (&readfds);
   FD_ZERO (&exceptfds);
   FD_SET (fd, &readfds);
@@ -1345,6 +1354,7 @@ SCM_DEFINE (scm_accept, "accept", 1, 0, 0,
      waiting.  */
   selected = scm_std_select (fd + 1, &readfds, NULL, &exceptfds,
 			     NULL);
+#endif
   if (selected < 0)
     SCM_SYSERROR;
 
diff --git a/libguile/threads.c b/libguile/threads.c
index f2bb556..16e17e5 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -37,6 +37,10 @@
 #include <sys/time.h>
 #endif
 
+#ifdef HAVE_POLL_H
+#include <poll.h>
+#endif
+
 #include "libguile/validate.h"
 #include "libguile/root.h"
 #include "libguile/eval.h"
@@ -1419,6 +1423,52 @@ scm_threads_mark_stacks (void)
 
 /*** Select */
 
+#if HAVE_POLL
+int
+scm_std_poll (int nfds,
+		struct pollfd *pollfds,	// first entry is unset, reserved for us
+		int timeout_ms)
+{
+  int res, eno, wakeup_fd;
+  scm_i_thread *t = SCM_I_CURRENT_THREAD;
+  scm_t_guile_ticket ticket;
+
+  while (scm_i_setup_sleep (t, SCM_BOOL_F, NULL, t->sleep_pipe[1]))
+    SCM_TICK;
+
+  wakeup_fd = t->sleep_pipe[0];
+  ticket = scm_leave_guile ();
+
+  pollfds[0].fd = wakeup_fd;
+  pollfds[0].events = POLLIN;
+
+  res = poll (pollfds, nfds, timeout_ms);
+  t->sleep_fd = -1;
+  eno = errno;
+  scm_enter_guile (ticket);
+
+  scm_i_reset_sleep (t);
+
+  if (res > 0 && (pollfds[0].revents & POLLIN))
+    {
+      char dummy;
+      size_t count;
+
+      count = read (wakeup_fd, &dummy, 1);
+
+      pollfds[0].revents = 0;
+      res -= 1;
+      if (res == 0)
+	{
+	  eno = EINTR;
+	  res = -1;
+	}
+    }
+  errno = eno;
+  return res;
+}
+#endif
+
 int
 scm_std_select (int nfds,
 		SELECT_TYPE *readfds,

