From: Linas Vepstas
Reply-To: linasvepstas@gmail.com
Newsgroups: gmane.lisp.guile.user
Subject: Re: Guile bugs
Date: Tue, 19 Sep 2017 19:04:21 +0800
To: Ludovic Courtès
Cc: Guile User

Hi Ludo,

On Fri, Sep 15, 2017 at 3:56 PM, Ludovic Courtès wrote:
> Linas Vepstas wrote:
>
> > On Mon, Sep 11, 2017 at 2:26 AM, Ludovic Courtès wrote:
> >
> >> Hello,
> >>
> >> Linas Vepstas wrote:
> >>
> >> > The stuff coming over the network sockets are bytes, not s-exps. Since none
> >> > of the bytes are ever zero, they are effectively C/C++ strings, and are
> >> > handled as such. These C strings are sent to scm_eval_string() wrapped
> >> > by scm_c_catch().
> >>
> >> I don’t know to what extent that is applicable to your software, but my
> >> recommendation would be to treat that network socket as a Scheme port,
> >> pass it to ‘read’, and pass the result to ‘eval’ (as opposed to reading
> >> the whole string from C++ and passing it to ‘scm_eval_string’.)
> >>
> >
> > Why? What advantage does this offer?
>
> It avoids copies and conversions, which is a big deal if you deal with
> very big strings.
>
> > It's not clear that guile eval is smart enough to manage a network socket --
> > if the user starts a long-running process with intermittent prints, will it
> > send that to the socket? What if the user hits Ctrl-C in the middle of it
> > all? What if the code that came over the socket happened to throw an
> > exception?
>
> These are important considerations, but it’s not eval’s business IMO.
> Instead, I suggest building your own protocol around it, and having a
> way in that protocol to report both exceptions and normal returns.
>

Well, yes, this is exactly what I've done. This conversation is frustrating: either piping read to eval is the right thing to do, in which case eval must handle network connections correctly, or else piping read to eval is the wrong thing to do. You can't have it both ways.

> > I've had to deal with all of these issues in the past, and have a stable
> > code base; but if I had to start all over again, it's not clear that these
> > issues have gone away. I mean, eval was designed to eval -- it was not
> > designed to support multi-threaded, concurrent network operations, right?
>
> Right.
>
> > To support my point: the default guile network REPL server is painfully
> > slow, and frequently crashes/hangs. It works well enough to do some demos
> > but is not stable enough for production use ... if it's just read+eval, that
> > might explain why it's unstable.
>
> I’ve never noticed slowness of the REPL server, nor crashes.
>

You are probably using it only lightly, and not in a high-load environment. It runs maybe 5x slower than my current Guile shell server, and it is very definitely unstable and crashy. In my environment, I send it anywhere from one to twenty Scheme expressions per second, opening a new socket for each expression. This goes on for days or weeks.

I am using a custom Guile server written in C++, which accepts network connections, reads bytes from the network, and sends them to scm_eval_string(). It mostly works fine, with a couple of problems: there seems to be a pointless UTF-8 to UTF-32 conversion, which started this email chain. There also seems to be some sort of very rare race condition in the compiler that leads to corruption inside of Guile.
I believe that this can be triggered by starting twenty threads (for example) and then compiling and running fairly short programs in each thread. By "fairly short" I mean programs of fewer than 5-10 lines of code that compute and return answers in less than a tenth of a second. Doing this for a few hours eventually causes Guile to hang in a spinloop, trying to read some Guile-internal structure that has invalid data in it. I opened a bug report for this a month or two ago, but did not supply an easy-to-trigger test case.

I tried replacing my Guile network server with the REPL server, and discovered that the REPL server is much, much slower; I don't recall exactly how I arrived at the 5x number, but it came from an actual measurement. I don't think the REPL server can handle 20 network connections per second. I remember hypothesizing that Guile was being re-initialized for every network connection.

Obviously, this is wasteful and slow. Entering Guile is a large bottleneck. I once measured this, and I think it takes approximately 200 microseconds to enter Guile, which implies a maximum of about 5K Guile evaluations per second (1 / 200 µs = 5000 per second) when using the simple-minded design of having the C code enter Guile each time before evaluating an expression. By contrast, Python (cython) can be entered in 10 or 20 microseconds. The benchmark here is how many times per second one can eval some simple expression, e.g. (+ 2 2), or its equivalent in Python.

The solution to the heavy cost of entering Guile is to create a pool of a few dozen threads, enter Guile in each, and then never exit: just return each thread to the pool when its eval completes and it is no longer needed. This cuts the 200-microsecond overhead to zero, leaving only the cost of calling scm_eval_string(). I did measure that too, but I don't recall the numbers.
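A minimal C++ sketch of the "enter once, never exit" thread pool described above might look like the following. It is an illustration under stated assumptions, not the code of the server mentioned in this thread: `enter_runtime_once()` is a hypothetical stand-in for the expensive per-thread entry into Guile (in a real server the worker loop would presumably run inside `scm_with_guile()`), and `evaluate()` is a hypothetical stand-in for a `scm_eval_string()` call wrapped by `scm_c_catch()`. `EvalPool` and `count_runtime_entries()` are likewise hypothetical names. What it shows is that the entry cost is paid once per worker thread rather than once per evaluation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Counts how many times a worker thread "entered" the runtime.
static std::atomic<int> runtime_entries{0};

// Hypothetical stand-in for the expensive per-thread entry into Guile.
// In a real server the worker loop would presumably run inside
// scm_with_guile() instead; here we only count the entries.
void enter_runtime_once() { runtime_entries.fetch_add(1); }

// Hypothetical stand-in for a scm_eval_string() call wrapped by
// scm_c_catch(); this sketch just echoes the expression back.
std::string evaluate(const std::string& expr) { return "evaluated: " + expr; }

// A fixed pool of worker threads that enter the runtime once at startup
// and then never leave it; idle workers simply wait for the next job.
class EvalPool {
public:
    explicit EvalPool(std::size_t n_threads) {
        for (std::size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Drains any queued jobs, then stops and joins every worker.
    ~EvalPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Queues one expression and returns a future for its reply.
    std::future<std::string> submit(std::string expr) {
        auto task = std::make_shared<std::packaged_task<std::string()>>(
            [e = std::move(expr)] { return evaluate(e); });
        std::future<std::string> reply = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return reply;
    }

private:
    void worker_loop() {
        enter_runtime_once();  // paid once per thread, not once per request
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping_
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock; the thread stays "entered"
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stopping_ = false;
};

// Usage: submit `requests` copies of "(+ 2 2)" to `threads` workers and
// return how many runtime entries were needed to serve them all.
int count_runtime_entries(int requests, std::size_t threads) {
    runtime_entries.store(0);
    std::vector<std::future<std::string>> replies;
    {
        EvalPool pool(threads);
        for (int i = 0; i < requests; ++i)
            replies.push_back(pool.submit("(+ 2 2)"));
        for (std::future<std::string>& r : replies) {
            std::string text = r.get();
            assert(text == "evaluated: (+ 2 2)");
        }
    }  // the pool's destructor joins all workers here
    return runtime_entries.load();
}
```

With this design, `count_runtime_entries(100, 4)` enters the runtime four times, not a hundred times: the per-entry cost (about 200 microseconds by the measurement above) is amortized over every request a worker handles, and what remains per request is the evaluation itself.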
> That said, if you run a REPL server in a separate thread and mutate the
> global state of the program, you could possibly crash it -- no wonders
> here.
>

Yes, well, I would call that a bug! It feels like you are blaming me for a Guile bug; it's not my fault that it crashes! I did not look very carefully, and don't recall what the stack traces looked like, but I got the impression that there were race conditions in Guile initialization and in how it interacted with the sockets.

> Likewise, the REPL server is meant to be used for debugging on
> localhost. If you talk to a REPL server over the network with high
> latency, it’s going to be slow, not surprisingly.
>

The performance problem was not the latency; it was the number of connections it could accept. I'll say it again: I have a different network server that is 5x faster than the REPL server, and it works; it is stable. For reasons completely unrelated to Guile, I would like to declare my network server deprecated and obsolete. However, I cannot do this, because the Guile REPL server is not yet good enough to be an adequate replacement.

--linas

>
> So yes, I find the REPL server to be a really pleasant tool when
> debugging an application locally, but that’s all it is -- it’s not a remote
> procedure call framework or anything like that.
>
> Thanks,
> Ludo’.
>

-- 
"The problem is not that artificial intelligence will get too smart and take over the world," computer scientist Pedro Domingos writes, "the problem is that it's too stupid and already has."