From: Ken Raeburn
Newsgroups: gmane.lisp.guile.devel
Subject: Re: thread safe functions
Date: Sat, 28 Aug 2010 21:33:25 -0400
Message-ID: <5105E811-5A0E-4D7B-9A4A-D3DABB12406F@raeburn.org>
To: Andy Wingo
Cc: guile-devel Development
On Aug 28, 2010, at 15:20, Andy Wingo wrote:

>> Unfortunately this applies to some internals of the implementation too.
>> For example, "set-object-property!" and friends use a hash table and
>> assoc lists internally.
>
> Fixed, thanks.
>
>> scm_c_issue_deprecation_warning and scm_c_register_extension don't look
>> thread-safe to me (in the sense described above).
>
> Fixed also.

Thanks!

Though... looking at deprecation.c, I'd suggest not printing anything while the lock is held; just keep track of whether something needs to be printed, and do the printing after releasing the lock. Partly because locks should be held only briefly and printing can block, but also because, if the error port can call back into other Scheme code, which may in turn call a deprecated function, you could get an error trying to acquire the lock (which we would never notice, since we never check for one), or a deadlock.

I was starting to look through the port code for this, but got distracted by scm_ptobs, a pointer which appears to be dynamically reallocated as needed when scm_make_port_type is called. That means every function that reads it should be taking the same mutex used when updating it; in this case, that appears to be the generic "critical section" stuff, and it looks like we're not doing that.

So, hmm, where else do we use hash tables internally?
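Before the list: a minimal sketch of the deferred-printing idea above (hypothetical names and toy fixed-size storage, not the actual deprecation.c code): decide under the lock whether the message is new, release the lock, and only then print.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t warned_lock = PTHREAD_MUTEX_INITIALIZER;
static char warned[16][64];     /* messages already issued (toy fixed size) */
static int n_warned = 0;

void
issue_deprecation_warning (const char *msg)
{
  int need_print = 0;
  int seen = 0;

  pthread_mutex_lock (&warned_lock);
  for (int i = 0; i < n_warned; i++)
    if (strcmp (warned[i], msg) == 0)
      seen = 1;
  if (!seen && n_warned < 16)
    {
      strncpy (warned[n_warned], msg, 63);
      warned[n_warned][63] = '\0';
      n_warned++;
      need_print = 1;           /* remember, but don't print yet */
    }
  pthread_mutex_unlock (&warned_lock);

  if (need_print)               /* print only after releasing the lock:
                                   a blocking write, or a callback into
                                   Scheme that itself warns, can no
                                   longer deadlock on warned_lock */
    fprintf (stderr, "warning: %s\n", msg);
}
```

The point is simply that the side effect with unbounded latency (and possible re-entry) happens outside the critical section.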
foreign.c: pointer_weak_refs is a weakly keyed hash table, which we add things to without locking.

gc.c: scm_protects is protected; good.

instructions.c: Can scm_lookup_instruction_by_name be called for the "first" time from two threads concurrently? It uses a hash table stored in a static variable, and the variable is updated before the hash table is filled in. (Even if you simply assign to the variable afterwards, there's no requirement that the compiler or CPU not reorder those assignments.) As an aside, though: do we need all that in C, or could instruction-list just export a big Scheme list that could be examined (or turned into a hash table) by Scheme versions of the other functions?

keywords.c: A critical section protects keyword_obarray; good.

modules.c: scm_pre_modules_obarray doesn't look protected.

ports.c: scm_i_port_weak_hash is protected in some places; others I haven't investigated further. But in scm_c_port_for_each, it looks to me as though, if ports are being created while the function is setting up, you could end up raising an exception when scm_internal_hash_fold iterates over more ports than were in the hash table when the count was initially taken (and the vector size set).

procprop.c: There's a mutex to protect overrides, but it looks like set-procedure-property! doesn't use it correctly; it should look more like set-object-property! does.

properties.c: I'm not familiar with this code at all, but I'd guess properties_whash should be protected. Though I wonder whether the call-out from primitive-property-ref could invoke other primitive-property functions, causing a deadlock if it isn't done carefully.

smob.c: I don't think tramp_weak_map is adequately protected, but I'm not certain.

srcprop.c: scm_source_whash isn't protected; it also appears to be exposed under the name "source-whash" to Scheme code, which wouldn't be able to use a mutex defined and used purely in the C code.
(What's that, four different kinds of property-list mappings, all implemented separately?)

struct.c: Even calling struct-vtable-name can cause a hash table entry to be created, it appears. So that's not thread-safe, never mind the call to actually change the name.

symbols.c: I don't think 'symbols' is handled safely. But this code is all starting to run together in my mind. :-)

weaks.c: only creates things for Scheme code, where we're making locking the user's problem.

It would be a bit easier if hash tables themselves were thread-safe. A read/write lock might make more sense than a simple mutex, if reading is more common than writing; in my experience, though, read/write locks sometimes perform worse than simple mutexes, so they're a win mainly for concurrent reading.

Of course, hash tables aren't the only places where we can have problems, and I'm not even touching on hash tables that might be created by Scheme code (but exposed to users with APIs that don't scream "unsafe hash table used here!"). But I've already spent my afternoon and evening on this.

>> That's just from a spot check; I'm not even trying to be thorough.
>
> I await your more thorough check :)

Yeah, in my copious free time... :-( Maybe some more next weekend, if I'm not doing the stuff I was supposed to get done last weekend.

>> And don't get me going on memory ordering models....
>
> I'm interested!

There's a lot to read out there, and I'm no expert, but the main idea is that, depending on the specific architecture, memory loads and stores can be almost arbitrarily reordered between CPUs unless you do something to prevent it. So, suppose you do something like this (preconditions: z is a pair visible to both threads; SCM_CAR(z) is a pair):

  thread 1:
    allocate cons cell x
    SCM_SETCAR(x, v1)
    SCM_SETCDR(x, v2)
    SCM_SETCAR(z, x)

  thread 2:
    t = SCM_CAR(z)
    read SCM_CAR(t)

Assuming that reads and writes of SCM values are atomic, thread 2 will read t as either the old value, or x.
If it's x, though, SCM_CAR(x) might *not* be read back as v1, but as whatever garbage was in there when (or before) the cell was allocated. And instead of a CAR slot, the garbage could be in the bits of a non-immediate value that indicate a subtype, causing the reading thread to call the wrong routines to manipulate it.

Basically, any time you make an object handle visible to other threads, you need to make sure the object's contents have been stored first, and that the other processor hasn't already loaded that location -- or, if it might be cached, that it knows to flush the entry. (Multiple threads can run on one processor, or on multiple cores sharing a cache, in which case it's probably not a problem. A thread can also be moved between cores, but the OS is responsible for keeping things in sync in that case.) Changing an object already visible to other threads probably requires at least as much caution; but given that malloc/free are likely to reuse storage, it's not even reasonable to assume that a location in a freshly allocated cell isn't cached by some other processor from back when it was part of some other object. So I think the two problems are largely equivalent; a visible live object might merely be more likely to be in another cache.

According to the Wikipedia article on memory ordering, the Alpha (some of them are still out there) has a really interesting property: it can reorder dependent loads, i.e., load data before loading the pointer to the data. It isn't described in detail, but I assume it's related to speculative loading or something.

Mutexes encapsulate "something to prevent it" pretty well; hacks like "volatile" don't, though they can get the compiler somewhat out of the way. Of course, adding a mutex to every non-immediate value is far too expensive in memory use, even if the almost complete lack of contention means the locking could probably be done with few context switches into the kernel.
Using one global mutex can work, but it basically means only one thread can be examining or manipulating Guile objects at a time. For I/O or DNS lookups, that's probably fine, but we do want to be able to manipulate data structures concurrently.

There are lower-level facilities generally available, like memory-barrier instructions and atomic operations (usually compare-and-swap, or load-link/store-conditional), which could probably be used more efficiently; there are even lock-free algorithms for implementing some simple data structures safely using these primitives. But the details vary, and doing more synchronization than you actually need can hurt performance, so you really have to tune it well. This is the sort of thing that kernel and C-library developers sometimes spend a lot of effort trying to get right, so that the kernel and libc facilities (e.g., mutexes) will be both correct and efficient.

To make it worse, the compiler can reorder memory access instructions if it can show there are no aliasing problems (i.e., that the locations being accessed cannot be the same, assuming the source code follows the rules of the language -- and that last clause may let a compiler reorder accesses via "int" and "double" types that happen to map to the same location!), and it can even write (or not write) intermediate values to an output location. So, for example, in:

  register SCM x = ...;  /* so it can't be aliased by memory accesses */
  SCM_SETCDR(x, v1);
  SCM_SETCAR(x, v2);

the compiler could swap the order of the writes, since it knows the locations are different (there's a fixed nonzero offset between them), and under the C language spec there's no way to stop between the two statements and examine the values stored; so the two versions are the same under the "as if" rule, which says only the overall side effects matter.
The original order will often be retained -- to help with debugging, because no optimization benefit was found in reordering, etc. -- but you can't make your program's semantics depend on that. Or:

  register int a, b, c;
  n = foo() + a*b*c;  /* store to mem */

could, at least theoretically, be implemented as:

  n = foo();
  tmp = a*b;
  tmp *= c;
  n += tmp;

though offhand I'm having a hard time thinking of a realistic case in which a compiler would be likely to do that.

This is where "volatile" would help a bit. But simply making all non-immediate objects volatile would probably be overkill.

Now, how *much* of a problem is this for user-mode programs in practice? I don't know. I've read a few papers, but haven't actually worked on code at this level or researched the architecture details of many processors and memory subsystems. (The multiprocessing stuff I've done all used mutexes, if not always very efficiently.) A couple of web pages I've run across have suggested that some published processor errata have pointed out memory ordering problems, so the actual requirements for a given processor family may be stricter than advertised in the architecture manuals. And sometimes updates to the architecture manuals change these specs. I think we'd need someone with more low-level experience to help answer that. I suspect many programs can run just fine for a long time without the race conditions in them causing visible problems; then, the next day, maybe they'll crash.

I recommend the "Memory ordering" page at Wikipedia as a good starting point for reading up on it, and the first couple of references at least. Also, Boehm's paper "Threads Cannot Be Implemented as a Library" is an interesting read about the compiler interactions, though if we're trying to avoid locks altogether, the problems we face are probably larger than the ones he describes.
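To make the SCM_SETCAR publication race above concrete: with C11 atomics, a release store when publishing and an acquire load when reading constrain both the compiler and the CPU, so a thread that sees the new pointer is guaranteed to see the initialized contents too. This is a standalone sketch with a plain struct standing in for a cons cell -- hypothetical, not what Guile actually does.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for a cons cell; not Guile's real representation. */
struct cell { int car; int cdr; };

/* The shared slot through which a cell is published to other threads
   (the analogue of SCM_SETCAR(z, x) in the example). */
static _Atomic(struct cell *) shared = NULL;

/* Writer: fully initialize the cell, THEN publish it with a release
   store.  Neither the compiler nor the CPU may sink the field writes
   past the release. */
void
publish (struct cell *c, int v1, int v2)
{
  c->car = v1;
  c->cdr = v2;
  atomic_store_explicit (&shared, c, memory_order_release);
}

/* Reader: an acquire load pairs with the release store above; if it
   returns a non-NULL pointer, the fields are visible as written.
   (On Alpha, even the dependent load needs this barrier.) */
struct cell *
take (void)
{
  return atomic_load_explicit (&shared, memory_order_acquire);
}
```

The same pairing is what a mutex gives you implicitly: unlock is a release, lock is an acquire.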
> I think we agree, but I prefer to paint this in a more optimistic
> light -- that things are mostly there, but bugs are possible. Bug fixes
> are also possible :)

You might guess from the above that I don't think they're mostly there. That bugs are possible, yes, that I'll grant you. ;-)

A real review should check:

 * any non-const static-storage variables defined in the library, and how they're used;
 * any functions modifying objects, to determine the effect of concurrent calls;
 * any uses of those functions or objects from our C or Scheme code, to see whether we're presenting the correct semantics in the face of concurrent calls;
 * that we actually specify the semantics -- or say clearly when results are undefined;
 * anything dealing with OS interfaces (or other library APIs) that aren't thread-safe.

And if those aren't going to take long enough already:

 * getting *real* answers to the memory ordering issues;
 * possibly reviewing every function manipulating memory for ordering issues;
 * probably more.

Then I might have some confidence that "things are mostly there." I don't think it's going to happen at all for 2.0, and I don't expect to have time myself to do anything more than little bits of work on the first part of the list, if even that much, in the near future.

My work today didn't do any of this; I just started by looking over the patches you checked in, and then ran "grep hash_ta *.[ch]" in libguile to see where else we've got hash tables, since we've already seen that they're a problem in at least one other place. But even with just that, I found quite a few things that I think need to be looked at.

Ken