From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: thinking out loud: wip-rtl, ELF, pages, and mmap
Date: Wed, 24 Apr 2013 22:23:33 +0200
Message-ID: <874nevhe4a.fsf@pobox.com>
To: guile-devel

Hi,

I've been working on wip-rtl recently.  The goal is to implement good
debugging.  I'll give a bit of background and then get to my problem.

In master, ".go" files are written in the ELF format.  ELF is nice
because it embodies common wisdom on how to structure object files, and
this wisdom applies to Guile fairly directly.

To simplify, ELF files are cheap to load and useful to introspect.  The
former is achieved with "segments", which basically correspond to
mmap'd blocks of memory.  The latter is achieved by "sections", which
describe parts of the file.

The table of segments is usually written at the beginning of the file,
to make loading easier, and the table of sections is usually at the
end, as it's not usually needed at runtime.  There are usually fewer
segments than sections.

You can have segments in the file that are marked as not being loaded
into memory at runtime.  Usually this is the case for debugging
information.

OK, so that's ELF.  The conventional debugging format to use with ELF
is DWARF, and it's pretty well thought out.  In Guile we'll probably
use DWARF, along with some more basic metadata in .symtab sections.

I should mention that in master, the ELF files are simple wrappers over
2.0-style objcode.  The wip-rtl branch takes more advantage of ELF --
for example, to allocate some constants in read-only shareable memory,
and to statically allocate any constants that need initialization or
relocation at runtime.

ELF also has advantages when we start to do native compilation: native
code can go in another section, for example.

                              *   *   *

OK, so that's the thing.  I recently added support for writing .symtab
sections, and have been looking at how to load that up at runtime, for
example when disassembling functions.  To be complete, there are a few
other common operations that would require loading debug information:

  * Procedure names.

  * Line/column information, for example in backtraces.

  * Arity information and argument names.

  * Local variable names and live ranges (the ,locals REPL command).

  * Generic procedure metadata.

Anyway!  How do you avoid loading this information at runtime until it
is actually needed?  The original solution I had in mind was to put it
in ELF segments that don't get loaded.
Then at runtime you would somehow map from an IP (instruction pointer)
to an ELF object, and at that point you would lazily load the unloaded
ELF sections.

But that has a few disadvantages.  One is that it's difficult to ensure
that the lazily-loaded object is the same as the one that you
originally loaded.  We don't keep .go file descriptors open currently,
and debugging would be a bad reason to do so.

Another, more serious one is that this is a lot of work, actually.
There's a constant overhead for the data about what is loaded and how
to load what isn't, and the cross-references from the debug info to the
loaded info are tricky.

Then I realized: why am I doing all of this if the kernel has a virtual
memory system already that does all this for me?

So I have a new plan, I think.  I'll change the linker to always emit
sections and segments that correspond exactly in their on-disk layout
and in their in-memory layout.  (In ELF terms: segments are contiguous,
with p_memsz == p_filesz.)  I'll put commonly needed things at the
beginning, and debugging info and the section table at the end.  Then
I'll just map the whole thing with PROT_READ, and set PROT_WRITE on
those page-aligned segments that need it.  (Obviously in the future,
PROT_EXEC as well.)

Then I'll just record a list of ELF objects that have been loaded.
Simple bisection will map IP -> ELF, and from there we have the section
table in memory (lazily paged in by the virtual memory system) and can
find the symtab and other debug info.

So that's the plan.  It's a significant change, and I wondered if folks
had some experience or reactions.

Note that we have a read()-based fallback if mmap is not available.
This strategy also makes the read-based fallback easier.

Thoughts?

Andy
-- 
http://wingolog.org/
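
P.S.  To make the "simple bisection will map IP -> ELF" step a bit more
concrete, here is a rough sketch.  It is in Python only for brevity;
the real thing would live in libguile.  The names LoadedImage and
ImageRegistry are made up for illustration, and the sketch assumes that
each loaded file occupies one contiguous address range and that ranges
never overlap.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LoadedImage:
    """Hypothetical record for one mapped ELF file: [start, start + size)."""
    start: int
    size: int
    name: str

    @property
    def end(self) -> int:
        return self.start + self.size


class ImageRegistry:
    """Hypothetical list of loaded images, kept sorted by start address."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._images: List[LoadedImage] = []

    def register(self, image: LoadedImage) -> None:
        # Insert at the position that keeps both lists sorted by start.
        i = bisect.bisect_left(self._starts, image.start)
        self._starts.insert(i, image.start)
        self._images.insert(i, image)

    def find(self, ip: int) -> Optional[LoadedImage]:
        # Bisect for the rightmost image whose start is <= ip, then
        # check that ip actually falls inside that image's range.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i < 0:
            return None
        image = self._images[i]
        return image if ip < image.end else None
```

Once find() returns an image, the section table is reachable at a known
offset inside the same mapping, so the lookup never has to reopen the
file.  Registration here is a linear-time list insert, which seems fine
as long as loading a file is rare compared to looking up an IP.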