From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Zach Shaftel
Newsgroups: gmane.emacs.devel
Subject: Re: Update 1 on Bytecode Offset tracking
Date: Thu, 16 Jul 2020 18:45:00 -0400
Message-ID: <87blkfoz9v.fsf@gmail.com>
References: <87a700fk3j.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: mu4e 1.4.10; emacs 28.0.50
Cc: emacs-devel@gnu.org
To: Stefan Monnier
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:252991

Hi Stefan,

Stefan Monnier writes:

>> The second branch saves the offset only before a call. Therefore, the
>> traceback on all of the functions other than the current one are
>> accurate, but the current one is not accurate if the error happens in
>> a byte op.
>
> IIUC this has a negligible performance impact. The info it provides is
> not 100% accurate, but I think it's a "sweet spot": it does provide the
> byte-offset info and is cheap enough to be acceptable into `master` with
> no real downside.

Great! I just followed up on my copyright assignment, as I still haven't
finished that process. I don't know whether this could be exempt or if
Rocky's assignment is sufficient, but hopefully I will hear back from
copyright-clerk soon.

> I'd look at it as a "step" along the way: subsequent steps can be to
> make use of that info, or to improve the accuracy of that info.

Absolutely, it would be great to have that in place as a basis for
further improvement.

>> The third branch bypasses invoking Ffuncall from within
>> exec_byte_code, and instead does essentially the same thing that
>> Ffuncall does, right in the Bcall ops.
>
> This would be useful in its own right.
> So I suggest you try and get this code into shape for `master` as well.

I will definitely continue work on this.

> I expect this will tend to suffer from some amount of code duplication.
> Maybe we can avoid it via refactoring, or maybe by "clever" macro
> tricks, but if the speedup is important enough, we can probably live
> with some amount of duplication.

That seems to be the case. I'll keep looking to see if there's any
low-hanging fruit in terms of splitting up the funcall logic without
slowing things down. More testing is necessary, but if a moderate chunk
of duplicated code is acceptable then there may not be as much work
needed on that branch as I had thought.

>> All of them print the offset next to function names in the backtrace like this:
>>
>>
>> Debugger entered--Lisp error: (wrong-type-argument stringp t)
>>        string-match(t t nil)
>>     13 test-condition-case()
>>        load("/home/zach/.repos/bench-compare.el/test/test-debug...")
>>     78 byte-recompile-file("/home/zach/.repos/bench-compare.el/test/test-debug..." nil 0 t)
>>     35 emacs-lisp-byte-compile-and-load()
>>        funcall-interactively(emacs-lisp-byte-compile-and-load)
>>        call-interactively(emacs-lisp-byte-compile-and-load record nil)
>>    101 command-execute(emacs-lisp-byte-compile-and-load record)
>
> Cool!
>
>> With respect to reporting offsets, using code from edebug we have
>> a Lisp-Expression reader that will track source-code locations and
>> store the information in a source-map-expression cl-struct. The code
>> in progress is here.
>
> How does the performance of this code compare to that of the "native" `read`?

Rough tests indicate it's about three times slower. Running it on all
274 files in the `lisp` directory of the GNU Emacs sources takes ~11-12
seconds (after removing the string from the struct and not
pretty-printing). A similar function which just calls `read` takes ~4
seconds. There are probably ways to further improve the performance of
`source-map-read`, but I don't know how much more speed can
realistically be gained.
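
For anyone who wants to reproduce a comparison of this kind, here is a
minimal sketch of how the two readers could be timed. It is not the
harness used for the numbers above. The `my-` names are hypothetical,
only the top level of the directory is scanned, and it assumes
`source-map-read` can be called with a buffer as its only argument, the
way `read` can, which may not match its real signature.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun my-read-all-forms (file reader)
  "Read every top-level form in FILE with READER; return the form count.
READER is called with the buffer as its only argument, like `read'."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((count 0))
      (condition-case nil
          (while t
            (funcall reader (current-buffer))
            (setq count (1+ count)))
        ;; `read' signals `end-of-file' once no forms remain.
        (end-of-file nil))
      count)))

(defun my-time-reader (directory reader)
  "Return the seconds READER needs to read each .el file in DIRECTORY.
Only files directly inside DIRECTORY are read, not subdirectories."
  (let ((files (directory-files directory t "\\.el\\'")))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1
           (dolist (file files)
             (my-read-all-forms file reader))))))
```

Calling `(my-time-reader DIR #'read)` and then the same with the other
reader gives two figures to divide. With the timings quoted above,
11-12 seconds against 4 seconds is a ratio of 2.75 to 3, which is where
"about three times slower" comes from.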

> And to put it into perspective, have you looked at the relative
> proportion of time spent in `read` during a "typical" byte compilation?

I have not yet, but I'll evaluate that and keep it in mind.

> There's no doubt that preserving source code information will slow down
> byte-compilation but depending on how slow it gets we may find it's not
> "worth it".
>
>> Information currently saved is:
>>
>> * The expression itself
>> * The exact string that was read
>> * Begin and end points of the sexp in the buffer
>> * source-map-expression children (for conses and vectors)
>
> Sounds like a lot of information, which in turn implies a potentially
> high overhead (e.g. the "exact string" sounds like it might cost O(N²)
> in corner cases, yet provides redundant info that can be recovered from
> begin+end points).

Removing the string did improve performance, but not by as much as I
expected. The function that constructs the tree of "children" may be
slower than it needs to be, so I'll look into improving that. It may not
be necessary to create the children for vectors since they're constants
(outside of backquote, at least).

> Note also that while `read` returns a sexp made exclusively of data
> coming from a particular buffer, the code after macro-expansion can
> include chunks coming from other buffers, so if we want to keep the
> same representation of "sexp with extra info" in both cases, we can't
> just assume "the buffer".

Yes, and it won't be easy to maintain the read locations across
macroexpansion, byte-opt and cconv. It's tough to say at this point how
much the final product will slow down compilation, but I suspect it will
be significant.

> Stefan
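
On the point that the exact string is redundant with the begin and end
points, a minimal sketch of a record without the string slot, plus a
helper that recovers the text on demand, might look like the following.
The struct name, its slots and both functions are hypothetical and are
not the actual `source-map-expression` definition. The children slot is
left empty, since filling it in needs a custom reader.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical slot layout; the real `source-map-expression' may differ.
(cl-defstruct (my-source-map-expression
               (:constructor my-source-map-expression-create)
               (:copier nil))
  expression   ; the sexp returned by the reader
  begin        ; buffer position where the sexp starts
  end          ; buffer position just past the sexp
  children)    ; child records for conses and vectors (unused here)

(defun my-source-map-read-1 (buffer)
  "Read one sexp from BUFFER at point and record where it came from."
  (with-current-buffer buffer
    ;; Skip whitespace, and comments when the buffer's syntax table
    ;; defines them, so that BEGIN marks the sexp itself.
    (forward-comment (buffer-size))
    (let* ((begin (point))
           (expression (read buffer))
           (end (point)))   ; `read' leaves point after the sexp
      (my-source-map-expression-create
       :expression expression :begin begin :end end))))

(defun my-source-map-expression-string (record buffer)
  "Recover the text RECORD was read from, using only its begin and end."
  (with-current-buffer buffer
    (buffer-substring-no-properties
     (my-source-map-expression-begin record)
     (my-source-map-expression-end record))))
```

Storing only the two positions keeps each record a constant size, while
storing the string for every nested sexp copies the same text once per
nesting level. The recovered text is only valid while the buffer is
unchanged, and, as noted above, a single buffer cannot be assumed once
macro-expansion mixes in code from elsewhere.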