From: Floris Bruynooghe
To: notmuch@notmuchmail.org
Subject: DRAFT Introduce CFFI-based Python bindings
Date: Tue, 28 Nov 2017 21:46:07 +0100
Message-Id: <20171128204608.12210-1-flub@devork.be>
List-Id: "Use and development of the notmuch mail system."

Hi all,

Here are the beginnings of CFFI-based Python bindings, as an
alternative to the ctypes-based ones currently available. I started
this work in order to get faster bindings on PyPy, since a script of
mine was running slower on PyPy than on CPython. I initially aimed for
a drop-in replacement of the existing bindings, but ended up abandoning
that goal in order to help enforce correct usage of the API.

The benefits of this approach are:

- More "Pythonic" API, e.g.
  tags behave like sets, iterators that get consumed can easily be
  re-created as is usual with collections, and invalid combinations of
  arguments and calls are prevented at the Python API level.

- CFFI. This works on both CPython and PyPy; on the latter it is
  (supposed to be) a lot faster, as the JIT can cross the boundary
  between C and Python code, where it otherwise has extra overhead to
  emulate the CPython C API. It is also safer to use than ctypes: it
  works at the API level, using the compiler to figure out the correct
  details of the platform, whereas ctypes works only at the ABI level,
  so you need to rely on knowing the layout of the code when writing
  the Python bindings.

Additionally, I believe these bindings fix a memory-safety issue:
certain situations in my test suite would lead to a core dump, which
should not be possible from within Python. I believe I have seen
similar reports in the list archives, so I am not the only one seeing
these. Sadly they are hard to isolate and I have not managed to
reproduce the problem in a nice minimal example. However, I believe the
root cause is that in some situations, mostly interpreter shutdown, the
__del__ method can be called while there are still references to the
object and while child objects are still alive. This effectively
results in double frees, as the child object frees memory already freed
by the parent. These bindings solve this by adding an .alive property
and using it to check that parent objects are still alive before an
object destroys itself. This is somewhat expensive, but it works and is
easy to implement.

There are also some downsides to the choices I made:

- I ended up going squarely for CPython 3.6+. Choosing Python 3
  allowed better API design, e.g. keyword-only parameters. Being on
  CPython 3.4 or later restricts the madness that can happen with
  __del__ and gives access to newer (though now unused) features such
  as weakref.finalize.

- This is no longer drop-in compatible.
- I haven't got to a stage where my initial goal of speed has been
  proven yet.

In theory I think it's possible to create a CFFI-based drop-in
replacement for the existing bindings, adding only the memory-safety
fixes and keeping Python 2.7 compatibility. It would then be possible
to build the API proposed in these bindings on top of that, but once I
was making these bindings safer it felt strange to still allow the API
to be misused.

There are a lot of details about this that can be discussed, as well as
many finer implementation points and even just getting the proposed API
right (you'll notice large gaps for now). But this mail is already too
long. I look forward to your comments and feedback on the approach
taken and on whether some form of this could make it into the main
repo.

Lastly, a small note on the AUTHORS file patch: due to my own
unfortunate choice of employer I have strict rules to follow on how to
submit patches, one of which is to add this line if an AUTHORS file
exists. Given that clearly not everyone is listed there, though, maybe
this is not appropriate. I would also prefer to receive email at
flub@devork.be rather than at the address I have to use in the git
commits.

Kind Regards,
Floris
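
For readers who want a concrete picture of the .alive check described
in the memory-safety paragraph, here is a minimal pure-Python sketch.
It is not taken from the patch: the class names Parent and Child, the
destroy() method and the freed list are hypothetical stand-ins, and the
"handles" are plain Python objects rather than C pointers. It only
assumes, as the mail describes, that freeing a parent also releases
memory belonging to its children.

```python
# Hypothetical sketch; names are illustrative, not the bindings' API.


class Parent:
    """Stands in for a wrapper that owns a C resource."""

    def __init__(self):
        self._handle = object()  # placeholder for a C pointer
        self.freed = []          # log of frees, for demonstration only

    @property
    def alive(self):
        return self._handle is not None

    def destroy(self):
        if self.alive:
            # Assumed: freeing the parent also frees its children.
            self.freed.append("parent")
            self._handle = None

    def __del__(self):
        self.destroy()


class Child:
    """Stands in for a wrapper whose memory hangs off a parent."""

    def __init__(self, parent):
        self._parent = parent
        self._handle = object()  # placeholder for a C pointer

    @property
    def alive(self):
        # A child only counts as alive while its parent still is.
        return self._handle is not None and self._parent.alive

    def destroy(self):
        if self.alive:
            # Only free when the parent has not already done so.
            self._parent.freed.append("child")
        self._handle = None

    def __del__(self):
        self.destroy()
```

Whichever order the two objects are destroyed in, each piece of memory
is released at most once, at the cost of an extra property lookup on
every destruction.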