From: Ihor Radchenko
To: Tommy Kelly, Bastien
Cc: emacs-orgmode@gnu.org
Subject: [BUG] Null character in block/drawer regexps (but not in org-element parser) (was: BUG? Null character prevents org-babel-tangle from tangling a block)
Date: Sat, 12 Nov 2022 12:59:40 +0000
Message-ID: <875yfk9vlv.fsf@localhost>
Tommy Kelly writes:

> The attached .org file describes a simple test to demonstrate the problem.
> I've also attached a .zip version, in case the NULL character in the test
> doesn't survive the gmailing process. (The null is in BLOCK 2, two
> characters after the '3' in ';; line3'.) If it's there, you should see the
> usual ^@ (as a single character) placeholder.

Confirmed. This is because `org-babel-src-block-regexp' explicitly
prohibits null characters in the block body. The same applies to
`org-block-regexp', `org-clock-drawer-re', `org-latex-regexps', and a
number of other places in the Org sources.

At least for src blocks, prohibiting the null character is inconsistent
with the org-element parser, which has no such restriction.

I am not sure what the rationale is for not allowing the null character.
I see no clues in the git history, and only one possibly relevant comment,
in `org-latex-regexps':

    ;; \000 in the following regexp is needed for org-inside-LaTeX-fragment-p

However, `org-inside-LaTeX-fragment-p' itself is outdated and should be
replaced with org-element machinery. So we should probably remove the
null-character shenanigans from the code, unless I am missing something.

Bastien, do you perhaps recall why the null character appears in these
regexps?

P.S. If we decide to remove the null character, I'd prefer to do it after
the release: the change may affect a lot of code, and the bug is not
serious enough to justify the risk of breakage.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at
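To illustrate the mechanism described in the message, here is a minimal sketch. The `demo/' regexps are hypothetical, simplified stand-ins, not the actual definitions of `org-babel-src-block-regexp' or `org-block-regexp'; they only reproduce the relevant pattern, a body group written as [^\000]*?, which cannot match across a null character, next to a body group that accepts any character.

```elisp
;; Hypothetical, simplified regexps for illustration only; these are NOT
;; Org's real definitions.  Only the shape of the body group matters here.

(defconst demo/src-re-no-null
  "^#\\+begin_src[^\n]*\n\\([^\000]*?\\)#\\+end_src"
  "Body group excludes the null character, like Org's [^\\000] groups.")

(defconst demo/src-re-any
  "^#\\+begin_src[^\n]*\n\\(\\(?:.\\|\n\\)*?\\)#\\+end_src"
  "Body group accepts any character, including null and newline.")

(defconst demo/block-with-null
  (concat "#+begin_src emacs-lisp\n"
          ";; line3" (string 0) "\n"   ; a literal null character in the body
          "#+end_src\n")
  "A tiny src block whose body contains a null character.")

(defconst demo/block-without-null
  "#+begin_src emacs-lisp\n;; line3\n#+end_src\n"
  "The same block without the null character.")

(defun demo/matches-p (regexp text)
  "Return t if REGEXP matches somewhere in TEXT, nil otherwise."
  (let ((case-fold-search nil))        ; make the result independent of caller
    (and (string-match regexp text) t)))

(demo/matches-p demo/src-re-no-null demo/block-without-null) ;; => t
(demo/matches-p demo/src-re-no-null demo/block-with-null)    ;; => nil
(demo/matches-p demo/src-re-any demo/block-with-null)        ;; => t
```

The [^\000] group simply stops at the null character, so the closing #+end_src is never reached and the whole block goes unrecognized, while a parser that does not special-case \000 still sees the block. Loosening the body group as in `demo/src-re-any' is one conceivable direction; whether Org's real regexps can be changed that way without breaking their callers is the open question raised in the message.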