From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 40C7A1F487;
	Tue, 31 Mar 2020 08:32:50 +0000 (UTC)
Date: Tue, 31 Mar 2020 08:32:50 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: how to gracefully handle spaces in Message-IDs?
Message-ID: <20200331083250.GA27164@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
List-Id:

There exist Message-IDs with spaces in them, at least (and maybe
other strangeness).

Take this example:

  https://lore.kernel.org/lkml/200203040330.g243URr05337@3%20(NXDOMAIN)%20/

That is:

  Message-ID: <200203040330.g243URr05337@3 (NXDOMAIN) >

RFC 3977 (NNTP) struggles with that in HDR/XHDR commands because
of its split-on-spaces-or-tabs behavior.

Not only that, even with a successful attempt to handle parsing
of spaces in the Message-ID for -nntpd requests, Net::NNTP has
trouble parsing responses with spaces in the Message-ID.

I haven't tried other NNTP clients, but I don't expect clients
to know what to do with invalid Message-IDs in responses,
either...

RFC 5322, Appendix A.6.3. Obsolete White Space and Comments
has a particularly nasty example:

  Message-ID : <1234 @ local(blah) .machine .example>

And RFC 733 is full of examples with spaces in Message-IDs
for the historically-inclined:

But I haven't found relevant docs on how to handle that case for
NNTP in RFC 977 or 3977...

In innd(*), the nnrpd/article.c::CMDpat function for HDR/XHDR
commands calls lib/messageid.c::IsValidMessageID with the
`stripspaces' parameter as `true', but `stripspaces' only strips
leading and trailing whitespace.
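
To make the HDR/XHDR problem concrete, here is a minimal Python
sketch of what happens when a command line carrying the example
Message-ID above is tokenized on runs of spaces or tabs. The
split_command helper is hypothetical and purely illustrative; it
is not code from public-inbox, innd, or Net::NNTP.

```python
import re

def split_command(line: str) -> list[str]:
    """Tokenize a command line on runs of spaces or tabs.

    Hypothetical helper for illustration only.
    """
    return re.split(r"[ \t]+", line.strip(" \t\r\n"))

# The Message-ID from the lore.kernel.org example, spaces included.
mid = "<200203040330.g243URr05337@3 (NXDOMAIN) >"

# An HDR request naming that Message-ID.
args = split_command(f"HDR Subject {mid}\r\n")

# The Message-ID no longer arrives as a single argument:
# it is spread over args[2], args[3] and args[4].
```

A server that wants to accept such a request would have to rejoin
everything from the opening `<' through the closing `>' itself,
and the client still has to cope with the response.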

So I'm thinking at least stripping leading+trailing spaces is
something we should be doing, and spaces in the middle of the
Message-ID need to be preserved.

But, maybe non-printable control characters can also be filtered
out entirely, since I've definitely seen those in headers when
they don't belong. I suspect those were introduced by hardware
errors or software bugs.

Anyways, my head hurts :<

(*) svn co https://inn.eyrie.org/svn/trunk innd,
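
The normalization proposed above (strip leading and trailing
spaces, keep interior spaces, drop control characters) could be
sketched in Python roughly as follows. The clean_mid name is
hypothetical, and two choices here are assumptions rather than
anything decided in this message: the input is taken to be the
text between `<' and `>', and TAB is treated as a control
character to remove.

```python
import re

# C0 control characters (including TAB, CR, LF) plus DEL.
# Removing TAB along with the others is an assumption of this sketch.
_CTRL = re.compile(r"[\x00-\x1f\x7f]")

def clean_mid(raw: str) -> str:
    """Normalize the text found between '<' and '>' of a Message-ID.

    Hypothetical helper: removes control characters everywhere,
    then strips leading and trailing spaces. Spaces in the middle
    are preserved.
    """
    return _CTRL.sub("", raw).strip(" ")

# Trailing space is stripped, the interior space survives.
example = clean_mid("200203040330.g243URr05337@3 (NXDOMAIN) ")

# Stray control characters are dropped before stripping.
noisy = clean_mid(" foo\x00bar@example \r\n")
```

Control characters are removed before stripping so that a stray
byte at either end does not shield the spaces next to it.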