From: Visuwesh
Newsgroups: gmane.emacs.devel
Subject: Re: Non-ASCII characters in man pages produced by groff 1.23
Date: Sun, 29 Oct 2023 22:24:32 +0530
Message-ID: <87edhd2wtz.fsf@gmail.com>
References: <9fb217d7-37d9-42e1-b3d2-227903d2d182@vodafonemail.de>
In-Reply-To: <9fb217d7-37d9-42e1-b3d2-227903d2d182@vodafonemail.de> (Jens Schmidt's message of "Sun, 29 Oct 2023 17:13:20 +0100")
To: Jens Schmidt
Cc: emacs-devel@gnu.org

[Sunday October 29, 2023] Jens Schmidt wrote:

> [...]
> To clarify what the problem is: If, for example, you search for
>
>   --compressed-ssh
>
> with
>
>   ?\N{HYPHEN-MINUS}?\N{HYPHEN-MINUS}compressed?\N{HYPHEN-MINUS}ssh
>
> in *Man curl* on Debian testing, you won't find that option, because
> curl's author hasn't yet properly quoted all minus characters in the
> generated man page source. As a result, they are rendered as
> ?\N{HYPHEN} in man's output and occur as such in the Man-mode buffer.
> (Well, that concrete example above got fixed already, but others are
> still left.)
>
>
> I have been bitten by that already, and not only me:
>
>   https://lists.debian.org/debian-devel/2023/10/msg00083.html
>
>
> So this is not an Emacs issue, it might get gradually better as man
> page authors improve their text, and it probably will go away for
> Debian when Debian freezes trixie. Which means this is just an FYI,
> and nothing which requires any action ... or what do you think?

Before I pulled the Debian changes that switched back to the ASCII
hyphen, I thought about making Man-softhyphen-to-minus change HYPHEN
to HYPHEN-MINUS, so that man page buffers would be searchable again.

A solution like char-fold, but one that works in every search command,
would be better (external tools included if possible, though that is a
big ask). As a first step, perhaps we could use char-fold-to-regexp in
read-regexp? For external tools, POSIX equivalence classes might work,
but from what I can tell from a cursory internet search, they are not
reliable.
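
To make the first idea concrete, here is a rough sketch of the
HYPHEN to HYPHEN-MINUS replacement. It is only an illustration: the
function name is made up, it is not part of man.el, and it is not
necessarily how Man-softhyphen-to-minus would be changed. It assumes
the Man buffer is current and read-only.

```elisp
;; Hypothetical helper for illustration only; not part of man.el.
(defun my-man-hyphen-to-hyphen-minus ()
  "Replace U+2010 HYPHEN with U+002D HYPHEN-MINUS in the current buffer."
  (let ((inhibit-read-only t))      ; allow edits in a read-only buffer
    (save-excursion
      (goto-char (point-min))
      ;; "\u2010" is HYPHEN; replace each occurrence with ASCII "-".
      (while (search-forward "\u2010" nil t)
        (replace-match "-" t t)))))
```

Evaluating (my-man-hyphen-to-hyphen-minus) with M-: in a *Man ...*
buffer should make a plain search for --compressed-ssh match again.
It changes the buffer text, though, so unlike a char-fold based
search it loses the distinction between the two characters.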