From: Stefan Kangas
Subject: bug#59341: 29.0.50; Lisp files with other encoding than UTF-8?
Date: Thu, 17 Nov 2022 20:14:09 -0800
To: Eli Zaretskii
Cc: 59341@debbugs.gnu.org
In-Reply-To: <83iljdcq77.fsf@gnu.org>
References: <83iljdcq77.fsf@gnu.org>

Eli Zaretskii writes:

> No. AFAIR, they are in utf-8-emacs because they include characters
> beyond the Unicode range, which UTF-8 cannot encode. See, for
> example, the codepoints that start around line 645 in ind-util.el,
> which are used for converting between IS 13194 (ISCII) and Unicode.

I see, thanks.

Do we need these characters to be raw bytes in the source code though?
I was thinking of a change similar to the below, which would
incidentally make it a bit easier to read the code.

diff --git a/lisp/language/ind-util.el b/lisp/language/ind-util.el
index e2a21820f4..16161319ef 100644
--- a/lisp/language/ind-util.el
+++ b/lisp/language/ind-util.el
@@ -644,9 +644,9 @@ indian-dev-aiba-decode-region
   ;;Unicode vs IS13194 ;; only Devanagari is supported now.
   ((ucs-devanagari-to-is13194-alist
     '((?\x0900 . "[U+0900]")
-      (?\x0901 . " ")
-      (?\x0902 . " ")
-      (?\x0903 . " ")
+      (?\x0901 . "?\x180000")
+      (?\x0902 . "?\x180001")
+      (?\x0903 . "?\x180002")
       (?\x0904 . "[U+0904]")

[and so on]

This change would also avoid confusing external tools. For example,
the code is completely unreadable in many external viewers, such as:

https://github.com/emacs-mirror/emacs/blob/master/lisp/language/ind-util.el#L647
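
To make the escape idea concrete, here is a minimal Emacs Lisp sketch that is not part of the proposed patch. It assumes only the standard functions `characterp`, `max-char`, `string`, `equal` and `length`, and uses #x180000 purely because that code point appears in the diff. It suggests that an ASCII-only hex escape in a string literal denotes the same character that would otherwise have to be stored literally in the file. Note that, as far as I can tell, `?` is an ordinary character inside a string literal, so the string forms here omit it.

```elisp
;; Illustrative sketch only, not part of the proposed patch.
;; Evaluate the forms one at a time, e.g. with C-x C-e in *scratch*.

;; Emacs characters extend past the Unicode maximum, #x10FFFF.
(characterp #x180000)                 ; non-nil
(> (max-char) #x10FFFF)               ; non-nil

;; In a string literal, a hex escape denotes the same character that
;; would otherwise have to be stored literally in the file.
(equal "\x180000" (string #x180000))  ; non-nil

;; `?' is an ordinary character inside a string literal.
(length "\x180000")                   ; 1
(length "?\x180000")                  ; 2
```

If the last two forms behave as the comments say, the replacement strings in a change like the one above would probably need to be written as "\x180000" and so on, without the leading `?`, to keep each string at one character.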