From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: JSON parsing of Extended ASCII
Date: Tue, 10 May 2022 20:30:48 +0300
Message-ID: <83fslhs3av.fsf@gnu.org>
References: <4763d841-54f0-d146-d968-beb7988431ff@t-online.de>
In-Reply-To: <4763d841-54f0-d146-d968-beb7988431ff@t-online.de> (message from Felix Weilbach on Tue, 10 May 2022 19:18:37 +0200)
To: Felix Weilbach
Cc: emacs-devel@gnu.org

> Date: Tue, 10 May 2022 19:18:37 +0200
> From: Felix Weilbach
>
> I'm working on source code that has Extended ASCII characters inside
> comments in it.

What do you mean by "Extended ASCII"?  Is this some encoding different
from UTF-8, like Latin-1 or Windows codepage 1251?  Or is it something
else?

> This is unfortunate because LSP-mode crashes if it
> encounters an invalid UTF-8 string. I know this mailing list is not
> about LSP-mode, but I could track down the issue to the function
> json_stringn() that gets called in lisp_to_json() inside json.c. Visual
> Studio Code can cope with these files fine and their LSP implementation
> has no problem with these files. Therefore I think Emacs should handle
> this situation as well. I want to find a solution to this problem and
> implement it. What do you think about this problem? Have you ideas for a
> solution? Should a potential solution be implemented inside Emacs or
> LSP-mode?
>
> I think there could be implemented a function inside Emacs that tries to
> convert Extended ASCII to UTF-8 and then encodes it to JSON.

Strings held inside Emacs are stored in (a superset of) UTF-8 anyway,
so this problem should not exist for you.  IOW, the conversion to
UTF-8 is done automatically when a non-ASCII string is read into
Emacs.  So I think you should provide more details about the specifics
of your use case, because it is likely that the problem is not where
you think it is.

In any case, Emacs and json.c require UTF-8 because that's what
libjansson, the library we use for JSON parsing, enforces.  Passing
non-UTF-8 strings to the library won't help, because it will not
accept them.
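
To illustrate the distinction outside Emacs, here is a minimal Python
sketch.  It is not Emacs code and does not use libjansson; the sample
bytes and the helper name is_valid_utf8 are made up for the example.
It only shows that Latin-1 bytes are not valid UTF-8 as they stand,
and that once they are decoded with the right encoding the resulting
text serializes to JSON without trouble, which is roughly the
situation described above.

```python
import json

# Hypothetical file content: a comment containing "é" saved as Latin-1,
# where that character is the single byte 0xE9.
raw = b"// caf\xe9\n"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if DATA can be decoded as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decode with the encoding the file was actually written in.
text = raw.decode("latin-1")

# The decoded text re-encodes to valid UTF-8 ("é" becomes two bytes).
utf8_bytes = text.encode("utf-8")

# A JSON payload built from the decoded text is valid UTF-8 as well.
payload = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
```

If the raw bytes reach the JSON layer without that decoding step, the
failure is in how the text was read, not in the JSON encoder.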