From: Mattias Engdegård
Newsgroups: gmane.emacs.devel
To: emacs-devel
Subject: Use the Unicode replacement character for replacing unencodable characters into UTF-16
Date: Tue, 18 Aug 2020 17:36:10 +0200
Message-ID: <7399CD85-E381-4BE6-81D6-10AA9FC56685@acm.org>

The attached patch makes sure that non-Unicode characters are replaced with U+FFFD REPLACEMENT CHARACTER instead of a space when converting to UTF-16. (The space is from all evidence a historical accident.)

This change is required for one possible solution of bug#42904. We can do without this patch, but it fixes a clear bug.

For some reason, unpaired surrogates aren't affected despite not being encodable in UTF-16 -- another bug, but not one addressed here.

[Attachment: 0001-Use-Unicode-replacement-character-for-unencodable-UT.patch]

From 28764d55bf06f2b81a33ea03258ba62b9c02a6b9 Mon Sep 17 00:00:00 2001
From: Mattias Engdegård
Date: Tue, 18 Aug 2020 17:00:15 +0200
Subject: [PATCH] Use Unicode replacement character for unencodable UTF-16
 characters

Use the standard U+FFFD REPLACEMENT CHARACTER instead of a space to
replace characters that cannot be encoded in UTF-16.

* lisp/international/mule-conf.el (utf-16le, utf-16be)
(utf-16le-with-signature, utf-16be-with-signature, utf-16):
Use U+FFFD as :default-char.
* test/src/coding-tests.el (coding-utf-16-replacement-char): New test.
---
 lisp/international/mule-conf.el |  5 +++++
 test/src/coding-tests.el        | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index edda79ba4e..b9acafc158 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1336,6 +1336,7 @@ 'utf-16le
   :mnemonic ?U
   :charset-list '(unicode)
   :endian 'little
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16le)
 
@@ -1345,6 +1346,7 @@ 'utf-16be
   :mnemonic ?U
   :charset-list '(unicode)
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16be)
 
@@ -1355,6 +1357,7 @@ 'utf-16le-with-signature
   :charset-list '(unicode)
   :bom t
   :endian 'little
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
@@ -1365,6 +1368,7 @@ 'utf-16be-with-signature
   :charset-list '(unicode)
   :bom t
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
@@ -1375,6 +1379,7 @@ 'utf-16
   :charset-list '(unicode)
   :bom '(utf-16le-with-signature . utf-16be-with-signature)
   :endian 'big
+  :default-char #xfffd
   :mime-text-unsuitable t
   :mime-charset 'utf-16)
 
diff --git a/test/src/coding-tests.el b/test/src/coding-tests.el
index c438ae22ce..8b0adf0ad8 100644
--- a/test/src/coding-tests.el
+++ b/test/src/coding-tests.el
@@ -429,6 +429,18 @@ coding-check-coding-systems-region
                  '((iso-latin-1 3) (us-ascii 1 3))))
   (should-error (check-coding-systems-region "å" nil '(bad-coding-system))))
 
+(ert-deftest coding-utf-16-replacement-char ()
+  (should (equal (encode-coding-string "A\351B" 'utf-16be)
+                 (unibyte-string 0 ?A #xff #xfd 0 ?B)))
+  (should (equal (encode-coding-string "A\351B" 'utf-16le)
+                 (unibyte-string ?A 0 #xfd #xff ?B 0)))
+  (should (equal (encode-coding-string "A\ud8b6BΣ\227D𝄞" 'utf-16be)
+                 (unibyte-string 0 ?A #xd8 #xb6 0 ?B #x03 #xa3 #xff #xfd 0 ?D
+                                 #xd8 #x34 #xdd #x1e)))
+  (should (equal (encode-coding-string "A\ud8b6BΣ\227D𝄞" 'utf-16le)
+                 (unibyte-string ?A 0 #xb6 #xd8 ?B 0 #xa3 #x03 #xfd #xff ?D 0
+                                 #x34 #xd8 #x1e #xdd))))
+
 ;; Local Variables:
 ;; byte-compile-warnings: (not obsolete)
 ;; End:
-- 
2.21.1 (Apple Git-122.3)