From: Mike Gran
Newsgroups: gmane.lisp.guile.devel
Subject: Re: UTF-16 and (ice-9 rdelim)
Date: Mon, 18 Jan 2010 13:29:19 -0800 (PST)
Message-ID: <544634.61469.qm@web37903.mail.mud.yahoo.com>
References: <871vho9wk2.fsf@ossau.uklinux.net> <442751.82517.qm@web37908.mail.mud.yahoo.com> <87k4vff9xw.fsf@ossau.uklinux.net>
In-Reply-To: <87k4vff9xw.fsf@ossau.uklinux.net>
To: Neil Jerram
Cc: Guile Development
List-Id: "Developers list for Guile, the GNU extensibility library"

> From: Neil Jerram
> Hi Mike,

> > But, if you just want to get rid of a BOM, you can cut it down to
> > a rule.  If the first code point that a port reads is U+FEFF and if the
> > encoding has the string "utf" in it, ignore it.  If the first code point
> > is U+FFFE and the encoding has "utf" in it, flag an error.
>
> Agreed.
>
> Out of interest, does that mean that iconv will auto-detect the
> endianness if the encoding does not explicitly say "le" or "be"?

The Unicode FAQ from unicode.org says that "the unmarked form (UTF-16, UTF-32)
uses big-endian byte serialization by default, but may include a byte order
mark at the beginning to indicate the actual byte serialization used."  So,
I guess the strictly correct thing to do for UTF-16 would be to

* check for a BOM
* if it exists
  * if it is U+FFFE, modify the port encoding to UTF-16-LE
  * if it is U+FEFF, leave the port encoding as UTF-16
  * discard the BOM
* else, leave the port encoding as UTF-16

and similarly for UTF-32.

Thanks,
- Mike
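
For illustration, the UTF-16 steps listed above could be sketched roughly as follows. This is Python rather than Guile's port code, and the names `strip_utf16_bom` and `decode_utf16` are hypothetical helpers invented for this sketch, not part of Guile or iconv. It assumes the first 16-bit unit is read in big-endian order, the default the Unicode FAQ gives for the unmarked form.

```python
from typing import Tuple


def strip_utf16_bom(data: bytes) -> Tuple[str, bytes]:
    """Return (encoding, payload) for a byte stream declared as plain UTF-16.

    The first 16-bit unit is interpreted as big-endian, which is the
    default for the unmarked form. Hypothetical helper, not a Guile API.
    """
    if len(data) >= 2:
        first_unit = int.from_bytes(data[:2], "big")
        if first_unit == 0xFFFE:
            # Byte-swapped BOM: the stream is little-endian. Switch the
            # encoding and discard the BOM.
            return "UTF-16-LE", data[2:]
        if first_unit == 0xFEFF:
            # BOM in the default byte order: keep UTF-16, discard the BOM.
            return "UTF-16", data[2:]
    # No BOM: leave the encoding as UTF-16 and keep every byte.
    return "UTF-16", data


def decode_utf16(data: bytes) -> str:
    """Decode using the rule above, treating unmarked UTF-16 as big-endian."""
    encoding, payload = strip_utf16_bom(data)
    # Python's own "utf-16" codec falls back to native byte order when no
    # BOM is present, so the byte order is made explicit here.
    codec = "utf-16-le" if encoding == "UTF-16-LE" else "utf-16-be"
    return payload.decode(codec)
```

With this sketch, `decode_utf16(b"\xff\xfeh\x00i\x00")` and `decode_utf16(b"\xfe\xff\x00h\x00i")` both yield the string "hi". The UTF-32 case would follow the same pattern with a four-byte BOM.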