From: Mark H Weaver
Newsgroups: gmane.lisp.guile.user
Subject: Re: UTF16 encoding adds BOM everywhere?
Date: Wed, 20 Jul 2022 16:42:53 -0400
Message-ID: <87pmhz347r.fsf@netris.org>
References: <63f03f91-58d8-c247-2d2a-78848c2e5ca9@abou-samra.fr>
In-Reply-To: <63f03f91-58d8-c247-2d2a-78848c2e5ca9@abou-samra.fr>
To: Jean Abou Samra
Cc: guile-user@gnu.org

Hi,

Jean Abou Samra wrote:
> With this code:
>
> (let ((p (open-output-file "x.txt")))
>   (set-port-encoding! p "UTF16")
>   (display "ABC" p)
>   (close-port p))
>
> the sequence of bytes in the output file x.txt is
>
> ['FF', 'FE', '41', '0', 'FF', 'FE', '42', '0', 'FF', 'FE', '43', '0']
>
> FF FE is a little-endian Byte Order Mark (BOM), fine.
> But why is Guile adding it before every character
> instead of just at the beginning of the string?
> Is that expected?

No, this is certainly a bug.  It sounds like the
'at_stream_start_for_bom_write' port flag is not being cleared, as it
should be, after the first character is written.

I suspect that it worked correctly when I first implemented proper BOM
handling in 2013 (commit cdd3d6c9f423d5b95f05193fe3c27d50b56957e9), but
the ports code has seen some major reworking since then.  I guess that
BOM handling was broken somewhere along the way.

I would suggest filing a bug report.  I don't have time to look into
it, sorry.  I don't work on Guile anymore.  I only happened to see your
message by chance.

     Regards,
       Mark

-- 
Disinformation flourishes because many people care deeply about
injustice but very few check the facts.  Ask me about .
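
The diagnosis in the reply above (a start-of-stream flag that is never cleared after the first write) can be illustrated with a small toy model. This is only a sketch in Python, not Guile's ports code: the `ToyPort` class, the `clear_flag_after_write` switch, and the helpers `write_string` and `hex_bytes` are hypothetical names introduced for illustration. Only the flag name `at_stream_start_for_bom_write` comes from the message, and little-endian output is assumed to match the reported bytes.

```python
import codecs

# Little-endian UTF-16 byte order mark: the bytes FF FE.
BOM_LE = codecs.BOM_UTF16_LE


class ToyPort:
    """Toy model of an output port; NOT Guile's actual implementation."""

    def __init__(self, clear_flag_after_write=True):
        self.buffer = bytearray()
        # Named after the flag mentioned in the message: true until the
        # first character has been written.
        self.at_stream_start_for_bom_write = True
        # Hypothetical switch, used only to reproduce the suspected bug.
        self.clear_flag_after_write = clear_flag_after_write

    def write_char(self, ch):
        # Emit a BOM only while the port believes it is at stream start.
        if self.at_stream_start_for_bom_write:
            self.buffer += BOM_LE
            if self.clear_flag_after_write:
                self.at_stream_start_for_bom_write = False
        # "utf-16-le" encodes without adding a BOM of its own.
        self.buffer += ch.encode("utf-16-le")


def write_string(text, clear_flag_after_write=True):
    """Write text one character at a time and return the resulting bytes."""
    port = ToyPort(clear_flag_after_write)
    for ch in text:
        port.write_char(ch)
    return bytes(port.buffer)


def hex_bytes(data):
    """Format bytes the way the original report lists them."""
    return [format(b, "X") for b in data]


if __name__ == "__main__":
    print(hex_bytes(write_string("ABC", clear_flag_after_write=False)))
    print(hex_bytes(write_string("ABC")))
```

When the flag is never cleared, the model emits a BOM before every character, which matches the byte sequence quoted in the report. When the flag is cleared after the first write, a single BOM appears at the start of the stream. Whether this is what actually happens inside Guile would still need to be confirmed against the ports code.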