From: Ineiev
Newsgroups: gmane.comp.tex.texinfo.bugs,gmane.emacs.devel
Subject: Re: texi2html output validity
Date: Thu, 25 Dec 2014 11:58:58 -0500
Message-ID: <20141225165857.GB7982@gnu.org>
In-Reply-To: <87sig34t7j.fsf@uwakimon.sk.tsukuba.ac.jp>
To: "Stephen J. Turnbull"
Cc: Yuri Khan, bug-texinfo@gnu.org, Emacs developers

On Fri, Dec 26, 2014 at 12:58:24AM +0900, Stephen J. Turnbull wrote:
> Ineiev writes:
>  > On Wed, Dec 24, 2014 at 12:27:25PM +0900, Stephen J. Turnbull wrote:
>  > > AFAIK the encoding declaration is optional, defaulting to UTF-8. In
>  > > that case, we can (and IMHO *should*, but I am no longer an expert on
>  > > current encoding practice) require that our software generate UTF-8
>  > > and omit the declaration. Non-UTF-8 should be invalid in Info-HTML.
>  >
>  > The fact is that some users have ASCII-incompatible default
>  > encodings (like UTF-16). if we add the declaration, it costs little,
>  > but the pages just work for them.
>
> AFAIK, default encodings are not a problem.

GNU webmasters did receive reports from such visitors. I'm sure
many cases were not reported.

> If Info-HTML is specified
> to be served as XML (which has its own issues, but that's one way to
> do it) then conformant browsers RFC2119-MUST assume Unicode as the
> coded character set, and will automatically determine the
> transformation format (UTF-8, UTF-16, UTF-16-little-endian) by
> checking the first two octets.
> I believe HTML5 also specifies UTF-8
> as the default.

I don't think HTML5 requirements are relevant because the browser
may not realize that it's HTML5 rather than HTML4 (and if we use ,
we have few options but to produce HTML4, anyway), and for HTML4,
it obeys the user's bogus settings.

Of course there may be ways to specify the encoding other than
the explicit declaration; I just believe that the explicit declaration
works reliably, and I'm not sure about other means.

> Alternatively, for such systems it's trivial to generate UTF-16 from
> UTF-8.

I don't think I understand this. Do you suggest that webmasters
provide two versions of pages for the users to select them manually,
or do you say that the users should convert the pages themselves?
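
For illustration only, the difference between a declared and an
undeclared page can be sketched in Python. This is a deliberately
simplified model and not the algorithm of any real browser: the
function name guess_encoding, the regular expression, the 1024-byte
scan window and the sample page are all assumptions made up for this
example. It checks for a byte order mark, then for a charset
declaration near the start of the page, and only then falls back to
the user's configured default.

```python
import codecs
import re

# Hypothetical, simplified matcher for a charset declaration inside a
# <meta> element; real browsers use a far more careful prescan.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
                          re.IGNORECASE)


def guess_encoding(data: bytes, user_default: str) -> str:
    """Pick an encoding: BOM first, then declaration, then user default."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing in the page says otherwise, so the user's setting wins.
    return user_default


# Sample page content (made up for this example).
body = "<p>caf\u00e9</p>"
meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

declared = (meta + body).encode("utf-8")
undeclared = body.encode("utf-8")

# A visitor whose default encoding is ASCII-incompatible.
print(guess_encoding(declared, "utf-16"))    # utf-8
print(guess_encoding(undeclared, "utf-16"))  # utf-16

# With the wrong guess, the UTF-8 bytes decode to unrelated characters.
print(undeclared.decode("utf-16", errors="replace") == body)  # False
```

In this model the declared page is read as UTF-8 whatever the user's
default is, while the undeclared page is at the mercy of that default.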