From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24425: [PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Date: Thu, 15 Sep 2016 21:55:20 +0300
Message-ID: <83twdh56xz.fsf@gnu.org>
References: <1473720367-2807-1-git-send-email-mina86@mina86.com> <83mvjb98f5.fsf@gnu.org>
To: Michal Nazarewicz
Cc: 24425@debbugs.gnu.org
In-reply-to: (message from Michal Nazarewicz on Thu, 15 Sep 2016 16:23:54 +0200)

> From: Michal Nazarewicz
> Cc: 24425@debbugs.gnu.org
> Date: Thu, 15 Sep 2016 16:23:54 +0200
>
> On Tue, Sep 13 2016, Eli Zaretskii wrote:
> > Currently, case changes in unibyte characters and strings are only
> > well defined for pure ASCII text; if the input or the result is not
> > pure ASCII, we produce "undefined behavior".
>
> Would the following (not tested) make sense then:

AFAIU, it would disallow handling unibyte text by setting up case
tables for 8-bit characters in their multibyte representation,
i.e. above #x3FFF00.  I'd rather not lose that, although I don't think
I've ever seen that used.

> > Properly means that upcasing "istanbul" in the above example will
> > produce "İSTANBUL", not "iSTANBUL", and downcasing "IRMA" will produce
> > "ırma".
>
> I thought about that but then another corner case is "istanbul\xff"
> which is a unibyte string with 8-bit bytes.

And what is the problem in that case?

> I have no strong feelings either way so I’m happy just leaving it as is
> as well.

That is fine with me.

Was there some real-life use case where you bumped into this?  If so,
maybe we should discuss that use case, perhaps the solution, if we
need one, is something other than what we talked about until now.
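
For readers unfamiliar with the "multibyte representation" of 8-bit
characters mentioned above: Emacs stores a raw byte in the range
#x80..#xFF as a character code equal to the byte value plus #x3FFF00,
so these characters occupy #x3FFF80..#x3FFFFF.  Below is a minimal
Python sketch of that arithmetic only.  The function names are
hypothetical, chosen for illustration; this models the mapping and is
not Emacs code.

```python
# Model of how Emacs maps raw 8-bit bytes into its character code space.
# All names here are illustrative assumptions, not Emacs APIs.

BYTE8_OFFSET = 0x3FFF00   # raw byte b (0x80..0xFF) is stored as b + 0x3FFF00
FIRST_BYTE8_CHAR = 0x3FFF80
MAX_CHAR = 0x3FFFFF       # largest character code; stands for raw byte 0xFF


def byte8_to_char(byte: int) -> int:
    """Return the character code that stands for a raw 8-bit byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF have a raw-byte character")
    return byte + BYTE8_OFFSET


def char_is_byte8(char: int) -> bool:
    """Return True if the character code stands for a raw 8-bit byte."""
    return FIRST_BYTE8_CHAR <= char <= MAX_CHAR


def char_to_byte8(char: int) -> int:
    """Return the raw byte that a raw-byte character code stands for."""
    if not char_is_byte8(char):
        raise ValueError("not a raw-byte character")
    return char - BYTE8_OFFSET


if __name__ == "__main__":
    # The trailing byte of "istanbul\xff" from the discussion above.
    print(hex(byte8_to_char(0xFF)))  # 0x3fffff
```

A case table keyed on codes in that range is what would let unibyte
text with 8-bit bytes be case-converted, which is the capability the
message says it would rather not lose.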