From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: [elpa] 02/04: company-clang: handle multibyte chars between bol and point
Date: Fri, 21 Mar 2014 05:47:11 +0200
Message-ID: <532BB63F.2070509@yandex.ru>
References: <20140319033013.17542.14344@vcs.savannah.gnu.org>
 <87mwgm9t81.fsf@yandex.ru> <834n2u9lj7.fsf@gnu.org>
 <5329DA52.2030704@yandex.ru> <83vbva82cy.fsf@gnu.org>
 <532A08FF.8020001@yandex.ru> <87ior9pohp.fsf@yandex.ru>
 <83k3bp8qrz.fsf@gnu.org> <532A6A21.8040802@yandex.ru>
 <83fvmc97ff.fsf@gnu.org>
In-Reply-To: <83fvmc97ff.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: Eli Zaretskii
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:170680

On 20.03.2014 18:11, Eli Zaretskii wrote:

> I needed to look in their sources, but the information there isn't
> clear-cut, either (or maybe I didn't understand the code ;-). Some
> functions that convert file offsets to columns count bytes from the
> beginning of the line, others count characters, assuming a UTF-8
> encoding. But since you say the attempt to count characters in
> non-UTF-8 encoding failed, I guess clang needs byte counts of UTF-8
> encoding.

Yes. And from what I've read
(http://stackoverflow.com/a/8259610/615245), non-ANSI encoding support
was added piecewise, so maybe the relevant code still hasn't settled.

> In any case, please note that UTF-8 and the internal encoding used by
> Emacs are not exactly identical, so IMO you should encode into UTF-8
> and then use 'length' to compute the "column".

This makes sense. I don't think anyone's likely to encounter a source
file with characters that are encoded differently between utf-8 and
utf-8-emacs, but I guess the latter is unspecced, so it could change in
the future.
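
To make the byte-versus-character distinction concrete, here is a rough sketch in Python rather than Emacs Lisp. The helper names `clang_column` and `char_column` are hypothetical and are not part of company-clang or clang. The sketch assumes, as guessed above, that the column clang expects is the 1-based count of UTF-8 bytes from the beginning of the line.

```python
def char_column(line: str, char_offset: int) -> int:
    """1-based column counted in characters (the naive approach)."""
    return len(line[:char_offset]) + 1


def clang_column(line: str, char_offset: int) -> int:
    """1-based column counted in UTF-8 bytes.

    Assumption: this is the unit clang wants for a completion location.
    The text before point is encoded to UTF-8 and its byte length is used.
    """
    prefix = line[:char_offset]
    return len(prefix.encode("utf-8")) + 1


if __name__ == "__main__":
    # A line with three Cyrillic letters, each taking two bytes in UTF-8.
    line = 'char *s = "жук"; s.'
    point = len(line)  # point is at the end of the line
    print("character column:", char_column(line, point))
    print("UTF-8 byte column:", clang_column(line, point))
```

For pure-ASCII text the two functions agree; they diverge only once a multibyte character sits between the beginning of the line and point, which is the case the commit is about.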