From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: Automatic (e)tags generation and incremental updates
Date: Sat, 16 Jan 2021 05:57:21 +0200
Message-ID: <8581b496-2093-42de-4e9d-deff8d4c9465@yandex.ru>
To: Eli Zaretskii
Cc: philipk@posteo.net, tom@tromey.com, emacs-devel@gnu.org, john@yates-sheets.org
In-Reply-To: <834kjkde1z.fsf@gnu.org>

On 13.01.2021 17:58, Eli Zaretskii wrote:
>>> Almost all the identifiers are ASCII, right? So maybe optimize 99.9%
>>> of use cases by storing such tags tables in a unibyte buffer, read
>>> with insert-file-contents-literally?
>>
>> All right, and that option is probably handled well enough already by
>> the user choosing (l) in the prompt when the tags file is very big.
>
> Yes, but my idea was to do that automatically. After all, the size
> threshold beyond which we prompt the user is customizable, so it could
> be very large.

Even so, this mode of operation removes a feature.
How frequently it's used, I have no idea, but it's better to have full
functionality by default. There must be a reason why all those languages
added support for Unicode chars in identifiers.

For the time being, I just disabled synchronization to disk, given that
we don't yet know how to refresh an existing file anyway.

>> My (apparently faulty) intuition was that if utf-8-emacs is the memory
>> representation of buffer text, converting it into that encoding can be
>> faster because it could be done by copying from memory rather than
>> having to do the work of recoding every character.
>
> We don't recode characters when they are valid UTF-8 sequences, but
> you forget the raw bytes: they are converted from internal multibyte
> representation to single bytes, and that requires walking the buffer
> one character at a time.
>
> IOW, utf-8-emacs is the same as utf-8 for this purpose.

So utf-8-emacs is not the same as "internal multibyte representation"?
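Eli's point about raw bytes can be illustrated with a simplified Python model. It assumes the scheme described for Emacs multibyte text: a raw byte B in 0x80..0xFF is the Emacs character 0x3FFF80 + (B - 0x80) and is stored in a multibyte buffer as the two bytes 0xC0|((B >> 6) & 1), 0x80|(B & 0x3F), while encoding with utf-8 (or utf-8-emacs, per Eli) writes it back as a single byte. The helper names `to_internal` and `encode_utf8` are hypothetical, characters above U+10FFFF other than raw bytes are not modeled, and this is only a sketch, not how Emacs's C code is organized.

```python
# Simplified model of why encoding a multibyte buffer is not a plain memory
# copy once raw bytes are present.  Assumption: raw byte B (0x80..0xFF) is
# stored internally as 0xC0|((B >> 6) & 1), 0x80|(B & 0x3F).  Only Unicode
# characters and raw bytes are modeled here.

RAW_BYTE_FIRST = 0x3FFF80  # Emacs character code for raw byte 0x80
RAW_BYTE_LAST = 0x3FFFFF   # Emacs character code for raw byte 0xFF


def to_internal(chars):
    """Hypothetical helper: build the multibyte buffer bytes for CHARS."""
    out = bytearray()
    for c in chars:
        if RAW_BYTE_FIRST <= c <= RAW_BYTE_LAST:
            b = c - RAW_BYTE_FIRST + 0x80
            out += bytes([0xC0 | ((b >> 6) & 1), 0x80 | (b & 0x3F)])
        else:
            out += chr(c).encode("utf-8")  # valid UTF-8, stored as is
    return bytes(out)


def encode_utf8(buf):
    """Hypothetical helper: encode internal bytes BUF for writing to disk.

    Valid UTF-8 sequences are copied unchanged, but every raw byte has to
    be found and collapsed from two bytes to one, so the loop must walk
    the buffer one character at a time.
    """
    out = bytearray()
    i = 0
    while i < len(buf):
        lead = buf[i]
        if lead in (0xC0, 0xC1):  # never a lead byte in valid UTF-8
            out.append(0x80 | ((lead & 1) << 6) | (buf[i + 1] & 0x3F))
            i += 2
            continue
        if lead < 0x80:
            n = 1
        elif lead < 0xE0:
            n = 2
        elif lead < 0xF0:
            n = 3
        else:
            n = 4
        out += buf[i:i + n]
        i += n
    return bytes(out)


text = [ord("a"), ord("é"), RAW_BYTE_FIRST + 0x7F]  # 'a', 'é', raw byte 0xFF
internal = to_internal(text)
print(internal)               # b'a\xc3\xa9\xc1\xbf'
print(encode_utf8(internal))  # b'a\xc3\xa9\xff'
```

In this model the encoded output is one byte shorter than the internal form, so even when nearly all of the text is valid UTF-8, the encoder cannot simply copy the buffer's memory.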