From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Automatic (e)tags generation and incremental updates
Date: Wed, 13 Jan 2021 17:58:16 +0200
Message-ID: <834kjkde1z.fsf@gnu.org>
To: Dmitry Gutov
Cc: philipk@posteo.net, tom@tromey.com, emacs-devel@gnu.org, john@yates-sheets.org
In-Reply-To: (message from Dmitry Gutov on Wed, 13 Jan 2021 17:52:16 +0200)

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov
> Date: Wed, 13 Jan 2021 17:52:16 +0200
>
> > Almost all the identifiers are ASCII, right?  So maybe optimize 99.9%
> > of use cases by storing such tags tables in a unibyte buffer, read
> > with insert-file-contents-literally?
>
> All right, and that option is probably handled well enough already by
> the user choosing (l) in the prompt when the tags file is very big.

Yes, but my idea was to do that automatically.  After all, the size
threshold beyond which we prompt the user is customizable, so it could
be very large.

> > As for why utf-8-emacs didn't help: I'm not really sure why Stefan
> > thought it will.  I mean, look at the code: it still encodes, just
> > differently.
>
> My (apparently faulty) intuition was that if utf-8-emacs is the memory
> representation of buffer text, converting it into that encoding can be
> faster because it could be done by copying from memory rather than
> having to do the work of recoding every character.

We don't recode characters when they are valid UTF-8 sequences, but you forget the raw bytes: they are converted from internal multibyte representation to single bytes, and that requires walking the buffer one character at a time. IOW, utf-8-emacs is the same as utf-8 for this purpose.
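
To make the point about raw bytes concrete, here is a minimal Python model of the conversion. It is not Emacs code; it is a sketch under the assumption that the internal representation stores each raw byte 0x80..0xFF as a two-byte sequence with lead byte 0xC0 or 0xC1, and that everything else is already valid UTF-8. The name `encode_internal` is hypothetical and exists only for this illustration.

```python
# Illustrative model only; this is not Emacs source code.
# Assumption: raw bytes 0x80..0xFF are stored internally as two bytes,
# with a lead byte of 0xC0 or 0xC1 (never a valid UTF-8 lead byte).
RAW_BYTE_LEADS = (0xC0, 0xC1)


def encode_internal(buf: bytes) -> bytes:
    """Convert a model of internal multibyte text to external bytes."""
    if buf.isascii():
        # Pure ASCII: internal and external forms are identical, so a
        # plain copy is enough (the unibyte / literal-read fast path).
        return bytes(buf)

    out = bytearray()
    i = 0
    while i < len(buf):
        b = buf[i]
        if b in RAW_BYTE_LEADS and i + 1 < len(buf):
            # Two-byte internal form of a raw byte: collapse to one byte.
            out.append(0x80 | ((b & 0x01) << 6) | (buf[i + 1] & 0x3F))
            i += 2
        else:
            # Valid UTF-8 is passed through unchanged, but we still had
            # to look at this byte to know it was not a raw-byte lead.
            out.append(b)
            i += 1
    return bytes(out)
```

In this model the output can be shorter than the input whenever a raw byte is present, which is why a block copy is not possible in general: the scan is needed just to find out whether any raw bytes exist. Only the all-ASCII case avoids the walk, which is the case the unibyte suggestion targets.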