* Policty question - encoding to use in git repository?
@ 2014-02-17 15:29 Eric S. Raymond
  2014-02-17 15:41 ` Juanma Barranquero
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric S. Raymond @ 2014-02-17 15:29 UTC (permalink / raw)
  To: emacs-devel

While continuing to try to identify correct deletion points for attic
files, I have run across a minor problem: Latin-1 characters in
Changelog files.  I have seen two, c-cedilla and something I can't 
identify that renders as a backtick.  There may be more.  I can fix
them up.

I request a policy decision about what encoding the repository
content should use.  I see three reasonable choices:

* Leave Latin-1 in place.

* Transcode to UTF-8. (I favor this as the best long-term solution.)

* Transcode to ASCII approximations - easy in the two cases I've
  found so far.
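
A minimal Python sketch of the three choices above, applied to one sample line. The sample bytes and the use of NFKD decomposition for the ASCII approximation are assumptions made for illustration; they are not taken from any actual conversion script.

```python
import unicodedata

# Hypothetical sample: a ChangeLog line stored as Latin-1 bytes
# (0xE7 is c-cedilla in Latin-1).
raw = b"Thanks to Fran\xe7ois_Pinard for suggestions."

# Choice 1: leave Latin-1 in place (bytes unchanged).
keep_latin1 = raw

# Choice 2: transcode to UTF-8.
as_utf8 = raw.decode("latin-1").encode("utf-8")

# Choice 3: ASCII approximation. Decompose accented letters into a base
# letter plus combining marks, then drop everything that is not ASCII.
decomposed = unicodedata.normalize("NFKD", raw.decode("latin-1"))
as_ascii = decomposed.encode("ascii", "ignore")

print(keep_latin1)
print(as_utf8)
print(as_ascii)
```

Only the second choice is lossless and also yields valid UTF-8; the third discards the cedilla.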
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Nearly all men can stand adversity, but if you want to test a man's character,
give him power.
	-- Abraham Lincoln




* Re: Policty question - encoding to use in git repository?
  2014-02-17 15:29 Policty question - encoding to use in git repository? Eric S. Raymond
@ 2014-02-17 15:41 ` Juanma Barranquero
  2014-02-17 16:30   ` Eric S. Raymond
  2014-02-17 16:39 ` Karl Fogel
  2014-02-17 19:22 ` Ivan Kanis
  2 siblings, 1 reply; 17+ messages in thread
From: Juanma Barranquero @ 2014-02-17 15:41 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: Emacs developers

On Mon, Feb 17, 2014 at 4:29 PM, Eric S. Raymond <esr@thyrsus.com> wrote:

> I have seen two, c-cedilla and something I can't
> identify that renders as a backtick.  There may be more.  I can fix
> them up.

Can you please show examples of these?

   J




* Re: Policty question - encoding to use in git repository?
  2014-02-17 15:41 ` Juanma Barranquero
@ 2014-02-17 16:30   ` Eric S. Raymond
  2014-02-17 17:03     ` Andreas Schwab
  0 siblings, 1 reply; 17+ messages in thread
From: Eric S. Raymond @ 2014-02-17 16:30 UTC (permalink / raw)
  To: Juanma Barranquero; +Cc: Emacs developers

Juanma Barranquero <lekktu@gmail.com>:
> On Mon, Feb 17, 2014 at 4:29 PM, Eric S. Raymond <esr@thyrsus.com> wrote:
> 
> > I have seen two, c-cedilla and something I can't
> > identify that renders as a backtick.  There may be more.  I can fix
> > them up.
> 
> Can you please show examples of these?

Yes.  Excuse the odd formatting, I have processed the ChangeLogs into
a big Python initializer so I can write code to mine them for the
deletion points of attic files.

In the first one, it is the acute accent before [delete] (earlier I
confused it with a backtick).  Probably this should just be edited to an
ASCII backtick.

In the second one, the c-cedilla in Franc,ois Pinard's name.

There may be others; I have not yet checked all Changelogs for these,
though I will do so now.

("2000-07-07T14:15:55Z!gerd@gnu.org",
"lisp/ChangeLog",
"refs/tags/emacs-pretest-21.0.90",
r"""\
2000-07-07  Gerd Moellmann  <gerd@gnu.org>

	* bindings.el: Bind ´[delete]' to delete-char.

	* dired.el (dired-find-alternate-file): New function.
	(dired-mode-map): Bind `a' to dired-find-alternate-file.
	(toplevel): Require dired-aux when compiling.
	(dired-buffers): Move defvar within file to avoid compiler warning.

	* info.el (Info-last-search): Variable removed.
	(Info-search-history): New variable.
	(Info-search): New Info-search-history.

	* battery.el, info-look.el: Change author's mail address.

"""),

("2000-08-28T20:35:45Z!pbreton@attbi.com",
"lisp/ChangeLog",
"refs/tags/emacs-pretest-21.0.90",
r"""\
2000-08-28  Peter Breton  <pbreton@ne.mediaone.net>

	* locate.el (locate): Cleaned up locate command's interactive prompting
	Thanks to François_Pinard <pinard@iro.umontreal.ca> for suggestions.

	* filecache.el (file-cache-case-fold-search): New variable 
	(file-cache-assoc-function): New variable
	(file-cache-minibuffer-complete): Use file-cache-assoc-function.
	Use file-cache-case-fold-search variable
	(file-cache-add-file): Use file-cache-assoc-function
	(file-cache-delete-file): likewise
	(file-cache-directory-name): likewise
	(file-cache-debug-read-from-minibuffer): likewise

"""),
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




* Re: Policty question - encoding to use in git repository?
  2014-02-17 15:29 Policty question - encoding to use in git repository? Eric S. Raymond
  2014-02-17 15:41 ` Juanma Barranquero
@ 2014-02-17 16:39 ` Karl Fogel
  2014-02-17 16:52   ` Eli Zaretskii
  2014-02-17 19:22 ` Ivan Kanis
  2 siblings, 1 reply; 17+ messages in thread
From: Karl Fogel @ 2014-02-17 16:39 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: emacs-devel

esr@thyrsus.com (Eric S. Raymond) writes:
>While continuing to try to identify correct deletion points for attic
>files, I have run across a minor problem: Latin-1 characters in
>Changelog files.  I have seen two, c-cedilla and something I can't 
>identify that renders as a backtick.  There may be more.  I can fix
>them up.
>
>I request a policy decision about what encoding the repository
>content should use.  I see three reasonable choices:
>
>* Leave Latin-1 in place.
>
>* Transcode to UTF-8. (I favor this as the best long-term solution.)
>
>* Transcode to ASCII approximations - easy in the two cases I've
>  found so far.

+1 to UTF-8 !




* Re: Policty question - encoding to use in git repository?
  2014-02-17 16:39 ` Karl Fogel
@ 2014-02-17 16:52   ` Eli Zaretskii
  2014-02-17 17:01     ` Eric S. Raymond
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2014-02-17 16:52 UTC (permalink / raw)
  To: Karl Fogel; +Cc: esr, emacs-devel

> From: Karl Fogel <kfogel@red-bean.com>
> Date: Mon, 17 Feb 2014 10:39:39 -0600
> Cc: emacs-devel@gnu.org
> 
> esr@thyrsus.com (Eric S. Raymond) writes:
> >While continuing to try to identify correct deletion points for attic
> >files, I have run across a minor problem: Latin-1 characters in
> >Changelog files.  I have seen two, c-cedilla and something I can't 
> >identify that renders as a backtick.  There may be more.  I can fix
> >them up.
> >
> >I request a policy decision about what encoding the repository
> >content should use.  I see three reasonable choices:
> >
> >* Leave Latin-1 in place.
> >
> >* Transcode to UTF-8. (I favor this as the best long-term solution.)
> >
> >* Transcode to ASCII approximations - easy in the two cases I've
> >  found so far.
> 
> +1 to UTF-8 !

It's already UTF-8.  This is a non-issue.




* Re: Policty question - encoding to use in git repository?
  2014-02-17 16:52   ` Eli Zaretskii
@ 2014-02-17 17:01     ` Eric S. Raymond
  2014-02-17 17:04       ` Andreas Schwab
  2014-02-17 17:24       ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Eric S. Raymond @ 2014-02-17 17:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Karl Fogel, emacs-devel

Eli Zaretskii <eliz@gnu.org>:
> It's already UTF-8.  This is a non-issue.

Do you mean the policy is already to use UTF-8?  Or that you believe there
are no non-UTF-8 characters in the Changelogs?

Python doesn't think the latter is true when I try to interpret string
initializers mined from the Changelogs.
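
A sketch of the kind of check that surfaces the problem, assuming the file content is available as raw bytes. The function name and the sample data are hypothetical; this is not the mining code itself.

```python
def non_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that fail strict UTF-8 decoding."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical sample: one ASCII line, one Latin-1 line, one UTF-8 line.
sample = b"ASCII only\nFran\xe7ois (Latin-1)\nFran\xc3\xa7ois (UTF-8)\n"
print(non_utf8_lines(sample))
```

With this sample, only the second line is reported, because a lone 0xE7 byte is not a valid UTF-8 sequence.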
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




* Re: Policty question - encoding to use in git repository?
  2014-02-17 16:30   ` Eric S. Raymond
@ 2014-02-17 17:03     ` Andreas Schwab
  2014-02-17 17:12       ` Eric S. Raymond
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2014-02-17 17:03 UTC (permalink / raw)
  To: esr; +Cc: Juanma Barranquero, Emacs developers

"Eric S. Raymond" <esr@thyrsus.com> writes:

> ("2000-07-07T14:15:55Z!gerd@gnu.org",
> "lisp/ChangeLog",

The file was reencoded in
revid:lekktu@gmail.com-20080327114958-auavr50v7a90i6cw (git id c8ec82b).

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: Policty question - encoding to use in git repository?
  2014-02-17 17:01     ` Eric S. Raymond
@ 2014-02-17 17:04       ` Andreas Schwab
  2014-02-17 17:24       ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Andreas Schwab @ 2014-02-17 17:04 UTC (permalink / raw)
  To: esr; +Cc: Karl Fogel, Eli Zaretskii, emacs-devel

"Eric S. Raymond" <esr@thyrsus.com> writes:

> Python doesn't think the latter is true when I try to interpret string
> initializers mined from the Changelogs.

The file featured a couple of encodings in its history.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: Policty question - encoding to use in git repository?
  2014-02-17 17:03     ` Andreas Schwab
@ 2014-02-17 17:12       ` Eric S. Raymond
  0 siblings, 0 replies; 17+ messages in thread
From: Eric S. Raymond @ 2014-02-17 17:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Juanma Barranquero, Emacs developers

Andreas Schwab <schwab@suse.de>:
> "Eric S. Raymond" <esr@thyrsus.com> writes:
> 
> > ("2000-07-07T14:15:55Z!gerd@gnu.org",
> > "lisp/ChangeLog",
> 
> The file was reencoded in
> revid:lekktu@gmail.com-20080327114958-auavr50v7a90i6cw (git id c8ec82b).

OK, that being the case my plan is not to touch the earlier versions.  
That is, unless someone with decision authority tells me differently.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




* Re: Policty question - encoding to use in git repository?
  2014-02-17 17:01     ` Eric S. Raymond
  2014-02-17 17:04       ` Andreas Schwab
@ 2014-02-17 17:24       ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2014-02-17 17:24 UTC (permalink / raw)
  To: esr; +Cc: kfogel, emacs-devel

> Date: Mon, 17 Feb 2014 12:01:28 -0500
> From: "Eric S. Raymond" <esr@thyrsus.com>
> Cc: Karl Fogel <kfogel@red-bean.com>, emacs-devel@gnu.org
> 
> Eli Zaretskii <eliz@gnu.org>:
> > It's already UTF-8.  This is a non-issue.
> 
> Do you mean the policy is already to use UTF-8?  Or that you believe there
> are no non-UTF-8 characters in the Changelogs?

Both.  Each ChangeLog file has this file-local variable at the end:

  ;; Local Variables:
  ;; coding: utf-8
  ;; End:

> Python doesn't think the latter is true when I try to interpret string
> initializers mined from the Changelogs.

Then something is probably wrong with the mining process, because I
look in the file ChangeLog.9 and see a 2-byte UTF-8 sequence \303\247
there for the Latin-1 character ç in "François_Pinard".
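
The byte values can be confirmed with a few lines of Python. This is only an illustrative check of the two encodings of the character, not a test against the actual ChangeLog.9 file.

```python
ch = "\u00e7"  # c-cedilla

utf8 = ch.encode("utf-8")      # two bytes in UTF-8
latin1 = ch.encode("latin-1")  # one byte in Latin-1

# Show each UTF-8 byte in octal, the notation used in the message above.
print([oct(b) for b in utf8])
print(latin1)
```

The octal values are 303 and 247 for UTF-8, against the single byte 0xE7 (octal 347) for Latin-1.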





* Re: Policty question - encoding to use in git repository?
  2014-02-17 15:29 Policty question - encoding to use in git repository? Eric S. Raymond
  2014-02-17 15:41 ` Juanma Barranquero
  2014-02-17 16:39 ` Karl Fogel
@ 2014-02-17 19:22 ` Ivan Kanis
  2014-02-17 21:47   ` Paul Eggert
  2 siblings, 1 reply; 17+ messages in thread
From: Ivan Kanis @ 2014-02-17 19:22 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: emacs-devel

February, 17 at 10:29 Eric S. Raymond wrote:

> While continuing to try to identify correct deletion points for attic
> files, I have run across a minor problem: Latin-1 characters in
> Changelog files.  I have seen two, c-cedilla and something I can't 
> identify that renders as a backtick.  There may be more.  I can fix
> them up.
>
> I request a policy decision about what encoding the repository
> content should use.  I see three reasonable choices:
>
> * Leave Latin-1 in place.
>
> * Transcode to UTF-8. (I favor this as the best long-term solution.)
>
> * Transcode to ASCII approximations - easy in the two cases I've
>   found so far.

Transcoding to UTF-8 would be best, I think.
-- 
A faith is something you die for; a doctrine is something you kill
for: there is all the difference in the world.
    -- Tony Benn




* Re: Policty question - encoding to use in git repository?
  2014-02-17 19:22 ` Ivan Kanis
@ 2014-02-17 21:47   ` Paul Eggert
  2014-02-17 22:08     ` Eric S. Raymond
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Paul Eggert @ 2014-02-17 21:47 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: emacs-devel

The closest thing we have to an encoding policy is given in the "Source 
file encoding" section of admin/notes/unicode.

The history is that there's been fitful recoding to UTF-8 over the 
years.  About the time I wrote "Source file encoding" (March 2013), I 
recoded several files.  Juanma recoded a bunch of ChangeLogs in March 
2008 -- these are the most-relevant to the issue of what should appear 
in the repository.  I assume there are other recodings as well; I 
haven't kept track.

While we're on the topic of normalization, is it the intent to normalize 
spelling of author and committer names in the repository?  E.g., replace 
"François_Pinard" with "François Pinard" (no underscore)?  Or replace 
"Richard M. Stallman" and "Richard M Stallman" with "Richard Stallman"? 
  How about email addresses?  Perhaps you've already addressed this 
point but if so I'm afraid I forgot what you wrote.




* Re: Policty question - encoding to use in git repository?
  2014-02-17 21:47   ` Paul Eggert
@ 2014-02-17 22:08     ` Eric S. Raymond
  2014-02-18  0:50     ` Glenn Morris
  2014-02-18  6:40     ` David Kastrup
  2 siblings, 0 replies; 17+ messages in thread
From: Eric S. Raymond @ 2014-02-17 22:08 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

Paul Eggert <eggert@cs.ucla.edu>:
> While we're on the topic of normalization, is it the intent to
> normalize spelling of author and committer names in the repository?
> E.g., replace "François_Pinard" with "François Pinard" (no
> underscore)?  Or replace "Richard M. Stallman" and "Richard M
> Stallman" with "Richard Stallman"?  How about email addresses?

That's the first request I've had for this sort of change.  It would
be easy enough to do.

All interested parties may email me normalization requests to be
incorporated into the lift script.  I would prefer to get them from
the owner of the name or email address in question.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




* Re: Policty question - encoding to use in git repository?
  2014-02-17 21:47   ` Paul Eggert
  2014-02-17 22:08     ` Eric S. Raymond
@ 2014-02-18  0:50     ` Glenn Morris
  2014-02-18  1:07       ` Stefan Monnier
  2014-02-18  6:40     ` David Kastrup
  2 siblings, 1 reply; 17+ messages in thread
From: Glenn Morris @ 2014-02-18  0:50 UTC (permalink / raw)
  To: emacs-devel

Paul Eggert wrote:

> While we're on the topic of normalization, is it the intent to
> normalize spelling of author and committer names in the repository?

I wondered when the revisionism would get to that stage...
Remember to fix all typos as well!
And don't forget to remove all trailing whitespace!
And use US spelling throughout!
And two spaces after all full stops!
And ...




* Re: Policty question - encoding to use in git repository?
  2014-02-18  0:50     ` Glenn Morris
@ 2014-02-18  1:07       ` Stefan Monnier
  2014-02-18 16:30         ` Karl Fogel
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Monnier @ 2014-02-18  1:07 UTC (permalink / raw)
  To: Glenn Morris; +Cc: emacs-devel

>> While we're on the topic of normalization, is it the intent to
>> normalize spelling of author and committer names in the repository?
> I wondered when the revisionism would get to that stage...
> Remember to fix all typos as well!
> And don't forget to remove all trailing whitespace!
> And use US spelling throughout!
> And two spaces after all full stops!
> And ...

Please Glenn, stop this nonsense!  Fixing those cosmetic issues is
a waste of time.  We should focus on fixing actual bugs.  Could someone
please help Eric collect a list of the various bug-fixes that should be
back-ported to older revisions?  If you could find to which revision it
should apply it's better, but if not, I'm confident Eric will find
a neat heuristic to add to his reposurgeon to automatically find the
right revision.


        Stefan




* Re: Policty question - encoding to use in git repository?
  2014-02-17 21:47   ` Paul Eggert
  2014-02-17 22:08     ` Eric S. Raymond
  2014-02-18  0:50     ` Glenn Morris
@ 2014-02-18  6:40     ` David Kastrup
  2 siblings, 0 replies; 17+ messages in thread
From: David Kastrup @ 2014-02-18  6:40 UTC (permalink / raw)
  To: emacs-devel

Paul Eggert <eggert@cs.ucla.edu> writes:

> While we're on the topic of normalization, is it the intent to
> normalize spelling of author and committer names in the repository?
> E.g., replace "François_Pinard" with "François Pinard" (no
> underscore)?  Or replace "Richard M. Stallman" and "Richard M
> Stallman" with "Richard Stallman"? How about email addresses?  Perhaps
> you've already addressed this point but if so I'm afraid I forgot what
> you wrote.

Normalization of names and mail addresses is done by putting all
respective mail addresses with proper names into a .mailmap file in the
top directory of the repository.  Git will consult this file for its
various operations requiring unification of names, like git shortlog.
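
To illustrate the idea only, here is a small Python sketch of a mailmap-style lookup. It handles just the simplest entry form, "Proper Name <commit-email>", and is not Git's implementation; the helper names and the sample entry are assumptions.

```python
import re

# Hypothetical .mailmap content in the simplest form: "Proper Name <commit-email>".
MAILMAP = """\
# canonical spelling for this address
François Pinard <pinard@iro.umontreal.ca>
"""

ENTRY = re.compile(r"^(?P<name>[^<]+?)\s*<(?P<email>[^>]+)>\s*$")


def load_mailmap(text):
    """Map each lower-cased email address to its canonical name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match:
            mapping[match.group("email").lower()] = match.group("name")
    return mapping


def canonical_name(name, email, mapping):
    """Return the canonical name for an email, or the given name if unmapped."""
    return mapping.get(email.lower(), name)


mapping = load_mailmap(MAILMAP)
print(canonical_name("François_Pinard", "pinard@iro.umontreal.ca", mapping))
```

The point of the approach is that history stays untouched: the spelling is unified at display time by a lookup keyed on the address.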

-- 
David Kastrup





* Re: Policty question - encoding to use in git repository?
  2014-02-18  1:07       ` Stefan Monnier
@ 2014-02-18 16:30         ` Karl Fogel
  0 siblings, 0 replies; 17+ messages in thread
From: Karl Fogel @ 2014-02-18 16:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> While we're on the topic of normalization, is it the intent to
>>> normalize spelling of author and committer names in the repository?
>> I wondered when the revisionism would get to that stage...
>> Remember to fix all typos as well!
>> And don't forget to remove all trailing whitespace!
>> And use US spelling throughout!
>> And two spaces after all full stops!
>> And ...
>
>Please Glenn, stop this nonsense!  Fixing those cosmetic issues is
>a waste of time.  We should focus on fixing actual bugs.  Could someone
>please help Eric collect a list of the various bug-fixes that should be
>back-ported to older revisions?  If you could find to which revision it
>should apply it's better, but if not, I'm confident Eric will find
>a neat heuristic to add to his reposurgeon to automatically find the
>right revision.

Hah!

It seems the farther we get in this process, the closer we get to
http://bzr.savannah.gnu.org/lh/emacs/trunk/annotate/head:/etc/future-bug
becoming reality, yikes...




