unofficial mirror of emacs-devel@gnu.org 
* Re: etags test is broken on MS-Windows
       [not found] ` <555A8E62.7060700@cs.ucla.edu>
@ 2015-05-19 15:27   ` Eli Zaretskii
  2015-05-19 17:57     ` Paul Eggert
                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-19 15:27 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

[Sorry, I didn't mean to discuss this in private, I just forgot to CC
the list.  Adding it now, and repeating my original message.]

I wrote:

> > Commit e0117b1 changed the new etags test suite in a way that makes it
> > always be skipped on MS-Windows (and in general on any platform that
> > doesn't have the 'locale' command or doesn't have a UTF-8 locale
> > installed).
> > 
> > I don't understand why a test suite needs to use UTF-8, but I don't
> > really mind as long as the tests can run on all supported platforms.
> > Can we fix the test to not require these features, please?

And Paul answered:

> Date: Mon, 18 May 2015 18:14:10 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> > I don't
> > really mind as long as the tests can run on all supported platforms.
> 
> Without that patch, the tests failed on my GNU/Linux host due to encoding 
> problems.  See attached file

I don't think it's due to an encoding problem.  (AFAIK, etags doesn't
regard its input as characters, but as a stream of bytes.)

I think it's due to DOS CR-LF EOL format of some files in the test
suite.  For example, the first file whose tags were different in your
testing is dostorture.c, which has DOS EOLs, the second file, c.C, has
a lone ^M character at the end of one of its lines, and so on.

Could you please verify that this is indeed the source of the problem?

(There's also an unrelated problem with the gzip-compressed file in
f-src, which seems to be some Windows-specific glitch; I will look
into it separately.)

> > Can we fix the test to not require these features, please?
> 
> I don't know what will work on MS-Windows, but I checked in a stab
> at it.

Thanks, it works now, but I have the same problems due to EOL format,
and in the same files, just in reverse.

If we agree that the problem is due to EOL format, we could try
thinking about a solution.  The root cause for the problem is that on
Windows, etags accounts for the stripped CR characters, while on Unix
it treats them as part of the contents, so the byte counts are offset
by the number of the preceding lines.
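
To make the offset difference concrete, here is a minimal sketch in C.
It is not taken from etags.c: the function name line_start_offset and
its strip_cr flag are hypothetical, and it only assumes that a CR is
dropped when it immediately precedes an LF.  It computes the byte
offset at which a given line starts, either counting the CRs (the Unix
behavior described above) or skipping them (the Windows behavior).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the byte offset at which 1-based line LINENO starts in BUF.
   When STRIP_CR is nonzero, a CR that immediately precedes an LF is
   not counted, mimicking CRLF->LF translation.  Hypothetical helper,
   not part of etags.  */
long
line_start_offset (const char *buf, size_t len, int lineno, int strip_cr)
{
  assert (lineno >= 1);
  long offset = 0;
  int line = 1;
  for (size_t i = 0; i < len && line < lineno; i++)
    {
      int cr_before_lf
        = buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n';
      if (!(strip_cr && cr_before_lf))
        offset++;
      if (buf[i] == '\n')
        line++;
    }
  return offset;
}
```

For a three-line file in which every line ends in CRLF, the two
results for line 3 differ by 2, i.e. by the number of preceding lines.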

> If this fails, I suggest removing all the non-ASCII characters from
> these test files and then regenerating the "good" data to match.

I don't see this as necessary, not yet.

Thanks.




* Re: etags test is broken on MS-Windows
  2015-05-19 15:27   ` etags test is broken on MS-Windows Eli Zaretskii
@ 2015-05-19 17:57     ` Paul Eggert
  2015-05-19 18:26       ` Eli Zaretskii
  2015-05-20 15:38     ` Eli Zaretskii
  2015-05-21 13:16     ` Francesco Potortì
  2 siblings, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-19 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 05/19/2015 08:27 AM, Eli Zaretskii wrote:
> I think it's due to DOS CR-LF EOL format of some files in the test suite. 

You're right, I misdiagnosed the porting problem.  Sorry about that.


> If we agree that the problem is due to EOL format, we could try 
> thinking about a solution. The root cause for the problem is that on 
> Windows, etags accounts for the stripped CR characters, while on Unix 
> it treats them as part of the contents, so the byte counts are offset 
> by the number of the preceding lines.

That sounds like a problem, but not a problem that the test case is 
trying to detect.  A simple way that should cajole the tests into 
passing is to remove the trailing CRs from the test data, so I installed 
a patch to do that.  If we ever want to make ctags output portable among 
Unix vs DOS conventions we can bring back test cases involving CRs.




* Re: etags test is broken on MS-Windows
  2015-05-19 17:57     ` Paul Eggert
@ 2015-05-19 18:26       ` Eli Zaretskii
  0 siblings, 0 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-19 18:26 UTC (permalink / raw)
  To: Paul Eggert, Francesco Potortì; +Cc: emacs-devel

> Date: Tue, 19 May 2015 10:57:57 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: emacs-devel@gnu.org
> 
> On 05/19/2015 08:27 AM, Eli Zaretskii wrote:
> > I think it's due to DOS CR-LF EOL format of some files in the test suite. 
> 
> You're right, I misdiagnosed the porting problem.  Sorry about that.

Well, I should have thought about that (and tested it) before
committing the test suite in the first place.

> > If we agree that the problem is due to EOL format, we could try 
> > thinking about a solution. The root cause for the problem is that on 
> > Windows, etags accounts for the stripped CR characters, while on Unix 
> > it treats them as part of the contents, so the byte counts are offset 
> > by the number of the preceding lines.
> 
> That sounds like a problem, but not a problem that the test case is 
> trying to detect.  A simple way that should cajole the tests into 
> passing is to remove the trailing CRs from the test data, so I installed 
> a patch to do that.

Thanks.

I'm not sure the test suite wasn't trying to test this, though:
dostorture.c seems to be an exact copy of torture.c, except for the
EOL format.

Francesco, can you please comment on this?  Given that the Unix build
of etags does not remove the CR characters from DOS CR-LF EOLs, what
was the purpose of including files with DOS EOLs in the test suite?




* Re: etags test is broken on MS-Windows
  2015-05-19 15:27   ` etags test is broken on MS-Windows Eli Zaretskii
  2015-05-19 17:57     ` Paul Eggert
@ 2015-05-20 15:38     ` Eli Zaretskii
  2015-05-21  5:05       ` Paul Eggert
  2015-05-21 13:24       ` Francesco Potortì
  2015-05-21 13:16     ` Francesco Potortì
  2 siblings, 2 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-20 15:38 UTC (permalink / raw)
  To: Paul Eggert, Francesco Potortì; +Cc: emacs-devel

> Date: Tue, 19 May 2015 18:27:44 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> (There's also an unrelated problem with the gzip-compressed file in
> f-src, which seems to be some Windows-specific glitch; I will look
> into it separately.)

I found the reason for this: etags calls 'rewind' on a FILE stream
that was created by 'popen', which is non-portable, AFAIK.  On
Windows, that caused the initial portions of the input to be skipped
by etags, i.e. some symbols were not tagged.

There are a few comments about that in the source, like this:

  /* We rewind here, even if inf may be a pipe.  We fail if the
     length of the first line is longer than the pipe block size,
     which is unlikely. */
  rewind (inf);

These comments notwithstanding, it sounds like etags expects this to
work satisfactorily at least on GNU/Linux, and at least when "length
of the first line is not longer than the pipe block size", otherwise I
don't understand why the test suite includes gzip-compressed files
(Francesco?).

So, on the assumption that this does work on Posix hosts, at least
those that use glibc, I hacked etags to provide a Windows-specific
replacement for 'rewind' that supports this expectation, assuming the
stuff read and buffered before the call to 'rewind' is less than a
full buffer of the FILE object.  Then the Windows build no longer
misses symbols in the first part of the compressed files.

However, now I see something strange in the ETAGS.good files, which
AFAIU were produced by Paul on a Posix host.  Please look at this
excerpt from ETAGS.good_1:

f-src/entry.for,172
      LOGICAL FUNCTION PRTPKG ^?3,75
       ENTRY  SETPRT ^?194,3866
       ENTRY  MSGSEL ^?395,8478
     & intensity1(^?577,12231
       character*(*) function foo(^?579,12307
^L
f-src/entry.strange_suffix,172
      LOGICAL FUNCTION PRTPKG ^?3,75
       ENTRY  SETPRT ^?194,3866
       ENTRY  MSGSEL ^?395,8478
     & intensity1(^?577,12231
       character*(*) function foo(^?579,12307
^L
f-src/entry.strange,171
      LOGICAL FUNCTION PRTPKG ^?2,2
       ENTRY  SETPRT ^?193,3793
       ENTRY  MSGSEL ^?394,8405
     & intensity1(^?576,12158
       character*(*) function foo(^?578,12234

Now, these 3 files have exactly identical contents, and the _only_
difference between the first 2 and the 3rd is that the latter is
gzip-compressed.  So that should be the only reason why all its line
counts are off by 1, and its byte counts are all off by 73, which just
happens to be the length of the first line of the (uncompressed) file.
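
The discrepancy can be checked mechanically.  The sketch below is an
illustration only (parse_tag_entry is a hypothetical helper, not an
etags function).  It assumes the usual TAGS entry layout, in which the
line number and byte offset follow a DEL character (shown as ^? in the
excerpt above), optionally preceded by an explicit tag name terminated
by a ^A character.

```c
#include <stdio.h>
#include <string.h>

/* Parse one TAGS entry of the form "TEXT<DEL>LINE,BYTE" or
   "TEXT<DEL>NAME<SOH>LINE,BYTE".  Store the numbers in *LINE and
   *BYTE and return 1 on success, 0 otherwise.  Hypothetical helper
   for illustration.  */
int
parse_tag_entry (const char *entry, long *line, long *byte)
{
  const char *del = strchr (entry, '\177');
  if (!del)
    return 0;
  /* Skip an explicit tag name, if one is present.  */
  const char *soh = strchr (del + 1, '\001');
  const char *numbers = soh ? soh + 1 : del + 1;
  return sscanf (numbers, "%ld,%ld", line, byte) == 2;
}
```

Parsing the PRTPKG entries for entry.for and entry.strange and
subtracting the results gives a line difference of 1 and a byte
difference of 73, matching the figures quoted above.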

So could it be that rewinding a 'popen'-created stream doesn't work
correctly on GNU/Linux as well?  If so, we will have to make changes
in etags to not do that, I think, and instead reuse the already-read
stuff as needed.




* Re: etags test is broken on MS-Windows
  2015-05-20 15:38     ` Eli Zaretskii
@ 2015-05-21  5:05       ` Paul Eggert
  2015-05-21 13:24       ` Francesco Potortì
  1 sibling, 0 replies; 51+ messages in thread
From: Paul Eggert @ 2015-05-21  5:05 UTC (permalink / raw)
  To: Eli Zaretskii, Francesco Potortì; +Cc: emacs-devel

Eli Zaretskii wrote:
> So could it be that rewinding a 'popen'-created stream doesn't work
> correctly on GNU/Linux as well?  If so, we will have to make changes
> in etags to not do that, I think

I think you're right.  The behavior of rewind on pipes is implementation-defined 
and etags shouldn't rely on it.
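
One way to avoid depending on rewind for a pipe is to copy the
command's output into a regular file first, where seeking is well
defined.  The following is a minimal sketch under stated assumptions,
not the etags implementation: the function name open_command_seekable
is hypothetical, it uses POSIX popen and the standard C tmpfile, and
the command string is handed to the shell unchanged, so real code
would have to quote file names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Run CMD through the shell, copy its standard output into an
   anonymous temporary file, and return that file positioned at its
   start.  Return NULL on failure.  Hypothetical helper for
   illustration.  */
FILE *
open_command_seekable (const char *cmd)
{
  FILE *in = popen (cmd, "r");
  if (!in)
    return NULL;
  FILE *tmp = tmpfile ();
  if (!tmp)
    {
      pclose (in);
      return NULL;
    }

  char buf[4096];
  size_t n;
  int ok = 1;
  while ((n = fread (buf, 1, sizeof buf, in)) > 0)
    if (fwrite (buf, 1, n, tmp) != n)
      {
        ok = 0;
        break;
      }
  if (ferror (in))
    ok = 0;
  if (pclose (in) == -1)
    ok = 0;
  if (!ok)
    {
      fclose (tmp);
      return NULL;
    }

  /* TMP is a regular file, so rewinding it is portable.  */
  rewind (tmp);
  return tmp;
}
```

The returned stream can be read, rewound, and read again, which is
what the language-detection code in etags needs.  The cost is one
extra copy of the decompressed data.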




* Re: etags test is broken on MS-Windows
  2015-05-19 15:27   ` etags test is broken on MS-Windows Eli Zaretskii
  2015-05-19 17:57     ` Paul Eggert
  2015-05-20 15:38     ` Eli Zaretskii
@ 2015-05-21 13:16     ` Francesco Potortì
  2015-05-21 16:31       ` Eli Zaretskii
  2 siblings, 1 reply; 51+ messages in thread
From: Francesco Potortì @ 2015-05-21 13:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, emacs-devel

>> Without that patch, the tests failed on my GNU/Linux host due to encoding 
>> problems.  See attached file
>
>I don't think it's due to encoding problem.  (AFAIK, etags doesn't
>regard its input as characters, but as a stream of bytes.)

Yes, etags has no notion of character sets.

>I think it's due to DOS CR-LF EOL format of some files in the test
>suite.  For example, the first file whose tags were different in your
>testing is dostorture.c, which has DOS EOLs, the second file, c.C, has
>a lone ^M character at the end of one of its lines, and so on.
>
>Could you please verify that this is indeed the source of the problem?

Those files were put there to test the behaviour of etags with different
EOL styles. However, few tests were in fact done for etags running on
DOS systems, so in fact there may be undetected regressions on etags for
DOS.

About utf-8, etags' behaviour should be independent of locale...




* Re: etags test is broken on MS-Windows
  2015-05-20 15:38     ` Eli Zaretskii
  2015-05-21  5:05       ` Paul Eggert
@ 2015-05-21 13:24       ` Francesco Potortì
  2015-05-21 16:49         ` Eli Zaretskii
  1 sibling, 1 reply; 51+ messages in thread
From: Francesco Potortì @ 2015-05-21 13:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, emacs-devel

>> (There's also an unrelated problem with the gzip-compressed file in
>> f-src, which seems to be some Windows-specific glitch; I will look
>> into it separately.)
>
>I found the reason for this: etags calls 'rewind' on a FILE stream
>that was created by 'popen', which is non-portable, AFAIK.  On
>Windows, that caused the initial portions of the input to be skipped
>by etags, i.e. some symbols were not tagged.
>
>There are a few comments about that in the source, like this:
>
>  /* We rewind here, even if inf may be a pipe.  We fail if the
>     length of the first line is longer than the pipe block size,
>     which is unlikely. */
>  rewind (inf);
>
>These comments notwithstanding, it sounds like etags expects this to
>work satisfactorily at least on GNU/Linux, and at least when "length
>of the first line is not longer than the pipe block size", otherwise I
>don't understand why the test suite includes gzip-compressed files
>(Francesco?).

Yes, that's it.  I implemented the gzip feature myself, and I included
tests for it.  A source file should produce the same tags whether
compressed or not.

>So, on the assumption that this does work on Posix hosts, at least
>those that use glibc, I hacked etags to provide a Windows-specific
>replacement for 'rewind' that supports this expectation, assuming the
>stuff read and buffered before the call to 'rewind' is less than a
>full buffer of the FILE object.  Then the Windows build no longer
>misses symbols in the first part of the compressed files.
>
>However, now I see something strange in the ETAGS.good files, which
>AFAIU were produced by Paul on a Posix host.  Please look at this
>excerpt from ETAGS.good_1:
>
>f-src/entry.for,172
>      LOGICAL FUNCTION PRTPKG ^?3,75
>       ENTRY  SETPRT ^?194,3866
>       ENTRY  MSGSEL ^?395,8478
>     & intensity1(^?577,12231
>       character*(*) function foo(^?579,12307
>^L
>f-src/entry.strange_suffix,172
>      LOGICAL FUNCTION PRTPKG ^?3,75
>       ENTRY  SETPRT ^?194,3866
>       ENTRY  MSGSEL ^?395,8478
>     & intensity1(^?577,12231
>       character*(*) function foo(^?579,12307
>^L
>f-src/entry.strange,171
>      LOGICAL FUNCTION PRTPKG ^?2,2
>       ENTRY  SETPRT ^?193,3793
>       ENTRY  MSGSEL ^?394,8405
>     & intensity1(^?576,12158
>       character*(*) function foo(^?578,12234
>
>Now, these 3 files have exactly identical contents, and the _only_
>difference between the first 2 and the 3rd is that the latter is
>gzip-compressed.  So that should be the only reason why all its line
>counts are off by 1, and its byte counts are all off by 73, which just
>happens to be the length of the first line of the (uncompressed) file.

This is a bug.

>So could it be that rewinding a 'popen'-created stream doesn't work
>correctly on GNU/Linux as well?  If so, we will have to make changes
>in etags to not do that, I think, and instead reuse the already-read
>stuff as needed.

It could well be.  It may have happened that, when I checked that the
TAGS files were the same, I just looked at them without running diff and
I missed this discrepancy.




* Re: etags test is broken on MS-Windows
  2015-05-21 13:16     ` Francesco Potortì
@ 2015-05-21 16:31       ` Eli Zaretskii
  2015-05-21 16:37         ` Paul Eggert
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-21 16:31 UTC (permalink / raw)
  To: Francesco Potortì; +Cc: eggert, emacs-devel

> Date: Thu, 21 May 2015 15:16:01 +0200
> From: Francesco Potortì <pot@gnu.org>
> Cc: emacs-devel@gnu.org, Paul Eggert <eggert@cs.ucla.edu>
> 
> >I think it's due to DOS CR-LF EOL format of some files in the test
> >suite.  For example, the first file whose tags were different in your
> >testing is dostorture.c, which has DOS EOLs, the second file, c.C, has
> >a lone ^M character at the end of one of its lines, and so on.
> >
> >Could you please verify that this is indeed the source of the problem?
> 
> Those files were put there to test the behaviour of etags with different
> EOL styles. However, few tests were in fact done for etags running on
> DOS systems, so in fact there may be undetected regressions on etags for
> DOS.

There's no problem with etags on DOS and Windows, it behaves exactly
as designed and implemented.  The problem is on Unix: because etags on
Unix does not strip the CR characters, its character counts are wrong,
because Emacs will strip them when it reads the source file.

IOW, what was at some point only done by Emacs on DOS and Windows, is
now done by default on all platforms.  So I think etags should use the
same code in Unix as well.  I mean this fragment:

	if (c == '\n')
	  {
	    if (p > buffer && p[-1] == '\r')
	      {
		p -= 1;
  #ifdef DOS_NT
	       /* Assume CRLF->LF translation will be performed by Emacs
		  when loading this file, so CRs won't appear in the buffer.
		  It would be cleaner to compensate within Emacs;
		  however, Emacs does not know how many CRs were deleted
		  before any given point in the file.  */
		chars_deleted = 1;
  #else
		chars_deleted = 2;
  #endif
	      }





* Re: etags test is broken on MS-Windows
  2015-05-21 16:31       ` Eli Zaretskii
@ 2015-05-21 16:37         ` Paul Eggert
  2015-05-21 16:55           ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-21 16:37 UTC (permalink / raw)
  To: Eli Zaretskii, Francesco Potortì; +Cc: emacs-devel

On 05/21/2015 09:31 AM, Eli Zaretskii wrote:
> I think etags should use the
> same code in Unix as well.  I mean this fragment:
>
> 	if (c == '\n')
> 	  {
> 	    if (p > buffer && p[-1] == '\r')
> 	      {
> 		p -= 1;
>    #ifdef DOS_NT
> 	       /* Assume CRLF->LF translation will be performed by Emacs
> 		  when loading this file, so CRs won't appear in the buffer.
> 		  It would be cleaner to compensate within Emacs;
> 		  however, Emacs does not know how many CRs were deleted
> 		  before any given point in the file.  */
> 		chars_deleted = 1;
>    #else
> 		chars_deleted = 2;
>    #endif
> 	      }

Sorry, I'm a little lost.  Would it actually work with an Emacs on a 
GNUish host if we simply set chars_deleted = 1 here?

If etags is locale-agnostic, its output files must contain byte counts 
and not character counts.  This is because etags doesn't even know where 
the characters are.  And if the output files contain byte counts, surely 
they need to count the CR bytes as well as the LF bytes, at least on a 
GNU or POSIXish host.




* Re: etags test is broken on MS-Windows
  2015-05-21 13:24       ` Francesco Potortì
@ 2015-05-21 16:49         ` Eli Zaretskii
  2015-05-23  8:46           ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-21 16:49 UTC (permalink / raw)
  To: Francesco Potortì; +Cc: eggert, emacs-devel

> Date: Thu, 21 May 2015 15:24:44 +0200
> From: Francesco Potortì <pot@gnu.org>
> Cc: emacs-devel@gnu.org, Paul Eggert <eggert@cs.ucla.edu>
> 
> >f-src/entry.for,172
> >      LOGICAL FUNCTION PRTPKG ^?3,75
> >       ENTRY  SETPRT ^?194,3866
> >       ENTRY  MSGSEL ^?395,8478
> >     & intensity1(^?577,12231
> >       character*(*) function foo(^?579,12307
> >^L
> >f-src/entry.strange_suffix,172
> >      LOGICAL FUNCTION PRTPKG ^?3,75
> >       ENTRY  SETPRT ^?194,3866
> >       ENTRY  MSGSEL ^?395,8478
> >     & intensity1(^?577,12231
> >       character*(*) function foo(^?579,12307
> >^L
> >f-src/entry.strange,171
> >      LOGICAL FUNCTION PRTPKG ^?2,2
> >       ENTRY  SETPRT ^?193,3793
> >       ENTRY  MSGSEL ^?394,8405
> >     & intensity1(^?576,12158
> >       character*(*) function foo(^?578,12234
> >
> >Now, these 3 files have exactly identical contents, and the _only_
> >difference between the first 2 and the 3rd is that the latter is
> >gzip-compressed.  So that should be the only reason why all its line
> >counts are off by 1, and its byte counts are all off by 73, which just
> >happens to be the length of the first line of the (uncompressed) file.
> 
> This is a bug.
> 
> >So could it be that rewinding a 'popen'-created stream doesn't work
> >correctly on GNU/Linux as well?  If so, we will have to make changes
> >in etags to not do that, I think, and instead reuse the already-read
> >stuff as needed.
> 
> It could well be.  It may have happened that, when I checked that the
> TAGS files were the same, I just looked at them without running diff and
> I missed this discrepancy.

After thinking a bit about the alternative solution, I concluded that
the simplest will be to decompress to a temporary file and read from
there.  Does the patch below look OK?  Or can someone think about a
more elegant way of solving this?


diff --git a/lib-src/etags.c b/lib-src/etags.c
index 0a308c1..28729da 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -116,6 +116,7 @@ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 # undef HAVE_NTGUI
 # undef  DOS_NT
 # define DOS_NT
+# define O_CLOEXEC O_NOINHERIT
 #endif /* WINDOWSNT */
 
 #include <unistd.h>
@@ -125,6 +126,7 @@ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 #include <sysstdio.h>
 #include <ctype.h>
 #include <errno.h>
+#include <fcntl.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <binary-io.h>
@@ -336,6 +338,7 @@ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 static char *absolute_dirname (char *, char *);
 static bool filename_is_absolute (char *f);
 static void canonicalize_filename (char *);
+static char *etags_mktmp (void);
 static void linebuffer_init (linebuffer *);
 static void linebuffer_setlen (linebuffer *, int);
 static void *xmalloc (size_t);
@@ -1437,7 +1440,7 @@ C code are parsed as C code (use --help --lang=c --lang=yacc\n\
   fdesc *fdp;
   compressor *compr;
   char *compressed_name, *uncompressed_name;
-  char *ext, *real_name;
+  char *ext, *real_name, *tmp_name;
   int retval;
 
   canonicalize_filename (file);
@@ -1522,9 +1525,20 @@ C code are parsed as C code (use --help --lang=c --lang=yacc\n\
     }
   if (real_name == compressed_name)
     {
-      char *cmd = concat (compr->command, " ", real_name);
-      inf = popen (cmd, "r" FOPEN_BINARY);
-      free (cmd);
+      tmp_name = etags_mktmp ();
+      if (!tmp_name)
+	inf = NULL;
+      else
+	{
+	  char *cmd1 = concat (compr->command, " ", real_name);
+	  char *cmd = concat (cmd1, " > ", tmp_name);
+	  free (cmd1);
+	  if (system (cmd) == -1)
+	    inf = NULL;
+	  else
+	    inf = fopen (tmp_name, "r" FOPEN_BINARY);
+	  free (cmd);
+	}
     }
   else
     inf = fopen (real_name, "r" FOPEN_BINARY);
@@ -1536,10 +1550,12 @@ C code are parsed as C code (use --help --lang=c --lang=yacc\n\
 
   process_file (inf, uncompressed_name, lang);
 
+  retval = fclose (inf);
   if (real_name == compressed_name)
-    retval = pclose (inf);
-  else
-    retval = fclose (inf);
+    {
+      remove (tmp_name);
+      free (tmp_name);
+    }
   if (retval < 0)
     pfatal (file);
 
@@ -1707,9 +1723,6 @@ C code are parsed as C code (use --help --lang=c --lang=yacc\n\
 	}
     }
 
-  /* We rewind here, even if inf may be a pipe.  We fail if the
-     length of the first line is longer than the pipe block size,
-     which is unlikely. */
   rewind (inf);
 
   /* Else try to guess the language given the case insensitive file name. */
@@ -1734,8 +1747,6 @@ C code are parsed as C code (use --help --lang=c --lang=yacc\n\
       if (old_last_node == last_node)
 	/* No Fortran entries found.  Try C. */
 	{
-	  /* We do not tag if rewind fails.
-	     Only the file name will be recorded in the tags file. */
 	  rewind (inf);
 	  curfdp->lang = get_language_from_langname (cplusplus ? "c++" : "c");
 	  find_entries (inf);
@@ -5015,8 +5026,6 @@ enum,		0,			st_C_enum
       TEX_opgrp = '<';
       TEX_clgrp = '>';
     }
-  /* If the input file is compressed, inf is a pipe, and rewind may fail.
-     No attempt is made to correct the situation. */
   rewind (inf);
 }
 
@@ -6344,6 +6353,51 @@ enum,		0,			st_C_enum
   return path;
 }
 
+/* Return a newly allocated string containing a name of a temporary file.  */
+static char *
+etags_mktmp (void)
+{
+  const char *tmpdir = getenv ("TMPDIR");
+  const char *slash = "/";
+
+#if MSDOS || defined (DOS_NT)
+  if (!tmpdir)
+    tmpdir = getenv ("TEMP");
+  if (!tmpdir)
+    tmpdir = getenv ("TMP");
+  if (!tmpdir)
+    tmpdir = ".";
+  if (tmpdir[strlen (tmpdir) - 1] == '/'
+      || tmpdir[strlen (tmpdir) - 1] == '\\')
+    slash = "";
+#else
+  if (!tmpdir)
+    tmpdir = "/tmp";
+  if (tmpdir[strlen (tmpdir) - 1] == '/')
+    slash = "";
+#endif
+
+  char *templt = concat (tmpdir, slash, "etXXXXXX");
+  int fd = mkostemp (templt, O_CLOEXEC);
+  if (fd < 0)
+    {
+      free (templt);
+      templt = NULL;
+    }
+  else
+    close (fd);
+
+#if defined (DOS_NT)
+  /* The file name will be used in shell redirection, so it needs to have
+     DOS-style backslashes, or else the Windows shell will barf.  */
+  char *p;
+  for (p = templt; p && *p; p++)
+    if (*p == '/')
+      *p = '\\';
+#endif
+  return templt;
+}
+
 /* Return a newly allocated string containing the file name of FILE
    relative to the absolute directory DIR (which should end with a slash). */
 static char *





* Re: etags test is broken on MS-Windows
  2015-05-21 16:37         ` Paul Eggert
@ 2015-05-21 16:55           ` Eli Zaretskii
  2015-05-21 19:03             ` Paul Eggert
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-21 16:55 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, emacs-devel

> Date: Thu, 21 May 2015 09:37:02 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
> > 	if (c == '\n')
> > 	  {
> > 	    if (p > buffer && p[-1] == '\r')
> > 	      {
> > 		p -= 1;
> >    #ifdef DOS_NT
> > 	       /* Assume CRLF->LF translation will be performed by Emacs
> > 		  when loading this file, so CRs won't appear in the buffer.
> > 		  It would be cleaner to compensate within Emacs;
> > 		  however, Emacs does not know how many CRs were deleted
> > 		  before any given point in the file.  */
> > 		chars_deleted = 1;
> >    #else
> > 		chars_deleted = 2;
> >    #endif
> > 	      }
> 
> Sorry, I'm a little lost.  Would it actually work with an Emacs on a 
> GNUish host if we simply set chars_deleted = 1 here?

I think it will, and that's what I was suggesting: remove the #ifdef
and use the code currently conditioned by DOS_NT.

> If etags is locale-agnostic, its output files must contain byte counts 
> and not character counts.

Yes, they are called "character counts", but are actually byte counts.

> And if the output files contain byte counts, surely they need to
> count the CR bytes as well as the LF bytes, at least on a GNU or
> POSIXish host.

I think CRs don't need to be counted, because they will not be in the
Emacs buffer when a DOS-ish file is visited, due to EOL decoding.

IOW, the "CRLF->LF translation" that the comment mentions is done on
all platforms.  Or am I missing something?




* Re: etags test is broken on MS-Windows
  2015-05-21 16:55           ` Eli Zaretskii
@ 2015-05-21 19:03             ` Paul Eggert
  2015-05-21 19:54               ` Eli Zaretskii
  2015-05-22 12:40               ` Francesco Potortì
  0 siblings, 2 replies; 51+ messages in thread
From: Paul Eggert @ 2015-05-21 19:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 575 bytes --]

On 05/21/2015 09:55 AM, Eli Zaretskii wrote:
> IOW, the "CRLF->LF translation" that the comment mentions is done on
> all platforms.  Or am I missing something?

I was thinking about the case where a source file has mostly lines with 
LF but a few lines end in CRLF.  E.g., the attached file has a CR at the 
end of the second line.  In that case, Emacs doesn't strip the trailing 
CRs on GNU/Linux.  Wouldn't the byte counts get messed up then?

Come to think of it, one of the etags test cases did that before I 
removed the CR (and perhaps that was part of the test...).


[-- Attachment #2: xx.c --]
[-- Type: text/plain, Size: 23 bytes --]

int x;
char y;
int z;


* Re: etags test is broken on MS-Windows
  2015-05-21 19:03             ` Paul Eggert
@ 2015-05-21 19:54               ` Eli Zaretskii
  2015-05-21 23:28                 ` Paul Eggert
  2015-05-22 12:40               ` Francesco Potortì
  1 sibling, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-21 19:54 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, emacs-devel

> Date: Thu, 21 May 2015 12:03:44 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: pot@gnu.org, emacs-devel@gnu.org
> 
> On 05/21/2015 09:55 AM, Eli Zaretskii wrote:
> > IOW, the "CRLF->LF translation" that the comment mentions is done on
> > all platforms.  Or am I missing something?
> 
> I was thinking about the case where a source file has mostly lines with 
> LF but a few lines end in CRLF.  E.g., the attached file has a CR at the 
> end of the second line.  In that case, Emacs doesn't strip the trailing 
> CRs on GNU/Linux.  Wouldn't the byte counts get messed up then?

Yes, they would, but it's not fatal, since etags.el searches around
the position for the pattern stated on the tag line.

And of course, in the case you present, the byte counts will be
slightly off on Windows as well.

But the way etags works currently, a file with all of its lines ending
in CRLF will _always_ have all of its byte counts messed up.  Not a
catastrophe, either, but still worse than under my suggestion.

> Come to think of it, one of the etags test cases did that before I 
> removed the CR (and perhaps that was part of the test...).

Yes, one of the files has a single line with CRLF (I thought it was
part of the test as well).




* Re: etags test is broken on MS-Windows
  2015-05-21 19:54               ` Eli Zaretskii
@ 2015-05-21 23:28                 ` Paul Eggert
  2015-05-22  8:32                   ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-21 23:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, emacs-devel

On 05/21/2015 12:54 PM, Eli Zaretskii wrote:
> Yes, they would, but it's not fatal, since etags.el searches around
> the position for the pattern stated on the tag line.
>
> And of course, in the case you present, the byte counts will be
> slightly off on Windows as well.
>
> But the way etags works currently, a file with all of its lines ending
> in CRLF will _always_ have all of its byte counts messed up.  Not a
> catastrophe, either, but still worse than under my suggestion.

I don't see why it's worth our trouble to substitute one incorrect 
solution for another, if it's OK that the solutions are approximate.

If it's important to fix this, how about the following idea instead.  
Have etags always compute byte offsets the POSIX way, counting any CRs, 
and put POSIX-oriented byte counts into the TAGS file (the way it 
already does on GNU hosts).  When Emacs starts up, if the source file is 
in DOS mode (with CRLF replaced by LF internally), Emacs subtracts the 
line count from the POSIX byte count, and uses the resulting byte count 
instead. That way, we don't need to change how etags works on GNU 
platforms, nor do we need to tell GNU users to regenerate their TAGS files.
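
The arithmetic behind this proposal is small enough to sketch.  The
real adjustment would live in Emacs Lisp, so the C function below is
only an illustration: the name dos_buffer_offset is hypothetical.  It
assumes that the recorded offset points at the start of the tag's
line and that every preceding line ends in CRLF, so exactly LINE - 1
CR bytes have been removed before that position.

```c
#include <assert.h>

/* Convert POSIX_OFFSET, a byte offset computed with CRs counted, into
   the offset in a buffer where each CRLF has become a single LF.
   LINE is the 1-based line number of the tag.  Assumes all preceding
   lines end in CRLF.  Hypothetical helper for illustration.  */
long
dos_buffer_offset (long posix_offset, long line)
{
  assert (line >= 1);
  return posix_offset - (line - 1);
}
```

For a file with mixed line endings the assumption does not hold, and
the result is only approximate.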




* Re: etags test is broken on MS-Windows
  2015-05-21 23:28                 ` Paul Eggert
@ 2015-05-22  8:32                   ` Eli Zaretskii
  2015-05-22 13:08                     ` Francesco Potortì
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22  8:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, emacs-devel

> Date: Thu, 21 May 2015 16:28:21 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: pot@gnu.org, emacs-devel@gnu.org
> 
> I don't see why it's worth our trouble to substitute one incorrect 
> solution for another, if it's OK that the solutions are approximate.

It's OK if we don't want to put the files with DOS EOLs, which caused
the trouble in the first place, back into the test suite.  If we don't
care about that subtle feature, I'm OK with the current code.  After
all, it worked nicely until now.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-21 19:03             ` Paul Eggert
  2015-05-21 19:54               ` Eli Zaretskii
@ 2015-05-22 12:40               ` Francesco Potortì
  1 sibling, 0 replies; 51+ messages in thread
From: Francesco Potortì @ 2015-05-22 12:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel

>On 05/21/2015 09:55 AM, Eli Zaretskii wrote:
>> IOW, the "CRLF->LF translation" that the comment mentions is done on
>> all platforms.  Or am I missing something?
>
>I was thinking about the case where a source file has mostly lines with 
>LF but a few lines end in CRLF.  E.g., the attached file has a CR at the 
>end of the second line.  In that case, Emacs doesn't strip the trailing 
>CRs on GNU/Linux.  Wouldn't the byte counts get messed up then?
>
>Come to think of it, one of the etags test cases did that before I 
>removed the CR (and perhaps that was part of the test...).
>
>int x;
>char y;
>int z;

It was definitely part of the test :)
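
The stray-CR case can be sketched as follows.  This is a simplified
model, not Emacs code: the all-or-nothing rule in emacs_buffer_view is an
assumption taken from the description above (CRs are stripped only when
every line ends in CRLF), and both function names are hypothetical.

```python
def emacs_buffer_view(data: bytes) -> bytes:
    """Simplified model: strip CRs only if every line ends in CRLF."""
    lines = data.split(b"\n")
    if data.endswith(b"\n"):
        lines = lines[:-1]  # drop the empty piece after the final LF
    if lines and all(line.endswith(b"\r") for line in lines):
        return data.replace(b"\r\n", b"\n")
    return data  # mixed or LF-only file: CRs stay in the buffer


def count_crlf_as_one(data: bytes, file_offset: int) -> int:
    """Offset computed by a counter that treats each CRLF as one byte."""
    return file_offset - data.count(b"\r\n", 0, file_offset)


stray = b"int x;\nchar y;\r\nint z;\n"      # only the second line has a CR
dos = b"int x;\r\nchar y;\r\nint z;\r\n"    # every line ends in CRLF

stray_error = (emacs_buffer_view(stray).index(b"int z;")
               - count_crlf_as_one(stray, stray.index(b"int z;")))
dos_error = (emacs_buffer_view(dos).index(b"int z;")
             - count_crlf_as_one(dos, dos.index(b"int z;")))
```

Under this model, counting CRLF as one byte is exact for the all-CRLF
file and one byte short for the file with a single stray CR.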



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22  8:32                   ` Eli Zaretskii
@ 2015-05-22 13:08                     ` Francesco Potortì
  2015-05-22 13:19                       ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Francesco Potortì @ 2015-05-22 13:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, emacs-devel

>
>> Date: Thu, 21 May 2015 16:28:21 -0700
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> CC: pot@gnu.org, emacs-devel@gnu.org
>> 
>> I don't see why it's worth our trouble to substitute one incorrect 
>> solution for another, if it's OK that the solutions are approximate.
>
>It's OK if we don't want to put the files with DOS EOLs, which caused
>the trouble in the first place, back into the test suite.  If we don't
>care about that subtle feature, I'm OK with the current code.  After
>all, it worked nicely until now.

However, if in fact Emacs works the same on all platforms, maybe there
is no reason for Etags to compensate for differences that do not exist
(any more?).  In fact, as far back as I can go with the etags.c sources, I
see that code has always been there, so unless I'm mistaken it's very
very old.

If what I write is correct, I'd go with removing the different
treatment of crlf on dos and unix.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 13:08                     ` Francesco Potortì
@ 2015-05-22 13:19                       ` Eli Zaretskii
  2015-05-22 18:23                         ` Paul Eggert
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22 13:19 UTC (permalink / raw)
  To: Francesco Potortì; +Cc: eggert, emacs-devel

> Date: Fri, 22 May 2015 15:08:28 +0200
> From: Francesco Potortì <pot@gnu.org>
> Cc: emacs-devel@gnu.org, Paul Eggert <eggert@cs.ucla.edu>
> 
> However, if in fact Emacs works the same on all platforms, maybe there
> is no reason for Etags to compensate for differences that do not exist
> (any more?).  In fact, as far back as I can go with the etags.c sources, I
> see that code has always been there, so unless I'm mistaken it's very
> very old.

Yes, the code is very old.

> If what I write is correct, I'd go with removing the different
> treatment of crlf on dos and unix.

I agree, but it sounds like Paul doesn't.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 13:19                       ` Eli Zaretskii
@ 2015-05-22 18:23                         ` Paul Eggert
  2015-05-22 19:08                           ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-22 18:23 UTC (permalink / raw)
  To: Eli Zaretskii, Francesco Potortì; +Cc: emacs-devel

On 05/22/2015 06:19 AM, Eli Zaretskii wrote:
>> I'd go with removing the different
>> treatment of crlf on dos and unix.
> I agree, but it sounds like Paul doesn't.

Yes and no.  My understanding is that the code now works without 
glitches for files with a few stray CRLFs but has some glitches for 
files that consistently use CRLF, and that the change proposed in 
<http://lists.gnu.org/archive/html/emacs-devel/2015-05/msg00637.html> 
would introduce glitches in the former case while removing glitches in 
the latter.  I'd rather not trade one bug for another.  That is, if 
we're going to change this code at all, let's do it in a way that 
doesn't introduce glitches for the stray CRLF case.

One possible way to do that is suggested in the last paragraph of 
<http://lists.gnu.org/archive/html/emacs-devel/2015-05/msg00657.html>. 
This approach does remove the different treatment of CRLF on MS-Windows 
and Unix (as Francesco suggested); but it does so in a different way, by 
using the Unix convention everywhere, and it suggests an approach that 
should let Emacs do the right thing on both Unix and MS-Windows, without 
any glitches on either platform.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 18:23                         ` Paul Eggert
@ 2015-05-22 19:08                           ` Eli Zaretskii
  2015-05-22 19:25                             ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22 19:08 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, emacs-devel

> Date: Fri, 22 May 2015 11:23:09 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: emacs-devel@gnu.org
> 
> One possible way to do that is suggested in the last paragraph of 
> <http://lists.gnu.org/archive/html/emacs-devel/2015-05/msg00657.html>. 
> This approach does remove the different treatment of CRLF on MS-Windows 
> and Unix (as Francesco suggested); but it does so in a different way, by 
> using the Unix convention everywhere, and it suggests an approach that 
> should let Emacs do the right thing on both Unix and MS-Windows, without 
> any glitches on either platform.

These byte counts are not file byte counts (if they were, then what
you suggest would have been TRT).  They are buffer byte counts,
i.e. etags needs to compute the byte counts that Emacs will see when
the file is visited in an Emacs buffer.  So each CRLF EOL needs to be
counted as 1 byte, not 2.  Therefore, the DOS_NT code does TRT in this
case, for both Windows and Posix hosts, as amazing as it sounds.

It is, of course possible to let etags count file bytes, and then have
etags.el correct those to get buffer bytes instead.  But that doesn't
sound right to me: first "break" the perfectly correct code, and then
"unbreak" the result in Emacs.  To say nothing of the fact that
visiting a TAGS table will be slower that way.

However, the issue is minor, and I really don't want to waste any more
time arguing about it.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:08                           ` Eli Zaretskii
@ 2015-05-22 19:25                             ` Andreas Schwab
  2015-05-22 19:38                               ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-22 19:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, Paul Eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> These byte counts are not file byte counts (if they were, then what
> you suggest would have been TRT).  They are buffer byte counts,
> i.e. etags needs to compute the byte counts that Emacs will see when
> the file is visited in an Emacs buffer.  So each CRLF EOL needs to be
> counted as 1 byte, not 2.

What do you do about non-ASCII characters?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:25                             ` Andreas Schwab
@ 2015-05-22 19:38                               ` Eli Zaretskii
  2015-05-22 19:41                                 ` Andreas Schwab
  2015-05-22 19:42                                 ` Eli Zaretskii
  0 siblings, 2 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22 19:38 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  pot@gnu.org,  emacs-devel@gnu.org
> Date: Fri, 22 May 2015 21:25:59 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > These byte counts are not file byte counts (if they were, then what
> > you suggest would have been TRT).  They are buffer byte counts,
> > i.e. etags needs to compute the byte counts that Emacs will see when
> > the file is visited in an Emacs buffer.  So each CRLF EOL needs to be
> > counted as 1 byte, not 2.
> 
> What do you do about non-ASCII characters?

Etags counts bytes, not characters, so it doesn't matter.  Or maybe I
misunderstand the question.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:38                               ` Eli Zaretskii
@ 2015-05-22 19:41                                 ` Andreas Schwab
  2015-05-22 19:42                                 ` Eli Zaretskii
  1 sibling, 0 replies; 51+ messages in thread
From: Andreas Schwab @ 2015-05-22 19:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: Paul Eggert <eggert@cs.ucla.edu>,  pot@gnu.org,  emacs-devel@gnu.org
>> Date: Fri, 22 May 2015 21:25:59 +0200
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > These byte counts are not file byte counts (if they were, then what
>> > you suggest would have been TRT).  They are buffer byte counts,
                                                 ^^^^^^^^^^^^^^^^^^
>> > i.e. etags needs to compute the byte counts that Emacs will see when
>> > the file is visited in an Emacs buffer.  So each CRLF EOL needs to be
>> > counted as 1 byte, not 2.
>> 
>> What do you do about non-ASCII characters?
>
> Etags counts bytes, not characters, so it doesn't matter.

See above.  How does etags know how many bytes the characters will count
in an Emacs buffer?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:38                               ` Eli Zaretskii
  2015-05-22 19:41                                 ` Andreas Schwab
@ 2015-05-22 19:42                                 ` Eli Zaretskii
  2015-05-22 19:50                                   ` Andreas Schwab
  1 sibling, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22 19:42 UTC (permalink / raw)
  To: schwab; +Cc: pot, eggert, emacs-devel

> Date: Fri, 22 May 2015 22:38:15 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: pot@gnu.org, eggert@cs.ucla.edu, emacs-devel@gnu.org
> 
> > From: Andreas Schwab <schwab@linux-m68k.org>
> > Cc: Paul Eggert <eggert@cs.ucla.edu>,  pot@gnu.org,  emacs-devel@gnu.org
> > Date: Fri, 22 May 2015 21:25:59 +0200
> > 
> > Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > These byte counts are not file byte counts (if they were, then what
> > > you suggest would have been TRT).  They are buffer byte counts,
> > > i.e. etags needs to compute the byte counts that Emacs will see when
> > > the file is visited in an Emacs buffer.  So each CRLF EOL needs to be
> > > counted as 1 byte, not 2.
> > 
> > What do you do about non-ASCII characters?
> 
> Etags counts bytes, not characters, so it doesn't matter.  Or maybe I
> misunderstand the question.

Or maybe you mean the use case where a Latin-1 file is read into an
Emacs buffer, and each non-ASCII character is expanded into a UTF-8
sequence.  Indeed, that will make the byte counts inaccurate (and
etags.el will have to compensate by searching around the specified
place).  One more reason not to change anything, I guess.
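
A small illustration of that drift, as a Python sketch.  The helper name
is hypothetical, and the sketch assumes the file is Latin-1 and that the
buffer stores text as UTF-8, as described above:

```python
def utf8_offset_for_latin1_offset(data: bytes, file_offset: int) -> int:
    """Byte offset after the Latin-1 prefix is re-encoded as UTF-8."""
    prefix = data[:file_offset].decode("latin-1")
    return len(prefix.encode("utf-8"))


# A comment containing one non-ASCII character, followed by a definition.
source = "/* caf\u00e9 */\nint x;\n".encode("latin-1")
file_offset = source.index(b"int x;")
buffer_offset = utf8_offset_for_latin1_offset(source, file_offset)
```

Each Latin-1 character above 0x7F takes two bytes in UTF-8, so the
buffer offset exceeds the file offset by the number of such characters
before the tag.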



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:42                                 ` Eli Zaretskii
@ 2015-05-22 19:50                                   ` Andreas Schwab
  2015-05-22 20:05                                     ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-22 19:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Or maybe you mean the use case where a Latin-1 file is read into an
> Emacs buffer, and each non-ASCII character is expanded into a UTF-8
> sequence.  Indeed, that will make the byte counts inaccurate (and
> etags.el will have to compensate by searching around the specified
> place).  One more reason not to change anything, I guess.

??? It's exactly the counter argument.  The indices in the tag file must
be file offsets, everything else will lead to wrong offsets.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 19:50                                   ` Andreas Schwab
@ 2015-05-22 20:05                                     ` Eli Zaretskii
  2015-05-22 20:30                                       ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-22 20:05 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 22 May 2015 21:50:48 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Or maybe you mean the use case where a Latin-1 file is read into an
> > Emacs buffer, and each non-ASCII character is expanded into a UTF-8
> > sequence.  Indeed, that will make the byte counts inaccurate (and
> > etags.el will have to compensate by searching around the specified
> > place).  One more reason not to change anything, I guess.
> 
> ??? It's exactly the counter argument.  The indices in the tag file must
> be file offsets, everything else will lead to wrong offsets.

If by "file offsets" you mean counting bytes in the file, then those
will also be wrong after decoding non-ASCII characters, unless the
file was encoded in UTF-8 to begin with, right?

And if you mean counting characters in the file, then etags will be
unable to do that, unless it grows the capability to detect the
encoding of the file, or rely on the locale and assume that the file
is encoded in locale's codeset.  Right?

Or am I again missing something?



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 20:05                                     ` Eli Zaretskii
@ 2015-05-22 20:30                                       ` Andreas Schwab
  2015-05-22 21:26                                         ` Paul Eggert
  2015-05-23  6:39                                         ` Eli Zaretskii
  0 siblings, 2 replies; 51+ messages in thread
From: Andreas Schwab @ 2015-05-22 20:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> If by "file offsets" you mean counting bytes in the file,

Of course!

> then those will also be wrong after decoding non-ASCII characters,
> unless the file was encoded in UTF-8 to begin with, right?

Yes, of course.  Emacs will have to cope.

> And if you mean counting characters in the file,

This is impossible to do.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 20:30                                       ` Andreas Schwab
@ 2015-05-22 21:26                                         ` Paul Eggert
  2015-05-23  6:40                                           ` Eli Zaretskii
  2015-05-23  6:39                                         ` Eli Zaretskii
  1 sibling, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-22 21:26 UTC (permalink / raw)
  To: Andreas Schwab, Eli Zaretskii; +Cc: pot, emacs-devel

On 05/22/2015 01:30 PM, Andreas Schwab wrote:
>> then those will also be wrong after decoding non-ASCII characters,
>> unless the file was encoded in UTF-8 to begin with, right?
> Yes, of course.  Emacs will have to cope.
>

Andreas is right, as usual.  TAGS should contain hard info about file 
contents, not guesswork about what Emacs's internal encoding might be, 
as the latter depends on user input.  If the input file is UTF-8 and 
isn't munged by CRLF removal etc., file byte offsets should equal buffer 
byte offsets.  If not, it's up to Emacs to map the hard info to its 
internal representation.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 20:30                                       ` Andreas Schwab
  2015-05-22 21:26                                         ` Paul Eggert
@ 2015-05-23  6:39                                         ` Eli Zaretskii
  2015-05-23  8:02                                           ` Andreas Schwab
  1 sibling, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23  6:39 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Fri, 22 May 2015 22:30:57 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > If by "file offsets" you mean counting bytes in the file,
> 
> Of course!
> 
> > then those will also be wrong after decoding non-ASCII characters,
> > unless the file was encoded in UTF-8 to begin with, right?
> 
> Yes, of course.  Emacs will have to cope.

OK, then how do you go from "byte offsets will be wrong" and "Emacs
will have to cope" to this:

> ??? It's exactly the counter argument.  The indices in the tag file must
> be file offsets, everything else will lead to wrong offsets.

This seems to say that such byte offsets are the only "right" ones.
But we have just established that all of the byte offsets discussed
here, including the ones currently produced by etags, are wrong in
some sense.  What makes this "wrong" be "the only right one"?



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-22 21:26                                         ` Paul Eggert
@ 2015-05-23  6:40                                           ` Eli Zaretskii
  0 siblings, 0 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23  6:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, schwab, emacs-devel

> Date: Fri, 22 May 2015 14:26:27 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: pot@gnu.org, emacs-devel@gnu.org
> 
> On 05/22/2015 01:30 PM, Andreas Schwab wrote:
> >> then those will also be wrong after decoding non-ASCII characters,
> >> unless the file was encoded in UTF-8 to begin with, right?
> > Yes, of course.  Emacs will have to cope.
> >
> 
> Andreas is right, as usual.  TAGS should contain hard info about file 
> contents, not guesswork about what Emacs's internal encoding might be, 
> as the latter depends on user input.  If the input file is UTF-8 and 
> isn't munged by CRLF removal etc., file byte offsets should equal buffer 
> byte offsets.  If not, it's up to Emacs to map the hard info to its 
> internal representation.

I don't see how this is better than what we have already, but I don't
mind such a change.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  6:39                                         ` Eli Zaretskii
@ 2015-05-23  8:02                                           ` Andreas Schwab
  2015-05-23  8:27                                             ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23  8:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> This seems to say that such byte offsets are the only "right" ones.

Right.

> But we have just established that all of the byte offsets discussed
> here, including the ones currently produced by etags, are wrong in

??? etags _does_ produce byte offsets, currently.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  8:02                                           ` Andreas Schwab
@ 2015-05-23  8:27                                             ` Eli Zaretskii
  2015-05-23  9:41                                               ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23  8:27 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 10:02:52 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This seems to say that such byte offsets are the only "right" ones.
> 
> Right.
> 
> > But we have just established that all of the byte offsets discussed
> > here, including the ones currently produced by etags, are wrong in
> 
> ??? etags _does_ produce byte offsets, currently.

The issue is their accuracy, not their existence or being byte
offsets.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-21 16:49         ` Eli Zaretskii
@ 2015-05-23  8:46           ` Eli Zaretskii
  0 siblings, 0 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23  8:46 UTC (permalink / raw)
  To: pot; +Cc: eggert, emacs-devel

> Date: Thu, 21 May 2015 19:49:22 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
> 
> > >Now, these 3 files have exactly identical contents, and the _only_
> > >difference between the first 2 and the 3rd is that the latter is
> > >gzip-compressed.  So that should be the only reason why all its line
> > >counts are off by 1, and its byte counts are all off by 73, which just
> > >happens to be the length of the first line of the (uncompressed) file.
> > 
> > This is a bug.
> > 
> > >So could it be that rewinding a 'popen'-created stream doesn't work
> > >correctly on GNU/Linux as well?  If so, we will have to make changes
> > >in etags to not do that, I think, and instead reuse the already-read
> > >stuff as needed.
> > 
> > It could well be.  It may have happened that, when I checked that the
> > TAGS files were the same, I just looked at them without running diff and
> > I missed this discrepancy.
> 
> After thinking a bit about the alternative solution, I concluded that
> the simplest will be to decompress to a temporary file and read from
> there.  Does the patch below look OK?  Or can someone think about a
> more elegant way of solving this?

No comments, so I pushed these changes.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  8:27                                             ` Eli Zaretskii
@ 2015-05-23  9:41                                               ` Andreas Schwab
  2015-05-23  9:49                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23  9:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> The issue is their accuracy, not their existence or being byte
> offsets.

Byte offsets are always accurate.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  9:41                                               ` Andreas Schwab
@ 2015-05-23  9:49                                                 ` Eli Zaretskii
  2015-05-23  9:59                                                   ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23  9:49 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 11:41:45 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The issue is their accuracy, not their existence or being byte
> > offsets.
> 
> Byte offsets are always accurate.

Not if they are supposed to be byte counts in an Emacs buffer.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  9:49                                                 ` Eli Zaretskii
@ 2015-05-23  9:59                                                   ` Andreas Schwab
  2015-05-23 10:20                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23  9:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Not if they are supposed to be byte counts in an Emacs buffer.

This concept doesn't make sense.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23  9:59                                                   ` Andreas Schwab
@ 2015-05-23 10:20                                                     ` Eli Zaretskii
  2015-05-23 10:54                                                       ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 10:20 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 11:59:24 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Not if they are supposed to be byte counts in an Emacs buffer.
> 
> This concept doesn't make sense.

Nonetheless, that's what etags attempts to compute.  Its byte counts
are prepared for consumption by etags.el, not by humans.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23 10:20                                                     ` Eli Zaretskii
@ 2015-05-23 10:54                                                       ` Andreas Schwab
  2015-05-23 11:31                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23 10:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Nonetheless, that's what etags attempts to compute.

That's why it fails.  Emacs has removed this API for good reason.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23 10:54                                                       ` Andreas Schwab
@ 2015-05-23 11:31                                                         ` Eli Zaretskii
  2015-05-23 12:10                                                           ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 11:31 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 12:54:34 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Nonetheless, that's what etags attempts to compute.
> 
> That's why it fails.

It doesn't fail.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23 11:31                                                         ` Eli Zaretskii
@ 2015-05-23 12:10                                                           ` Andreas Schwab
  2015-05-23 13:46                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23 12:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> Date: Sat, 23 May 2015 12:54:34 +0200
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Nonetheless, that's what etags attempts to compute.
>> 
>> That's why it fails.
>
> It doesn't fail.

In which way does it not fail?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23 12:10                                                           ` Andreas Schwab
@ 2015-05-23 13:46                                                             ` Eli Zaretskii
  2015-05-23 17:27                                                               ` Andreas Schwab
  2015-05-23 19:01                                                               ` Paul Eggert
  0 siblings, 2 replies; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 13:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 14:10:27 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Andreas Schwab <schwab@linux-m68k.org>
> >> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> >> Date: Sat, 23 May 2015 12:54:34 +0200
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > Nonetheless, that's what etags attempts to compute.
> >> 
> >> That's why it fails.
> >
> > It doesn't fail.
> 
> In which way does it not fail?

There was never a requirement for etags to compute precise byte
positions for Emacs, mainly because source files get changed, and we
don't want users to have to re-run etags upon every change.  The
function 'etags-goto-tag-location' will look around that position in a
progressively-expanding window.  You will also see there that the
position in TAGS is interpreted as a character position, which already
introduces inaccuracies.

So etags is only required to produce approximately correct byte
positions, and it does that well enough.
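
The widening search can be sketched like this.  It is a Python
illustration of the technique, not a transcription of
'etags-goto-tag-location'; the function name, the initial window of 1000
and the growth factor of 3 are assumptions made for the example:

```python
from typing import Optional


def find_near(text: str, pattern: str, start: int,
              window: int = 1000, growth: int = 3) -> Optional[int]:
    """Look for pattern around start, widening the window until it is
    found or the window covers the whole text."""
    while True:
        lo = max(0, start - window)
        hi = min(len(text), start + window)
        idx = text.find(pattern, lo, hi)
        if idx != -1:
            return idx
        if lo == 0 and hi == len(text):
            return None  # searched everything, pattern is absent
        window *= growth  # expand the search window and retry


text = "x" * 5000 + "int z;" + "y" * 100
found = find_near(text, "int z;", 10, window=100)
missing = find_near(text, "no such tag", 10)
```

A recorded position that is off by a few bytes, or even a few thousand,
still leads to the tag as long as the pattern text is unchanged.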



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: etags test is broken on MS-Windows
  2015-05-23 13:46                                                             ` Eli Zaretskii
@ 2015-05-23 17:27                                                               ` Andreas Schwab
  2015-05-23 17:37                                                                 ` Eli Zaretskii
  2015-05-23 19:01                                                               ` Paul Eggert
  1 sibling, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23 17:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> So etags is only required to produce approximately correct byte
> positions, and it does that well enough.

If you want approximate positions then a line-column pair would be
better.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: etags test is broken on MS-Windows
  2015-05-23 17:27                                                               ` Andreas Schwab
@ 2015-05-23 17:37                                                                 ` Eli Zaretskii
  2015-05-23 18:46                                                                   ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 17:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 19:27:43 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So etags is only required to produce approximately correct byte
> > positions, and it does that well enough.
> 
> If you want approximate positions then a line-column pair would be
> better.

How do you define a column count?  Doesn't that require counting
characters, rather than bytes?




* Re: etags test is broken on MS-Windows
  2015-05-23 17:37                                                                 ` Eli Zaretskii
@ 2015-05-23 18:46                                                                   ` Andreas Schwab
  2015-05-23 19:04                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2015-05-23 18:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, eggert, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> Date: Sat, 23 May 2015 19:27:43 +0200
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > So etags is only required to produce approximately correct byte
>> > positions, and it does that well enough.
>> 
>> If you want approximate positions then a line-column pair would be
               ^^^^^^^^^^^
>> better.
>
> How do you define a column count?

It doesn't matter.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: etags test is broken on MS-Windows
  2015-05-23 13:46                                                             ` Eli Zaretskii
  2015-05-23 17:27                                                               ` Andreas Schwab
@ 2015-05-23 19:01                                                               ` Paul Eggert
  2015-05-23 19:27                                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-23 19:01 UTC (permalink / raw)
  To: Eli Zaretskii, Andreas Schwab; +Cc: pot, emacs-devel

Eli Zaretskii wrote:
> etags is only required to produce approximately correct byte
> positions, and it does that well enough.

OK, but in that case the method I proposed in 
<http://lists.gnu.org/archive/html/emacs-devel/2015-05/msg00657.html> will do a 
better job (on all platforms) than what we have now, right?  So it'd be an 
improvement, even if it's not perfect.

Although Andreas's suggestion of switching to a byte column count would also be 
an improvement over what we have now, it'd require a change to the tags file 
format whereas the method I proposed would leave the file format unchanged.




* Re: etags test is broken on MS-Windows
  2015-05-23 18:46                                                                   ` Andreas Schwab
@ 2015-05-23 19:04                                                                     ` Eli Zaretskii
  2015-05-25 12:33                                                                       ` Francesco Potortì
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 19:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: pot, eggert, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> Date: Sat, 23 May 2015 20:46:04 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Andreas Schwab <schwab@linux-m68k.org>
> >> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
> >> Date: Sat, 23 May 2015 19:27:43 +0200
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > So etags is only required to produce approximately correct byte
> >> > positions, and it does that well enough.
> >> 
> >> If you want approximate positions then a line-column pair would be
>                ^^^^^^^^^^^
> >> better.
> >
> > How do you define a column count?
> 
> It doesn't matter.

If you count bytes from the beginning of line, then yes, the accuracy
should be better that way.  However, I wonder whether the format of
TAGS, which assumes file offsets, is a de-facto standard by now.
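
To make "bytes from the beginning of line" concrete, here is a small
sketch in C.  It is only an illustration: the struct and function
names are hypothetical, and nothing like this exists in etags or in
the TAGS format today.

```c
#include <stddef.h>

struct line_col
{
  long line;	/* 1-origin line number.  */
  long col;	/* 0-origin count of bytes since the start of that line.  */
};

/* Convert a 0-origin byte OFFSET into BUF (LEN bytes) to a line
   number and a byte column.  Only LF ends a line; a CR before it is
   just another byte at the end of the line.  */
struct line_col
offset_to_line_col (const char *buf, size_t len, size_t offset)
{
  struct line_col lc = { 1, 0 };
  if (offset > len)
    offset = len;
  for (size_t i = 0; i < offset; i++)
    {
      if (buf[i] == '\n')
        {
          lc.line++;
          lc.col = 0;
        }
      else
        lc.col++;
    }
  return lc;
}
```

For a position that is not on a CR itself, the pair comes out the same
for a CRLF file and for its LF-only equivalent, whereas the file
offsets of the two differ by one byte per preceding line.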




* Re: etags test is broken on MS-Windows
  2015-05-23 19:01                                                               ` Paul Eggert
@ 2015-05-23 19:27                                                                 ` Eli Zaretskii
  2015-05-25 16:44                                                                   ` Paul Eggert
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-23 19:27 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, schwab, emacs-devel

> Date: Sat, 23 May 2015 12:01:04 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: pot@gnu.org, emacs-devel@gnu.org
> 
> Eli Zaretskii wrote:
> > etags is only required to produce approximately correct byte
> > positions, and it does that well enough.
> 
> OK, but in that case the method I proposed in 
> <http://lists.gnu.org/archive/html/emacs-devel/2015-05/msg00657.html> will do a 
> better job (on all platforms) than what we have now, right?  So it'd be an 
> improvement, even if it's not perfect.

I already said I didn't mind, although I don't think it's an
improvement.




* Re: etags test is broken on MS-Windows
  2015-05-23 19:04                                                                     ` Eli Zaretskii
@ 2015-05-25 12:33                                                                       ` Francesco Potortì
  0 siblings, 0 replies; 51+ messages in thread
From: Francesco Potortì @ 2015-05-25 12:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, Andreas Schwab, emacs-devel

Eli Zaretskii:
>> From: Andreas Schwab <schwab@linux-m68k.org>
>> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> Date: Sat, 23 May 2015 20:46:04 +0200
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> From: Andreas Schwab <schwab@linux-m68k.org>
>> >> Cc: pot@gnu.org,  eggert@cs.ucla.edu,  emacs-devel@gnu.org
>> >> Date: Sat, 23 May 2015 19:27:43 +0200
>> >> 
>> >> Eli Zaretskii <eliz@gnu.org> writes:
>> >> 
>> >> > So etags is only required to produce approximately correct byte
>> >> > positions, and it does that well enough.
>> >> 
>> >> If you want approximate positions then a line-column pair would be
>>                ^^^^^^^^^^^
>> >> better.
>> >
>> > How do you define a column count?
>> 
>> It doesn't matter.
>
>If you count bytes from the beginning of line, then yes, the accuracy
>should be better that way.  However, I wonder whether the format of
>TAGS, which assumes file offsets, is a de-facto standard by now.

In fact it is.  The format has not changed in at least 20 years, etags
has always been available for all platforms and it could be used outside
of Emacs.  For a format that old, changing it for no other reason than
elegance is probably not a good idea.




* Re: etags test is broken on MS-Windows
  2015-05-23 19:27                                                                 ` Eli Zaretskii
@ 2015-05-25 16:44                                                                   ` Paul Eggert
  2015-05-25 19:33                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 51+ messages in thread
From: Paul Eggert @ 2015-05-25 16:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, schwab, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 363 bytes --]

Eli Zaretskii wrote:
> I already said I didn't mind, although I don't think it's an
> improvement.

Isn't it an improvement in the sense that it makes TAGS files portable between 
MS-Windows and other platforms?  That is, with the attached patch one can create 
a TAGS file on GNU/Linux and use it on MS-Windows and vice versa, and it works 
the same either way.
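
As a rough illustration of the adjustment the attached patch makes on
the Lisp side, here is the same arithmetic written in C.  The function
name is hypothetical, and the sketch assumes single-byte characters so
that byte offsets and character positions coincide.

```c
/* Convert OFFSET, a 0-origin byte offset from TAGS that counts the CR
   of every CRLF, to a 1-origin position in a buffer.  LINE is the
   1-origin line number from TAGS, or 0 if TAGS gave none.  DOS_EOL is
   nonzero if the buffer was decoded with CRLF->LF translation, in
   which case one CR was dropped for each of the LINE - 1 preceding
   lines.  Assumes single-byte characters.  */
long
tags_offset_to_bufpos (long offset, long line, int dos_eol)
{
  long pos = offset + 1;	/* 0-origin to 1-origin.  */
  if (dos_eol && line > 0)
    pos -= line - 1;		/* Skip the CRs the buffer does not contain.  */
  return pos;
}
```

For example, in the file "a\r\nb\r\nfoo\r\n" the third line starts at
byte offset 6; after CRLF decoding the buffer holds "a\nb\nfoo\n", where
"foo" starts at position 5, and 6 + 1 - (3 - 1) is 5.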

[-- Attachment #2: 0001-Make-TAGS-files-more-portable-to-MS-Windows.patch --]
[-- Type: text/x-patch, Size: 2603 bytes --]

From 247e3bf4aa06b5b2dab9f70556292458751f0445 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 25 May 2015 09:40:45 -0700
Subject: [PATCH] Make TAGS files more portable to MS-Windows

* etc/NEWS: Document this.
* lib-src/etags.c (readline_internal) [DOS_NT]:
Don't treat CRs differently from GNUish hosts.
* lisp/progmodes/etags.el (etags-goto-tag-location):
Adjust STARTPOS to account for the skipped CRs in dos-style files.
---
 etc/NEWS                | 3 +++
 lib-src/etags.c         | 9 ---------
 lisp/progmodes/etags.el | 8 ++++++--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index b922a27..9f861b2 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -992,6 +992,9 @@ of Windows starting with Windows 9X.
 +++
 ** Emacs running on MS-Windows now supports the daemon mode.
 
+** The byte counts in etags-generated TAGS files are now the same on
+MS-Windows as they are on other platforms.
+
 ** OS X 10.5 or older is no longer supported.
 
 ** OS X on PowerPC is no longer supported.
diff --git a/lib-src/etags.c b/lib-src/etags.c
index f124d29..8b7f53c 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -6075,16 +6075,7 @@ readline_internal (linebuffer *lbp, FILE *stream, char const *filename)
 	  if (p > buffer && p[-1] == '\r')
 	    {
 	      p -= 1;
-#ifdef DOS_NT
-	     /* Assume CRLF->LF translation will be performed by Emacs
-		when loading this file, so CRs won't appear in the buffer.
-		It would be cleaner to compensate within Emacs;
-		however, Emacs does not know how many CRs were deleted
-		before any given point in the file.  */
-	      chars_deleted = 1;
-#else
 	      chars_deleted = 2;
-#endif
 	    }
 	  else
 	    {
diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 60ea456..d99db8b 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1355,9 +1355,13 @@ hits the start of file."
 	    pat (concat (if (eq selective-display t)
 			    "\\(^\\|\^m\\)" "^")
 			(regexp-quote (car tag-info))))
-      ;; The character position in the tags table is 0-origin.
+      ;; The character position in the tags table is 0-origin and counts CRs.
       ;; Convert it to a 1-origin Emacs character position.
-      (if startpos (setq startpos (1+ startpos)))
+      (when startpos
+        (setq startpos (1+ startpos))
+        (when (and line
+                   (eq 1 (coding-system-eol-type buffer-file-coding-system)))
+          (setq startpos (- startpos (1- line)))))
       ;; If no char pos was given, try the given line number.
       (or startpos
 	  (if line
-- 
2.1.0



* Re: etags test is broken on MS-Windows
  2015-05-25 16:44                                                                   ` Paul Eggert
@ 2015-05-25 19:33                                                                     ` Eli Zaretskii
  2015-05-25 20:29                                                                       ` Paul Eggert
  0 siblings, 1 reply; 51+ messages in thread
From: Eli Zaretskii @ 2015-05-25 19:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: pot, schwab, emacs-devel

> Date: Mon, 25 May 2015 09:44:00 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: schwab@linux-m68k.org, pot@gnu.org, emacs-devel@gnu.org
> 
> Eli Zaretskii wrote:
> > I already said I didn't mind, although I don't think it's an
> > improvement.
> 
> Isn't it an improvement in the sense that it makes TAGS files portable between 
> MS-Windows and other platforms?

Yes, it is, which is why I said I didn't mind.

> That is, with the attached patch one can create a TAGS file on
> GNU/Linux and use it on MS-Windows and vice versa, and it works the
> same either way.

We'll get the same result -- platform-independence of TAGS files -- if
we use the Windows variant of the code, with the added advantage that
the Lisp part will not be needed, and reading the data from TAGS will
be slightly faster.
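
To illustrate the two counting conventions, here is a sketch in C.  It
is not the code in etags.c; the function name and the COUNT_CR flag
are made up for the example.

```c
#include <stddef.h>

/* Return the offset that would be recorded for the start of 1-origin
   LINE in BUF (LEN bytes), or -1 if BUF has fewer lines than that.
   If COUNT_CR is nonzero every byte counts; if it is zero, a CR that
   immediately precedes an LF is not counted, so a CRLF file yields
   the same offsets as its LF-only equivalent.  */
long
line_start_offset (const char *buf, size_t len, long line, int count_cr)
{
  long offset = 0;
  long cur = 1;
  for (size_t i = 0; i < len && cur < line; i++)
    {
      int cr_before_lf
        = buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n';
      if (count_cr || !cr_before_lf)
        offset++;
      if (buf[i] == '\n')
        cur++;
    }
  return cur == line ? offset : -1;
}
```

For "a\r\nb\r\nfoo\r\n", line 3 gets offset 6 when CRs are counted and
4 when they are not; for "a\nb\nfoo\n" it is 4 either way.  Either
convention is platform-independent as long as etags applies the same
one on every platform.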

But again, I don't mind it either way.




* Re: etags test is broken on MS-Windows
  2015-05-25 19:33                                                                     ` Eli Zaretskii
@ 2015-05-25 20:29                                                                       ` Paul Eggert
  0 siblings, 0 replies; 51+ messages in thread
From: Paul Eggert @ 2015-05-25 20:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pot, schwab, emacs-devel

Eli Zaretskii wrote:
> I said I didn't mind.

OK, thanks, I installed the patch.




Thread overview: 51+ messages
     [not found] <83y4kmdjmj.fsf@gnu.org>
     [not found] ` <555A8E62.7060700@cs.ucla.edu>
2015-05-19 15:27   ` etags test is broken on MS-Windows Eli Zaretskii
2015-05-19 17:57     ` Paul Eggert
2015-05-19 18:26       ` Eli Zaretskii
2015-05-20 15:38     ` Eli Zaretskii
2015-05-21  5:05       ` Paul Eggert
2015-05-21 13:24       ` Francesco Potortì
2015-05-21 16:49         ` Eli Zaretskii
2015-05-23  8:46           ` Eli Zaretskii
2015-05-21 13:16     ` Francesco Potortì
2015-05-21 16:31       ` Eli Zaretskii
2015-05-21 16:37         ` Paul Eggert
2015-05-21 16:55           ` Eli Zaretskii
2015-05-21 19:03             ` Paul Eggert
2015-05-21 19:54               ` Eli Zaretskii
2015-05-21 23:28                 ` Paul Eggert
2015-05-22  8:32                   ` Eli Zaretskii
2015-05-22 13:08                     ` Francesco Potortì
2015-05-22 13:19                       ` Eli Zaretskii
2015-05-22 18:23                         ` Paul Eggert
2015-05-22 19:08                           ` Eli Zaretskii
2015-05-22 19:25                             ` Andreas Schwab
2015-05-22 19:38                               ` Eli Zaretskii
2015-05-22 19:41                                 ` Andreas Schwab
2015-05-22 19:42                                 ` Eli Zaretskii
2015-05-22 19:50                                   ` Andreas Schwab
2015-05-22 20:05                                     ` Eli Zaretskii
2015-05-22 20:30                                       ` Andreas Schwab
2015-05-22 21:26                                         ` Paul Eggert
2015-05-23  6:40                                           ` Eli Zaretskii
2015-05-23  6:39                                         ` Eli Zaretskii
2015-05-23  8:02                                           ` Andreas Schwab
2015-05-23  8:27                                             ` Eli Zaretskii
2015-05-23  9:41                                               ` Andreas Schwab
2015-05-23  9:49                                                 ` Eli Zaretskii
2015-05-23  9:59                                                   ` Andreas Schwab
2015-05-23 10:20                                                     ` Eli Zaretskii
2015-05-23 10:54                                                       ` Andreas Schwab
2015-05-23 11:31                                                         ` Eli Zaretskii
2015-05-23 12:10                                                           ` Andreas Schwab
2015-05-23 13:46                                                             ` Eli Zaretskii
2015-05-23 17:27                                                               ` Andreas Schwab
2015-05-23 17:37                                                                 ` Eli Zaretskii
2015-05-23 18:46                                                                   ` Andreas Schwab
2015-05-23 19:04                                                                     ` Eli Zaretskii
2015-05-25 12:33                                                                       ` Francesco Potortì
2015-05-23 19:01                                                               ` Paul Eggert
2015-05-23 19:27                                                                 ` Eli Zaretskii
2015-05-25 16:44                                                                   ` Paul Eggert
2015-05-25 19:33                                                                     ` Eli Zaretskii
2015-05-25 20:29                                                                       ` Paul Eggert
2015-05-22 12:40               ` Francesco Potortì
