unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
@ 2017-11-17 20:11 Philipp Stephani
  2017-12-16 14:29 ` Philipp Stephani
  2018-03-17  0:37 ` bug#29343: 27.0.50; Match data doesn't contain elements for " Noam Postavsky
  0 siblings, 2 replies; 8+ messages in thread
From: Philipp Stephani @ 2017-11-17 20:11 UTC
  To: 29343


$ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
(0 1 nil nil 0 1)

Note that neither the `a` group nor the `c` group matched, yet
`match-data` contains entries for `a` but none for `c`.  This makes
working with the match data unnecessarily hard, because its length
depends on whether certain optional groups have matched.  I haven't
found any discussion of this behavior in either the manual or the
docstring.  I think the match data in this case should be
(0 1 nil nil 0 1 nil nil).
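
The dependence on trailing groups can be seen by matching several
subjects against the same regexp.  The following is only a sketch;
`my-match-data-length` is a hypothetical helper name, not part of Emacs.

```elisp
(defun my-match-data-length (regexp subject)
  "Return the length of the match data after matching REGEXP against SUBJECT.
Return nil if there is no match.  Hypothetical helper for illustration."
  (save-match-data
    (when (string-match regexp subject)
      (length (match-data)))))

;; Collect (SUBJECT . LENGTH) pairs for subjects that differ in which
;; optional groups take part in the match.
(let ((regexp "^\\(a\\)?\\(b\\)\\(c\\)?$"))
  (mapcar (lambda (subject)
            (cons subject (my-match-data-length regexp subject)))
          '("b" "ab" "bc" "abc")))
```

Only the subjects in which the last group `c` takes part in the match
should yield entries for all three groups.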


In GNU Emacs 27.0.50 (build 12, x86_64-pc-linux-gnu, GTK+ Version 3.22.17)
 of 2017-11-16 built on localhost
Repository revision: bc462efec89c3317a6ee3ef9404356c1c7e52bda
Windowing system distributor 'The X.Org Foundation', version 11.0.11903000
System Description:	Debian GNU/Linux

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --enable-gcc-warnings=warn-only
 --enable-gtk-deprecation-warnings --without-pop --with-mailutils
 --enable-checking --enable-check-lisp-object-type --with-modules
 'CFLAGS=-O0 -ggdb3''

Configured features:
XPM JPEG TIFF GIF PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS FREETYPE XFT
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95129 7264)
 (symbols 48 20393 1)
 (miscs 40 41 120)
 (strings 32 28284 1631)
 (string-bytes 1 747257)
 (vectors 16 14056)
 (vector-slots 8 497402 8748)
 (floats 8 49 68)
 (intervals 56 224 0)
 (buffers 992 12))


* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2017-11-17 20:11 bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups Philipp Stephani
@ 2017-12-16 14:29 ` Philipp Stephani
  2022-01-29 15:40   ` bug#29343: Match data doesn't contain elements for trailing " Lars Ingebrigtsen
  2018-03-17  0:37 ` bug#29343: 27.0.50; Match data doesn't contain elements for " Noam Postavsky
  1 sibling, 1 reply; 8+ messages in thread
From: Philipp Stephani @ 2017-12-16 14:29 UTC
  To: 29343

On Fri, 17 Nov 2017 at 21:12, Philipp Stephani <p.stephani2@gmail.com>
wrote:

>
> $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$"
> "b") (print (match-data)))'
> (0 1 nil nil 0 1)
>
> Note that neither the `a` nor the `c` group matched, but there are
> entries for `a` in `match-data`, but not for `c`.  This makes working
> with the match data unnecessarily hard because its length depends on
> whether certain optional groups have matched or not.  I haven't seen any
> discussion about this behavior in either the manual or the docstring.  I
> think the match data in this case should be (0 1 nil nil 0 1 nil nil).
>
>
It turns out that this is harder than I expected, because the number of
groups in the pattern isn't stored anywhere, and search_regs.num_regs
may differ from the group count. If this turns out to be too hard to
fix, the behavior should at least be documented.


* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2017-11-17 20:11 bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups Philipp Stephani
  2017-12-16 14:29 ` Philipp Stephani
@ 2018-03-17  0:37 ` Noam Postavsky
  2019-04-19 18:22   ` Philipp Stephani
  1 sibling, 1 reply; 8+ messages in thread
From: Noam Postavsky @ 2018-03-17  0:37 UTC
  To: Philipp Stephani; +Cc: 29343

Philipp Stephani <p.stephani2@gmail.com> writes:

> $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
> (0 1 nil nil 0 1)
>
> Note that neither the `a` nor the `c` group matched, but there are
> entries for `a` in `match-data`, but not for `c`.  This makes working
> with the match data unnecessarily hard because its length depends on
> whether certain optional groups have matched or not.  I haven't seen any
> discussion about this behavior in either the manual or the docstring.  I
> think the match data in this case should be (0 1 nil nil 0 1 nil nil).

You can get that result by passing a list of the expected length as the
REUSE argument to match-data:

(progn
  (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b")
  (match-data t (make-list 8 nil)))
  ;=> (0 1 nil nil 0 1 nil nil)
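
When the group numbers of interest are known, the per-group accessors
may be an alternative to inspecting the list: `match-beginning`,
`match-end` and `match-string` are documented to return nil for a group
that did not match or for which there is no pair in the match data.  A
minimal sketch (`my-groups-1-to-3` is a hypothetical name, not an
existing function):

```elisp
(defun my-groups-1-to-3 (subject)
  "Match SUBJECT against the regexp from the report.
Return the text of groups 1 to 3, with nil for groups that did not match.
Return nil if SUBJECT does not match at all."
  (save-match-data
    (when (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" subject)
      ;; `match-string' yields nil for unmatched groups, whether they
      ;; are leading (group 1) or trailing (group 3).
      (mapcar (lambda (group) (match-string group subject))
              '(1 2 3)))))

(my-groups-1-to-3 "b")  ; => (nil "b" nil)
```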






* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2018-03-17  0:37 ` bug#29343: 27.0.50; Match data doesn't contain elements for " Noam Postavsky
@ 2019-04-19 18:22   ` Philipp Stephani
  2019-04-19 18:29     ` Noam Postavsky
  0 siblings, 1 reply; 8+ messages in thread
From: Philipp Stephani @ 2019-04-19 18:22 UTC
  To: Noam Postavsky; +Cc: 29343

On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs@gmail.com> wrote:
>
> Philipp Stephani <p.stephani2@gmail.com> writes:
>
> > $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
> > (0 1 nil nil 0 1)
> >
> > Note that neither the `a` nor the `c` group matched, but there are
> > entries for `a` in `match-data`, but not for `c`.  This makes working
> > with the match data unnecessarily hard because its length depends on
> > whether certain optional groups have matched or not.  I haven't seen any
> > discussion about this behavior in either the manual or the docstring.  I
> > think the match data in this case should be (0 1 nil nil 0 1 nil nil).
>
> You can get that result by passing a list of the expected length as the
> REUSE argument to match-data:

True, but that also requires knowing the expected length. In the most
general case this should work for unknown regular expressions.
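
For a regexp that is only known at run time, one possible approach is to
derive the expected length from the regexp itself.  The sketch below
assumes that `regexp-opt-depth` (from regexp-opt.el) reports the number
of capturing groups for the regexp at hand; it works by scanning the
regexp text, so the count should be checked for regexps that use
explicitly numbered groups.  `my-padded-match-data` is a hypothetical
name.

```elisp
(require 'regexp-opt)

(defun my-padded-match-data (regexp)
  "Return the current match data, padded with nil to cover every group in REGEXP.
Call this directly after a successful match against REGEXP.
Assumes `regexp-opt-depth' counts the groups in REGEXP correctly."
  ;; Two entries (start and end) per group, plus two for the whole match.
  (let ((len (* 2 (1+ (regexp-opt-depth regexp)))))
    ;; Pass a list of that length as REUSE, as suggested earlier in
    ;; this thread; entries past the recorded data stay nil.
    (match-data t (make-list len nil))))

(let ((regexp "^\\(a\\)?\\(b\\)\\(c\\)?$"))
  (when (string-match regexp "b")
    (my-padded-match-data regexp)))
```

This keeps the REUSE idea but moves the knowledge of the length into the
helper, which is the part a wrapper without a REUSE argument cannot
supply.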






* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2019-04-19 18:22   ` Philipp Stephani
@ 2019-04-19 18:29     ` Noam Postavsky
  2019-04-19 18:42       ` Philipp Stephani
  0 siblings, 1 reply; 8+ messages in thread
From: Noam Postavsky @ 2019-04-19 18:29 UTC
  To: Philipp Stephani; +Cc: 29343

Philipp Stephani <p.stephani2@gmail.com> writes:

> On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs@gmail.com> wrote:
>>
>> Philipp Stephani <p.stephani2@gmail.com> writes:
>>
>> > $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
>> > (0 1 nil nil 0 1)
>> >
>> > Note that neither the `a` nor the `c` group matched, but there are
>> > entries for `a` in `match-data`, but not for `c`.  This makes working
>> > with the match data unnecessarily hard because its length depends on
>> > whether certain optional groups have matched or not.  I haven't seen any
>> > discussion about this behavior in either the manual or the docstring.  I
>> > think the match data in this case should be (0 1 nil nil 0 1 nil nil).
>>
>> You can get that result by passing a list of the expected length as the
>> REUSE argument to match-data:
>
> True, but that also requires knowing the expected length. In the most
> general case this should work for unknown regular expressions.

I don't understand how the general case you describe could occur.  If
you don't know the expected length, that means you don't know what
groups are in the regexp, so you can only rely on group 0 existing,
i.e., you only care about the first two elements in the match-data.







* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2019-04-19 18:29     ` Noam Postavsky
@ 2019-04-19 18:42       ` Philipp Stephani
  2019-04-19 18:54         ` Noam Postavsky
  0 siblings, 1 reply; 8+ messages in thread
From: Philipp Stephani @ 2019-04-19 18:42 UTC
  To: Noam Postavsky; +Cc: 29343

On Fri, 19 Apr 2019 at 20:29, Noam Postavsky <npostavs@gmail.com> wrote:
>
> Philipp Stephani <p.stephani2@gmail.com> writes:
>
> > On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs@gmail.com> wrote:
> >>
> >> Philipp Stephani <p.stephani2@gmail.com> writes:
> >>
> >> > $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
> >> > (0 1 nil nil 0 1)
> >> >
> >> > Note that neither the `a` nor the `c` group matched, but there are
> >> > entries for `a` in `match-data`, but not for `c`.  This makes working
> >> > with the match data unnecessarily hard because its length depends on
> >> > whether certain optional groups have matched or not.  I haven't seen any
> >> > discussion about this behavior in either the manual or the docstring.  I
> >> > think the match data in this case should be (0 1 nil nil 0 1 nil nil).
> >>
> >> You can get that result by passing a list of the expected length as the
> >> REUSE argument to match-data:
> >
> > True, but that also requires knowing the expected length. In the most
> > general case this should work for unknown regular expressions.
>
> I don't understand how the general case you describe could occur.  If
> you don't know the expected length, that means you don't know what
> groups are in the regexp, so you can only rely on group 0 existing,
> i.e., you only care about the first two elements in the match-data.
>

The context here is https://github.com/magnars/s.el/pull/117. Normally
you'd expect something like Python's Match.group
(https://docs.python.org/3/library/re.html#re.Match.group), i.e. a
group match per defined group, even if the group didn't match. That
Emacs doesn't behave this way is surprising and should at least be
documented.






* bug#29343: 27.0.50; Match data doesn't contain elements for non-matched subgroups
  2019-04-19 18:42       ` Philipp Stephani
@ 2019-04-19 18:54         ` Noam Postavsky
  0 siblings, 0 replies; 8+ messages in thread
From: Noam Postavsky @ 2019-04-19 18:54 UTC
  To: Philipp Stephani; +Cc: 29343

Philipp Stephani <p.stephani2@gmail.com> writes:

>> >> You can get that result by passing a list of the expected length as the
>> >> REUSE argument to match-data:
>> >
>> > True, but that also requires knowing the expected length. In the most
>> > general case this should work for unknown regular expressions.

> The context here is https://github.com/magnars/s.el/pull/117.

Ah, I see, the problem is that s-match is trying to present a "nicer"
interface, so it doesn't have a REUSE argument.

> That Emacs doesn't behave this way is surprising and should at least
> be documented.

Yeah, no argument there.







* bug#29343: Match data doesn't contain elements for trailing non-matched subgroups
  2017-12-16 14:29 ` Philipp Stephani
@ 2022-01-29 15:40   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 8+ messages in thread
From: Lars Ingebrigtsen @ 2022-01-29 15:40 UTC
  To: Philipp Stephani; +Cc: 29343

Philipp Stephani <p.stephani2@gmail.com> writes:

> It turns out that this is harder than I expected, because the
> information about the number of groups in the pattern isn't stored
> anywhere, and search_regs.num_regs may be different from the group
> count. If it turns out too hard to fix, the behavior should at least
> be documented.

I've now mentioned this in the doc string in Emacs 29.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




