regex for empty string after a space?

Matafou
Posts: 2
Joined: Thu Feb 02, 2012 3:16 am

regex for empty string after a space?

Post by Matafou » Thu Feb 02, 2012 3:24 am

Hi, a small question for the Emacs Lisp regex experts. I use the special constructs \b, \B, \< and \>, which match the empty string at particular positions. But I am stuck on the following:

Is it possible to define a regex that matches an empty string after a space?

Or, as a workaround in my case, is it possible to define a regex that matches the empty string only at the beginning of anything but a space (a word, punctuation, etc.)?

Best regards,

P.

nuntius
Posts: 538
Joined: Sat Aug 09, 2008 10:44 am
Location: Newton, MA

Re: regex for empty string after a space?

Post by nuntius » Fri Feb 03, 2012 8:26 pm

I don't understand the question.
Could you post some examples of expected matches?

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: regex for empty string after a space?

Post by ramarren » Sat Feb 04, 2012 3:18 am

As far as I can tell, the regular expression engine Emacs uses doesn't include Perl-style look-around assertions, which are necessary for this sort of thing.

Depending on exactly what you are doing it might be possible to implement equivalent functionality, or even use shell-command-on-region to call Perl.
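One way to get the effect of a look-behind without engine support is to test the characters around point in Lisp. The following is only a minimal sketch, assuming the wanted condition is "point directly follows a space and is not followed by another space"; the function name is hypothetical.

```elisp
(defun my-after-space-p ()
  "Return non-nil if point directly follows a space and no space follows point.
Hypothetical helper; it plays the role of a look-behind assertion."
  (and (eq (char-before) ?\s)         ; `char-before' is nil at beginning of buffer
       (not (eq (char-after) ?\s))))  ; `char-after' is nil at end of buffer
```

It could be called after moving point, for example inside a loop that steps through candidate positions.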

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: regex for empty string after a space?

Post by edgar-rft » Sun Feb 05, 2012 6:20 am

Every regular expression matching a space also matches the empty string after the space, but there is no way to find out where the empty string comes from. From the view point of a regexp engine [no matter what programming language] there are empty strings before and after every character in the string, so if you have a string "ABC", then the regexp engine sees it as:

<start-of-string><empty-string>A<empty-string>B<empty-string>C<empty-string><end-of-string>

That's the reason why the ELisp manual writes:

\b - matches the empty string, but only at the beginning or end of a word.

\B - matches the empty string, but not at the beginning or end of a word.

The reason for this somewhat strange-sounding definition is that, from the viewpoint of the regexp engine, there are empty strings before and after every character in the string. The only way to find out whether an empty string occurs at the beginning or end of a word is to look at the characters before and after it.
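To see this in practice, one can probe every position of a string and record where a regexp matches the empty string. This is a small sketch for experimenting, not part of Emacs; the helper name is hypothetical, and the results for \b and \B assume the standard syntax table, where letters are word constituents and the space is not.

```elisp
(defun my-empty-match-positions (regexp string)
  "Return the positions in STRING where REGEXP matches the empty string.
Hypothetical helper for experimenting; not part of Emacs."
  (let ((positions '())
        (pos 0))
    ;; Try every position, including the one after the last character.
    (while (<= pos (length string))
      (when (and (eq (string-match regexp string pos) pos)
                 (= (match-end 0) pos))   ; the match must be empty
        (push pos positions))
      (setq pos (1+ pos)))
    (nreverse positions)))

;; With the standard syntax table:
(my-empty-match-positions "" "ab cd")     ; => (0 1 2 3 4 5)
(my-empty-match-positions "\\b" "ab cd")  ; => (0 2 3 5)
(my-empty-match-positions "\\B" "ab cd")  ; => (1 4)
```

The empty regexp matches everywhere, while \b and \B each pick a subset of those positions by looking at the neighbouring characters.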

The question is: which empty string do you want to match?

The particular problem in practice is that after concatenating two strings, the regexp engine has no chance to find out where the concatenation happened and if an empty string was concatenated or not. In such situations the only way is to use lists or vectors of strings instead of concatenation.

To match the empty string between a space and a non-space character or the empty string after a last space character in a string you could use:

(string-match " \\([^ ]\\|\\'\\)" "ABC")   ; => nil, no space character
(string-match " \\([^ ]\\|\\'\\)" " ABC")  ; => 0
(string-match " \\([^ ]\\|\\'\\)" "A BC")  ; => 1
(string-match " \\([^ ]\\|\\'\\)" "AB C")  ; => 2
(string-match " \\([^ ]\\|\\'\\)" "ABC ")  ; => 3
(string-match " \\([^ ]\\|\\'\\)" "A B C") ; => 1
(string-match " \\([^ ]\\|\\'\\)" "A  B ") ; => 2
(string-match " \\([^ ]\\|\\'\\)" "A   B") ; => 3

This gives the position of the space character immediately before that empty string (the last space in a run of spaces).

Is that what you wanted?
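If the position of the empty string itself is wanted rather than that of the space, the start of the parenthesized group can be read from the match data. A minimal sketch, assuming the same regexp as above; the function name is hypothetical.

```elisp
(defun my-position-after-space (string)
  "Return the index in STRING just after the first space not followed by a space.
Return nil if there is no such space.  Hypothetical helper."
  (when (string-match " \\([^ ]\\|\\'\\)" string)
    ;; Group 1 starts right after the matched space, at the wanted empty string.
    (match-beginning 1)))

(my-position-after-space "ABC")    ; => nil
(my-position-after-space " ABC")   ; => 1
(my-position-after-space "ABC ")   ; => 4
(my-position-after-space "A  B ")  ; => 3
```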

- edgar

Matafou
Posts: 2
Joined: Thu Feb 02, 2012 3:16 am

Re: regex for empty string after a space?

Post by Matafou » Thu May 17, 2012 7:08 am

Thanks all for your answers. I know that what I ask is not possible with standard regular expressions. But \b and the others exist to give more control over what is *matched*, and go a bit beyond the power of regular expressions.

When writing Emacs modes it is particularly important to have the exact matched string. The point of these empty-string constructs in Emacs regexps is to be able to accept a string depending on what is around it *without including what's around it in the matched text*. In particular, it allows for good behavior with functions like re-search-forward (which then stops at the right place).

Shy groups (written \(?:...\) in Emacs) are sometimes OK but not always.
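As a small illustration of why shy groups do not help here: in Emacs syntax a shy group is written \(?:...\), and it only avoids creating a numbered group. The text it matches still counts as part of the overall match. The regexp below is made up for the example.

```elisp
;; The space inside the shy group is still part of match 0.
(when (string-match "\\(?: \\)A" " ABC")
  (list (match-beginning 0) (match-end 0)))  ; => (0 2), the space is included
```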

For example, in the following code kindly proposed by edgar-rft:

> (string-match " \\([^ ]\\|\\'\\)" " ABC") => 0

the problem is that this does not match the empty string: the match includes the character following the space (i.e. "A"). Therefore, if I do (re-search-forward thisregex), I end up with point *after* the A, not before it.
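One possible workaround, sketched here under the assumption that only the final position of point matters, is to wrap the search and move point back to the start of the parenthesized group. The wrapper name is hypothetical.

```elisp
(defun my-re-search-forward-after-space (&optional bound)
  "Move point to just after the next space that is not followed by a space.
Return the new position, or nil (leaving point alone) if nothing is found.
Hypothetical wrapper around `re-search-forward'."
  (when (re-search-forward " \\([^ ]\\|\\'\\)" bound t)
    ;; Step back over the character consumed by group 1, if any.
    (goto-char (match-beginning 1))))

(with-temp-buffer
  (insert " ABC")
  (goto-char (point-min))
  (my-re-search-forward-after-space)
  (point))  ; => 2, just before the "A"
```

Note that the match data still covers the space and the following character, so code that relies on (match-beginning 0) and (match-end 0) would still see them.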

My explanation may be unclear, sorry. If you have ideas, I am interested.

Thanks again for your time anyway.
