Text Matching with Ruby

Today I will show you two good ways of matching text.

  • Using regex capture groups
  • Using the pre_match & post_match methods

Let’s start with a blob of text. Lorem ipsum will do nicely.

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.    \
Nunc vel tellus dictum, condimentum orci rhoncus, ultricies lorem.  \
Morbi non nisi pharetra, cursus nisl ut, efficitur metus. Phasellus \
facilisis pellentesque elit, at dapibus mauris pulvinar non. Aenean \
rutrum metus neque, eu rutrum erat ornare a. Maecenas mi ligula,    \
dignissim sit amet dolor in, vulputate fringilla tellus. Vivamus ac \
odio metus. Proin feugiat elit in risus bibendum, in tincidunt arcu \
semper. Nullam gravida nibh blandit, porta nisi sit amet, semper    \
enim. Vestibulum nec laoreet nisi, eu hendrerit sapien."

Now let’s say we want to match the text between:

Lorem ipsum

and

Nunc vel tellus

We’ll use the regex approach and do this:

text.match(/Lorem ipsum(.*)Nunc vel tellus/)

That’s easy enough. If we inspect the result, we’ll see that the returned MatchData object holds two entries: index 0 is the entire match, and index 1 is the text captured by the parentheses. We can read them like this:

irb > text.match(/Lorem ipsum(.*)Nunc vel tellus/)[0]
 => "Lorem ipsum dolor sit amet, consectetur adipiscing elit.    Nunc vel tellus" 
irb > text.match(/Lorem ipsum(.*)Nunc vel tellus/)[1]
 => " dolor sit amet, consectetur adipiscing elit.    " 
irb > 
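In a script, it is usually tidier to run the match once and keep the result in a variable. The following is a minimal sketch on a shortened copy of the sample text; the names m, whole and between are illustrative and not part of the original example. Because match returns nil when nothing matches, the sketch checks for that before indexing.

```ruby
# Shortened sample text (an assumption made to keep the example small).
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.    " \
       "Nunc vel tellus dictum, condimentum orci rhoncus."

# Run the match once and keep the MatchData object.
m = text.match(/Lorem ipsum(.*)Nunc vel tellus/)

if m
  whole   = m[0]      # the entire matched span, markers included
  between = m[1]      # only the text captured by the parentheses
  puts between.strip  # prints: dolor sit amet, consectetur adipiscing elit.
  p m.captures        # array holding the capture groups (just one here)
else
  puts "no match"
end
```

Calling strip is optional; it only trims the padding spaces that the sample text carries around the captured portion.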

That’s nice. Now let’s look at pre_match and post_match. On the same match, pre_match returns the part of the string before the match and post_match returns the part after it.

irb > text.match(/Lorem ipsum(.*)Nunc vel tellus/).pre_match
 => "" 
irb > text.match(/Lorem ipsum(.*)Nunc vel tellus/).post_match
 => " dictum, condimentum orci rhoncus, ultricies lorem.  Morbi non nisi pharetra, cursus nisl ut, efficitur metus. Phasellus facilisis pellentesque elit, at dapibus mauris pulvinar non. Aenean rutrum metus neque, eu rutrum erat ornare a. Maecenas mi ligula,    dignissim sit amet dolor in, vulputate fringilla tellus. Vivamus ac odio metus. Proin feugiat elit in risus bibendum, in tincidunt arcu semper. Nullam gravida nibh blandit, porta nisi sit amet, semper    enim. Vestibulum nec laoreet nisi, eu hendrerit sapien." 
irb > 

The pre_match is empty, as expected, since our match starts at the very beginning of the string. The post_match, on the other hand, contains everything after the match, that is, the rest of the string that our regex did not consume.
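To see a non-empty pre_match, the pattern has to match somewhere in the middle of the string. Here is a small sketch, again on a shortened sample string chosen for illustration, that matches only the second marker and then splits the text around it. The variable names are hypothetical.

```ruby
# Shortened sample text (an assumption made to keep the example small).
text = "Lorem ipsum dolor sit amet. Nunc vel tellus dictum, condimentum orci."

# Match a phrase in the middle of the string.
m = text.match(/Nunc vel tellus/)

before = m.pre_match   # everything before the match
after  = m.post_match  # everything after the match

p before  # "Lorem ipsum dolor sit amet. "
p after   # " dictum, condimentum orci."

# The three pieces reassemble into the original string.
rebuilt = before + m[0] + after
puts rebuilt == text   # true
```

This makes pre_match and post_match handy for splitting a string around a delimiter phrase without writing capture groups.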
