Lecture 13 - Regular Expressions

String validation is an important problem in computer science. We often want to process a piece of text and extract information from it, or verify that it conforms to an expected input format. Regular expressions are a well-established tool designed to solve exactly that problem.

In this lecture, we explored the theory of regular expressions, and saw how they can be represented by a simple SML datatype. Each regular expression denotes a particular set of strings, known as its language, which is defined by a simple mathematical description that is recursive in the structure of the regular expression.
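As a minimal sketch of what such a datatype might look like, the declaration below uses one constructor per way of building a regular expression, with the language each one denotes described in a comment. The constructor names (Zero, One, Char, Plus, Times, Star) and the value example are assumptions chosen for illustration; the lecture's own definition may name or arrange them differently.

```sml
(* A possible datatype of regular expressions.
   Constructor names are assumed for illustration.
   L(r) below stands for the language (set of strings) denoted by r. *)
datatype regexp =
    Zero                      (* L(Zero) is empty: no string matches *)
  | One                       (* L(One) contains only the empty string *)
  | Char of char              (* L(Char c) contains only the string "c" *)
  | Plus of regexp * regexp   (* union: strings in L(r1) or in L(r2) *)
  | Times of regexp * regexp  (* concatenation: a string from L(r1)
                                 followed by a string from L(r2) *)
  | Star of regexp            (* zero or more strings from L(r),
                                 concatenated together *)

(* Hypothetical example: an 'a' followed by any number of
   'b's and 'c's, in any order. *)
val example : regexp =
  Times (Char #"a", Star (Plus (Char #"b", Char #"c")))
```

Because the description of each language refers only to the languages of the immediate subexpressions, a function over regexp can follow the same recursive shape.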

We then used this regexp datatype to define a function that attempts to match a string against a regular expression, by recursion on the structure of the regular expression. This admits a reasonably terse and simple implementation: a match function whose specification is stated in terms of splitting the string into a prefix and a suffix that each satisfy certain conditions.
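One common way to realize a prefix-and-suffix specification is to pass the matcher a continuation that decides whether the remaining suffix is acceptable. The sketch below takes that approach as an assumption; the summary above does not fix the exact signature. Under this assumed specification, match r cs k returns true exactly when cs can be split into a prefix p and a suffix s such that p is in the language of r and k s returns true. The datatype is repeated so the example stands alone, and the names match, accept, and k are hypothetical.

```sml
(* Repeated here so the sketch is self-contained;
   constructor names are assumed for illustration. *)
datatype regexp =
    Zero
  | One
  | Char of char
  | Plus of regexp * regexp
  | Times of regexp * regexp
  | Star of regexp

(* match : regexp -> char list -> (char list -> bool) -> bool
   Assumed spec: match r cs k is true iff cs = p @ s for some p, s
   with p in the language of r and k s = true. *)
fun match Zero _ _ = false
  | match One cs k = k cs                       (* empty prefix *)
  | match (Char _) [] _ = false
  | match (Char c) (c' :: cs) k = c = c' andalso k cs
  | match (Plus (r1, r2)) cs k =
      match r1 cs k orelse match r2 cs k
  | match (Times (r1, r2)) cs k =
      (* r1 consumes a prefix, then r2 must match part of the rest *)
      match r1 cs (fn cs' => match r2 cs' k)
  | match (Star r) cs k =
      (* Either stop here, or match r once more. The check cs' <> cs
         requires r to consume at least one character, so the
         recursion on Star r always sees a strictly shorter list. *)
      k cs orelse
      match r cs (fn cs' => cs' <> cs andalso match (Star r) cs' k)

(* accept r s: does the whole string s belong to the language of r?
   The final continuation insists that the leftover suffix is empty. *)
fun accept (r : regexp) (s : string) : bool =
  match r (String.explode s) List.null
```

For instance, accept (Times (Char #"a", Star (Char #"b"))) "abbb" evaluates to true, while the same regular expression rejects "ba" because the first character is not #"a".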

While this specification may seem scary, it is easier to think about in terms of a picture, which supports the practice of reasoning about code behind a layer of intuitive abstraction whenever possible. By reasoning from the specification or the picture, we can prove our implementation correct, and learn how to implement such a function well.