Thursday, 12 July 2007

4: Ruby regular expressions

In the wireless engineering applications area I work, we often have to deal with new data types and formats, and for years I relied on Perl for fast and convenient scripting to deal with data conversion and data validation. I never became a master at all that Perl offered, but was very proficient in a number of key areas applicable to these kinds of problems, notably text file manipulation, data management and regular expressions. This knowledge was valuable also when using regular expressions in Java, initially with the OROMatcher library and later with the standard regex library in java 1.4, but Java was never as convenient and concise as Perl, which was always a small irritation.

Needless to say I was thrilled to discover that Ruby regular expressions were the best of both worlds, with the conciseness of Perl and the Object-Orientation of Java. My first example here is a comparison of a Perl script with the Ruby equivalent:

This Perl script opens a text file and iterates over all lines, splitting each line on the ';' character and using a regular expression to extract two important bits of data out of one of the fields. The results look something like the following:

OK, so this data might not mean anything to you if you don't work in mobile network engineering, but the principle stands. The Perl script is short and simple, and the single line regular expression very useful. The Perl syntax for this was $fields[1] =~ /ID \"(\w+)\", cellRef (\d+)/, which basically compares the second field to the regular expression, and stores the two parenthesized substrings in the convenience variables $1 and $2. Concise, simple and powerful. The Ruby example is almost exactly the same:

Amazingly this is even shorter than Perl, since we merged the two loops into one, with Ruby's lovely convenience methods, and we have less punctuation, but otherwise the scripts look almost identical, and the regular expression line is nearly exactly the same, with only the first '$' removed. For Perl fans, this is wonderful. But what about the Object Orientation fans out there? Perls regular expressions are not apparently object orientated, and the use of global variables $1 and $2 in both the Perl and Ruby examples does not please everyone.

The good news is that this Ruby approach is actually a convenience wrapper on what is fundamentally a very object orientated regular expression library in Ruby. And of course you can write the code in a very OO way, remarkably similar to Java. Consider the following Java example:

As with the previous two scripts, the output is identical. However, the program code is much, much more verbose, to the extent that I had to shrink the font size to fit it on the page! Firstly we have the required Java class and static main method plumbing, and then the plumbing to open and read the file. The String.split method introduced in Java2 helps, and looks somewhat like the Perl and Ruby version, but the regular expression code is sure ugly - lots of steps: creating Pattern, Matcher objects and so many methods: compile(), matches(), and group(). Whew!. So is there anything good about this approach? Well, it is object orientated, and no global variables are used, and if you work in eclipse you can ctrl-space on all objects to figure out what to do next, aiding programming very much. It is relatively easy to code, but somewhat less convenient to read later.

So how does Ruby do this? Let's re-write the previous Ruby script using a purely object orientated syntax:

Wow! Amazing! The Ruby version of the program is almost exactly like the Java version, constructing a Regexp object, calling match, and then extracting the groups with the [] method. But it certainly looks a lot less verbose than the Java version, and is definitely easier to read. It is not as concise as the Perl-like Ruby script before, but it has achieved a clean object-orientated approach with no use of globals, while only being a little more verbose. Nice!

For a clear comparison of the two Ruby versions, I place them here side by side with the line including the regular expression highlighted. The differences are not that great, even though if you did the same comparison between Perl and java you might have trouble seeing the connections. Ruby has found a great way to bridge the gap:

Wednesday, 11 July 2007

3: Ruby Threads

In this short snippet I introduced our Java programmers to Ruby threads by showing them a simple example that used the Ruby classes Thread and Monitor which are very similar to their java equivalents. Consider the following example:

The last line represents the final result printed out: 14141. But if you take a look back at the code you can see that there are two threads created, t1 and t2, each of which is counting 10000 times and incrementing the same counter 'c'. You might easily expect this to result in the counter reaching 20000, but it did not. This is a simple example of a very well known problem with multiple threads accessing shared resources, and the solution is equally well known.

The key is the line '@count += 1' where the internal counter is incremented. The reason this fails to count all the way to 20000, the number of times the 'tick' method was called, is because this operation is not atomic. As indicated in the pseudo-code to the right of that line above, there are actually three separate steps to this increment operation, and it is very likely that one thread might be part way through this when the other enters. So thread t1 gets the current value of @count, but before it can store the incremented value, thread t2 gets the current value (now old value). t1 stores the incremented value, but then t2 also increments and stores the old value, so the final result is only one increment, not two. If this happened every time, the total would be only 10000, and if it never happened the total would be 20000, so we can see from the result 14141 I got above that it happened more than half the time. Just how many times this happens in reality depends on many factors, so the final total is quite unreliable.

As I said before, the solution to this is well known, but before we get to that let me comment quickly on the differences between this code and the Java equivalent. In java we would also create a Thread class and run it. But in order to pass the block of code in, we would have to instantiate a Runnable instance, with a run method, which is a lot more 'plumbing' than in Ruby. Another difference is that Ruby starts the thread on construction, while in Java you still need to call the start() method. I always thought the Java way made complete sense, but have yet to write a Ruby thread where I missed being able to call start later. In the end, a single line for constructing a thread, passing it code to run in a Ruby block and having it run directly is very concise and convenient, and certainly much easier to read.

Now for the solution to the problem. In Java we would use a concept called a 'monitor' which is built into every java Object, with the result that all objects have methods like wait() and notify(), and Java has the language keyword 'synchronized' which can be applied to methods and blocks of code. In some ways this could be seen as a case where Java has more convenience than Ruby, which does not have the Monitor built into every single object. But Ruby's overall elegance still keeps it beautifully concise. In Ruby you need to extend the class 'Monitor', or include the mixin 'MonitorMixin' if you have already extended some other class. See the example below where I've indicated with arrows the changes required to use the Monitor.

The blue arrows include Monitor support, and the red arrow is the key change to the code to make the '@count += 1' line atomic to multi-threaded access. Very easy indeed. In this case Java would have been simpler. We would not need any of the blue changes, and the red change could be done exactly as in ruby by synchronizing a block, or even simpler by synchronizing a method. But still the overall application is much shorter in Ruby, and the block synchronization is by itself simpler than java due to Ruby's cool code blocks, which make the Monitor#synchronize method look remarkably similar to the java 'synchronized' keyword. In fact this is a nice example of how Ruby has avoided the need for many keywords, and even leads to the subject of extending the language through libraries, which is beyond the topic of this snippet :-)