In this third and last part, we turn to another example taken from chapter 5 of Simon Marlows upcoming book on parallelism and concurrency in Haskell and conclude with a discussion of the lessons learned.
Overlapping IO without error handling
The following program loads two websites concurrently and prints their sizes:
main _ = do
m1 <- MVar.newEmpty
m2 <- MVar.newEmpty
forkIO do
r <- getURL "http://www.wikipedia.org/wiki/Shovel"
m1.put r
forkIO do
r <- getURL "http://www.wikipedia.org/wiki/Spade"
m2.put r
r1 <- m1.take
r2 <- m2.take
println (sum (map length r1), sum (map length r2))
The output will be:
(80278, 66924)
This is, however, sure to produce deadlock whenever, for some reason, one of the threads cannot complete to the point where it writes the result to its MVar - or, in other words, if the getURL action throws exceptions. We need not go into the details of the implementation of getURL, suffice it to say that it is a composition of several native methods that each can throw exceptions for a variety of reasons: bad URL syntax, IO error, network failure, and so on. For example, if we change the protocol in the second URL to htto (simulating a typo, where someone typed 'o' instead of 'p'), the output will look like:
Exception in thread "Thread-1" frege.runtime.WrappedCheckedException:
at frege.runtime.WrappedCheckedException.wrapIfNeeded(WrappedCh
...
Caused by: java.net.MalformedURLException: unknown protocol: htto
In this case, the main thread will forever block in the attempt to get the result of the second thread.
Overlapping IO with error handling
To correct this issue, we can follow the approaches Simon Marlow explains in his book. Remember though, that exceptions in Frege are based on Java exceptions. Introducing new exceptions is not yet possible in Frege. Instead, one simply introduces appropriate Java classes that are subclasses of java.lang.Throwable. However, exceptions thrown from native methods are usually already implemented, and so is the "catch all" type Throwable. Hence all we need to make the examples work is the following:
type SomeException = Throwable
try :: IO a -> IO (SomeException|a)
try action = action >>= return . Right
`catch` any
where
any :: SomeException -> IO (SomeException|a)
any = return . Left
To show this in action without using the more elaborate and better abstracted approaches of Simon Marlow, here is a non-deadlocking version of the program above:
main _ = do
m1 <- MVar.newEmpty
m2 <- MVar.newEmpty
forkIO do
r <- (try . getURL) "http://www.wikipedia.org/wiki/Shovel"
m1.put r
forkIO do
r <- (try . getURL) "htto://www.wikipedia.org/wiki/Spade"
m2.put r
r1 <- m1.take
r2 <- m2.take
println (result r1, result r2)
where
result :: (SomeException|[String]) -> String
result (Left x) = x.getClass.getName
result (Right y) = (show . sum . map length) y
When we run this, it produces the following output:
("80278", "java.net.MalformedURLException")
Not very nice, but it does now terminate despite of errors.
Lessons learned
It turns out that it was easy to add concurrency to Frege, thanks to the concurrency support that comes with Java and the JVM and the ability of Frege to make use of those feature in a seamless way.
It was even easy to do this in such a way that Haskell programmers would feel comfortable, by approximating the abstractions they are used to: forkIO, MVars, exceptions, etc. to such an extend that porting Haskell code via copy/paste was possible.
But we also observed important differences in the respective run time concurrency support:
- Termination of the main thread does not terminate any other thread in the JVM.
- MVars as implemented here are strict.
- The JVM does not detect deadlocks.
- The forkIO we implemented in Frege is really what Haskell knows as forkOS, as genuine OS threads are forked, as opposed to Haskell's lightweight "green" threads. The Java concurrency API has nothing comparable to the latter, as far as I know. One could probably use thread pools, executor services and asynchronous tasks to achieve something like it, but there remains a basic difference: if an asynchronous task performs a blocking action like reading from a MVar, it will block the OS thread it is running on. Hence, in the Frege/Java/JVM concurrency model, a fixed number of OS threads cannot - on principle - support an arbitrary number of asynchronous tasks that may potentially block.
- In the JVM, there is no way to throw an asynchronous exception in another thread. The only possibility is to interrupt another thread, which the other thread may or may not honor. But, if one interrupts a thread that is in a blocking operation, that operation will throw an InterruptedException.
- In the JVM world, there is - to my knowledge - no support for software transactional memory (STM) out of the box, one would need to implement this manually and it would probably not be easy.
- ...