Thursday, March 7, 2013

Adding Concurrency to Frege (part III)

In the second part of this series, we implemented MVars and noted some differences in the concurrency support of Haskell and Frege.

In this third and last part, we turn to another example taken from chapter 5 of Simon Marlows upcoming book on parallelism and concurrency in Haskell and conclude with a discussion of the lessons learned.

Overlapping IO without error handling

The following program loads two websites concurrently and prints their sizes:

main _ =  do
    m1 <- MVar.newEmpty
    m2 <- MVar.newEmpty
    
    forkIO do
        r <- getURL "http://www.wikipedia.org/wiki/Shovel"
        m1.put r
    
    forkIO do
        r <- getURL "http://www.wikipedia.org/wiki/Spade"
        m2.put r
    
    r1 <- m1.take
    r2 <- m2.take
    println (sum (map length r1), sum (map length r2))

The output will be:

(80278, 66924)

This is, however, sure to produce deadlock whenever, for some reason, one of the threads cannot complete to the point where it writes the result to its MVar - or, in other words, if the getURL action throws exceptions. We need not go into the details of the implementation of getURL, suffice it to say that it is a composition of several native methods that each can throw exceptions for a variety of reasons: bad URL syntax, IO error, network failure, and so on. For example, if we change the protocol in the second URL to htto (simulating a typo, where someone typed 'o' instead of 'p'), the output will look like:

Exception in thread "Thread-1" frege.runtime.WrappedCheckedException:
        at frege.runtime.WrappedCheckedException.wrapIfNeeded(WrappedCh
        ...
Caused by: java.net.MalformedURLException: unknown protocol: htto

In this case, the main thread will forever block in the attempt to get the result of the second thread.

Overlapping IO with error handling


To correct this issue, we can follow the approaches Simon Marlow explains in his book. Remember though, that exceptions in Frege are based on Java exceptions. Introducing new exceptions is not yet possible in Frege. Instead, one simply introduces appropriate Java classes that are subclasses of java.lang.Throwable. However, exceptions thrown from native methods are usually already implemented, and so is the "catch all" type Throwable. Hence all we need to make the examples work is the following:

type SomeException = Throwable

try :: IO a -> IO (SomeException|a)
try action = action >>= return . Right
        `catch` any
    where
        any :: SomeException -> IO (SomeException|a)
        any = return . Left 

To show this in action without using the more elaborate and better abstracted approaches of Simon Marlow, here is a non-deadlocking version of the program above:

main _ =  do
    m1 <- MVar.newEmpty
    m2 <- MVar.newEmpty
    
    forkIO do
        r <- (try . getURL) "http://www.wikipedia.org/wiki/Shovel"
        m1.put r
    
    forkIO do
        r <- (try . getURL) "htto://www.wikipedia.org/wiki/Spade"
        m2.put r
    
    r1 <- m1.take
    r2 <- m2.take
    println (result r1, result r2)
  where
    result :: (SomeException|[String]) -> String
    result (Left x)  = x.getClass.getName
    result (Right y) = (show . sum . map length)  y


When we run this, it produces the following output:

("80278", "java.net.MalformedURLException")


Not very nice, but it does now terminate despite of errors.

Lessons learned

It turns out that it was easy to add concurrency to Frege, thanks to the concurrency support that comes with Java and the JVM and the ability of Frege to make use of those feature in a seamless way.

It was even easy to do this in such a way that Haskell programmers would feel comfortable, by approximating the abstractions they are used to: forkIO, MVars, exceptions, etc. to such an extend that porting Haskell code via copy/paste was possible.

But we also observed important differences in the respective run time concurrency support:
  • Termination of the main thread does not terminate any other thread in the JVM.
  • MVars as implemented here are strict.
  • The JVM does not detect deadlocks.
But there is more, and it could not be easily ignored:
  • The forkIO we implemented in Frege is really what Haskell knows as forkOS, as genuine OS threads are forked, as opposed to Haskell's lightweight "green" threads. The Java concurrency API has nothing comparable to the latter, as far as I know. One could probably use thread pools, executor services and asynchronous tasks to achieve something like it, but there remains a basic difference: if an asynchronous task performs a blocking action like reading from a MVar, it will block the OS thread it is running on. Hence, in the Frege/Java/JVM concurrency model, a fixed number of OS threads cannot - on principle - support an arbitrary number of asynchronous tasks that may potentially block.
  • In the JVM, there is no way to throw an asynchronous exception in another thread. The only possibility is to interrupt another thread, which the other thread may or may not honor. But, if one interrupts a thread that is in a blocking operation, that operation will throw an InterruptedException.
  • In the JVM world, there is - to my knowledge - no support for software transactional memory (STM) out of the box, one would need to implement this manually and it would probably not be easy.
  • ...
The conclusion is, unfortunately, that one could not get very far in Frege if one only knows the Haskell concurrency model. To do serious concurrent work  in an JVM environment is different and it will thus require different knowledge. One way to obtain this knowledge would be to read books like "Java Concurrency in Practice".

4 comments:

  1. java Thread interrupt method
    one important thing to know about this method :
    This method throws interruption exception only for waiting thread .
    If thread is not in waiting status thread's interrupted status will be set to true but it won't stop it's execution there instead thread will continue running normally
    Effective java simplified
    Java concurrency interrupt method theory

    ReplyDelete
    Replies
    1. This is why I said the interrupted thread may or may not honor the attempt to interrupt it. It is just very different to the Haskell runtime environment, where you can throw asynchronous exceptions in another thread.

      Delete
  2. If you're willing to mix in Quasar, you could take advantage of fibers, which are essentially green threads and would work well for implementing forkIO. The Strand interface in general would allow you to implement forkIO and forkOS, but comes with the cost of needed a jvm agent.

    ReplyDelete
  3. Thank you for pointing to more advanced libraries. If I am not mistaken, this stuff is used a lot in the Clojure realm?

    Anyway, good to know that there are alternatives.

    ReplyDelete

Comments will be censored by me as I see fit, most likely if they contain insults or propaganda for ideologies I do not like. Comments that are on topic will not be censored. If I leave a comment uncensored this does not imply that I agree with the opinions expressed therein.