sequencescape-api-in-scala

Created: 2012-03-22 19:51
Updated: 2013-10-05 19:11

README.markdown

Sequencescape API

This application is an attempt to build the Sequencescape API using Play 2.0 and Scala, in order to see just how difficult it is. The final goal is to support the Pulldown application, as that is a production-ready application built on the existing API.

It is also my attempt to teach myself these two technologies, as well as a guide for how I went about it; so consider it a learning tool too. There are bound to be commits where I do something naive & stupid, only to refactor it in a later one as my understanding improves.

The sections of this document are described in terms of the git commits.

Starting out (initial commit)

First I've removed the application controller, its views, and all of the static assets, as they are not needed.

I've mapped the / route in conf/routes to the Root.index action, which returns the root JSON. This JSON links to all of the other actions that the client application can perform which, at the moment, is an empty list.
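
Roughly, that looks like this (the exact shape of the root JSON is simplified here):

```scala
// conf/routes — the only mapping so far
// GET  /   controllers.Root.index

// app/controllers/Root.scala
package controllers

import play.api.mvc._
import play.api.libs.json._

object Root extends Controller {
  // Returns the root JSON; the set of actions is empty for now and will
  // grow as more of the API is exposed.
  def index = Action {
    Ok(Json.toJson(Map.empty[String, String]))
  }
}
```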

Over time I'll be adding to the root JSON as I add functionality, and limiting what each client sees, based on the client and the user, in an appropriate manner.

Plate purposes (second commit)

So now I'm adding the ability to list the plate purposes. This involves adding the information about how to list them into the root JSON, which is relatively straightforward; the only thing we need to do is have a reverse route to the appropriate controller. And that means we have to have a controllers.PlatePurpose object.

The implementation of controllers.PlatePurpose is simple, returning JSON that represents an empty page. Nothing complicated for the moment: this is about using the routing, where the controllers.PlatePurpose.index method takes a page parameter that is defaulted to 1, both in the code and in the conf/routes file.
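
A sketch of that default in both places (the URL and the empty-page JSON fields here are illustrative):

```scala
// conf/routes — the page parameter defaults to 1
// GET  /plate_purposes   controllers.PlatePurpose.index(page: Int ?= 1)

// app/controllers/PlatePurpose.scala
package controllers

import play.api.mvc._
import play.api.libs.json._

object PlatePurpose extends Controller {
  // Returns the JSON for an empty page of plate purposes, for now.
  def index(page: Int = 1) = Action {
    Ok(Json.toJson(Map("size" -> 0, "current_page" -> page)))
  }
}
```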

There is a small difference in the implementation here from what Sequencescape currently does: the URLs for the actions in the returned JSON are absolute but not fully qualified. They are missing the server information, as I can't quite work out how to call play.api.mvc.Call.absoluteURL with the right parameters yet!

Listing plate purposes (third commit)

We're going to need a PlatePurpose model that can load the rows from the database, and that will have a custom JSON serialisation process. And that means that we need the database configured:

  • project/Build.scala contains the MySQL connector dependency
  • conf/application.conf has the database connection details
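
Roughly, those two pieces look like this (the connector version and connection details are placeholders):

```scala
// project/Build.scala — the relevant fragment: the MySQL connector dependency
val appDependencies = Seq(
  "mysql" % "mysql-connector-java" % "5.1.18"   // version is a placeholder
)

/* conf/application.conf then points the default datasource at the existing
   database, along the lines of:
     db.default.driver = com.mysql.jdbc.Driver
     db.default.url    = "jdbc:mysql://localhost/sequencescape"
     db.default.user   = "..."
     db.default.password = "..."
*/
```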

The assumption is that this application sits on an already populated database, so I'm not going to look at the evolutions stuff in Play 2.0 at this time.

At this point a quick mention about Scala objects & classes:

  • class Foo is a class of which you create instances
  • object Foo is a singleton object called Foo

The object Foo is not an instance of class Foo. It's the companion object and is, for the Java programmers out there, effectively the static methods of class Foo.
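
A tiny illustration, with made-up names:

```scala
class Foo(val name: String)      // instances are created with: new Foo("bar")

object Foo {                     // the companion object: a single, named instance
  def apply(name: String) = new Foo(name)   // effectively a static factory method
}
```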

You'll notice that I've nested the JSON imports for models.PlatePurpose.Json inside that object definition. Whilst some people think that all imports should be at the top of the source file, I believe that they should be close to the code that uses them.

Note that I've not implemented a proper paging system as I know there is only a page of results! I've also not included entries in the actions of the plate purpose JSON as that needs UUID remapping, which will come next.

Mapping UUIDs (fourth commit)

Firstly I've introduced a page of results in the models.Page[T] class, along with a JSON serializer in models.Page.Json[T]. These are generic, so you can have a page of any kind of thing which gets serialized appropriately; the only thing I've had to do is add a serializer() method that returns what should be an implicit.
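
The shape is roughly this (field names are illustrative, not necessarily the commit's exact ones):

```scala
package models

import play.api.libs.json._

// A single page of results of any type T.
case class Page[T](items: Seq[T], page: Int, total: Long)

object Page {
  // JSON serializer for a page; it needs to be told how to serialize a single
  // T, which is what the serializer() method mentioned above hands back.
  class Json[T](itemWrites: Writes[T]) extends Writes[Page[T]] {
    def writes(p: Page[T]): JsValue = JsObject(Seq(
      ("size",    JsNumber(p.total)),
      ("objects", JsArray(p.items.map(itemWrites.writes(_))))
    ))
  }
}
```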

It's important to note that with a ResultSet row parser you can parse multiple rows or single rows. The multiple row parsing is done by appending the *. In the models.PlatePurpose.page method you'll find both; one for parsing the plate purposes from the results, the other for determining the total number.
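
In Anorm terms the two uses look something like this (table and column names are illustrative):

```scala
package models

import anorm._
import anorm.SqlParser._
import play.api.db.DB
import play.api.Play.current

case class PlatePurpose(id: Long, name: String)

object PlatePurpose {
  // Parses a single plate purpose row.
  val simple = get[Long]("id") ~ get[String]("name") map { case id ~ name => PlatePurpose(id, name) }

  def page(pageNumber: Int): (Seq[PlatePurpose], Long) = DB.withConnection { implicit connection =>
    // Appending * applies the parser to every row in the result set ...
    val purposes = SQL("SELECT id, name FROM plate_purposes LIMIT 10 OFFSET {offset}")
      .on("offset" -> (pageNumber - 1) * 10)
      .as(simple *)
    // ... whereas the total comes back from a single-row parse.
    val total = SQL("SELECT COUNT(*) FROM plate_purposes").as(scalar[Long].single)
    (purposes, total)
  }
}
```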

You'll also notice that I've built the full and basic JSON serialization for models.PlatePurpose using a trait. For the moment it's fine, but that'll probably be refactored later too.

The UUID mapping isn't quite as good as I'd like and I have a nasty feeling that's because of Scala. First, how it works now:

In conf/routes there is a mapping from a /:uuid URL to the controllers.Uuid.read action, passing the UUID in as a string. That action finds the UUID using models.Uuid which can then delegate a read call to the appropriate controller (in the current case, just controllers.PlatePurpose).
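
In sketch form (models.Uuid.find and the fields on its result are stand-ins for whatever the commit actually calls them):

```scala
// conf/routes
// GET  /:uuid   controllers.Uuid.read(uuid: String)

// app/controllers/Uuid.scala
package controllers

import play.api.mvc._

object Uuid extends Controller {
  def read(uuid: String) = Action { request =>
    models.Uuid.find(uuid) match {
      // The UUID record knows which resource it belongs to, so the read can
      // be handed on to the right controller (just PlatePurpose at present).
      case Some(resource) => PlatePurpose.read(resource.id)(request)
      case None           => NotFound
    }
  }
}
```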

Really what I wanted to try to do was to map the UUID to the right controller before Play got to routing the request, so transparently switching /uuid to /resource/id. But that looks like it involves interceptors, of which there is zero documentation as far as I can see, certainly for Scala.

The problem with the current approach is that Scala makes dynamic code really difficult, if not impossible. I'd like to be able to simply take "PlatePurpose" and get the controllers.PlatePurpose controller object, but that does not seem viable. Worse than that, dynamically mapping actions to methods is going to probably be impossible.

I've also noticed that there are errors being reported saying: java.lang.RuntimeException: SqlMappingError(No rows when expecting a single one). This appears to be a common error report for Play even though the result is perfectly usable and the application appears to be working fine!

Adding plates (fifth commit)

This has involved a bit of refactoring in the models.Page code so that the serializer can define the URLs and the root JSON element name. Nothing spectacular there.

The controllers have now been refactored to use a controllers.ResourceController trait that defines the two currently supported methods: read and index. This means that the models.Uuid class can now delegate properly for both plates and plate purposes. Again, this shouldn't really be a surprise.
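
In outline it's nothing more than this (the parameter types are my guess at the shape):

```scala
package controllers

import play.api.mvc._

// The shared shape for resource controllers; PlatePurpose and Plate both mix
// this in, which is what lets models.Uuid delegate to either of them.
trait ResourceController extends Controller {
  def index(page: Int): Action[AnyContent]
  def read(id: Long): Action[AnyContent]
}
```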

And the implementation of models.Plate shouldn't be a shock either.

The only thing I'm disappointed in, although I now understand why, is that I haven't been able to make the controller code generic. Ideally I would like the read and index methods to be implemented commonly and have the individual controllers just say what type it is they handle. But that doesn't work because you can't use the parametric type in the actual code: it's the class, not the object.

The thing I'm angry about is that models.Uuid has to have special cases in it for anything that is in an STI hierarchy, like plates. It's got a good example, though, of how to write straight queries if you're not interested in returning models and suchlike.

Anyway, I've hooked up plates to their plate purpose too, in a simple way that just works for the moment. There is no reverse link, though, because that's going to be more complicated (see commit 4 above) and needs to be thought about.

Next will be a biggie: exposing the contents of the wells, which covers aliquots, samples, tags and bait libraries.

Associations, like wells (sixth commit)

Because we know that in all current cases (ok, it's just plates at the moment!) there are associations to load, we can do this as part of the loading mechanism. In our case this is a query that involves several joins because of the schema design that has arisen from ActiveRecord; it's not ideal, but this is about replacing the existing Ruby-implemented API rather than fixing its problems.

The important thing to note in the massive query we're doing for wells is that wells can be empty, and that should not mean that they do not exist. In other words, there are a number of LEFT JOIN clauses in the query, and some contents of aliquots (tag & bait library) are optional. However, the presence of a sample is always required on an aliquot. Hence you'll see that a models.Aliquot has Option[Tag].

You're also going to find that, because of this, the models.Well.rowParserForWells code is "more complicated": actually it simply uses pattern matching to decide what to do in the cases where the sample isn't available. But that then leads to the need to deal with None and Some(Aliquot), which is, amazingly, where Scala's flatMap comes in: think of it like .map{}.compact in Ruby! Awesome.
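
A toy illustration of that flatMap behaviour (the Aliquot here is much simpler than the real one):

```scala
case class Aliquot(sample: String, tag: Option[String])

// Each well may or may not hold an aliquot; flatMap keeps the Some values and
// drops the Nones, much like mapping and then compacting in Ruby.
val wells: Seq[Option[Aliquot]] =
  Seq(Some(Aliquot("sample-1", None)), None, Some(Aliquot("sample-2", Some("tag-3"))))

val aliquots: Seq[Aliquot] = wells.flatMap(a => a)   // the two aliquots, unwrapped
```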

And I'm not sure I entirely like the models.Well.rowParserForWells code that much. Sure, it is concise about what it is trying to achieve, but it's just going to get worse as fields are added. I'm assuming that I'll be able to combine these row parsers in some way, probably through ~, which would be useful. For the moment I'll live with the mess!

One problem I encountered, which is very disappointing, is that there is no way with the Java JDBC code to set an array of values for a query variable. When the query needs IN you have to generate the list as part of the query text and cannot simply provide it to the anorm.SimpleSql.on() method. This is not Play's fault but a limitation of java.sql.PreparedStatement itself.
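
So the query ends up being built along these lines (a sketch; rowParserForWells is the parser described above, and the method name here is made up):

```scala
package models

import anorm._
import play.api.db.DB
import play.api.Play.current

object Plate {
  // JDBC prepared statements cannot bind a list to one placeholder, so the IN
  // clause is spliced into the SQL text (safe here: the ids are Longs, not
  // user-supplied strings).
  def wellsForPlates(plateIds: Seq[Long]): Seq[Well] = DB.withConnection { implicit connection =>
    SQL(
      "SELECT * FROM wells WHERE plate_id IN (%s)".format(plateIds.mkString(","))
    ).as(Well.rowParserForWells *)
  }
}
```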

The interesting thing I've noted here is that query performance is considerably faster under Play and Scala than in the Ruby code. Loading 100 plates, along with their 96 wells, is taking under 2 seconds; in the Ruby code this takes closer to a minute. I've even loaded 100 plates, their 96 wells, with each well having an aliquot with sample, tag & bait library present: 4 seconds with Play. That had to be disabled in our Ruby system because it couldn't do it quickly enough! It is this performance difference that shows why Ruby, at least MRI, is not a good choice for API provision.

In the process of writing some JSON I realised that you don't have to write (a,b) but can write a -> b, which looks nicer in some circumstances. So I did a bit of JSON serialization tidy up.
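
For example (the values here are made up):

```scala
import play.api.libs.json._

// These build the same JsObject; the arrow form reads more nicely.
val asTuples = JsObject(Seq(("name", JsString("Stock Plate")), ("size", JsNumber(96))))
val asArrows = JsObject(Seq("name" -> JsString("Stock Plate"), "size" -> JsNumber(96)))
```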

I think the next commit will be refactoring to collapse a load of the queries and the row parsers as that's looking pretty ugly now.

Making things better (seventh commit)

This is where I try to improve the quality of the query generation and parsing.

First I discovered that you can specify the full table and column name, which means that you don't have to worry about name clashing in the parsing. For instance, a bait library might have a "name", as might the sample it is bound to. With this I was able to reuse row parsers from different classes and combine them, so now the models.Aliquot.rowParser uses the row parsers from Sample, Tag and Bait, and is, in turn, used by models.Plate.rowParserForWells. Very nice.

Then I found out you can make methods package protected, so these row parsers are now declared with protected[models] to allow some of them to be reused. Obviously I've remained tight with the ones that aren't, leaving them as private[this].
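
Pulling those two points together, a rough sketch (column names abbreviated, and the real parsers carry more fields):

```scala
package models

import anorm._
import anorm.SqlParser._

case class Sample(name: String)
case class Bait(name: String)
case class Aliquot(sample: Sample, bait: Option[Bait])

object Sample {
  // The fully qualified column name avoids the clash between samples.name
  // and bait_libraries.name once the parsers are combined.
  protected[models] val rowParser = get[String]("samples.name") map { name => Sample(name) }
}

object Bait {
  // The bait library is optional on an aliquot, hence the Option.
  protected[models] val rowParser = get[Option[String]]("bait_libraries.name") map { name => name.map(Bait(_)) }
}

object Aliquot {
  // Reuses and combines the individual parsers with ~.
  protected[models] val rowParser = Sample.rowParser ~ Bait.rowParser map {
    case sample ~ bait => Aliquot(sample, bait)
  }
}
```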

There's a compiler warning coming from Sample.find, only because I've left out the None match. I've done this on purpose as it should never happen in the context of that method; if it ever does, the code should fall in a big heap at that point. It feels better to have the warning than to raise an exception, which would actually be wrong IMO.

I've refactored the individual models so that they can be reused, putting the SQL columns and JOINs in the models themselves. I've put in a models.Model trait that defines the expected interface. With that, and the addition of a models.Model.Factory[T] trait I have been able to refactor the controllers so that controllers.ResourceController can do much of their work.
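
In outline (the actual method signatures may well differ):

```scala
package models

// The expected interface for a model instance.
trait Model {
  def id: Long
}

object Model {
  // A factory knows how to find and page instances of a model, which is what
  // lets controllers.ResourceController do most of the controllers' work.
  trait Factory[T] {
    def find(id: Long): Option[T]
    def page(number: Int): Page[T]
  }
}
```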

I wanted to try to limit the type that the models.Model.Factory[T] trait can take, by saying models.Model.Factory[T <: models.Model]; my assumption being that this said "T must be a subclass of Model". But T, in our case, does not extend Model, its companion object does; hence, I get a whole load of errors about type arguments not conforming to type parameter bounds.

I'm beginning to find the class and object pairing confusing, in some respects, and the inability to directly access a companion object really annoying.

Wherein I have a small companion object breakthrough! (8th commit)

I think I've got a handle on how to retrieve a companion object for a class: you need an implicit method that follows the format () => Companion[T], where Companion is a trait your companion object implements. This has allowed me to extend the companion objects for my models with models.Model.Factory[T] and use the bounds to say that T must be a Model (once I'd decided that the model is the class and the object is the factory).
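
Boiled down, the pattern is something like this (the names here are generic rather than the commit's exact ones):

```scala
trait Factory[T] {
  def find(id: Long): Option[T]
}

case class Widget(id: Long)

object Widget extends Factory[Widget] {
  def find(id: Long) = Some(Widget(id))

  // The implicit method that hands the companion object back to generic code.
  implicit def factory: () => Factory[Widget] = () => Widget
}

// Generic code can now reach the companion without knowing it by name:
def load[T](id: Long)(implicit companion: () => Factory[T]): Option[T] =
  companion().find(id)

// e.g. load[Widget](42L) resolves Widget.factory implicitly.
```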

And then I had another spark of understanding: implicit objects. Rather than using a method, and having the companion object for a model implement a load of interfaces, I split it down into other implicit objects. These each implement certain traits that are then implicitly required elsewhere. For example, the models.Model.Json and models.Model.Factory traits are both implicitly required by controllers.ResourceController.

That brought about some heavy refactoring in the JSON serialization, so that now a lot of the serialization behaviour can be automatically generated. The biggest gain has been that models.Page now serializes itself based on implicit parameters, which feels much much better.

I've also refactored the joining trait so that you can specify whether it is always present (required) or may not exist (optional). This simply enforces the correct type for the rowParser that is exposed.

It also feels appropriate to move the actions for a model into the model itself; then the JSON serializer simply needs to map these to JSON. This is all moving towards a fairly generic model serialization trait.

By the way, if you're looking for comments in the models then the best one is models.PlatePurpose. I'll maintain that one, and the variations in the others where they arise (like models.Plate), but I'm going to ditch commenting the rest where they conform.

Making SQL queries (9th commit)

It's bothered me for a while that the queries are string manipulations and, in some cases, very long. What I've wanted is something that can describe the basic information for a query, that can be joined together to get SQL JOINs, and that can perform the physical SQL queries. I've started to do this with Model.SQL, Model.Query[T] and Model.Join[A,B] (not that I particularly like the names!).

Model.SQL is the basic trait that assumes a single table that has a primary key, needs to get some fields, and needs to be JOINed to zero or more tables. Model.Query[T] is an extension of this for models: it is typed so that it can be used for determining how to join later.

Finally, Model.Join[A,B] is a join between two Model.SQL and relies on an implicit method to determine the JOIN statement. It's that implicit method that requires the Model.Query[T], so that you can define a method that joins, say, Plate and PlatePurpose (as Plate.Query does). The best examples of join methods are in Aliquot, which needs several to build the final query from samples, baits, and tags.
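
A very stripped-down sketch of the three pieces (the real traits carry rather more information):

```scala
package models

object Model {
  // Describes a single table: its name, primary key and the fields to select.
  trait SQL {
    def table: String
    def primaryKey: String
    def fields: Seq[String]
  }

  // The typed extension used by the models, so joins can be resolved per type.
  trait Query[T] extends SQL

  // A join between two queries; the JOIN text comes from an implicit method,
  // which is where Plate.Query, Aliquot and friends say how they join.
  case class Join[A, B](left: Query[A], right: Query[B])(implicit joinSql: (Query[A], Query[B]) => String) {
    def selectSql: String =
      "SELECT %s FROM %s %s".format(
        (left.fields ++ right.fields).mkString(", "),
        left.table,
        joinSql(left, right)
      )
  }
}
```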

I've also added the concept of an Exposed model, like plates & samples. It's nothing special other than it has a UUID and a number of actions that can be applied to it.

Starting the pulldown pipeline (10th commit)

Everything in this commit is about starting to get the pulldown application working. It needed several models to be added, including Search, which is very much stubbed out at the moment.

There has been refactoring of the JSON and Query handling to make it nicer. Whilst I like the idea of the functional programming aspects of Scala, it's important to remember that it has OO in there too: hence, the JSON generation is now better suited to a method call on a class instance, and joining Query instances behaves similarly.

The behaviour of the main search, models.search.FindPulldownPlates, is such that it only finds the stock plates at the moment, and they are fixed to the pending state. To get any further with those I'm going to have to implement the request graph structure that flows between the wells.

But, in writing that, I've also written models.search.FindAssetByBarcode, which supports the ability to find a plate by its EAN13 barcode. That yielded the Barcode model, which led me down a whole rabbit hole of horrible Ruby code, and which could then be used by Plate and the new Tube model too.

So, this commit should allow for the inbox of the pulldown application to display, along with a single plate. It lacks the ability to find a user, and the plate JSON isn't complete enough for the pulldown application to actually do anything useful. But it's a start.
