Helper for streaming MongoDB GridFS files in Lift web applications

2010-10-28

Lift is a web application framework written in Scala that comes with native integration for MongoDB. The module is called “lift-mongodb” and integrates MongoDB as the persistence layer for its Record and Mapper framework.

GridFS is a specification for storing large files in MongoDB. Most drivers support it directly.

In this post I’m going to develop a helper that makes GridFS files accessible via HTTP. The helper should also support HTTP caching, so that clients can cache the files.

Let’s get started.

Basic setup

I assume you have a plain Lift project. I’m using sbt to build the Lift application. If you (for whatever reason) prefer Maven, you can certainly use it instead.

First of all, we need to tell Lift that we want to use MongoDB, so we add the lift-mongodb module as a dependency. See http://www.assembla.com/wiki/show/liftweb/lift mongodb to find out more.

val lift_mongo = "net.liftweb" % "lift-mongodb" % "2.2"

First shot

To start simple, I wrote an object named GridFSHelper with a get function. The get function takes a file name as its only argument and returns a value of type Box[LiftResponse]. Like a real-world box, a Lift Box can be empty or full, so Box has two subtypes called Empty and Full.
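To make the empty/full idea concrete, here is a minimal plain-Scala sketch. It is not Lift's Box: I use Option from the standard library as a stand-in (Some for Full, None for Empty), and the names BoxSketch, files, find and describe are made up for this illustration.

```scala
object BoxSketch {
  // Hypothetical in-memory "file store", used only for this illustration.
  val files: Map[String, String] = Map("logo.png" -> "image bytes")

  // Some(...) plays the role of a Full box, None the role of Empty.
  def find(filename: String): Option[String] = files.get(filename)

  // Unwrap with a fallback, comparable to calling openOr on a Box.
  def describe(filename: String): String =
    find(filename).map(content => "found: " + content).getOrElse("nothing found")
}
```

The caller never has to check for null; it either works with the content or falls back to a default.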

The behaviour of the get function is like this:
It uses the default Mongo connection to query for a file with the given filename. If no file is found, it returns Empty to signal that it has nothing to respond with. This results in a 404 (Not Found) HTTP response.

If the file is found, it returns a “Full Box” containing a StreamingResponse. A StreamingResponse takes six arguments.
The first is the InputStream that should be sent to the client. The second is a function that is called when the stream is finished or aborted, which makes it the right place to clean up resources. The third is the length of the stream. The last three are a list of HTTP header fields, a list of cookies and the HTTP status code.

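import com.mongodb.gridfs.{GridFS, GridFSDBFile}
import net.liftweb.common.{Box, Empty, Full}
import net.liftweb.http.{LiftResponse, StreamingResponse}
import net.liftweb.mongodb.{DefaultMongoIdentifier, MongoDB}

object GridFSHelper {

  def get(filename: String): Box[LiftResponse] = {
    MongoDB.use(DefaultMongoIdentifier) ( db => {
      val fs = new GridFS(db)

      // findOne returns null when nothing matches, which falls through to `case _`
      fs.findOne(filename) match {
        case file: GridFSDBFile =>
          val headers = ("Content-Type" -> "application/octet-stream") :: Nil
          val stream = file.getInputStream

          Full(StreamingResponse(
            stream,
            () => stream.close, // release the stream once the response is done
            file.getLength,
            headers, Nil, 200))

        case _ => Empty
      }
    })
  }
}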
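The cleanup callback passed as the second argument is easiest to understand by looking at what a server has to do with the stream. The following is only a plain-Scala sketch of that idea and not Lift's actual implementation; StreamSketch, pump and onEnd are names I made up for it.

```scala
import java.io.{InputStream, OutputStream}

object StreamSketch {
  // Copies everything from `in` to `out` and returns the number of bytes copied.
  // `onEnd` runs in the finally block, so it is called whether the copy
  // completes normally or is aborted by an exception.
  def pump(in: InputStream, out: OutputStream, onEnd: () => Unit): Long = {
    val buf = new Array[Byte](8192)
    var total = 0L
    try {
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        total += n
        n = in.read(buf)
      }
      total
    } finally onEnd()
  }
}
```

Because the file is never loaded into memory as a whole, this approach also works for the large files GridFS is meant for.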

You can use the GridFSHelper by binding it to a URI.

Add the following code to the Boot.scala file.

LiftRules.dispatch.append {
  case req @ Req(List("files", filename), _, _) => {
    () => GridFSHelper.get(filename + "." + req.path.suffix)
  }
}
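The rule joins filename and req.path.suffix because the extension arrives separately from the last path segment. One thing to watch out for is a request without an extension, where a plain concatenation would produce a trailing dot. A minimal sketch of a safer join, assuming the suffix is an empty string in that case (PathSketch and fullName are hypothetical names, not part of Lift):

```scala
object PathSketch {
  // Rebuilds "name.ext" from the two parts; leaves the name untouched
  // when there is no suffix.
  def fullName(name: String, suffix: String): String =
    if (suffix.isEmpty) name else name + "." + suffix
}
```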

However, the GridFSHelper implementation so far has some restrictions:

  • The content type is not set properly.
  • It doesn’t support HTTP caching (no 304 responses).

Know the content type

Currently we have the following line which sets a fixed content type.

val headers = ("Content-Type" -> "application/octet-stream") :: Nil

This means all responses have the same content type, regardless of whether the file is an image, an HTML page or a PDF.
This is far from perfect.

To determine the file’s type we can look at the file extension. Fortunately, web containers already do this, so we don’t have to implement it ourselves.

To have the content type determined for us, just replace the previous line with the following:

def get(filename: String): Box[LiftResponse] = {
  // some code ...
  val headers = ("Content-Type" -> contentType(filename)) :: Nil
  // more code ...
}

private def contentType(filename: String) =
  LiftRules.context.mimeType(filename) openOr "application/octet-stream"

Now the HTTP response should come with the right content type. If not, the web container does not know the content type for the given file extension. In that case you can add the content type to the web.xml:

For example:

<mime-mapping>
  <extension>svg</extension>
  <mime-type>image/svg+xml</mime-type>
</mime-mapping>
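Conceptually, the container's lookup is just a table from file extension to MIME type with a fallback for unknown extensions. Here is a small plain-Scala sketch of that idea; the table and the names MimeSketch and known are made up for this example and only mirror what the contentType helper above does with the container's answer.

```scala
import java.util.Locale

object MimeSketch {
  // Hypothetical extension table; a real web container ships a much larger one.
  val known: Map[String, String] = Map(
    "png"  -> "image/png",
    "html" -> "text/html",
    "pdf"  -> "application/pdf",
    "svg"  -> "image/svg+xml"
  )

  // Looks up the part after the last dot and falls back to a generic type.
  def contentType(filename: String): String = {
    val dot = filename.lastIndexOf('.')
    val ext = if (dot >= 0) filename.substring(dot + 1).toLowerCase(Locale.ROOT) else ""
    known.getOrElse(ext, "application/octet-stream")
  }
}
```

Adding a mime-mapping entry to the web.xml corresponds to adding one more pair to such a table.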

Handle HTTP caching

In the default configuration Lift sets a bunch of http header fields to tell the client that nothing should be cached. This rule applies to our GridFS response as well. To allow clients to cache our response we have to reset some http header fields:

val headers =
  ("Content-Type" -> contentType(filename)) ::
  ("Pragma" -> "") ::
  ("Cache-Control" -> "") :: Nil

Namely, we have to reset the Pragma and the Cache-Control fields.

Next, we have to set the Date, Last-Modified and Expires headers. The helpers used here (toInternetDate, nowAsInternetDate, millis and the days syntax) come from net.liftweb.util.Helpers. The header list will now look like this:

val headers =
  ("Content-Type" -> contentType(filename)) ::
  ("Pragma" -> "") ::
  ("Cache-Control" -> "") ::
  ("Last-Modified" -> toInternetDate(lastModified)) ::
  ("Expires" -> toInternetDate(millis + 10.days)) ::
  ("Date" -> nowAsInternetDate) :: Nil
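If you wonder what the three date values look like on the wire: HTTP expects dates like "Thu, 01 Jan 1970 00:00:00 GMT". The following plain-Scala sketch builds the same three headers with java.time instead of Lift's helpers. HttpDateSketch, httpDate and cachingHeaders are names I made up, and the ten-day lifetime simply mirrors the value used above.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.Locale

object HttpDateSketch {
  // Fixed-width HTTP date format, always expressed in GMT.
  private val fmt = DateTimeFormatter
    .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
    .withZone(ZoneOffset.UTC)

  def httpDate(epochMillis: Long): String =
    fmt.format(Instant.ofEpochMilli(epochMillis))

  val tenDaysMillis: Long = 10L * 24 * 60 * 60 * 1000

  // `lastModified` and `now` are timestamps in milliseconds since the epoch.
  def cachingHeaders(lastModified: Long, now: Long): List[(String, String)] =
    ("Last-Modified" -> httpDate(lastModified)) ::
    ("Expires" -> httpDate(now + tenDaysMillis)) ::
    ("Date" -> httpDate(now)) :: Nil
}
```

For example, HttpDateSketch.httpDate(0L) yields "Thu, 01 Jan 1970 00:00:00 GMT".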

Great, our HTTP headers are set properly. Now we need to check the request to see if we can return a 304 (Not Modified) response. This tells the client that there is no need to download the whole file again; it can use its cached copy.

Fortunately, Req has a testFor304 function that we can use.

def get(req: Req, filename: String): Box[LiftResponse] = {
  // some code ...
  req.testFor304(lastModified, "Expires" -> toInternetDate(millis + 10.days)) openOr {
    // create and return StreamingResponse
  }
  // more code ...
}

As you can see, I introduced a new parameter req to pass the current request to the function, so the dispatch rule in Boot.scala has to call GridFSHelper.get(req, ...) accordingly.

All the magic is done by the testFor304 function. If it returns Empty, we have to build the response ourselves; otherwise we can simply return the 304 response it has already prepared.
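The check behind a 304 boils down to comparing the file's modification time with the If-Modified-Since date sent by the client. Here is a minimal plain-Scala sketch of that comparison. It is my own simplification and not Lift's code; NotModifiedSketch, notModified and statusFor are hypothetical names, and Option again stands in for Box.

```scala
object NotModifiedSketch {
  // Some(304) when the client's copy is still current, None when the
  // full body has to be sent. HTTP dates only have second precision,
  // so both timestamps are compared in whole seconds.
  def notModified(lastModified: Long, ifModifiedSince: Option[Long]): Option[Int] =
    ifModifiedSince match {
      case Some(since) if lastModified / 1000L <= since / 1000L => Some(304)
      case _ => None
    }

  // Fall back to a regular 200 response when no 304 applies.
  def statusFor(lastModified: Long, ifModifiedSince: Option[Long]): Int =
    notModified(lastModified, ifModifiedSince).getOrElse(200)
}
```

A request without an If-Modified-Since header always gets the full response, which is what you want for a first download.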

This simple helper allows us to stream files from GridFS to the client. It sets the proper content type and supports http caching.

The complete code can be found on GitHub: http://gist.github.com/653101

Comments and improvements welcome!


me

Marco Rico Gomez is a passionate software developer located in Germany who likes to share his thoughts and experiences about software development and technologies with others.

