Helper for streaming MongoDB GridFS files in Lift web applications

Lift is a web application framework written in Scala, and it comes with native MongoDB integration. The module is called “lift-mongodb” and integrates MongoDB as a persistence layer for Lift’s Record framework.

GridFS is a specification for storing large files in MongoDB. Most drivers support it directly.

In this post, I’m going to develop a helper that makes GridFS files accessible via HTTP. The helper should also support HTTP caching so that clients can cache the files.

Let’s get started.

Basic setup

I assume you have a plain Lift project. I’m using sbt to build the application; if you prefer Maven, you can certainly use it instead.

First of all, we need to tell Lift that we want to use MongoDB, so we add the lift-mongodb module as a dependency:

val lift_mongo = "net.liftweb" % "lift-mongodb" % "2.2"

First shot

To start simple, I wrote an object named GridFSHelper with a get function. The get function takes a file name as its only argument and returns a Box[LiftResponse]. Like a real-world box, a Lift Box can be empty or full: Full wraps a value, and Empty signals that there is none. (There is also Failure, an empty box that carries an error message.)

The get function behaves as follows:
It uses the default MongoDB connection to look up a file with the given filename. If no file is found, it returns Empty to signal that it has nothing to respond with, and Lift answers the request with a 404 (Not Found) response.

If the file is found, get returns a Full box containing a StreamingResponse, which takes six arguments:

  • the InputStream to send to the client,
  • a function that is called when the stream is done or was aborted (the perfect place to clean up resources),
  • the length of the stream,
  • a list of HTTP header fields (name/value pairs),
  • a list of cookies,
  • the HTTP status code.

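import com.mongodb.gridfs.{GridFS, GridFSDBFile}
import net.liftweb.common.{Box, Empty, Full}
import net.liftweb.http.{LiftResponse, StreamingResponse}
import net.liftweb.mongodb.{DefaultMongoIdentifier, MongoDB}

object GridFSHelper {

  def get(filename: String): Box[LiftResponse] = {
    MongoDB.use(DefaultMongoIdentifier) ( db => {
      val fs = new GridFS(db)

      // findOne returns null if there is no such file,
      // so only an existing file matches the first case
      fs.findOne(filename) match {
        case file: GridFSDBFile =>
          val headers = ("Content-Type" -> "application/octet-stream") :: Nil
          val stream = file.getInputStream

          Full(StreamingResponse(stream,
            () => stream.close,  // close the stream when done or aborted
            file.getLength,      // length of the stream in bytes
            headers, Nil, 200))

        case _ => Empty
      }
    })
  }
}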
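The two arguments that make streaming safe are the data source and the cleanup callback. To illustrate why the callback is the right place to close the GridFS stream, here is a simplified, stdlib-only model of what a streaming response does with them. StreamingModel and drain are hypothetical names, and this is not Lift's implementation: it only shows the data being read in chunks until the stream is exhausted, with the callback running in a finally block so that it also fires when the copy fails (for example, because the client disconnected).

```scala
import java.io.{InputStream, OutputStream}

// Hypothetical, simplified model of a streaming response.
// NOT Lift's code; it only illustrates the "data" and "onEnd" arguments.
object StreamingModel {

  // Copies `data` to `out` in fixed-size chunks and returns the byte count.
  // `onEnd` always runs, whether the copy completes or throws.
  def drain(data: InputStream, onEnd: () => Unit, out: OutputStream): Long = {
    val buf = new Array[Byte](8192)
    var total = 0L
    try {
      var n = data.read(buf)
      while (n != -1) {          // -1 signals the end of the stream
        out.write(buf, 0, n)
        total += n
        n = data.read(buf)
      }
      total
    } finally {
      onEnd()                    // e.g. () => stream.close for a GridFS file
    }
  }
}
```

In this model, passing a closing function as the callback is enough to release the input stream on both the success and the failure path.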

To use GridFSHelper, bind it to a URI by adding the following code to Boot.scala:

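LiftRules.dispatch.append {
  // Lift splits the file suffix (e.g. "pdf") off the last path segment,
  // so it is re-attached here to rebuild the GridFS filename
  case req @ Req(List("files", filename), _, _) =>
    () => GridFSHelper.get(filename + "." + req.path.suffix)
}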
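The dispatch rule glues the filename, a dot and the request suffix back together because Lift keeps the suffix separate from the last path segment: a request for /files/report.pdf matches with filename bound to "report". The following stdlib-only sketch is a simplified, hypothetical model of that split (PathModel, split and gridFsName are illustrative names, and Lift's real parsing rules have more cases); it just shows how the GridFS filename is reassembled.

```scala
// Hypothetical model of Lift's path/suffix split; NOT Lift's code.
object PathModel {

  // Splits "/files/report.pdf" into (List("files", "report"), "pdf").
  // The suffix is whatever follows the last dot of the last segment;
  // a leading dot (as in ".htaccess") is not treated as a suffix.
  def split(path: String): (List[String], String) = {
    val parts = path.split("/").toList.filter(_.nonEmpty)
    parts.lastOption match {
      case Some(last) if last.lastIndexOf('.') > 0 =>
        val dot = last.lastIndexOf('.')
        (parts.init :+ last.substring(0, dot), last.substring(dot + 1))
      case _ => (parts, "")
    }
  }

  // Rebuilds the GridFS filename for /files/<name>[.<suffix>] requests.
  // Unlike the one-liner above, it omits the dot when there is no suffix.
  def gridFsName(path: String): Option[String] = split(path) match {
    case (List("files", name), "")     => Some(name)
    case (List("files", name), suffix) => Some(name + "." + suffix)
    case _                             => None
  }
}
```

Skipping the dot for suffix-less requests is a deliberate choice in this sketch: it avoids looking up a name like "README." in GridFS.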

However, this implementation has two shortcomings:

  • The Content-Type header is not set properly.
  • It doesn’t support HTTP caching (no 304 Not Modified responses).

Know the content type

Currently, the following line sets a fixed content type:

val headers = ("Content-Type" -> "application/octet-stream") :: Nil

This means every response has the same content type, regardless of whether the file is an image, an HTML page or a PDF.
That is far from perfect.

To determine a file’s type, we can look at its extension. Fortunately, the web container already maintains a mapping from file extensions to content types, so we don’t have to implement it ourselves.

To have the content type determined this way, replace the previous line with the following and add a small contentType helper:

def get(filename: String): Box[LiftResponse] = {
  // some code ...
  val headers = ("Content-Type" -> contentType(filename)) :: Nil
  // more code ...
}

// Ask the web container for the MIME type; fall back to a generic binary type
private def contentType(filename: String) =
  LiftRules.context.mimeType(filename) openOr "application/octet-stream"
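Stripped of the servlet API, contentType is a lookup with a fallback. The sketch below shows that logic with a hard-coded table standing in for the web container's registry; MimeModel and its four entries are hypothetical and only illustrative.

```scala
import java.util.Locale

// Hypothetical stand-in for the container's extension-to-MIME registry.
object MimeModel {

  // Deliberately tiny table; a servlet container ships a much larger one.
  private val byExtension: Map[String, String] = Map(
    "pdf"  -> "application/pdf",
    "png"  -> "image/png",
    "jpg"  -> "image/jpeg",
    "html" -> "text/html"
  )

  // Same shape as `mimeType(filename) openOr "application/octet-stream"`:
  // look up the extension, fall back to a generic binary type.
  def contentType(filename: String): String = {
    val dot = filename.lastIndexOf('.')
    val ext = if (dot >= 0) filename.substring(dot + 1).toLowerCase(Locale.ROOT) else ""
    byExtension.getOrElse(ext, "application/octet-stream")
  }
}
```

Lower-casing the extension is a choice made in this sketch so that Photo.JPG and photo.jpg resolve alike; it does not assume any particular container behaves the same way.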

Now the HTTP response should carry the right content type. If it doesn’t, the web container doesn’t know the file extension. In that case, you can register it in web.xml with a <mime-mapping> element that pairs an <extension> with a <mime-type>, for example pdf with application/pdf.


Handle HTTP caching

In its default configuration, Lift sets several HTTP header fields that tell the client not to cache anything. This rule applies to our GridFS response as well. To allow clients to cache the response, we have to reset some of these header fields:

val headers =
  ("Content-Type" -> contentType(filename)) ::
  ("Pragma" -> "") ::
  ("Cache-Control" -> "") :: Nil

Specifically, we reset the Pragma and Cache-Control fields.

Next, we set the Date, Last-Modified and Expires headers. The header list now looks like this:

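import net.liftweb.util.Helpers._  // toInternetDate, nowAsInternetDate, millis, 10.days

// lastModified: the file's last modification time in milliseconds
val headers =
  ("Content-Type" -> contentType(filename)) ::
  ("Pragma" -> "") ::
  ("Cache-Control" -> "") ::
  ("Last-Modified" -> toInternetDate(lastModified)) ::
  ("Expires" -> toInternetDate(millis + 10.days)) ::
  ("Date" -> nowAsInternetDate) :: Nil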
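toInternetDate and nowAsInternetDate come from Lift's helpers. To make the date format and the ten-day expiry concrete without Lift, here is a sketch that builds the same header list with java.time. CacheHeaders, httpDate and the now parameter are hypothetical; the pattern is the IMF-fixdate format from the HTTP specification, and Lift's own formatter is not assumed to produce byte-identical strings.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.Locale
import java.util.concurrent.TimeUnit

// Hypothetical, Lift-free version of the cache-friendly header list.
object CacheHeaders {

  // IMF-fixdate, e.g. "Thu, 01 Jan 1970 00:00:00 GMT" (always in GMT).
  private val httpDateFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
      .withZone(ZoneOffset.UTC)

  def httpDate(millis: Long): String =
    httpDateFormat.format(Instant.ofEpochMilli(millis))

  // `now` is passed in instead of read from the clock, so Date and
  // Expires are computed from the same instant and the result is testable.
  def headers(contentType: String, lastModified: Long, now: Long): List[(String, String)] =
    List(
      "Content-Type"  -> contentType,
      "Pragma"        -> "",
      "Cache-Control" -> "",
      "Last-Modified" -> httpDate(lastModified),
      "Expires"       -> httpDate(now + TimeUnit.DAYS.toMillis(10)),
      "Date"          -> httpDate(now)
    )
}
```

For a file last modified at the Unix epoch and a request served at that same instant, Expires comes out as "Sun, 11 Jan 1970 00:00:00 GMT", ten days after Date.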

Great, our HTTP headers are now set properly. Next, we check the request to see whether we can return a 304 (Not Modified) response, which tells the client that it doesn’t need to download the whole file again and can use its cached copy.

Fortunately, Lift’s Req class provides a testFor304 method we can use:

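def get(req: Req, filename: String): Box[LiftResponse] = {
  // some code ...
  // testFor304 returns a ready-made 304 response (Full) or Empty;
  // openOr builds the streaming response only in the Empty case
  Full(req.testFor304(lastModified, "Expires" -> toInternetDate(millis + 10.days)) openOr {
    // create and return StreamingResponse
  })
  // more code ...
}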

As you can see, I introduced a new parameter, req, to pass the current request to the function. The dispatch rule in Boot.scala therefore has to pass req along as well.

All the magic happens in testFor304, which compares lastModified with the request’s If-Modified-Since header. If it returns Empty, we have to build the full response; otherwise, it has already prepared a 304 response that we can simply return.
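The rule behind a 304 response fits in a few lines. The following is a simplified model of the usual conditional-GET check, not Lift's testFor304: it parses the If-Modified-Since value and answers 304 only when the file has not changed since that time. ConditionalGet, parseHttpDate and notModified are hypothetical names, and the Option[Int] result stands in for Lift's Box[LiftResponse].

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical model of a conditional GET decision; NOT Lift's code.
object ConditionalGet {

  // Parses an HTTP date such as "Sun, 09 Sep 2001 01:46:40 GMT"
  // into epoch milliseconds; malformed values yield None.
  def parseHttpDate(s: String): Option[Long] =
    Try(ZonedDateTime.parse(s, DateTimeFormatter.RFC_1123_DATE_TIME)
      .toInstant.toEpochMilli).toOption

  // Some(304) when the client's copy is still fresh,
  // None when the full response has to be built.
  def notModified(ifModifiedSince: Option[String], lastModified: Long): Option[Int] =
    ifModifiedSince.flatMap(parseHttpDate) match {
      // HTTP dates have one-second resolution, so compare whole seconds
      case Some(since) if lastModified / 1000 <= since / 1000 => Some(304)
      case _ => None
    }
}
```

Comparing whole seconds matters: Last-Modified is sent without milliseconds, so the client echoes back a truncated time, and a millisecond comparison would report almost every file as modified.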

This simple helper allows us to stream files from GridFS to the client. It sets the proper content type and supports HTTP caching.

The complete code can be found on GitHub:

Comments and improvements welcome!
