Exploring Streams in Scala

2011-02-07

In this blog post I‘m going to explore Streams in Scala‘s Collection API. So first of all what is a Stream? A Stream is a lazy evaluated list. This means that elements in a Stream gets only evaluated when they are needed. Therefore Streams can be infinite while strict collections cannot. A Stream can also be seen as an immutable Iterator.

Creating a Stream

To create a Stream at least two values must be given:

  • an initial value
  • a function to compute the next value

For demonstration of an infinite Stream I'll choose a Stream of natural numbers (0, 1, 2, ...). The set of natural numbers starts with zero what is the initial value of our Stream. The successor of the initial value can easily computed by n+1. Then the successor of n+1 is n+1+1 and so forth. To build such a Stream we could write the following function:

def from(start: Int): Stream[Int] =
Stream.cons(start, from(start + 1))

The Stream is created by calling the cons function. It takes two parameters. The first parameter is the initial value. The second parameter takes a function which returns another Stream. The cons call returns a new Stream. The Stream consists of the initial value and the “rest“. The “rest” will only be evaluated when needed.
Ok, I lied a bit. cons feels like a function but in reality it is a object with an apply method. When we call Stream.cons(...) we call the apply method. Written out in full it would be Stream.cons.apply(...) but we don‘t really want to write it like that.

The same function can also be written in a shorter notion using the #:: operator:

def from(start: Int): Stream[Int] =
start #:: from(start + 1)

Such common builder functions like from, range, continually and so on for the creation of Streams can be found in the Stream object.

Now consider the following example:

val nn = from(0)
println(nn.take(10).mkString(","))

A variable nn is created and an infinite Stream of natural numbers is assigned. At this point the from
function is only called once. Recursive calls of from(start + 1) doesn't happen at this point. The next line takes the first 10 elements of the Stream and prints them to the console. Now guess how many times the from function will be called? Right 9 times, because the initial value (zero) has already been initialised.

0|1|2|3|4|5|6|7|8|9|...

(Just call nn.toString on the REPL to verify)

Filtering/ Mapping Streams

A Stream can be used like every other collection because it is of type LinearSeq. Operations like filter and map are lazy evalutated. They simply return another Stream and perform the operation when required. I think we can say that all operations that return another Stream can be considered as lazy operations (like filter, map). Of course, operations like foldLeft, foldRight, find and exists are not lazy.

For example to filter even values:

val even = nn filter (_ % 2 == 0)
println(even.take(10).mkString(","))

The 1st line calls filter on the Stream of natual numbers. Calling the filter method will return a new Stream. The new Stream uses the original Stream as its input and only returns elements for which the filter conditiion holds true. In short: filter is lazy evaluated. The second line prints the first 10 even elements to the console. This should be: 0,2,4,6,8,10,12,14,16,18

Now the first 20 elements of the natural number‘s Stream has been initialized:

0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|...

Streams can be used in pattern matching

A Stream can be used in pattern matching. To do so you can use the #:: extractor. The following example prints out the string "matched" on the console if the first two elements of our nn Stream is 0 and 1. And yes, it does :-)

nn match {
case 0 #:: 1 #:: _ => println("matched")
}

Conclusion

Stream is the immutable equivalent to Iterator. While an Iterator doesn't keep computated values Stream does. So as with any immutable data structure you don't have to care about state with Streams. This is useful when you pass a Stream around to other functions. If you do so with a Iterator you have to care about state what needs much more caution and can lead to logic errors. The immutability of Streams makes it also quite easy to compose them. Keep in mind that a Stream has a higher memory footprint than an Iterator because it keeps the computed elements.


me

Marco Rico Gomez is a passionate software developer located in Germany who likes to share his thoughts and experiences about software development and technologies with others.


blog comments powered by Disqus