What are the key differences between Scala and Groovy?

They’re both object-oriented languages for the JVM that have lambdas and closures and interoperate with Java. Other than that, they’re extremely different. Groovy is a “dynamic” language not only in the sense that it is dynamically typed but also in that it supports dynamic metaprogramming. Scala is a “static” language in that it is statically typed …

map vs mapValues in Spark

mapValues is only applicable for PairRDDs, meaning RDDs of the form RDD[(A, B)]. In that case, mapValues operates on the value only (the second part of the tuple), while map operates on the entire record (tuple of key and value). In other words, given f: B => C and rdd: RDD[(A, B)], these two are identical (almost – see comment at the bottom): The latter is simply shorter …
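The code from this excerpt did not survive, so the following is only a sketch of the kind of comparison it describes, assuming Spark is on the classpath and a local master is acceptable. The object name, the sample data and the function f are hypothetical. It also shows one documented difference between the two calls: mapValues retains the RDD's partitioner, while map does not.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical demo object; names and data are illustrative only.
object MapVsMapValues {
  // Returns true when map and mapValues produce the same records.
  def sameResult(sc: SparkContext): Boolean = {
    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
    val f: Int => Int = _ * 10

    val viaMap = rdd.map { case (k, v) => (k, f(v)) } // rebuilds the whole tuple
    val viaMapValues = rdd.mapValues(f)               // touches the value only

    viaMap.collect().sorted.sameElements(viaMapValues.collect().sorted)
  }

  // Returns (partitioner kept by mapValues?, partitioner kept by map?).
  def partitionerKept(sc: SparkContext): (Boolean, Boolean) = {
    val partitioned = sc.parallelize(Seq(("a", 1), ("b", 2)))
      .partitionBy(new HashPartitioner(2))
    val afterMapValues = partitioned.mapValues(_ + 1)
    val afterMap = partitioned.map { case (k, v) => (k, v + 1) }
    (afterMapValues.partitioner.isDefined, afterMap.partitioner.isDefined)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("map-vs-mapValues").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(sameResult(sc))
      println(partitionerKept(sc))
    } finally sc.stop()
  }
}
```

Because map is free to change the keys, Spark cannot assume the existing partitioning still holds after it, which is why only mapValues keeps the partitioner.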

Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

RDDs extend the Serializable interface, so this is not what’s causing your task to fail. That doesn’t mean, however, that you can serialise an RDD with Spark and avoid NotSerializableException. Spark is a distributed computing engine and its main abstraction is a resilient distributed dataset (RDD), which can be viewed as a distributed collection. Basically, an RDD’s elements are …

Read entire file in Scala?

By the way, “scala.” isn’t really necessary, as it’s always in scope anyway, and you can, of course, import io’s contents, fully or partially, and avoid having to prepend “io.” too. The above leaves the file open, however. To avoid problems, you should close it like this: Another problem with the code above is that …
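The original snippets are missing from this excerpt, so here is a minimal sketch of reading a whole file and closing the source afterwards, as the text advises. The object name, the helper readAll and the UTF-8 encoding are assumptions made for illustration, not the answer's own code.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

// Hypothetical helper object; the name and encoding are assumptions.
object ReadWholeFile {
  // Reads the entire file into one String and always closes the Source,
  // even if reading throws.
  def readAll(path: Path): String = {
    val source = Source.fromFile(path.toFile, "UTF-8")
    try source.mkString
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Write a small temporary file so the example is self-contained.
    val tmp = Files.createTempFile("read-demo", ".txt")
    try {
      Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
      print(readAll(tmp))
    } finally Files.deleteIfExists(tmp)
  }
}
```

On Scala 2.13 or later, scala.util.Using offers the same guarantee with less ceremony; the try/finally form is used here because it works on older versions too.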

Editor does not contain a main type

I have this problem a lot with Eclipse and Scala. It helps if you clean your workspace and rebuild your project. Sometimes Eclipse doesn’t correctly recognize which files it has to recompile 🙁 Edit: The code runs fine in Eclipse.