With this knowledge, we can jump into the example, which uses the flatMapValues transformation to list action movies from the MovieLens dataset. As in the previous articles, we are going to use the same MovieLens dataset we used in the first article: Spark 01: Movie Rating Counter. If you don't have the dataset, please follow the first article and download it.

The MovieLens dataset has a movies.csv file, which contains the genres of each movie. In the rows of movies.csv, the genres column contains all genres of a movie separated by a pipe (|) character.

In the following example, we are going to list all action movies on the console. Create a new Scala → sbt project in IntelliJ IDEA named ActionMovies and add the required Spark dependencies to the build.sbt file.

The program starts by lowering the log level, creating the SparkContext, and reading the file:

    Logger.getLogger("org").setLevel(Level.ERROR)
    val sc = new SparkContext("local", "ActionMovies")
    var data = sc.textFile("/tmp/ml-latest-small/movies.csv")

It then extracts the first row, which is the header, filters the header out of the dataset, and splits the remaining rows:

    val result = data.map(row => row.split(','))

In the above code, the first map operation splits each line read from the CSV file into separate fields at the comma (,). Please note that so far we haven't done anything new compared to the first program: Spark 01: Movie Rating Counter. Let's remove the unnecessary movieId column from the record using the map operation.

The statement result <- results represents a generator. It introduces a new variable, result, that loops over each value of the variable results. We can have as many generators as we want. They loop independently from each other, producing all the possible combinations of their variables.

In the example, we loop over the list of results and the list of execution times. Then, we merge the two elements, listing the total number of asserts executed for each test result, along with the execution time:

    val executionTimeList = List(("test 1", 100), ("test 2", 230))
    val numberOfAssertsWithExecutionTime: List[...] =
      for {
        result <- results
        ...
      } yield ((id, result.totalAsserts, time))

The numberOfAssertsWithExecutionTime list then holds one (id, totalAsserts, time) tuple per yielded combination.

All the generators inside a for-comprehension must share the same type they loop over. In our previous example, both were instances of List. So, we can mix one generator of type List with another generator of type List.

In the previous example, we saw how the semantics of a for-comprehension are equal to that of a sequence of operations on streams or sequences. In Scala, the for-comprehension is nothing more than syntactic sugar for a sequence of calls to one or more of the methods foreach, map, flatMap, and withFilter. We can use for-comprehension syntax on every type that defines such methods. Let's see an example.

First of all, let's define a class to work on. Let's create a Result class that is a wrapper around an expression result:

    case class Result[A](result: A)

First things first, we'll try to print to standard output the value of a Result using a for-comprehension that binds the wrapped value to a variable res:

    val result: Result[Int] = Result(42)

The compiler reports that we cannot use the variable res in the for-comprehension: value foreach is not a member of .ForComprehension.Result. So, the Scala compiler desugars the above construct to a call to the foreach method. Let's add it to the Result type:

    def foreach(f: A => Unit): Unit = f(result)

Now, it's time to give the yield body some dignity.
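To make the foreach desugaring concrete, here is a minimal sketch, assuming a Result wrapper like the one in the text with the foreach method already added. The ForeachDesugaring object and its viaFor and viaForeach helpers are hypothetical names introduced only for illustration. They run the same side effect once through the for-comprehension syntax and once through the explicit foreach call.

```scala
import scala.collection.mutable.ListBuffer

// Sketch of the wrapper from the text, with foreach added (assumed shape).
final case class Result[A](result: A) {
  // A for-comprehension without yield is rewritten into a call to this method.
  def foreach(f: A => Unit): Unit = f(result)
}

// Hypothetical helper object, not part of the original article.
object ForeachDesugaring {
  // Collects the wrapped value using for-comprehension syntax.
  def viaFor(r: Result[Int]): List[Int] = {
    val seen = ListBuffer.empty[Int]
    for (res <- r) seen += res
    seen.toList
  }

  // Collects the wrapped value using the explicit foreach call.
  def viaForeach(r: Result[Int]): List[Int] = {
    val seen = ListBuffer.empty[Int]
    r.foreach(res => seen += res)
    seen.toList
  }
}
```

Both helpers should return the same list for the same input, which is the point: the for syntax only compiles because Result defines foreach. Supporting yield would additionally need a map method on Result.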