Thursday, May 11, 2017

Casbah - Scala toolkit for MongoDB

 Casbah is a Scala toolkit for MongoDB  and it  integrates a layer on top of the official mongo-java-driver for better integration with Scala.

The recommended way to get started is with a dependency management system. 

 libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"

Casbah is MongoDB project and will continue to improve the interaction of Scala + MongoDB.

Add import:
import com.mongodb.casbah.Imports._

 ---------------------------------------------

You could get the source from :
https://github.com/alvinj/ScalaCasbahConnections

Then you could modify your
-----------------
spb@spb-VirtualBox:~/
mongoConnector/ScalaCasbahConnections$ cat build.sbt
organization := "com.alvinalexander"

name := "ScalatraCasbahMongo"

version := "0.1.0-SNAPSHOT"

scalaVersion := "2.11.8"

libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"

libraryDependencies += "com.mongodb.casbah" % "casbah-gridfs_2.8.1" % "2.1.5-1"

libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.24"

resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/"
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$

spb@spb-VirtualBox:~/
mongoConnector/ScalaCasbahConnections$ sbt run
[info] Loading project definition from /home/spb/mongoConnector/ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ScalaCasbahConnections/)
[info] Compiling 1 Scala source to /home/spb/mongoConnector/ScalaCasbahConnections/target/scala-2.11/classes...
[warn] there was one deprecation warning; re-run with -deprecation for details
[warn] one warning found
[info] Running casbahtests.MainDriver
debug: a
log4j:WARN No appenders could be found for logger (com.mongodb.casbah.commons.conversions.scala.RegisterConversionHelpers$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
debug: b
debug: c
debug: d
debug: e
debug: f
debug: g
debug: h
debug: i
debug: j
debug: k
debug: l
debug: m
debug: n
debug: o
debug: p
debug: q
debug: r
debug: s
debug: t
debug: u
debug: v
debug: w
debug: x
debug: y
debug: z
sleeping at the end
  sleeping: 1
  sleeping: 2
  sleeping: 3
  sleeping: 4
  sleeping: 5
  sleeping: 6
  sleeping: 7
  sleeping: 8
  sleeping: 9
  sleeping: 10
  sleeping: 11
  sleeping: 12
  sleeping: 13
  sleeping: 14
  sleeping: 15
  sleeping: 16
  sleeping: 17
  sleeping: 18
  sleeping: 19
  sleeping: 20
  sleeping: 21
  sleeping: 22
  sleeping: 23
  sleeping: 24
  sleeping: 25
  sleeping: 26
  sleeping: 27
  sleeping: 28
  sleeping: 29
  sleeping: 30
game over
[success] Total time: 62 s, completed 13 Mar, 2017 5:37:31 PM
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$ sbt package
[info] Loading project definition from /home/spb/mongoConnector/ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ScalaCasbahConnections/)
[info] Packaging /home/spb/mongoConnector/ScalaCasbahConnections/target/scala-2.11/scalatracasbahmongo_2.11-0.1.0-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed 13 Mar, 2017 5:54:42 PM
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$
------------------------------------------------------------


spb@spb-VirtualBox:~/Scala_project$ mongo
MongoDB shell version: 3.2.12
connecting to: test
Server has startup warnings:
> show dbs
local  0.000GB
mydb   0.000GB
> show dbs
finance  0.000GB
local    0.000GB
mydb     0.000GB
> show collections
> use finance
switched to db finance
> show collections
stocks
> db.stocks.find()
{ "_id" : ObjectId("58cd184edffa1f1829bfbc94"), "name" : "a", "symbol" : "a" }
{ "_id" : ObjectId("58cd184fdffa1f1829bfbc95"), "name" : "b", "symbol" : "b" }
{ "_id" : ObjectId("58cd1850dffa1f1829bfbc96"), "name" : "c", "symbol" : "c" }
{ "_id" : ObjectId("58cd1851dffa1f1829bfbc97"), "name" : "d", "symbol" : "d" }
{ "_id" : ObjectId("58cd1852dffa1f1829bfbc98"), "name" : "e", "symbol" : "e" }
{ "_id" : ObjectId("58cd1853dffa1f1829bfbc99"), "name" : "f", "symbol" : "f" }
{ "_id" : ObjectId("58cd1854dffa1f1829bfbc9a"), "name" : "g", "symbol" : "g" }
{ "_id" : ObjectId("58cd1855dffa1f1829bfbc9b"), "name" : "h", "symbol" : "h" }
{ "_id" : ObjectId("58cd1856dffa1f1829bfbc9c"), "name" : "i", "symbol" : "i" }
{ "_id" : ObjectId("58cd1857dffa1f1829bfbc9d"), "name" : "j", "symbol" : "j" }
{ "_id" : ObjectId("58cd1858dffa1f1829bfbc9e"), "name" : "k", "symbol" : "k" }
{ "_id" : ObjectId("58cd1859dffa1f1829bfbc9f"), "name" : "l", "symbol" : "l" }
{ "_id" : ObjectId("58cd185adffa1f1829bfbca0"), "name" : "m", "symbol" : "m" }
{ "_id" : ObjectId("58cd185bdffa1f1829bfbca1"), "name" : "n", "symbol" : "n" }
{ "_id" : ObjectId("58cd185cdffa1f1829bfbca2"), "name" : "o", "symbol" : "o" }
{ "_id" : ObjectId("58cd185ddffa1f1829bfbca3"), "name" : "p", "symbol" : "p" }
{ "_id" : ObjectId("58cd185edffa1f1829bfbca4"), "name" : "q", "symbol" : "q" }
{ "_id" : ObjectId("58cd185fdffa1f1829bfbca5"), "name" : "r", "symbol" : "r" }
{ "_id" : ObjectId("58cd1860dffa1f1829bfbca6"), "name" : "s", "symbol" : "s" }
{ "_id" : ObjectId("58cd1861dffa1f1829bfbca7"), "name" : "t", "symbol" : "t" }
Type "it" for more
>
-------------------------

----------------

There are two ways of getting the data from MongoDB to Apache Spark.
Method 1: Using Casbah (Layer on MongDB Java Driver)
val uriRemote = MongoClientURI("mongodb://RemoteURL:27017/")
val mongoClientRemote =  MongoClient(uriRemote)
val dbRemote = mongoClientRemote("dbName")
val collectionRemote = dbRemote("collectionName")
val ipMongo = collectionRemote.find
val ipRDD = sc.makeRDD(ipMongo.toList)
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")

Method 2: Spark Worker at our use
Better version of code: Using Spark worker and multiple core to use to get the data in short time.

val config = new Configuration()
config.set("mongo.job.input.format","com.mongodb.hadoop.MongoInputFormat")
config.set("mongo.input.uri", "mongodb://RemoteURL:27017/dbName.collectionName")
val keyClassName = classOf[Object]
val valueClassName = classOf[BSONObject]
val inputFormatClassName = classOf[com.mongodb.hadoop.MongoInputFormat]
val ipRDD = sc.newAPIHadoopRDD(config,inputFormatClassName,keyClassName,valueClassName)
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")

---------------------------------------------------------------
Reference:

https://web.archive.org/web/20120402085626/http://api.mongodb.org/scala/casbah/current/setting_up.html#setting-up-sbt