Thursday, October 4, 2018

Cloudera to merge with Hortonworks, creating a $5.2 billion company

Cloudera and Hortonworks, the two leading enterprise Hadoop providers, announced a merger to create what they call the world's leading next-generation data platform and deliver the industry's first enterprise data cloud. According to the companies, the combined entity has a better chance of becoming a next-gen data platform across multiple clouds, on premises, and at the edge. Hortonworks and Cloudera also have complementary approaches, customers and industries. They will combine in a merger of equals in a deal valued at $5.2 billion. Both companies were pioneers in Hadoop, an open-source platform that could analyze data in ways that scaled up easily—a necessity during a time when the availability of data was increasing exponentially each year.


First, remember the history of Apache Hadoop.

Google built an innovative scale-out platform for data storage and analysis in the late 1990s and early 2000s, and published research papers about their work. Doug Cutting and Mike Cafarella were working together on a personal project, a web crawler, and read the Google papers. The two of them started the Hadoop project to build an open-source implementation of Google’s system.

Yahoo quickly recognized the promise of the project. It staffed up a team to drive Hadoop forward, and hired Doug Cutting. That team delivered the first production cluster in 2006 and continued to improve it in the years that followed.

In 2008, Cloudera was co-founded by engineers from Google, Facebook and Yahoo to deliver a big data platform built on Hadoop to the enterprise market. Its founders believed then, and the company still believes today, that the rest of the world would need to capture, store, manage and analyze data at massive scale. Cloudera was the first company to commercialize the software, building on the work done at Yahoo and other consumer internet companies.

Three years later, the core team of developers working inside Yahoo on Hadoop spun out to found Hortonworks. They, too, saw the enormous potential for data at scale in the enterprise. They had proven their ability to build and deliver the technology at Yahoo.


Hortonworks, which spun out of Yahoo, went public in 2014; Cloudera, which is larger in terms of market capitalization and revenue, went public in 2017. Intel was a major Cloudera investor. Amazon's market-leading cloud unit has its own distribution of Hadoop software, and another competitor of the companies, MapR, is privately held. Cloudera raised over a billion dollars before going public, the vast majority coming in one major $740 million investment from Intel Capital in 2014. Hortonworks, founded three years later, raised $248 million.


Tom Reilly, the long-time CEO at Cloudera, certainly sees the two companies as complementary, offering customers something together that they couldn’t separately. “Our businesses are highly complementary and strategic. By bringing together Hortonworks’ investments in end-to-end data management with Cloudera’s investments in data warehousing and machine learning, we will deliver the industry’s first enterprise data cloud from the Edge to AI,” Reilly said in a statement. The companies commercialize the Hadoop open-source big data software, which companies can use to store, process and analyze lots of different types of data.


Cloudera stock jumped as much as 25 percent after it announced an all-stock merger of equals with competitor Hortonworks. Hortonworks stock was halted just prior to the announcement and jumped as much as 29 percent. The combined equity value of the two companies is $5.2 billion. The deal is subject to U.S. antitrust clearance, and the companies expect it to close in the first quarter of 2019. Under the terms of the transaction agreement, Cloudera stockholders will own approximately 60% of the equity of the combined company and Hortonworks stockholders will own approximately 40%. Hortonworks shareholders will get 1.305 Cloudera shares for each share owned.
The two companies are committed to supporting their existing offerings for at least three years, but will work on a "unity release" of software drawing on technologies from both companies' portfolios, Reilly said. The unified company intends to honor Hortonworks' commitment to providing all of its software under open-source licenses, but over time there will also be a proprietary option that offers additional features, including in the cloud.

Tom Reilly will serve as CEO of the combined company. Hortonworks' Chief Operating Officer, Scott Davidson, will serve as Chief Operating Officer; Hortonworks' Chief Product Officer, Arun C. Murthy, will serve as Chief Product Officer; and Cloudera's Chief Financial Officer, Jim Frankola, will serve as Chief Financial Officer of the combined company.
Rob Bearden will join the board of directors, and current Cloudera board member Marty Cole will become Chairman of the board. The merged company will operate under the Cloudera banner and focus on "edge to AI" opportunities.

References:
1) https://vision.cloudera.com/cloudera-hortonworks-from-the-edge-to-ai/
2)  https://www.zdnet.com/article/cloudera-hortonworks-merge-in-deal-valued-at-5-2-billion/

 

Sunday, September 30, 2018

Blockchain Technology

Blockchain is one of the biggest buzzwords in technology. Blockchain technologies are at an early stage of adoption, but they are proliferating rapidly. Blockchain was invented by Satoshi Nakamoto in 2008 to serve as the public transaction ledger of the cryptocurrency Bitcoin, and many alternative cryptocurrencies have blockchains of their own.


A blockchain is a growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data (generally represented as a Merkle tree root hash).


A blockchain is a decentralized, distributed and public digital ledger that is used to record transactions across many computers so that the record cannot be altered retroactively without the alteration of all subsequent blocks and the consensus of the network. This allows the participants to verify and audit transactions inexpensively. A blockchain database is managed autonomously using a peer-to-peer network and a distributed timestamping server, and is authenticated by mass collaboration powered by collective self-interest. Some basic Bitcoin terms to help you understand it all better:

  • Decentralized — the ledger isn't stored in a single location, but on millions of nodes simultaneously.
  • Distributed — it's shared and continually reconciled on the network.
  • Blockchain — Bitcoin's ledger. The Blockchain is a public record that can be stored by anyone, and it stores all transaction data.
  • Mining — the process by which Bitcoin transactions are validated using special processors. The people who do this are called miners.
  • Node — a server or storage device which stores the entire Blockchain and runs Bitcoin client software that checks all transaction data and the Blockchain against the Bitcoin protocol.
  • Bitcoin wallet — a software application in which you can view your Bitcoin holdings, and send or receive Bitcoins.
  • Bitcoin wallet address — equivalent to your bank account number; Bitcoins are stored against this address/ID. Each wallet address is associated with two unique keys, called the public and private keys.
  • Public key — used to send Bitcoins to you; it can be seen by anyone.
  • Private key — your password; you need it to spend your Bitcoins.


How do cryptocurrencies like Bitcoin work?

The bitcoin blockchain is “decentralized,” meaning it is not controlled by one central authority. While traditional currencies are issued by central banks, bitcoin has no central authority. Instead, the bitcoin blockchain is maintained by a network of people known as miners.

These "miners," sometimes called "nodes" on the network, are people running purpose-built computers that compete to solve complex mathematical problems in order to make a transaction go through. This is known as Proof-of-Work.

For example, say lots of people are making bitcoin transactions. Each transaction originates from a wallet which has a “private key.” This is a digital signature and provides mathematical proof that the transaction has come from the owner of the wallet.

Now imagine lots of transactions are taking place across the world. These individual transactions are grouped together into a block, organized by strict cryptographic rules. The block is sent out to the bitcoin network, which is made up of people running high-powered computers. These computers compete to validate the transactions by trying to solve complex mathematical puzzles. The winner receives a reward in bitcoin. The validated block is then added onto previous blocks, creating a chain of blocks called a blockchain.

Proof-of-work is a process of producing data that is hard to generate but easy to verify. In the context of a blockchain, proof-of-work is about solving mathematical problems. If a problem is successfully solved, then a new block can be added to the blockchain. On average, performing proof-of-work calculations and adding a new block to the chain takes about 10 minutes.

What's behind the proof-of-work process of solving the puzzle? How do miners find this number? By guessing at random. The hash function makes it impossible to predict what the output will be, so miners guess the mystery number (the "nonce") and apply the hash function to the combination of that guessed number and the data in the block. The resulting hash has to start with a pre-established number of zeroes. There's no way of knowing which number will work, because two consecutive integers will give wildly varying results. What's more, there may be several nonces that produce the desired result, or there may be none (in which case the miners keep trying, but with a different block configuration).

The first miner to get a resulting hash within the desired range announces its victory to the rest of the network. All the other miners immediately stop work on that block and start trying to figure out the mystery number for the next one. As a reward for its work, the victorious miner gets some new bitcoin.
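To make the guessing concrete, here is a minimal proof-of-work sketch in Python (a toy model: it uses a few leading zeros as the target, whereas real Bitcoin mining double-SHA-256-hashes an 80-byte block header against a far harder target):

import hashlib

def mine(block_data, difficulty=4):
    # Keep guessing nonces until SHA-256(block_data + nonce) starts
    # with `difficulty` zeros, the pre-established number of zeroes.
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256((block_data + str(nonce)).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("alice pays bob 5; bob pays carol 2")
print(nonce, digest)   # the winning nonce and a hash starting with "0000"

Each extra hex zero in the target makes the search roughly 16 times longer on average, which is how the network keeps block times near 10 minutes as hardware improves.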

 Technically, a blockchain is a chain of blocks ordered in a network of non-trusted peers. Each block references the previous one and contains data, its own hash, and the hash of the previous block.


There are different types of blockchains, which rely on different configurations as well as consensus mechanisms depending on the type and size of a network. Bitcoin, the most popular blockchain, is permissionless, which means anyone can participate and access the content of the chain.

Every time a person wants to initiate a transaction on a blockchain, a block is created detailing the transaction, which must be broadcast to all nodes in the network. The block comes with a timestamp that helps establish a sequence of events.

Once all the nodes agree and the authenticity of the block is established, the new block is linked to the previous block, which is itself linked to the block before it, resulting in a chain of blocks, commonly referred to as a blockchain.

The blockchain is normally replicated across an entire network where everyone in the network can see and access it. Cryptography is used to secure the chain, which makes it practically impossible for a single person to manipulate its contents. For any change to take place, all the participants, represented by nodes, must agree to the proposed changes, which are applied in the next block without altering the previous blocks.

Once a piece of information is added to a block and recorded on the blockchain ledger, nobody can change or remove it. This tamper-proof aspect is why CIOs should take a keen interest in the technology at a time when data security and preservation are of utmost importance.

The transactions may be payments on a loan. They may also be disbursements of funds from a loan or line of credit to a borrower, or to a third party such as the car dealer who originated the loan. These transactions are assembled into "blocks." The blocks are loaded into the public record and become the "chain" of events used to arrive at a balance. Each block is encrypted and contains control amounts to ensure that blocks cannot be altered.







How the blockchain is tamperproof:

  1. One of the advantages of blockchain is that it can't be tampered with. Each block that is added onto the chain carries a hard, cryptographic reference to the previous block.
  2. That reference is part of the mathematical problem that needs to be solved in order to bring the following block into the network and the chain. Part of solving the puzzle involves working out a random number called the "nonce." The nonce, combined with the other data such as the transaction size, creates a digital fingerprint called a hash. This is encrypted, thus making it secure.
  3. Each hash is unique and must meet certain cryptographic conditions. Once this happens a block is completed and added to the chain. In order to tamper with this, each earlier block, of which there are over half a million, would need its cryptographic puzzle re-mined, which is computationally infeasible. A sketch of this linkage follows below.
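As a rough illustration of that linkage, here is a toy Python sketch (not Bitcoin's actual block format): each block stores the previous block's hash, so editing any earlier block changes its hash and breaks the chain from that point on.

import hashlib

def block_hash(prev_hash, data, nonce=0):
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode()).hexdigest()

# Build a tiny chain: each block records the hash of the block before it.
chain = []
prev = "0" * 64                      # the genesis block has no predecessor
for data in ["tx-batch-1", "tx-batch-2", "tx-batch-3"]:
    h = block_hash(prev, data)
    chain.append({"prev": prev, "data": data, "hash": h})
    prev = h

def verify(chain):
    prev = "0" * 64
    for b in chain:
        if b["prev"] != prev or block_hash(b["prev"], b["data"]) != b["hash"]:
            return False             # a stored hash or a link no longer matches
        prev = b["hash"]
    return True

print(verify(chain))                 # True
chain[0]["data"] = "tampered"
print(verify(chain))                 # False: block 0's stored hash no longer matches,
                                     # and recomputing it would break every later link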


Cryptocurrencies also present a problem for governments to control their economies. Fiscal and monetary policies become harder to enforce since the new currencies are outside the traditional government institutions. The electronic movement of funds in a blockchain environment can bypass government institutions and their controls.

To meet modern business demands, IBM has joined with other companies to collaboratively develop an open source, production-ready, business blockchain framework, called Hyperledger Fabric™, one of the Hyperledger® projects hosted by The Linux Foundation®. Hyperledger Fabric supports distributed ledger solutions on permissioned networks for a wide range of industries. Its modular architecture maximizes the confidentiality, resilience, and flexibility of blockchain solutions.

The IBM Blockchain Platform is a blockchain software-as-a-service offering on the IBM Cloud. It's the only fully integrated, enterprise-ready blockchain platform designed to simplify the development, governance, and operation of a decentralized, multi-institution business network. The IBM Blockchain Platform accelerates collaboration in this decentralized world by leveraging open source technology from the Hyperledger Fabric framework and Hyperledger Composer tooling.

IBM Launches Food Trust Blockchain For Commercial Use: IBM's blockchain-based food traceability platform is now live for global use by retailers, wholesalers and suppliers across the food ecosystem. In September, retail giant Walmart announced that it would begin requiring its suppliers to implement the system to track bags of spinach and heads of lettuce. Other participants include multinational companies Nestle, Kroger, Tyson Foods and Unilever.

The IBM Code pattern "Develop an IoT asset tracking app using Blockchain" provides an example solution for improving a supply chain by using Blockchain, IoT devices, and Node-RED. This pattern addresses the very real problem of the safe delivery of perishable goods (food, medicine, livestock, etc.) that are sensitive to environmental conditions during shipment. Every shipment of perishable goods has thresholds (refrigeration requirements, avoidance of shocks or vibration, etc.) to protect the goods from contamination or damage. If the shipment exceeds these thresholds, the goods are damaged and might become a health hazard. By recording the details (where, what, and when) of a shipment that experienced extreme conditions (thresholds specified in the smart contract), developers can verify whether the goods were delivered successfully, and payment is predicated on successful delivery. Tracking the conditions of the shipment across multiple participants using a blockchain provides verification and trust in these processes.
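The core logic of the pattern can be sketched in a few lines of Python (the thresholds, readings and field names here are hypothetical; the actual pattern expresses this logic as a smart contract recorded on the blockchain):

# Hypothetical contract terms and sensor readings for one shipment.
thresholds = {"max_temp_c": 8.0, "max_shock_g": 2.5}
readings = [
    {"where": "port-A", "when": "2018-10-01T04:00Z", "temp_c": 6.5, "shock_g": 0.3},
    {"where": "at-sea", "when": "2018-10-03T11:00Z", "temp_c": 9.2, "shock_g": 0.4},
]

# Record every threshold violation (the where, what and when) on the ledger;
# payment is released only if no violation was recorded during shipment.
violations = [r for r in readings
              if r["temp_c"] > thresholds["max_temp_c"]
              or r["shock_g"] > thresholds["max_shock_g"]]
release_payment = not violations
print(violations)        # the at-sea reading breached the temperature threshold
print(release_payment)   # False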



Ultimately, common blockchain standards and protocols may play a key role in enabling interoperability among international payment systems. Meanwhile, new technology usually evolves faster than consensus standards development, so these efforts may have little impact on near-term blockchain deployments. Disruptive technologies such as blockchain and the Internet of Things will have a profound impact on the way we live and work.


References:

1) https://rubygarage.org/blog/how-blockchain-works 
2) https://en.wikipedia.org/wiki/Blockchain
3) https://www.americanexpress.com/us/content/foreign-exchange/articles/international-payments-hyperledger-interledger/
4) https://www.coindesk.com/information/how-bitcoin-mining-works/
5) https://www.cnbc.com/2018/06/18/blockchain-what-is-it-and-how-does-it-work.html
6) https://www.forbes.com/sites/astanley/2018/10/08/ready-to-rumble-ibm-launches-food-trust-blockchain-for-commercial-use/#3a1ca5c77439
7) https://www.youtube.com/watch?v=93E_GzvpMA0 
8) https://developer.ibm.com/patterns/develop-an-iot-asset-tracking-app-using-blockchain/?social_post=1780070590&fst=Learn&_lrsc=45077f56-e223-4172-9317-6cec9e301566
9) https://github.com/IBM/IoT-AssetTracking-Perishable-Network-Blockchain
10) https://www.fool.com/investing/2018/10/10/ibms-food-safety-blockchain-picks-up-steam.aspx 
11) https://101blockchains.com/blockchain-cio-executives-guide/ 

Thursday, November 2, 2017

Spectrum LSF multi-cluster Models and Configurations


IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments. Featuring intelligent, policy-driven scheduling and easy to use interfaces for job and workflow management, it helps organizations to improve competitiveness by accelerating research and design while controlling costs through superior resource utilization.

Without a scheduler, an HPC cluster would just be a bunch of servers with different jobs interfering with each other. When you have a large cluster and multiple users, each user doesn't know which compute nodes and CPU cores to use, nor how much resource is available on each node. To solve this, cluster batch control systems manage jobs on the system using an HPC scheduler, which is essential for queuing jobs, assigning priorities, and distributing, parallelizing, suspending, killing or otherwise controlling jobs cluster-wide. Spectrum LSF is a powerful workload management platform and job scheduler for distributed high-performance computing.

Computational multi-clusters are an important emerging class of supercomputing architectures. As multi-cluster systems become more prevalent, techniques for efficiently exploiting these resources become increasingly significant. A critical aspect of exploiting these resources is the challenge of scheduling. In order to maximize job throughput, multi-cluster schedulers must simultaneously leverage the collective computational resources of each of its participating clusters. By doing so, jobs that would otherwise wait for nodes to become available on a single cluster can potentially run earlier by aggregating disjoint resources throughout the multi-cluster. This procedure can result in dramatic reductions in queue waiting times.

Organizations might have multiple LSF clusters managed by different business units, often for reasons such as:
  • Ease of administration
  • Different geographic locations
  • Scalability
In this scenario it is good to share resources across the clusters to reap the benefits of global load sharing.
There are two Spectrum LSF multi-cluster models:

Job forwarding model:

In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues.
With this model, scheduling of MultiCluster jobs is a two-phase process: the submission cluster selects a suitable remote receive-jobs queue and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.

Resource leasing model

In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must “export” resources to the consumer, and the consumer cluster must configure a queue to use those resources.
In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.

Selection of Model: 
Consider your own goals and priorities when choosing the best resource-sharing model for your site.

  • The job forwarding model can make resources available to jobs from multiple clusters; this flexibility allows maximum throughput when each cluster's resource usage fluctuates. The resource leasing model can give one cluster exclusive control of a dedicated resource, which can be more efficient when there is a steady amount of work.
  • The lease model is the most transparent to users and supports the same scheduling features as a single cluster.
  • The job forwarding model has a single point of administration, while the lease model shares administration between provider and consumer clusters.


[sachin@host1 ~]$ lsid
IBM Spectrum LSF Standard 10.1.0.3
My cluster name is cluster1_p8
My master name is host1
[sachin@host1 ~]$

lsclusters : Displays configuration information about LSF clusters


bhosts : Displays hosts and their static and dynamic resources in cluster



-----------------------------------------------------------

Configuration Files:

/nfs_shared_dir/LSF_HOME/conf/lsf.shared

Begin Cluster
ClusterName  Servers
cluster1_p8     (host1)
cluster2_p9     (host6)
cluster3_x86    (host11)
End Cluster
------------------------------------------------------------
/nfs_shared_dir/LSF_HOME/conf/lsbatch/ppc_cluster1/configdir/lsb.resources

Begin HostExport
PER_HOST     = host1               # export host list
SLOTS        = 20                  # for each host, export 20 job slots
DISTRIBUTION = ([cluster2_p9, 1] [cluster3_x86, 1])   # share distribution for remote clusters
MEM          = 100                 # export 100 MB of memory from each host [optional parameter]
SWP          = 100                 # export 100 MB of swap from each host [optional parameter]
End HostExport
In this example, resources are leased to 2 clusters in an even 1:1 ratio. Each cluster gets 1/2 of the resources.
------------------------------------------------------
/nfs_shared_dir/LSF_HOME/conf/lsbatch/CI_cluster_ppc/configdir/lsb.queues

Begin Queue
QUEUE_NAME     = send_queue
SNDJOBS_TO     = receive_queue@cluster3_x86
HOSTS          = none
PRIORITY       = 30
NICE           = 20
End Queue

Begin Queue
QUEUE_NAME = leaseq
PRIORITY  = 20
HOSTS = all allremote
End Queue

Begin Queue
QUEUE_NAME   = cluster1_p8
PRIORITY     = 30
INTERACTIVE  = NO
HOSTS        = host1 host2 host3 host4 host5        # hosts on which jobs in this queue can run
DESCRIPTION  = For submission of jobs to P8 machines
End Queue

Begin Queue
QUEUE_NAME   = cluster2_p9
PRIORITY     = 30
INTERACTIVE  = NO
HOSTS        = host6 host7 host8 host9 host10       # hosts on which jobs in this queue can run
DESCRIPTION  = For submission of jobs to P9 machines
End Queue

Begin Queue
QUEUE_NAME   = cluster3_x86
PRIORITY     = 30
INTERACTIVE  = NO
HOSTS        = host11 host12 host13 host14 host15       # hosts on which jobs in this queue can run
DESCRIPTION  = For submission of jobs to x86 machines
End Queue

-------------------------------------------------------------------------------
For the job forwarding model, you need the following configuration on the remote (execution) cluster:
/nfs_shared_dir/LSF_HOME/conf/lsbatch/CI_cluster_ppc/configdir/lsb.queues

Begin Queue
QUEUE_NAME      = receive_queue
RCVJOBS_FROM    = send_queue@cluster1_p8
HOSTS           =   host11 host12 host13 host14 host15
PRIORITY        = 55
NICE            = 10
DESCRIPTION     = Multicluster Queue
End Queue
-------------------------------------------------------------------------------------------------------

Check job forwarding information and resource lease information by issuing the bclusters command.

Submit an LSF job - forwarding mechanism
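For example, a job submitted to the send-jobs queue defined above is forwarded to receive_queue@cluster3_x86 when no suitable local host is found (a minimal sketch; the job name is illustrative):

[sachin@host1 ~]$ bsub -q send_queue -J fwd_test sleep 60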

Submit an LSF job - resource leasing mechanism
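For example, a job submitted to the lease queue defined above can be dispatched to local hosts or to hosts borrowed from other clusters, since the queue is configured with HOSTS = all allremote (a minimal sketch; the job name is illustrative):

[sachin@host1 ~]$ bsub -q leaseq -J lease_test sleep 60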


In this article I wanted to illustrate how someone could get started creating their own LSF multi-cluster setup to run applications that need more computational resources.

Reference:

  1. http://www.slac.stanford.edu/comp/unix/package/lsf/LSF8.1_doc/8.0/multicluster/index.htm?multicluster_benefits_mc_lsf.html~main
  2. http://www-01.ibm.com/support/docview.wss?uid=isg3T1016097
  3. https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_welcome/lsf_kc_mc.html
  4. https://tin6150.github.io/psg/3rdParty/lsf4_userGuide/13-multicluster.html

Sunday, July 30, 2017

Getting Started with MongoDB

The NoSQL database movement came about to address the shortcomings of relational databases and the demands of modern software development. Much new data is unstructured and semi-structured, so developers also need a database that is capable of storing it efficiently. Unfortunately, the rigidly defined, schema-based approach used by relational databases makes it hard to quickly incorporate new types of data, and is a poor fit for unstructured and semi-structured data. NoSQL provides a data model that maps better to these needs.

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling across a configurable set of systems that function as storage nodes.
 
  • A database holds a set of collections
  • A collection holds a set of documents
  • A document is a set of fields
  • A field is a key-value pair
  • A key is a name (string)
  • A value is a basic type (string, integer, float, timestamp, binary, etc.), a document, or an array of values
MongoDB Architecture
MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs. MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it supports more data types than JSON. Documents can be simple and flat, or complex and nested, such as the one below:

{
    id: x,
    name: y,
    other: z,
    multipleArray: [
        {lab1: "A",  lab2: "B", lab3:"C"},
        {lab1: "AB", lab2: "BB", lab3:"CB"},
        {lab1: "AC", lab2: "BC", lab3:"CC"}
    ]
}

Document Database

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.

The advantages of using documents are:
  • Documents (i.e. objects) correspond to native data types in many programming languages.
  • Embedded documents and arrays reduce the need for expensive joins.
  • Dynamic schema supports fluent polymorphism.
Most user-accessible data structures in MongoDB are documents, including:
-> All database records.
-> Query selectors, which define what records to select for read, update, and delete operations.
-> Update definitions, which define what fields to modify during an update.
-> Index specifications, which define what fields to index.
-> Data output by MongoDB for reporting and configuration, such as the output of the serverStatus command and the replica set configuration document.

Joins and other aggregation enhancements are available in MongoDB 3.2 onwards.

How do you create databases and collections? Some basic query examples follow.

spb@spb-VirtualBox:~$ mongo
MongoDB shell version: 3.2.12
connecting to: test
Server has startup warnings:
> show dbs
finance  0.000GB
local    0.000GB
mydb     0.000GB
MongoDB doesn't provide a command to create a database. You don't need to create one manually: MongoDB creates the database (and its collection, the equivalent of a table in SQL) on the fly the first time you save a document into it.

> use hospital
switched to db hospital
> db.patient.save({name:"John",age:"29",gender:"M",disease:"fever",city:"chennai"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Ramesh",age:"55",gender:"M",disease:"blood pressure",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Harish",age:"35",gender:"M",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Namitha",age:"25",gender:"F",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Asha",age:"15",gender:"F",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Ravi",age:"23",gender:"M",disease:"diabetic",city:"chennai"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Lokesh",age:"37",gender:"M",disease:"fever",city:"mumbai"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Sangeetha",age:"37",gender:"F",disease:"fever",city:"mumbai"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Apoorva",age:"27",gender:"F",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Jijo",age:"30",gender:"M",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Mallik",age:"38",gender:"M",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Parashuram",age:"32",gender:"M",disease:"fever",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })
> db.patient.save({name:"Rakesh",age:"35",gender:"M",disease:"cold",city:"bengaluru"})
WriteResult({ "nInserted" : 1 })


> db.patient.find()
{ "_id" : ObjectId("597d83f6a9d2632baed3c076"), "name" : "John", "age" : "29", "gender" : "M", "disease" : "fever", "city" : "chennai" }
{ "_id" : ObjectId("597d8457a9d2632baed3c077"), "name" : "Ramesh", "age" : "55", "gender" : "M", "disease" : "blood pressure", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8488a9d2632baed3c078"), "name" : "Harish", "age" : "35", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84baa9d2632baed3c079"), "name" : "Namitha", "age" : "25", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84efa9d2632baed3c07a"), "name" : "Asha", "age" : "15", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d851aa9d2632baed3c07b"), "name" : "Ravi", "age" : "23", "gender" : "M", "disease" : "diabetic", "city" : "chennai" }
{ "_id" : ObjectId("597d8544a9d2632baed3c07c"), "name" : "Lokesh", "age" : "37", "gender" : "M", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d855ca9d2632baed3c07d"), "name" : "Sangeetha", "age" : "37", "gender" : "F", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d8571a9d2632baed3c07e"), "name" : "Apoorva", "age" : "27", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d858ba9d2632baed3c07f"), "name" : "Jijo", "age" : "30", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d859da9d2632baed3c080"), "name" : "Mallik", "age" : "38", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85afa9d2632baed3c081"), "name" : "Parashuram", "age" : "32", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85c7a9d2632baed3c082"), "name" : "Rakesh", "age" : "35", "gender" : "M", "disease" : "cold", "city" : "bengaluru" }
----------------------------
 > show dbs
finance   0.000GB
hospital  0.000GB
local     0.000GB
mydb      0.000GB

----------------------------------------------------------------------
To query documents on the basis of some condition, you can use the following operators.

1) Query to get records where disease = "fever"
 > db.patient.find({"disease":"fever"})
{ "_id" : ObjectId("597d83f6a9d2632baed3c076"), "name" : "John", "age" : "29", "gender" : "M", "disease" : "fever", "city" : "chennai" }
{ "_id" : ObjectId("597d8488a9d2632baed3c078"), "name" : "Harish", "age" : "35", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84baa9d2632baed3c079"), "name" : "Namitha", "age" : "25", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84efa9d2632baed3c07a"), "name" : "Asha", "age" : "15", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8544a9d2632baed3c07c"), "name" : "Lokesh", "age" : "37", "gender" : "M", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d855ca9d2632baed3c07d"), "name" : "Sangeetha", "age" : "37", "gender" : "F", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d8571a9d2632baed3c07e"), "name" : "Apoorva", "age" : "27", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d858ba9d2632baed3c07f"), "name" : "Jijo", "age" : "30", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d859da9d2632baed3c080"), "name" : "Mallik", "age" : "38", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85afa9d2632baed3c081"), "name" : "Parashuram", "age" : "32", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
---------------------------------------------- ---------------------
2) To display the results in a formatted way, use the pretty() method; again records where disease = "fever"

> db.patient.find({"disease":"fever"}).pretty()
{
    "_id" : ObjectId("597d83f6a9d2632baed3c076"),
    "name" : "John",
    "age" : "29",
    "gender" : "M",
    "disease" : "fever",
    "city" : "chennai"
}
{
    "_id" : ObjectId("597d8488a9d2632baed3c078"),
    "name" : "Harish",
    "age" : "35",
    "gender" : "M",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d84baa9d2632baed3c079"),
    "name" : "Namitha",
    "age" : "25",
    "gender" : "F",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d84efa9d2632baed3c07a"),
    "name" : "Asha",
    "age" : "15",
    "gender" : "F",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d8544a9d2632baed3c07c"),
    "name" : "Lokesh",
    "age" : "37",
    "gender" : "M",
    "disease" : "fever",
    "city" : "mumbai"
}
{
    "_id" : ObjectId("597d855ca9d2632baed3c07d"),
    "name" : "Sangeetha",
    "age" : "37",
    "gender" : "F",
    "disease" : "fever",
    "city" : "mumbai"
}
{
    "_id" : ObjectId("597d8571a9d2632baed3c07e"),
    "name" : "Apoorva",
    "age" : "27",
    "gender" : "F",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d858ba9d2632baed3c07f"),
    "name" : "Jijo",
    "age" : "30",
    "gender" : "M",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d859da9d2632baed3c080"),
    "name" : "Mallik",
    "age" : "38",
    "gender" : "M",
    "disease" : "fever",
    "city" : "bengaluru"
}
{
    "_id" : ObjectId("597d85afa9d2632baed3c081"),
    "name" : "Parashuram",
    "age" : "32",
    "gender" : "M",
    "disease" : "fever",
    "city" : "bengaluru"
}
-------------------------------------------------------
3) Query to get records where age = 25
> db.patient.find({"age":"25"})
{ "_id" : ObjectId("
597d84baa9d2632baed3c079"), "name" : "Namitha", "age" : "25", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
>
-------------------------------------------------------
4) Query to get records where age is greater than 25

> db.patient.find({"age":{$gt:"25"}})
{ "_id" : ObjectId("597d83f6a9d2632baed3c076"), "name" : "John", "age" : "29", "gender" : "M", "disease" : "fever", "city" : "chennai" }
{ "_id" : ObjectId("597d8457a9d2632baed3c077"), "name" : "Ramesh", "age" : "55", "gender" : "M", "disease" : "blood pressure", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8488a9d2632baed3c078"), "name" : "Harish", "age" : "35", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8544a9d2632baed3c07c"), "name" : "Lokesh", "age" : "37", "gender" : "M", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d855ca9d2632baed3c07d"), "name" : "Sangeetha", "age" : "37", "gender" : "F", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d8571a9d2632baed3c07e"), "name" : "Apoorva", "age" : "27", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d858ba9d2632baed3c07f"), "name" : "Jijo", "age" : "30", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d859da9d2632baed3c080"), "name" : "Mallik", "age" : "38", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85afa9d2632baed3c081"), "name" : "Parashuram", "age" : "32", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85c7a9d2632baed3c082"), "name" : "Rakesh", "age" : "35", "gender" : "M", "disease" : "cold", "city" : "bengaluru" }
------------------------------------------------------------------------
5) Query to get records where age is less than 25
 
> db.patient.find({"age":{$lt:"25"}})
{ "_id" : ObjectId("597d84efa9d2632baed3c07a"), "name" : "Asha", "age" : "15", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d851aa9d2632baed3c07b"), "name" : "Ravi", "age" : "23", "gender" : "M", "disease" : "diabetic", "city" : "chennai" }
---------------------------------------------------------------------
6) Query to get records where age is less than or equal to 25
 
 > db.patient.find({"age":{$lte:"25"}})
{ "_id" : ObjectId("597d84baa9d2632baed3c079"), "name" : "Namitha", "age" : "25", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84efa9d2632baed3c07a"), "name" : "Asha", "age" : "15", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d851aa9d2632baed3c07b"), "name" : "Ravi", "age" : "23", "gender" : "M", "disease" : "diabetic", "city" : "chennai" }
----------------------------------------------------------------------------
7) Query to get records where age is greater than or equal to 25
> db.patient.find({"age":{$gte:"25"}})
{ "_id" : ObjectId("597d83f6a9d2632baed3c076"), "name" : "John", "age" : "29", "gender" : "M", "disease" : "fever", "city" : "chennai" }
{ "_id" : ObjectId("597d8457a9d2632baed3c077"), "name" : "Ramesh", "age" : "55", "gender" : "M", "disease" : "blood pressure", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8488a9d2632baed3c078"), "name" : "Harish", "age" : "35", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84baa9d2632baed3c079"), "name" : "Namitha", "age" : "25", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8544a9d2632baed3c07c"), "name" : "Lokesh", "age" : "37", "gender" : "M", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d855ca9d2632baed3c07d"), "name" : "Sangeetha", "age" : "37", "gender" : "F", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d8571a9d2632baed3c07e"), "name" : "Apoorva", "age" : "27", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d858ba9d2632baed3c07f"), "name" : "Jijo", "age" : "30", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d859da9d2632baed3c080"), "name" : "Mallik", "age" : "38", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85afa9d2632baed3c081"), "name" : "Parashuram", "age" : "32", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85c7a9d2632baed3c082"), "name" : "Rakesh", "age" : "35", "gender" : "M", "disease" : "cold", "city" : "bengaluru" }
-------------------------------------------------------
8) Query to get records where age is not equal to 25
 
> db.patient.find({"age":{$ne:"25"}})
{ "_id" : ObjectId("597d83f6a9d2632baed3c076"), "name" : "John", "age" : "29", "gender" : "M", "disease" : "fever", "city" : "chennai" }
{ "_id" : ObjectId("597d8457a9d2632baed3c077"), "name" : "Ramesh", "age" : "55", "gender" : "M", "disease" : "blood pressure", "city" : "bengaluru" }
{ "_id" : ObjectId("597d8488a9d2632baed3c078"), "name" : "Harish", "age" : "35", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d84efa9d2632baed3c07a"), "name" : "Asha", "age" : "15", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }

{ "_id" : ObjectId("597d851aa9d2632baed3c07b"), "name" : "Ravi", "age" : "23", "gender" : "M", "disease" : "diabetic", "city" : "chennai" }
{ "_id" : ObjectId("597d8544a9d2632baed3c07c"), "name" : "Lokesh", "age" : "37", "gender" : "M", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d855ca9d2632baed3c07d"), "name" : "Sangeetha", "age" : "37", "gender" : "F", "disease" : "fever", "city" : "mumbai" }
{ "_id" : ObjectId("597d8571a9d2632baed3c07e"), "name" : "Apoorva", "age" : "27", "gender" : "F", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d858ba9d2632baed3c07f"), "name" : "Jijo", "age" : "30", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d859da9d2632baed3c080"), "name" : "Mallik", "age" : "38", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85afa9d2632baed3c081"), "name" : "Parashuram", "age" : "32", "gender" : "M", "disease" : "fever", "city" : "bengaluru" }
{ "_id" : ObjectId("597d85c7a9d2632baed3c082"), "name" : "Rakesh", "age" : "35", "gender" : "M", "disease" : "cold", "city" : "bengaluru" }
>  
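One caveat on the queries above: age was saved as a string, so $gt/$lt comparisons are lexicographic rather than numeric. They behave as expected here only because every age has two digits; an age of "9" would compare greater than "25". Storing ages as numbers avoids the surprise (a quick sketch; the "Kiran" document is made up for illustration):

> db.patient.save({name:"Kiran", age:40, gender:"M", disease:"cold", city:"mysuru"})
> db.patient.find({"age":{$gt:25}})    // numeric comparison; matches only number-typed ages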
 ----------------------------------------------------------------------------------------------- 
For CRUD (Create, Read, Update, Delete) operations, MongoDB provides the insert()/save(), find(), update(), and remove() commands.
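For instance, against the patient collection above (a quick sketch of each operation; the "Suresh" document is made up for illustration):

> db.patient.insert({name:"Suresh", age:"45", gender:"M", disease:"cold", city:"mysuru"})   // Create
> db.patient.find({name:"Suresh"})                                                          // Read
> db.patient.update({name:"Suresh"}, {$set:{disease:"fever"}})                              // Update
> db.patient.remove({name:"Suresh"})                                                        // Delete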



That's all for a basic introduction to MongoDB.
Reference:
https://docs.mongodb.com/manual/introduction/
https://www.mongodb.com/leading-nosql-database
https://www.tutorialspoint.com/mongodb/mongodb_query_document.htm
http://theholmesoffice.com/how-to-create-a-mongodb-database/ 
https://www.codeproject.com/Articles/1037052/Introduction-to-MongoDB 
http://www.developer.com/java/data/getting-started-with-mongodb-as-a-java-nosql-solution.html

Thursday, May 11, 2017

Casbah - Scala toolkit for MongoDB

Casbah is a Scala toolkit for MongoDB; it provides a layer on top of the official mongo-java-driver for better integration with Scala.

The recommended way to get started is with a dependency management system such as sbt:

 libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"

Casbah is a MongoDB project and will continue to improve the interaction of Scala and MongoDB.

Add the import:
import com.mongodb.casbah.Imports._
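A minimal connect-and-insert sketch then looks like this (assuming a MongoDB instance on localhost:27017; the database and collection names are illustrative):

import com.mongodb.casbah.Imports._

object CasbahHello extends App {
  val client = MongoClient("localhost", 27017)   // connect to the local mongod
  val coll   = client("mydb")("stocks")          // database "mydb", collection "stocks"

  coll.insert(MongoDBObject("name" -> "a", "symbol" -> "a"))
  coll.find().foreach(println)                   // print every document in the collection

  client.close()
}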

 ---------------------------------------------

You could get the source from :
https://github.com/alvinj/ScalaCasbahConnections

Then you could modify your build.sbt:
-----------------
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$ cat build.sbt
organization := "com.alvinalexander"

name := "ScalatraCasbahMongo"

version := "0.1.0-SNAPSHOT"

scalaVersion := "2.11.8"

libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"

libraryDependencies += "com.mongodb.casbah" % "casbah-gridfs_2.8.1" % "2.1.5-1"

libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.24"

resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/"
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$

spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$ sbt run
[info] Loading project definition from /home/spb/mongoConnector/ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ScalaCasbahConnections/)
[info] Compiling 1 Scala source to /home/spb/mongoConnector/ScalaCasbahConnections/target/scala-2.11/classes...
[warn] there was one deprecation warning; re-run with -deprecation for details
[warn] one warning found
[info] Running casbahtests.MainDriver
debug: a
log4j:WARN No appenders could be found for logger (com.mongodb.casbah.commons.conversions.scala.RegisterConversionHelpers$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
debug: b
debug: c
debug: d
debug: e
debug: f
debug: g
debug: h
debug: i
debug: j
debug: k
debug: l
debug: m
debug: n
debug: o
debug: p
debug: q
debug: r
debug: s
debug: t
debug: u
debug: v
debug: w
debug: x
debug: y
debug: z
sleeping at the end
  sleeping: 1
  sleeping: 2
  sleeping: 3
  sleeping: 4
  sleeping: 5
  sleeping: 6
  sleeping: 7
  sleeping: 8
  sleeping: 9
  sleeping: 10
  sleeping: 11
  sleeping: 12
  sleeping: 13
  sleeping: 14
  sleeping: 15
  sleeping: 16
  sleeping: 17
  sleeping: 18
  sleeping: 19
  sleeping: 20
  sleeping: 21
  sleeping: 22
  sleeping: 23
  sleeping: 24
  sleeping: 25
  sleeping: 26
  sleeping: 27
  sleeping: 28
  sleeping: 29
  sleeping: 30
game over
[success] Total time: 62 s, completed 13 Mar, 2017 5:37:31 PM
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$ sbt package
[info] Loading project definition from /home/spb/mongoConnector/ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ScalaCasbahConnections/)
[info] Packaging /home/spb/mongoConnector/ScalaCasbahConnections/target/scala-2.11/scalatracasbahmongo_2.11-0.1.0-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed 13 Mar, 2017 5:54:42 PM
spb@spb-VirtualBox:~/mongoConnector/ScalaCasbahConnections$
------------------------------------------------------------


spb@spb-VirtualBox:~/Scala_project$ mongo
MongoDB shell version: 3.2.12
connecting to: test
Server has startup warnings:
> show dbs
local  0.000GB
mydb   0.000GB
> show dbs
finance  0.000GB
local    0.000GB
mydb     0.000GB
> show collections
> use finance
switched to db finance
> show collections
stocks
> db.stocks.find()
{ "_id" : ObjectId("58cd184edffa1f1829bfbc94"), "name" : "a", "symbol" : "a" }
{ "_id" : ObjectId("58cd184fdffa1f1829bfbc95"), "name" : "b", "symbol" : "b" }
{ "_id" : ObjectId("58cd1850dffa1f1829bfbc96"), "name" : "c", "symbol" : "c" }
{ "_id" : ObjectId("58cd1851dffa1f1829bfbc97"), "name" : "d", "symbol" : "d" }
{ "_id" : ObjectId("58cd1852dffa1f1829bfbc98"), "name" : "e", "symbol" : "e" }
{ "_id" : ObjectId("58cd1853dffa1f1829bfbc99"), "name" : "f", "symbol" : "f" }
{ "_id" : ObjectId("58cd1854dffa1f1829bfbc9a"), "name" : "g", "symbol" : "g" }
{ "_id" : ObjectId("58cd1855dffa1f1829bfbc9b"), "name" : "h", "symbol" : "h" }
{ "_id" : ObjectId("58cd1856dffa1f1829bfbc9c"), "name" : "i", "symbol" : "i" }
{ "_id" : ObjectId("58cd1857dffa1f1829bfbc9d"), "name" : "j", "symbol" : "j" }
{ "_id" : ObjectId("58cd1858dffa1f1829bfbc9e"), "name" : "k", "symbol" : "k" }
{ "_id" : ObjectId("58cd1859dffa1f1829bfbc9f"), "name" : "l", "symbol" : "l" }
{ "_id" : ObjectId("58cd185adffa1f1829bfbca0"), "name" : "m", "symbol" : "m" }
{ "_id" : ObjectId("58cd185bdffa1f1829bfbca1"), "name" : "n", "symbol" : "n" }
{ "_id" : ObjectId("58cd185cdffa1f1829bfbca2"), "name" : "o", "symbol" : "o" }
{ "_id" : ObjectId("58cd185ddffa1f1829bfbca3"), "name" : "p", "symbol" : "p" }
{ "_id" : ObjectId("58cd185edffa1f1829bfbca4"), "name" : "q", "symbol" : "q" }
{ "_id" : ObjectId("58cd185fdffa1f1829bfbca5"), "name" : "r", "symbol" : "r" }
{ "_id" : ObjectId("58cd1860dffa1f1829bfbca6"), "name" : "s", "symbol" : "s" }
{ "_id" : ObjectId("58cd1861dffa1f1829bfbca7"), "name" : "t", "symbol" : "t" }
Type "it" for more
>
-------------------------

----------------

There are two ways of getting the data from MongoDB to Apache Spark.
Method 1: Using Casbah (a layer on the MongoDB Java driver)
// assumes: import com.mongodb.casbah.Imports._ and an existing SparkContext `sc`
val uriRemote = MongoClientURI("mongodb://RemoteURL:27017/")
val mongoClientRemote = MongoClient(uriRemote)
val dbRemote = mongoClientRemote("dbName")          // select the database
val collectionRemote = dbRemote("collectionName")   // select the collection
val ipMongo = collectionRemote.find                 // cursor over all documents
val ipRDD = sc.makeRDD(ipMongo.toList)              // materializes on the driver, then parallelizes
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")

Method 2: Using the MongoDB Hadoop connector with Spark workers
A better version of the code: it uses Spark workers and multiple cores to fetch the data in less time.

// assumes: import org.apache.hadoop.conf.Configuration, import org.bson.BSONObject,
// and the mongo-hadoop connector jar on the classpath
val config = new Configuration()
config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat")
config.set("mongo.input.uri", "mongodb://RemoteURL:27017/dbName.collectionName")
val keyClassName = classOf[Object]
val valueClassName = classOf[BSONObject]
val inputFormatClassName = classOf[com.mongodb.hadoop.MongoInputFormat]
val ipRDD = sc.newAPIHadoopRDD(config, inputFormatClassName, keyClassName, valueClassName)
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")

---------------------------------------------------------------
Reference:

https://web.archive.org/web/20120402085626/http://api.mongodb.org/scala/casbah/current/setting_up.html#setting-up-sbt



Friday, May 5, 2017

Spectrum LSF -Useful Job Submission Scripts

IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments. Featuring intelligent, policy-driven scheduling and easy to use interfaces for job and workflow management, it helps organizations to improve competitiveness by accelerating research and design while controlling costs through superior resource utilization.
source:https://portal.ictp.it/icts/manuals/lsf6/A_terms.html

After installation of Spectrum LSF as per the instructions at the link, you can verify the cluster by issuing the lsid, lshosts and bhosts commands.

[lsfadmin@host_machine2 test]$ lsid
IBM Spectrum LSF Community Edition 10.1.0.0, Jun 15 2016
Copyright IBM Corp. 1992, 2016. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is CI_cluster1
My master name is host_machine2
----------------------------------------------------
[lsfadmin@host_machine2 test]$ lshosts
HOST_NAME       type     model   cpuf   ncpus  maxmem  maxswp  server  RESOURCES
host_machine2   LINUXPP  POWER8  250.0  20     256G    -       Yes     (mg)
host_machine3   LINUXPP  POWER8  250.0  20     256G    -       Yes     ()
host_machine1   LINUXPP  POWER8  250.0  20     256G    -       Yes     ()

[lsfadmin@host_machine2 test]$ bhosts
HOST_NAME       STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
host_machine1   ok      -     20   0      0    0      0      0
host_machine2   ok      -     20   0      0    0      0      0
host_machine3   ok      -     20   0      0    0      0      0
------------------------------------------------------
Now the LSF cluster is ready for job submission. bsub submits a job for execution and assigns it a unique numerical job ID. You can build a job file one line at a time, or create it from another file, by running bsub without specifying a job to submit. When you do this, you start an interactive session in which bsub reads command lines from the standard input and submits them as a single batch job. You are prompted with bsub> for each line.
Example :
[lsfadmin@host_machine2 test]$ bsub
bsub> sleep 100
bsub> Job <1588> is submitted to default queue <normal>.

[lsfadmin@host_machine2 test]$ bjobs
JOBID   USER      STAT  QUEUE   FROM_HOST       EXEC_HOST       JOB_NAME   SUBMIT_TIME
1588    lsfadmi   RUN   normal  host_machine2   host_machine1   sleep 100  May  5 05:59
[lsfadmin@host_machine2 test]$

NOTE: job <1588> was submitted to the default queue "normal"
--------------------------------------------------
Next, submit a job to another queue (option -q, chosen from the bqueues list) with job name (option -J) "job_sachin", to run on host_machine2 (option -m):

[lsfadmin@host_machine2 test]$ bsub -J job_sachin -q short -m host_machine2
bsub> sleep 100
bsub> Job <1590> is submitted to queue <short>.

[lsfadmin@host_machine2 test]$ bjobs -w
JOBID   USER      STAT  QUEUE  FROM_HOST       EXEC_HOST       JOB_NAME    SUBMIT_TIME
1590    lsfadmin  RUN   short  host_machine2   host_machine2   job_sachin  May  5 06:07
[lsfadmin@host_machine2 test]$

NOTE: job_sachin (JOBID 1590) was submitted to the queue "short" and is running on host_machine2
----------------------------------------------------------
Next, let's create an output file with option "-o":

[lsfadmin@host_machine2 new]$ bsub -J job_sachin -q short -m host_machine2 -o output_file
bsub> sleep 60
bsub> hostname
bsub> Job <1591> is submitted to queue <short>.
[lsfadmin@host_machine2 new]$ bjobs -w
JOBID   USER      STAT  QUEUE  FROM_HOST       EXEC_HOST       JOB_NAME    SUBMIT_TIME
1591    lsfadmin  RUN   short  host_machine2   host_machine2   job_sachin  May  5 06:18

NOTE: after the job executes, it creates output_file with all the details shown below:
[lsfadmin@host_machine2 new]$ ls
output_file
[lsfadmin@host_machine2 new]$ cat output_file
host_machine2
------------------------------------------------------------
Sender: LSF System <lsfadmin@host_machine2>
Subject: Job 1591: <job_sachin> in cluster <CI_cluster1> Done
Job <job_sachin> was submitted from host <host_machine2> by user <lsfadmin> in cluster <CI_cluster1>.
Job was executed on host(s) <host_machine2>, in queue <short>, as user <lsfadmin> in cluster <CI_cluster1>.
</home/lsfadmin> was used as the home directory.
</home/lsfadmin/test/new> was used as the working directory.
Started at Results reported on
Your job looked like:
# LSBATCH: User input
sleep 60
hostname
Successfully completed.
Resource usage summary:
    CPU time :                 0.34 sec.
    Max Memory :               12 MB
    Average Memory :           12.00 MB
    Total Requested Memory :   -
    Delta Memory :             -
    Max Swap :                 344 MB
    Max Processes :            4
    Max Threads :              5
    Run time :                 62 sec.
    Turnaround time :          61 sec.
The output (if any) is above this job summary.
-----------------------------------------------------------------------------------
 

Let's see both output_file and err_file by submitting the wrong command "hostgame" instead of "hostname":
[lsfadmin@host_machine2 new]$ bsub -J job_sachin -q short -m host_machine2 -o output_file -e err_file
bsub> sleep 60
bsub> hostgame
bsub> Job <1592> is submitted to queue <short>.

[lsfadmin@host_machine2 new]$ bjobs -w
JOBID   USER      STAT  QUEUE  FROM_HOST       EXEC_HOST       JOB_NAME    SUBMIT_TIME
1592    lsfadmin  RUN   short  host_machine2   host_machine2   job_sachin  May  5 06:24

[lsfadmin@host_machine2 new]$ ls
err_file  output_file

[lsfadmin@host_machine2 new]$ cat err_file
/home/lsfadmin/.lsbatch/1493979887.1592.shell: line 2: hostgame: command not found
[lsfadmin@host_machine2 new]$

[lsfadmin@host_machine2 new]$ cat output_file
Sender: LSF System <lsfadmin@host_machine2>
Subject: Job 1592: <job_sachin> in cluster <CI_cluster1> Exited

Job <job_sachin> was submitted from host <host_machine2> by user <lsfadmin> in cluster <CI_cluster1>.
Job was executed on host(s) <host_machine2>, in queue <short>, as user <lsfadmin> in cluster <CI_cluster1>.
</home/lsfadmin> was used as the home directory.
</home/lsfadmin/test/new> was used as the working directory.
Started at Results reported on
Your job looked like:
# LSBATCH: User input
sleep 60
hostgame
Exited with exit code 127.
Resource usage summary:
    CPU time :                 0.34 sec.
    Max Memory :               12 MB
    Average Memory :           12.00 MB
    Total Requested Memory :   -
    Delta Memory :             -
    Max Swap :                 344 MB
    Max Processes :            4
    Max Threads :              5
    Run time :                 74 sec.
    Turnaround time :          61 sec.

The output (if any) is above this job summary.
PS:
Read file <err_file> for stderr output of this job.


NOTE: If you don't specify an error file, both stdout and stderr are written to the same output file. If the LSF job runs on a remote host, you need to copy the output file back from that machine by using the file-transfer directive #BSUB -f " outputfile < outputfile" (the "<" operator copies the remote file back to the submission host after the job completes).
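For example, when only -o is given, stdout and stderr both land in one file (a minimal sketch reusing the queue and host from above; combined_file is a hypothetical name):

[lsfadmin@host_machine2 new]$ bsub -J job_sachin -q short -m host_machine2 -o combined_file "hostgame"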
---------------------------------------------------------
Now we can write a small script that can be submitted as an LSF job. You can redirect a script to the standard input of the bsub command as shown here. Create a file submit.lsf:
-------------------------------------------------------------
[lsfadmin@host_machine2 new]$ cat submit.lsf
#BSUB -n 12                              # Total number of tasks
#BSUB -R "span[ptile=4]"                 # Run 4 tasks per host
#BSUB -J job_sachin                      # Job name
#BSUB -outdir "outputdir/%J_%I"          # Output directory (%J = job ID, %I = array index)
#BSUB -o outputfile                      # Standard output file
#BSUB -f " outputfile < outputfile"      # Copy outputfile back from the execution host
#BSUB -q short                           # Which queue to use {short, long, parallel, GPU, interactive}
#BSUB -W 0:55                            # How much time your job needs (HH:MM)
#BSUB -L /bin/sh                         # Shell to use
sleep 30
/opt/xyz/spectrum_mpi/bin/mpirun hostname
----------------------------------------------------------------------------------------------
NOTE: After the job is submitted, you will see 4 tasks running on each node (due to ptile=4), for a total of 12 processes across 3 hosts. The job creates the output directory outputdir/<jobid>_0 containing outputfile.

[lsfadmin@host_machine2 new]$ bsub < submit.lsf
Job <1596> is submitted to queue <short>.
[lsfadmin@host_machine2 new]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1596    lsfadmi RUN   short      host_machine2   host_machine1   job_sachin May  5 06:48
                                                                                    host_machine1
                                                                                    host_machine1
                                                                                    host_machine1
                                                                                    host_machine2
                                                                                    host_machine2
                                                                                    host_machine2
                                                                                    host_machine2
                                                                                    host_machine3
                                                                                    host_machine3
                                                                                    host_machine3
                                                                                    host_machine3
[lsfadmin@host_machine2 new]$ ls
outputdir  submit.lsf
[lsfadmin@host_machine2 new]$

[lsfadmin@host_machine2 new]$ cd outputdir/1596_0/
[lsfadmin@host_machine2 1596_0]$ ls
outputfile
[lsfadmin@host_machine2 1596_0]$ cat outputfile
host_machine1
host_machine1
host_machine1
host_machine1
host_machine3
host_machine3
host_machine3
host_machine3
host_machine2
host_machine2
host_machine2
host_machine2
------------------------------------------------------------
Sender: LSF System <lsfadmin@host_machine1>
Subject: Job 1596: <job_sachin> in cluster <CI_cluster1> Done

Job <job_sachin> was submitted from host <host_machine2> by user <lsfadmin> in cluster <CI_cluster1>.
Job was executed on host(s) <4*host_machine1>, in queue <short>, as user <lsfadmin> in cluster <CI_cluster1>.
                                                <4*host_machine2>
                                                <4*host_machine3>

</home/lsfadmin> was used as the home directory.
</tmp> was used as the working directory.
Started at <time>
Results reported on <time>
Your job looked like:

# LSBATCH: User input
#BSUB -n 12                              # Total number of tasks
#BSUB -R "span[ptile=4]"                 # Run 4 tasks per host
#BSUB -J job_sachin                      # Job name
#BSUB -outdir "outputdir/%J_%I"          # Output directory (%J = job ID, %I = array index)
#BSUB -o outputfile                      # Standard output file
#BSUB -f " outputfile < outputfile"      # Copy outputfile back from the execution host
#BSUB -q short                           # Which queue to use {short, long, parallel, GPU, interactive}
#BSUB -W 0:55                            # How much time your job needs (HH:MM)
#BSUB -L /bin/sh                         # Shell to use
sleep 30
/opt/xyz/spectrum_mpi/bin/mpirun hostname
Successfully completed.
Resource usage summary:

    CPU time :                                   0.73 sec.
    Max Memory :                                 12 MB
    Average Memory :                             11.67 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   344 MB
    Max Processes :                              4
    Max Threads :                                5
    Run time :                                   31 sec.
    Turnaround time :                            43 sec.
The output (if any) is above this job summary.

-------------------------------------------------------------------------------------

Suppose you want LSF to wait until a job finishes before accepting new submissions. Submit the job with bsub -K: the bsub command will not return until the job completes.

[lsfadmin@host_machine2 test]$ cat our_wait.sh
#!/bin/bash
# Submit the job and wait for its completion in the background
bsub -K -L /bin/bash < job2.sh &
bsub_pid=$!

# Poll until the job's log file appears on the shared filesystem
while [ ! -e /NFSshare/lsf_logs ]
do
    echo "FileName - Not found, waiting for some time"
    sleep 3
done
echo "Finally, lsf_logs now exists!!!"

# Follow the log while the job runs, then stop the tail once bsub returns
tail -f /NFSshare/lsf_logs &
tail_pid=$!
wait $bsub_pid
kill $tail_pid
--------------------------------------------
where job2.sh is:
#BSUB -n 10                  # Total number of tasks
#BSUB -R "span[ptile=4]"     # Run 4 tasks per host
#BSUB -J job_sachin          # Job name
#BSUB -o /NFSshare/lsf_logs  # Standard output on the shared filesystem
#BSUB -R "rusage[mem=6000]"  # Reserve 6000 MB of memory
hostname
----------------------------------------
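To try it out (a sketch; the exact interleaving of the echo and tail output will vary):

[lsfadmin@host_machine2 test]$ chmod +x our_wait.sh
[lsfadmin@host_machine2 test]$ ./our_wait.sh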
The bpeek command displays the stdout and stderr of a job while it is running. Usually this is only the most recent 10 lines of output. If you use the -f option, bpeek will continue to show additional lines as they are produced. It uses the tail -f command to do this, so you can stop the display of the output at any time by pressing <Ctrl-C>.
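For example, using the job name from this post (you can also pass a numeric job ID):

[lsfadmin@host_machine2 test]$ bpeek -J job_sachin       # show the output collected so far
[lsfadmin@host_machine2 test]$ bpeek -f -J job_sachin    # keep following the output (Ctrl-C to stop)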

[lsfadmin@host_machine2 test]$ cat my_wait.sh
#!/bin/bash
# Submit the job and wait for its completion in the background
bsub -K -L /bin/bash < my_script.sh &
bsub_pid=$!

# Stream the job's output by job name while it runs
bpeek -f -J job_sachin &
bpeek_pid=$!

wait $bsub_pid
kill $bpeek_pid

---------------------------------------------

How do you submit interactive jobs?
You can also submit an interactive job using a pseudo-terminal. When you specify the -Ip option, bsub submits a batch interactive job and creates a pseudo-terminal when the job starts (the -Is option additionally enables shell mode support).

bsub -Ip -q interactive_queue  -J job_sachin -n 8 -L /bin/bash -R "span[ptile=4]" "sh $HOME/bin/my_test.sh $1"
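If you just want an interactive shell on a compute node, a minimal sketch (assuming the same interactive_queue exists in your cluster) is:

bsub -Is -q interactive_queue -J job_sachin /bin/bash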

-----------------------------THE END ----------------------------------------
References:
https://www-03.ibm.com/systems/uk/spectrum-computing/products/lsf/
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_welcome/lsf_welcome.html
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/New%20IBM%20Platform%20LSF%20Wiki