Thursday, October 4, 2018

Cloudera to merge with Hortonworks, creating a $5.2 billion company

  
Cloudera and Hortonworks, the two leading enterprise Hadoop providers, have announced a merger that they say will create the world’s leading next-generation data platform and deliver the industry’s first enterprise data cloud. According to the companies, the combined entity has a better chance of becoming a next-generation data platform spanning multiple clouds, on-premises data centers and edge computing. Hortonworks and Cloudera also have complementary approaches, customers and industries. They will combine in a merger of equals in a deal valued at $5.2 billion. Both companies were pioneers in Hadoop, an open-source platform that could analyze data in ways that scaled up easily, a necessity at a time when the volume of available data was growing exponentially each year.


First, remember the history of Apache Hadoop.

Google built an innovative scale-out platform for data storage and analysis in the late 1990s and early 2000s, and published research papers about their work. Doug Cutting and Mike Cafarella were working together on a personal project, a web crawler, and read the Google papers. The two of them started the Hadoop project to build an open-source implementation of Google’s system.

Yahoo quickly recognized the promise of the project. It staffed up a team to drive Hadoop forward, and hired Doug Cutting. That team delivered the first production cluster in 2006 and continued to improve it in the years that followed.

In 2008, Mike Olson co-founded Cloudera with folks from Google, Facebook, and Yahoo to deliver a big data platform built on Hadoop to the enterprise market. The founders believed then, and the company still believes today, that the rest of the world would need to capture, store, manage and analyze data at massive scale. Cloudera was the first company to commercialize the software, building on the work done at Yahoo and other consumer internet companies.

Three years later, the core team of developers working inside Yahoo on Hadoop spun out to found Hortonworks. They, too, saw the enormous potential for data at scale in the enterprise. They had proven their ability to build and deliver the technology at Yahoo.


Hortonworks, which spun out of Yahoo, went public in 2014, and Cloudera, which is larger than Hortonworks in terms of market capitalization and revenue, went public in 2017. Intel was a major Cloudera investor. Amazon's market-leading cloud unit has a distribution of Hadoop software, and another competitor of the companies, MapR, is privately held. Cloudera, which was founded in 2008, raised over a billion dollars before going public, the vast majority coming in one major $740 million burst from Intel Capital in 2014. Hortonworks, founded three years later, raised $248 million.


Tom Reilly, the long-time CEO at Cloudera, certainly sees the two companies as complementary, offering customers something together that they couldn’t separately. “Our businesses are highly complementary and strategic. By bringing together Hortonworks’ investments in end-to-end data management with Cloudera’s investments in data warehousing and machine learning, we will deliver the industry’s first enterprise data cloud from the Edge to AI,” Reilly said in a statement. The companies commercialize the Hadoop open-source big data software, which companies can use to store, process and analyze lots of different types of data.


Cloudera stock jumped as much as 25 percent after it announced an all-stock merger of equals with competitor Hortonworks. Hortonworks stock was halted just prior to the announcement and then jumped as much as 29 percent. The combined equity value of the two companies is $5.2 billion. The deal is subject to U.S. antitrust clearance, and the companies expect it to close in the first quarter of 2019. Under the terms of the transaction agreement, Cloudera stockholders will own approximately 60% of the equity of the combined company and Hortonworks stockholders will own approximately 40%. Hortonworks shareholders will get 1.305 Cloudera shares for each share owned.
The two companies are committed to supporting their existing offerings for at least three years but will work on a "unity release" of software, drawing on technologies from both portfolios, Reilly said. The unified company wants to honor Hortonworks' commitment to providing all of its software under open-source licenses, but over time there will also be a proprietary option that offers additional features, including in the cloud.

Cloudera's Chief Executive Officer, Tom Reilly, will serve as CEO of the combined company. Hortonworks' Chief Operating Officer, Scott Davidson, will serve as Chief Operating Officer; Hortonworks' Chief Product Officer, Arun C. Murthy, will serve as Chief Product Officer; and Cloudera's Chief Financial Officer, Jim Frankola, will serve as Chief Financial Officer of the combined company.
Rob Bearden, Hortonworks' CEO, will join the board of directors. Current Cloudera board member Marty Cole will become chairman of the board. The merged company will operate under the Cloudera banner and focus on “edge to AI” opportunities.

References:
1) https://vision.cloudera.com/cloudera-hortonworks-from-the-edge-to-ai/
2)  https://www.zdnet.com/article/cloudera-hortonworks-merge-in-deal-valued-at-5-2-billion/

 

Monday, October 1, 2018

Blockchain Technology

Blockchain is one of the biggest buzzwords in technology. Blockchain technologies are at an early stage of adoption, but they are proliferating rapidly. Blockchain was invented in 2008 by the pseudonymous Satoshi Nakamoto to serve as the public transaction ledger of the cryptocurrency bitcoin. Many alternative cryptocurrencies now have their own blockchains.


A blockchain is a growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data (generally represented as a Merkle tree root hash).
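To make that block layout concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not Bitcoin's actual format: the `Block` class, the `merkle_root` helper, and the JSON header are hypothetical names and choices made for this example, and real Bitcoin uses a binary header with double SHA-256.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions: List[str]) -> str:
    """Reduce a list of transactions to one root hash by hashing pairs."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class Block:
    previous_hash: str            # hash of the previous block
    transactions: List[str]       # transaction data (simplified to strings)
    timestamp: float = field(default_factory=time.time)

    def hash(self) -> str:
        """Hash the header: previous hash, timestamp, and Merkle root."""
        header = json.dumps({
            "previous_hash": self.previous_hash,
            "timestamp": self.timestamp,
            "merkle_root": merkle_root(self.transactions),
        }, sort_keys=True)
        return sha256_hex(header.encode())


# The first block has no predecessor, so it points at a placeholder hash.
genesis = Block(previous_hash="0" * 64, transactions=["reward -> alice: 50"])
second = Block(previous_hash=genesis.hash(),
               transactions=["alice -> bob: 5", "bob -> carol: 2"])
print(second.previous_hash == genesis.hash())  # prints True
```

Because the second block stores the hash of the first, any change to the first block's contents would change that hash and break the link.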


A blockchain is a decentralized, distributed and public digital ledger that is used to record transactions across many computers so that the record cannot be altered retroactively without the alteration of all subsequent blocks and the consensus of the network. This allows the participants to verify and audit transactions inexpensively. A blockchain database is managed autonomously using a peer-to-peer network and a distributed timestamping server. Records are authenticated by mass collaboration powered by collective self-interest. The following basic terms, most of them specific to Bitcoin, should help you understand it all better.

  • Decentralized: It isn’t stored in a single location, but on many nodes simultaneously.
  • Distributed: It’s shared and continually reconciled across the network.
  • Blockchain: Bitcoin’s ledger. The blockchain is a public record that stores all transaction data and can be stored by anyone.
  • Mining: The process by which Bitcoin transactions are validated using special processors. The people who do this are called miners.
  • Node: A server or storage device that stores the entire blockchain and runs Bitcoin client software, which checks all transaction data and the blockchain for conformance to the Bitcoin protocol.
  • Bitcoin wallet: A software application in which you can view your Bitcoin holdings and send or receive bitcoins.
  • Bitcoin wallet address: The equivalent of your bank account number. Bitcoins are stored against this address/ID. Each wallet address is associated with two unique keys, called the public and private keys.
  • Public key: Used to send bitcoins to you; it can be seen by anyone.
  • Private key: Your password. You need it to spend your bitcoins.


How do cryptocurrencies like bitcoin work?

The bitcoin blockchain is “decentralized,” meaning it is not controlled by one central authority. While traditional currencies are issued by central banks, bitcoin has no central authority. Instead, the bitcoin blockchain is maintained by a network of people known as miners.

These “miners,” sometimes called “nodes” on the network, are people running purpose-built computers that compete to solve complex mathematical problems in order to make a transaction go through. This is known as Proof-of-Work.

For example, say lots of people are making bitcoin transactions. Each transaction originates from a wallet which has a “private key.” The private key is used to produce a digital signature, which provides mathematical proof that the transaction has come from the owner of the wallet.

Now imagine lots of transactions are taking place across the world. These individual transactions are grouped together into a block, organized by strict cryptographic rules. The block is sent out to the bitcoin network, which is made up of people running high-powered computers. These computers compete to validate the transactions by trying to solve complex mathematical puzzles. The winner receives a reward in bitcoin. This validated block is then added onto previous blocks, creating a chain of blocks called a blockchain.

Proof-of-work is a process of producing data that’s hard to produce but easy to verify. In the context of a blockchain, proof-of-work is about solving mathematical problems. If a problem is successfully solved, then a new block can be added to the blockchain. On average, performing proof-of-work calculations and adding a new block to the chain takes about 10 minutes.

What’s behind the proof-of-work process, or “solving the puzzle”? Miners must find a number, called a nonce, that gives the block an acceptable hash. How do they find this number? By guessing at random. The hash function makes it impossible to predict what the output will be. So, miners guess the mystery number and apply the hash function to the combination of that guessed number and the data in the block. The resulting hash has to start with a pre-established number of zeroes. There's no way of knowing which number will work, because two consecutive integers will give wildly varying results. What's more, there may be several nonces that produce the desired result, or there may be none (in which case the miners keep trying, but with a different block configuration).

The first miner to get a resulting hash within the desired range announces its victory to the rest of the network. All the other miners immediately stop work on that block and start trying to figure out the mystery number for the next one. As a reward for its work, the victorious miner gets some new bitcoin.
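The guessing game described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Bitcoin's real mining code: the `mine` and `verify` functions are hypothetical names, the block is a plain string, difficulty is counted in leading zero hex digits, and nonces are tried in order rather than at random (since the hash output is unpredictable, the order of guesses makes no practical difference).

```python
import hashlib


def mine(block_data: str, difficulty: int, max_nonce: int = 10_000_000):
    """Search for a nonce whose hash starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
    # No nonce in the range worked; a real miner would alter the block and retry.
    return None


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes a single hash computation."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


result = mine("alice pays bob 5", difficulty=4)
if result is not None:
    nonce, digest = result
    print(nonce, digest)
    print(verify("alice pays bob 5", nonce, difficulty=4))  # prints True
```

Finding the nonce takes tens of thousands of hash attempts on average at this difficulty, while verifying it takes one. Each extra required zero digit multiplies the expected work by 16, which is how a network can tune how long a block takes to mine.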

 Technically, a blockchain is a chain of blocks ordered in a network of non-trusted peers. Each block references the previous one and contains data, its own hash, and the hash of the previous block.


There are different types of blockchains, which rely on different configurations and consensus mechanisms depending on the type and size of the network. Bitcoin, the most popular blockchain, is permissionless, which means anyone can participate and access the content of the chain.

Every time a person wants to initiate a transaction on a blockchain, a block is created containing the details of the transaction, which must be broadcast to all nodes in the network. The block comes with a timestamp that helps establish a sequence of events.

Once all the nodes agree and the authenticity of the block is established, the new block is linked to the previous block, which is in turn linked to the block before it, resulting in a chain of sorts, commonly referred to as a blockchain.

The blockchain is normally replicated across the entire network, where everyone in the network can see and access it. Cryptography is used to secure the chain, which makes it practically impossible for a single person to manipulate its contents. For any change to take place, the participants, represented in this case by nodes, must agree to the proposed change, which is recorded in the next block without altering the previous one.

Once a piece of information is added to a block and recorded on the blockchain ledger, nobody can change or remove it. This tamper-proof quality is why CIOs are being urged to take a keen interest in blockchain at a time when data security and preservation are of utmost importance.

The transactions may be payments on a loan. They may also be disbursements of funds from a loan or line of credit, either to a borrower or to a third party such as the car dealer who originated the loan. These transactions are assembled into “blocks.” The blocks are loaded into the public record and become the “chain” of events used to arrive at a balance. Each block is cryptographically secured and contains control amounts to ensure that blocks cannot be altered.







How the blockchain is tamperproof:

  1. One of the advantages of blockchain is that it can’t be tampered with. Each block that is added onto the chain carries a hard, cryptographic reference to the previous block.
  2. That reference is part of the mathematical problem that needs to be solved in order to bring the following block into the network and the chain. Part of solving the puzzle involves working out a random number called the “nonce.” The nonce, combined with the other data in the block, such as the transactions, produces a digital fingerprint called a hash. Because the hash comes from a cryptographic function, it cannot feasibly be forged.
  3. Each hash is unique and must meet certain cryptographic conditions. Once this happens a block is completed and added to the chain. To tamper with a block, an attacker would have to re-mine that block and every block after it. With over half a million blocks in the chain, that is computationally infeasible in practice.
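The chaining argument in the list above can be demonstrated with a short Python sketch. It is a simplified model, not a real blockchain client: `block_hash`, `build_chain`, and `first_invalid_block` are hypothetical helper names, blocks are plain dictionaries, and the proof-of-work step is left out so that only the hash links are shown.

```python
import hashlib

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder link for the first block


def block_hash(previous_hash: str, data: str) -> str:
    """Fingerprint a block from its predecessor's hash and its own data."""
    return hashlib.sha256((previous_hash + data).encode()).hexdigest()


def build_chain(records):
    """Build a list of blocks, each linked to the one before it."""
    chain = []
    previous_hash = GENESIS_PREVIOUS_HASH
    for data in records:
        current = block_hash(previous_hash, data)
        chain.append({"data": data, "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails a check, or None."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for index, block in enumerate(chain):
        if block["previous_hash"] != previous_hash:
            return index  # link to the previous block is broken
        if block_hash(block["previous_hash"], block["data"]) != block["hash"]:
            return index  # contents no longer match the stored fingerprint
        previous_hash = block["hash"]
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
untouched = first_invalid_block(chain)       # None: the chain is consistent

chain[1]["data"] = "bob pays carol 200"      # tamper with the middle block
after_edit = first_invalid_block(chain)      # 1: the stored hash no longer matches

# Recomputing the tampered block's hash only moves the problem one block along.
chain[1]["hash"] = block_hash(chain[1]["previous_hash"], chain[1]["data"])
after_rehash = first_invalid_block(chain)    # 2: the next block's link is broken

print(untouched, after_edit, after_rehash)   # prints None 1 2
```

In this toy model recomputing a hash is instant. On a proof-of-work chain each recomputation means re-mining the block, which is what makes rewriting history so expensive.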


Cryptocurrencies also make it harder for governments to manage their economies. Fiscal and monetary policies become harder to enforce since the new currencies are outside the traditional government institutions. The electronic movement of funds in a blockchain environment can bypass government institutions and their controls.

To meet modern business demands, IBM has joined with other companies to collaboratively develop an open source, production-ready, business blockchain framework, called Hyperledger Fabric™, one of the Hyperledger® projects hosted by The Linux Foundation®. Hyperledger Fabric supports distributed ledger solutions on permissioned networks for a wide range of industries. Its modular architecture maximizes the confidentiality, resilience, and flexibility of blockchain solutions.

The IBM Blockchain Platform is a blockchain software-as-a-service offering on the IBM Cloud. IBM describes it as the only fully integrated, enterprise-ready blockchain platform designed to simplify the development, governance, and operation of a decentralized, multi-institution business network. The IBM Blockchain Platform accelerates collaboration in this decentralized world by leveraging open source technology from the Hyperledger Fabric framework and Hyperledger Composer tooling.

IBM Launches Food Trust Blockchain For Commercial Use: IBM's blockchain-based food traceability platform is now live for global use by retailers, wholesalers and suppliers across the food ecosystem. In September, retail giant Walmart announced that it would begin requiring its suppliers to implement the system to track bags of spinach and heads of lettuce. Other participants include multinational companies Nestle, Kroger, Tyson Foods and Unilever.

An IBM Code pattern provides an example solution for IoT asset tracking via a blockchain. It shows how to develop an asset tracking app that improves a supply chain by combining a blockchain, IoT tracking devices, and Node-RED. This pattern addresses the very real problem of the safe delivery of perishable goods (food, medicine, livestock, etc.) that are sensitive to environmental conditions during shipment. Every shipment of perishable goods has thresholds (refrigeration requirements, avoidance of shocks or vibration, etc.) to protect the goods from contamination or damage. If the shipment exceeds these thresholds, the goods are damaged and might become a health hazard. By recording the details (where, what, and when) of a shipment that experienced extreme conditions (thresholds specified in the smart contract), developers can verify whether the goods were delivered successfully. Payment is then predicated on successful delivery. Tracking the conditions of the shipment across multiple participants using a blockchain provides verification and trust in these processes.
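The threshold logic described above might look roughly like the following Python sketch. This is not the code from the IBM pattern, which is written for Hyperledger tooling. The `Reading` and `Thresholds` classes, the `find_violations` and `payment_due` functions, and all sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    where: str            # location reported by the tracking device
    when: str             # time of the reading (ISO 8601 string)
    temperature_c: float  # what was measured: temperature in Celsius
    shock_g: float        # what was measured: acceleration in g


@dataclass
class Thresholds:
    max_temperature_c: float
    max_shock_g: float


def find_violations(readings: List[Reading], limits: Thresholds) -> List[Reading]:
    """Return every reading that exceeded a contract threshold."""
    return [r for r in readings
            if r.temperature_c > limits.max_temperature_c
            or r.shock_g > limits.max_shock_g]


def payment_due(readings: List[Reading], limits: Thresholds, agreed_price: float) -> float:
    """Pay the agreed price only if no threshold was exceeded in transit."""
    return 0.0 if find_violations(readings, limits) else agreed_price


limits = Thresholds(max_temperature_c=8.0, max_shock_g=3.0)
readings = [
    Reading("Warehouse A", "2018-10-01T08:00:00Z", 4.5, 0.2),
    Reading("Highway 101", "2018-10-01T12:30:00Z", 9.1, 0.4),  # too warm
    Reading("Store dock", "2018-10-01T17:45:00Z", 5.0, 0.3),
]
bad = find_violations(readings, limits)
print([r.where for r in bad])                 # prints ['Highway 101']
print(payment_due(readings, limits, 1200.0))  # prints 0.0
```

On a blockchain, each reading would be recorded as a transaction that all participants can see, so neither the shipper nor the buyer could quietly drop the out-of-range reading.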

The primary difference between a blockchain and a database is centralization. While all records in a conventional database are held centrally, each participant on a blockchain has a secured copy of all records and all changes, so each user can view the provenance of the data. The magic happens when there’s an inconsistency: since each participant maintains a copy of the records, blockchain technology will immediately identify and correct any unreliable information.

Trust the data: An interesting thing happens when competitors can trust the data being shared. It creates opportunities for more participants within the vertical to join the blockchain network and increases visibility into the data. For example, if Samsung and Apple are sharing technology and data on a blockchain network, and a transportation company joins the network, the data that the transportation company wants to share is immediately accessible to each of the other participants and replicated to their records. Any time one of the participants makes a change, a new version of the record is validated by all participants. In this case, Apple could track a shipment from Samsung’s factory to Apple’s manufacturing center. Additionally, if a bank is added to the network, payment to the bank and to each participant after a transaction can be triggered automatically when a condition in the data is met. Because this data is secured and validated by all the participants, no single participant can fraudulently, or accidentally, alter the data to meet the conditional trigger.

Ultimately, common blockchain standards and protocols may play a key role in enabling interoperability among international payment systems. Meanwhile, new technology usually evolves faster than consensus standards development, so these efforts may have little impact on near-term blockchain deployments. Disruptive technologies such as blockchain and the Internet of Things will have a profound impact on the way we live and work.


References:

1) https://rubygarage.org/blog/how-blockchain-works 
2) https://en.wikipedia.org/wiki/Blockchain
3) https://www.americanexpress.com/us/content/foreign-exchange/articles/international-payments-hyperledger-interledger/
4) https://www.coindesk.com/information/how-bitcoin-mining-works/
5) https://www.cnbc.com/2018/06/18/blockchain-what-is-it-and-how-does-it-work.html
6) https://www.forbes.com/sites/astanley/2018/10/08/ready-to-rumble-ibm-launches-food-trust-blockchain-for-commercial-use/#3a1ca5c77439
7) https://www.youtube.com/watch?v=93E_GzvpMA0 
8) https://developer.ibm.com/patterns/develop-an-iot-asset-tracking-app-using-blockchain/?social_post=1780070590&fst=Learn&_lrsc=45077f56-e223-4172-9317-6cec9e301566
9) https://github.com/IBM/IoT-AssetTracking-Perishable-Network-Blockchain
10) https://www.fool.com/investing/2018/10/10/ibms-food-safety-blockchain-picks-up-steam.aspx 
11) https://101blockchains.com/blockchain-cio-executives-guide/  
12) https://www.ibm.com/blogs/blockchain/2019/01/whats-the-difference-between-a-blockchain-and-a-database/