「ビッグデータの覇者たち」 now available in both print and e-book editions

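My new book 「ビッグデータの覇者たち」 (Champions of Big Data) went on sale in print and e-book editions almost simultaneously, and both are available as of today. The upper one is the print edition, and the lower one is the e-book/Kindle edition. Please share your thoughts on the book page: http://www.enotechconsulting.com/publications/big-data/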

Announcement: book launch talk (4/25 11am-)

Kokuryo4.jpg

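My new book 「ビッグデータの覇者たち」 (Champions of Big Data) goes on sale today, Japan time. To mark the release, I will hold a talk with Professor Jiro Kokuryo of Keio University on Niconico Live next Thursday. Professor Kokuryo was my senior colleague in my NTT days. He has the ability and the standing to be directly involved in policy for Japan's telecommunications and information industries, he deeply understands the essence of technology and industry, and he acts out of conviction rather than self-interest. He is a senior colleague I respect. His book 「ソーシャルな資本主義」 (Social Capitalism), published in March, has much in common with my own book, and I felt that we are looking at one and the same large phenomenon from different angles. In the talk, I hope to hear his thoughts on that.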

http://live.nicovideo.jp/watch/lv134411778

AWS S3 behind Netflix success

“Big Data and Cloud Storage” series Vol. 5:   Event and Company #3

Netflix as the big data tycoon

Netflix is known as one of the most sophisticated users in the big data community.  They appear regularly at big data conferences such as Strata and discuss how they use data analytics in their business and what their infrastructure looks like.

My theory of why Netflix is successful while many others are not is that their sophisticated big data capability enables them to deliver better service at a wider margin.  Media industry people often see online video delivery as just another means of distribution and do not pay much attention to this “brain” part of the cloud, but it is the secret source of their success.

From the user data to recommendations

I have tried all the major movie services over the years, including Netflix, Hulu, Apple, Amazon, and cable’s TVEverywhere, as well as Joost, CinemaNow and MovieLink (remember them?).  Among them, Netflix stands out for the power of its recommendations. Other services push the titles that they want to show, such as new shows, while Netflix’s top page is filled with personalized recommendations.

In talks at big data conferences, Netflix shows off how they use remarkably detailed usage data to come up with such recommendations.

With streaming, Netflix knows what you watch, on which date and at what time, on what device, whether you quit watching, where you stopped, and whether you resumed.  It is not a simple “people who watched this movie also watched these” factor.

In my household, I have a Netflix account and everyone else in the family shares it.  Each of us has very different tastes, so I felt sorry for confusing Netflix, but they are actually one step ahead.  They already roughly know the profiles of my family members through the analysis of such usage data.  And they show it in a subtle way, with categories such as “SF Action” or “Foreign Art Films”, not in a creepy way such as “one for your teenage son” or “for mom”.

Scale out on Amazon S3

Netflix is the best-known user of Amazon Web Services (AWS) as the infrastructure supporting this massive data analytics operation.  They give “data center management is not our main business” as the reason for using AWS.

They used to have their own data center and ran an Oracle database early in their history, but the amount of data exploded as their online streaming service caught on, to the point where they could no longer keep up by building new data centers.  So they moved to an almost 100% cloud-based setup in 2009-10, for both processing and storage, to be able to scale rapidly.

Currently, AWS’s S3 is used to store both video and user behavior data.  User orders are processed in the NoSQL database Cassandra, and the data is then dumped into S3 once a day.  According to an engineer’s confession in a Strata talk, they had so much trouble with this transfer process that they developed their own software for it and named it Aegisthus.  Aegisthus is a figure who killed Cassandra, the princess of Troy, in a famous tragedy of Greek mythology.

User data stored in S3 is analyzed with Hadoop tools, and the results are stored in S3 as well.  S3 is generally known as a "pay as you go" service, but big customers like Netflix are usually assigned a fixed capacity, so they use the slack capacity for user data analytics after midnight on the West Coast, when video streaming volume decreases sharply.

The speaker emphasized the concept of "the right tools for the right job" in his talk.  Depending on what your business model is, you have to choose where to put your own resources and what to buy from outside.  A big data strategy is not defined solely by the amount of data or the size of the company.  Strategic priorities are often more important in the “build or buy” decision.  Cloud storage provides advantages for enterprises of all sizes.

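New book 「ビッグデータの覇者たち」 now available for pre-order

My new book 「ビッグデータの覇者たち」 (Champions of Big Data), to be published in the Kodansha Gendai Shinsho series on April 18, is now available for pre-order on Amazon.

The e-book edition is scheduled for release at almost the same time, but if you prefer the paper version, please go ahead and pre-order!

In addition, a Niconico Live talk event to mark the release is planned for April 25. Details will follow.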

Cloud Expo Europe and Citrix

“Big Data and Cloud Storage” Vol. 4

Event and Company #2

In London, “Cloud Expo Europe” took place on January 29th and 30th, 2013. Cloudian exhibited at the Expo, so I asked Giorgio Propersi, General Manager, Americas and EMEA at Cloudian, what the Expo and the cloud industry look like in London.

◆ “Cloud” as an international phenomenon

I attended another “Cloud Expo” in Santa Clara last fall. At that conference, I felt that the main focus was on OpenStack, an open source software platform for IaaS (Infrastructure as a Service), and I wondered if it was the same in London, but Giorgio had a different impression.

Giorgio: Both in Santa Clara and in London, I think the shows were neutral, rather than focused on OpenStack. Certainly, the OpenStack Foundation and other big OpenStack supporters such as Rackspace were big sponsors, but there were also many companies supporting other types of cloud such as CloudStack.

What amazed me was how much the (London) show had grown since last year. The 2012 show was much smaller and more “hosting” focused. This year, the show was finally really “cloud” centered, and companies were showing cloud computing or cloud storage technology, and all the technologies around the cloud (such as how to manage the cloud, keep track of what is happening in the cloud, debug the cloud and so on). The floor was very full. The show organizer was expecting 5,000 attendees, but I thought there were many more people.

Once inside the venue, I saw many US companies exhibiting (NTT was also there), and it was really hard to see the difference from a U.S.-based show. It was indeed a very international show. There were some European companies, but these companies were at the Santa Clara show too. I think the balance was the same as in the U.S. The customer profile was also not much different, just geographically different, with more representation from companies centered in Europe and Asia (such as BT and Tata Communications). There was also a nice representation of small European service providers from many countries in Europe.

◆ CloudStack and Citrix

CloudStack is another open source IaaS software platform. It was developed by Cloud.com, which was acquired by Citrix in 2011. Both Cloud.com and Citrix were OpenStack members, but after the acquisition, Citrix released CloudStack in 2012, donated it to the Apache Software Foundation, and then decided to leave the OpenStack group. It is a bit complicated, but in any case CloudStack is now a separate project from OpenStack. Joe Onisick writes in his article in Network Computing that CloudStack is better packaged for enterprise adoption, while OpenStack is more like a framework and has strong supporters.

Giorgio: We need to keep in mind that Cloudian integrates with both OpenStack and CloudStack. It is hard to simplify the differences between them, and customers may choose one or the other for completely different reasons. There are plenty of technical papers describing the differences between the two open source approaches, and the merits/demerits of each one.

Quite often, if I am an enterprise or a service provider and am moving my system to the cloud, I would look for a solution that is proven and fully supported by my technology supplier. If I get the open source code directly, I will need to commit a lot of my internal engineering resources, and later on I will be responsible for supporting my cloud. This may not be good for many companies. Assuming for example that I am a bank, I would rather spend my time and money on what I do best, which is banking, so I would rather go and talk to my trusted technology supplier, who will take care of my cloud. I buy the cloud from my technology provider not because of which open source software they use inside, but because the cloud solution they propose works for me, is optimized for my existing system, and is priced right. In some cases, customers don’t even know what technology their cloud is based on.

And open source is not the only cloud technology. Microsoft, VMware and others have their own cloud solutions.

Citrix provides Cloud Platform, their commercial version based on CloudStack, and this is the cloud platform people buy from Citrix today.

◆ STaaS and Secondary Storage

Citrix’s Cloud Platform is such a solution for enterprise customers, but it doesn’t have the object storage piece, so it integrates with Cloudian.

Giorgio: Citrix can provide two additional functionalities, which they don’t have at this moment, by integrating Cloudian (the object storage infrastructure provided by Cloudian, Inc.) into their Cloud Platform. One is STaaS (Storage as a Service) based on S3. STaaS means the capability of storing objects in the cloud and using the cloud as storage. And S3 compatibility enables the hybrid cloud, a concept that many companies have adopted. For example, I want to store specific files in the public cloud (such as Amazon) and specific files in my private cloud, and I keep changing my mind about the destination of my data. The only way to handle this situation (where I want “some” data in the private cloud and “some” data in the public cloud) is to have the same interface to the public and the private cloud, so that I can easily switch between the two. This interface is S3, which is fully supported by Cloudian. So the STaaS functionality with S3 compatibility is very beneficial, and this is what Cloudian adds to Cloud Platform.
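The idea of “the same interface to the public and the private cloud” can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Cloudian, Citrix or Amazon implement anything: the names ObjectStore, InMemoryStore and HybridStorage are hypothetical, and the in-memory stores merely stand in for two S3-compatible endpoints. The point is that when both clouds expose the same put/get/delete operations, changing your mind about where an object lives is a copy and a delete, with no change to the application code.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """Hypothetical minimal S3-style interface shared by both clouds."""

    def put_object(self, bucket: str, key: str, body: bytes) -> None: ...

    def get_object(self, bucket: str, key: str) -> bytes: ...

    def delete_object(self, bucket: str, key: str) -> None: ...


class InMemoryStore:
    """Stand-in for one S3-compatible endpoint; a real one would be remote."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

    def delete_object(self, bucket: str, key: str) -> None:
        del self._objects[(bucket, key)]


class HybridStorage:
    """Routes each object to the public or private cloud and can move it later."""

    def __init__(self, public: ObjectStore, private: ObjectStore) -> None:
        self._clouds: Dict[str, ObjectStore] = {"public": public, "private": private}
        self._placement: Dict[Tuple[str, str], str] = {}

    def store(self, bucket: str, key: str, body: bytes, cloud: str) -> None:
        self._clouds[cloud].put_object(bucket, key, body)
        self._placement[(bucket, key)] = cloud

    def location(self, bucket: str, key: str) -> str:
        return self._placement[(bucket, key)]

    def load(self, bucket: str, key: str) -> bytes:
        return self._clouds[self.location(bucket, key)].get_object(bucket, key)

    def move(self, bucket: str, key: str, cloud: str) -> None:
        """Change the destination of an object: copy to the new cloud, then delete."""
        current = self.location(bucket, key)
        if current == cloud:
            return
        body = self._clouds[current].get_object(bucket, key)
        self._clouds[cloud].put_object(bucket, key, body)
        self._clouds[current].delete_object(bucket, key)
        self._placement[(bucket, key)] = cloud
```

For example, an object stored with hybrid.store("docs", "report.pdf", data, cloud="public") can later be relocated with hybrid.move("docs", "report.pdf", cloud="private"), and hybrid.load(...) keeps working without the caller knowing which cloud holds it.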

The other functionality has to do with where Secondary Storage is kept. While primary storage is the immediate disk for items that need to be accessed very quickly and used directly by the application (such as an Excel spreadsheet), Secondary Storage is used for snapshots, templates, ISOs, VMs, etc. If you use Cloudian for Secondary Storage, then Secondary Storage becomes available to every zone within a CloudStack cloud. In a non-Cloudian environment, secondary storage typically sits on the local NAS, and because of that it can only be accessed locally; if that zone is down, these templates, VMs, snapshots, etc. are not available, which is bad. This functionality is very important. Visitors to our booth at the Expo really liked this, since maintaining visibility of all critical Secondary Storage from every zone is of paramount importance.
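The difference between zone-local and shared Secondary Storage can be illustrated with a toy model. This is a sketch under simplifying assumptions, not CloudStack's or Cloudian's actual behavior: CloudModel and its fields are hypothetical names, a local NAS is assumed to be reachable only from its own zone, and the shared object store is assumed to stay reachable from every zone that is up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CloudModel:
    """Toy model of zones and where their templates, snapshots and ISOs live."""

    zone_up: Dict[str, bool]
    # Items kept on each zone's local NAS (assumed visible only inside that zone).
    local_nas: Dict[str, Set[str]] = field(default_factory=dict)
    # Items kept in a shared object store (assumed visible from every zone).
    shared_object_store: Set[str] = field(default_factory=set)

    def can_use(self, item: str, zone: str) -> bool:
        """Can the given zone reach the given template, snapshot or ISO?"""
        if not self.zone_up.get(zone, False):
            return False
        if item in self.shared_object_store:
            return True
        return item in self.local_nas.get(zone, set())

    def zones_with_access(self, item: str) -> List[str]:
        """All zones that are up and can reach the item, in sorted order."""
        return sorted(z for z in self.zone_up if self.can_use(item, z))
```

In this model, a template that exists only on the NAS of one zone disappears from the whole cloud when that zone goes down, whereas the same template placed in the shared store remains usable from the surviving zones.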

◆ “Object storage”, rather than big data or cloud

I have been writing this column on the theme of “big data and cloud storage,” but Giorgio prefers to describe what we are dealing with here as “object storage”, rather than “big data” or “cloud storage.”

Giorgio: We prefer to refer to our product as the latest and greatest object storage technology (rather than big data).

The term “big data” can be misleading, because size is not always the motivation for object storage. Many companies start using object storage in a small way, such as with 5 or 10 terabytes, but they store data in object storage in the cloud (instead of using more traditional storage technology) because of a. cost, b. efficiency and c. scalability, so they can scale to big data later on. People like object storage because of its simplicity, its affordability and its scalability.

And object storage fits the cloud so well. The cloud is important because I don’t have to buy my storage anymore, and I don’t have to hire people to manage it either. I can outsource my storage to the cloud.

◆ Cloudian for Citrix

So what is important for customers in choosing object storage?

Giorgio: Compared to other object storage partners of Citrix, Cloudian’s strengths are full S3 compatibility and the ability to support multiple datacenters. Multi-datacenter support is not easy. When our first European customer, Lunacloud, wanted to add a second datacenter in France (on top of the existing datacenter in Portugal), it was a big factor. We support several configurations with regard to how many replicas are kept and where these replicas are kept. Other companies cannot do that. And keep in mind that Cloudian was Citrix’s only storage partner at the Citrix booth in London.

At the London Expo, we also announced Cloud Portal Business Manager (CPBM) with Citrix. It is a dashboard for managing cloud services on the web, so now a customer using the CPBM can add (or modify) a cloud storage service provided by Cloudian through this portal.

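Article published: “Big Data Civilization, Part 4: Google and Social”

Part 4 of my “Big Data Civilization” series, “Premium Material Fleeing Behind the Walls” (Gendai Business), has been published. Like Part 3, this installment is about Google, and it covers a variety of topics: Google’s relationship with mobile, its weakness in social, and “shades of purple” and privacy.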

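Article published: “Is Economic Growth in Developed Countries Already Over?”

My new article on Nikkei Business Online has been published. Since it is the start of the year, rather than covering news I gave some thought to the coming decades. It gives my impressions of a paper by Robert Gordon, the professor of economics at Northwestern University shown in the photo.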

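Article published: “Google Takes On Arms Dealer Amazon with Bamboo Spears”

As announced at the end of last year, I have merged my Japanese blog into the official ENOTECH site and opened a new blog. So here is a quick announcement, which also serves as a test.

Part 8 of “Big Data and Cloud Storage” on ZDNet has been posted.

The same article is also available on the Cloudian blog.

Thank you for your support!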

Why DoCoMo wants to sell radish

RadishBoya.gif

NTT DoCoMo, the largest mobile carrier in Japan, recorded a net loss of 40,800 subscribers in November, its first net loss since August 2007 and the largest in its history. Many explanations have been offered.  The easiest to understand is that they don't have the iPhone in their lineup while their competitors do, and some also suspect that the string of large network troubles in recent months disappointed customers who believed that DoCoMo provides the best service despite its higher prices.

And Mr. Kaoru Kato, President and CEO of DoCoMo, says that their current strategy is to become Amazon, according to this Sankei Biz article from last week.

WHAAAATTT????

Well, to be sure, DoCoMo acquired Tower Records several years ago.  Earlier this year, they acquired Radish Boya, an online organic vegetable delivery service.  So... they want to become a company that sells radish?  In an industry that is dominated by that largest and smartest company?  What the xxxx is their competitive advantage as a new entrant??

It is totally impossible to understand if you look at the situation from outside, in management language.  As a former NTT employee, I can sense where it comes from.  NTT often behaves like a family, not like a for-profit company or a group where people gather to achieve a common goal.  NTT's purpose is often to earn a living for the existing employees (=family members), so if the day job is not enough to make ends meet, they have to take on second or third jobs to put food on the table, rather than lay off employees.  In that world, family members are often more important than customers.

I hope that is not the case, and that Mr. Kato actually knows what he is doing.  Uggh....

Memory of the cloud brain – what is cloud storage?

“Big Data and Cloud Storage” Trend 2:  “Big Data and Cloud” Vol. 3

What is Cloud Storage?

Memory of the cloud brain

In my previous article, I wrote that the “cloud” is becoming the “brain” of the Internet world and that its “thinking” activities correspond to “big data”. This time, I will talk about another brain function, “memory”, which corresponds to “cloud storage”. The term “STaaS (Storage as a Service)” is used interchangeably with it.

Dropbox is an easy-to-understand example. To be precise, Dropbox is an end user application and cloud storage is an infrastructure for applications, but consider it as a metaphor to understand its role.

Documents are stored on the Dropbox server in the cloud. It gained popularity as a tool for sharing documents between desktop and mobile devices, as part of the web world’s transition to the “mobile and cloud” era that I mentioned in the first article. It is also used as groupware to share files among team members, and the similar service Box is widely used by enterprise users.

These are particularly storage-centered services, but virtually all web services need storage, such as mail storage in Gmail and photo storage in Facebook.

“Kanban system” cloud storage

Cloudian distinguishes between Dropbox-like upper-layer file sharing, which it calls “online storage”, and the lower-layer infrastructure for applications, which it calls “cloud storage” for app providers. The following discussion is about the latter.

Major players such as Facebook and Google own and operate in-house storage infrastructure. However, many other online service providers strategically choose to outsource it. The major online movie streaming provider Netflix, which holds a huge amount of video and customer data, is a good example of a user of such “cloud storage”.

The specialized consulting firm 451 Group forecasts that the global cloud storage market will grow from $1.3 billion in 2011 to $6.0 billion in 2015. The majority is storage-centric services ($750M → $4.7B), with backup and archiving ($550M → $1.3B) making up the rest.
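To put the forecast figures in perspective, they can be converted into an implied compound annual growth rate. This is my own back-of-the-envelope arithmetic on the numbers quoted above, not a figure from the 451 Group report; the function name cagr is just an illustrative helper, and the calculation assumes four years of growth between 2011 and 2015.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


# Forecast figures quoted in the text, in millions of US dollars.
SEGMENTS_2011 = {"storage-centric": 750, "backup and archiving": 550}
SEGMENTS_2015 = {"storage-centric": 4700, "backup and archiving": 1300}
YEARS = 2015 - 2011  # four years of growth (an assumption about how to count)

total_2011 = sum(SEGMENTS_2011.values())  # 1300, i.e. $1.3 billion
total_2015 = sum(SEGMENTS_2015.values())  # 6000, i.e. $6.0 billion

print(f"total market: {cagr(total_2011, total_2015, YEARS):.1%} per year")
for name in SEGMENTS_2011:
    rate = cagr(SEGMENTS_2011[name], SEGMENTS_2015[name], YEARS)
    print(f"{name}: {rate:.1%} per year")
```

The two segments add up to the quoted totals, and the implied growth of the total market works out to somewhere between 46% and 47% per year, with storage-centric services growing much faster than backup and archiving.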

451 Group defines cloud storage by the following two factors:

(1) Storage capacity can be obtained on an on-demand basis.
(2) Data is in a hosted environment and can be accessed via the Internet.

If the amount of data fluctuates drastically over time, it is too expensive to own enough storage capacity for peak times, like an empty highway in the countryside. Instead, cloud storage (STaaS) can work like the Kanban system. Of the two items above, (1) is the major characteristic of cloud storage, whereas (2) also applies to traditional hosting services. This Kanban-like scalability is called “scale-out” in the cloud industry.
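The “empty highway” argument can be made concrete with a small calculation. The demand figures and the unit price below are made up for illustration, and the sketch assumes, unrealistically, that owned and on-demand capacity cost the same per terabyte per month; the only point is how much of peak-sized capacity sits idle when demand fluctuates.

```python
from typing import Sequence


def owned_cost(demand_tb: Sequence[int], price_per_tb_month: int) -> int:
    """Cost of owning enough capacity for the peak month, paid every month."""
    return max(demand_tb) * price_per_tb_month * len(demand_tb)


def on_demand_cost(demand_tb: Sequence[int], price_per_tb_month: int) -> int:
    """Cost of paying only for the capacity actually used each month."""
    return sum(demand_tb) * price_per_tb_month


def utilization(demand_tb: Sequence[int]) -> float:
    """Share of peak-sized capacity that is actually used on average."""
    return sum(demand_tb) / (max(demand_tb) * len(demand_tb))


# Hypothetical monthly demand in terabytes, with one sharp peak.
monthly_demand_tb = [10, 12, 40, 15, 11, 10]
unit_price = 1  # hypothetical price per terabyte per month, same for both models

print("owned:", owned_cost(monthly_demand_tb, unit_price))
print("on demand:", on_demand_cost(monthly_demand_tb, unit_price))
print(f"utilization of owned capacity: {utilization(monthly_demand_tb):.0%}")
```

With these made-up numbers, capacity sized for the peak month is used at well under half on average, which is the gap that on-demand cloud storage is meant to close.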

As mentioned in my last article, Amazon is the giant in this world. There are practically no start-ups in Silicon Valley that don’t use the Amazon cloud service. Amazon’s cloud storage is ideal for them, as it is hard to predict their capacity requirements over time and their budgets are tight.

Amazon’s customers include some large enterprises like Netflix as well as those start-ups, and it is the only cloud storage vendor whose annual revenue exceeds $100M. According to the 451 report, Amazon holds almost 50% market share, although I do not have the exact figures at hand. Salesforce.com, Rackspace, Microsoft and HP are the followers.

Storage system of Amazon

Amazon’s cloud storage S3 (Simple Storage Service) is a part of Amazon Web Services (AWS). “S3” has become the de facto standard for cloud storage.

S3 uses the technology called Object Storage, one of the three storage methods:

(1) Block Storage:

Data is cut into blocks of a certain size and stored mechanically as 1s and 0s. It is used in SAN (Storage Area Network), which requires fast access over a very short distance.

(2) File Storage:

A collection of data is stored as a file, which carries metadata such as file name and file format, in a hierarchical structure of directories or folders, much like on the PC desktop. It is used in NAS (Network Attached Storage).

(3) Object Storage:

A big chunk of data is packaged like a box, including metadata, which is called an object. Each box is given an OID (Object ID), and all objects are saved in a flat manner.

File storage is easy to understand by analogy with paper folders, but it is inefficient due to several problems. A data access operation requires following the folder structure from top to bottom, and needs to go back to the top to move to a different folder. Metadata is located outside of the holder, and concurrent operation is problematic because the name of the upper folder is shared by multiple files.

In contrast, with object storage, the OID is the only key necessary to access an object, much like pulling out a whole box by looking at the tag attached to it. It is not necessary to go up and down the hierarchy, and all metadata is also stored in the box.

Only one object is tied to one OID, so parallel data access is easy. This higher efficiency results in lower cost and high scalability, as long as the contents of the box are not changed.

With these characteristics, object storage is the preferred method for cloud storage, where massive amounts of static data such as images, videos and e-mails must be stored and where cost efficiency and the ability to scale out are quite important.
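The contrast between walking a folder hierarchy and pulling out a box by its tag can be sketched as follows. This is a toy illustration, not how S3 or any real file system is implemented: resolve_path and FlatObjectStore are hypothetical names, the folder tree is a nested dictionary, and OIDs are simply random identifiers generated with Python's uuid module.

```python
import uuid
from typing import Any, Dict, Tuple


def resolve_path(root: Dict[str, Any], path: str) -> Tuple[Any, int]:
    """File-style access: walk the hierarchy from the top, one folder at a time."""
    node: Any = root
    steps = 0
    for part in path.strip("/").split("/"):
        node = node[part]  # each level must be looked up before the next
        steps += 1
    return node, steps


class FlatObjectStore:
    """Object-style access: every object sits in one flat namespace keyed by OID."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    def put(self, data: bytes, metadata: Dict[str, str]) -> str:
        """Package data and metadata together as one 'box' and return its OID."""
        oid = uuid.uuid4().hex
        self._objects[oid] = {"data": data, "metadata": dict(metadata)}
        return oid

    def get(self, oid: str) -> Dict[str, Any]:
        """A single lookup by OID returns the data and its metadata together."""
        return self._objects[oid]
```

Reaching "/home/user/photos/cat.jpg" in the nested tree takes four lookups, one per level, while the object store needs only the OID returned by put(), and the metadata comes back in the same box as the data.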

Challengers

Not many players challenge the dominance of Amazon at the moment. In the US, some companies such as Microsoft and HP serve their existing enterprise customers, a slightly different customer base. Google is sometimes mentioned as a direct competitor to Amazon, but their target is small and medium-sized customers and their market share is still small. In Europe, LunaCloud has emerged as an Amazon-style competitor.

In Japan, Nifty Cloud and Yahoo! Cloud have been providing similar services, and NTT Communications recently entered this field. Please see below for more details.