TJU Alumni Forum
Tianjin University Alumni Forum
 

Cloud Storage the ParaScale Way

 
Author  Message
admin
Site Admin


Joined: 2009-12-10
Posts: 269

Posted: Fri Dec 11, 2009 2:27 am    Post subject: Cloud Storage the ParaScale Way

By Kevin Davies

December 1, 2009 | “Cloud computing and cloud storage are two sides of the same coin,” explains Sajai Krishnan, CEO of Silicon Valley’s venture-funded ParaScale. Cloud computing has had virtualization tools such as VMware and Xen to do most of the heavy lifting, eventually growing until it necessitated a cloud service. “The equivalent to VMware in the cloud storage space is ParaScale,” says Krishnan. Of course, storage competitors including EMC, HP, NetApp, and Nirvanix might dispute that.

ParaScale software “takes you out of the box model of storage, equivalent to the one app/one server or mainframe in a proprietary computing stack, and puts you in the space of harnessing an aggregate pool of commodity storage that is managed easily as if it were one single appliance, with storage capacity being used as needed by different sets of applications,” says Krishnan. “Whether it’s inside Lilly or inside Amazon, that’s just a business model.” ParaScale uses Linux servers, the cheapest enterprise servers. “The cheapest storage you can find is inside a Linux server,” says Krishnan. Aggregating the disks inside a hundred Linux servers into one massive pool of storage, users can “slice and dice” as they see fit.
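
The pooling idea Krishnan describes can be illustrated with a minimal Python sketch. It is not ParaScale's implementation: the `Node` and `StoragePool` classes, the application names, and the 2 TB per server figure are all hypothetical, chosen only to show many small disks being presented as one capacity that applications draw from as needed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One commodity Linux server contributing its internal disks (hypothetical model)."""
    name: str
    capacity_tb: float


@dataclass
class StoragePool:
    """Aggregates node capacity and hands out slices to applications."""
    nodes: list = field(default_factory=list)
    allocations: dict = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes.append(node)

    @property
    def total_tb(self) -> float:
        return sum(n.capacity_tb for n in self.nodes)

    @property
    def free_tb(self) -> float:
        return self.total_tb - sum(self.allocations.values())

    def allocate(self, app: str, size_tb: float) -> None:
        """Reserve size_tb for an application, drawing on the pool as a whole."""
        if size_tb > self.free_tb:
            raise ValueError(f"only {self.free_tb} TB free, {size_tb} TB requested")
        self.allocations[app] = self.allocations.get(app, 0.0) + size_tb


pool = StoragePool()
for i in range(100):  # "a hundred Linux servers"
    pool.add_node(Node(f"node{i:03d}", capacity_tb=2.0))  # 2 TB each is an assumption

# Applications "slice and dice" the aggregate, not any single server.
pool.allocate("sequencing-archive", 120.0)
pool.allocate("imaging", 50.0)
print(pool.total_tb, pool.free_tb)
```

Note that no single node could satisfy the 120 TB request here; only the aggregate can, which is the point of the pooled model.
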

While VMware saves a lot of money on the compute side, Krishnan argues it significantly increases complexity on the storage side, because users typically need a shared SAN (storage area network) and IT experts who understand not only virtualization but also SAN storage management. “Cloud storage not only addresses the complexity issue but it also addresses the exploding need around managing storage associated with the latest sequencers,” he says, which pump out more than 10 Terabytes/week. “It’s an ungodly amount of data! You have a set of scientists being turned into storage administrators. I know one Ph.D. in the U.K. who is single-handedly managing more than 1 Petabyte of data. Somebody should give the guy a medal!”

For storage, users can benefit from having a private cloud, says Krishnan (see “Stanford Goes Private”). In many cases, the economics of owning it are superior to the economics of renting from Amazon. Whereas many cloud compute users want on-demand flexibility, cloud storage is a more steady-state phenomenon. Cloud computing suits users with dynamic usage patterns, such as people who might need 1,000 nodes to do some analysis over 3 weeks and then need none over the next 3 months. “It’s the perfect situation to go to an Amazon compute cloud,” says Krishnan.

Storage, on the other hand, doesn’t have the same dynamism and tends to steadily grow in capacity. “Without the requirement to handle huge bursts in storage capacity over short periods, the public storage cloud may not provide you that same sort of peak-demand benefit. But the private storage cloud will be the right answer for many companies since it gives you affordability, performance and in-datacenter security. Why should only Amazon have the technology leveraging commodity storage? Why shouldn’t Lilly? Why shouldn’t Home Depot etc. own their own private clouds? Clearly their IT teams have the expertise and scale to manage vast IT installations.”
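
The rent-versus-own reasoning in the two paragraphs above comes down to utilization, and a short Python sketch makes that explicit. The hourly prices below are hypothetical placeholders, not real Amazon or hardware costs, and "3 months" is approximated as 13 weeks. The only claim illustrated is that rented capacity costs money only while it is used, while owned capacity costs money all the time.

```python
def utilization(busy_weeks: float, idle_weeks: float) -> float:
    """Fraction of the period during which the resource is actually in use."""
    return busy_weeks / (busy_weeks + idle_weeks)


def renting_is_cheaper(util: float, rent_per_hour: float, own_per_hour: float) -> bool:
    """Rented capacity is billed only while used; owned capacity is paid for continuously."""
    return util * rent_per_hour < own_per_hour


RENT_PER_HOUR = 0.40  # hypothetical public-cloud price per unit of capacity
OWN_PER_HOUR = 0.15   # hypothetical amortized cost of owning the same capacity

# Bursty compute: 3 busy weeks, then roughly 13 idle weeks (about 3 months).
burst_compute = utilization(busy_weeks=3, idle_weeks=13)
# Steady storage: data sits on disk the whole time.
steady_storage = utilization(busy_weeks=16, idle_weeks=0)

print(renting_is_cheaper(burst_compute, RENT_PER_HOUR, OWN_PER_HOUR))   # True
print(renting_is_cheaper(steady_storage, RENT_PER_HOUR, OWN_PER_HOUR))  # False
```

With different prices the crossover point moves, but the shape of the argument stays the same: the lower the utilization, the stronger the case for renting.
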

NAS or SAN
Krishnan says the vendor community is “glomming onto cloud storage in a big way,” but confusing the issue. “There was until recently only one type of cloud storage,” he says. The traditional kind of cloud storage is NAS (network attached storage), or file storage, as exemplified by Amazon S3 and the Google cloud system. But increasingly, people are talking about SAN (block) storage. This term is applied to the storage directly associated with cloud computing, which companies such as 3PAR, EMC and NetApp provide.

SAN-type storage, similar to Amazon EC2, is not large in terms of Terabytes, maybe 100-200 TB. “That’s what my friend in the U.K. chomps through in a couple of months. So that’s not the cloud storage that’s going to solve the data-sequencing problem. That’s where you need the NAS-type cloud storage.”

“We’re just easy to manage, inexpensive file storage,” Krishnan continues. “Wherever someone has NAS, you end up where 90% of the data, while valuable, is not necessarily the kind that needs to sit on expensive tier 1 storage. Moving that to tier 2 cloud storage saves everyone a ton of money.”

Krishnan says customers have two major choices: ParaScale and EMC Atmos. ParaScale is software only, whereas EMC is both hardware and software. Krishnan points to parallels with VMware. A company such as Eli Lilly, for example, can buy ParaScale’s software and load it onto standard commodity servers. “The beauty of this thing is you’re able to use whatever hardware the companies have. Some are big with Dell, others HP; we’ll go with it. Many bioscience companies have large compute clusters that keep getting refreshed as bioscientists are looking for that edge in performance.” Rather than throw away those older servers, Krishnan says ParaScale can turn them into valuable, scalable storage, simply for the cost of the software.

“We’re in the business of enabling 100 Amazons... We sell our software to customers who want to get into the business of competing with Amazon.”

One of the virtues of cloud storage is reduced administrative costs. A user might manage a cluster of petabytes with half an FTE. “A ParaScale cloud might be composed of hundreds of servers, but it will feel like one appliance. It’s fully automated, whether servers die or you add new servers. The software has enough redundancy that there’s nothing other than scheduled downtime. Every Friday afternoon, your IT admin gets on his rollerblades and goes around the datacenter. He sees the racks that have red lights (failed disks, servers gone) and goes ahead and replaces them, and puts them back into the cloud. All the while users continue to access their data, as nothing is amiss in their view.”

It’s clear to Krishnan that these are early days. Access to hundreds of servers in a cloud provides huge parallel bandwidth, in addition to mere cost or capacity benefits. “This is something you don’t easily get at Amazon,” argues Krishnan. “This is what you get when you understand the cloud, and you have a private storage cloud.” Krishnan says ParaScale is in the early stages with a couple of big pharma companies, as well as a next-gen sequencing company.

Cloud storage is not necessary for cloud computing, but it’s good economic practice, says Krishnan. Cloud computing can save money, but the associated SAN storage is expensive. “Now you have a choice,” says Krishnan. “For the few Terabytes you actually need, use a SAN. Many vendors will call that ‘cloud storage,’ but that’s a few TB. For your other file data, be smart about it. Put it on the file type of cloud storage, like Amazon S3. Buying software from ParaScale allows you to do that. Now you can fully realize the savings you thought you were going to have when you went to cloud compute.”

Stanford Goes Private

ParaScale has built a private cloud compute environment for the Stanford Genome Technology Center. Until recently, the Stanford team scrapped next-gen sequencer images, even though archiving allows for re-analysis as new algorithms are developed. Stanford’s private cloud produces an easily managed, scalable storage pool. The sequencers now write directly to the cloud.

The Stanford staff built their cloud using older machines retired from the HPC cluster—a mélange of different hardware, memory and disk sizes. ParaScale’s cloud storage “has the scale and economy to handle our genomic data, and it is easy enough for our research scientists to manage,” says Baback Gharizadeh, research associate at the SGTC.

To add storage, the Stanford staff installs ParaScale software on any commodity hardware. The cloud detects the new machine and adds its storage to the cloud. As the cloud detects this additional storage, it starts replicating data. Using thin provisioning, the Stanford team can create file systems much larger than the physical storage space available in the cloud. Once they reach capacity, the cloud alerts the staff to add more storage nodes to the cloud.
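
Thin provisioning as described above can be sketched as a toy Python model. This is an illustration only, not how ParaScale's software works internally: the `ThinPool` class, the 90% alert threshold, and the node sizes are assumptions. It shows file systems advertising more space than physically exists, with an alert once real usage approaches real capacity.

```python
class ThinPool:
    """Toy model of thin provisioning (hypothetical, for illustration only)."""

    def __init__(self, alert_threshold: float = 0.9):
        self.physical_tb = 0.0      # real disk space contributed by nodes
        self.used_tb = 0.0          # data actually written
        self.provisioned_tb = 0.0   # sum of advertised file-system sizes
        self.alert_threshold = alert_threshold  # assumed value, not from the article

    def add_node(self, capacity_tb: float) -> None:
        """A newly detected machine adds its storage to the cloud."""
        self.physical_tb += capacity_tb

    def create_filesystem(self, logical_tb: float) -> None:
        # No check against physical_tb: over-commitment is the point of thin provisioning.
        self.provisioned_tb += logical_tb

    def write(self, size_tb: float) -> bool:
        """Record a write; return True if staff should be alerted to add nodes."""
        if self.used_tb + size_tb > self.physical_tb:
            raise RuntimeError("physical capacity exhausted")
        self.used_tb += size_tb
        return self.used_tb >= self.alert_threshold * self.physical_tb


thin = ThinPool()
thin.add_node(4.0)
thin.add_node(4.0)
thin.create_filesystem(50.0)  # far larger than the 8 TB physically present
first = thin.write(5.0)       # 5.0 of 8 TB used: no alert
second = thin.write(2.5)      # 7.5 of 8 TB used: alert
thin.add_node(8.0)            # staff add a storage node
third = thin.write(1.0)       # 8.5 of 16 TB used: no alert
print(first, second, third)   # False True False
```
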
The entire cloud can be administered from a single point of management, regardless of the size of the storage or the number of nodes. The cloud also supports policy-based replication, which ensures that enough copies of the data exist within the storage cloud. Once the analyzed data have been written to the cloud, they can be accessed using a simple web browser and shared with collaborators at other institutions.
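
Policy-based replication of the kind mentioned above can be sketched in Python as follows. The article does not describe ParaScale's actual algorithm, so the `plan_replication` function, the file and node names, and the policy of two copies are hypothetical. The sketch only shows the general idea: compare live copies against a required count and schedule new copies when a node fails.

```python
def plan_replication(placement: dict, live_nodes: set, min_copies: int) -> dict:
    """Return {file: [target nodes]} so that every file regains min_copies live replicas."""
    plan = {}
    for name, holders in placement.items():
        alive = [n for n in holders if n in live_nodes]
        if not alive:
            raise RuntimeError(f"{name}: no surviving replica to copy from")
        missing = min_copies - len(alive)
        if missing <= 0:
            continue  # policy already satisfied for this file
        # Pick targets deterministically among live nodes that lack a copy.
        candidates = sorted(n for n in live_nodes if n not in alive)
        plan[name] = candidates[:missing]
    return plan


# Which nodes hold each file (hypothetical names).
placement = {
    "run1.img": ["a", "b"],
    "run2.img": ["b", "c"],
    "run3.img": ["a", "c"],
}
live_nodes = {"a", "c", "d"}  # node "b" has failed

plan = plan_replication(placement, live_nodes, min_copies=2)
print(plan)  # {'run1.img': ['c'], 'run2.img': ['a']}
```

A real system would also weigh free space and load when choosing targets; the alphabetical choice here is only to keep the example deterministic.
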
admin
Site Admin


Joined: 2009-12-10
Posts: 269

Posted: Fri Dec 11, 2009 2:30 am    Post subject: Amylin, Amazon, and the Cloud

By Kevin Davies

November 30, 2009 | It was last year when Amylin CIO Steve Phillpott knew he had a space problem. With his two San Diego datacenters perilously close to capacity, housing close to 700 servers and about 200 Terabytes of data, he was contemplating building a third data center in Las Vegas or Phoenix. But the notion of enabling a virtual organization appealed to the IT industry veteran, hence he chose to leverage the cloud.

Unlike most other big pharma companies, which might have one specific application in the cloud, Phillpott and colleagues took a broader view. “We wanted to build capabilities in several different areas,” says Phillpott. As the biotech’s business needs evolve, Phillpott’s staff can decide on the best tool for the particular business need.

Over the past year, Phillpott has built up Virtual Data Center capabilities in four major areas using Amazon EC2 and other providers. First is a pure infrastructure for quick provisioning (adding and subtracting) servers. “We actually have connectivity to Amazon,” says Phillpott. “Amazon is just an extension of my data center here. I can add applications up there.” Phillpott currently has four business apps running on EC2. “The beauty of it is I have secure connectivity so it looks like my data center.”

Second is software development tools, or what Phillpott calls “the Lego building blocks of building an application.” Often he doesn’t want to build an application from scratch, but rather pick apps off the shelf, such as Google Apps and the Salesforce.com platform (Force.com).

The third area is support for disaster recovery: building capabilities to maintain images of Amylin’s servers and data using a cloud storage company called Nirvanix. “Now I’ve got a snapshot of every server in the organization stored within Nirvanix’ datacenter, such that if something was to happen to one of those servers, I could quickly rebuild it someplace else, like Amazon!” (Nirvanix competes with Amazon’s S3.)

The fourth area is the actual application of software-as-a-service (SaaS). Phillpott considers what applications are sitting on Amylin’s systems where he doesn’t want to support the hardware and software going forward. “All I want to do is rent to be a more variable model. If I add 100 people, I add 100 people times X dollars a month.” For example, Phillpott is migrating Amylin’s email solution, and considering some other large applications so that they’d be provided by SaaS.
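
Phillpott's "100 people times X dollars a month" is a simple per-seat cost model, which the Python sketch below writes out. The price of 10 dollars per seat and the headcounts are hypothetical, since the article gives no figures.

```python
def monthly_saas_cost(seats: int, price_per_seat: int) -> int:
    """Variable cost model: the bill scales linearly with headcount."""
    return seats * price_per_seat


PRICE_PER_SEAT = 10  # hypothetical "X dollars a month"

before = monthly_saas_cost(1900, PRICE_PER_SEAT)
after = monthly_saas_cost(2000, PRICE_PER_SEAT)
added = after - before  # cost of adding 100 people
print(added)  # 1000
```
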

Secure Environment
A lot of companies have shied away from the cloud citing security concerns. Phillpott took the opposite tack: “The cloud’s here, it’s going to be here to stay, and I’m looking for applications that make sense out there.” Not every application fits the cloud, he says, but he’s already proven that a lot of apps do fit. “We’re starting to build maturity, capability, and a knowledge base, knowing that as we build that out, the industry matures and I can then keep moving more and more.”

The first Amylin apps running on EC2 are internal personal apps outside the R&D space. “We wanted to build a very solid security model before we tackled some of the R&D use cases,” says Phillpott. Those early examples include commercial sales (territory management), human resources (performance management), and some security apps. The savings are typically about 50%, he says. Speed and performance considerations will inevitably become more important as Amylin runs more R&D apps.

Phillpott provisioned the first server at Amazon personally. It took him all of 15 minutes. “Now I haven’t provisioned a server in 10 years—that’s how easy it is!” he says. He showed his results to Amylin’s director of applications, and from that point on, his team was determined to outperform the CIO.

Lately, for some of the more complex scheduling, Phillpott has been working with Cycle Computing (see p. 28) in the research and genomics space. Another partner is a San Diego firm called Cirrhus 9, which has been helping Amylin on a couple of other cloud aspects.

Asked to comment on EC2’s performance, Phillpott says simply, “we’re very pleased. It’s taken us a long time to get to this place, because we wanted a solid security model.” Indeed, Phillpott says the EC2 security is greater than in most corporate environments he’s seen, although it is still improving.

However, Amazon is not the only game in town. “There are lots of companies coming into this space right now, but I don’t think all of them will be around in three years,” says Phillpott. “We want to ensure that, as we’re in the learning stage, we’re learning with someone who we’re confident will be around.” In a few years, once he has more expertise and the market matures, then “you bet,” he’ll entertain other options.

Reframing the IT Mindset
Aside from the obvious security questions, Phillpott says another key issue surrounds the mindset of the IT group, which worried that the cloud would limit or reduce career opportunities. “People at first thought, if we move everything out to the cloud, how will that affect me?” Phillpott recalls. “Over the last year, we’ve really educated them that, ‘No, it changes your skill set, it doesn’t do away with your job. I still need you to do things, it’s just that I need you to do higher-value tasks, focused on innovation and core business competencies, rather than commodity, keeping-the-lights-on activities.’”

Phillpott cites the example of applications being built up on Force.com. “The developers initially thought Force.com would replace the need for internal development. Once they tried building their first Force.com application, they said, ‘Oh, this just makes us more productive.’” Reframing is a real issue, like recoding a workflow application, he says.

Another issue is performance and latency. “For the most part, we’ve been setting up good solid connectivity with Amazon. I haven’t had to send massive datasets, but I’m identifying a couple of other applications that require sending large amounts of data in a couple of months. That will really test the connectivity.”

Like many others, Phillpott is also looking at internal virtualization and private clouds using VMware. This would be a complement to EC2, he says. “My goal is to have capabilities in these different areas, so when the business comes, I can decide: What is the best tool for that particular job? I don’t want to be forced into having a hammer and everything’s a nail—I have EC2 so everything has to fit there. I think we’re in a nice position where we have half-a-dozen different tools, and I can do the best selection based on cost, control, performance and security, as to which is the best fit.”