March 4, 2009

IT can't see through the Cloud

I try to stay out of politi-technical discussions, but I have to say my piece on this whole "cloud" thing that everyone seems to be so excited about. In case you have somehow managed to avoid getting roped into these discussions, they usually center on the claim that cloud computing is "new", "cool", and "might save you money..." Unfortunately, this whole concept has created quite a disagreement between IT departments and "everyone else".

"Cloud Computing" is a generic term used to describe "Infrastructure-as-a-Service", such as Amazon Web Services (EC2), Microsoft Azure, Google App Engine, and many others. The basic idea is that you can move your entire network - servers, routers, firewalls, load balancers, all of it - into a virtual "bucket" or buckets and just let it run forever and ever without any attention to the underlying technical stuff. You just develop your application, pay for "server uptime" by the hour, and ignore all of the fine print and the additional charges on your bill. What could possibly go wrong?

First, let me say that I have no vendetta against any cloud company or technology, nor do I object to the idea of reducing IT bandwidth spent on physical hardware replacement, auditing, and configuring new server instances. And I don't think cloud computing is a horrible technology that is bound to fail.

What I do think is that people are extremely confused about how this all works, and cloud vendors are all too willing to say "Sure, we can do that" before consulting the tech team or extensively testing the product/service/code/whatever. Add to that the fact that anyone can run and even distribute code on the cloud, and we're in for a bumpy ride.

My main point is that cloud computing is an amazing new technology that works extremely well when it is used for its intended purpose: highly parallel, multi-threaded applications such as video encoding or scientific modeling. Remember: cloud computing was originally just a way to rent CPU time in convenient blocks.

So what's the difference between that and what we are all trying to do now on the cloud? Lots:

  • We screamed so loud for "disks" that Amazon gave us exactly what we wanted - and other companies followed. Think of the difference - if I hack a single thread (even 100 threads) running arbitrary analysis on a protein sequence, or encoding single frames of video, not only will I have an extremely limited and practically useless piece of your data, you will likely catch and auto-correct the problem when you put the responses back together in your own datacenter.
  • Now, however, we are putting our end-to-end request/response cycle entirely in the "cloud" - which seems to me like doing your taxes on Wikipedia just so you don't have to store a copy of the forms. Think about it - what IT principal responsible for the complete, end-to-end cycle of your application would allow 100% of that application to be outside of their control?
  • In the aforementioned CPU-only model, there is very little additional action required to incorporate the process into your existing security infrastructure - your data itself is visible and controlled inside of your datacenter. Worst case - total compromise of EC2 - and all you've lost is some processing time, and you can quickly prevent the spread of damage and move processes to available local nodes.
  • When your whole cycle is in the cloud, there are many potential issues that others have presented adequately (see the included links for extensive treatments of security and stability in the cloud), so I won't rehash them here. The main point is that you are no longer losing CPU cycles in the event of [failure|downtime|hacking|natural disaster|humans] - you are losing data.
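
To make the CPU-only model a bit more concrete, here is a minimal Python sketch of "put the responses back together in your own datacenter". It is only an illustration under my own assumptions: the workers are plain local functions standing in for rented cloud nodes, `encode_frame` is a hypothetical placeholder for the real per-chunk work, and majority voting across redundant workers is just one possible way to catch a tampered result, not something any vendor provides.

```python
from collections import Counter

def encode_frame(frame: bytes) -> bytes:
    """Hypothetical stand-in for the real per-chunk work (e.g. encoding one frame)."""
    return frame[::-1]

def honest_worker(frame: bytes) -> bytes:
    # Simulates a rented node that does the job correctly.
    return encode_frame(frame)

def compromised_worker(frame: bytes) -> bytes:
    # Simulates a hacked node that returns a bogus result.
    return b"garbage"

def run_redundantly(frame: bytes, workers) -> bytes:
    """Send the same chunk to several workers and keep the strict-majority answer."""
    results = [worker(frame) for worker in workers]
    winner, votes = Counter(results).most_common(1)[0]
    if votes <= len(workers) // 2:
        # No strict majority: do not trust any answer.
        raise RuntimeError("no majority; recompute this chunk locally")
    return winner

def process(frames, workers):
    # Reassembly happens here, inside your own datacenter.
    return [run_redundantly(frame, workers) for frame in frames]

frames = [b"frame-0", b"frame-1", b"frame-2"]
workers = [honest_worker, compromised_worker, honest_worker]
output = process(frames, workers)
```

The point of the sketch is that the source data and the final result never leave your side; a compromised node only ever sees one chunk, and its bad answer is outvoted or triggers a local recompute.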

Even Amazon's "official position" on security is not convincing. Within their Overview of Security Processes they make several obvious contradictions that will be (have been) duly noted. Within the same page they maintain that essentially [your data is safe, we don't touch it] and [we audit everything, so our data is safe]. Do you see the problem? How do I know what's "my data" and what's "Amazon's data" - the virtual disk? the binary JSON/AJP/AMF3 requests I make between "zones"? And so on...

Bottom line:

Cloud computing is a cool up-and-coming technology, but until these companies provide visibility, control, traceability, and maintainability (possibly liability? support?), don't bet - or put - the "farm" on this technology.

There may be more to come on this. Feel free to ask questions and I'll call Amazon (oh, wait, I can't do that...) or use my hackerly Googling skills to come up with some data.

**Note: I am not singling out Amazon because of any personal or political reason - this is simply the most prominent cloud platform available today. I am hopeful that all of the recent security/stability discussions will result in Amazon fixing these issues and creating the first "IT-department-friendly" cloud platform.


  1. The reason for cloud computing is so that as your site architecture scales, you don't have to rewrite the code and invest in expensive infrastructure - and if you want something good, it's gonna be expensive.

    One of my buddies runs a site on his own rented servers. They've scaled up to 1.2 million hits a day. To handle that, they've had to rewrite pages of SQL code to optimize load times, etc. Big job.

    Cloud computing, if done right, allows you to build it easily the first time and have another company's technology do all of the load distribution at a nominal cost (comparatively, although it can eat you later).
    Interesting point about security, and keeping much of the secure data housed internally.


  2. I agree that the scalability of the cloud can be a huge advantage, but I'm not sure I see how that SQL would not have required rewriting if it was on the cloud. And I agree that there are a limited number of cases where the cloud is appropriate. I'm just not sure how I feel about having only ssh access to my enterprise app, with no auditing, and especially no clear protocol for "incident response". People have argued this is what you have on a VPS - but does anyone really host large-scale apps on a VPS?

    As a side note, from my initial calculations and research, a managed server from most major vendors costs the same or a little less than the same amount of CPU/RAM from EC2, and managed servers give you what the cloud currently does not - an extra layer of management, support, and liability.

    On the other hand, if Amazon starts offering EC2 with a bullet-proof SLA, clearly stating the "responsibility zones" of each party (similar to how T1 and other business Internet lines have clear terms as to who's responsible for what and to what extent) they just might have a solid, long-term success on their hands instead of just a "cloud bubble".

