[occi-wg] Is OCCI the HTTP of Cloud Computing?
samj at samj.net
Mon May 4 20:33:52 CDT 2009
I'm going to break my own rules about reposting blog posts because this is
highly relevant, it's already 03:30 and I'm traveling again tomorrow.
The next step for us is to work out what the protocol itself will look like
on the wire, which is something I have been spending a good deal of time
looking at over many months (both analysing existing efforts and thinking of
"blue sky" possibilities).
I am now 100% convinced that the best results are to be had with a variant
of XML over HTTP (as is the case with Amazon, Google, Sun and VMware). While
Google's GData is by far the most successful cloud API in terms of
implementations, users, disparate services, etc., Amazon's APIs are (at
least for the foreseeable future) a legal minefield. I'm also very
interested in the direction Sun and VMware are going and have of course been
paying very close attention to existing public clouds like ElasticHosts and
GoGrid (with a view to being essentially backwards compatible and sysadmin
I think the best strategy by a country mile is to standardise the OCCI core
protocol following Google's example (e.g. base it on Atom and/or AtomPub
with additional specs for search, caching, etc.), build IaaS extensions in
the spirit of Sun/VMware APIs and support alternative formats including
HTML, JSON and TXT via XML Stylesheets (e.g.
You can see the basics in action thanks to my Google App Engine reference
http://occitest.appspot.com/ (as well as
of same), KISS junkies bearing in mind that this weighs in under
200 lines of Python code! Of particular interest is the ease with which
arbitrarily complex [X]HTML interfaces can be built directly on top of OCCI
(optionally rendered from raw XML in the browser itself) and the use of the
hCard microformat <http://microformats.org/> as a simple demonstration of
what is possible.
Anyway, without further ado:
Is OCCI the HTTP of Cloud Computing?
The Web is built on the Hypertext Transfer Protocol (HTTP),
a client-server protocol that simply allows client user agents to retrieve
and manipulate resources stored on a server. It follows that a single
protocol could prove similarly critical for Cloud Computing,
but what would that protocol look like?
The first place to look for the answer is limitations in HTTP itself. For a
start the protocol doesn't care about the payload it carries (beyond
media type <http://en.wikipedia.org/wiki/Internet_media_type>, such as
text/html), which doesn't bode well for realising the
vision <http://www.w3.org/2001/sw/Activity.html> of the [
Web <http://en.wikipedia.org/wiki/World_Wide_Web> as a "universal medium
for the exchange of data". Surely it should be
possible to add some structure to that data in the simplest way possible,
without having to resort to carrying complex, opaque file formats (as is the
Ideally any such scaffolding added would be as light as possible, providing
key attributes common to all objects (such as updated time) as well as basic
metadata such as contributors, categories, tags and links to alternative
versions. The entire web is built on hyperlinks so it follows that the
ability to link between resources would be key, and these links should be
flexible such that we can describe relationships in some amount of detail.
The protocol would also be capable of carrying opaque payloads (as HTTP does
today) and for bonus points transparent ones that the server can seamlessly
Like HTTP this protocol would not impose restrictions on the type of data it
could carry but it would be seamlessly (and safely) extensible so as to
support everything from contacts to contracts, biographies to books (or
entire libraries!). Messages should be able to be serialised for storage
and/or queuing as well as signed and/or encrypted to ensure security.
Furthermore, despite significant performance improvements introduced in HTTP
1.1 it would need to be able to stream many (possibly millions of) objects
as efficiently as possible in a single request too. Already we're asking a
lot from something that must be extremely simple and easy to understand.
It doesn't take a rocket scientist to work out that this "new" protocol is
going to be XML based, building on top of HTTP in order to take advantage of
the extensive existing infrastructure. Those of us who know even a little
about XML will be ready to point out that the "X" in XML means "eXtensible"
so we need to be specific as to the schema for this assertion to mean
anything. This is where things get interesting. We could of course go down
the WS-* route and try to write our own but surely someone else has crossed
this bridge before - after all, organising and manipulating objects is one
of the primary tasks for computers.
Who better to turn to for inspiration than Google, a company whose
mission <http://www.google.com/corporate/> it is to "organize the
world's information and make it universally
accessible and useful"? They use a single protocol for almost all of
their APIs, GData <http://code.google.com/apis/gdata/>, and while people
don't bother to look under the hood (no doubt thanks to the myriad client
libraries <http://code.google.com/apis/gdata/clientlibs.html> made available
under the permissive Apache 2.0 license), when you do you may be surprised
at what you find: everything from contacts to calendar items, and pictures
to videos is a feed (with some extensions for things like
Enter the OGF's Open Cloud Computing Interface
(OCCI) <http://www.occi-wg.org/>, whose (initial) goal it is to provide
an extensible interface to Cloud
Infrastructure Services (IaaS). To do so it needs to allow clients to
enumerate and manipulate an arbitrary number of server side "resources"
(from one to many millions) all via a single entry point. These compute,
network and storage resources need to be able to be created, retrieved,
updated and deleted (CRUD) and links need to be able to be formed between
them (e.g. virtual machines linking to storage devices and network
interfaces). It is also necessary to manage state (start, stop, restart) and
retrieve performance and billing information, among other things.
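To make the requirements above a little more concrete, here is a minimal sketch of how one compute resource and its links to storage and network resources might be carried in an Atom entry (RFC 4287). The category term "compute", the relative URLs and the helper names are illustrative assumptions, not anything defined by OCCI, and a complete entry would also need elements such as an author.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom(tag):
    """Return a tag name qualified with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_entry(resource_id, title, updated, kind, links):
    """Build an Atom entry describing one resource.

    `kind` and the link targets are hypothetical values for illustration.
    """
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = resource_id
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = updated
    ET.SubElement(entry, atom("category"), {"term": kind})
    for rel, href in links:
        # Links are how one resource points at another (VM -> disk, VM -> NIC).
        ET.SubElement(entry, atom("link"), {"rel": rel, "href": href})
    return entry


entry = build_entry(
    "urn:uuid:00000000-0000-0000-0000-000000000001",
    "web-server-1",
    "2009-05-04T20:33:52Z",
    "compute",
    [("self", "/compute/1"),
     ("related", "/storage/7"),
     ("related", "/network/3")],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The same entry shape would serve for storage and network resources; only the category and the links change.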
The OCCI working group basically has two options now in order to deliver an
implementable draft this month as promised: follow Amazon or follow Google
(the whole while keeping an eye on other players including Sun and VMware).
Amazon use a simple but sprawling XML-based API with a PHP-style flat
namespace and while there is growing momentum around it, it's not without
its problems. Not only do I have my doubts about its scalability outside of
a public cloud environment (calls like 'DescribeImages' would certainly
choke with anything more than a modest number of objects and we're talking
about potentially millions) but there are a raft of intellectual property
issues as well:
- *Copyrights* (specifically section 3.3 of the Amazon Software
prevent the use of Amazon's "open source" clients with anything other than
Amazon's own services.
- *Patents* pending like
the Amazon Web Services APIs and we know that Amazon have been known
to use patents
- *Trademarks* like
us from even referring to the Amazon APIs by name.
While I wish the guys at Eucalyptus <http://open.eucalyptus.com/> and
Canonical <http://news.zdnet.com/2100-9595_22-292296.html> well and don't
have a bad word to say about Amazon Web Services, this is something I would
be bearing in mind while actively seeking alternatives, especially as Amazon
haven't worked out <http://www.tbray.org/ongoing/When/200x/2009/01/20/Cloud-Interop> whether
the interfaces are IP they should protect. Even if these issues were
resolved via royalty free licensing it would be very hard as a single vendor
to compete with truly open standards (RFC 4287: Atom Syndication Format and
RFC 5023: Atom Publishing Protocol <http://tools.ietf.org/html/rfc5023>) which
were developed at IETF by the community based on loose consensus and running
So what does all this have to do with an API for Cloud Infrastructure
Services (IaaS)? In order to facilitate future extension my initial designs
for OCCI have been as modular as possible. In fact the core protocol is
completely generic, describing how to connect to a single entry point,
authenticate, search, create, retrieve, update and delete resources, etc.
all using existing standards including HTTP, TLS, OAuth and Atom. On top of
this are extensions for compute, network and storage resources as well as
state control (start, stop, restart), billing, performance, etc. in much the
same way as Google have extensions for different data types (e.g. contacts
vs YouTube movies).
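As a rough illustration of how generic such a core can be, the sketch below maps the four CRUD operations onto HTTP methods in the way the Atom Publishing Protocol (RFC 5023) does: POST to a collection to create, then GET, PUT and DELETE on the member. The host name, the paths and the crud_request helper are hypothetical; the requests are only constructed, not sent, and authentication (TLS, OAuth) is left out.

```python
from urllib.request import Request

BASE = "http://cloud.example.com"  # hypothetical single entry point
ATOM_ENTRY = "application/atom+xml;type=entry"

# AtomPub-style mapping of CRUD actions to HTTP methods.
METHODS = {"create": "POST", "retrieve": "GET",
           "update": "PUT", "delete": "DELETE"}


def crud_request(action, path, body=None):
    """Build (but do not send) the HTTP request for one CRUD action."""
    headers = {"Content-Type": ATOM_ENTRY} if body is not None else {}
    return Request(BASE + path, data=body, headers=headers,
                   method=METHODS[action])


entry = b'<entry xmlns="http://www.w3.org/2005/Atom"><title>vm1</title></entry>'
reqs = [
    crud_request("create", "/compute", entry),    # POST to the collection
    crud_request("retrieve", "/compute/1"),       # GET the member
    crud_request("update", "/compute/1", entry),  # PUT a new representation
    crud_request("delete", "/compute/1"),         # DELETE the member
]
for req in reqs:
    print(req.get_method(), req.full_url)
```

Nothing in this core knows what a virtual machine is; compute, network and storage semantics would live entirely in the entry payload and its extensions.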
Simply by standardising at this level OCCI may well become the HTTP of Cloud
Computing.