[occi-wg] Syntax of OCCI API

Sam Johnston samj at samj.net
Fri Apr 17 03:08:17 CDT 2009


Just when you thought you had enough mail from me already it seems I missed

On Thu, Apr 16, 2009 at 6:05 PM, Richard Davies <
richard.davies at elastichosts.com> wrote:

> Sam Johnston wrote:
> > Here's a first pass at flattening the Atom into INI file format
> > (basically what you had but with "=" for human & computer readability):
> Great stuff - I think this is a big step forward to be able to express
> everything as a simple list of objects, each specified by simple key-value
> pairs. Hopefully we can also similarly add a JSON version using the same
> simple data structures, e.g.:
> {"category":"server", "title":"Debian...", "mc.state":"running", ... }

JSON/YAML's on my todo list for this morning.
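
As a rough illustration of the idea quoted above (not an agreed OCCI
encoding), a flat list of key-value pairs maps directly onto a single JSON
object. The helper name to_json and the sample values are assumptions made
for this sketch only.

```python
import json

def to_json(pairs):
    """Serialise a flat list of (key, value) pairs as one JSON object."""
    return json.dumps(dict(pairs), sort_keys=True)

# Sample attributes taken from the example in this thread.
server = [
    ("category", "server"),
    ("title", "Debian GNU/Linux 5.0 Virtual Appliance"),
    ("mc.state", "running"),
]

encoded = to_json(server)
decoded = json.loads(encoded)  # round-trips back to the same flat mapping
```

Because the structure is flat, the same pairs could be written out as INI
lines or as JSON without any extra mapping rules.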

> I've got two specific comments on the example you give:
> 1) I'm not sure INI format is actually the best text format for key-value.
> I'd prefer something easier to parse from Unix shell, which is where I
> imagine most simple scripts will be written. ElasticHosts went with
>  "key" (without spaces), <space>, "value" (any characters including spaces)
> since this can be parsed with
>  cat file | while read key value ; do ... ; done

I've found the tinydns-data <http://cr.yp.to/djbdns/tinydns-data.html>
format a pleasure to work with as well, but in any case INI files are
simple, standard across platforms, well defined, etc.
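
For comparison, the space-separated "key value" format quoted above is just
as easy to read outside the shell. The following is a minimal sketch that
mirrors what "while read key value" does; the function name parse_key_value
and the sample data are assumptions, not part of any existing API.

```python
def parse_key_value(text):
    """Parse 'key<space>value' lines; the value may itself contain spaces."""
    result = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        result[key] = value
    return result

sample = "name Debian GNU/Linux 5.0\ncpu 4000\n\nmem 4096\n"
parsed = parse_key_value(sample)
```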

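You can parse them in shell like this:

# Usage: <script> FILE SECTION
# Prints the key="value" lines of one [SECTION] of an INI file.
[ -z "$1" ] || [ -z "$2" ] && exit 1
# First sed: trim spaces around "=", strip ";" comments and surrounding
# whitespace, then wrap unquoted values in double quotes.
sed -e 's/[[:space:]]*=[[:space:]]*/=/g' \
    -e 's/;.*$//' \
    -e 's/[[:space:]]*$//' \
    -e 's/^[[:space:]]*//' \
    -e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
   < "$1" \
   | sed -n -e "/^\[$2\]/,/^[[:space:]]*\[/{/^[^;].*=.*/p;}"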

For python you have
PHP has parse_ini_file <http://fr3.php.net/parse_ini_file>, Perl (as per
usual) has a dozen or so
libini <http://sourceforge.net/projects/libini/> and its ilk.
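
As one concrete case, Python's standard library ships a configparser module
that reads this kind of file directly. The sketch below assumes one INI
section per object with the UUID as the section name, as in the example
under discussion; load_objects and INI_TEXT are names invented for the
illustration.

```python
import configparser

INI_TEXT = """\
[decca5a5-8952-4004-9793-cdbbf05c3c63]
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance
content.cpu = 2
mc.state = RUNNING

[4696b561-a253-42b4-bd27-7aa4950e0a60]
category = storage
content.size = 148251374
"""

def load_objects(text):
    """Return {section_name: {key: value}} for every section in the text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written (no lower-casing)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

objects = load_objects(INI_TEXT)
```

Note that configparser treats indented lines as continuations of the
previous value, so the keys in this sketch start in the first column.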

We need a way to group lines together:

   - INI style headers (e.g. [decca5a5-8952-4004-9793-cdbbf05c3c63])
   - ID prefixes (e.g.
   decca5a5-8952-4004-9793-cdbbf05c3c63.content.cpu.cores = 2)
   - Blank line separators, with ID specified as an attribute (e.g. id =

Except in the case where you retrieve a single object this is always going
to add parsing complexity... but perhaps it's worth it just for the (common)
case of dealing with a single object.
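
To give a feel for the parsing cost of the ID-prefix option, here is a
minimal sketch that regroups prefixed lines into one mapping per object. It
assumes the ID never contains a "." (true for UUIDs); group_by_id_prefix is
a hypothetical helper name.

```python
from collections import defaultdict

def group_by_id_prefix(lines):
    """Group 'ID.attribute = value' lines into {ID: {attribute: value}}."""
    objects = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # ignore blank or malformed lines
        full_key, _, value = line.partition("=")
        # Everything before the first "." is the object ID.
        object_id, _, attribute = full_key.strip().partition(".")
        objects[object_id][attribute] = value.strip()
    return dict(objects)

lines = [
    "decca5a5-8952-4004-9793-cdbbf05c3c63.category = server",
    "decca5a5-8952-4004-9793-cdbbf05c3c63.content.cpu.cores = 2",
    "4696b561-a253-42b4-bd27-7aa4950e0a60.category = storage",
]
grouped = group_by_id_prefix(lines)
```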

> 2) Going through the keys and values in detail:
> >  [decca5a5-8952-4004-9793-cdbbf05c3c63]
> I like UUIDs and ElasticHosts also uses them, but I might loosen the
> requirement to any unique string of hex and dashes (since other vendors may
> prefer to number sequentially, etc.)

There's that "enough rope" problem again, and the alias option discussed
elsewhere. Another (significant) bonus of UUIDs is that they allow you to
migrate resources, collections or even merge entire clouds without
re-mapping, breaking any object references, etc. There really is huge value
here.

> >  category = server
> >  title = Debian GNU/Linux 5.0 Virtual Appliance
> >  summary = Base installation of Debian GNU/Linux 5.0
> Do we need both a title ('name' with ElasticHosts at present) and a summary
> or can we just have one of these?

Most collections tend to have an official title and an additional (optional)
explanation. If you don't use it then that's fine too (actually the
title/summary terminology comes from Atom).

> >  content.cpu = 2
> >  content.memory = 4Gb
> We need to agree units here! Presumably memory would be specified in 'GB'
> or alternatively 'MB', 'kB' or nothing. Is CPU the speed quota or the
> number of virtual cores? I recommend cores=<integer> and an additional key
> for speed quota (ElasticHosts uses cpu=<total MHz to divide across all
> cores>)

Sure, or we just say everything's in bytes/megahertz/etc. and worry about
how to render it in the UI (where it arguably belongs). Internally I'd say
we should deal with raw numbers (that's how it will be represented in
databases anyway) and do the mapping as close as possible to the surface.
Defining units is (probably) acceptable though... assuming there's not a
standard for this we can refer to (surely there is somewhere).
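
A sketch of what "raw numbers internally, units at the surface" could look
like follows. The suffix table is purely an assumption (decimal multipliers,
exact spelling 'kB', 'MB', 'GB'); nothing in this thread has settled the
unit names or whether multiples are decimal or binary.

```python
import re

# Assumed multipliers (decimal); these are NOT agreed in the thread.
MULTIPLIERS = {"": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}

def to_bytes(text):
    """Convert a size such as '4GB' or '1059374258' to a raw byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None or match.group(2) not in MULTIPLIERS:
        raise ValueError("unrecognised size: %r" % text)
    return int(match.group(1)) * MULTIPLIERS[match.group(2)]

raw = to_bytes("4GB")
```

Rejecting unknown suffixes outright (for example the '4Gb' spelling used
earlier) is a design choice in this sketch; it shows why the unit names need
to be pinned down if units are allowed in values at all.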

> Can we cut the namespace and just write:
> cores = 2
> cpu = 4000MHz
> mem = 4GB

Dispensing with ambiguous terminology is a good idea, but the namespaces are
actually quite important for e.g. extensibility.

> >  link.disk[0].id = 4696b561-a253-42b4-bd27-7aa4950e0a60
> >  link.disk[0].dev = sda
> >  link.network[0].id = 45a73b80-c957-4ae1-97c6-b70652eba1d1
> >  link.network[0].dev = eth0
> This is good - a mapping between hardware devices and uuids of the storage
> or network objects.
> We don't need the [0] indices, since the 'dev' specifiers are already fully
> unique. Taking those out and cutting the namespace gives something like:
> disk.sda = 4696b561-a253-42b4-bd27-7aa4950e0a60
> network.eth0 = 45a73b80-c957-4ae1-97c6-b70652eba1d1

Good point and nice optimisation, but what if we want to capture other
information like "starting state = disconnected" etc?
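
To illustrate the trade-off, the indexed form lets each link carry any
number of attributes, at the cost of a little regrouping on the client. The
sketch below is an assumption-laden example: collect_links is a hypothetical
helper, and the "state" attribute is invented to stand in for the extra
information mentioned above.

```python
import re

# Matches keys such as "link.disk[0].id" -> ("disk", "0", "id").
LINK_KEY = re.compile(r"^link\.(\w+)\[(\d+)\]\.(\w+)$")

def collect_links(pairs):
    """Regroup indexed link keys into {kind: [ {attribute: value}, ... ]}."""
    links = {}
    for key, value in pairs.items():
        match = LINK_KEY.match(key)
        if match is None:
            continue  # not an indexed link key
        kind, index, attribute = match.group(1), int(match.group(2)), match.group(3)
        links.setdefault(kind, {}).setdefault(index, {})[attribute] = value
    # Order each kind's records by their numeric index.
    return {kind: [by_index[i] for i in sorted(by_index)]
            for kind, by_index in links.items()}

pairs = {
    "category": "server",
    "link.disk[0].id": "4696b561-a253-42b4-bd27-7aa4950e0a60",
    "link.disk[0].dev": "sda",
    "link.network[0].id": "45a73b80-c957-4ae1-97c6-b70652eba1d1",
    "link.network[0].dev": "eth0",
    "link.network[0].state": "disconnected",  # hypothetical extra attribute
}
links = collect_links(pairs)
```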

> >  mc.state = RUNNING
> >  br.meter.rate = 0.10
> >  br.meter.currency = USD
> >  br.meter.unit = hours
> >  br.meter.total = 35.27
> >  pm.monitor.cpu = 75.2
> >  pm.monitor.mem = 1059374258
> All look reasonable, but again I would cut the namespaces:
> state = RUNNING
> br.rate = 0.10
> br.currency = USD
> br.unit = hours
> br.total = 35.27
> pm.cpu = 75.2
> pm.mem = 1059374258

Sure, namespaces within extensions can be safely dropped. Top level
namespaces less so.

> >  mc.ops.start = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/start
> >  mc.ops.stop = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/stop
> >  mc.ops.restart = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/restart
> >  mc.ops.suspend = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/suspend
> Do we need these at all? Surely these will always be the operations which
> are possible on a RUNNING server, and so can always be constructed based on
> the UUID.

HATEOAS <http://www.stucharlton.com/blog/archives/000141.html> is a carry
over from the Sun Cloud API (as explained by Sun
I like it because from the single entry point you can obtain every URL you
should ever need to use, and those that you can't you don't even see (e.g.
because you can't "start" an abstract template, or simply because as a
disaster recovery operator you're only allowed to start but not stop

If you don't like these you can always ignore them, but your users will
probably get bored of receiving errors when they try to conduct invalid
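
A minimal sketch of how a client might use the advertised links: it only
ever follows URLs the server has listed and never constructs them from the
UUID. The helper names and the "mc.ops." prefix handling are assumptions
based on the example keys in this thread, not a defined client API.

```python
def available_operations(resource, prefix="mc.ops."):
    """Return {operation_name: url} for the operations the server advertised."""
    return {key[len(prefix):]: url
            for key, url in resource.items()
            if key.startswith(prefix)}

def operation_url(resource, name):
    """Look up the URL for an operation, refusing anything not advertised."""
    operations = available_operations(resource)
    if name not in operations:
        raise LookupError("operation %r is not advertised" % name)
    return operations[name]

# A resource as seen by a user who may only start the machine.
BASE = "http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63"
resource = {
    "category": "server",
    "mc.state": "STOPPED",
    "mc.ops.start": BASE + "/ops/start",
}
start_url = operation_url(resource, "start")
```

With this approach a client can hide or disable operations that are missing
from the response instead of discovering they are invalid through an error.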

> Also, why have 'ops' in the URLs? Why not just
> http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/start

Interesting question. This was another carry over from Sun but a better
approach is to leave it to the extension:


The question is, are you starting the machine? Its firewall? Billing?
Backup? Failover? Disaster recovery?

>  [4696b561-a253-42b4-bd27-7aa4950e0a60]
> I guess storage needs a 'title' (or 'name') too?

You're probably right... these are common for all resources.

> >  category = storage
> >  content.size = 148251374
> Why not just 'size'?

The "content" namespace is from Atom... it serves to bundle the "payload" of
the resource together without interfering with other elements of it. OVF
could well have a "title" for example, and what if your attribute clashes
with the name of an extension? Let's try to keep the core nice and clean.

> >  link.self = virtual-disk.vmdk
> Not sure what this is?

It's a link to itself (e.g. a storage resource pointing at its VMDK). I'd
suggest a pass over Atom (RFC 4287 <http://tools.ietf.org/html/rfc4287>) to
see how links work (and how flexible they are).

> >  [45a73b80-c957-4ae1-97c6-b70652eba1d1]
> Again, maybe a 'name'?

No problem.

> >  category = network
> >  content.vlan = 4095
> >  content.dhcp = true
> >  content.subnet =
> >  content.netmask =
> >  content.gateway =
> Once again, I'd take the 'content' prefix off all of these.

See above... we need to work out how/if this can be done safely (and whether
it's worth doing).

> The keys you list here work when the network interface is on a private
> network but are the wrong set when it is on the public internet.

It's just an example, but I do wonder how much detail we're going to want to
get into here. We should probably support arbitrary attributes for whatever
cruft the network guys want to carry (e.g. frame sizes, etc.) but treat it
as opaque for now.

> On the public internet, the cloud vendor, not the user, defines most of
> these parameters and needs to be able to prevent the customer VM from
> "stealing" IPs from other customers.
> The customer has access to a defined set of static IPs which they have
> purchased or alternatively a free dynamic IP assigned at boot, and all they
> should be able to specify is which of these they want on this particular
> interface, and whether they want to receive a DHCP for it.
> For instance, ElasticHosts currently specifies as:
> ip = <specified static IP address or 'auto' to assign dynamically at boot>
> dhcp = <ip address to send by dhcp or 'auto'; no dhcp if not present>
> Given that the customer will have a set of static IPs which they have
> purchased (common concept across Amazon, ElasticHosts, GoGrid, etc.), the
> API also needs an ability for them to list what these are!

I would suggest that these be advertised in the "network" resources so the
customer can choose one that's already allocated (assuming they don't just
rely on DHCP for this).

Another interesting use case incidentally is that of machines doing
introspection - a machine (authenticated by IP?) should be able to hit OCCI
for information about itself (such as its name? IP address? SSH keys?
application configuration?). Even basic attribute-value pairs being settable
via management interfaces would be incredibly powerful (and we get this for
free already).
