The Ceilometer project was started in 2012 with one simple goal in mind: to provide an infrastructure to collect any information needed within all OpenStack projects so that rating engines could use this single point to transform events into bill items, which we tagged “metering”.
As the project started to come to life, collecting an increasing number of metrics across multiple projects, the OpenStack community started to realize that a secondary goal could be added to Ceilometer: become a standard way to collect metrics for various uses with different requirements in terms of frequency, security and transport mechanism. For example, Ceilometer can now publish information for monitoring, debugging or graphing tools in addition to, and in parallel with, the metering backend. We tagged this effort as “multi-publisher”.
Last but not least, as the Heat project started to come to life, it soon became clear that the OpenStack project needed a tool to watch for variations in key values in order to trigger various sets of actions. As Ceilometer already had the tooling to collect vast quantities of data, it seemed logical to add this as an extension of the Ceilometer project, which we tagged as “alarming”.
How is data collected?
In a perfect world, each and every project that you want to instrument would send events on the Oslo bus about anything that could be of interest to you, and Ceilometer would transform these events into samples. Unfortunately, not all projects have implemented this, and you will often need to instrument other tools as well, which may not use the bus that OpenStack has defined. To work around this, the Ceilometer project created three independent methods to collect data:
- The bus listener agent, which takes events generated on the Oslo notification bus and transforms them into Ceilometer samples. Again, this is our preferred method of data collection. If you are working on an OpenStack-related project and are using the Oslo library, you are kindly invited to come and talk to one of the project members to learn how you could quickly add instrumentation for your project.
- Push agents, which are the only solution to fetch data within projects that do not expose the required data in a remotely usable way. This is not our preferred method, as it makes deployment a bit more complex by adding a component to each of the nodes that need to be monitored. However, we prefer it to the polling agent method, as resilience (high availability) is not a problem with this method.
- Polling agents, our least preferred method, which poll some API or other tool to collect information at a regular interval. The main reason we do not like this method is the inherent difficulty of making such a component resilient.
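To make the preferred path more concrete, here is a minimal sketch of what a bus listener does: it receives a notification and turns it into one or more samples. The `Sample` class, its field names, the `notification_to_samples` function and the payload keys are all simplified assumptions made for illustration; they are not Ceilometer's actual classes or its sample schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """Simplified stand-in for a sample (field names are illustrative)."""
    name: str
    volume: float
    unit: str
    resource_id: str
    user_id: str
    timestamp: datetime


def notification_to_samples(notification: dict) -> list:
    """Turn one bus notification into zero or more samples."""
    # Hypothetical handler: it only understands one event type.
    if notification.get("event_type") != "compute.instance.exists":
        return []
    payload = notification["payload"]
    when = datetime.fromisoformat(notification["timestamp"])
    resource = payload["instance_id"]
    user = payload["user_id"]
    return [
        # One sample saying "this instance existed at that time"...
        Sample("instance", 1, "instance", resource, user, when),
        # ...and one carrying the amount of memory allocated to it.
        Sample("memory", payload["memory_mb"], "MB", resource, user, when),
    ]


example_notification = {
    "event_type": "compute.instance.exists",
    "timestamp": "2013-09-01T12:00:00",
    "payload": {"instance_id": "vm-1", "user_id": "alice", "memory_mb": 2048},
}
samples = notification_to_samples(example_notification)
```

A push agent or a polling agent would end up producing the same kind of sample; only the way the raw data is obtained differs.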
How to access collected data?
Once collected, the data is stored in a database. There can be multiple types of databases through the use of different database plugins (see the section “which database to use” later in this document). Moreover, the schema and dictionary of this database can also evolve over time. For both reasons, we offer a REST API that should be the only way for you to access the collected data, instead of querying the underlying database directly. It is possible that the way you’d like to access your data is not yet supported by the API. If you think this is the case, please contact us with your feedback, as this will certainly lead us to improve the API.
The list of currently built-in meters is available in the developer documentation, and it is also relatively easy to add your own (and eventually contribute it). If you would like to use Ceilometer to collect information for other components that do not use Keystone to reference users, the sample record field definitions allow you to create new meters whose samples carry a different “source” in the sample entry. This makes it possible to point to different sources of user reference. It means that you can write your own meter plugins for applications running on top of OpenStack, such as a PaaS or a set of SaaS applications that would not use Keystone but another mechanism for authentication and authorization.
Moreover, end users can also send their own application-centric data into the database through the REST API for a variety of use cases (see the section “Alarming” later in this article).
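As an illustration, here is a minimal sketch of how a client could build a statistics query against the REST API rather than touching the database. The host name, the port, the meter name and the resource ID are assumptions made for the example, and the path and query parameters reflect our understanding of the v2 API; please check them against the API reference for your release. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def statistics_url(endpoint: str, meter: str, resource_id: str, period: int) -> str:
    """Build the URL of a statistics query for one meter and one resource."""
    query = [
        # Filter: only keep samples belonging to the given resource.
        ("q.field", "resource_id"),
        ("q.op", "eq"),
        ("q.value", resource_id),
        # Group the statistics into buckets of `period` seconds.
        ("period", period),
    ]
    return f"{endpoint}/v2/meters/{meter}/statistics?{urlencode(query)}"


# Hypothetical endpoint and identifiers, for illustration only.
url = statistics_url("http://ceilometer.example.com:8777", "cpu_util", "abc-123", 600)
print(url)
```

A real request would also need to carry a valid Keystone token, typically in the `X-Auth-Token` header.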
If you divide a billing process into three steps, as is commonly done in the telco industry, the steps are usually defined as follows:
- Metering: this is the process of collecting information about what, who, when and how much regarding anything that can be billed. The result of this is a collection of “tickets” (a.k.a. samples) which are ready to be processed in any way you want.
- Rating: this is the process of analysing a series of tickets, according to business rules defined by marketing, in order to transform them into bill line items with a currency value.
- Billing: this is the process of assembling bill line items into a single per-customer bill and emitting the bill to start the payment collection.
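The three steps above can be sketched in a few lines. Everything here is made up for illustration: the ticket fields, the meter names, the customers and the price list are assumptions, and the rating rule (volume multiplied by a flat unit price) is the simplest one we could think of, not a recommendation.

```python
from collections import defaultdict
from decimal import Decimal

# Step 1, metering: the tickets (samples) that were collected.
tickets = [
    {"customer": "acme", "meter": "instance_hours", "volume": Decimal("10")},
    {"customer": "acme", "meter": "storage_gb", "volume": Decimal("50")},
    {"customer": "globex", "meter": "instance_hours", "volume": Decimal("3")},
]

# Hypothetical business rules: a flat price per unit of each meter.
PRICES = {"instance_hours": Decimal("0.10"), "storage_gb": Decimal("0.02")}


def rate(tickets, prices):
    """Step 2, rating: turn tickets into bill line items with a value."""
    return [
        {
            "customer": t["customer"],
            "meter": t["meter"],
            "amount": t["volume"] * prices[t["meter"]],
        }
        for t in tickets
    ]


def bill(line_items):
    """Step 3, billing: assemble line items into one total per customer."""
    totals = defaultdict(Decimal)
    for item in line_items:
        totals[item["customer"]] += item["amount"]
    return dict(totals)


line_items = rate(tickets, PRICES)
bills = bill(line_items)
```

Only the `tickets` list corresponds to something a metering system would give you; `rate` and `bill` stand for components you would have to bring yourself.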
Clearly, Ceilometer’s initial goal was, and still is, strictly limited to step one. This is a choice made from the beginning not to go into rating or billing, as the variety of possibilities seemed too huge for the project to ever deliver a solution that would fit everyone’s needs, from private to public clouds. This means that if you are looking at this project to solve your billing needs, this is the right way to go, but certainly not the end of the road for you. Once Ceilometer is in place on your OpenStack deployment, you will still have quite a few things to do before you can produce a bill for your customer, starting with finding the right queries within the Ceilometer API to extract the information you need for your very own rating engine.
You can of course use the same API to feed other needs, such as a data mining solution to help you identify unexpected or new usage types, or a capacity planning solution. In general, people like to download the data from the API in order to work on it in a separate database, to avoid overloading the one which should be dedicated to storing tickets. It is also common for the Ceilometer metering DB to keep only a couple of months’ worth of data, while data is regularly offloaded into a long-term store connected to the billing system, but this is fully left up to the implementor.
Note that we do not guarantee that we won’t change the DB schema, so it is highly recommended to access the data through the API rather than through direct queries.
Publishing meters for different uses is actually a two-dimensional problem. The first variable is the frequency of publication. Typically, a meter that you publish for billing needs will have to be updated every 30 minutes, while the same meter may be needed every 10 seconds for performance tuning.
The second variable is the transport. In the case of data intended for a monitoring system, losing an update or not ensuring the security (non-repudiation) of a message is not really a problem, while the same meter will need both security and guaranteed delivery in the case of data intended for rating and billing systems.
To solve this, multiple publishers can now be configured for each meter within Ceilometer, allowing the same technical meter to be published multiple times to multiple destinations, each potentially using a different transport and frequency of publication. At the time of writing, two transports have been implemented: the original and relatively secure one based on the Oslo RPC queue, and a UDP one.
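The two dimensions can be pictured with a small sketch in which a single meter is attached to two publications, one per destination. The `Publication` class, the `PUBLICATIONS` table, the destination names and the `due_publications` helper are assumptions invented for this example; they do not reproduce Ceilometer's actual configuration format or its scheduling logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    destination: str  # who consumes the data
    transport: str    # "rpc" (secure, guaranteed) or "udp" (fast, lossy)
    interval: int     # seconds between two publications


# The same technical meter, published twice with different settings.
PUBLICATIONS = {
    "cpu_util": [
        Publication("metering", "rpc", 1800),   # every 30 minutes
        Publication("monitoring", "udp", 10),   # every 10 seconds
    ],
}


def due_publications(meter: str, elapsed: int) -> list:
    """Return the publications of `meter` that are due after `elapsed` seconds."""
    return [
        p for p in PUBLICATIONS.get(meter, [])
        if elapsed % p.interval == 0
    ]


after_ten_seconds = due_publications("cpu_util", 10)
after_thirty_minutes = due_publications("cpu_util", 1800)
```

After 10 seconds only the monitoring publication is due, while after 30 minutes both are, each going out over its own transport.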
The Alarming component of Ceilometer, which is being delivered in the Havana version, allows you to set alarms based on threshold evaluation for a collection of samples. An alarm can be set on a single meter, or on a combination. For example, you may want to trigger an alarm when the memory consumption reaches 70% on a given instance if the instance has been up for more than 10 minutes. To set up an alarm, you call Ceilometer’s API server, specifying the alarm conditions and an action to take.
Of course, if you are not an administrator of the cloud itself, you can only set alarms on meters for your own components. The good news is that you can also send your own meters from within your instances, meaning that you can trigger alarms based on application-centric data.
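The memory example can be written down as a small evaluation function. This is only a sketch of the idea of a combined threshold: the function name, its parameters and the choice of averaging the samples before comparing them to the threshold are our assumptions, not a description of how the alarm evaluator is implemented.

```python
def should_alarm(memory_pct_samples, uptime_seconds,
                 threshold=70.0, min_uptime=600):
    """Return True when both conditions of the example alarm are met.

    memory_pct_samples: recent memory usage samples, in percent.
    uptime_seconds: how long the instance has been running.
    """
    if not memory_pct_samples:
        return False  # nothing to evaluate yet
    # Assumption: the collection of samples is reduced to its average.
    average = sum(memory_pct_samples) / len(memory_pct_samples)
    # "reaches 70%" and "up for more than 10 minutes".
    return average >= threshold and uptime_seconds > min_uptime


busy_and_old = should_alarm([65, 75, 80], uptime_seconds=900)
busy_but_young = should_alarm([65, 75, 80], uptime_seconds=300)
quiet_and_old = should_alarm([50, 60], uptime_seconds=900)
```

Only the first case satisfies both conditions, so it is the only one that would set the alarm off.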
There can be multiple forms of action, but two have been implemented so far:
- HTTP callback: you provide a URL to be called whenever the alarm has been set off. The payload of the request contains all the details of why the alarm went off.
- log: mostly useful for debugging, stores alarms in a log file.
For more details on this, I recommend reading the blog post by my colleague Mehdi Abaakouk, “Autoscaling with Heat and Ceilometer”. Particular attention should be given to the section “Some notes about deploying alarming”, as the database setup (using a separate database from the one used for metering) will be critical in all cases of production deployment.
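To give an idea of what the two kinds of action amount to, here is a minimal sketch that prepares an HTTP callback and writes a log entry for the same alarm. The payload fields, the callback URL and the function names are hypothetical; the actual payload sent by Ceilometer may be structured differently. The request is only built, never sent.

```python
import json
import logging
from urllib.request import Request

# Hypothetical description of an alarm that just went off.
alarm = {
    "alarm": "high-memory",
    "state": "alarm",
    "reason": "memory usage reached 70% on instance vm-1",
}


def build_callback(url: str, alarm: dict) -> Request:
    """HTTP callback action: prepare a POST carrying the alarm details."""
    body = json.dumps(alarm).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def log_alarm(alarm: dict) -> str:
    """Log action: record the alarm, which is mostly useful for debugging."""
    message = f"alarm {alarm['alarm']} is in state {alarm['state']}: {alarm['reason']}"
    logging.getLogger("alarms").warning(message)
    return message


request = build_callback("http://app.example.com/scale-up", alarm)
entry = log_alarm(alarm)
```

Passing `request` to `urllib.request.urlopen` would actually perform the call; a service such as Heat could sit behind that URL and react to it.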
Which database to use
Since the beginning of the project, a plugin model has been in place to allow various types of database backends for Ceilometer. However, not all implementations are equal and, at the time of writing, MongoDB is still the recommended backend because it is the most tested one. Have a look at the “choosing a database backend” section of the documentation for more details. In short, please make sure you set up this project to use a dedicated database, as the volume of data generated by Ceilometer can be huge in a production environment and will generally use a lot of I/O.
The idea of this article came from numerous conversations we had with Frederic Faure (@fredericfaure, who works at Ysance, one of our partners on the CloudWatt project). Frederic also helped a lot with the proofreading and clarification of the article, together with Julien Danjou (@juldanjou, one of the great developers working for eNovance and the current Project Technical Lead of the Ceilometer project within OpenStack). Most of the diagrams were derived from an earlier presentation that Julien created for the Havana summit.