Skydive: a real-time network analyzer

SDN solutions are complex, and troubleshooting and monitoring them is even harder. It seems that while we have gained a better way to automate the network, we have lost visibility and operability. For example, in order to troubleshoot an issue you have to understand the network in general, but you also need a deep understanding of how the SDN solution implements the network. And if you have multiple SDN solutions deployed, perhaps with nested ones like a container network running inside VMs, finding the root cause of an issue becomes really hard.
In this context I will introduce a new project that we started a few months ago at Red Hat, which aims to bring back visibility and operability to such environments. In this first post I will not describe in detail how it works; I will explain the global design and how to set up a lab environment.

Skydive project

What is Skydive? Skydive is a project that aims to collect, store and analyze the state of a network infrastructure and the flows going through it. Skydive is SDN-agnostic, which means it doesn’t rely on any particular SDN solution but provides a way to gather information from SDN controllers.
That covers the definition; now let’s see how it works.
Skydive is composed of two components:

Skydive agents, which collect local topology information (interfaces, bridges, …) and capture traffic locally.
Skydive analyzers which collect and aggregate topology and flows from the agents. The analyzers can leverage information from SDN controllers such as Neutron or any other SDN solution.

Use cases

Having this data collected and aggregated in one place allows us to:

Find where packets are dropped
Identify what kinds of packets lead to issues
Find the congestion points: bandwidth, number of sessions, etc.
Get latency and RTT metrics
Gather metrics that help with capacity planning and billing
etc…

The lab!

For this lab I will explain how to deploy an all-in-one node, so Agent + Analyzer. It will be easy to start another Agent later.
Let’s begin by installing the Analyzer. Currently Skydive relies on Elasticsearch as its data store, so it needs to be deployed first.
Skydive uses Open vSwitch and sFlow for flow capture. This is a limitation that we will remove soon, but for this lab we need an up-and-running OVS with an ovsdb-server listening on a TCP port:

$ sudo ovs-appctl -t ovsdb-server \
  ovsdb-server/add-remote ptcp:6400
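
To confirm that the remote was added, ovs-appctl -t ovsdb-server ovsdb-server/list-remotes should list ptcp:6400. A quick connection test can also save some debugging later. The following is only a sketch: remote_port and port_open are helper names made up for illustration, and the connection test assumes bash (for its /dev/tcp feature) and the coreutils timeout command are available.

```shell
# Hypothetical helpers, not part of Skydive or Open vSwitch.

# Print the TCP port of a passive OVSDB remote such as "ptcp:6400"
# or "ptcp:6400:127.0.0.1". Fails (status 1) for any other remote type.
remote_port() {
  local remote="$1" rest
  case "$remote" in
    ptcp:*)
      rest="${remote#ptcp:}"
      printf '%s\n' "${rest%%:*}"
      ;;
    *)
      return 1
      ;;
  esac
}

# Succeed if something accepts TCP connections on HOST PORT within 2 seconds.
# The connection is attempted in a child bash so no descriptor stays open.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example:
#   port_open 127.0.0.1 "$(remote_port ptcp:6400)" && echo "ovsdb-server reachable"
```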

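Once Elasticsearch and Open vSwitch are up and running, you are ready to download the Skydive binary. Note that the repository has since moved to https://github.com/skydive-project/skydive/, with releases listed under https://github.com/skydive-project/skydive/releases, so the original URL below may return a 404: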

$ wget https://github.com/redhat-cip/skydive/releases/download/v0.2.0/skydive
$ chmod +x skydive

You just need a small config file that tells the agents where the Analyzer is running, which addresses to listen on, and which probes to use:
skydive.yml

analyzer:
  listen: 0.0.0.0:8082
agent:
  listen: 0.0.0.0:8081
  analyzers: 127.0.0.1:8082
  topology:
    probes:
      - netns
      - netlink
      - ovsdb
  flow:
    probes:
      - ovssflow

$ ./skydive analyzer -c skydive.yml

$ sudo ./skydive agent -c skydive.yml

Once the Skydive components are started, you can check that the APIs are responding. Since the Agent and the Analyzer offer the same API and WebUI, you can check both.

$ curl http://localhost:8081/rpc/topology
$ curl http://localhost:8082/rpc/topology

or check the WebUI:
http://localhost:8081/
http://localhost:8082/
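
Right after start-up the components may need a moment before they answer, so a scripted check can poll instead of failing on the first attempt. A minimal sketch, assuming curl is installed; retry and api_up are hypothetical helper names, and the URLs are the ones used above.

```shell
# Hypothetical helpers for polling the Skydive APIs; requires curl.

# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts.
retry() {
  local tries="$1" delay="$2" n=1
  shift 2
  while ! "$@"; do
    [ "$n" -ge "$tries" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Succeed if URL answers with a non-error HTTP status.
api_up() {
  curl -fsS -o /dev/null "$1"
}

# Example (Agent on 8081, Analyzer on 8082):
#   retry 10 1 api_up http://localhost:8081/rpc/topology
#   retry 10 1 api_up http://localhost:8082/rpc/topology
```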

The Skydive repository contains a script that can be used to build a test topology: two namespaces connected by an Open vSwitch bridge.

$ wget https://raw.githubusercontent.com/redhat-cip/skydive/master/scripts/simple.sh
$ chmod +x simple.sh
$ ./simple.sh start 192.168.0.1/24 192.168.0.2/24
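
For readers who wonder what such a topology involves, the sketch below prints commands that would build something similar by hand. It is not a copy of simple.sh: the interface names and the vm2 namespace are assumptions (only vm1 and br-int appear in this post), and the commands are printed rather than executed so they can be reviewed first.

```shell
# Sketch only: prints the commands instead of running them.
# Not taken from simple.sh; "vm2" and the interface names are assumptions.

# Commands for one namespace NS with address ADDR (CIDR) plugged into BRIDGE.
ns_port_cmds() {
  local ns="$1" addr="$2" bridge="$3"
  echo "ip netns add ${ns}"
  # veth pair: one end stays on the host, the other moves into the namespace
  echo "ip link add ${ns}-eth0 type veth peer name ${ns}-peer"
  echo "ip link set ${ns}-peer netns ${ns}"
  echo "ip netns exec ${ns} ip addr add ${addr} dev ${ns}-peer"
  echo "ip netns exec ${ns} ip link set ${ns}-peer up"
  echo "ip link set ${ns}-eth0 up"
  echo "ovs-vsctl add-port ${bridge} ${ns}-eth0"
}

# Commands for the whole test topology: one bridge, two namespaces.
topology_cmds() {
  echo "ovs-vsctl --may-exist add-br br-int"
  ns_port_cmds vm1 "$1" br-int
  ns_port_cmds vm2 "$2" br-int
}

# Example: review the output, then pipe it to a root shell if it looks right.
#   topology_cmds 192.168.0.1/24 192.168.0.2/24
#   topology_cmds 192.168.0.1/24 192.168.0.2/24 | sudo sh -e
```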

Once this script is executed, the WebUI should look like this:

To put some data into Elasticsearch and get some flows on the WebUI, we can run the classic ping test between the two namespaces created by the script. I leave it to the more adventurous among you to test with netcat and beyond.

$ sudo ip netns exec vm1 ping 192.168.0.2

On the WebUI, moving the mouse pointer over the bridge named “br-int” shows the flows going through this bridge.
By checking the Skydive flows API we can see the flows that were captured, the interfaces involved and where they were captured:

$ curl http://localhost:8082/rpc/flows

The Conversation view in the WebUI allows you to see the flows between two interfaces based on the layer (Ethernet, IPv4, etc.).

Multi-node lab

Deploying a multi-node lab is quite easy: we just need to start another Agent after changing the Analyzer address in its configuration file. There is another script in the Skydive repository that creates two namespaces connected through a GRE tunnel.

$ wget https://raw.githubusercontent.com/redhat-cip/skydive/master/scripts/multinode.sh
$ chmod +x multinode.sh

On the first node:

$ ./multinode.sh start 192.168.0.1/24 <tunnel endpoint IP>

On the second node:

$ ./multinode.sh start 192.168.0.2/24 <tunnel endpoint IP>
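
The configuration change for the additional Agent mentioned above could be scripted roughly as follows. This is a sketch: write_agent_config is a hypothetical helper, 10.0.0.5 stands in for the real address of the node running the Analyzer, and the keys simply mirror the skydive.yml from the single-node lab.

```shell
# Hypothetical helper: write an agent-only config pointing at a remote Analyzer.
# Arguments: ANALYZER (host:port) and OUT (path of the file to write).
# The keys mirror the skydive.yml shown in the single-node lab.
write_agent_config() {
  local analyzer="$1" out="$2"
  cat > "$out" <<EOF
agent:
  listen: 0.0.0.0:8081
  analyzers: ${analyzer}
  topology:
    probes:
      - netns
      - netlink
      - ovsdb
  flow:
    probes:
      - ovssflow
EOF
}

# Example on the second node (10.0.0.5 is a placeholder address):
#   write_agent_config 10.0.0.5:8082 skydive.yml
#   sudo ./skydive agent -c skydive.yml
```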

Below, a screenshot of the WebUI in a multi-node setup, with the flow captured while doing a ping between namespaces on both nodes.

Conclusion and Roadmap

While the project is young, Skydive can already be deployed for testing purposes. We will add more flow probes in the coming weeks in order to capture traffic outside of Open vSwitch. Neutron and Docker connectors are already available, and more external connectors will be added in order to annotate topology and flows with extra information.
I couldn’t finish this post without saying that Skydive is an open source project written in Go, released under the Apache license, and open to contributions.
We are hosted on Software Factory, leveraging the Gerrit contribution model made popular by OpenStack, and we have a mirror on GitHub. Patchsets welcome!

Responses

Yatin Kumbhare 2016-02-25 at 08:36

Great post!
nandakumar 2016-04-07 at 00:48

I could bring up the GUI by following the steps mentioned in the blog. But for some reason I am unable to see any flows or any information. It is completely empty, can you give some references where they discuss about issue/errors.
Thanks
1. Sylvain Afchain 2016-04-07 at 09:51
  
  @nandakumar we have made a lot of changes to the project since we published this post. Flow captures are now optional and are managed by the Skydive client. Please check the README if you have not already done so. For further discussion about your issue you can join us on IRC (freenode) in #skydive-project or file a bug on GitHub. We will publish an update soon.
  Thanks.
  1. nandakumar 2016-04-08 at 08:21
    
    Thanks
Sabrina 2016-05-05 at 12:31

Hi, great post!
Could you tell me if it is possible for Skydive to work with InfluxDB instead of ElasticSearch? Besides, I tried to run it in a 32-bit environment and it wasn't possible; are there any releases that support 32 bits?
Thanks in advance.
ali anwar 2017-04-30 at 06:48

Do we need any software dependencies for Skydive?
1. Sylvain Afchain 2017-05-02 at 09:43
  
  There are no dependencies, but if you want to use the history capability of Skydive you will need a database, either Elasticsearch or OrientDB. The history capability allows you to travel back in time and see what the flows and topology were at any point in time.
S.Zahiri 2017-07-15 at 11:31

wget https://github.com/redhat-cip/skydive/releases/download/v0.2.0/skydive
results ERROR 404: Not Found.
what is the problem?
1. Frederic Lepied 2017-07-31 at 13:44
  
  Repository has moved to https://github.com/skydive-project/skydive/ and you can find the releases under https://github.com/skydive-project/skydive/releases