Cisco IOS-XR Streaming Telemetry with Telegraf

Published on 2021-05-18.

Introduction

For various reasons I want to monitor the number of RPKI ROAs that a number of routers are receiving from two RPKI validators. This data is not, as far as I know, available via SNMP; it has to be streamed using streaming telemetry. This means that the router pushes data to a collector, instead of the collector pulling data from the router.

Cisco has some information about the subject, although it is outdated. The Pipeline collector code is no longer maintained, but Telegraf can be used instead.

In the end Telegraf will take the incoming telemetry data and expose it in Prometheus format, which can be scraped and visualized in, for instance, Grafana. I will not cover the scraping and visualizing part; there is already plenty of information about that online.

The routers are all running IOS-XR 6.5.3, both 32- and 64-bit versions.

Install and configure Telegraf

Download and install Telegraf using the instructions on the site.

Open up /etc/telegraf/telegraf.conf and take a look around. If you haven't used Telegraf before, it's basically a massive plugin system. There are tons of input plugins that accept data and output plugins that emit it in different formats. In this case I will be using the cisco_telemetry_mdt input plugin to accept the telemetry data from the routers and the prometheus_client output plugin to expose the data in Prometheus format.

To enable the two mentioned plugins just find them in the configuration file and remove the comments. Make sure to change the transport protocol from grpc to tcp:

[[inputs.cisco_telemetry_mdt]]
  ## Telemetry transport can be "tcp" or "grpc".  TLS is only supported when
  ## using the grpc transport.
  transport = "tcp"

  ## Address and port to host telemetry listener
  service_address = ":57000"

[[outputs.prometheus_client]]
  ## Address to listen on
  listen = ":9273"

Run systemctl restart telegraf and make sure it's running (look for the process with ps aux and check /var/log/syslog for errors).
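To confirm that both listeners are actually up, you can simply try to connect to them. Below is a minimal Python sketch, assuming Telegraf runs on the local machine and uses ports 57000 and 9273 as in the configuration above. The helper name port_is_open is only for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the Telegraf configuration shown above.
    checks = (("cisco_telemetry_mdt", 57000), ("prometheus_client", 9273))
    for plugin, port in checks:
        state = "listening" if port_is_open("127.0.0.1", port) else "NOT listening"
        print(f"{plugin} on port {port}: {state}")
```

The check opens a connection to the telemetry port and closes it right away without sending anything, so Telegraf may log the aborted connection.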

Configure the router

The configuration consists of three parts. First, one or more destination-groups are configured. These are the servers that will receive the data from the device. Then one or more sensor-groups define what data will actually be collected. Finally, a subscription ties the two together: it defines which destinations will receive which data.

A simple configuration can look like this:

RP/0/RSP1/CPU0:Router#show run telemetry model-driven
telemetry model-driven
 max-containers-per-path 0
 destination-group telegraf
  address-family ipv4 192.0.2.10 port 57000
   encoding self-describing-gpb
   protocol tcp
  !
 !
 sensor-group rpki
  sensor-path Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-summary
  sensor-path Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-server-list/rpki-server
 !
 subscription telemetry
  sensor-group-id rpki strict-timer
  sensor-group-id rpki sample-interval 60000
  destination-id telegraf
  source-interface Loopback0
 !
!

Finding the sensor-paths

"The sensor path describes a YANG path or a subset of data definitions in a YANG model with a container."

-- Cisco documentation

That's quite a mouthful. I'm not going to get into what YANG is, but for our purpose we need to find the path containing the data we wish to receive and configure it as a sensor-path. I don't have an exact method for this; I cloned the YangModels GitHub repo and started searching. Eventually I found the Cisco-IOS-XR-ipv4-bgp-oper.yang file, which contains the RPKI-SUMMARY and RPKI-SERVER-LIST groupings. From there I did some testing, based on other configuration examples from the Cisco documentation and plain trial and error.
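The searching itself can be as simple as a recursive grep. Here is a rough Python sketch of the same idea, assuming you have a local clone of the YANG models and pass the directory and a keyword on the command line. The function name search_yang and the script name are made up for this example.

```python
import sys
from pathlib import Path


def search_yang(root, keyword):
    """Yield (file name, line number, line) for every line in any *.yang
    file below root that contains keyword, ignoring case."""
    needle = keyword.lower()
    for path in sorted(Path(root).rglob("*.yang")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.strip()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for name, number, line in search_yang(sys.argv[1], sys.argv[2]):
            print(f"{name}:{number}: {line}")
    else:
        print("usage: search_yang.py <directory> <keyword>")
```

Searching for rpki this way only points you at candidate files and groupings. The full sensor-path still has to be pieced together from the containers in the model and tested on the router.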

One command that's helpful in verifying the path is mdt_exec. When executed on the device, a valid path will output data while an invalid path will output nothing.

Unfortunately I don't have a better way of finding and testing the paths.

Verify the setup

RP/0/RSP1/CPU0:Router#show telemetry model-driven sensor-group
  Sensor Group Id:rpki
    Sensor Path:        Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-summary
    Sensor Path State:  Resolved
    Sensor Path:        Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-server-list/rpki-server
    Sensor Path State:  Resolved

RP/0/RSP1/CPU0:Router#show telemetry model-driven subscription
Subscription:  telemetry                State: ACTIVE
-------------
  Sensor groups:
  Id                               Interval(ms)        State
  rpki                             60000               Resolved

  Destination Groups:
  Id                 Encoding            Transport   State   Port    Vrf     IP
  telegraf           self-describing-gpb tcp         Active  57000           192.0.2.10
    No TLS

The Sensor Path State being Resolved means the device has successfully found the YANG sensor path and data can be read, while the destination group state being Active means the device has set up a connection to Telegraf and is sending data.

After a minute or two the Telegraf Prometheus output plugin will show the gathered data. This can be verified with curl http://localhost:9273/metrics.
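The metrics page also contains Telegraf's own metrics, so it helps to filter it. The following Python sketch fetches the page and prints only the samples whose metric name contains a given string. It assumes the metric names are derived from the sensor path and therefore contain rpki, which as far as I can tell is the case until aliases are configured. The function name filter_metrics is just for illustration.

```python
from urllib.request import urlopen


def filter_metrics(text, needle):
    """Return the sample lines whose metric name contains needle."""
    matches = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if needle in name:
            matches.append(line)
    return matches


if __name__ == "__main__":
    try:
        with urlopen("http://localhost:9273/metrics", timeout=5) as response:
            body = response.read().decode("utf-8")
    except OSError as error:
        print(f"could not fetch metrics: {error}")
    else:
        for line in filter_metrics(body, "rpki"):
            print(line)
```

If nothing is printed, check the subscription state on the router first, then look at the unfiltered output to see what the metrics are actually called.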

Tweak the Telegraf config

If you have two or more RPKI validators configured on the router, you should look closely at the Telegraf output. Chances are you will only see one server and not the others. I'm not sure why this is happening, but I think it's because Telegraf doesn't realize there are two or more entries in the data structure, so each entry overwrites the previous one.

We can solve this by adding the following to the cisco_telemetry_mdt plugin configuration:

embedded_tags = ["Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-server-list/rpki-server/name"]
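To illustrate why an extra tag would help, here is a deliberately simplified Python model of the behaviour I suspect. It is not Telegraf's actual code: it only assumes that a series is identified by its measurement name plus its tags, so two entries with identical tags collapse into one. The router name, validator names and ROA counts are made up.

```python
def collect(entries, tag_keys):
    """Store entries keyed by (measurement, tags).

    Entries with the same key overwrite each other, so only the last one
    survives. This is a toy model, not Telegraf's implementation.
    """
    series = {}
    for entry in entries:
        tags = tuple(sorted((key, entry[key]) for key in tag_keys))
        fields = {k: v for k, v in entry.items() if k not in tag_keys}
        series[("rpki-server", tags)] = fields
    return series


# Hypothetical entries for two validators reported by the same router.
entries = [
    {"source": "router1", "name": "validator-a", "roas": 100},
    {"source": "router1", "name": "validator-b", "roas": 200},
]

# Without the server name as a tag both entries share one key.
without_name_tag = collect(entries, tag_keys=["source"])
# With the name promoted to a tag each validator gets its own series.
with_name_tag = collect(entries, tag_keys=["source", "name"])

print(len(without_name_tag))  # 1: the second entry replaced the first
print(len(with_name_tag))     # 2: one series per validator
```

In this model, embedded_tags plays the role of adding name to tag_keys.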

We can also add the following to shorten the Prometheus output:

[inputs.cisco_telemetry_mdt.aliases]
  server = "Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-server-list"
  summary = "Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/rpki-summary"

Now you should have Telegraf up and running, and you should be able to create dashboards in Grafana to look at the pretty graphs. If you want to add more telemetry, just find the correct sensor-path and add it to the router configuration. Telegraf will take the data and output it with no extra configuration necessary.