A Journey Into BGP-ORR
Ever since I first heard of BGP-ORR (or BGP Optimal Route Reflection, RFC 9107) some years ago I’ve been nerdingly excited about it. Recently at my job I looked at the possibility of moving from in-path BGP route reflectors (which are in the forwarding path) to out-of-path route reflectors (which are not in the forwarding path, allowing for a BGP-free core) and figured BGP-ORR would be a good way of accomplishing a more centralized route reflector setup.
While there are a lot of good resources available about what BGP-ORR does and how it works I still had some questions. So, I brought up a small lab environment using Cisco CML and some Cisco XRv devices. I know Arista EOS also has support for BGP-ORR and I assume JunOS also has it, and as I write this there’s an open issue for FRRouting.
This is the layout of the lab network, not conceptually far away from how parts of the network I run at my job are built:
- p1 and p2 are label routers, no BGP.
- cr1 and cr2 are core routers. They run BGP, advertise a default route and have a full BGP table (well, not in this lab setup… :)).
- pe1, pe2 and pe3 are edge routers. They connect customers and have a subset of the internal BGP table + a default route.
- rr is the route reflector. It has BGP sessions towards every other device.
All routers are running IS-IS as the IGP, with Segment Routing (not SRv6) configured.
I’m not going to show the full configuration from each and every router (you can find that here), but instead just show a small and relevant subset to get the network running. This is all before BGP-ORR has been configured.
Interfaces
Nothing fancy, just a Loopback interface and unnumbered links between all routers so I don’t have to care about linknets. CDP for making sure I don’t mess up what ports go where :)
interface Loopback0
ipv4 address 10.13.37.x 255.255.255.255
!
interface GigabitEthernet0/0/0/0
description Link-xxxx
cdp
ipv4 point-to-point
ipv4 unnumbered Loopback0
no shutdown
!
IS-IS and Segment Routing
net and prefix-sid are based on the loopback address. router-id is not strictly necessary for getting IS-IS to run but is needed later on. I put a default metric cost of 10 on all links just to make it easier for myself.
router isis LAB
is-type level-2-only
net 49.000.0100.1303.700x.00
address-family ipv4 unicast
metric-style wide
metric 10
router-id Loopback0
segment-routing mpls
!
interface Loopback0
passive
address-family ipv4 unicast
prefix-sid index x
!
!
interface GigabitEthernet0/0/0/0
point-to-point
address-family ipv4 unicast
!
!
interface GigabitEthernet0/0/0/1
point-to-point
address-family ipv4 unicast
!
!
!
segment-routing
global-block 16000 23999
!
The segment routing global-block is the standard Cisco block.
BGP (core)
A static route is created and advertised for the default route.
router static
address-family ipv4 unicast
0.0.0.0/0 Null0
!
!
router bgp 65534
bgp router-id 10.13.37.x
default-information originate
address-family ipv4 unicast
redistribute static
!
session-group iBGP
remote-as 65534
timers 10 32
update-source Loopback0
!
neighbor 10.13.37.255
use session-group iBGP
description Route Reflector
address-family ipv4 unicast
!
!
!
BGP (PE)
router bgp 65534
bgp router-id 10.13.37.x
address-family ipv4 unicast
!
session-group iBGP
remote-as 65534
timers 10 32
update-source Loopback0
!
neighbor 10.13.37.255
use session-group iBGP
description Route Reflector
address-family ipv4 unicast
!
!
!
BGP (RR)
router bgp 65534
bgp router-id 10.13.37.255
address-family ipv4 unicast
!
session-group iBGP
remote-as 65534
timers 10 32
update-source Loopback0
!
neighbor-group RR-CLIENTS
use session-group iBGP
address-family ipv4 unicast
route-reflector-client
!
!
neighbor 10.13.37.1
use neighbor-group RR-CLIENTS
description cr1
!
neighbor 10.13.37.2
use neighbor-group RR-CLIENTS
description cr2
!
neighbor 10.13.37.5
use neighbor-group RR-CLIENTS
description pe1
!
neighbor 10.13.37.6
use neighbor-group RR-CLIENTS
description pe2
!
neighbor 10.13.37.7
use neighbor-group RR-CLIENTS
description pe3
!
!
The full, finished configuration and CML lab setup file are on GitHub.
Some background and theory
What is BGP-ORR, short version
BGP-ORR is a BGP feature which allows a BGP Route Reflector to send a Route Reflector Client the best path based on the perspective of the client, and not the Route Reflector itself. This is accomplished by having the Route Reflector know about the IGP topology, and based on this data do SPF (Shortest Path First) calculations from IGP locations other than the Route Reflector.
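To make the idea of “SPF from another IGP location” a bit more concrete, here is a minimal Python sketch. It is not how any router implements it, and the topology in it is made up for illustration (it is not the lab topology). It simply runs Dijkstra twice on the same graph, once rooted at the route reflector and once rooted at a client, and shows that the two roots disagree about which core router is closest.

```python
import heapq


def spf(graph, root):
    """Dijkstra: lowest IGP cost from root to every reachable node."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


# Hypothetical links (node, node, metric); NOT the lab topology.
links = [
    ("rr", "cr2", 10),
    ("cr1", "cr2", 10),
    ("pe1", "cr1", 10),
    ("pe1", "cr2", 100),
]
graph = {}
for a, b, metric in links:
    graph.setdefault(a, {})[b] = metric
    graph.setdefault(b, {})[a] = metric

from_rr = spf(graph, "rr")    # what a plain route reflector bases its choice on
from_pe1 = spf(graph, "pe1")  # what ORR uses when pe1 is the root

print("rr :", from_rr["cr1"], from_rr["cr2"])
print("pe1:", from_pe1["cr1"], from_pe1["cr2"])
```

From rr the closest core router is cr2, from pe1 it is cr1. That difference is the whole point of ORR.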
What problem does it solve?
(Feel free to skip this if you’re already familiar with Route Reflectors and the inherent problems that come with them.)
Let’s take a look at our small lab network. It wouldn’t be a problem to configure each PE router to have iBGP sessions with the core routers and have the core routers act as route reflectors, reflecting the prefixes learnt from each PE while also advertising default routes (to make sure each PE can reach the rest of the network and the Internet).
This however gets very tedious, very fast. In a larger network with many PE routers there are a lot of sessions to configure and a lot of churn on the core routers.
There’s also the money part to think about. Depending on how the network is designed the core routers might have to be able to consume a full BGP table (IPv4 and IPv6) and as I write this we’re at around 950k IPv4 prefixes and 170k IPv6 prefixes. Those are non-trivial numbers and require some beefy hardware.
From a network design perspective it would also be a lot easier (and cheaper) to replace the beefy core routers with pure LSR devices which just forward packets from one port to another.
That’s where a dedicated Route Reflector comes into play.
The Route Reflector (or just RR) is a dedicated router (either physical or virtual) whose job is to have iBGP sessions with each and every other router in the network and then reflect routes to other routers (according to configured policies). “Reflect routes” in this context means that the RR will send routes learnt from one iBGP neighbor to other iBGP neighbors - something that BGP normally doesn’t do and also requires the route-reflector-client configuration on the neighbor.
That’s it. Besides the extra configuration necessary for the neighbors there’s nothing special about a route reflector. And that’s the problem.
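The reflection rule itself is small enough to sketch in a few lines of Python. This is a simplified illustration of the RFC 4456 behaviour, not vendor code: it ignores policies, ORIGINATOR_ID and CLUSTER_LIST handling. The peer names in the second dictionary are hypothetical.

```python
def reflect_targets(source, peers):
    """Return the iBGP peers a route reflector re-advertises a route to.

    peers maps peer name -> True if the peer is a route-reflector-client.
    A route learnt from a client goes to every other peer; a route learnt
    from a non-client goes to clients only.
    """
    from_client = peers[source]
    return sorted(
        name
        for name, is_client in peers.items()
        if name != source and (from_client or is_client)
    )


# In the lab every neighbor is a client of rr.
lab_peers = {"cr1": True, "cr2": True, "pe1": True, "pe2": True, "pe3": True}
print(reflect_targets("cr1", lab_peers))

# Hypothetical mix of one client and two non-clients.
mixed_peers = {"client-a": True, "plain-b": False, "plain-c": False}
print(reflect_targets("plain-b", mixed_peers))  # only the client gets it
```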
Let’s look at the lab network again.
Both cr1 and cr2 send a default route to the RR. This is what the BGP table in the RR looks like:
RP/0/0/CPU0:rr#show bgp ipv4 unicast
Status codes: s suppressed, d damped, h history, * valid, > best
i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* i0.0.0.0/0 10.13.37.1 0 100 0 ?
*>i 10.13.37.2 0 100 0 ?
Processed 1 prefixes, 2 paths
It picks one of the default routes as the best and this is the route that will be advertised to all neighbors. This is the reason:
RP/0/0/CPU0:rr#show bgp ipv4 unicast 0.0.0.0/0 bestpath-compare
BGP routing table entry for 0.0.0.0/0
Versions:
Process bRIB/RIB SendTblVer
Speaker 6 6
Last Modified: Jan 17 11:52:41.481 for 00:01:26
Paths: (2 available, best #2)
Advertised to update-groups (with more than one peer):
0.2
Path #1: Received by speaker 0
Not advertised to any peer
Local, (Received from a RR-client)
10.13.37.1 (metric 30) from 10.13.37.1 (10.13.37.1)
Origin incomplete, metric 0, localpref 100, valid, internal
Received Path ID 0, Local Path ID 0, version 0
Higher IGP metric than best path (path #2)
Path #2: Received by speaker 0
Advertised to update-groups (with more than one peer):
0.2
Local, (Received from a RR-client)
10.13.37.2 (metric 20) from 10.13.37.2 (10.13.37.2)
Origin incomplete, metric 0, localpref 100, valid, internal, best, group-best
Received Path ID 0, Local Path ID 0, version 6
best of local AS, Overall best
The route from 10.13.37.1 (cr1) has a higher IGP metric than the route from cr2, which is expected.
And also as expected, cr2 is advertised as the best route:
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.6 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.2 10.13.37.2 ?
Processed 1 prefixes, 1 paths
In our small network with uniform IGP metrics on all links this isn’t an issue, but this is rarely how the real world looks. We might want pe1 to push its traffic towards cr1 by default, and pe2/pe3 towards cr2 (due to, for instance, real-world distance). Then we would want the route reflector to send the default route which has the lowest IGP metric from the PE router’s perspective.
As our setup is right now this isn’t possible since the route reflector will send the route which is best from the perspective of the route reflector. This is the problem BGP-ORR solves, since the routes sent are based on the IGP metric seen from each PE.
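As a rough sketch of why every client ends up with the same route, here is the tie-break in Python. It is heavily simplified: it assumes all earlier best-path steps (local preference, AS path, origin, MED) are equal, which matches the bestpath-compare output above, and it only models the IGP metric comparison. The metrics 30 and 20 are the ones the RR reported for cr1 and cr2.

```python
def best_path(next_hops, igp_cost):
    """Pick the next hop with the lowest IGP cost (all other attributes tie)."""
    return min(next_hops, key=lambda next_hop: igp_cost[next_hop])


next_hops = ["10.13.37.1", "10.13.37.2"]             # cr1, cr2
rr_cost = {"10.13.37.1": 30, "10.13.37.2": 20}       # metrics as seen from rr

# Without ORR there is one calculation, and its result goes to everyone.
rr_best = best_path(next_hops, rr_cost)
advertised = {client: rr_best for client in ("pe1", "pe2", "pe3")}
print(advertised)
```

No matter how the clients are connected, the only cost table consulted is the one belonging to the route reflector.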
What do I want to find out?
- How does BGP-ORR behave in its most basic configuration?
- What happens if I specify cr1 and cr2 as root nodes in my configuration?
Let’s get technical
First let’s make a small topology change. The IGP metric is increased to 100 between cr1 and pe2, and cr2 and pe1.
The basics
Before anything we must allow topology data from our IGP (IS-IS) to be re-distributed into BGP. This is done using this command on the route reflector:
router isis LAB
distribute link-state
!
OSPF has the same command but there’s a caveat you should be aware of. If running IS-IS you must also specify the router-id in the configuration, otherwise the router won’t calculate the topology (ask me how I know…).
The next step is to configure an ORR group. It’s possible to have 32 groups (at least on IOS-XR and Arista EOS) and each group can have up to three IP addresses (also called root nodes) specified which the SPF calculations will be based on. I’m not sure when you would want to specify more than one, perhaps in a scenario where the primary router goes away and the route reflector needs to re-calculate based on a secondary router?
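If the extra root addresses do work as ordered fallbacks, the selection could look something like the Python sketch below. To be clear, this is an assumption: the show output further down labels the roots primary, secondary and tertiary and prints an “Actual Root”, which hints at this behaviour, but I haven’t verified it in the lab. The function and variable names are made up.

```python
def actual_root(configured_roots, igp_nodes):
    """Return the first configured root that exists in the IGP topology.

    Assumption: roots are tried in the order they were configured
    (primary, secondary, tertiary). Returns None if none is present.
    """
    if not 1 <= len(configured_roots) <= 3:
        raise ValueError("an ORR group takes one to three root addresses")
    for root in configured_roots:
        if root in igp_nodes:
            return root
    return None


igp_nodes = {"10.13.37.1", "10.13.37.2", "10.13.37.5"}
roots = ["10.13.37.1", "10.13.37.2"]

primary_up = actual_root(roots, igp_nodes)
primary_down = actual_root(roots, igp_nodes - {"10.13.37.1"})
print(primary_up, primary_down)
```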
Anyway, we’ll start easy by creating one group per PE router:
router bgp 65534
address-family ipv4 unicast
optimal-route-reflection pe1 10.13.37.5
optimal-route-reflection pe2 10.13.37.6
optimal-route-reflection pe3 10.13.37.7
!
!
Now we can take a look at the ORR database:
RP/0/0/CPU0:rr#show orrspf database pe1
ORR policy: pe1, IPv4, RIB tableid: 0xe0000010
Configured root: primary: 10.13.37.5, secondary: NULL, tertiary: NULL
Actual Root: 10.13.37.5, Root node: 0100.1303.7005.0000
Prefix Cost
10.13.37.1/32 10
10.13.37.2/32 20
10.13.37.3/32 20
10.13.37.4/32 30
10.13.37.5/32 0
10.13.37.6/32 20
10.13.37.7/32 30
10.13.37.255/32 40
Number of mapping entries: 9
RP/0/0/CPU0:rr#show orrspf database pe2
ORR policy: pe2, IPv4, RIB tableid: 0xe0000011
Configured root: primary: 10.13.37.6, secondary: NULL, tertiary: NULL
Actual Root: 10.13.37.6, Root node: 0100.1303.7006.0000
Prefix Cost
10.13.37.1/32 20
10.13.37.2/32 10
10.13.37.3/32 30
10.13.37.4/32 20
10.13.37.5/32 30
10.13.37.6/32 0
10.13.37.7/32 10
10.13.37.255/32 30
Number of mapping entries: 9
Looks reasonable. Let’s change the metric between cr1 and pe1 to 1000 and see how the database changes:
RP/0/0/CPU0:rr#show orrspf database pe1
ORR policy: pe1, IPv4, RIB tableid: 0xe0000010
Configured root: primary: 10.13.37.5, secondary: NULL, tertiary: NULL
Actual Root: 10.13.37.5, Root node: 0100.1303.7005.0000
Prefix Cost
10.13.37.1/32 110
10.13.37.2/32 100
10.13.37.3/32 120
10.13.37.4/32 110
10.13.37.5/32 0
10.13.37.6/32 110
10.13.37.7/32 120
10.13.37.255/32 120
Number of mapping entries: 9
(If you’re playing along at home you should be prepared that changes in the network take some time to propagate fully into the ORR database, 15-20 seconds at least)
Since the IGP metric between pe1 and cr2 is 100, this is fully expected: the direct link to cr2 is now the cheapest way out of pe1, so cr2 sits at cost 100 and everything else is reached through it. Let’s change the cr1-pe1 cost back to 10 again.
Now we have verified that the route reflector is “cost aware”, and thus can do SPF calculations based on each client (PE router). To actually enable this we need to activate it on each neighbor (or neighbor-group):
router bgp 65534
neighbor 10.13.37.5
address-family ipv4 unicast
optimal-route-reflection pe1
!
!
neighbor 10.13.37.6
address-family ipv4 unicast
optimal-route-reflection pe2
!
!
neighbor 10.13.37.7
address-family ipv4 unicast
optimal-route-reflection pe3
!
!
!
end
Let’s look at the advertised routes:
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.5 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.1 10.13.37.1 ?
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.6 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.2 10.13.37.2 ?
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.7 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.2 10.13.37.2 ?
If we go back and look at the output from show orrspf database this is correct. pe1 (10.13.37.5) has a lower metric towards cr1 (10.13.37.1) and pe2/pe3 (10.13.37.6/10.13.37.7) have a lower metric towards cr2 (10.13.37.2). Let’s increase the metric between cr1 and pe1 again and see what changes:
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.5 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.2 10.13.37.2 ?
Now the route reflector is advertising the default route from cr2, since the cost to cr2 is lower.
The problem we had before is now gone since our route reflector is advertising not what it sees as best but rather what’s best for the client!
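Put as a small Python sketch, the only thing that changed is which cost table the comparison uses. The costs below are copied from the show orrspf database output earlier in this post; the function name and the “pe1 (cr1 link at 1000)” label are just mine, and the sketch again assumes every other BGP attribute is equal.

```python
# Cost towards cr1 (10.13.37.1) and cr2 (10.13.37.2) per ORR group,
# copied from the "show orrspf database" output.
orr_cost = {
    "pe1": {"10.13.37.1": 10, "10.13.37.2": 20},
    "pe2": {"10.13.37.1": 20, "10.13.37.2": 10},
    "pe1 (cr1 link at 1000)": {"10.13.37.1": 110, "10.13.37.2": 100},
}
next_hops = ["10.13.37.1", "10.13.37.2"]


def orr_best(next_hops, group):
    """Lowest-cost next hop as seen from the root of the given ORR group."""
    return min(next_hops, key=lambda next_hop: orr_cost[group][next_hop])


selected = {group: orr_best(next_hops, group) for group in orr_cost}
for group, next_hop in selected.items():
    print(f"{group}: {next_hop}")
```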
Let’s keep trying things out.
Do we need a unique ORR group for each router in our network?
No.
It’s perfectly fine to use the pe2 ORR group for the pe3 router, since it’s single-homed towards pe2 and probably should get the same routes.
router bgp 65534
neighbor 10.13.37.7
address-family ipv4 unicast
no optimal-route-reflection pe3
optimal-route-reflection pe2
!
!
!
end
Do the CR devices need to speak BGP?
No.
Let’s shut down the BGP session on cr1 so cr2 will be the only router advertising a default route:
router bgp 65534
neighbor 10.13.37.255
shutdown
!
!
The route reflector will now advertise the default route from cr2 towards all PE devices, but how pe1 reaches that next-hop is still based on IGP metric:
RP/0/0/CPU0:pe1#sh route | i 10.13.37.2
Gateway of last resort is 10.13.37.2 to network 0.0.0.0
B* 0.0.0.0/0 [200/0] via 10.13.37.2, 00:02:59
i L2 10.13.37.2/32 [115/20] via 10.13.37.1, 00:11:00, GigabitEthernet0/0/0/0
Just as expected.
Scaling up
I briefly mentioned a limit of 32 ORR groups. This specific number isn’t specified in the RFC but is a limit in both IOS-XR and EOS. Practically speaking this means that if we have more than 32 routers we need to think about our setup.
Having more virtual route reflectors is of course an option, where each reflector could serve a specific part of the network, but I want to find out what happens if I configure the core routers (cr1 and cr2) as root nodes in my ORR group and then use that group for all of my PE routers connected to those core routers - i.e. have one ORR group per region in my network.
Let’s create a new ORR group:
router bgp 65534
address-family ipv4 unicast
optimal-route-reflection core 10.13.37.1 10.13.37.2
!
neighbor 10.13.37.5
address-family ipv4 unicast
no optimal-route-reflection pe1
optimal-route-reflection core
!
!
neighbor 10.13.37.6
address-family ipv4 unicast
no optimal-route-reflection pe2
optimal-route-reflection core
!
!
neighbor 10.13.37.7
address-family ipv4 unicast
no optimal-route-reflection pe2
optimal-route-reflection core
!
!
!
ORRSPF database output:
RP/0/0/CPU0:rr#show orrspf database detail
ORR policy: core, IPv4, RIB tableid: 0xe0000013
Configured root: primary: 10.13.37.1, secondary: 10.13.37.2, tertiary: NULL
Actual Root: 10.13.37.1, Root node: 0100.1303.7001.0000
Prefix Cost
10.13.37.1/32 0
10.13.37.2/32 10
10.13.37.3/32 10
10.13.37.4/32 20
10.13.37.5/32 10
10.13.37.6/32 10
10.13.37.7/32 20
10.13.37.255/32 30
Number of mapping entries: 9
Looking at the output it’s probably not that hard to figure out what the route reflector will advertise (although it did take me a while to figure out what was happening).
The ORR group has the primary root node 10.13.37.1, which is also advertising a default route. So the route reflector will advertise the route from 10.13.37.1, because the cost is 0. Can’t get lower than that! So no matter the IGP metric the PE routers will always receive a default route from 10.13.37.1 (cr1).
This may or may not be a problem but I wouldn’t want to deploy this in a production network. I just feel that somewhere down the line there’s sub-optimal routing or even a loop just waiting to happen :)
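The same comparison as a short Python sketch, using the costs from the core group output above (names are mine, and all other BGP attributes are assumed equal). Since IGP metrics are never negative, the root’s cost to itself is the floor, so a root that is also a next hop wins every time.

```python
# Costs from the "core" ORR group rooted at cr1, copied from the output above.
core_cost = {
    "10.13.37.1": 0,   # cr1, the root itself
    "10.13.37.2": 10,  # cr2
    "10.13.37.3": 10,
    "10.13.37.4": 20,
    "10.13.37.5": 10,
    "10.13.37.6": 10,
    "10.13.37.7": 20,
    "10.13.37.255": 30,
}
next_hops = ["10.13.37.1", "10.13.37.2"]

# Every neighbor attached to the group gets the result of this one comparison.
winner = min(next_hops, key=lambda next_hop: core_cost[next_hop])

# Nothing in the table can beat the root's own cost of 0.
root_is_floor = all(core_cost["10.13.37.1"] <= cost for cost in core_cost.values())
print(winner, root_is_floor)
```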
Before I continue I would like to point towards the presentation Modern BGP Design and how the author brings up the possibility of using ORR and add-path together.
Really quick about BGP add-path. By default a BGP speaker only advertises what it thinks is the best path - as in a single path. Add-path is an extra capability which enables a BGP speaker to send and/or receive more paths, or additional paths. It’s then up to the router to decide what to do with the extra path(s); discard, install as a backup path or something else.
As it is a capability there’s no guarantee that all devices support it and in that case only the best path will be advertised and we’re back to square one. Ask me how I know, again…
To enable add-path we need to configure it on both the Route Reflector and the PE devices:
PE:
router bgp 65534
bgp router-id 10.13.37.5
address-family ipv4 unicast
additional-paths receive
!
!
RR:
route-policy RR-CLIENTS
set path-selection backup 1 advertise
end-policy
!
router bgp 65534
address-family ipv4 unicast
optimal-route-reflection core 10.13.37.1 10.13.37.2
additional-paths send
additional-paths selection route-policy RR-CLIENTS
!
!
Let’s re-establish the session to make sure the capability has been negotiated:
RP/0/0/CPU0:pe1#clear bgp 10.13.37.255
RP/0/0/CPU0:pe1#show bgp ipv4 unicast neighbors 10.13.37.255 | i Additional
Additional-paths Send: received
Additional-paths Receive: advertised
Additional-paths operation: Receive
If everything is working as intended we should see the route reflector advertising two default routes, and the PE should accept them both and install the best one into the FIB:
RP/0/0/CPU0:rr#show bgp ipv4 unicast neighbors 10.13.37.5 advertised-routes
Network Next Hop From AS Path
0.0.0.0/0 10.13.37.2 10.13.37.2 ?
10.13.37.1 10.13.37.1 ?
RP/0/0/CPU0:pe1#show bgp ipv4 unicast
Status codes: s suppressed, d damped, h history, * valid, > best
i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* i0.0.0.0/0 10.13.37.2 0 100 0 ?
*>i 10.13.37.1 0 100 0 ?
This is looking promising. Very promising indeed. Let’s once again increase the metric between cr1 and pe1 to 1000 and see if anything changes. Fingers crossed.
RP/0/0/CPU0:pe1#show bgp ipv4 unicast
Status codes: s suppressed, d damped, h history, * valid, > best
i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i0.0.0.0/0 10.13.37.2 0 100 0 ?
* i 10.13.37.1 0 100 0 ?
Look at that, the default route has switched over to the other router - great!
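Here is how I picture what just happened, as a rough Python sketch. The route reflector still ranks paths from the core group’s point of view, but with add-path it sends the best path plus one backup, and pe1 then makes the final choice using its own IGP costs. The cost values are copied from the outputs earlier in this post; the function names are made up and the BGP decision process is reduced to the IGP metric step.

```python
def rr_advertise(next_hops, group_cost, backups):
    """Paths the RR sends: its best path plus up to `backups` more, best first."""
    ranked = sorted(next_hops, key=lambda next_hop: group_cost[next_hop])
    return ranked[: 1 + backups]


def client_best(received, own_cost):
    """The client picks among what it received using its own IGP costs."""
    return min(received, key=lambda next_hop: own_cost[next_hop])


next_hops = ["10.13.37.1", "10.13.37.2"]
core_cost = {"10.13.37.1": 0, "10.13.37.2": 10}        # core ORR group, root cr1
pe1_cost = {"10.13.37.1": 10, "10.13.37.2": 20}        # pe1, cr1 link at 10
pe1_cost_1000 = {"10.13.37.1": 110, "10.13.37.2": 100}  # pe1, cr1 link at 1000

with_add_path = rr_advertise(next_hops, core_cost, backups=1)
without_add_path = rr_advertise(next_hops, core_cost, backups=0)

print(client_best(with_add_path, pe1_cost))          # cr1
print(client_best(with_add_path, pe1_cost_1000))     # cr2, pe1 can switch
print(client_best(without_add_path, pe1_cost_1000))  # cr1, nothing else to pick
```

The last line is the situation from before add-path was enabled: pe1 only ever hears about cr1, so its own metrics can’t change anything.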
Summary
That’s it. I’ve tried everything I wanted to try regarding BGP-ORR and I’m quite happy with the results. The limit of 32 groups can be a bit limiting in a large network but if using add-path is an option it seems to be a good way forward.
I’m still excited about BGP-ORR and I do look forward to trying it out more in a real production network. Combined with add-path I think it’s going to work out just great.
If you’ve read this far, thank you! If I’ve made any mistakes, misunderstood something or you just want to say hi, the easiest way is Mastodon.
Further reading
These are some other posts and articles about BGP-ORR that I recommend as further reading:
- BGP Optimal-route-reflection (BGP-ORR) (tgregory.org)
- Border Gateway Protocol (BGP) Optimal Route Reflection (cisco.com)
- BGP Optimal Route Reflection as an alternative to BGP Add Path (noction.com)
- BGP Optimal Route Reflection – BGP ORR (orhanergun.net)
- BGP Optimal Route Reflection 101 (ipspace.net)