draft-ietf-grow-ix-bgp-route-server-operations-03.txt | draft-ietf-grow-ix-bgp-route-server-operations-04.txt | |||
---|---|---|---|---|
GROW Working Group N. Hilliard | GROW Working Group N. Hilliard | |||
Internet-Draft INEX | Internet-Draft INEX | |||
Intended status: Informational E. Jasinska | Intended status: Informational E. Jasinska | |||
Expires: March 12, 2015 Netflix, Inc | Expires: April 23, 2015 Netflix, Inc | |||
R. Raszuk | R. Raszuk | |||
NTT I3 | Mirantis Inc. | |||
N. Bakker | N. Bakker | |||
Akamai Technologies B.V. | Akamai Technologies B.V. | |||
September 8, 2014 | October 20, 2014 | |||
Internet Exchange Route Server Operations | Internet Exchange Route Server Operations | |||
draft-ietf-grow-ix-bgp-route-server-operations-03 | draft-ietf-grow-ix-bgp-route-server-operations-04 | |||
Abstract | Abstract | |||
The popularity of Internet exchange points (IXPs) brings new | The popularity of Internet exchange points (IXPs) brings new | |||
challenges to interconnecting networks. While bilateral eBGP | challenges to interconnecting networks. While bilateral eBGP | |||
sessions between exchange participants were historically the most | sessions between exchange participants were historically the most | |||
common means of exchanging reachability information over an IXP, the | common means of exchanging reachability information over an IXP, the | |||
overhead associated with this interconnection method causes serious | overhead associated with this interconnection method causes serious | |||
operational and administrative scaling problems for IXP participants. | operational and administrative scaling problems for IXP participants. | |||
Multilateral interconnection using Internet route servers can | Multilateral interconnection using Internet route servers can | |||
dramatically reduce the administrative and operational overhead of | dramatically reduce the administrative and operational overhead | |||
IXP participation and these systems used by many IXP participants as | associated with connecting to IXPs; in some cases, route servers are | |||
a preferred means of exchanging routing information. | used by IXP participants as their preferred means of exchanging | |||
routing information. | ||||
This document describes operational considerations for multilateral | This document describes operational considerations for multilateral | |||
interconnections at IXPs. | interconnections at IXPs. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on March 12, 2015. | This Internet-Draft will expire on April 23, 2015. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2014 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 29 | skipping to change at page 2, line 29 | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | 1.1. Notational Conventions . . . . . . . . . . . . . . . . . 3 | |||
2. Bilateral BGP Sessions . . . . . . . . . . . . . . . . . . . 3 | 2. Bilateral BGP Sessions . . . . . . . . . . . . . . . . . . . 3 | |||
3. Multilateral Interconnection . . . . . . . . . . . . . . . . 4 | 3. Multilateral Interconnection . . . . . . . . . . . . . . . . 4 | |||
4. Operational Considerations for Route Server Installations . . 5 | 4. Operational Considerations for Route Server Installations . . 5 | |||
4.1. Path Hiding . . . . . . . . . . . . . . . . . . . . . . . 5 | 4.1. Path Hiding . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
4.2. Route Server Scaling . . . . . . . . . . . . . . . . . . 6 | 4.2. Route Server Scaling . . . . . . . . . . . . . . . . . . 6 | |||
4.2.1. Tackling Scaling Issues . . . . . . . . . . . . . . . 6 | 4.2.1. Tackling Scaling Issues . . . . . . . . . . . . . . . 7 | |||
4.2.1.1. View Merging and Decomposition . . . . . . . . . 7 | 4.2.1.1. View Merging and Decomposition . . . . . . . . . 7 | |||
4.2.1.2. Destination Splitting . . . . . . . . . . . . . . 7 | 4.2.1.2. Destination Splitting . . . . . . . . . . . . . . 8 | |||
4.2.1.3. NEXT_HOP Resolution . . . . . . . . . . . . . . . 8 | 4.2.1.3. NEXT_HOP Resolution . . . . . . . . . . . . . . . 8 | |||
4.3. Prefix Leakage Mitigation . . . . . . . . . . . . . . . . 8 | 4.3. Prefix Leakage Mitigation . . . . . . . . . . . . . . . . 8 | |||
4.4. Route Server Redundancy . . . . . . . . . . . . . . . . . 8 | 4.4. Route Server Redundancy . . . . . . . . . . . . . . . . . 9 | |||
4.5. AS_PATH Consistency Check . . . . . . . . . . . . . . . . 9 | 4.5. AS_PATH Consistency Check . . . . . . . . . . . . . . . . 9 | |||
4.6. Export Routing Policies . . . . . . . . . . . . . . . . . 9 | 4.6. Export Routing Policies . . . . . . . . . . . . . . . . . 9 | |||
4.6.1. BGP Communities . . . . . . . . . . . . . . . . . . . 9 | 4.6.1. BGP Communities . . . . . . . . . . . . . . . . . . . 10 | |||
4.6.2. Internet Routing Registry . . . . . . . . . . . . . . 9 | 4.6.2. Internet Routing Registries . . . . . . . . . . . . . 10 | |||
4.6.3. Client-accessible Databases . . . . . . . . . . . . . 10 | 4.6.3. Client-accessible Databases . . . . . . . . . . . . . 10 | |||
4.7. Layer 2 Reachability Problems . . . . . . . . . . . . . . 10 | 4.7. Layer 2 Reachability Problems . . . . . . . . . . . . . . 10 | |||
4.8. BGP NEXT_HOP Hijacking . . . . . . . . . . . . . . . . . 10 | 4.8. BGP NEXT_HOP Hijacking . . . . . . . . . . . . . . . . . 11 | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | |||
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12 | 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . 12 | 8.1. Normative References . . . . . . . . . . . . . . . . . . 13 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . 12 | 8.2. Informative References . . . . . . . . . . . . . . . . . 14 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
1. Introduction | 1. Introduction | |||
Internet exchange points (IXPs) provide IP data interconnection | Internet exchange points (IXPs) provide IP data interconnection | |||
facilities for their participants, typically using shared Layer-2 | facilities for their participants, using data link layer protocols | |||
networking media such as Ethernet. The Border Gateway Protocol (BGP) | such as Ethernet. The Border Gateway Protocol (BGP) [RFC4271] is | |||
[RFC4271] is normally used to facilitate exchange of network | normally used to facilitate exchange of network reachability | |||
reachability information over these media. | information over these media. | |||
As bilateral interconnection between IXP participants requires | As bilateral interconnection between IXP participants requires | |||
operational and administrative overhead, BGP route servers | operational and administrative overhead, BGP route servers | |||
[I-D.ietf-idr-ix-bgp-route-server] are often deployed by IXP | [I-D.ietf-idr-ix-bgp-route-server] are often deployed by IXP | |||
operators to provide a simple and convenient means of interconnecting | operators to provide a simple and convenient means of interconnecting | |||
IXP participants with each other. A route server redistributes | IXP participants with each other. A route server redistributes BGP | |||
prefixes received from its BGP clients to other clients according to | routes received from its BGP clients to other clients according to a | |||
a pre-specified policy, and it can be viewed as similar to an eBGP | pre-specified policy, and it can be viewed as similar to an eBGP | |||
equivalent of an iBGP [RFC4456] route reflector. | equivalent of an iBGP [RFC4456] route reflector. | |||
Route servers at IXPs require careful management and it is important | Route servers at IXPs require careful management and it is important | |||
for route server operators to thoroughly understand both how they | for route server operators to thoroughly understand both how they | |||
work and what their limitations are. In this document, we discuss | work and what their limitations are. In this document, we discuss | |||
several issues of operational relevance to route server operators and | several issues of operational relevance to route server operators and | |||
provide recommendations to help route server operators provision a | provide recommendations to help route server operators provision a | |||
reliable interconnection service. | reliable interconnection service. | |||
1.1. Notational Conventions | 1.1. Notational Conventions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
[RFC2119]. | [RFC2119]. | |||
The phrase "BGP route" in this document should be interpreted as the | ||||
term "Route" described in [RFC4271]. | ||||
2. Bilateral BGP Sessions | 2. Bilateral BGP Sessions | |||
Bilateral interconnection is a method of interconnecting routers | Bilateral interconnection is a method of interconnecting routers | |||
using individual BGP sessions between each participant router on an | using individual BGP sessions between each pair of participant | |||
IXP, in order to exchange reachability information. If an IXP | routers on an IXP, in order to exchange reachability information. If | |||
participant wishes to implement an open interconnection policy - i.e. | an IXP participant wishes to implement an open interconnection policy | |||
a policy of interconnecting with as many other IXP participants as | - i.e. a policy of interconnecting with as many other IXP | |||
possible - it is necessary for the participant to liaise with each of | participants as possible - it is necessary for the participant to | |||
their intended interconnection partners. Interconnection can then be | liaise with each of their intended interconnection partners. | |||
implemented bilaterally by configuring a BGP session on both | Interconnection can then be implemented bilaterally by configuring a | |||
participants' routers to exchange network reachability information. | BGP session on both participants' routers to exchange network | |||
If each exchange participant interconnects with each other | reachability information. If each exchange participant interconnects | |||
participant, a full mesh of BGP sessions is needed, as shown in | with each other participant, a full mesh of BGP sessions is needed, | |||
Figure 1. | as shown in Figure 1. | |||
___ ___ | ___ ___ | |||
/ \ / \ | / \ / \ | |||
..| AS1 |..| AS2 |.. | ..| AS1 |..| AS2 |.. | |||
: \___/____\___/ : | : \___/____\___/ : | |||
: | \ / | : | : | \ / | : | |||
: | \ / | : | : | \ / | : | |||
: IXP | \/ | : | : IXP | \/ | : | |||
: | /\ | : | : | /\ | : | |||
: | / \ | : | : | / \ | : | |||
: _|_/____\_|_ : | : _|_/____\_|_ : | |||
: / \ / \ : | : / \ / \ : | |||
..| AS3 |..| AS4 |.. | ..| AS3 |..| AS4 |.. | |||
\___/ \___/ | \___/ \___/ | |||
Figure 1: Full-Mesh Interconnection at an IXP | Figure 1: Full-Mesh Interconnection at an IXP | |||
Figure 1 depicts an IXP platform with four connected routers, | Figure 1 depicts an IXP platform with four connected routers, | |||
administered by four separate exchange participants, each of them | administered by four separate exchange participants, each of them | |||
with a locally unique autonomous system number: AS1, AS2, AS3 and | with a locally unique autonomous system number: AS1, AS2, AS3 and | |||
AS4. Each of these four participants wishes to exchange traffic with | AS4. The lines between the routers depict BGP sessions; the dotted | |||
all other participants; this is accomplished by configuring a full | edge represents the IXP border. Each of these four participants | |||
mesh of BGP sessions on each router connected to the exchange, | wishes to exchange traffic with all other participants; this is | |||
resulting in 6 BGP sessions across the IXP fabric. | accomplished by configuring a full mesh of BGP sessions on each | |||
router connected to the exchange, resulting in 6 BGP sessions across | ||||
the IXP fabric. | ||||
The number of BGP sessions at an exchange has an upper bound of | The number of BGP sessions at an exchange has an upper bound of | |||
n*(n-1)/2, where n is the number of routers at the exchange. As many | n*(n-1)/2, where n is the number of routers at the exchange. As many | |||
exchanges have large numbers of participating networks, the amount of | exchanges have large numbers of participating networks, the amount of | |||
administrative and operational overhead required to implement an open | administrative and operational overhead required to implement an open | |||
interconnection scales quadratically. New participants to an IXP | interconnection scales quadratically. New participants to an IXP | |||
require significant initial resourcing in order to gain value from | require significant initial resourcing in order to gain value from | |||
their IXP connection, while existing exchange participants need to | their IXP connection, while existing exchange participants need to | |||
commit ongoing resources in order to benefit from interconnecting | commit ongoing resources in order to benefit from interconnecting | |||
with these new participants. | with these new participants. | |||
3. Multilateral Interconnection | 3. Multilateral Interconnection | |||
Multilateral interconnection is implemented using a route server | Multilateral interconnection is implemented using a route server | |||
configured to use BGP to distribute network layer reachability | configured to distribute BGP routes among client routers. The route | |||
information (NLRI) among all client routers. The route server | server preserves the BGP NEXT_HOP attribute from all received BGP | |||
preserves the BGP NEXT_HOP attribute from all received NLRI UPDATE | routes and passes them with unchanged NEXT_HOP to its route server | |||
messages, and passes these messages with unchanged NEXT_HOP to its | clients according to its configured routing policy, as described in | |||
route server clients, according to its configured routing policy, as | [I-D.ietf-idr-ix-bgp-route-server]. Using this method of exchanging | |||
described in [I-D.ietf-idr-ix-bgp-route-server]. Using this method | BGP routes, an IXP participant router can receive an aggregated list | |||
of exchanging NLRI messages, an IXP participant router can receive an | of BGP routes from all other route server clients using a single BGP | |||
aggregated list of prefixes from all other route server clients using | session to the route server instead of depending on BGP sessions with | |||
a single BGP session to the route server instead of depending on BGP | each other router at the exchange. This reduces the overall number | |||
sessions with each other router at the exchange. This reduces the | of BGP sessions at an Internet exchange from n*(n-1)/2 to n, where n | |||
overall number of BGP sessions at an Internet exchange from n*(n-1)/2 | is the number of routers at the exchange. | |||
to n, where n is the number of routers at the exchange. | ||||
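
The session counts quoted above can be checked with a short calculation. This is only an illustrative sketch: the function names are hypothetical, and it assumes a single route server to which every router maintains exactly one BGP session.

```python
def bilateral_sessions(n: int) -> int:
    """Upper bound on BGP sessions when n routers form a full mesh."""
    return n * (n - 1) // 2


def route_server_sessions(n: int) -> int:
    """Sessions needed when each of n routers peers only with one route server."""
    return n


if __name__ == "__main__":
    # Compare both models for a few exchange sizes (sizes chosen arbitrarily).
    for n in (4, 50, 500):
        print(n, bilateral_sessions(n), route_server_sessions(n))
```

For the four-router exchange in Figure 1, the full mesh yields 6 sessions, against 4 sessions with a route server.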
Although a route server uses BGP to exchange reachability information | Although a route server uses BGP to exchange reachability information | |||
with each of its clients, it does not forward traffic itself and is | with each of its clients, it does not forward traffic itself and is | |||
therefore not a router. | therefore not a router. | |||
In practical terms, this allows dense interconnection between IXP | In practical terms, this allows dense interconnection between IXP | |||
participants with low administrative overhead and significantly | participants with low administrative overhead and significantly | |||
simpler and smaller router configurations. In particular, new IXP | simpler and smaller router configurations. In particular, new IXP | |||
participants benefit from immediate and extensive interconnection, | participants benefit from immediate and extensive interconnection, | |||
while existing route server participants receive reachability | while existing route server participants receive reachability | |||
skipping to change at page 6, line 27 | skipping to change at page 6, line 27 | |||
route server than where a single Loc-RIB is deployed for all clients. | route server than where a single Loc-RIB is deployed for all clients. | |||
As the [RFC4271] BGP decision process must be applied to all Loc-RIBs | As the [RFC4271] BGP decision process must be applied to all Loc-RIBs | |||
deployed on the route server, both CPU and memory requirements on the | deployed on the route server, both CPU and memory requirements on the | |||
host computer scale approximately according to O(P * N), where P is | host computer scale approximately according to O(P * N), where P is | |||
the total number of unique paths received by the route server and N | the total number of unique paths received by the route server and N | |||
is the number of route server clients which require a unique Loc-RIB. | is the number of route server clients which require a unique Loc-RIB. | |||
As this is a super-linear scaling relationship, large route servers | As this is a super-linear scaling relationship, large route servers | |||
may derive benefit from deploying per-client Loc-RIBs only where they | may derive benefit from deploying per-client Loc-RIBs only where they | |||
are required. | are required. | |||
Regardless of any Loc-RIB optimization technique is implemented, the | Regardless of whether any Loc-RIB optimization technique is | |||
route server's control plane bandwidth requirements will scale | implemented, the route server's theoretical upper-bound network | |||
according to O(P * N), where P is the total number of unique paths | bandwidth requirements will scale according to O(P_tot * N), where | |||
received by the route server and N is the total number of route | P_tot is the total number of unique paths received by the route | |||
server clients. In the case where P_avg (the arithmetic mean number | server and N is the total number of route server clients. In the | |||
of unique paths received per route server client) remains roughly | case where P_avg (the arithmetic mean number of unique paths received | |||
constant even as the number of connected clients increases, this | per route server client) remains roughly constant even as the number | |||
relationship can be rewritten as O((P_avg * N) * N) or O(N^2). This | of connected clients increases, the total number of prefixes will | |||
quadratic upper bound on the network traffic requirements indicates | equal the average number of prefixes multiplied by the number of | |||
that the route server model will not scale to arbitrarily large | clients. Symbolically, this can be written as P_tot = P_avg * N. If | |||
sizes. | we assume that in the worst case, each prefix is associated with a | |||
different set of BGP path attributes, so must be transmitted | ||||
individually, the network bandwidth scaling function can be rewritten | ||||
as O((P_avg * N) * N) or O(N^2). This quadratic upper bound on the | ||||
network traffic requirements indicates that the route server model | ||||
may not scale well for larger numbers of clients. | ||||
This scaling analysis presents problems in three key areas: route | In practice, most prefixes will be associated with a limited number | |||
processor CPU overhead associated with BGP decision process | of BGP path attribute sets, allowing more efficient transmission of | |||
calculations, the memory requirements for handling many different BGP | BGP routes from the route server than the theoretical analysis | |||
path entries, and the network traffic bandwidth required to | suggests. In the analysis above, P_tot will increase monotonically | |||
distribute these prefixes from the route server to each route server | according to the number of clients, but will have an upper limit of | |||
client. | the size of the full default-free routing table of the network in | |||
which the IXP is located. Observations from production route servers | ||||
have shown that most route server clients generally avoid using | ||||
custom routing policies and consequently the route server may not | ||||
need to deploy per-client Loc-RIBs. These practical bounds reduce | ||||
the theoretical worst-case scaling scenario to the point where | ||||
route-server deployments are manageable even on larger IXPs. | ||||
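
The worst-case bandwidth relationship described above can be worked through numerically. The sketch below is an assumption-laden illustration, not a capacity-planning tool: the function name and the optional `full_table_size` parameter are hypothetical, and it assumes every path must be transmitted individually to every client.

```python
from typing import Optional


def worst_case_route_transmissions(
    p_avg: int, n_clients: int, full_table_size: Optional[int] = None
) -> int:
    """Worst-case number of route transmissions, following O(P_tot * N).

    p_avg           -- mean number of unique paths received per client
    n_clients       -- number of route server clients (N)
    full_table_size -- optional cap on P_tot (hypothetical parameter),
                       modelling the size of the full default-free table
    """
    p_tot = p_avg * n_clients  # P_tot = P_avg * N
    if full_table_size is not None:
        p_tot = min(p_tot, full_table_size)
    return p_tot * n_clients  # each path is sent to each client


if __name__ == "__main__":
    # Input values are arbitrary; only the growth pattern matters.
    for n in (10, 100, 1000):
        print(n, worst_case_route_transmissions(100, n))
```

Without the cap, doubling the number of clients quadruples the result, which is the quadratic behaviour the analysis describes. Once P_tot reaches the cap, growth becomes linear in N.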
4.2.1. Tackling Scaling Issues | 4.2.1. Tackling Scaling Issues | |||
The network traffic scaling issue presents significant difficulties | The problem of scaling route servers still presents serious practical | |||
with no clear solution - ultimately, each client must receive a | challenges and requires careful attention. Scaling analysis | |||
UPDATE for each unique prefix received by the route server. However, | indicates problems in three key areas: route processor CPU overhead | |||
there are several potential methods for dealing with the CPU and | associated with BGP decision process calculations, the memory | |||
memory resource requirements of route servers. | requirements for handling many different BGP path entries, and the | |||
network traffic bandwidth required to distribute these BGP routes | ||||
from the route server to each route server client. | ||||
4.2.1.1. View Merging and Decomposition | 4.2.1.1. View Merging and Decomposition | |||
View merging and decomposition, outlined in [RS-ARCH], describes a | View merging and decomposition, outlined in [RS-ARCH], describes a | |||
method of optimising memory and CPU requirements where multiple route | method of optimising memory and CPU requirements where multiple route | |||
server clients are subject to exactly the same routing policies. In | server clients are subject to exactly the same routing policies. In | |||
this situation, the multiple Loc-RIB views required by each client | this situation, multiple Loc-RIB views can be merged into a single | |||
are merged into a single view. | view. | |||
There are several variations of this approach. If the route server | There are several variations of this approach. If the route server | |||
operator has prior knowledge of interconnection relationships between | operator has prior knowledge of interconnection relationships between | |||
route server clients, then the operator may configure separate Loc- | route server clients, then the operator may configure separate Loc- | |||
RIBs only for route server clients with unique outbound routing | RIBs only for route server clients with unique routing policies. As | |||
policies. As this approach requires prior knowledge of | this approach requires prior knowledge of interconnection | |||
interconnection relationships, the route server operator must depend | relationships, the route server operator must depend on each client | |||
on each client sharing their interconnection policies, either in an | sharing their interconnection policies, either in an internal | |||
internal provisioning database controlled by the operator, or else in | provisioning database controlled by the operator, or else in an | |||
an external data store such as an Internet Routing Registry Database. | external data store such as an Internet Routing Registry Database. | |||
Conversely, the route server implementation itself may implement | Conversely, the route server implementation itself may implement | |||
internal view decomposition by creating virtual Loc-RIBs based on a | internal view decomposition by creating virtual Loc-RIBs based on a | |||
single in-memory master Loc-RIB, with delta differences for each | single in-memory master Loc-RIB, with delta differences for each | |||
prefix subject to different routing policies. This allows a more | prefix subject to different routing policies. This allows a more | |||
granular and flexible approach to the problem of Loc-RIB scaling, at | fine-grained and flexible approach to the problem of Loc-RIB scaling, | |||
the expense of requiring a more complex in-memory Loc-RIB structure. | at the expense of requiring a more complex in-memory Loc-RIB | |||
structure. | ||||
Whatever method of view merging and decomposition is chosen on a | Whatever method of view merging and decomposition is chosen on a | |||
route server, pathological edge cases can be created whereby they | route server, pathological edge cases can be created whereby they | |||
will scale no better than fully non-optimised per-client Loc-RIBs. | will scale no better than fully non-optimised per-client Loc-RIBs. | |||
However, as most route server clients connect to a route server for | However, as most route server clients connect to a route server for | |||
the purposes of reducing overhead, rather than implementing complex | the purposes of reducing overhead, rather than implementing complex | |||
per-client routing policies, edge cases tend not to arise in | per-client routing policies, edge cases tend not to arise in | |||
practice. | practice. | |||
4.2.1.2. Destination Splitting | 4.2.1.2. Destination Splitting | |||
Destination splitting, also described in [RS-ARCH], describes a | Destination splitting, also described in [RS-ARCH], describes a | |||
method for route server clients to connect to multiple route servers | method for route server clients to connect to multiple route servers | |||
and to send non-overlapping sets of prefixes to each route server. | and to send non-overlapping sets of prefixes to each route server. | |||
As each route server computes the best path for its own set of | As each route server computes the best path for its own set of | |||
prefixes, the quadratic scaling requirement operates on multiple | prefixes, the quadratic scaling requirement operates on multiple | |||
smaller sets of prefixes. This reduces the overall computational and | smaller sets of prefixes. This reduces the overall computational and | |||
memory requirements for managing multiple Loc-RIBs and performing the | memory requirements for managing multiple Loc-RIBs and performing the | |||
best-path calculation on each. In order for this method to perform | best-path calculation on each. | |||
well, destination splitting would require significant co-ordination | ||||
between the route server operator and each route server client. In | In practice, the route server operator would need all route server | |||
practice, this level of close co-ordination between IXP operators and | clients to send a full set of BGP routes to each route server. The | |||
their participants tends not to occur, suggesting that the approach | route server operator could then selectively filter these prefixes | |||
is unlikely to be of any real use on production IXPs. | for each route server by using either BGP Outbound Route Filtering | |||
[RFC5291] or else inbound prefix filters configured on client BGP | ||||
sessions. | ||||
4.2.1.3. NEXT_HOP Resolution | 4.2.1.3. NEXT_HOP Resolution | |||
As route servers are usually deployed at IXPs which use flat layer 2 | As route servers are usually deployed at IXPs where all connected | |||
networks, recursive resolution of the NEXT_HOP attribute is generally | routers are on the same layer 2 broadcast domain, recursive | |||
not required, and can be replaced by a simple check to ensure that | resolution of the NEXT_HOP attribute is generally not required, and | |||
the NEXT_HOP value for each prefix is a network address on the IXP | can be replaced by a simple check to ensure that the NEXT_HOP value | |||
LAN's IP address range. | for each received BGP route is a network address on the IXP LAN's IP | |||
address range. | ||||
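
The simple NEXT_HOP check described above might look like the following sketch. It uses Python's standard `ipaddress` module; the function name is hypothetical, and the addresses in the example are documentation prefixes rather than any real IXP LAN.

```python
import ipaddress


def next_hop_on_ixp_lan(next_hop: str, ixp_lan: str) -> bool:
    """Return True if next_hop is an address inside the IXP LAN prefix.

    An address of a different IP version than the prefix is never a member.
    """
    address = ipaddress.ip_address(next_hop)
    network = ipaddress.ip_network(ixp_lan)
    return address in network


if __name__ == "__main__":
    # Example values only; 192.0.2.0/24 is a documentation range.
    print(next_hop_on_ixp_lan("192.0.2.10", "192.0.2.0/24"))
    print(next_hop_on_ixp_lan("198.51.100.1", "192.0.2.0/24"))
```

A real route server implementation would perform this test inside its BGP UPDATE processing. The sketch only shows the membership test that replaces recursive resolution.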
4.3. Prefix Leakage Mitigation | 4.3. Prefix Leakage Mitigation | |||
Prefix leakage occurs when a BGP client unintentionally distributes | Prefix leakage occurs when a BGP client unintentionally distributes | |||
NLRI UPDATE messages to one or more neighboring BGP routers. Prefix | BGP routes to one or more neighboring BGP routers. Prefix leakage of | |||
leakage of this form to a route server can cause serious connectivity | this form to a route server can cause serious connectivity problems | |||
problems at an IXP if each route server client is configured to | at an IXP if each route server client is configured to accept all BGP | |||
accept all prefix UPDATE messages from the route server. It is | routes from the route server. It is therefore RECOMMENDED when | |||
therefore RECOMMENDED when deploying route servers that, due to the | deploying route servers that, due to the potential for collateral | |||
potential for collateral damage caused by NLRI leakage, route server | damage caused by BGP route leakage, route server operators deploy | |||
operators deploy prefix leakage mitigation measures in order to | prefix leakage mitigation measures in order to prevent unintentional | |||
prevent unintentional prefix announcements or else limit the scale of | prefix announcements or else limit the scale of any such leak. | |||
any such leak. Although not foolproof, per-client inbound prefix | Although not foolproof, per-client inbound prefix limits can restrict | |||
limits can restrict the damage caused by prefix leakage in many | the damage caused by prefix leakage in many cases. Per-client | |||
cases. Per-client inbound prefix filtering on the route server is a | inbound prefix filtering on the route server is a more deterministic | |||
more deterministic and usually more reliable means of preventing | and usually more reliable means of preventing prefix leakage, but | |||
prefix leakage, but requires more administrative resources to | requires more administrative resources to maintain properly. | |||
maintain properly. | ||||
If a route server operator implements per-client inbound prefix | If a route server operator implements per-client inbound prefix | |||
filtering, then it is RECOMMENDED that the operator also builds in | filtering, then it is RECOMMENDED that the operator also builds in | |||
mechanisms to automatically compare the Adj-RIB-In received from each | mechanisms to automatically compare the Adj-RIB-In received from each | |||
client with the inbound prefix lists configured for those clients. | client with the inbound prefix lists configured for those clients. | |||
Naturally, it is the responsibility of the route server client to | Naturally, it is the responsibility of the route server client to | |||
ensure that their stated prefix list is compatible with what they | ensure that their stated prefix list is compatible with what they | |||
announce to an IXP route server. However, many network operators do | announce to an IXP route server. However, many network operators do | |||
not carefully manage their published routing policies and it is not | not carefully manage their published routing policies and it is not | |||
uncommon to see significant variation between the two sets of | uncommon to see significant variation between the two sets of | |||
prefixes. Route server operator visibility into this discrepancy can | prefixes. Route server operator visibility into this discrepancy can | |||
provide significant advantages to both operator and client. | provide significant advantages to both operator and client. | |||
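One possible shape for the automated comparison recommended above is sketched below. It assumes the Adj-RIB-In and the configured prefix list are both available as lists of prefix strings, and it compares exact prefixes only (no "le"/"ge" length ranges); the function name and return format are hypothetical.

```python
import ipaddress


def compare_prefix_sets(adj_rib_in, configured):
    """Report differences between received prefixes and the configured prefix list."""
    received = {ipaddress.ip_network(p) for p in adj_rib_in}
    allowed = {ipaddress.ip_network(p) for p in configured}
    return {
        # Announced by the client but absent from its prefix list (filtered).
        "unexpected": sorted(str(n) for n in received - allowed),
        # Present in the prefix list but not currently announced.
        "unannounced": sorted(str(n) for n in allowed - received),
    }
```

A non-empty "unexpected" list points at announcements which the inbound filter is rejecting, which is the discrepancy that is useful to report to both the operator and the client.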
4.4. Route Server Redundancy | 4.4. Route Server Redundancy | |||
skipping to change at page 9, line 9 | skipping to change at page 9, line 28 | |||
multiple route servers on each shared Layer-2 domain. There is no | multiple route servers on each shared Layer-2 domain. There is no | |||
requirement to use the same BGP implementation or operating system | requirement to use the same BGP implementation or operating system | |||
for each route server on the IXP fabric; however, it is RECOMMENDED | for each route server on the IXP fabric; however, it is RECOMMENDED | |||
that where an operator provisions more than a single server on the | that where an operator provisions more than a single server on the | |||
same shared Layer-2 domain, each route server implementation be | same shared Layer-2 domain, each route server implementation be | |||
configured equivalently and in such a manner that the path | configured equivalently and in such a manner that the path | |||
reachability information from each system is identical. | reachability information from each system is identical. | |||
4.5. AS_PATH Consistency Check | 4.5. AS_PATH Consistency Check | |||
[RFC4271] requires that every BGP speaker which advertises a route to | [RFC4271] requires that every BGP speaker which advertises a BGP | |||
another external BGP speaker prepends its own AS number as the last | route to another external BGP speaker prepends its own AS number as | |||
element of the AS_PATH sequence. Therefore the leftmost AS in an | the last element of the AS_PATH sequence. Therefore the leftmost AS | |||
AS_PATH attribute should be equal to the autonomous system number of | in an AS_PATH attribute should be equal to the autonomous system | |||
the BGP speaker which sent the UPDATE message. | number of the BGP speaker which sent the BGP route. | |||
As [I-D.ietf-idr-ix-bgp-route-server] suggests that route servers | As [I-D.ietf-idr-ix-bgp-route-server] suggests that route servers | |||
should not modify the AS_PATH attribute, a consistency check on the | should not modify the AS_PATH attribute, a consistency check on the | |||
AS_PATH of an UPDATE received by a route server client would normally | AS_PATH of a BGP route received by a route server client would | |||
fail. It is therefore RECOMMENDED that route server clients disable | normally fail. It is therefore RECOMMENDED that route server clients | |||
the AS_PATH consistency check towards the route server. | disable the AS_PATH consistency check towards the route server. | |||
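The effect of the consistency check can be shown with a minimal sketch. The AS_PATH is modelled here as a plain list of AS numbers with the leftmost AS first; the function name, the "enforce_first_as" flag, and the AS numbers in the usage note are assumptions made for illustration, not the configuration syntax of any BGP implementation.

```python
def first_as_matches_peer(as_path, peer_asn, enforce_first_as=True):
    """Return True if the route passes the AS_PATH consistency check.

    With enforcement disabled, as recommended for sessions towards a
    route server, every route passes.
    """
    if not enforce_first_as:
        return True
    # The leftmost AS must equal the AS number of the sending BGP speaker.
    return bool(as_path) and as_path[0] == peer_asn
```

For a route server in AS 64500 relaying a route with AS_PATH [64501, 64510], the check fails with enforcement enabled because the leftmost AS is 64501 rather than 64500, and passes once the client disables it for that session.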
4.6. Export Routing Policies | 4.6. Export Routing Policies | |||
Policy filtering is commonly implemented on route servers to provide | Policy filtering is commonly implemented on route servers to provide | |||
prefix distribution control mechanisms for route server clients. A | prefix distribution control mechanisms for route server clients. A | |||
route server "export" policy is a policy which affects prefixes sent | route server "export" policy is a policy which affects prefixes sent | |||
from the route server to a route server client. Several different | from the route server to a route server client. Several different | |||
strategies are commonly used for implementing route server export | strategies are commonly used for implementing route server export | |||
policies. | policies. | |||
4.6.1. BGP Communities | 4.6.1. BGP Communities | |||
Prefixes sent to the route server are tagged with specific [RFC1997] | Prefixes sent to the route server are tagged with specific standard | |||
or [RFC4360] BGP community attributes, based on pre-defined values | [RFC1997] or extended [RFC4360] BGP community attributes, based on | |||
agreed between the operator and all client. Based on these community | pre-defined values agreed between the operator and all clients. | |||
tags, prefixes may be propagated to all other clients, a subset of | Based on these community tags, BGP routes may be propagated to all | |||
clients, or none. This mechanism allows route server clients to | other clients, a subset of clients, or none. This mechanism allows | |||
instruct the route server to implement per-client export routing | route server clients to instruct the route server to implement per- | |||
policies. | client export routing policies. | |||
As both standard and extended BGP communities values are restricted | As both standard and extended BGP community values are currently | |||
to 6 octets, the route server operator should take care to ensure | restricted to 6 octets or fewer, it is not possible for both the | |||
that the predefined BGP community values mechanism used on their | global and local administrator fields in the BGP community to fit a | |||
route server is compatible with [RFC4893] 4-octet autonomous system | 4-octet autonomous system number. Bearing this in mind, the route | |||
numbers. | server operator SHOULD take care to ensure that the predefined BGP | |||
community values mechanism used on their route server is compatible | ||||
with [RFC4893] 4-octet ASNs. | ||||
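The size restriction can be made concrete for standard [RFC1997] communities, which are 4 octets split into two 2-octet fields. The sketch below is illustrative only; the function name is hypothetical and extended communities are not covered.

```python
def encode_standard_community(global_admin, local_value):
    """Pack a standard community (2-octet AS number : 2-octet value) into 32 bits.

    Raises ValueError if either field does not fit in 2 octets, which is
    the case for any 4-octet AS number above 65535.
    """
    for field in (global_admin, local_value):
        if not 0 <= field <= 0xFFFF:
            raise ValueError(f"{field} does not fit in a 2-octet community field")
    return (global_admin << 16) | local_value
```

A scheme such as "route server AS : client AS" therefore cannot name a client whose AS number exceeds 65535, so the operator needs some other convention for such clients.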
4.6.2. Internet Routing Registry | 4.6.2. Internet Routing Registries | |||
Internet Routing Registry databases (IRRDBs) may be used by route | Internet Routing Registry databases (IRRDBs) may be used by route | |||
server operators to implement construct per-client routing policies. | server operators to construct per-client routing policies. [RFC2622] | |||
[RFC2622] Routing Policy Specification Language (RPSL) provides a | Routing Policy Specification Language (RPSL) provides a | |||
comprehensive grammar for describing interconnection relationships, | comprehensive grammar for describing interconnection relationships, | |||
and several toolsets exist which can be used to translate RPSL policy | and several toolsets exist which can be used to translate RPSL policy | |||
description into route server configurations. | description into route server configurations. | |||
4.6.3. Client-accessible Databases | 4.6.3. Client-accessible Databases | |||
Should the route server operator not wish to use either BGP community | Should the route server operator not wish to use either BGP community | |||
tags or the public IRRDBs for implementing client export policies, | tags or the public IRRDBs for implementing client export policies, | |||
they may implement their own routing policy database system for | they may implement their own routing policy database system for | |||
managing their clients' requirements. A database of this form SHOULD | managing their clients' requirements. A database of this form SHOULD | |||
skipping to change at page 10, line 25 | skipping to change at page 10, line 50 | |||
they wish to exchange all their prefixes with any other route server | they wish to exchange all their prefixes with any other route server | |||
client. Optionally, the implementation may allow a client to specify | client. Optionally, the implementation may allow a client to specify | |||
unique routing policies for individual prefixes over which they have | unique routing policies for individual prefixes over which they have | |||
routing policy control. | routing policy control. | |||
4.7. Layer 2 Reachability Problems | 4.7. Layer 2 Reachability Problems | |||
Layer 2 reachability problems on an IXP can cause serious operational | Layer 2 reachability problems on an IXP can cause serious operational | |||
problems for IXP participants which depend on route servers for | problems for IXP participants which depend on route servers for | |||
interconnection. Ethernet switch forwarding bugs have occasionally | interconnection. Ethernet switch forwarding bugs have occasionally | |||
been observed to cause non-commutative reachability. For example, | been observed to cause non-transitive reachability. For example, | |||
given a route server and two IXP participants, A and B, if the two | given a route server and two IXP participants, A and B, if the two | |||
participants can reach the route server but cannot reach each other, | participants can reach the route server but cannot reach each other, | |||
then traffic between the participants may be dropped until such time | then traffic between the participants may be dropped until such time | |||
as the layer 2 forwarding problem is resolved. This situation does | as the layer 2 forwarding problem is resolved. This situation does | |||
not tend to occur in bilateral interconnection arrangements, as the | not tend to occur in bilateral interconnection arrangements, as the | |||
routing control path between the two hosts is usually (but not | routing control path between the two hosts is usually (but not | |||
always, due to IXP inter-switch connectivity load balancing | always, due to IXP inter-switch connectivity load balancing | |||
algorithms) the same as the data path between them. | algorithms) the same as the data path between them. | |||
Problems of this form can be dealt with using [RFC5881] bidirectional | Problems of this form can be partially mitigated by using [RFC5881] | |||
forwarding detection. However, as this is a bilateral protocol | bidirectional forwarding detection. However, as this is a bilateral | |||
configured between routers, and as there is currently no means for | protocol configured between routers, and as there is currently no | |||
automatic configuration of BFD between route server clients, BFD does | protocol to automatically configure BFD sessions between route server | |||
not currently provide an optimal means of handling the problem. | clients, BFD does not currently provide an optimal means of handling | |||
the problem. Even if automatic BFD session configuration were | ||||
possible, practical problems would remain. If two IXP route server | ||||
clients were configured to run BFD between each other and the | ||||
protocol detected a non-transitive loss of reachability between them, | ||||
each of those routers would internally mark the other's prefixes as | ||||
unreachable via the BGP path announced by the route server. As the | ||||
route server only propagates a single best path to each client, this | ||||
could cause either sub-optimal routing or complete connectivity loss | ||||
if there were no alternative paths learned from other BGP sessions. | ||||
4.8. BGP NEXT_HOP Hijacking | 4.8. BGP NEXT_HOP Hijacking | |||
Section 5.1.3(2) of [RFC4271] allows eBGP speakers to change the | Section 5.1.3(2) of [RFC4271] allows eBGP speakers to change the | |||
NEXT_HOP address of an NLRI update to be a different internet address | NEXT_HOP address of a received BGP route to be a different internet | |||
on the same subnet. This is the mechanism which allows route servers | address on the same subnet. This is the mechanism which allows route | |||
to operate on a shared layer 2 IXP network. However, the mechanism | servers to operate on a shared layer 2 IXP network. However, the | |||
can be abused by route server clients to redirect traffic for their | mechanism can be abused by route server clients to redirect traffic | |||
prefixes to other IXP participant routers. | for their prefixes to other IXP participant routers. | |||
____ | ____ | |||
/ \ | / \ | |||
| AS99 | | | AS99 | | |||
\____/ | \____/ | |||
/ \ | / \ | |||
/ \ | / \ | |||
__/ \__ | __/ \__ | |||
/ \ / \ | / \ / \ | |||
..| AS1 |..| AS2 |.. | ..| AS1 |..| AS2 |.. | |||
skipping to change at page 11, line 26 | skipping to change at page 12, line 26 | |||
: \ / : | : \ / : | |||
: \__/ : | : \__/ : | |||
: IXP / \ : | : IXP / \ : | |||
: | RS | : | : | RS | : | |||
: \____/ : | : \____/ : | |||
: : | : : | |||
.................... | .................... | |||
Figure 3: BGP NEXT_HOP Hijacking using a Route Server | Figure 3: BGP NEXT_HOP Hijacking using a Route Server | |||
For example in Figure 3, if AS1 and AS2 both announce prefixes for | For example in Figure 3, if AS1 and AS2 both announce BGP routes for | |||
AS99 to the route server, AS1 could set the NEXT_HOP address for | AS99 to the route server, AS1 could set the NEXT_HOP address for | |||
AS99's prefixes to be the address of AS2's router, thereby diverting | AS99's routes to be the address of AS2's router, thereby diverting | |||
traffic for AS99 via AS2. This may override the routing policies of | traffic for AS99 via AS2. This may override the routing policies of | |||
AS99 and AS2. | AS99 and AS2. | |||
Worse still, if the route server operator does not use inbound prefix | Worse still, if the route server operator does not use inbound prefix | |||
filtering, AS1 could announce any arbitrary prefix to the route | filtering, AS1 could announce any arbitrary prefix to the route | |||
server with a NEXT_HOP address of any other IXP participant. This | server with a NEXT_HOP address of any other IXP participant. This | |||
could be used as a denial of service mechanism against either the | could be used as a denial of service mechanism against either the | |||
users of the address space being announced by illicitly diverting | users of the address space being announced by illicitly diverting | |||
their traffic, or the other IXP participant by overloading their | their traffic, or the other IXP participant by overloading their | |||
network with traffic which would not normally be sent there. | network with traffic which would not normally be sent there. | |||
This problem is not specific to route servers and it can also be | This problem is not specific to route servers and it can also be | |||
implemented using bilateral peering sessions. However, the potential | implemented using bilateral BGP sessions. However, the potential | |||
damage is amplified by route servers because a single BGP session can | damage is amplified by route servers because a single BGP session can | |||
be used to affect many networks simultaneously. | be used to affect many networks simultaneously. | |||
Route server operators SHOULD check that the BGP NEXT_HOP attribute | Because route server clients cannot easily implement next-hop policy | |||
for NLRIs received from a route server client matches the interface | checks against route server BGP sessions, route server operators | |||
address of the client. If the route server receives an NLRI where | SHOULD check that the BGP NEXT_HOP attribute for BGP routes received | |||
these addresses are different and where the announcing route server | from a route server client matches the interface address of the | |||
client is in a different autonomous system to the route server client | client. If the route server receives a BGP route where these | |||
which uses the next hop address, the NLRI SHOULD be dropped. | addresses are different and where the announcing route server client | |||
is in a different autonomous system to the route server client which | ||||
uses the next hop address, the BGP route SHOULD be dropped. | ||||
Permitting next-hop rewriting for the same autonomous system allows | ||||
an organisation with multiple connections into an IXP configured with | ||||
different IP addresses to direct traffic off the IXP infrastructure | ||||
through any of their connections for traffic engineering or other | ||||
purposes. | ||||
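The NEXT_HOP check described above, including the same-AS exception, might be sketched as follows. The mapping from client interface address to AS number is assumed to come from the route server operator's own provisioning data; the function and parameter names are hypothetical.

```python
import ipaddress


def accept_next_hop(next_hop, sender_addr, client_asn_by_addr):
    """Decide whether a route server should accept a route's NEXT_HOP.

    client_asn_by_addr maps each client interface address on the IXP LAN
    to that client's AS number (assumed operator-maintained data).
    """
    table = {ipaddress.ip_address(a): asn for a, asn in client_asn_by_addr.items()}
    nh = ipaddress.ip_address(next_hop)
    sender = ipaddress.ip_address(sender_addr)
    if nh == sender:
        # Normal case: NEXT_HOP is the announcing client's own interface address.
        return True
    sender_asn = table.get(sender)
    # Permit rewriting only towards another connection of the same AS.
    return sender_asn is not None and table.get(nh) == sender_asn
```

With clients 192.0.2.1 and 192.0.2.3 in AS 64501 and 192.0.2.2 in AS 64502, a route from 192.0.2.1 with NEXT_HOP 192.0.2.3 is accepted, while one with NEXT_HOP 192.0.2.2 is dropped.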
5. Security Considerations | 5. Security Considerations | |||
On route server installations which do not employ path hiding | On route server installations which do not employ path hiding | |||
mitigation techniques, the path hiding problem outlined in section | mitigation techniques, the path hiding problem outlined in | |||
Section 4.1 can be used in certain circumstances to proactively block | Section 4.1 could be used by an IXP participant to prevent the route | |||
third party prefix announcements from other route server clients. | server from sending any BGP routes for a particular prefix to other | |||
route server clients, even if there were a valid path to that | ||||
destination via another route server client. | ||||
If the route server operator does not implement prefix leakage | If the route server operator does not implement prefix leakage | |||
mitigation as described in section Section 4.3, it is trivial for | mitigation as described in Section 4.3, it is trivial for route | |||
route server clients to implement denial of service attacks against | server clients to implement denial of service attacks against | |||
arbitrary Internet networks using a route server. | arbitrary Internet networks by leaking BGP routes to a route server. | |||
Route server installations SHOULD be secured against BGP NEXT_HOP | Route server installations SHOULD be secured against BGP NEXT_HOP | |||
hijacking, as described in section Section 4.8. | hijacking, as described in Section 4.8. | |||
6. IANA Considerations | 6. IANA Considerations | |||
There are no IANA considerations. | There are no IANA considerations. | |||
7. Acknowledgments | 7. Acknowledgments | |||
The authors would like to thank Chris Hall, Ryan Bickhart, Steven | The authors would like to thank Chris Hall, Ryan Bickhart, Steven | |||
Bakker and Eduardo Ascenco Reis for their valuable input. | Bakker and Eduardo Ascenco Reis for their valuable input. | |||
In addition, the authors would like to acknowledge the developers of | ||||
BIRD, OpenBGPD and Quagga, whose open source BGP implementations | ||||
include route server capabilities which are compliant with this | ||||
document. | ||||
8. References | 8. References | |||
8.1. Normative References | 8.1. Normative References | |||
[I-D.ietf-idr-ix-bgp-route-server] | [I-D.ietf-idr-ix-bgp-route-server] | |||
Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, | Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker, | |||
"Internet Exchange Route Server", draft-ietf-idr-ix-bgp- | "Internet Exchange Route Server", draft-ietf-idr-ix-bgp- | |||
route-server-05 (work in progress), June 2014. | route-server-05 (work in progress), June 2014. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
skipping to change at page 13, line 23 | skipping to change at page 14, line 28 | |||
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended | [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended | |||
Communities Attribute", RFC 4360, February 2006. | Communities Attribute", RFC 4360, February 2006. | |||
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route | [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route | |||
Reflection: An Alternative to Full Mesh Internal BGP | Reflection: An Alternative to Full Mesh Internal BGP | |||
(IBGP)", RFC 4456, April 2006. | (IBGP)", RFC 4456, April 2006. | |||
[RFC4893] Vohra, Q. and E. Chen, "BGP Support for Four-octet AS | [RFC4893] Vohra, Q. and E. Chen, "BGP Support for Four-octet AS | |||
Number Space", RFC 4893, May 2007. | Number Space", RFC 4893, May 2007. | |||
[RFC5291] Chen, E. and Y. Rekhter, "Outbound Route Filtering | ||||
Capability for BGP-4", RFC 5291, August 2008. | ||||
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | |||
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June | (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June | |||
2010. | 2010. | |||
[RS-ARCH] Govindan, R., Alaettinoglu, C., Varadhan, K., and D. | [RS-ARCH] Govindan, R., Alaettinoglu, C., Varadhan, K., and D. | |||
Estrin, "A Route Server Architecture for Inter-Domain | Estrin, "A Route Server Architecture for Inter-Domain | |||
Routing", 1995, | Routing", 1995, | |||
<http://www.cs.usc.edu/research/95-603.ps.Z>. | <http://www.cs.usc.edu/assets/003/83191.pdf>. | |||
Authors' Addresses | Authors' Addresses | |||
Nick Hilliard | Nick Hilliard | |||
INEX | INEX | |||
4027 Kingswood Road | 4027 Kingswood Road | |||
Dublin 24 | Dublin 24 | |||
IE | IE | |||
Email: nick@inex.ie | Email: nick@inex.ie | |||
Elisa Jasinska | Elisa Jasinska | |||
Netflix, Inc | Netflix, Inc | |||
100 Winchester Circle | 100 Winchester Circle | |||
Los Gatos, CA 95032 | Los Gatos, CA 95032 | |||
USA | USA | |||
Email: elisa@netflix.com | Email: elisa@netflix.com | |||
Robert Raszuk | Robert Raszuk | |||
NTT I3 | Mirantis Inc. | |||
101 S Ellsworth Avenue Suite 350 | 615 National Ave. #100 | |||
San Mateo, CA 94401 | Mt View, CA 94043 | |||
US | USA | |||
Email: robert@raszuk.net | Email: robert@raszuk.net | |||
Niels Bakker | Niels Bakker | |||
Akamai Technologies B.V. | Akamai Technologies B.V. | |||
Kingsfordweg 151 | Kingsfordweg 151 | |||
Amsterdam 1043 GR | Amsterdam 1043 GR | |||
NL | NL | |||
Email: nbakker@akamai.com | Email: nbakker@akamai.com | |||
End of changes. 50 change blocks. | ||||
174 lines changed or deleted | 215 lines changed or added | |||