draft-ietf-grow-va-04.txt | draft-ietf-grow-va-05.txt | |||
---|---|---|---|---|
Network Working Group P. Francis | Network Working Group P. Francis | |||
Internet-Draft MPI-SWS | Internet-Draft MPI-SWS | |||
Intended status: Informational X. Xu | Intended status: Informational X. Xu | |||
Expires: August 26, 2011 Huawei | Expires: January 2, 2012 Huawei | |||
H. Ballani | H. Ballani | |||
Cornell U. | Cornell U. | |||
D. Jen | D. Jen | |||
UCLA | UCLA | |||
R. Raszuk | R. Raszuk | |||
Cisco | Cisco | |||
L. Zhang | L. Zhang | |||
UCLA | UCLA | |||
February 22, 2011 | July 1, 2011 | |||
FIB Suppression with Virtual Aggregation | FIB Suppression with Virtual Aggregation | |||
draft-ietf-grow-va-04.txt | draft-ietf-grow-va-05.txt | |||
Abstract | Abstract | |||
The continued growth in the Default Free Routing Table (DFRT) | The continued growth in the Default Free Routing Table (DFRT) | |||
stresses the global routing system in a number of ways. One of the | stresses the global routing system in a number of ways. One of the | |||
most costly stresses is FIB size: ISPs often must upgrade router | most costly stresses is FIB size: ISPs often must upgrade router | |||
hardware simply because the FIB has run out of space, and router | hardware simply because the FIB has run out of space, and router | |||
vendors must design routers that have adequate FIB. FIB suppression | vendors must design routers that have adequate FIB. FIB suppression | |||
is an approach to relieving stress on the FIB by NOT loading selected | is an approach to relieving stress on the FIB by not loading selected | |||
RIB entries into the FIB. Virtual Aggregation (VA) allows ISPs to | RIB entries into the FIB. Virtual Aggregation (VA) allows ISPs to | |||
shrink the FIBs of any and all routers, easily by an order of | shrink the FIBs of any and all routers, easily by an order of | |||
magnitude with negligible increase in path length and load. FIB | magnitude with negligible increase in path length and load. FIB | |||
suppression deployed autonomously by an ISP (cooperation between ISPs | suppression can be deployed autonomously by an ISP without requiring | |||
is not required), and can co-exist with legacy routers in the ISP. | cooperation between adjacent ISPs, and can co-exist with legacy | |||
There are no changes from the 03 version. | routers in the ISP. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on August 26, 2011. | This Internet-Draft will expire on January 2, 2012. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2011 IETF Trust and the persons identified as the | Copyright (c) 2011 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 5 | 1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 5 | |||
1.2. Requirements notation . . . . . . . . . . . . . . . . . . 6 | 1.2. Requirements notation . . . . . . . . . . . . . . . . . . 5 | |||
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2. Overview of Virtual Aggregation (VA) . . . . . . . . . . . . . 7 | 2. Overview of Virtual Aggregation (VA) . . . . . . . . . . . . . 6 | |||
2.1. Mix of legacy and VA routers . . . . . . . . . . . . . . . 9 | 2.1. Mix of Legacy and VA Routers . . . . . . . . . . . . . . . 9 | |||
2.2. Summary of Tunnels and Paths . . . . . . . . . . . . . . . 10 | 2.2. Summary of Tunnels and Paths . . . . . . . . . . . . . . . 9 | |||
3. Specification of VA . . . . . . . . . . . . . . . . . . . . . 11 | 3. Specification of VA . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.1. VA Operation . . . . . . . . . . . . . . . . . . . . . . . 12 | 3.1. Legacy Routers . . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.1.1. Legacy Routers . . . . . . . . . . . . . . . . . . . . 12 | 3.2. Advertising and Handling Virtual Prefixes (VP) . . . . . . 12 | |||
3.1.2. Advertising and Handling Virtual Prefixes (VP) . . . . 12 | 3.2.1. Distinguishing VPs from Sub-prefixes . . . . . . . . . 12 | |||
3.1.3. Border VA Routers . . . . . . . . . . . . . . . . . . 16 | 3.2.2. Limitations on Virtual Prefixes . . . . . . . . . . . 12 | |||
3.1.4. Advertising and Handling Sub-Prefixes . . . . . . . . 16 | 3.2.3. Aggregation Point Routers (APR) . . . . . . . . . . . 13 | |||
3.1.5. Suppressing FIB Sub-prefix Routes . . . . . . . . . . 17 | 3.2.4. Non-APR Routers . . . . . . . . . . . . . . . . . . . 14 | |||
3.2. New Configuration . . . . . . . . . . . . . . . . . . . . 18 | 3.2.5. Adding and deleting VPs . . . . . . . . . . . . . . . 14 | |||
4. Usage of Tunnels . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.3. Border VA Routers . . . . . . . . . . . . . . . . . . . . 15 | |||
4.1. MPLS tunnels . . . . . . . . . . . . . . . . . . . . . . . 19 | 3.4. Advertising and Handling Sub-Prefixes . . . . . . . . . . 15 | |||
4.2. Usage of Inner Label . . . . . . . . . . . . . . . . . . . 20 | 3.5. Suppressing FIB Sub-prefix Routes . . . . . . . . . . . . 16 | |||
3.5.1. Selecting Popular Prefixes . . . . . . . . . . . . . . 16 | ||||
3.6. New Configuration . . . . . . . . . . . . . . . . . . . . 17 | ||||
3.7. Interaction with Traffic Engineering . . . . . . . . . . . 18 | ||||
4. Usage of MPLS Tunnels . . . . . . . . . . . . . . . . . . . . 19 | ||||
4.1. Usage of Inner Label . . . . . . . . . . . . . . . . . . . 19 | ||||
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 20 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 20 | |||
6.1. Properly Configured VA . . . . . . . . . . . . . . . . . . 20 | 6.1. Properly Configured VA . . . . . . . . . . . . . . . . . . 20 | |||
6.2. Mis-configured VA . . . . . . . . . . . . . . . . . . . . 21 | 6.2. Mis-configured VA . . . . . . . . . . . . . . . . . . . . 21 | |||
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . . 22 | 8.1. Normative References . . . . . . . . . . . . . . . . . . . 21 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . . 22 | 8.2. Informative References . . . . . . . . . . . . . . . . . . 22 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
1. Introduction | 1. Introduction | |||
ISPs today manage constant DFRT growth in a number of ways. One way, | ISPs today manage constant DFRT growth in a number of ways. One way, | |||
of course, is for ISPs to upgrade their router hardware before DFRT | of course, is for ISPs to upgrade their router hardware before DFRT | |||
growth outstrips the size of the FIB. This is too expensive for many | growth outstrips the size of the FIB. This may be too expensive for | |||
ISPs. They would prefer to extend the lifetime of routers whose FIBs | many ISPs. They would prefer to extend the lifetime of routers whose | |||
can no longer hold the full DFRT. | FIBs can no longer hold the full DFRT. | |||
A common approach taken by lower-tier ISPs is to default route to | A common approach taken by lower-tier ISPs is to default route to | |||
their providers. Routes to customers and peer ISPs are maintained, | their transit providers. Routes to customers and peer ISPs are | |||
but everything else defaults to the provider. This approach has | maintained, but everything else defaults to the provider. This | |||
several disadvantages. First, packets to Internet destinations may | approach has several disadvantages. First, packets to Internet | |||
take longer-than-necessary AS paths. This problem can be mitigated | destinations may take longer-than-necessary Autonomous System (AS) | |||
through careful configuration of partial defaults, but this can | paths. This problem can be mitigated through careful configuration | |||
require substantial configuration overhead. A second problem with | of partial defaults, but this can require substantial configuration | |||
defaulting to providers is that the ISP is no longer able to provide | overhead. A second problem with defaulting to providers is that the | |||
the full DFRT to its customers. Finally, provider defaults prevents | ISP is no longer able to provide the full DFRT to its customers. | |||
the ISP from being able to detect martian packets. As a result, the | Finally, provider defaults prevents the ISP from being able to detect | |||
ISP transmits packets that could otherwise have been dropped over its | martian packets. As a result, the ISP transmits packets that could | |||
expensive provider links. | otherwise have been dropped over its expensive provider links. | |||
An alternative is for the ISP to maintain full routes in its core | An alternative is for the ISP to maintain full routes in its core | |||
routers, but to filter routes from edge routers that do not require a | routers, but to filter routes from edge routers that do not require a | |||
full DFRT. These edge routers can then default route to the core | full DFRT. These edge routers can then default route to the core | |||
routers. This is often possible with edge routers that interface to | routers. This is often possible with edge routers that interface to | |||
customer networks. The problem with this approach is that it cannot | customer networks. The problem with this approach is that it cannot | |||
be used for all edge routers. For instance, it cannot be used for | be used for all edge routers. For instance, it cannot be used for | |||
routers that connect to transits. It of course also does not help in | routers that connect to transits. It of course also does not help in | |||
cases where core routers themselves have inadequate FIB capacity. | cases where core routers themselves have inadequate FIB capacity. | |||
FIB Suppression is an approach to shrinking FIB size that requires no | FIB Suppression is an approach to shrinking FIB size that requires no | |||
changes to BGP, no changes to packet forwarding mechanisms in | changes to BGP, no changes to packet forwarding mechanisms in | |||
routers, and relatively minor changes to control mechanisms in | routers, and relatively minor changes to control mechanisms in | |||
routers and configuration of those mechanisms. The core idea behind | routers and configuration of those mechanisms. The core idea behind | |||
FIB suppression is to run BGP as normal, and in particular to not | FIB suppression is to run BGP as normal, and in particular to not | |||
shrink the RIB, but rather to not load certain RIB entries into the | shrink the RIB, but rather to not load certain RIB entries into the | |||
FIB. This approach minimizes changes to routers, and in particular | FIB. This approach minimizes changes to routers, and in particular | |||
is simpler than more general routing architectures that try to shrink | is simpler than more general routing architectures that try to shrink | |||
both RIB and FIB. With FIB suppression, there are no changes to BGP | both RIB and FIB. With FIB suppression, there are no changes to BGP | |||
per se. The BGP decision process does not change. The selected AS- | per se. The BGP decision process does not change, the selected AS- | |||
path does not change, and except on rare occasion the exit router | PATH does not change, and except on rare occasion the exit router | |||
does not change. ISPs can deploy FIB suppression autonomously and | does not change. ISPs can deploy FIB suppression autonomously and | |||
with no coordination with neighboring ASes. | with no coordination with neighboring ASes. | |||
This document describes an approach to FIB suppression called | This document describes an approach to FIB suppression called | |||
"Virtual Aggregation" (VA). VA operates by organizing the IP (v4 or | "Virtual Aggregation" (VA). VA operates by organizing the IP (v4 or | |||
v6) address space into Virtual Prefixes (VP), and using tunnels to | v6) address space into Virtual Prefixes (VP), and using tunnels to | |||
aggregate the (regular) sub-prefixes within each VP. The decrease in | aggregate the (regular) sub-prefixes within each VP. The decrease in | |||
FIB size can be dramatic, easily 5x or 10x with only a slight path | FIB size can be dramatic, easily 5x or 10x with only a slight path | |||
length and router load increase [nsdi09]. The VPs can be organized | length and router load increase [nsdi09]. | |||
such that all routers in an ISP see FIB size decrease, or in such a | ||||
way that "core" routers keep the full FIB, and "edge" routers have | ||||
almost no FIB (i.e. by defining a VP of 0/0). This "core-edge" style | ||||
of VA deployment is much simpler than a "full" VA deployment, whereby | ||||
multiple VPs are defined, and any router, core or otherwise, can have | ||||
reduced FIB size. This simpler "core-edge" style of deployment is | ||||
specified in a separate draft in order to make it more easily | ||||
understandable [I-D.ietf-grow-simple-va]. | ||||
VA has the following characteristics: | ||||
o it is robust to router failure, | ||||
o it allows for traffic engineering, | ||||
o it allows for existing inter-domain routing policies, | ||||
o it operates in a predictable manner and is therefore possible to | ||||
test, debug, and reason about performance (i.e. establish SLAs), | ||||
o it can be safely installed, tested, and started up, | ||||
o it can be configured and reconfigured without service | ||||
interruption, | ||||
o it can be incrementally deployed, and in particular can be | ||||
operated in an AS with a mix of VA-capable and legacy routers, | ||||
o it accommodates existing security mechanisms such as unicast | ||||
Reverse Path Forwarding (uRPF) ingress filtering and DoS defense, | ||||
o does not introduce significant new security vulnerabilities. | ||||
1.1. Scope of this Document | 1.1. Scope of this Document | |||
The scope of this document is limited to Intra-domain VA operation. | The scope of this document is limited to intra-domain VA operation. | |||
In other words, the case where a single ISP autonomously operates VA | Individual ASs autonomously operate VA internally without any | |||
internally without any coordination with neighboring ISPs. | coordination with neighboring ASs. For the remainder of this | |||
document, the terms ISP, AS, and domain are used interchangeably. | ||||
Note that this document assumes that the VA "domain" (i.e. the unit | ||||
of autonomy) is the AS (that is, different ASes run VA independently | ||||
and without coordination). For the remainder of this document, the | ||||
terms ISP, AS, and domain are used interchangeably. | ||||
This document applies equally to IPv4 and IPv6. | This document applies equally to IPv4 and IPv6. | |||
This document is limited to the following tunnel types: MPLS Label | ||||
Switched Paths (LSP), and of MPLS inner labels tunneled over either | ||||
LSPs or IP headers. | ||||
VA may operate with a mix of upgraded routers and legacy routers. | VA may operate with a mix of upgraded routers and legacy routers. | |||
There are no topological restrictions placed on the mix of routers. | There are no topological restrictions placed on the mix of routers. | |||
In order to avoid loops between upgraded and legacy routers, packets | In order to avoid loops between upgraded and legacy routers, packets | |||
are always tunneled by the VA routers to the BGP NEXT_HOPs of the | are always tunneled by the VA routers to the BGP NEXT_HOPs of the | |||
matched BGP routes. If a given local ASBR is a legacy router, it | matched BGP routes. If a given local ASBR (Autonomous System Border | |||
must be able to terminate tunnels. | Router) is a legacy router, it must be able to terminate tunnels. | |||
1.2. Requirements notation | 1.2. Requirements notation | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" when | |||
document are to be interpreted as described in [RFC2119]. | capitalized in this document are to be interpreted as described in | |||
[RFC2119]. | ||||
1.3. Terminology | 1.3. Terminology | |||
Aggregation Point Router (APR): An Aggregation Point Router (APR) is | Aggregation Point Router (APR): An Aggregation Point Router (APR) is | |||
a router that aggregates a Virtual Prefix (VP) by installing | a router that aggregates a Virtual Prefix (VP) by installing | |||
routes (into the FIB) for all of the sub-prefixes within the VP. | routes (into the FIB) for all of the sub-prefixes within the VP. | |||
APRs advertise the VP to other routers with BGP. For each sub- | APRs advertise the VP to other routers with BGP. For each sub- | |||
prefix within the VP, APRs have a tunnel from themselves to the | prefix within the VP, APRs have a tunnel from themselves to the | |||
remote ASBR (Autonomous System Border Router) where packets for | remote ASBR (Autonomous System Border Router) where packets for | |||
that prefix should be delivered. | that prefix should be delivered. | |||
Install and Suppress: The terms "install" and "suppress" are used to | Install and Suppress: The terms "install" and "suppress" are used to | |||
describe whether a RIB entry has been loaded or not loaded into | describe whether a RIB entry has been loaded or not loaded into | |||
the FIB. In other words, the phrase "install a route" means | the FIB. In particular, "install a route" means "install a route | |||
"install a route into the FIB", and the phrase "suppress a route" | into the FIB", and "suppress a route" means "do not install a | |||
means "do not install a route into the FIB". | route into the FIB". | |||
Legacy Router: A router that does not run VA, and has no knowledge | Legacy Router: A router that does not run VA, and has no knowledge | |||
of VA. Legacy routers, however, must be able to terminate tunnels | of VA. Legacy routers, however, must be able to terminate tunnels | |||
when they are local ASBRs. | when they are local ASBRs. | |||
non-APR Router: In discussing VPs, it is often necessary to | Non-APR Router: In discussing VPs, it is often necessary to | |||
distinguish between routers that are APRs for that VP, and routers | distinguish between routers that are APRs for that VP, and routers | |||
that are not APRs for that VP (but of course may be APRs for other | that are not APRs for that VP (but of course may be APRs for other | |||
VPs not under discussion). In these cases, the term "APR" is | VPs not under discussion). In these cases, the term "APR" is | |||
taken to mean "a VA router that is an APR for the given VP", and | taken to mean "a VA router that is an APR for the given VP", and | |||
the term "non-APR" is taken to mean "a VA router that is not an | the term "non-APR" is taken to mean "a VA router that is not an | |||
APR for the given VP". The term non-APR router is not used to | APR for the given VP". The term non-APR router is not used to | |||
refer to legacy routers. | refer to legacy routers. | |||
Popular Prefix: A Popular Prefix is a sub-prefix that is installed | Popular Prefix: A Popular Prefix is a sub-prefix that is installed | |||
in a router in addition to the sub-prefixes it holds by virtue of | in a router in addition to the sub-prefixes it holds by virtue of | |||
being a Aggregation Point Router. The Popular Prefix allows | being a Aggregation Point Router. The Popular Prefix allows | |||
skipping to change at page 7, line 4 | skipping to change at page 6, line 32 | |||
in this document to refer either to the loc-RIB (as used in | in this document to refer either to the loc-RIB (as used in | |||
[RFC4271]), or to the combined Adj-RIBs-In, the Loc-RIB, and the | [RFC4271]), or to the combined Adj-RIBs-In, the Loc-RIB, and the | |||
Adj-RIBs-Out. | Adj-RIBs-Out. | |||
Sub-Prefix: A regular (physically aggregatable) prefix. These are | Sub-Prefix: A regular (physically aggregatable) prefix. These are | |||
equivalent to the prefixes that would normally comprise the DFRT | equivalent to the prefixes that would normally comprise the DFRT | |||
in the absence of VA. A VA router will contain a sub-prefix entry | in the absence of VA. A VA router will contain a sub-prefix entry | |||
either because the sub-prefix falls within a Virtual Prefix for | either because the sub-prefix falls within a Virtual Prefix for | |||
which the router is an APR, or because the sub-prefix is installed | which the router is an APR, or because the sub-prefix is installed | |||
as a Popular Prefix. Legacy routers hold the same sub-prefixes | as a Popular Prefix. Legacy routers hold the same sub-prefixes | |||
that they hold today. | that they hold today. | |||
Tunnel: This document specifies the use of MPLS Label Switched Paths | ||||
Tunnel: This draft specifies the use of MPLS Label Switched Paths | ||||
(LSP), and of MPLS inner labels tunneled over either LSPs or IP | (LSP), and of MPLS inner labels tunneled over either LSPs or IP | |||
headers. Other types of tunnels may be used, but are not | headers. While in principle other types of tunnels may be used, | |||
specified here. This document generically uses the term tunnel to | they are not specified here. This document uses the term tunnel | |||
refer to any of these tunnel types. | to refer to the above MPLS encapsulations. | |||
VA router: A router that operates Virtual Aggregation according to | VA router: A router that operates Virtual Aggregation according to | |||
this document. | this document. | |||
Virtual Prefix (VP): A Virtual Prefix (VP) is a prefix used to | Virtual Prefix (VP): A Virtual Prefix (VP) is a prefix used to | |||
aggregate its contained regular prefixes (sub-prefixes). The set | aggregate its contained regular prefixes (sub-prefixes). The set | |||
of sub-prefixes in a VP are not physically aggregatable, and so | of sub-prefixes in a VP are not physically aggregatable, and so | |||
they are aggregated at APRs through the use of tunnels. | they are aggregated at APRs through the use of tunnels. | |||
VP-List: A list of defined VPs. All routers must agree on the | VP-List: A list of defined VPs. All routers must agree on the | |||
contents of this list (which is statically configured into every | contents of this list. | |||
VA router). | ||||
2. Overview of Virtual Aggregation (VA) | 2. Overview of Virtual Aggregation (VA) | |||
For descriptive simplicity, this section starts by describing VA | For descriptive simplicity, this section starts by describing VA | |||
assuming that there are no legacy routers in the domain. Section 2.1 | assuming that there are no legacy routers in the domain. Section 2.1 | |||
overviews the additional functions required by VA routers to | overviews the additional functions required by VA routers to | |||
accommodate legacy routers. | accommodate legacy routers. | |||
A key concept behind VA is to operate BGP as normal, and in | A key concept behind VA is to operate BGP as normal, and in | |||
particular to populate the RIB with the full DFRT, but to suppress | particular to populate the RIB with the full DFRT, but to suppress | |||
many or most prefixes from being loaded into the FIB. By populating | many or most prefixes from being loaded into the FIB. By populating | |||
the RIB as normal, we avoid any changes to BGP, and changes to router | the RIB as normal, we avoid any changes to BGP, and changes to router | |||
operation are relatively minor. The basic idea behind VA is quite | operation are relatively minor. The basic idea behind VA is as | |||
simple. The address space is partitioned into large prefixes --- | follows: The address space is partitioned into large prefixes --- | |||
larger than any aggregatable prefix in use today. These prefixes are | larger than any aggregatable prefix in use today. These prefixes are | |||
called Virtual Prefixes (VP). Different VPs do not need to be the | called Virtual Prefixes (VP). Different VPs do not need to be the | |||
same size. They may be a mix of /6, /7, /8 (for IPv4), and so on. | same size. They may be a mix of /6, /7, /8 (for IPv4), and so on. | |||
Indeed, an ISP can define a single /0 VP, and use it for a core/edge | Indeed, an ISP can define a single /0 VP, and use it for a core/edge | |||
type of configuration [I-D.ietf-grow-simple-va]. That is, the core | type of configuration. That is, the core routers would maintain full | |||
routers would maintain full FIBs, and edge routers could maintain | FIBs, and edge routers could maintain default routes to the core | |||
default routes to the core routers, and suppress as much of the FIB | routers, and suppress as much of the FIB as they wish. Each ISP can | |||
as they wish. Each ISP can independently select the size of its VPs. | independently select the size of its VPs. | |||
VPs are not themselves topologically aggregatable. VA makes the VPs | VPs are not themselves topologically aggregatable. VA makes the VPs | |||
aggregatable through the use of tunnels, as follows. Associated with | aggregatable through the use of tunnels, as follows. Associated with | |||
each VP are one or more "Aggregation Point Routers" (APR). An APR | each VP are one or more "Aggregation Point Routers" (APR). An APR | |||
(for a given VP) is a router that installs routes for all sub- | (for a given VP) is a router that installs routes for all sub- | |||
prefixes (i.e. real physically aggregatable prefixes) within the VP. | prefixes (i.e. real physically aggregatable prefixes) within the VP. | |||
Note that an APR is not a special router per se---it is an otherwise | By "install routes" here, we mean: | |||
normal router that is configured to operate as an APR. By "install | ||||
routes" here, we mean: | ||||
1. The route for each of the sub-prefixes is loaded into the FIB, | 1. The route for each of the sub-prefixes is loaded into the FIB, | |||
and | and | |||
2. there is a tunnel from the APR to the BGP NEXT_HOP for the route. | 2. there is a tunnel from the APR to the BGP NEXT_HOP for the route. | |||
The APR originates a BGP route to the VP. This route is distributed | The APR originates a BGP route to the VP. This route is distributed | |||
within the domain, but not outside the domain. With this structure | within the domain, but not outside the domain. With this structure | |||
in place, a packet transiting the ISP goes from the ingress router to | in place, a packet transiting the ISP goes from the ingress router to | |||
the APR (usually via a tunnel), and then from the APR to the BGP | the APR (usually via a tunnel), and then from the APR to the BGP | |||
NEXT_HOP router via a tunnel. VA can operate with MPLS LSPs, or with | NEXT_HOP router via a tunnel. VA can operate with MPLS LSPs, or with | |||
MPLS inner labels over LSPs or IP headers. Section 4 specifies the | MPLS inner labels over LSPs or IP headers. Section 4 specifies the | |||
usage of tunnels. | usage of MPLS tunnels. Other tunnel types (i.e., GRE) may be used, | |||
but are not specified in this document. | ||||
The BGP NEXT_HOP can be either the local ASBR or the remote ASBR. In | The BGP NEXT_HOP can be either the local ASBR or the remote ASBR. In | |||
the former case, an inner label is used to tunnel packets | the former case, an inner label is used to tunnel packets | |||
(Section 4.2). In either case, all tunner headers are stripped by | (Section 4.1). In either case, all tunnel headers are stripped by | |||
the local ASBR before the packet is delivered to the remote ASBR. In | the local ASBR before the packet is delivered to the remote ASBR. In | |||
other words, the remote ASBR sees a normal IP packet, and is | other words, the remote ASBR sees a normal IP packet, and is | |||
completely unaware of the existence of VA in the neighboring ISP. | completely unaware of the existence of VA in the neighboring ISP. | |||
Note that legacy ASBRs MUST set themselves as the BGP NEXT_HOP. | ||||
Note that the AS-path is not effected at all by VA. This means among | Note that the AS-PATH is not effected at all by VA. This means among | |||
other things that AS-level policies are not effected by VA. The | other things that AS-level policies are not effected by VA. The | |||
packet may not, however, follow the shortest path within the ISP | packet may not, however, follow the shortest path within the ISP | |||
(where shortest path is defined here as the path that would have been | (where shortest path is defined here as the path that would have been | |||
taken if VA were not operating), because the APR may not be on the | taken if VA were not operating), because the APR may not be on the | |||
shortest path between the ingress and egress routers. When this | shortest path between the ingress and egress routers. When this | |||
happens, the packet experiences additional latency and creates extra | happens, the packet experiences additional latency and creates extra | |||
load (by virtue of taking more hops than it otherwise would have). | load (by virtue of taking more hops than it otherwise would have). | |||
Note also that, with VA, a packet may occasionally take a different | Note also that, with VA, a packet may occasionally take a different | |||
exit point than it otherwise would have. | exit point than it otherwise would have. This can occur for instance | |||
when the exit point nearest to the selected APR is different than the | ||||
exit point nearest to the router initiating the tunnel to the APR. | ||||
VA can avoid traversing the APR for selected routes by installing | VA can avoid traversing the APR for selected routes by installing | |||
these routes in non-APR routers. In other words, even if an ingress | these routes in non-APR routers. In other words, even if an ingress | |||
router is not an APR for a given sub-prefix, it MAY install that sub- | router is not an APR for a given sub-prefix, it MAY install that sub- | |||
prefix into its FIB. Packets in this case are tunneled directly from | prefix into its FIB. Packets in this case are tunneled directly from | |||
the ingress to the BGP NEXT_HOP. These extra routes are called | the ingress to the BGP NEXT_HOP. These extra routes are called | |||
"Popular Prefixes", and are typically installed for policy reasons | "Popular Prefixes", and are typically installed for policy reasons | |||
(e.g. customer routes are always installed), or for sub-prefixes that | (e.g. customer routes are always installed), or for sub-prefixes that | |||
carry a high volume of traffic (Section 3.1.5.1). Different routers | carry a high volume of traffic (Section 3.5.1). Different routers | |||
MAY have different Popular Prefixes. As such, an ISP MAY assign | may have different Popular Prefixes. As such, an ISP may assign | |||
Popular Prefixes per router, per POP, or uniformly across the ISP. A | Popular Prefixes per router, per POP, or uniformly across the ISP. A | |||
given router MAY have zero Popular Prefixes, or the majority of its | given router may have zero Popular Prefixes, or the majority of its | |||
FIB MAY consist of Popular Prefixes. The effectiveness of Popular | FIB may consist of Popular Prefixes. The effectiveness of Popular | |||
Prefixes to reduce traffic load relies on the fact that traffic | Prefixes to reduce traffic load relies on the fact that traffic | |||
volumes follow something like a power-law distribution: i.e. that 90% | volumes follow something like a power-law distribution: i.e. that 90% | |||
of traffic is destined to 10% of the destinations. Internet traffic | of traffic is destined to 10% of the destinations. Internet traffic | |||
measurement studies over the years have consistently shown that | measurement studies over the years have consistently shown that | |||
traffic patterns follow this distribution, though there is no | traffic patterns follow this distribution [nsdi09], though there is | |||
guarantee that they always will. | no guarantee that they always will. | |||
Note that for routing to work properly, every packet must sooner or | Note that for routing to work properly, every packet must sooner or | |||
later reach a router that has installed a sub-prefix route that | later reach a router that has installed a sub-prefix route that | |||
matches the packet. This would obviously be the case for a given | matches the packet. This would obviously be the case for a given | |||
sub-prefix if every router has installed a route for that sub-prefix | sub-prefix if every router has installed a route for that sub-prefix. | |||
(which of course is the situation in the absence of VA). If this is | If this is not the case, then there MUST be at least one Aggregation | |||
not the case, then there MUST be at least one Aggregation Point | Point Router (APR) for the sub-prefix's Virtual Prefix (VP). | |||
Router (APR) for the sub-prefix's Virtual Prefix (VP). Ideally, | Ideally, every POP contains at least two APRs for every Virtual | |||
every POP contains at least two APRs for every Virtual Prefix. By | Prefix. By having APRs in every POP, the latency imposed by routing | |||
having APRs in every POP, the latency imposed by routing to the APR | to the APR is minimal (the extra hop is within the POP). By having | |||
is minimal (the extra hop is within the POP). By having more than | more than one APR, there is a redundant APR should one fail. In | |||
one APR, there is a redundant APR should one fail. In practice it is | practice it is often not possible to have an APR for every VP in | |||
often not possible to have an APR for every VP in every POP. This is | every POP. This is because some POPs may have only one or a few | |||
because some POPs may have only one or a few routers, and therefore | routers, and therefore there may not have enough cumulative FIB space | |||
there may not have enough cumulative FIB space in the POP to hold | in the POP to hold every sub-prefix. Note that any router ("edge", | |||
every sub-prefix. Note that any router ("edge", "core", etc.) MAY | "core", etc.) MAY be an APR. | |||
be an APR. | ||||
It is important that both the contents of BGP RIBs, as well as the | It is important that both the contents of BGP RIBs, as well as the | |||
contents of the Routing Table (as defined in Section 3.2 of | contents of the Routing Table (as defined in Section 3.2 of | |||
[RFC4271]) not be modified by VA (other than the introduction of | [RFC4271]) not be modified by VA (other than the introduction of | |||
routes to VPs). This is because PIM-SM [RFC4601] relies on the | routes to VPs). This is because PIM-SM [RFC4601] relies on the | |||
contents of the Routing Table to build its own trees and forwarding | contents of the Routing Table to build its own trees and forwarding | |||
table. Therefore, FIB suppression MUST take place between the | table. Therefore, FIB suppression MUST take place between the | |||
Routing Table and the actual FIB(s). | Routing Table and the actual FIB(s). | |||
2.1. Mix of legacy and VA routers | 2.1. Mix of Legacy and VA Routers | |||
It is important that an ISP be able to operate with a mix of "VA | It is important that an ISP be able to operate with a mix of "VA | |||
routers" (routers upgraded to operate VA as described in the | routers" and "legacy routers". This allows ISPs to deploy VA in an | |||
document) and "legacy routers". This allows ISPs to deploy VA in an | ||||
incremental fashion and to continue to use routers that for whatever | incremental fashion and to continue to use routers that for whatever | |||
reason cannot be upgraded. This document allows such a mix, and | reason cannot be upgraded. This document allows such a mix, and | |||
indeed places no topological restrictions on that mix. It does, | indeed places no topological restrictions on that mix. It does, | |||
however, require that legacy routers (and VA routers for that matter) | however, require that legacy routers are able to forward tunneled | |||
are able to forward already-tunneled packets, are able to serve as | packets, are able to serve as tunnel endpoints, and are able to | |||
tunnel endpoints, and are able to participate in distribution of | participate in distribution of tunnel information required to | |||
tunnel information required to establish themselves as tunnel | establish themselves as tunnel endpoints. Depending on the tunnel | |||
endpoints. (This is listed as Requirement R5 in the companion | type, legacy routers MAY also be able to initiate tunneled packets, | |||
tunneling documents.) Depending on the tunnel type, legacy routers | though this is an OPTIONAL requirement. Legacy routers MUST use | |||
MAY also be able to initiate tunneled packets, though this is an | their own address as the BGP NEXT_HOP. | |||
OPTIONAL requirement. (This is listed as Requirement R4 in the | ||||
companion tunneling documents.) Legacy routers MUST use their own | ||||
address as the BGP NEXT_HOP, and MUST FIB-install routes for which | ||||
they are the BGP NEXT_HOP. | ||||
2.2. Summary of Tunnels and Paths | 2.2. Summary of Tunnels and Paths | |||
To summarize, the following tunnels are created: | To summarize, the following tunnels are created: | |||
1. From all VA routers to all BGP NEXT_HOP addresses (where the BGP | 1. From all VA routers to all BGP NEXT_HOP addresses (where the BGP | |||
NEXT_HOP address is either an APR, a local ASBR, or the remote | NEXT_HOP address is either an APR, a local ASBR, or the remote | |||
ASBR neighbor of a VA router). Note that this is listed as | ASBR neighbor of a VA router). | |||
Requirement R3 in the companion tunneling documents. | ||||
2. Optionally, from all legacy routers to all BGP NEXT_HOP | 2. Optionally, from all legacy routers to all BGP NEXT_HOP | |||
addresses. | addresses. | |||
There are a number of possible paths that packets may take through an | There are a number of possible paths that packets may take through an | |||
ISP, summarized in the following diagram. Here, "VA" is a VA router, | ISP, summarized in the following diagram. Here, "VA" is a VA router, | |||
"LR" is a legacy router, the symbol "==>" represents a tunneled | "LR" is a legacy router, the symbol "==>" represents a tunneled | |||
packet (through zero or more routers), "-->" represents an untunneled | packet (through zero or more routers), "-->" represents an untunneled | |||
packet, and "(pop)" represents stripping the tunnel header. The | packet, and "(pop)" represents stripping the tunnel header. The | |||
symbol "::>" represents the portion of the path where although the | symbol "::>" represents the portion of the path where although the | |||
tunnel is targeted to the receiving node, the outer header has been | tunnel is targeted to the receiving node, the outer header has been | |||
stripped. (Note that the remote ASBR may actually be a legacy router | stripped. | |||
or a VA router---it doesn't matter (and isn't known) to the ISP.) | ||||
Egress | Egress | |||
Router | Router | |||
Ingress Some APR (Local Remote | Ingress Some APR (Local Remote | |||
Router Router Router ASBR) ASBR | Router Router Router ASBR) ASBR | |||
------- ------ ------ ------ -------- | ------- ------ ------ ------ -------- | |||
1. VA===================>VA=========>VA(pop)::::>LR | 1. VA===================>VA=========>VA(pop)::::>Peer ASBR | |||
2. VA===================>VA=========>LR--------->LR | 2. VA===================>VA=========>LR--------->Peer ASBR | |||
3. VA===============================>VA(pop)::::>LR | 3. VA===============================>VA(pop)::::>Peer ASBR | |||
4. VA===============================>LR--------->LR | 4. VA===============================>LR--------->Peer ASBR | |||
(The following two exist in the case where legacy routers | (The following two exist in the case where legacy routers | |||
can initiate tunneled packets.) | can initiate tunneled packets.) | |||
5. LR===============================>VA(pop)::::>LR | 5. LR===============================>VA(pop)::::>Peer ASBR | |||
6. LR===============================>LR--------->LR | 6. LR===============================>LR--------->Peer ASBR | |||
(The following two exist in the case where legacy routers | (The following two exist in the case where legacy routers | |||
cannot initiate tunneled packets.) | cannot initiate tunneled packets.) | |||
7. LR------->VA (remaining paths as in 1 to 4 above) | 7. LR------->VA (remaining paths as in 1 to 4 above) | |||
8. LR------->LR--------------------->LR--------->LR | 8. LR------->LR--------------------->LR--------->Peer ASBR | |||
The first and second paths represent the case where the ingress | The first and second paths represent the case where the ingress | |||
router does not have a Popular Prefix for the destination, and MUST | router does not have a Popular Prefix for the destination, and MUST | |||
tunnel the packet to an APR. The third and fourth paths represent | tunnel the packet to an APR. The third and fourth paths represent | |||
the case where the ingress router does have a Popular Prefix for the | the case where the ingress router does have a Popular Prefix for the | |||
destination, and so tunnels the packet directly to the egress. The | destination, and so tunnels the packet directly to the egress. The | |||
fifth and sixth paths are similar to the third and fourth paths | fifth and sixth paths are similar to the third and fourth paths | |||
respectively, but where the ingress is a legacy router that can | respectively, but where the ingress is a legacy router that can | |||
initiate tunneled packets, and effectively has the Popular Prefix by | initiate tunneled packets, and effectively has the Popular Prefix by | |||
virtue of holding the entire DFRT. (Note that some ISPs have only | virtue of holding the entire DFRT. (Note that some ISPs have only | |||
partial RIBs in their customer-facing edge routers, and default route | partial RIBs in their customer-facing edge routers, and default route | |||
to a router that holds the full DFRT. This case is not shown here, | to a router that holds the full DFRT. This case is not shown here, | |||
but works perfectly well.) Finally, paths 7 and 8 represent the case | but works perfectly well.) Finally, paths 7 and 8 represent the case | |||
where legacy routers cannot initiate a tunneled packet. | where legacy routers cannot initiate a tunneled packet. | |||
VA prevents the routing loops that might otherwise occur when VA | VA prevents the routing loops that might otherwise occur when VA | |||
routers and legacy routers are mixed. The trick is avoiding the case | routers and legacy routers are mixed. In particular, VA avoids the | |||
where a legacy router is forwarding packets towards the BGP NEXT_HOP, | case where a legacy router is forwarding packets towards the BGP | |||
while a VA router is forwarding packets towards the APR, with each | NEXT_HOP, while a VA router is forwarding packets towards the APR, | |||
router thinking that the other is on the shortest path to their | with each router thinking that the other is on the shortest path to | |||
respective targets. | their respective targets. | |||
In the first four types of path, the loop is avoided because tunnels | In the first four types of path, the loop is avoided because tunnels | |||
are used all the way to the egress. As a result, there is never an | are used all the way to the egress. As a result, there is never an | |||
opportunity for a legacy router to try to route based on the | opportunity for a legacy router to try to route based on the | |||
destination address unless the legacy router is the egress, in which | destination address unless the legacy router is the egress, in which | |||
case it forwards the packet to the remote ASBR. | case it forwards the packet to the remote ASBR. | |||
In the 5th and 6th cases, the ingress is a legacy router, but this | In the 5th and 6th cases, the ingress is a legacy router, but this | |||
router can initiate tunnels and has the full FIB, and so simply | router can initiate tunnels and has the full FIB, and so simply | |||
tunnels the packet to the egress router. | tunnels the packet to the egress router. | |||
In the 7th and 8th cases, the legacy ingress cannot initiate tunnels, | In the 7th and 8th cases, the legacy ingress cannot initiate tunnels, | |||
and so forwards the packet hop-by-hop towards the BGP NEXT_HOP. The | and so forwards the packet hop-by-hop towards the BGP NEXT_HOP. The | |||
packet will work its way towards the egress router, and will either | packet will work its way towards the egress router, and will either | |||
progress through a series of legacy routers (in which case the IGP | progress through a series of legacy routers (in which case the IGP | |||
prevents loops), or it will eventually reach a VA router, after which | prevents loops), or it will eventually reach a VA router, after which | |||
it will take tunnels as in the 1st and 2nd cases. | it will take tunnels as in the 1st and 2nd cases. | |||
3. Specification of VA | 3. Specification of VA | |||
This section describes in detail how to operate VA. It starts with a | This section describes in detail how to operate VA. | |||
brief discussion of requirements, followed by a specification of | ||||
router support for VA. | ||||
3.1. VA Operation | ||||
In this section, the detailed operation of VA is specified. | ||||
3.1.1. Legacy Routers | 3.1. Legacy Routers | |||
VA can operate with a mix of VA and legacy routers. To prevent the | VA can operate with a mix of VA and legacy routers. To prevent the | |||
types of loops described in Section 2.2, however, legacy routers MUST | types of loops described in Section 2.2, however, legacy routers MUST | |||
satisfy the following requirements: | satisfy the following requirements: | |||
1. When forwarding externally-received routes over iBGP, the BGP | 1. When forwarding externally-received routes over iBGP, the BGP | |||
NEXT_HOP attribute MUST be set to the legacy router itself. | NEXT_HOP attribute MUST be set to the legacy router itself. | |||
2. Legacy routers MUST be able to detunnel packets addressed to | 2. Legacy routers MUST be able to detunnel packets addressed to | |||
themselves at the BGP NEXT_HOP address. They MUST also be able | themselves at the BGP NEXT_HOP address. They MUST also be able | |||
to convey the tunnel information needed by other routers to | to convey the tunnel information needed by other routers to | |||
initiate tunneled packets to them. This is listed as | initiate tunneled packets to them. If a legacy router cannot | |||
"Requirement R1" in the companion tunneling documents. If a | detunnel and convey tunnel parameters, then the AS cannot use VA. | |||
legacy router cannot detunnel and convey tunnel parameters, then | ||||
the AS cannot use VA. | ||||
3. Legacy routers MUST be able to forward all tunneled packets. | 3. Legacy routers MUST be able to forward all tunneled packets. | |||
4. Every legacy router MUST hold its complete FIB. (Note, of | 4. Every legacy router MUST hold its complete FIB. Note, however, | |||
course, that this FIB does not necessarily need to contain the | that this FIB does not necessarily need to contain the full DFRT. | |||
full DFRT. This might be the case, for instance, if the router | This might be the case, for instance, if the router is an edge | |||
is an edge router that defaults to a core router.) | router that defaults to a core router. | |||
As long as legacy routers participating in tunneling as described | As long as legacy routers participating in tunneling as described | |||
above there are no topological restrictions on the legacy routers. | above there are no topological restrictions on the legacy routers. | |||
They may be freely mixed with VA routers without the possibility of | They may be freely mixed with VA routers without the possibility of | |||
forming sustained loops (Section 2.2). | forming sustained loops (Section 2.2). | |||
3.1.2. Advertising and Handling Virtual Prefixes (VP) | 3.2. Advertising and Handling Virtual Prefixes (VP) | |||
3.1.2.1. Distinguishing VPs from Sub-prefixes | 3.2.1. Distinguishing VPs from Sub-prefixes | |||
VA routers MUST be able to distinguish VPs from sub-prefixes. This | VA routers MUST be able to distinguish VPs from sub-prefixes. This | |||
is primarily in order to know which routes to install. In | is primarily in order to know which routes to install. In | |||
particular, non-APR routers MUST know which prefixes are VPs before | particular, non-APR routers SHOULD know which prefixes are VPs before | |||
they receive routes for those VPs, for instance when they first boot | they receive routes for those VPs, for instance when they first boot | |||
up. This is in order to avoid the situation where they unnecessarily | up. This is in order to avoid the situation where they unnecessarily | |||
start filling their FIBs with routes that they ultimately don't need | start filling their FIBs with routes that they ultimately don't need | |||
to install (Section 3.1.5). This leads to the following requirement: | to install (Section 3.5). This leads to the following requirement: | |||
It MUST be possible to statically configure the complete list of VPs | It MUST be possible to configure the complete list of VPs into all VA | |||
into all VA routers. This list is known as the VP-List. | routers. This list is known as the VP-List. | |||
3.1.2.2. Limitations on Virtual Prefixes | 3.2.2. Limitations on Virtual Prefixes | |||
From the point of view of best-match routing semantics, VPs are | From the point of view of best-match routing semantics, VPs are | |||
treated identically to any other prefix. In other words, if the | treated identically to any other prefix. In other words, if the | |||
longest matching prefix is a VP, then the packet is routed towards | longest matching prefix is a VP, then the packet is routed towards | |||
the VP. If a packet matching a VP reaches an Aggregation Point | the VP. If a packet matching a VP reaches an APR for that VP, and | |||
Router (APR) for that VP, and the APR does not have a better matching | the APR does not have a better matching route, then the packet is | |||
route, then the packet is discarded by the APR (just as a router that | discarded by the APR (just as a router that originates any prefix | |||
originates any prefix will discard a packet that does not have a | will discard a packet that does not have a better match). | |||
better match). | ||||
The overall semantics of VPs, however, are slightly different from | The overall semantics of VPs, however, are slightly different from | |||
those of real prefixes. Without VA, when a router originates a route | those of real prefixes. Without VA, when a router originates a route | |||
for a (real) prefix, the expectation is that the addresses within the | for a (real) prefix, the expectation is that the addresses within the | |||
prefix are within the originating AS (or a customer of the AS). For | prefix are within the originating AS (or a customer of the AS). For | |||
VPs, this is not the case. APRs originate VPs whose sub-prefixes | VPs, this is not the case. APRs originate VPs whose sub-prefixes | |||
exist in different ASes. Because of this, it is important that VPs | exist in different ASes. Because of this, VPs MUST not be advertised | |||
not be advertised across AS boundaries. | across AS boundaries. This is done with NO_EXPORT Communities | |||
Attribute (Section 3.2.3). | ||||
It is up to individual domains to define their own VPs. VPs MUST be | It is up to individual domains to define their own VPs. VPs MUST be | |||
"larger" (span a larger address space) than any real sub-prefix. If | "larger" (span a larger address space) than any real sub-prefix. If | |||
a VP is smaller than a real prefix, then packets that match the real | a VP is smaller than a real prefix, then packets that match the real | |||
prefix will nevertheless be routed to an APR owning the VP, at which | prefix will nevertheless be routed to an APR owning the VP, at which | |||
point the packet will be dropped if it does not match a sub-prefix | point the packet will be dropped if it does not match a sub-prefix | |||
within the VP (Section 6). | within the VP (Section 6). | |||
(Note that, in principle there are cases where a VP could be smaller | (Note that, in principle there are cases where a VP could be smaller | |||
than a real prefix. This is where the egress router to the real | than a real prefix. This is where the egress router to the real | |||
skipping to change at page 13, line 45 | skipping to change at page 13, line 11 | |||
forward the packet correctly. On the other hand, if the egress | forward the packet correctly. On the other hand, if the egress | |||
router is a legacy router, then the APR could not tunnel matching | router is a legacy router, then the APR could not tunnel matching | |||
packets to the egress. This is because the egress would view the VP | packets to the egress. This is because the egress would view the VP | |||
as a better match, and would loop the packet back to the APR. For | as a better match, and would loop the packet back to the APR. For | |||
this reason we require that VPs be larger than any real prefixes, and | this reason we require that VPs be larger than any real prefixes, and | |||
that APRs never install prefixes larger than a VP in their FIBs.) | that APRs never install prefixes larger than a VP in their FIBs.) | |||
It is valid for a VP to be a subset of another VP. For example, 20/7 | It is valid for a VP to be a subset of another VP. For example, 20/7 | |||
and 20/8 can both be VPs. In fact, this capability is necessary for | and 20/8 can both be VPs. In fact, this capability is necessary for | |||
"splitting" a VP without temporarily increasing the FIB size in any | "splitting" a VP without temporarily increasing the FIB size in any | |||
router. (Section 3.1.2.5). | router. (Section 3.2.5). | |||
3.1.2.3. Aggregation Point Routers (APR) | 3.2.3. Aggregation Point Routers (APR) | |||
Any router MAY be configured as an Aggregation Point Router (APR) for | For each VP for which a router is an APR, the router does the | |||
one or more Virtual Prefixes (VP). For each VP for which a router is | following: | |||
an APR, the router does the following: | ||||
1. The APR MUST originate a BGP route to the VP [RFC4271]. In this | 1. The APR MUST originate a BGP route to the VP. In this route, the | |||
route, the NLRI are all of the VPs for which the router is an | NLRI are all of the VPs for which the router is an APR. This is | |||
APR. This is true even for VPs that are a subset of another VP. | true even for VPs that are a subset of another VP. The ORIGIN is | |||
The ORIGIN is set to INCOMPLETE (value 2), the AS number of the | set to INCOMPLETE (value 2), the AS number of the APR's AS is | |||
APR's AS is used in the AS_PATH, and the BGP NEXT_HOP is set to | used in the AS_PATH, and the BGP NEXT_HOP is set to the address | |||
the address of the APR. The ATOMIC_AGGREGATE and AGGREGATOR | of the APR. The ATOMIC_AGGREGATE and AGGREGATOR attributes are | |||
attributes are not included. | not included. | |||
2. The APR MUST attach a NO_EXPORT Communities Attribute [RFC1997] | 2. The APR MUST attach a NO_EXPORT Communities Attribute [RFC1997] | |||
to the route. | to the route. | |||
3. The APR MUST be able to detunnel packets addressed to itself at | 3. The APR MUST be able to detunnel packets addressed to itself at | |||
its BGP NEXT_HOP address. It MUST also be able to convey the | its BGP NEXT_HOP address. It MUST also be able to convey the | |||
tunnel information needed by other routers to initiate tunneled | tunnel information needed by other routers to initiate tunneled | |||
packets to them (Requirement R1). | packets to them. | |||
4. If a packet is received at the APR whose best match route is the | 4. If a packet is received at the APR whose best match route is the | |||
VP (i.e. it matches the VP but not any sub-prefixes within the | VP (i.e. it matches the VP but not any sub-prefixes within the | |||
VP), then the packet MUST be discarded (see Section 3.1.2.2). | VP), then the packet MUST be discarded (see Section 3.2.2). This | |||
This can be accomplished by never installing a prefix larger than | can be accomplished by never installing a prefix larger than the | |||
the VP into the FIB, or by installing the VP as a route to | VP into the FIB, or by installing the VP as a route to \dev\null. | |||
\dev\null. | ||||
3.1.2.3.1. Selecting APRs | 3.2.3.1. Selecting APRs | |||
An ISP is free to select APRs however it chooses. The details of | An ISP is free to select APRs however it chooses. The details of | |||
this are outside the scope of this document. Nevertheless, a few | this are outside the scope of this document. Nevertheless, a few | |||
comments are made here. In general, APRs should be selected such | comments are made here. In general, APRs should be selected such | |||
that the distance to the nearest APR for any VP is small---ideally | that the distance to the nearest APR for any VP is small---ideally | |||
within the same POP. Depending on the number of routers in a POP, | within the same POP. Depending on the number of routers in a POP, | |||
and the sizes of the FIBs in the routers relative to the DFRT size, | and the sizes of the FIBs in the routers relative to the DFRT size, | |||
it may not be possible for all VPs to be represented in a given POP. | it may not be possible for all VPs to be represented in a given POP. | |||
In addition, there should be multiple APRs for each VP, again ideally | In addition, there should be multiple APRs for each VP, again ideally | |||
in each POP, so that the failure of one does not unduly disrupt | in each POP, so that the failure of one does not unduly disrupt | |||
traffic. | traffic. | |||
Note that, although VPs MUST be larger than real prefixes, there is | 3.2.4. Non-APR Routers | |||
intentionally no mechanism designed to automatically insure that this | ||||
is the case. Such a mechanisms would be dangerous. For instance, if | ||||
an ISP somewhere advertised a very large prefix (a /4, say), then | ||||
this would cause APRs to throw out all VPs that are smaller than | ||||
this. For this reason, VPs MUST be set through static configuration | ||||
only. | ||||
3.1.2.4. Non-APR Routers | ||||
A non-APR router MUST install at least the following routes: | A non-APR router MUST install at least the following routes: | |||
1. Routes to VPs (identifiable using the VP-List). | 1. Routes to VPs (identifiable using the VP-List). | |||
2. Routes to all sub-prefixes that are not covered by any VP in the | 2. Routes to all sub-prefixes that are not covered by any VP in the | |||
VP-List. | VP-List. | |||
If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it | If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it | |||
MUST use the tunnel to forward packets to the BGP NEXT_HOP. | MUST use the tunnel to forward packets to the BGP NEXT_HOP. | |||
When an APR fails, routers must select another APR to send packets to | When an APR fails, routers must select another APR to send packets to | |||
(if there is one). This happens, however, through normal internal | (if there is one). This happens, however, through normal internal | |||
BGP convergence mechanisms. | BGP convergence mechanisms. | |||
skipping to change at page 15, line 15 | skipping to change at page 14, line 20 | |||
2. Routes to all sub-prefixes that are not covered by any VP in the | 2. Routes to all sub-prefixes that are not covered by any VP in the | |||
VP-List. | VP-List. | |||
If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it | If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it | |||
MUST use the tunnel to forward packets to the BGP NEXT_HOP. | MUST use the tunnel to forward packets to the BGP NEXT_HOP. | |||
When an APR fails, routers must select another APR to send packets to | When an APR fails, routers must select another APR to send packets to | |||
(if there is one). This happens, however, through normal internal | (if there is one). This happens, however, through normal internal | |||
BGP convergence mechanisms. | BGP convergence mechanisms. | |||
3.1.2.5. Adding and deleting VPs | 3.2.5. Adding and deleting VPs | |||
An ISP may from time to time wish to reconfigure its VP-List. There | An ISP may from time to time wish to reconfigure its VP-List. There | |||
are a number of reasons for this. For instance, early in its | are a number of reasons for this. For instance, early in its | |||
deployment an ISP may configure one or a small number of VPs in order | deployment an ISP may configure one or a small number of VPs in order | |||
to test VA. As the ISP gets more confident with VA, it may increase | to test VA. As the ISP gets more confident with VA, it may increase | |||
the number of VPs. Or, an ISP may start with a small number of large | the number of VPs. Or, an ISP may start with a small number of large | |||
VPs (i.e. /4's or even one /0), and over time move to more smaller | VPs (i.e. /4's or even one /0), and over time move to more smaller | |||
VPs in order to save even more FIB. In this case, the ISP will need | VPs in order to save even more FIB. In this case, the ISP will need | |||
to "split" a VP. Finally, since the address space is not uniformly | to "split" a VP. Finally, since the address space is not uniformly | |||
populated with prefixes, the ISP may want to change the size of VPs | populated with prefixes, the ISP may want to change the size of VPs | |||
in order to balance FIB size across routers. This can involve both | in order to balance FIB size across routers. This can involve both | |||
splitting and merging VPs. Of course, an ISP must be able to modify | splitting and merging VPs. Of course, an ISP must be able to modify | |||
its VP-List without 1) interrupting service to any destinations, or | its VP-List without 1) interrupting service to any destinations, or | |||
2) temporarily increasing the size of any FIB (i.e. where the FIB | 2) temporarily increasing the size of any FIB (i.e. where the FIB | |||
size during the change is no bigger than its size either before or | size during the change is no bigger than its largest size either | |||
after the change). | before or after the change). | |||
Adding a VP is straightforward. The first step is to configure the | The first step for adding a VP is to configure the APRs for the VP. | |||
APRs for the VP. This causes the APRs to originate routes for the | This causes the APRs to originate routes for the VP. Non-APR routers | |||
VP. Non-APR routers will install this route according to the rules | will install this route according to the rules in Section 3.2.4 even | |||
in Section 3.1.2.4 even though they do not yet recognize that the | though they do not yet recognize that the prefix is a VP. | |||
prefix is a VP. Subsequently the VP is added to the VP-List of non- | Subsequently the VP is added to the VP-List of non-APR routers. The | |||
APR routers. The Non-APR routers can then start suppressing the sub- | Non-APR routers can then start suppressing the sub-prefixes with no | |||
prefixes with no loss of service. | loss of service. | |||
To delete a VP, the process is reversed. First, the VP is removed | To delete a VP, the process is reversed. First, the VP is removed | |||
from the VP-Lists of non-APRs. This causes the non-APRs to install | from the VP-Lists of non-APRs. This causes the non-APRs to install | |||
the sub-prefixes. After all sub-prefixes have been installed, the VP | the sub-prefixes. After all sub-prefixes have been installed, the VP | |||
may be removed from the APRs. | may be removed from the APRs. | |||
In many cases, it is desirable to split a VP. For instance, consider | In many cases, it is desirable to split a VP. For instance, consider | |||
the case where two routers, Ra and Rb, are APRs for the same prefix. | the case where two routers, Ra and Rb, are APRs for the same prefix. | |||
It would be possible to shrink the FIB in both routers by splitting | It would be possible to shrink the FIB in both routers by splitting | |||
the VP into two VPs (i.e. split one /6 into two /7's), and assigning | the VP into two VPs (i.e. split one /6 into two /7's), and assigning | |||
each router to one of the VPs. While this could in theory be done by | each router to one of the VPs. While this could in theory be done by | |||
first deleting the larger VP, and then adding the smaller VPs, doing | first deleting the larger VP, and then adding the smaller VPs, doing | |||
so would temporarily increase the FIB size in non-APRs, which may not | so would temporarily increase the FIB size in non-APRs, which may not | |||
have adequate space for such an increase. For this reason, we allow | have adequate space for such an increase. For this reason, we allow | |||
overlapping VPs. | overlapping VPs. | |||
To split a VP, first the two smaller VPs are added to the VP-Lists of | To split a VP, first the two smaller VPs are added to the VP-Lists of | |||
all non-APR routers (in addition to the larger superset VP). Next, | all non-APR routers (in addition to the larger superset VP). Next, | |||
skipping to change at page 16, line 22 | skipping to change at page 15, line 28 | |||
forward packets to the APRs for the smaller VPs. Next, the larger VP | forward packets to the APRs for the smaller VPs. Next, the larger VP | |||
can be removed from the VP-Lists of all non-APR routers. Finally, | can be removed from the VP-Lists of all non-APR routers. Finally, | |||
the larger VP can be removed from its APRs. | the larger VP can be removed from its APRs. | |||
To merge two VPs, the new larger VP is configured in all non-APRs. | To merge two VPs, the new larger VP is configured in all non-APRs. | |||
This has no effect on FIB size or APR selection, since the smaller | This has no effect on FIB size or APR selection, since the smaller | |||
VPs are better matches. Next the larger VP is configured in its | VPs are better matches. Next the larger VP is configured in its | |||
selected APRs. Next the smaller VPs are deleted from all non-APRs. | selected APRs. Next the smaller VPs are deleted from all non-APRs. | |||
Finally, the smaller VPs are deleted from their corresponding APRs. | Finally, the smaller VPs are deleted from their corresponding APRs. | |||
3.1.3. Border VA Routers | 3.3. Border VA Routers | |||
A VA router that is an ASBR MUST do the following: | A VA router that is an ASBR MUST do the following: | |||
1. When forwarding externally-received routes over iBGP, if a tunnel | 1. When forwarding externally-received routes over iBGP, if a tunnel | |||
with an inner label is used, the ASBR MUST set the BGP NEXT_HOP | with an inner label is used, the ASBR MUST set the BGP NEXT_HOP | |||
attribute to itself. Otherwise, the BGP NEXT_HOP attribute is | attribute to itself. Otherwise, the BGP NEXT_HOP attribute is | |||
left unchanged. | left unchanged. | |||
2. They MUST establish tunnels as described in Section 4. | 2. They MUST establish tunnels as described in Section 4. | |||
3. The ASBR MUST detunnel the packet before forwarding the packet to | 3. The ASBR MUST detunnel the packet before forwarding the packet to | |||
the remote ASBR. In other words, the remote ASBR receives a | the remote ASBR. | |||
normal untunneled packet identical to the packet it would receive | ||||
without VA. | ||||
4. The ASBR MUST be able to forward the packet without a FIB lookup. | 4. The ASBR MUST be able to forward the packet without a FIB lookup. | |||
In other words, the tunnel information itself contains all the | In other words, the tunnel information itself contains all the | |||
information needed by the border router to know which remote ASBR | information needed by the border router to know which remote ASBR | |||
should receive the packet. | should receive the packet. | |||
3.1.4. Advertising and Handling Sub-Prefixes | 3.4. Advertising and Handling Sub-Prefixes | |||
Sub-prefixes are advertised and handled by BGP as normal. VA does | Sub-prefixes are advertised and handled by BGP as normal. VA does | |||
not effect this behavior. The only difference in the handling of | not effect this behavior. The only difference in the handling of | |||
sub-prefixes is that they might not be installed in the FIB, as | sub-prefixes is that they might not be installed in the FIB, as | |||
described in Section 3.1.5. | described in Section 3.5. | |||
In those cases where the route is installed, packets forwarded to | In those cases where the route is installed, packets forwarded to | |||
prefixes external to the AS MUST be transmitted via the tunnel | prefixes external to the AS MUST be transmitted via the tunnel | |||
established as described in Section 3.1.3. | established as described in Section 3.3. | |||
3.1.5. Suppressing FIB Sub-prefix Routes | 3.5. Suppressing FIB Sub-prefix Routes | |||
Any route not for a known VP (i.e. not in the VP-List) is taken to be | Any route not for a known VP (i.e. not in the VP-List) is taken to be | |||
a sub-prefix. The following rules are used to determine if a sub- | a sub-prefix. The following rules are used to determine if a sub- | |||
prefix route can be suppressed. | prefix route can be suppressed. | |||
1. A VA router MUST NOT FIB-install a sub-prefix route for which | 1. A VA router MUST NOT FIB-install a sub-prefix route for which | |||
there is no tunnel to the BGP NEXT_HOP address. This is to | there is no tunnel to the BGP NEXT_HOP address. This is to | |||
prevent a loop whereby the APR forwards the packet hop-by-hop | prevent a loop whereby the APR forwards the packet hop-by-hop | |||
towards the next hop, but a router on the path that has FIB- | towards the next hop, but a router on the path that has FIB- | |||
suppressed the sub-prefix forwards it back to the APR. If there | suppressed the sub-prefix forwards it back to the APR. | |||
is an alternate route to the sub-prefix for which there is a | ||||
tunnel, then that route SHOULD be selected, even if it is less | ||||
attractive according to the normal BGP best path selection | ||||
algorithm. | ||||
2. If the router is an APR, a route for every sub-prefix within the | 2. If the router is an APR, a route for every sub-prefix within the | |||
VP MUST be FIB-installed (subject to the above limitation that | VP MUST be FIB-installed (subject to the above limitation that | |||
there be a tunnel). | there be a tunnel). | |||
3. If a non-APR router has a sub-prefix route that does not fall | 3. If a non-APR router has a sub-prefix route that does not fall | |||
within any VP (as determined by the VP-List), then the route MUST | within any VP (as determined by the VP-List), then the route MUST | |||
be installed. This may occur because the ISP hasn't defined a VP | be installed. This may occur because the ISP hasn't defined a VP | |||
covering that prefix, for instance during an incremental | covering that prefix, for instance during an incremental | |||
deployment buildup. | deployment build-up. | |||
4. If an ASBR is using strict uRPF to do ingress filtering, then it | 4. If an ASBR is using strict uRPF to do ingress filtering, then it | |||
MUST install routes for which the remote ASBR is the BGP NEXT_HOP | MUST install routes for which the remote ASBR is the BGP NEXT_HOP | |||
[RFC2827]. Note that only a APR may do loose uRPF filtering, and | [RFC2827]. Note that only an APR may do loose uRPF filtering, | |||
then only for routes to sub-prefixes within its VPs. | and then only for routes to sub-prefixes within its VPs. | |||
5. All other sub-prefix routes MAY be suppressed. Such "optional" | 5. All other sub-prefix routes MAY be suppressed. Such "optional" | |||
sub-prefixes that are nevertheless installed are referred to as | sub-prefixes that are nevertheless installed are referred to as | |||
Popular Prefixes. Note, however, that whether or not to install | Popular Prefixes. Note, however, that whether or not to install | |||
a given sub-prefix SHOULD NOT be based on whether or not there is | a given sub-prefix SHOULD NOT be based on whether or not there is | |||
an active route to a VP in the VP-List. This avoids the | an active route to a VP in the VP-List. This avoids the | |||
situation whereby, during BGP initialization, the router receives | situation whereby, during BGP initialization, the router receives | |||
some sub-prefix routes before receiving the corresponding VP | some sub-prefix routes before receiving the corresponding VP | |||
route, with the result that it installs routes in its FIB that it | route, with the result that it installs routes in its FIB that it | |||
will only remove a short time later, possibly even overflowing | will only remove a short time later, possibly even overflowing | |||
its FIB. | its FIB. | |||
3.1.5.1. Selecting Popular Prefixes | 3.5.1. Selecting Popular Prefixes | |||
Individual routers MAY independently choose which sub-prefixes are | Individual routers MAY independently choose which sub-prefixes are | |||
Popular Prefixes. There is no need for different routers to install | Popular Prefixes. There is no need for different routers to install | |||
the same sub-prefixes. There is therefore significant leeway as to | the same sub-prefixes. There is therefore significant leeway as to | |||
how routers select Popular Prefixes. As a general rule, routers | how routers select Popular Prefixes. As a general rule, routers | |||
should fill the FIB as much as possible, because the cost of doing so | should fill the FIB as much as possible, because the cost of doing so | |||
is relatively small, and more FIB entries leads to fewer packets | is relatively small, and more FIB entries leads to fewer packets | |||
taking a longer path. Broadly speaking, an ISP may choose to fill | taking a longer path. Broadly speaking, an ISP may choose to fill | |||
the FIB by making routers APRs for as many VPs as possible, or by | the FIB by making routers APRs for as many VPs as possible, or by | |||
assigning relatively few APRs and rather filling the FIB with Popular | assigning relatively few APRs and rather filling the FIB with Popular | |||
skipping to change at page 18, line 18 | skipping to change at page 17, line 16 | |||
1. Policy-based: The simplest approach for network administrators is | 1. Policy-based: The simplest approach for network administrators is | |||
to have broad policies that routers use to determine which sub- | to have broad policies that routers use to determine which sub- | |||
prefixes are designated as popular. An obvious policy would be a | prefixes are designated as popular. An obvious policy would be a | |||
"customer routes" policy, whereby all customer routes are | "customer routes" policy, whereby all customer routes are | |||
installed (as identified for instance by appropriate community | installed (as identified for instance by appropriate community | |||
attribute tags). Another policy would be for a router to install | attribute tags). Another policy would be for a router to install | |||
prefixes originated by specific ASes. For instance, two ISPs | prefixes originated by specific ASes. For instance, two ISPs | |||
could mutually agree to install each other's originated prefixes. | could mutually agree to install each other's originated prefixes. | |||
A third policy might be to install prefixes with the shortest AS- | A third policy might be to install prefixes with the shortest AS- | |||
path. | PATH. | |||
2. Static list: Another approach would be to configure static lists | 2. Static list: Another approach would be to configure static lists | |||
of specific prefixes to install. For instance, prefixes | of specific prefixes to install. For instance, prefixes | |||
associated with an SLA might be configured. Or, a list of | associated with an SLA might be configured. Or, a list of | |||
prefixes for the most popular websites might be installed. | prefixes for the most popular websites might be installed. | |||
3. High-volume prefixes: By installing high-volume prefixes as | 3. High-volume prefixes: By installing high-volume prefixes as | |||
Popular Prefixes, the latency and load associated with the longer | Popular Prefixes, the latency and load associated with the longer | |||
path required by VA is minimized. One approach would be for an | path required by VA is minimized. One approach would be for an | |||
ISP to measure its traffic volume over time (days or a few | ISP to measure its traffic volume over time (days or a few | |||
weeks), and statically configure high-volume prefixes as Popular | weeks), and statically configure high-volume prefixes as Popular | |||
Prefixes. There is strong evidence that prefixes that are high- | Prefixes. There is strong evidence that prefixes that are high- | |||
volume tend to remain high-volume over multi-day or multi-week | volume tend to remain high-volume over multi-day or multi-week | |||
timeframes (though not necessarily at short timeframes like | timeframes (though not necessarily at short timeframes like | |||
minutes or seconds). High-volume prefixes MAY also be installed | minutes or seconds). High-volume prefixes MAY also be installed | |||
dynamically. In other words, a router measures its own traffic | automatically. For this, a router measures its own traffic | |||
volumes, and installs and removes Popular Prefixes in response to | volumes, and installs and removes Popular Prefixes in response to | |||
short term traffic load. The downside of this approach is that | changes in traffic load. The downside of this approach is that | |||
it complicates debugging network problems. If packets are being | it complicates debugging network problems. If packets are being | |||
dropped somewhere in the network, it is more difficult to find | dropped somewhere in the network, it is more difficult to find | |||
out where if the selected path can change dynamically. | out where if the selected path can change dynamically. | |||
3.2. New Configuration | 3.6. New Configuration | |||
VA places new configuration requirements on ISP administrators. | VA places new configuration requirements on ISP administrators. | |||
Namely, the administrator must: | Namely, the administrator does the following. | |||
1. Select VPs, and configure the VP-List into all VA routers. As a | 1. Select VPs, and configure the VP-List into all VA routers. As a | |||
general rule, having a larger number of relatively small prefixes | general rule, having a larger number of relatively small prefixes | |||
gives administrators the most flexibility in terms of filling | gives administrators the most flexibility in terms of filling | |||
available FIB with sub-prefixes, and in terms of balancing load | available FIB with sub-prefixes, and in terms of balancing load | |||
across routers. Once an administrator has selected a VP-List, it | across routers. Once an administrator has selected a VP-List, it | |||
is just as easy to configure routers with a large list as a small | is just as easy to configure routers with a large list as a small | |||
list. We can expect network operator groups like NANOG to | list. A good list might be one where the number of VPs is | |||
compile good VP-Lists that ISPs can then adopt. A good list | relatively large, say 100 or so (noting again that each VP must | |||
would be one where the number of VPs is relatively large, say 100 | be smaller than a real prefix), and the number of sub-prefixes | |||
or so (noting again that each VP must be smaller than a real | within each VP is roughly the same. | |||
prefix), and the number of sub-prefixes within each VP is roughly | ||||
the same. | ||||
2. Select and configure APRs. There are three primary | 2. Select and configure APRs. There are three primary | |||
considerations here. First, there must be enough APRs to handle | considerations here. First, there needs to be enough APRs to | |||
reasonable APR failure scenarios. Second, APR assignment should | failover to should one or more APRs crash. Second, APR | |||
not result in router overload. Third, particularly long paths | assignment should not result in router overload. Third, | |||
should be avoided. Ideally there should be two APRs for each VP | excessively long paths should be avoided. Ideally there should | |||
within each PoP, but this may not be possible for small PoPs. | be two APRs for each VP within each PoP, but this may not be | |||
Failing this, there should be at least two APRs in each | possible for small PoPs. Failing this, there should be at least | |||
geographical region, so as to minimize path length increase. | two APRs in each geographical region, so as to minimize path | |||
Routers should have the appropriate counters to allow | length increase. Routers should have the appropriate counters to | |||
administrators to know the volume of APR traffic each router is | allow administrators to know the volume of APR traffic each | |||
handling so as to adjust load by adding or removing APR | router is handling so as to adjust load by adding or removing APR | |||
assignments. | assignments. | |||
3. Select and configure Popular Prefixes or Popular Prefix policies. | 3. Select and configure Popular Prefixes or Popular Prefix policies. | |||
There are two general goals here. The first is to minimize load | There are two general goals here. The first is to minimize load | |||
overall by minimizing the number of packets that take longer | overall by minimizing the number of packets that take longer | |||
paths. The second is to insure that specific selected prefixes | paths. The second is to insure that specific selected prefixes | |||
don't have overly long paths. These goals must be weighed | don't have overly long paths. These goals have to be weighed | |||
against the administrative overhead of configuring potentially | against the administrative overhead of configuring potentially | |||
thousands of Popular Prefixes. As one example a small ISP may | thousands of Popular Prefixes. As one example a small ISP may | |||
wish to keep it simple by doing nothing more than indicating that | wish to keep it simple by doing nothing more than indicating that | |||
customer routes should be installed. In this case, the | customer routes should be installed. In this case, the | |||
administrator could otherwise assign as many APRs as possible | administrator could otherwise assign as many APRs as possible | |||
while leaving enough FIB space for customer routes. As another | while leaving enough FIB space for customer routes. As another | |||
example, a large ISP could build a management system that takes | example, a large ISP could build a management system that takes | |||
into consideration the traffic matrix, customer SLAs, robustness | into consideration the traffic matrix, customer SLAs, robustness | |||
requirements, FIB sizes, topology, and router capacity, and | requirements, FIB sizes, topology, and router capacity, and | |||
periodically automatically computes APR and Popular Prefix | periodically automatically computes APR and Popular Prefix | |||
assignments. | assignments. | |||
4. Usage of Tunnels | 3.7. Interaction with Traffic Engineering | |||
4.1. MPLS tunnels | In VA, some traffic traverses an APR as an intermediate "hop", and | |||
some does not. For that traffic that does not, there is no | ||||
difference between how that traffic is handled and how it is handled | ||||
in a non-VA network with edge-to-edge tunnels. As a result, there | ||||
should be no difference in how traffic engineering operates on that | ||||
traffic. | ||||
For traffic that does traverse APR "hop", the following holds: Any | ||||
traffic engineering decisions that affect the BGP NEXT_HOP must be | ||||
made at the APR. Traffic engineering decisions that effects the | ||||
router path through the AS may be handled in one of two ways. First, | ||||
the path decision may simply be made twice independently, once for | ||||
the ingress-to-APR tunnel, and once for the APR-to-egress tunnel. | ||||
This approach requires no changes to the traffic engineering | ||||
mechanisms per se, but it may not make optimal path selection | ||||
decisions. Second, the traffic engineering decision may take into | ||||
account both tunnels, even to the point of choosing among multiple | ||||
transit APRs. This approach may be more optimal, but is more complex | ||||
and requires changes to existing mechanisms. | ||||
Overall, if the majority of traffic does not involve an APR "hop", | ||||
for instance through the use of popular prefixes, then VA in any | ||||
event has a minimal impact on traffic engineering, and so the impact | ||||
of VA may potentially be ignored. | ||||
4. Usage of MPLS Tunnels | ||||
VA utilizes a straight-forward application of MPLS. The tunnels are | VA utilizes a straight-forward application of MPLS. The tunnels are | |||
MPLS Label Switched Paths (LSP), and are signaled using either the | MPLS Label Switched Paths (LSP), and are signaled using either the | |||
Label Distribution Protocol (LDP) [RFC5036] or RSVP-TE [RFC3209]. | Label Distribution Protocol (LDP) [RFC5036] or RSVP-TE [RFC3209]. | |||
Both VA and legacy routers MUST participate in this signaling. | Both VA and legacy routers MUST participate in this signaling. | |||
APRs and ASBRs initiate tunnels. In both cases, Downstream | APRs and ASBRs initiate tunnels. In both cases, Downstream | |||
Unsolicited tunnels are initiated to all IGP neighbors with the full | Unsolicited tunnels are initiated to all IGP neighbors with the full | |||
BGP NEXT_HOP address as the Forwarding Equivalence Class (FEC). In | BGP NEXT_HOP address as the Forwarding Equivalence Class (FEC). In | |||
the case of APRs, the BGP NEXT_HOP is the APR's own address. In the | the case of APRs, the BGP NEXT_HOP is the APR's own address. In the | |||
case of legacy ASBRs, the BGP NEXT_HOP is the ASBR's own address. In | case of legacy ASBRs, the BGP NEXT_HOP is the ASBR's own address. In | |||
the case of VA ASBRs, the BGP NEXT_HOP is that of the remote ASBR. | the case of VA ASBRs, the BGP NEXT_HOP is that of the remote ASBR. | |||
Existing Penultimate Hop Popping (PHP) mechanisms in the data plane | Existing Penultimate Hop Popping (PHP) mechanisms in the data plane | |||
can be used for forwarding packets to remote ASBRs. | can be used for forwarding packets to remote ASBRs. | |||
4.2. Usage of Inner Label | 4.1. Usage of Inner Label | |||
Besides using a separate LSP to identify the remote ASBR as described | Besides using a separate LSP to identify the remote ASBR as described | |||
above, it is also possible to use an inner label to identify the | above, it is also possible to use an inner label to identify the | |||
remote ASBR. Either an outer label or an IP tunnel identifies the | remote ASBR. Either an outer label or an IP tunnel identifies the | |||
local ASBR. | local ASBR. | |||
When a local ASBR advertises a route into iBGP, it sets the NEXT_HOP | When a local ASBR advertises a route into iBGP, it sets the NEXT_HOP | |||
to itself, and assigns a label to the route. This label is used as | to itself, and assigns a label to the route. This label is used as | |||
the inner label, and identifies the remote ASBR from which the route | the inner label, and identifies the remote ASBR from which the route | |||
was received [RFC3107]. | was received [RFC3107]. | |||
skipping to change at page 20, line 35 | skipping to change at page 20, line 12 | |||
packet to the NEXT_HOP address. Otherwise, an IP header address to | packet to the NEXT_HOP address. Otherwise, an IP header address to | |||
the NEXT_HOP address is used. | the NEXT_HOP address is used. | |||
5. IANA Considerations | 5. IANA Considerations | |||
There are no IANA considerations. | There are no IANA considerations. | |||
6. Security Considerations | 6. Security Considerations | |||
We consider the security implications of VA under two scenarios, one | We consider the security implications of VA under two scenarios, one | |||
where VA is configured and operated correctly, and one where it is | where VA is assumed to be configured and operated correctly, and one | |||
mis-configured. A cornerstone of VA operation is that the basic | where it is mis-configured. A cornerstone of VA operation is that | |||
behavior of BGP doesn't change, especially inter-domain. Among other | the basic behavior of BGP doesn't change, especially inter-domain. | |||
things, this makes it easier to reason about security. | Among other things, this makes it easier to reason about security. | |||
6.1. Properly Configured VA | 6.1. Properly Configured VA | |||
If VA is configured and operated properly, then the external behavior | If VA is configured and operated properly, then the external behavior | |||
of an AS does not change. The same upstream ASes are selected, and | of an AS does not change. The same upstream ASes are selected, and | |||
the same prefixes and AS-paths are advertised. Therefore, a properly | the same prefixes and AS-PATHs are advertised. Therefore, a properly | |||
configured VA domain has no security impact on other domains. | configured VA domain has no security impact on other domains. | |||
If another ISP starts advertising a prefix that is larger than a | If another ISP starts advertising a prefix that is larger than a | |||
given VP, this prefix will be ignored by APRs that have a VP that | given VP, this prefix will be ignored by APRs that have a VP that | |||
falls within the larger prefix (Section 3.1.2.3). As a result, | falls within the larger prefix (Section 3.2.3). As a result, packets | |||
packets that might otherwise have been routed to the new larger | that might otherwise have been routed to the new larger prefix will | |||
prefix will be dropped at the APRs. Note that the trend in the | be dropped at the APRs. Note that the trend in the Internet is | |||
Internet is towards large prefixes being broken up into smaller ones, | towards large prefixes being broken up into smaller ones, not the | |||
not the reverse. Therefore, such a larger prefix is likely to be | reverse. Therefore, such a larger prefix is likely to be invalid. | |||
invalid. If it is determined without a doubt that the larger prefix | If it is determined without a doubt that the larger prefix is valid, | |||
is valid, then the ISP will have to reconfigure its VPs. | then the ISP will have to reconfigure its VPs. | |||
VA does not change an ISP's ability to do ingress filtering using | VA does not change an ISP's ability to do ingress filtering using | |||
strict uRPF (Section 3.1.5). | strict uRPF (Section 3.5). | |||
Regarding DoS attacks, there are two issues that need to be | Regarding DoS attacks, there are two issues that need to be | |||
considered. First, does VA result in new types of DoS attacks? | considered. First, does VA result in new types of DoS attacks? | |||
Second, does VA make it more difficult to deploy DoS defense systems. | Second, does VA make it more difficult to deploy DoS defense systems. | |||
Regarding the first issue, one possibility is that an attacker | Regarding the first issue, one possibility is that an attacker | |||
targets a given router by flooding the network with traffic to | targets a given router by flooding the network with traffic to | |||
prefixes that are not popular, and for which that router is an APR. | prefixes that are not popular, and for which that router is an APR. | |||
This would cause a disproportionate amount of traffic to be forwarded | This would cause a disproportionate amount of traffic to be forwarded | |||
to the APR(s). While it is up to individual ISPs to decide if this | to the APR(s). While it is up to individual ISPs to decide if this | |||
attack is a concern, it does not strike the authors that this attack | attack is a concern, it does not strike the authors that this attack | |||
skipping to change at page 21, line 34 | skipping to change at page 21, line 9 | |||
Many DoS defense systems use dynamically established Routing Table | Many DoS defense systems use dynamically established Routing Table | |||
entries to divert victims' traffic into LSPs that carry the traffic | entries to divert victims' traffic into LSPs that carry the traffic | |||
to scrubbers. This mechanism works with VA---it simply over-rides | to scrubbers. This mechanism works with VA---it simply over-rides | |||
whatever route is in place. This mechanism works equally well with | whatever route is in place. This mechanism works equally well with | |||
APRs and non-APRs. | APRs and non-APRs. | |||
6.2. Mis-configured VA | 6.2. Mis-configured VA | |||
VA introduces the possibility that a VP is advertised outside of an | VA introduces the possibility that a VP is advertised outside of an | |||
AS. This in fact should be a low probability event, but it is | AS. This in fact should be a low probability event since routers | |||
considered here none-the-less. | filter these, but it is considered here none-the-less. | |||
If an AS leaks a large VP (i.e. larger than any real prefixes), then | If an AS leaks a large VP (i.e. larger than any real prefixes), then | |||
the impact is minimal. Smaller prefixes will be preferred because of | the impact is minimal. Smaller prefixes will be preferred because of | |||
best-match semantics, and so the only impact is that packets that | best-match semantics, and so the only impact is that packets that | |||
otherwise have no matching routes will be sent to the misbehaving AS | otherwise have no matching routes will be sent to the misbehaving AS | |||
and dropped there. If an AS leaks a small VP (i.e. smaller than a | and dropped there. If an AS leaks a small VP (i.e. smaller than a | |||
real prefix), then packets to that AS will be hijacked by the | real prefix), then packets to that AS will be hijacked by the | |||
misbehaving AS and dropped. This can happen with or without VA, and | misbehaving AS and dropped. (This can happen with or without VA, and | |||
so doesn't represent a new security problem per se. | so doesn't represent a new security problem per se.) | |||
Although VPs MUST be larger than real prefixes, there is | ||||
intentionally no mechanism designed to automatically insure that this | ||||
is the case. Such a mechanisms would be dangerous. For instance, if | ||||
an ISP somewhere advertised a very large prefix (a /4, say), then | ||||
this would cause APRs to throw out all VPs that are smaller than | ||||
this. For this reason, VPs must be set through configuration only. | ||||
7. Acknowledgements | 7. Acknowledgements | |||
The authors would like to acknowledge the efforts of Xinyang Zhang | The authors would like to acknowledge the efforts of Xinyang Zhang | |||
and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an | and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an | |||
early inter-domain variant of FIB suppression, and the efforts of | early inter-domain variant of FIB suppression, and the efforts of | |||
Hitesh Ballani and Tuan Cao, who worked on the configuration-only | Hitesh Ballani and Tuan Cao, who worked on the configuration-only | |||
variant of VA that works with legacy routers. We would also like to | variant of VA that works with legacy routers. We would also like to | |||
thank Scott Brim, Daniel Ginsburg, and Rajiv Asati for their helpful | thank Scott Brim, Daniel Ginsburg, and Rajiv Asati for their helpful | |||
comments. In particular, Daniel's comments significantly simplified | comments. In particular, Daniel's comments significantly simplified | |||
the spec (eliminating the need for a new External Communities | the spec (eliminating the need for a new Extended Communities | |||
Attribute). | Attribute). Finally, we would like to thank Wes George, Med | |||
Boucadair, and Bruno Decraene for their reviews and suggestions. | ||||
8. References | 8. References | |||
8.1. Normative References | 8.1. Normative References | |||
[RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP | [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP | |||
Communities Attribute", RFC 1997, August 1996. | Communities Attribute", RFC 1997, August 1996. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
End of changes. 100 change blocks. | ||||
280 lines changed or deleted | 262 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |