draft-ietf-grow-bgp-wedgies-02.txt | draft-ietf-grow-bgp-wedgies-03.txt | |||
---|---|---|---|---|
GROW T. Griffin | GROW T. Griffin | |||
Internet-Draft University of Cambridge | Internet-Draft University of Cambridge | |||
Expires: October 16, 2005 G. Huston | Expires: December 12, 2005 G. Huston | |||
APNIC | APNIC | |||
April 14, 2005 | June 10, 2005 | |||
BGP Wedgies | BGP Wedgies | |||
draft-ietf-grow-bgp-wedgies-02.txt | draft-ietf-grow-bgp-wedgies-03.txt | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is subject to all provisions | By submitting this Internet-Draft, each author represents that any | |||
of Section 3 of RFC 3667. By submitting this Internet-Draft, each | applicable patent or other IPR claims of which he or she is aware | |||
author represents that any applicable patent or other IPR claims of | have been or will be disclosed, and any of which he or she becomes | |||
which he or she is aware have been or will be disclosed, and any of | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
which he or she become aware will be disclosed, in accordance with | ||||
RFC 3668. | ||||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on October 16, 2005. | This Internet-Draft will expire on December 12, 2005. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2005). | Copyright (C) The Internet Society (2005). | |||
Abstract | Abstract | |||
It has commonly been assumed that the Border Gateway Protocol (BGP) | It has commonly been assumed that the Border Gateway Protocol (BGP) | |||
is a tool for distributing reachability information in a manner that | is a tool for distributing reachability information in a manner that | |||
creates forwarding paths in a deterministic manner. In this memo we | creates forwarding paths in a deterministic manner. In this memo we | |||
skipping to change at page 2, line 42 | skipping to change at page 2, line 40 | |||
exchange. In the same vein the local network is often configured to | exchange. In the same vein the local network is often configured to | |||
prefer routes learned from a peer or a customer over those learned | prefer routes learned from a peer or a customer over those learned | |||
from a directly connected upstream transit provider. These | from a directly connected upstream transit provider. These | |||
preferences may be expressed via a local preference configuration | preferences may be expressed via a local preference configuration | |||
setting, where the local preference overrides the AS path length | setting, where the local preference overrides the AS path length | |||
metric of the base BGP operation. | metric of the base BGP operation. | |||
In terms of engineering reliability in the inter-domain routing | In terms of engineering reliability in the inter-domain routing | |||
environment it is commonly the case that a service provider may enter | environment it is commonly the case that a service provider may enter | |||
into arrangements with two or more upstream transit providers, | into arrangements with two or more upstream transit providers, | |||
passing routes to both providers , and receiving traffic from both | passing routes to all upstream providers, and receiving traffic from | |||
sources. If the path to one upstream fails the traffic will switch | all sources. If the path to one upstream fails the traffic will | |||
to other links, and once the path is recovered, the traffic should | switch to other links. Once the path is recovered, the traffic | |||
switch back. | should switch back. | |||
In such situations of multiple upstream providers it is also | In such situations of multiple upstream providers it is also | |||
commonplace to place a relative preference on the providers, so that | commonplace to place a relative preference on the providers, so that | |||
one connection is regarded as a preferred, or "primary" connection, | one connection is regarded as a preferred, or "primary" connection, | |||
and other connections are regarded as less preferred, or "backup" | and other connections are regarded as less preferred, or "backup" | |||
connections. The intent is typically that the backup connections | connections. The intent is typically that the backup connections | |||
will be used for traffic only for the duration of a failure in the | will be used for traffic only for the duration of a failure in the | |||
primary connection. | primary connection. | |||
It is possible to express this primary / backup policy using local AS | It is possible to express this primary / backup policy using local AS | |||
skipping to change at page 3, line 34 | skipping to change at page 3, line 32 | |||
is no other source of the route. | is no other source of the route. | |||
3. BGP Wedgies | 3. BGP Wedgies | |||
The richness of local policy expression through the use of | The richness of local policy expression through the use of | |||
communities, when coupled with the behavior of a distance vector | communities, when coupled with the behavior of a distance vector | |||
protocol like BGP, leads to the observation that certain | protocol like BGP, leads to the observation that certain | |||
configurations have more than one "solution", or more than one stable | configurations have more than one "solution", or more than one stable | |||
BGP state. An example of such a situation is indicated in Figure 1. | BGP state. An example of such a situation is indicated in Figure 1. | |||
[Figure 1 diagram: AS3 and AS4 are peers; AS3 is the provider of AS2; AS2 and AS4 are both providers of AS1, and the AS1 links are labelled "backup" (to AS2) and "primary" (to AS4)] | [Figure 1 diagram: AS3 and AS4 are peers; AS3 is the provider of AS2; AS2 and AS4 are both providers of AS1, and the AS1 links are labelled "backup service" (to AS2) and "primary service" (to AS4)] | |||
Figure 1 | Figure 1 | |||
In this case AS1 has marked its advertisement of prefixes to AS2 as | In this case AS1 has marked its advertisement of prefixes to AS2 as | |||
"backup only", and its advertisement of prefixes to AS4 as "primary". | "backup only", and its advertisement of prefixes to AS4 as "primary". | |||
AS3 will hear AS4's advertisement across the peering link, and pick | AS4 will advertise AS1's prefixes to AS3. AS3 will hear AS4's | |||
of AS1's prefixes with the path "AS4, AS1". AS3 will advertise this | advertisement across the peering link, and select AS1's prefixes with | |||
to AS2. AS2 will hear two paths to AS1, the first is by the direct | the path "AS4, AS1". AS3 will advertise these prefixes to AS2. AS2 | |||
will hear two paths to AS1's prefixes, the first is via the direct | ||||
connection to AS1, and the second is via the path "AS3, AS4, AS1". | connection to AS1, and the second is via the path "AS3, AS4, AS1". | |||
AS2 will prefer the longer path as the directly connected routes are | AS2 will prefer the longer path, as the directly connected routes are | |||
marked "backup only", and AS2's local preference decision will prefer | marked "backup only", and AS2's local preference decision will prefer | |||
the AS3 advertisement over the AS1 advertisement. | the AS3 advertisement over the AS1 advertisement. | |||
This is the intended outcome of AS1's policy settings, where no | This is the intended outcome of AS1's policy settings, where in the | |||
traffic passes from AS2 to AS1, and AS2, reaches AS1 via a path that | 'normal' state no traffic passes from AS2 to AS1 across the backup | |||
transits AS3 and AS4. | link, and AS2 reaches AS1 via a path that transits AS3 and AS4, using | |||
the primary link to AS1. | ||||
This intended outcome is achieved as long as AS1 announces its routes | This intended outcome is achieved as long as AS1 announces its routes | |||
on the primary path, to AS4, before announcing its backup routes to | on the primary path to AS4 before announcing its backup routes to | |||
AS2. | AS2. | |||
If the AS1 - AS4 path is broken, causing a BGP session failure | If the AS1 - AS4 path is broken, causing a BGP session failure | |||
between AS1 and AS4, then AS4 will withdraw its advertisement of | between AS1 and AS4, then AS4 will withdraw its advertisement of | |||
AS1's routes to AS3, who, in turn will send a withdrawal to AS2. | AS1's routes to AS3, who, in turn, will send a withdrawal to AS2. | |||
As2, will then select the backup path to AS1. AS2 will advertise | AS2 will then select the backup path to AS1. AS2 will advertise | |||
this path to AS3, and AS3 will advertise this path to AS4. Again, | this path to AS3, and AS3 will advertise this path to AS4. Again, | |||
this is part of the intended operation of the primary / backup policy | this is part of the intended operation of the primary / backup policy | |||
setting. | setting, and all traffic to AS1 will use the backup path. | |||
When connectivity between AS4 and AS1 is restored the BGP state will | When connectivity between AS4 and AS1 is restored the BGP state will | |||
not revert to the original state. AS4 will learn the primary path to | not revert to the original state. AS4 will learn the primary path to | |||
AS1, and readvertise this to AS3 using the path "AS4, AS1". AS3, | AS1, and readvertise this to AS3 using the path "AS4, AS1". AS3, | |||
using a default preference of preferring customer-advertised routes | using a default preference of preferring customer-advertised routes | |||
over peer routes, will continue to prefer the "AS2, AS1" path. AS3 | over peer routes, will continue to prefer the "AS2, AS1" path. AS3 | |||
will not pass any updates to AS2. After the restoration of the | will not pass any updates to AS2. After the restoration of the AS4 | |||
circuit traffic from AS3 to AS1 and from AS2 to AS1 will be presented | to AS1 circuit, the traffic from AS3 to AS1 and from AS2 to AS1 will | |||
to AS1 via the backup path, even though the primary path via AS4 is | be presented to AS1 via the backup path, even though the primary | |||
in service. | path via AS4 is back in service. | |||
The intended forwarding state can only be restored by AS1 | The intended forwarding state can only be restored by AS1 | |||
deliberately bringing down its eBGP session with AS2, even though it | deliberately bringing down its eBGP session with AS2, even though it | |||
is carrying traffic. This will cause the BGP state to revert to the | is carrying traffic. This will cause the BGP state to revert to the | |||
intended configuration. | intended configuration. | |||
It is often the case that an AS will attempt to balance incoming | It is often the case that an AS will attempt to balance incoming | |||
traffic across multiple providers, again using the primary / backup | traffic across multiple providers, again using the primary / backup | |||
mechanism. For some prefixes one link is configured as the primary | mechanism. For some prefixes one link is configured as the primary | |||
link, and the others as the backup link, while for other prefixes | link, and the others as the backup link, while for other prefixes | |||
another link is selected as the primary link. An example is shown in | another link is selected as the primary link. An example is shown in | |||
Figure 2. | Figure 2. | |||
[Figure 2 diagram: AS3 and AS4 are peers; AS3 is the provider of AS2 and AS4 is the provider of AS5; AS2 and AS5 are both providers of AS1. The AS2 link to AS1 is labelled "backup" and "primary"; the AS5 link to AS1 is labelled "primary for 192.9.200.0/25" and "backup for 192.9.200.128/25"] | [Figure 2 diagram: AS3 and AS4 are peers; AS3 is the provider of AS2 and AS4 is the provider of AS5; AS2 and AS5 are both providers of AS1. The AS2 link to AS1 is labelled "backup (192.0.2.0/25)" and "primary (192.0.2.128/25)"; the AS5 link to AS1 is labelled "primary service (192.0.2.0/25)" and "backup service (192.0.2.128/25)"] | |||
Figure 2 | Figure 2 | |||
The intended configuration has all incoming traffic for addresses in | The intended configuration has all incoming traffic for addresses in | |||
the range 192.9.200.0/25 via the link from AS5, and all incoming | the range 192.0.2.0/25 via the link from AS5, and all incoming | |||
traffic for addresses in the range 192.9.200.128/25 from AS2. | traffic for addresses in the range 192.0.2.128/25 from AS2. | |||
In this case if the link between AS3 and AS4 is reset, AS3 will learn | In this case if the link between AS3 and AS4 is reset, AS3 will learn | |||
both routes from AS2, and AS4 will learn both routes from AS5. As | both routes from AS2, and AS4 will learn both routes from AS5. As | |||
these customer routes are preferred over peer routes, when the link | these customer routes are preferred over peer routes, when the link | |||
between AS3 and AS4 is restored, neither AS will alter its routing | between AS3 and AS4 is restored, neither AS3 nor AS4 will alter their | |||
behavior with respect to AS1's routes. This situation is now wedged, | routing behavior with respect to AS1's routes. This situation is now | |||
in that there is no eBGP peering that can be reset that will flip BGP | wedged, in that there is no eBGP peering that can be reset that will | |||
back to the intended state. This is an instance of a BGP Wedgie. | flip BGP back to the intended state. This is an instance of a BGP | |||
Wedgie. | ||||
The restoration path here is that AS1 has to withdraw the backup | The restoration path here is that AS1 has to withdraw the backup | |||
advertisements on both paths and operate for an interval without | advertisements on both paths and operate for an interval without | |||
backup, and then readvertise the backup prefix advertisements. The | backup, and then readvertise the backup prefix advertisements. The | |||
length of the interval cannot be readily determined in advance, as it | length of the interval cannot be readily determined in advance, as it | |||
has to be sufficiently long so as to allow AS2 and AS5 to learn of an | has to be sufficiently long so as to allow AS2 and AS5 to learn of an | |||
alternate path to AS1. At this stage the backup routes can be | alternate path to AS1. At this stage the backup routes can be | |||
readvertised. | readvertised. | |||
4. Multi-Party BGP Wedgies | 4. Multi-Party BGP Wedgies | |||
This situation can be more complex when three or more parties provide | This situation can be more complex when three or more parties provide | |||
upstream transit services to an AS. An example is indicated in | upstream transit services to an AS. An example is indicated in | |||
Figure 3. | Figure 3. | |||
[Figure 3 diagram: AS3 and AS4 are peers; AS3 is the provider of both AS2 and AS5, which peer with each other; AS2, AS5 and AS4 are all providers of AS1, with the AS2 link to AS1 labelled "backup" and the AS4 link to AS1 labelled "primary"] | [Figure 3 diagram: AS3 and AS4 are peers; AS3 is the provider of both AS2 and AS5, which peer with each other; AS2, AS5 and AS4 are all providers of AS1, with the AS2 link to AS1 labelled "backup service" and the AS4 link to AS1 labelled "primary service"] | |||
Figure 3 | Figure 3 | |||
In this example the intended state is that AS2 and AS5 are both | In this example the intended state is that AS2 and AS5 are both | |||
backup providers, and AS4 is the primary provider. When the link | backup providers to AS1, and AS4 is the primary provider. When the | |||
between AS1 and AS4 breaks and is subsequently restored, AS3 will | link between AS1 and AS4 breaks and is subsequently restored, AS3 | |||
continue to direct traffic to AS1 via AS2 or AS5. In this case a | will continue to direct traffic to AS1 via AS2 or AS5. In this case | |||
single reset of the link between AS2 and AS1 will not restore the | a single reset of the link between AS2 and AS1 will not restore the | |||
original intended BGP state, as the BGP-selected best route to AS1 | original intended BGP state, as the BGP-selected best route to AS1 | |||
will switch to AS5, and AS2 and AS3 will learn a path to AS1 via AS5. | will switch to AS5, and AS2 and AS3 will learn a path to AS1 via AS5. | |||
What AS1 is observing is incoming traffic on the backup link from | What AS1 is observing is incoming traffic on the backup link from | |||
AS2. Resetting this connection will not restore traffic back to the | AS2. Resetting this connection will not restore traffic back to the | |||
primary path, but instead will switch incoming traffic over to AS5. | primary path, but instead will switch incoming traffic over to AS5. | |||
The action required to correct the situation is to simultaneously | The action required to correct the situation is to simultaneously | |||
reset both the link to AS2, and also the link to AS5. This is not | reset both the link to AS2, and also the link to AS5. This is not | |||
necessarily an intuitive solution, as at any point in time only one | necessarily an intuitively obvious solution, as at any point in time | |||
of these links will be carrying backup traffic, yet both BGP sessions | only one of these links will be carrying backup traffic, yet both BGP | |||
need to be brought down at the same time in order to commence | sessions need to be brought down at the same time in order to | |||
restoration of the intended primary and backup state. | commence restoration of the intended primary and backup state. | |||
5. BGP and Determinism | 5. BGP and Determinism | |||
BGP does not behave deterministically in all cases, and, as a | BGP does not behave deterministically in all cases, and, as a | |||
consequence, there is intended and unintended non-determinism in BGP. | consequence, there is intended and unintended non-determinism in BGP. | |||
For example, the default final tie break in some implementations of | For example, the default final tie break in some implementations of | |||
BGP is to prefer the longest-lived route. To achieve determinism in | BGP is to prefer the longest-lived route. To achieve determinism in | |||
this last step it would be necessary to use a comparison operator | this last step it would be necessary to use a comparison operator | |||
that has a predictable outcome, such as a comparison of router | that has a predictable outcome, such as a comparison of router | |||
identifiers. This class of non-deterministic behavior is termed here | identifiers. This class of non-deterministic behavior is termed here | |||
skipping to change at page 8, line 34 | skipping to change at page 8, line 43 | |||
introduces no new factors in terms of the security and integrity of | introduces no new factors in terms of the security and integrity of | |||
inter-domain routing. | inter-domain routing. | |||
The memo illustrates that in attempting to create policy-based | The memo illustrates that in attempting to create policy-based | |||
outcomes relating to path selection for incoming traffic it is | outcomes relating to path selection for incoming traffic it is | |||
possible to generate BGP configurations where there are multiple | possible to generate BGP configurations where there are multiple | |||
stable outcomes, rather than a single outcome. Furthermore, of these | stable outcomes, rather than a single outcome. Furthermore, of these | |||
instances of multiple outcomes, there are cases where the BGP | instances of multiple outcomes, there are cases where the BGP | |||
selection of a particular outcome is not a deterministic selection. | selection of a particular outcome is not a deterministic selection. | |||
This class of behaviour may be exploitable by a hostile third party. | ||||
A common theme of BGP Wedgies is that starting from an intended or | ||||
desired forwarding state, the loss and subsequent restoration of an | ||||
eBGP peering connection can flip the network's forwarding | ||||
configuration into an unintended and potentially undesired state. | ||||
Significant administrative effort, based on BGP state and | ||||
configuration knowledge that may not be locally available, may be | ||||
required to shift the BGP forwarding configuration back to the | ||||
intended or desired forwarding state. If a hostile third party | ||||
can deliberately cause the BGP session to reset, thereby producing | ||||
the initial conditions that lead to an unintended forwarding state, | ||||
the network impacts of the resulting unintended or undesired | ||||
forwarding state may be long-lived, far outliving the temporary | ||||
interruption of connectivity that triggered the condition. If these | ||||
impacts, including potential issues of increased cost, reduction of | ||||
available bandwidth, increases in overall latency or degradation of | ||||
service reliability, are significant, then disrupting a BGP session | ||||
could represent an attractive attack vector to a hostile party. | ||||
7. IANA Considerations | 7. IANA Considerations | |||
[Note to RFC Editor: Please remove this section prior to publication] | [Note to RFC Editor: Please remove this section prior to publication] | |||
This document has no associated IANA actions or considerations. | This document has no associated IANA actions or considerations. | |||
8. References | 8. References | |||
8.1 Normative References | 8.1 Normative References | |||
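
The Figure 1 sequence in Section 3 (primary announced first, primary link fails, primary link is restored) can be replayed with a small simulation. This is a minimal sketch under stated assumptions, not a model defined by the memo: the class `Net`, the helper `next_hops`, the local-preference values (100 for customer, 90 for peer, 80 for provider, 50 for a route AS1 tags "backup"), the export rule (customer-learned routes go to every neighbour, other routes only to customers), FIFO processing of updates and the lowest-neighbour final tie-break are all illustrative choices. It shows the same configuration settling into two different stable states depending on event history.

```python
from collections import deque

CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"
# Assumed local preferences; AS1's "backup" tag ranks below any provider route.
LOCAL_PREF = {CUSTOMER: 100, PEER: 90, PROVIDER: 80}
BACKUP_PREF = 50
# Figure 1 topology: LINKS[a][b] is b's relationship to a.
LINKS = {
    1: {2: PROVIDER, 4: PROVIDER},
    2: {1: CUSTOMER, 3: PROVIDER},
    3: {2: CUSTOMER, 4: PEER},
    4: {1: CUSTOMER, 3: PEER},
}


class Net:
    """Hypothetical toy path-vector simulator for a single AS1 prefix."""

    def __init__(self):
        self.rib_in = {asn: {} for asn in LINKS}  # asn -> {neighbour: (path, backup)}
        self.best = {asn: None for asn in LINKS}  # asn -> (neighbour, path, backup)
        self.up = {frozenset((a, b)) for a in LINKS for b in LINKS[a]}
        self.queue = deque()  # FIFO of (sender, receiver, path or None, backup)

    def rank(self, asn, nbr, path, backup):
        pref = BACKUP_PREF if backup else LOCAL_PREF[LINKS[asn][nbr]]
        # Local preference, then shorter AS path, then lowest neighbour AS.
        return (pref, -len(path), -nbr)

    def advertise(self, asn, nbr):
        if asn == 1:
            return  # AS1 only originates, via originate()
        best = self.best[asn]
        # Customer-learned routes go to all neighbours, others only to customers;
        # in every other case a withdrawal (None) is sent.
        ok = best is not None and CUSTOMER in (LINKS[asn][best[0]], LINKS[asn][nbr])
        self.queue.append((asn, nbr, (asn,) + best[1] if ok else None, False))

    def decide(self, asn):
        if asn == 1:
            return
        cands = [(self.rank(asn, n, p, b), n, p, b)
                 for n, (p, b) in self.rib_in[asn].items()]
        new = max(cands)[1:] if cands else None
        if new != self.best[asn]:
            self.best[asn] = new
            for nbr in LINKS[asn]:
                self.advertise(asn, nbr)

    def run(self):
        while self.queue:
            sender, asn, path, backup = self.queue.popleft()
            if frozenset((sender, asn)) not in self.up:
                continue  # session is down: the message is lost
            if path is None or asn in path:  # withdrawal, or AS-path loop
                self.rib_in[asn].pop(sender, None)
            else:
                self.rib_in[asn][sender] = (path, backup)
            self.decide(asn)

    def originate(self, provider, backup=False):
        self.queue.append((1, provider, (1,), backup))
        self.run()

    def fail(self, provider):
        self.up.discard(frozenset((1, provider)))
        self.rib_in[provider].pop(1, None)
        self.rib_in[1].pop(provider, None)
        self.decide(provider)
        self.run()

    def restore(self, provider, backup=False):
        self.up.add(frozenset((1, provider)))
        self.advertise(provider, 1)
        self.originate(provider, backup)


def next_hops(net):
    return {asn: net.best[asn][0] for asn in (2, 3, 4) if net.best[asn]}


net = Net()
net.originate(4)               # primary announced first ...
net.originate(2, backup=True)  # ... then the backup
print("initial :", next_hops(net))  # initial : {2: 3, 3: 4, 4: 1}
net.fail(4)
print("failed  :", next_hops(net))  # failed  : {2: 1, 3: 2, 4: 3}
net.restore(4)
print("restored:", next_hops(net))  # restored: {2: 1, 3: 2, 4: 1}
net.fail(2)
net.restore(2, backup=True)
print("reset   :", next_hops(net))  # reset   : {2: 3, 3: 4, 4: 1}
```

After the primary link is restored, AS3 keeps forwarding via its customer AS2 while AS4 again uses the primary link, which is the wedged state the text describes; only bringing down the AS1-AS2 session returns the model to the intended state.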
End of changes. | ||||
This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/ |
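
The Section 5 point about the final tie-break can be illustrated in a few lines. In this sketch the names `Route`, `oldest_route` and `lowest_router_id` are hypothetical, and documentation addresses stand in for router identifiers. Two routes that are equal on every earlier decision step are received in both possible orders: preferring the longest-lived route picks whichever arrived first, while comparing router identifiers picks the same route either way.

```python
import ipaddress
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Route:
    router_id: str  # identifier of the advertising router (illustrative)
    arrival: int    # receive order: a lower value means an older route


def oldest_route(routes):
    # Prefer the longest-lived route: the outcome depends on arrival history.
    return min(routes, key=lambda r: r.arrival)


def lowest_router_id(routes):
    # Prefer the lowest router identifier: the outcome is order-independent.
    return min(routes, key=lambda r: ipaddress.IPv4Address(r.router_id))


for order in permutations(["192.0.2.1", "192.0.2.2"]):
    routes = [Route(rid, t) for t, rid in enumerate(order)]
    print(order, oldest_route(routes).router_id, lowest_router_id(routes).router_id)
# ('192.0.2.1', '192.0.2.2') 192.0.2.1 192.0.2.1
# ('192.0.2.2', '192.0.2.1') 192.0.2.2 192.0.2.1
```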