draft-ietf-grow-bgp-gshut-10.txt | draft-ietf-grow-bgp-gshut-11.txt | |||
---|---|---|---|---|
Network Working Group P. Francois, Ed. | Network Working Group P. Francois, Ed. | |||
Internet-Draft Individual Contributor | Internet-Draft Individual Contributor | |||
Intended status: Informational B. Decraene, Ed. | Intended status: Informational B. Decraene, Ed. | |||
Expires: January 28, 2018 Orange | Expires: March 24, 2018 Orange | |||
C. Pelsser | C. Pelsser | |||
Strasbourg University | Strasbourg University | |||
K. Patel | K. Patel | |||
Arrcus, Inc. | Arrcus, Inc. | |||
C. Filsfils | C. Filsfils | |||
Cisco Systems | Cisco Systems | |||
July 27, 2017 | September 20, 2017 | |||
Graceful BGP session shutdown | Graceful BGP session shutdown | |||
draft-ietf-grow-bgp-gshut-10 | draft-ietf-grow-bgp-gshut-11 | |||
Abstract | Abstract | |||
This draft describes operational procedures aimed at reducing the | This draft standardizes a new well-known BGP community | |||
amount of traffic lost during planned maintenances of routers or | GRACEFUL_SHUTDOWN to signal the graceful shutdown of paths. This | |||
links, involving the shutdown of BGP peering sessions. It defines a | draft also describes operational procedures which use this community | |||
well-known BGP community, called GRACEFUL_SHUTDOWN, to signal the | to reduce the amount of traffic lost when BGP peering sessions are | |||
graceful shutdown of paths. | about to be shut down deliberately, e.g. for planned maintenance. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 28, 2018. | This Internet-Draft will expire on March 24, 2018. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Packet loss upon manual EBGP session shutdown . . . . . . . . 4 | 3. Packet loss upon manual EBGP session shutdown . . . . . . . . 3 | |||
4. Practices to avoid packet losses . . . . . . . . . . . . . . 4 | 4. EBGP graceful shutdown procedure . . . . . . . . . . . . . . 4 | |||
4.1. Improving availability of alternate paths . . . . . . . . 4 | 4.1. Pre-configuration . . . . . . . . . . . . . . . . . . . . 4 | |||
4.2. Make before break convergence: graceful shutdown . . . . 5 | 4.2. Operations at maintenance time . . . . . . . . . . . . . 4 | |||
4.3. Forwarding modes and transient forwarding loops during | 4.3. BGP implementation support for graceful shutdown . . . . 5 | |||
convergence . . . . . . . . . . . . . . . . . . . . . . . 5 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 | |||
5. EBGP graceful shutdown procedure . . . . . . . . . . . . . . 5 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | |||
5.1. Pre-configuration . . . . . . . . . . . . . . . . . . . . 5 | 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
5.2. Operations at maintenance time . . . . . . . . . . . . . 6 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
5.3. BGP implementation support for graceful shutdown . . . . 6 | 8.1. Normative References . . . . . . . . . . . . . . . . . . 5 | |||
6. Beyond EBGP graceful shutdown . . . . . . . . . . . . . . . . 7 | 8.2. Informative References . . . . . . . . . . . . . . . . . 6 | |||
6.1. IBGP graceful shutdown . . . . . . . . . . . . . . . . . 7 | Appendix A. Alternative techniques with limited applicability . 6 | |||
6.2. EBGP session establishment . . . . . . . . . . . . . . . 7 | A.1. Multi Exit Discriminator tweaking . . . . . . . . . . . . 6 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | A.2. IGP distance Poisoning . . . . . . . . . . . . . . . . . 7 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 9 | Appendix B. Configuration Examples . . . . . . . . . . . . . . . 7 | |||
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 9 | B.1. Cisco IOS XR . . . . . . . . . . . . . . . . . . . . . . 7 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 | B.2. BIRD . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 9 | B.3. OpenBGPD . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 9 | Appendix C. Beyond EBGP graceful shutdown . . . . . . . . . . . 8 | |||
Appendix A. Alternative techniques with limited applicability . 10 | C.1. IBGP graceful shutdown . . . . . . . . . . . . . . . . . 8 | |||
A.1. Multi Exit Discriminator tweaking . . . . . . . . . . . . 10 | C.2. EBGP session establishment . . . . . . . . . . . . . . . 8 | |||
A.2. IGP distance Poisoning . . . . . . . . . . . . . . . . . 10 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
Appendix B. Configuration Examples . . . . . . . . . . . . . . . 10 | ||||
B.1. Cisco IOS XR . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
B.2. BIRD . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
B.3. OpenBGPD . . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 12 | ||||
1. Introduction | 1. Introduction | |||
Routing changes in BGP can be caused by planned maintenance | Routing changes in BGP can be caused by planned maintenance | |||
operations. This document discusses operational procedures to be | operations. This document defines a well-known community [RFC1997], | |||
applied in order to reduce or eliminate loss of packets during the | called GRACEFUL_SHUTDOWN, for the purpose of reducing the management | |||
maintenance. These losses come from the transient lack of | overhead of gracefully shutting down BGP sessions. The well-known | |||
reachability during the BGP convergence following the shutdown of an | community allows implementers to provide an automated graceful | |||
EBGP peering session between two Autonomous System Border Routers | shutdown mechanism that does not require any router reconfiguration | |||
(ASBR). | at maintenance time. | |||
This document discusses operational procedures to be applied in order | ||||
to reduce or eliminate loss of packets during a maintenance. Loss | ||||
comes from transient lack of reachability during BGP convergence | ||||
which follows the shutdown of an EBGP peering session between two | ||||
Autonomous System Border Routers (ASBR). | ||||
This document presents procedures for the cases where the forwarding | This document presents procedures for the cases where the forwarding | |||
plane is impacted by the maintenance, hence when the use of Graceful | plane is impacted by the maintenance, hence when the use of Graceful | |||
Restart does not apply. | Restart does not apply. | |||
The procedures described in this document can be applied to reduce or | The procedures described in this document can be applied to reduce or | |||
avoid packet loss for outbound and inbound traffic flows initially | avoid packet loss for outbound and inbound traffic flows initially | |||
forwarded along the peering link to be shut down. These procedures | forwarded along the peering link to be shut down. These procedures | |||
trigger, in both ASes, rerouting to the alternate path if one exists | trigger, in both ASes, rerouting to alternate paths if they exist | |||
within the AS, while allowing the use of the old path until alternate | within the AS, while allowing the use of the old path until alternate | |||
ones are learned. This ensures that routers always have a valid | ones are learned. This ensures that routers always have a valid | |||
route available during the convergence process. | route available during the convergence process. | |||
The goal of the document is to meet the requirements described in | The goal of the document is to meet the requirements described in | |||
[RFC6198] at best, without changing the BGP protocol. | [RFC6198] at best, without changing the BGP protocol. | |||
This document defines a well-known community [RFC1997], called | ||||
GRACEFUL_SHUTDOWN, for the purpose of reducing the management | ||||
overhead of gracefully shutting down BGP sessions. The well-known | ||||
community allows implementers to provide an automated graceful | ||||
shutdown mechanism that does not require any router reconfiguration | ||||
at maintenance time. | ||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
2. Terminology | 2. Terminology | |||
graceful shutdown initiator: a router on which the session shutdown | graceful shutdown initiator: a router on which the session shutdown | |||
is performed for the maintenance. | is performed for the maintenance. | |||
graceful shutdown receiver: a router that has a BGP session, to be | graceful shutdown receiver: a router that has a BGP session, to be | |||
shut down, with the graceful shutdown initiator. | shut down, with the graceful shutdown initiator. | |||
Initiator AS: the Autonomous System of the graceful shutdown | ||||
initiator. | ||||
Receiver AS: the Autonomous System of the graceful shutdown receiver. | ||||
Loss of Connectivity (LoC): the state when a router has no path toward | ||||
an affected prefix. | ||||
3. Packet loss upon manual EBGP session shutdown | 3. Packet loss upon manual EBGP session shutdown | |||
Packets can be lost during a manual shutdown of an EBGP session for | Packets can be lost during the BGP convergence following a manual | |||
two reasons. | shutdown of an EBGP session for two reasons. | |||
First, routers involved in the convergence process can transiently | First, some routers can have no path toward an affected prefix, and | |||
lack paths toward an affected prefix, and drop traffic destined to | drop traffic destined to this prefix. This is because alternate | |||
this prefix. This is because alternate paths can be hidden by nodes | paths can be hidden by nodes of an AS. This happens when [RFC7911] | |||
of an AS. This happens when the paths are not selected as best by | is not used and the paths are not selected as best by the ASBR that | |||
the ASBR that receives them on an EBGP session, or by Route Reflectors | receives them on an EBGP session, or by Route Reflectors that do not | |||
that do not propagate them further in the IBGP topology because they | propagate them further in the IBGP topology because they do not | |||
do not select them as best. | select them as best. | |||
Second, within the AS, the FIB of routers can be transiently | Second, the FIB can be inconsistent between routers within the AS, | |||
inconsistent during the BGP convergence and packets toward affected | and packets toward affected prefixes can loop and be dropped unless | |||
prefixes can loop and be dropped. Note that these loops only happen | encapsulation is used within the AS. | |||
when ASBR-to-ASBR encapsulation is not used within the AS. | ||||
This document only addresses the first reason. | This document only addresses the first reason. | |||
4. Practices to avoid packet losses | 4. EBGP graceful shutdown procedure | |||
This section describes means for an ISP to reduce the transient loss | ||||
of packets upon a manual shutdown of a BGP session. | ||||
4.1. Improving availability of alternate paths | ||||
All solutions that increase the availability of alternate BGP paths | ||||
at routers performing forwarding lookups from BGP routes such as | ||||
[I-D.ietf-idr-best-external] and [RFC7911] help in reducing the LoC | ||||
bound with the shutdown of EBGP sessions. | ||||
Any such solution where, at any single step of the convergence | ||||
process following the EBGP session shutdown, a BGP router does not | ||||
receive a message withdrawing the only path it currently knows for a | ||||
given NLRI, allows for a simplified graceful shutdown procedure. | ||||
Note that the LoC for the inbound traffic of graceful shutdown | ||||
initiator, due to the lack of an alternate path on the graceful | ||||
shutdown receiver is not under the control of the Initiator AS. The | ||||
part of the procedure aimed at avoiding LoC for incoming traffic | ||||
should thus be applied even if no LoC are expected for the outgoing | ||||
traffic. | ||||
4.2. Make before break convergence: graceful shutdown | This section describes configurations and actions to be performed for | |||
the graceful shutdown of EBGP peering links. | ||||
The goal of this procedure is to retain the paths to be shut down | The goal of this procedure is to retain the paths to be shut down | |||
between the peers, but with a lower LOCAL_PREF value, allowing the | between the peers, but with a lower LOCAL_PREF value, allowing the | |||
paths to remain in use while alternate paths are selected and | paths to remain in use while alternate paths are selected and | |||
propagated, rather than simply withdrawing the paths. The LOCAL_PREF | propagated, rather than simply withdrawing the paths. The LOCAL_PREF | |||
value must be lower than the one of the alternate path. 0 being the | value must be lower than the one of the alternate path. 0 being the | |||
lowest value, it can be used in all cases, except if it already has a | lowest value, it can be used in all cases, except if it already has a | |||
special meaning within the AS. | special meaning within the AS. | |||
Section 5 describes configurations and actions to be performed for | 4.1. Pre-configuration | |||
the graceful shutdown of BGP sessions. | ||||
4.3. Forwarding modes and transient forwarding loops during convergence | ||||
The graceful shutdown procedure or the solutions improving the | ||||
availability of alternate paths, do not change the fact that BGP | ||||
convergence and the subsequent FIB updates are run independently on | ||||
each router of the ASes. If the AS applying the solution does not | ||||
rely on encapsulation to forward packets from the Ingress Border | ||||
Router to the Egress Border Router, then transient forwarding loops | ||||
and consequent packet losses can occur during the convergence | ||||
process. If zero LoC is required, encapsulation is required between | ||||
ASBRs of the AS. | ||||
5. EBGP graceful shutdown procedure | ||||
This section describes configurations and actions to be performed for | ||||
the graceful shutdown of EBGP peering links. | ||||
5.1. Pre-configuration | ||||
On each ASBR supporting the graceful shutdown receiver procedure, an | On each ASBR supporting the graceful shutdown receiver procedure, an | |||
inbound BGP route policy is applied on all EBGP sessions of the ASBR, | inbound BGP route policy is applied on all EBGP sessions of the ASBR, | |||
that: | that: | |||
o matches the GRACEFUL_SHUTDOWN community | o matches the GRACEFUL_SHUTDOWN community. | |||
o sets the LOCAL_PREF attribute of the paths tagged with the | o sets the LOCAL_PREF attribute of the paths tagged with the | |||
GRACEFUL_SHUTDOWN community to a low value | GRACEFUL_SHUTDOWN community to a low value. | |||
Note that in the case where an AS is aggregating multiple routes | ||||
under a covering prefix, it is recommended to filter out the | ||||
GRACEFUL_SHUTDOWN community from the resulting aggregate BGP route. | ||||
By doing so, the setting of the GRACEFUL_SHUTDOWN community on one of | ||||
the aggregated routes will not let the entire aggregate inherit the | ||||
community. Not doing so would let the entire aggregate undergo the | ||||
graceful shutdown behavior. | ||||
5.2. Operations at maintenance time | 4.2. Operations at maintenance time | |||
On the graceful shutdown initiator, upon maintenance time, it is | On the graceful shutdown initiator, at maintenance time, the | |||
required to: | operator: | |||
o apply an outbound BGP route policy on the EBGP session to be | o applies an outbound BGP route policy on the EBGP session to be | |||
shut down. This policy tags the paths propagated over the session | shut down. This policy tags the paths propagated over the session | |||
with the GRACEFUL_SHUTDOWN community. This will trigger the BGP | with the GRACEFUL_SHUTDOWN community. This will trigger the BGP | |||
implementation to re-advertise all active routes previously | implementation to re-advertise all active routes previously | |||
advertised, and tag them with the GRACEFUL_SHUTDOWN community. | advertised, and tag them with the GRACEFUL_SHUTDOWN community. | |||
o apply an inbound BGP route policy on the EBGP session to be | o applies an inbound BGP route policy on the EBGP session to be | |||
shut down. This policy tags the paths received over the session | shut down. This policy tags the paths received over the session | |||
with the GRACEFUL_SHUTDOWN community and sets LOCAL_PREF to a low | with the GRACEFUL_SHUTDOWN community and sets LOCAL_PREF to a low | |||
value. | value. | |||
o wait for convergence to happen. | o waits for route readvertisement over the EBGP session, and BGP | |||
routing convergence on both ASBRs. | ||||
o shut down the EBGP session, optionally using | o shuts down the EBGP session, optionally using | |||
[I-D.ietf-idr-shutdown] to communicate the reason for the shutdown. | [I-D.ietf-idr-shutdown] to communicate the reason for the shutdown. | |||
In the case of a shutdown of the whole router, in addition to the | In the case of a shutdown of the whole router, in addition to the | |||
graceful shutdown of all EBGP sessions, there is a need to graceful | graceful shutdown of all EBGP sessions, there is a need to gracefully | |||
shut down the routes originated by this router (e.g., BGP aggregates | shut down the routes originated by this router (e.g., BGP aggregates | |||
redistributed from other protocols, including static routes). This | redistributed from other protocols, including static routes). This | |||
can be performed by tagging such routes with the GRACEFUL_SHUTDOWN | can be performed by tagging these routes with the GRACEFUL_SHUTDOWN | |||
community and setting LOCAL_PREF to a low value. | community and setting LOCAL_PREF to a low value. | |||
5.3. BGP implementation support for graceful shutdown | 4.3. BGP implementation support for graceful shutdown | |||
A BGP router implementation MAY provide features aimed at automating | ||||
the application of the graceful shutdown procedures described above. | ||||
Upon a session shutdown specified as graceful by the operator, a BGP | ||||
implementation supporting a graceful shutdown feature SHOULD: | ||||
1. Update all the paths propagated over the corresponding EBGP | ||||
session, tagging the GRACEFUL_SHUTDOWN community to them. Any | ||||
subsequent update sent over the session being gracefully shut | ||||
down SHOULD be tagged with the GRACEFUL_SHUTDOWN community. | ||||
2. Lower the LOCAL_PREF value of the paths received over the EBGP | ||||
session being shut down and set the GRACEFUL_SHUTDOWN community. | ||||
3. Optionally shut down the session after a configured time. | ||||
4. Prevent the GRACEFUL_SHUTDOWN community from being inherited by a | ||||
path that would aggregate some paths tagged with the | ||||
GRACEFUL_SHUTDOWN community. This behavior prevents the graceful | ||||
shutdown procedure from being applied to the aggregate upon the | ||||
graceful shutdown of one of its covered prefixes. | ||||
A BGP implementation supporting a graceful shutdown feature SHOULD | ||||
also automatically install the BGP policies that are supposed to be | ||||
configured, as described in Section 5.1 for sessions over which | ||||
graceful shutdown is to be supported. | ||||
6. Beyond EBGP graceful shutdown | ||||
6.1. IBGP graceful shutdown | ||||
For the shutdown of an IBGP session, provided the IBGP topology is | ||||
viable after the maintenance of the session, i.e., if all BGP speakers | ||||
of the AS have an IBGP signaling path for all prefixes advertised on | ||||
this graceful shutdown IBGP session, then the shutdown of an IBGP | ||||
session does not lead to transient unreachability. As a consequence, | ||||
no specific graceful shutdown action is required. | ||||
6.2. EBGP session establishment | ||||
We identify two potential causes for transient packet losses upon the | ||||
establishment of an EBGP session. The first one is local to the | ||||
startup initiator, the second one is due to the BGP convergence | ||||
following the injection of new best paths within the IBGP topology. | ||||
6.2.1. Unreachability local to the ASBR | ||||
An ASBR that selects as best a path received over a newly established | ||||
EBGP session may transiently drop traffic. This can typically happen | ||||
when the NEXT_HOP attribute differs from the IP address of the EBGP | ||||
peer, and the receiving ASBR has not yet resolved the MAC address | ||||
associated with the IP address of that "third party" NEXT_HOP. | ||||
A BGP speaker implementation may avoid such losses by ensuring that | ||||
"third party" NEXT_HOPs are resolved before installing paths using | ||||
these in the RIB. | ||||
Alternatively, the operator (script) may ping third party NEXT_HOPs | ||||
that are expected to be used before establishing the session. By | ||||
proceeding like this, the MAC addresses associated with these third | ||||
party NEXT_HOPs are resolved by the startup initiator. | ||||
6.2.2. IBGP convergence | ||||
Corner cases leading to LoC can occur during the establishment of an | ||||
EBGP session. | ||||
A typical example for such transient unreachability for a given | ||||
prefix is the following: | ||||
Let's consider 3 route reflectors RR1, RR2, RR3. There is a full | ||||
mesh of IBGP sessions between them. | ||||
1. RR1 is initially advertising the current best path to the | ||||
members of its IBGP RR full-mesh. It propagated that path within | ||||
its RR full-mesh. RR2 knows only that path toward the prefix. | ||||
2. RR3 receives a new best path originated by the startup | ||||
initiator, being one of its RR clients. RR3 selects it as best, | ||||
and propagates an UPDATE within its RR full-mesh, i.e., to RR1 and | ||||
RR2. | ||||
3. RR1 receives that path, reruns its decision process, and picks | ||||
this new path as best. As a result, RR1 withdraws its previously | ||||
announced best-path on the IBGP sessions of its RR full-mesh. | ||||
4. If, for any reason, RR2 processes the withdraw generated in | ||||
step 3, before processing the update generated in step 2, RR2 | ||||
transiently suffers from unreachability for the affected prefix. | ||||
The use of [I-D.ietf-idr-best-external] among the RR of the IBGP | ||||
full-mesh can solve these corner cases by ensuring that within an AS, | ||||
the advertisement of a new route is not translated into the withdraw | ||||
of a former route. | ||||
Indeed, "best-external" ensures that an ASBR does not withdraw a | BGP implementers SHOULD provide configuration knobs that utilize the | |||
previously advertised (EBGP) path when it receives an additional, | GRACEFUL_SHUTDOWN community to drain BGP neighbors in preparation for | |||
preferred path over an IBGP session. Also, "best-intra-cluster" | impending neighbor shutdown. Implementation details are outside the | |||
ensures that a RR does not withdraw a previously advertised (IBGP) | scope of this document. | |||
path to its non clients (e.g. other RRs in a mesh of RR) when it | ||||
receives a new, preferred path over an IBGP session. | ||||
7. IANA Considerations | 5. IANA Considerations | |||
The IANA has assigned the community value 0xFFFF0000 to the planned- | The IANA has assigned the community value 0xFFFF0000 to the planned- | |||
shut community in the "BGP Well-known Communities" registry. IANA is | shut community in the "BGP Well-known Communities" registry. IANA is | |||
requested to change the name planned-shut to GRACEFUL_SHUTDOWN and | requested to change the name planned-shut to GRACEFUL_SHUTDOWN and | |||
set this document as the reference. | set this document as the reference. | |||
8. Security Considerations | 6. Security Considerations | |||
By providing the graceful shutdown service to a neighboring AS, an | By providing the graceful shutdown service to a neighboring AS, an | |||
ISP provides means to this neighbor and possibly its downstream ASes | ISP provides means to this neighbor and possibly its downstream ASes | |||
to lower the LOCAL_PREF value assigned to the paths received from | to lower the LOCAL_PREF value assigned to the paths received from | |||
this neighbor. | this neighbor. | |||
The neighbor could abuse the technique and do inbound traffic | The neighbor could abuse the technique and do inbound traffic | |||
engineering by declaring some prefixes as undergoing a maintenance so | engineering by declaring some prefixes as undergoing a maintenance so | |||
as to switch traffic to another peering link. | as to switch traffic to another peering link. | |||
If this behavior is not tolerated by the ISP, it SHOULD monitor the | If this behavior is not tolerated by the ISP, it SHOULD monitor the | |||
use of the graceful shutdown community. | use of the graceful shutdown community. | |||
9. Acknowledgments | 7. Acknowledgments | |||
The authors wish to thank Olivier Bonaventure, Pradosh Mohapatra, Job | The authors wish to thank Olivier Bonaventure, Pradosh Mohapatra, Job | |||
Snijders and John Heasley for their useful comments. | Snijders, John Heasley, and Christopher Morrow for their useful | |||
comments. | ||||
10. References | 8. References | |||
10.1. Normative References | 8.1. Normative References | |||
[RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities | [RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities | |||
Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, | Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996, | |||
<http://www.rfc-editor.org/info/rfc1997>. | <https://www.rfc-editor.org/info/rfc1997>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<http://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC6198] Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., | [RFC6198] Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., | |||
Elizondo Armengol, A., and T. Takeda, "Requirements for | Elizondo Armengol, A., and T. Takeda, "Requirements for | |||
the Graceful Shutdown of BGP Sessions", RFC 6198, | the Graceful Shutdown of BGP Sessions", RFC 6198, | |||
DOI 10.17487/RFC6198, April 2011, | DOI 10.17487/RFC6198, April 2011, | |||
<http://www.rfc-editor.org/info/rfc6198>. | <https://www.rfc-editor.org/info/rfc6198>. | |||
10.2. Informative References | 8.2. Informative References | |||
[I-D.ietf-idr-best-external] | [I-D.ietf-idr-best-external] | |||
Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H. | Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H. | |||
Gredler, "Advertisement of the best external route in | Gredler, "Advertisement of the best external route in | |||
BGP", draft-ietf-idr-best-external-05 (work in progress), | BGP", draft-ietf-idr-best-external-05 (work in progress), | |||
January 2012. | January 2012. | |||
[I-D.ietf-idr-shutdown] | [I-D.ietf-idr-shutdown] | |||
Snijders, J., Heitz, J., and J. Scudder, "BGP | Snijders, J., Heitz, J., and J. Scudder, "BGP | |||
Administrative Shutdown Communication", draft-ietf-idr- | Administrative Shutdown Communication", draft-ietf-idr- | |||
shutdown-10 (work in progress), June 2017. | shutdown-10 (work in progress), June 2017. | |||
[RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, | [RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, | |||
"Advertisement of Multiple Paths in BGP", RFC 7911, | "Advertisement of Multiple Paths in BGP", RFC 7911, | |||
DOI 10.17487/RFC7911, July 2016, | DOI 10.17487/RFC7911, July 2016, | |||
<http://www.rfc-editor.org/info/rfc7911>. | <https://www.rfc-editor.org/info/rfc7911>. | |||
Appendix A. Alternative techniques with limited applicability | Appendix A. Alternative techniques with limited applicability | |||
A few alternative techniques have been considered to provide graceful | A few alternative techniques have been considered to provide graceful | |||
shutdown capabilities but have been rejected due to their limited | shutdown capabilities but have been rejected due to their limited | |||
applicability. This section describes them for possible reference. | applicability. This section describes them for possible reference. | |||
A.1. Multi Exit Discriminator tweaking | A.1. Multi Exit Discriminator tweaking | |||
The MED attribute of the paths to be avoided can be increased so as | The MED attribute of the paths to be avoided can be increased so as | |||
skipping to change at page 12, line 4 | skipping to change at page 8, line 25 | |||
honor_graceful_shutdown(); | honor_graceful_shutdown(); | |||
} | } | |||
protocol bgp peer_64497_1 { | protocol bgp peer_64497_1 { | |||
neighbor 2001:db8:1:2::1 as 64497; | neighbor 2001:db8:1:2::1 as 64497; | |||
local as 64496; | local as 64496; | |||
import keep filtered; | import keep filtered; | |||
import filter AS64497_ebgp_inbound; | import filter AS64497_ebgp_inbound; | |||
} | } | |||
B.3. OpenBGPD | B.3. OpenBGPD | |||
AS 64496 | AS 64496 | |||
router-id 192.0.2.1 | router-id 192.0.2.1 | |||
neighbor 2001:db8:1:2::1 { | neighbor 2001:db8:1:2::1 { | |||
remote-as 64497 | remote-as 64497 | |||
} | } | |||
# normally this policy would contain much more | # normally this policy would contain much more | |||
match from any community GRACEFUL_SHUTDOWN set { localpref 0 } | match from any community GRACEFUL_SHUTDOWN set { localpref 0 } | |||
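The receive-side policy that these configuration examples express can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Path` class and its fields are hypothetical, the function name merely echoes the BIRD example, and the community value 65535:0 is assumed to be the one assigned to GRACEFUL_SHUTDOWN.

```python
from dataclasses import dataclass, field

# Assumed well-known community value for GRACEFUL_SHUTDOWN (65535:0).
GRACEFUL_SHUTDOWN = (65535, 0)


@dataclass
class Path:
    """Hypothetical, minimal model of a received BGP path."""
    prefix: str
    local_pref: int = 100
    communities: set = field(default_factory=set)


def honor_graceful_shutdown(path: Path) -> Path:
    """Set LOCAL_PREF to 0 when the path carries GRACEFUL_SHUTDOWN.

    Paths without the community are returned unchanged.
    """
    if GRACEFUL_SHUTDOWN in path.communities:
        path.local_pref = 0
    return path
```

A path tagged with the community then loses the LOCAL_PREF comparison against any alternate path, so traffic moves away before the session goes down.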
Appendix C. Beyond EBGP graceful shutdown | ||||
C.1. IBGP graceful shutdown | ||||
For the shutdown of an IBGP session, provided the IBGP topology is | ||||
viable after the maintenance of the session, i.e., if all BGP speakers | ||||
of the AS have an IBGP signaling path for all prefixes advertised on | ||||
this graceful shutdown IBGP session, then the shutdown of an IBGP | ||||
session does not lead to transient unreachability. As a consequence, | ||||
no specific graceful shutdown action is required. | ||||
C.2. EBGP session establishment | ||||
We identify two potential causes for transient packet losses upon the | ||||
establishment of an EBGP session. The first one is local to the | ||||
startup initiator, the second one is due to the BGP convergence | ||||
following the injection of new best paths within the IBGP topology. | ||||
C.2.1. Unreachability local to the ASBR | ||||
An ASBR that selects as best a path received over a newly established | ||||
EBGP session may transiently drop traffic. This can typically happen | ||||
when the NEXT_HOP attribute differs from the IP address of the EBGP | ||||
peer, and the receiving ASBR has not yet resolved the MAC address | ||||
associated with the IP address of that "third party" NEXT_HOP. | ||||
A BGP speaker implementation may avoid such losses by ensuring that | ||||
"third party" NEXT_HOPs are resolved before installing paths using | ||||
these in the RIB. | ||||
Alternatively, the operator (script) may ping third party NEXT_HOPs | ||||
that are expected to be used before establishing the session. By | ||||
proceeding like this, the MAC addresses associated with these third | ||||
party NEXT_HOPs are resolved by the startup initiator. | ||||
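The operator-script alternative could look roughly like the following Python sketch. It is not taken from the draft: the function names are hypothetical, the list of NEXT_HOP addresses is assumed to be known to the operator in advance, and the script assumes a system `ping` command that accepts `-c <count>`.

```python
import subprocess


def ping_command(next_hop: str, count: int = 1) -> list:
    """Build the ping invocation for one third-party NEXT_HOP."""
    return ["ping", "-c", str(count), next_hop]


def warm_next_hops(next_hops, runner=subprocess.run):
    """Ping each NEXT_HOP so that its link-layer address gets resolved.

    Returns a dict mapping each address to True if ping exited with 0.
    The runner parameter exists only so the function can be exercised
    without sending packets.
    """
    results = {}
    for next_hop in next_hops:
        completed = runner(
            ping_command(next_hop),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[next_hop] = completed.returncode == 0
    return results
```

Running `warm_next_hops(["192.0.2.10", "192.0.2.11"])` on the startup initiator before bringing up the session would populate the neighbor cache for those example addresses.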
C.2.2. IBGP convergence | ||||
During the establishment of an EBGP session, in some corner cases a | ||||
router may have no path toward an affected prefix, leading to loss of | ||||
connectivity. | ||||
A typical example for such transient unreachability for a given | ||||
prefix is the following: | ||||
Let's consider 3 route reflectors RR1, RR2, RR3. There is a full | ||||
mesh of IBGP sessions between them. | ||||
1. RR1 is initially advertising the current best path to the | ||||
members of its IBGP RR full-mesh. It propagated that path within | ||||
its RR full-mesh. RR2 knows only that path toward the prefix. | ||||
2. RR3 receives a new best path originated by the startup | ||||
initiator, being one of its RR clients. RR3 selects it as best, | ||||
and propagates an UPDATE within its RR full-mesh, i.e., to RR1 and | ||||
RR2. | ||||
3. RR1 receives that path, reruns its decision process, and picks | ||||
this new path as best. As a result, RR1 withdraws its previously | ||||
announced best-path on the IBGP sessions of its RR full-mesh. | ||||
4. If, for any reason, RR2 processes the withdraw generated in | ||||
step 3, before processing the update generated in step 2, RR2 | ||||
transiently suffers from unreachability for the affected prefix. | ||||
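The ordering dependency in these steps can be illustrated with a small Python sketch. It models only the set of paths held by a route reflector that initially knows nothing but the path advertised by RR1 (RR2 in the example); the event tuples and path labels are hypothetical and no real BGP processing is performed.

```python
def replay(events, initial_paths):
    """Apply update/withdraw events in order to a set of known paths.

    Returns True if the set of paths was empty at any point, i.e. the
    router transiently had no route toward the prefix.
    """
    paths = set(initial_paths)
    was_unreachable = False
    for kind, path in events:
        if kind == "update":
            paths.add(path)
        elif kind == "withdraw":
            paths.discard(path)
        if not paths:
            was_unreachable = True
    return was_unreachable


# Hypothetical labels for the two messages in the example.
update_from_rr3 = ("update", "new-path-via-RR3")        # step 2
withdraw_from_rr1 = ("withdraw", "old-path-via-RR1")    # step 3

# Update processed first: a path is always available.
in_order = replay([update_from_rr3, withdraw_from_rr1], {"old-path-via-RR1"})

# Withdraw processed first: the router is briefly left without a path.
reordered = replay([withdraw_from_rr1, update_from_rr3], {"old-path-via-RR1"})
```

Here `in_order` is `False` and `reordered` is `True`, which is the transient unreachability described in step 4.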
The use of [RFC7911] or [I-D.ietf-idr-best-external] among the RR of | ||||
the IBGP full-mesh can solve these corner cases by ensuring that | ||||
within an AS, the advertisement of a new route is not translated into | ||||
the withdraw of a former route. | ||||
Indeed, "best-external" ensures that an ASBR does not withdraw a | ||||
previously advertised (EBGP) path when it receives an additional, | ||||
preferred path over an IBGP session. Also, "best-intra-cluster" | ||||
ensures that an RR does not withdraw a previously advertised (IBGP) | ||||
path to its non-clients (e.g. other RRs in a mesh of RRs) when it | ||||
receives a new, preferred path over an IBGP session. | ||||
Authors' Addresses | Authors' Addresses | |||
Pierre Francois (editor) | Pierre Francois (editor) | |||
Individual Contributor | Individual Contributor | |||
Email: pfrpfr@gmail.com | Email: pfrpfr@gmail.com | |||
Bruno Decraene (editor) | Bruno Decraene (editor) | |||
Orange | Orange | |||
End of changes. 42 change blocks. | ||||
249 lines changed or deleted | 167 lines changed or added | |||