draft-ietf-grow-ops-reqs-for-bgp-error-handling-05.txt | draft-ietf-grow-ops-reqs-for-bgp-error-handling-06.txt | |||
---|---|---|---|---|
Internet Engineering Task Force R. Shakir | Internet Engineering Task Force R. Shakir | |||
Internet-Draft BT | Internet-Draft BT | |||
Intended status: Informational July 30, 2012 | Intended status: Informational December 27, 2012 | |||
Expires: January 31, 2013 | Expires: June 30, 2013 | |||
Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 | Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 | |||
draft-ietf-grow-ops-reqs-for-bgp-error-handling-05 | draft-ietf-grow-ops-reqs-for-bgp-error-handling-06 | |||
Abstract | Abstract | |||
BGP-4 is utilised as a key intra- and inter-Autonomous System routing | BGP is utilised as a key intra- and inter-autonomous system routing | |||
protocol in modern IP networks. The failure modes as defined by the | protocol in modern IP networks. The failure modes, as defined by the | |||
original protocol standards are based on a number of assumptions | original protocol standards, are based on a number of assumptions | |||
around the impact of session failure. Numerous incidents both in the | around the impact of session failure. Numerous incidents both in the | |||
global Internet routing table and within Service Provider networks | global Internet routing table and within service provider networks | |||
have been caused by strict handling of a single invalid UPDATE | have been caused by strict handling of a single invalid UPDATE | |||
message causing large-scale failures in one or more Autonomous | message causing large-scale failures in one or more autonomous | |||
Systems. | systems. | |||
This memo describes the current use of BGP-4 within Service Provider | This memo describes the current use of BGP within service provider | |||
networks, and outlines a set of requirements for further work to | networks, and outlines a set of requirements for further work to | |||
enhance the mechanisms available to a BGP-4 implementation when | enhance the mechanisms available to a BGP implementation when | |||
erroneous data is detected. Whilst this document does not provide | erroneous data is detected. Whilst this document does not provide | |||
specification of any standard, it is intended as an overview of a set | specification of any standard, it is intended as an overview of a set | |||
of enhancements to BGP-4 to improve the protocol's robustness to suit | of enhancements to BGP to improve the protocol's robustness to suit | |||
its current deployment. | its current deployment. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 31, 2013. | This Internet-Draft will expire on June 30, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Requirements Language | |||
1.1. Role of BGP-4 in Service Provider Networks | 2. Problem Statement | |||
1.2. Overview of Operator Requirements for BGP-4 Error | 2.1. Role of BGP-4 in Service Provider Networks | |||
Handling | 3. Critical and Non-Critical Errors | |||
2. Errors within BGP-4 UPDATE Messages | 4. Error Handling for Non-Critical Errors | |||
2.1. Classifying BGP Errors and Expected Error Handling | 4.1. NLRI-level Error Handling Requirements | |||
2.1.1. Critical BGP Errors | 4.2. Recovering RIB Consistency following NLRI-level Error | |||
2.1.2. Semantic BGP Errors | Handling | |||
3. Avoiding use of NOTIFICATION | 5. Error Handling for Critical Errors | |||
4. Recovering RIB Consistency | 6. IANA Considerations | |||
5. Reducing the Impact of Session Reset | 7. Security Considerations | |||
6. Operational Toolset for Monitoring BGP | 8. Acknowledgements | |||
7. Operational Complexities Introduced by Altering RFC4271 | 9. References | |||
7.1. Reducing the Network Impact of Session Teardown | 9.1. Normative References | |||
8. IANA Considerations | 9.2. Informational References | |||
9. Security Considerations | Author's Address | |||
10. Acknowledgements | ||||
11. References | ||||
11.1. Normative References | ||||
11.2. Informational References | ||||
Author's Address | ||||
1. Introduction | ||||
Where BGP-4 [RFC4271] is deployed in the Internet and Service | ||||
Provider networks, numerous incidents have been recorded due to the | ||||
manner in which [RFC4271] specifies errors in routing information | ||||
should be handled. Whilst the behaviour defined in the existing | ||||
standards retains utility, the deployments of the protocol have | ||||
changed within modern networks, resulting in significantly different | ||||
demands for protocol robustness. Whilst a number of Internet Drafts | ||||
have been written to begin to enhance the behaviour of BGP-4 in terms | ||||
of the handling of erroneous messages, this memo intends to define a | ||||
set of requirements for ongoing work. These requirements are | ||||
considered from the perspective of a Network Operator, and hence this | ||||
draft does not intend to define the protocol mechanisms by which such | ||||
error handling behaviour is to be implemented. | ||||
1.1. Role of BGP-4 in Service Provider Networks | ||||
BGP was designed as an inter-Autonomous System (AS) routing protocol | ||||
and hence many of the error handling mechanisms within the protocol | ||||
specification are designed to be conducive to this role. In general, | ||||
this consideration as an inter-AS routing propagation mechanism | ||||
results in the view that a BGP session propagates a relatively small | ||||
amount of network-layer reachability information (NLRI) between two | ||||
ASes. In this case, it is the expectation of session resilience for | ||||
those adjacencies that are key to routing continuity (for example, it | ||||
is expected that two networks peering via BGP would connect multiple | ||||
times in order to safeguard against equipment or protocol failure). In | ||||
addition, there is some expectation of multiple paths to a particular | ||||
NLRI being available - it would be expected that a network can fall | ||||
back to utilising alternate, less direct, paths where a failure of a | ||||
more direct path occurs. | ||||
Traditional network architectures would deploy an Interior Gateway | ||||
Protocol (IGP) to carry infrastructure and customer routes, with an | ||||
Exterior Gateway Protocol (EGP) such as BGP being utilised to | ||||
propagate these routes to other Autonomous Systems. However, with | ||||
the growth of IP-based services, this is no longer considered best | ||||
practice. In order to ensure that convergence is within acceptable | ||||
time bounds, the amount of routing information carried within the IGP | ||||
is significantly reduced - and tends to be only infrastructure | ||||
routes. iBGP is then utilised to propagate both customer and | ||||
external routes within an AS. As such, BGP has become an IGP, with | ||||
traditional IGPs acting as a means by which to propagate the routing | ||||
information which is required to establish a BGP session, and reach | ||||
the egress node within the local routing domain. This change in role | ||||
presents different requirements for the robustness of BGP as a | ||||
routing protocol - with the expectation of a similar level of | ||||
robustness to that of an IGP being set. | ||||
Along with this change in role, the nature of the IP routing | ||||
information that is carried has changed. BGP has become a ubiquitous | ||||
means by which service information can be propagated between devices. | ||||
For instance, BGP is utilised to carry routing information for IP/ | ||||
MPLS VPN services as described in [RFC4364]. Since there is an | ||||
existing deployment of the protocol between PE devices in numerous | ||||
networks, it has been adapted to propagate this routing information, | ||||
as its use limits the number of routing protocols required on each | ||||
device. This additional information being propagated represents a | ||||
large change in requirement for the error handling of the protocol - | ||||
where session failure occurs, it is likely a complete service outage | ||||
for at least a subset of a network's customers is experienced where | ||||
an erroneous packet may have occurred within a different sub-topology | ||||
or even service (a different address family for example). For this | ||||
reason, there is a significant demand to avoid service affecting | ||||
failures that may be triggered by routing information within a single | ||||
sub-topology or service. | ||||
The combination of the increased number of deployments of BGP-4 as an | ||||
intra-AS routing protocol, its use for the propagation of additional | ||||
types of routing and service information, and the growth of IP | ||||
services has resulted in a substantial increase in the volume of | ||||
information carried within BGP-4. In numerous networks, RIB sizes of | ||||
the order of millions of entries exist within individual BGP | ||||
speakers, with particularly high-scale points exhibited at BGP | ||||
speakers performing aggregation or functionality designed to improve | ||||
utilisation of network resources (e.g., route reflector hierarchies). | ||||
Clearly an increase in the amount of routing information carried in BGP | ||||
results in greater impact to services during failures, which is only | ||||
amplified by a corresponding increase in recovery times. Following a | ||||
failure, there is a substantial recovery time to learn, compute and | ||||
distribute new paths, which results in a greater observed impact to | ||||
services affected, and hence adds further weight to the requirement | ||||
to avoid failures altogether or, at least, mitigate their impact to | ||||
the narrowest scope possible (e.g., a specific NLRI). Whilst an | ||||
argument could be made that convergence time of BGP-4 could | ||||
potentially be reduced through deployment of additional computational | ||||
resource, it is notable that this solution is not necessarily | ||||
straightforward from an implementation or deployment perspective, | ||||
(e.g., scaling computation resources within a single address-family | ||||
is difficult). Thus, significant challenges continue to exist for | ||||
operators when scaling BGP-4 deployments, and hence mechanisms which | ||||
improve the scalability of BGP-4 are very important. | ||||
Both within Internet and multi-service routing architectures, a | ||||
number of BGP sessions propagate a large proportion of the required | ||||
routing information for network operation. For Internet routing, | ||||
these are typically BGP sessions which propagate the global routing | ||||
table to an AS - failure of these sessions may have a large impact on | ||||
network service, based on a single erroneous update. In a multi- | ||||
service environment, typical deployments utilise a small number of | ||||
core-facing BGP sessions, typically towards route reflector devices. | ||||
Failure of these sessions may also result in a large impact to | ||||
network operation. Clearly, the avoidance of conditions requiring | ||||
these sessions to fail is of great utility to any network operator, | ||||
and provides further motivation for the revision of the existing | ||||
behaviour. | ||||
Whilst the behaviour in [RFC4271] is suited to ensuring that BGP | ||||
messages with erroneous routing information are limited in scope | ||||
(by means of session reset), with the above considerations, it is | ||||
clear that this mechanism is not suited to all deployments. It | ||||
should, however, be noted that the change in scope affects the | ||||
handling only of errors occurring after BGP session establishment. | ||||
There is no current operational requirement to amend the means by | ||||
which error handling in session establishment, or liveliness | ||||
detection, are performed. | ||||
1.2. Overview of Operator Requirements for BGP-4 Error Handling | ||||
It is the intention of this document to define a set of criteria to | ||||
which a revised error handling mechanism in BGP-4 is required to | ||||
conform. The motivation for the definition of these | ||||
requirements can be summarised based on certain behaviour currently | ||||
present in the protocol that is not deemed acceptable within current | ||||
operational deployments, or where there is a short-fall in the tool | ||||
set available to an operator. These key requirements can be | ||||
summarised as follows: | ||||
o It is unacceptable within modern deployments of the BGP-4 protocol | ||||
that a single erroneous UPDATE packet affects routes that it does | ||||
not carry. This requirement therefore requires some modification | ||||
to the means by which erroneous UPDATE packets are handled, and | ||||
reacted to - with a particular focus on avoiding the use of the | ||||
NOTIFICATION message. | ||||
o It is recognised that some error conditions that may occur within the | ||||
BGP-4 protocol may not always be handled gracefully, and may | ||||
result in conditions whereby an implementation cannot recover. In | ||||
these (and similar) cases, it is undesirable for an operator that | ||||
this reset of the BGP-4 session results in interruption to | ||||
forwarding packets (by means of withdrawing routes installed by | ||||
BGP-4 into a device's RIB, and subsequently FIB). To this end, | ||||
there is a requirement to define a session reset mechanism which | ||||
provides session re-initialisation in a non-destructive manner. | ||||
o Further to the requirements to provide a more robust protocol, the | ||||
current visibility into error conditions within the BGP-4 protocol | ||||
is extremely limited - where further modifications to this | ||||
behaviour are to be made, complexity is likely to be added. Thus, | ||||
to ensure that BGP-4 is manageable, there are requirements for | ||||
mechanisms by which the protocol can be examined and monitored. | ||||
This document describes each of these requirements in further depth, | ||||
along with an overview of means by which they are expected to be | ||||
achieved. In addition, the mechanism by which the enhancements | ||||
meeting these requirements are to interact is discussed. | ||||
2. Errors within BGP-4 UPDATE Messages | 1. Requirements Language | |||
Both through analysis of incidents occurring within the Internet DFZ, | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
and multi-service environments utilising BGP-4 to signal service or | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
routing information, a number of different classes of errors within | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
BGP-4 UPDATE messages have been observed. In order to consider the | ||||
applicability of enhanced error handling mechanisms, it is possible | ||||
to divide these errors into a number of sub-classes, particularly | ||||
focusing around the location of the error within the UPDATE message. | ||||
Where an UPDATE message is considered invalid by a BGP speaker due to | 2. Problem Statement | |||
an error within a path attribute that is not the NLRI (where the | ||||
definition of NLRI includes reachability information encoded in the | ||||
MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in | ||||
[RFC4760]) it is a requirement of any enhanced error handling | ||||
mechanism to handle the error in a manner focused on the NLRI | ||||
contained within the message found to be erroneous. Since in this | ||||
case, the message received from the remote peer is syntactically | ||||
valid, it is considered that such an UPDATE is indicative of | ||||
erroneous data within one or more path attributes. The impact of the | ||||
current behaviour defined within the protocol makes the implication | ||||
that the BGP speaker from whom the message is received is now an | ||||
invalid path for all NLRI announced via the session - which results | ||||
in a disproportionate impact to overall network operation. In | ||||
particular scenarios (such as networks with centralised BGP route | ||||
reflection) such action can result in a loss of all reachability to a | ||||
network. In other contexts (such as the Internet DFZ), it cannot be | ||||
assumed that the BGP speaker from whom the UPDATE message is received | ||||
is directly responsible for the erroneous information contained | ||||
within the message. | ||||
Two further error cases exist within UPDATE messages, both of which | BGP has become a key intra- and inter-domain routing protocol, | |||
are related to the mechanisms that are applicable to messages | deployed within both the Internet and private networks. The | |||
received where some difficulty exists in parsing the entire BGP | increased reliance on the protocol has resulted in increased demand | |||
message. The two cases concern those cases where a valid NLRI | for robustness - with the error handling behaviour defined in | |||
attribute can be extracted, and those where such an attribute is not | [RFC4271] having been shown to have caused numerous incidents within | |||
able to be parsed. In these cases, errors in the packing of | live network deployments. This document provides an overview of the | |||
attributes within a BGP message may have occurred. Such errors are | current deployment cases for BGP-4, and define a set of requirements | |||
likely indicative of an error specifically caused by the remote BGP | (from the perspective of a network operator) for enhancing error | |||
speaker. It is, however, desirable to an operator that such errors | handling within the protocol. | |||
are handled without affecting all NLRI across a BGP session. As | ||||
such, there is a key requirement to maximise the number of cases in | ||||
which it is possible to extract NLRI from a BGP UPDATE message. To | ||||
this end, it is required that where possible the MP_REACH_NLRI and | ||||
MP_UNREACH_NLRI attributes are utilised for encoding all NLRI | ||||
(including IPv4 Unicast), and that this attribute is included as the | ||||
first attribute of a BGP UPDATE message (as originally recommended in | ||||
[I-D.chen-ebgp-error-handling]). Such a change to the order of | ||||
inclusion of this attribute maximises the number of cases in which | ||||
NLRI can be extracted from an UPDATE. Where this is possible, it is | ||||
again required that the error handling mechanisms utilised should be | ||||
directly applied to the NLRI included in the UPDATE. | ||||
For all cases whereby NLRI can be obtained from an UPDATE message, it | 2.1. Role of BGP-4 in Service Provider Networks | |||
is expected that the requirements outlined in Section 3 should be | ||||
considered by any enhancement to the BGP-4 protocol. | ||||
In the case that it is not possible to completely parse the NLRI | BGP was designed as an inter-autonomous system (AS) routing protocol. | |||
attribute from the UPDATE message received from a peer, it is | Many of the error handling mechanisms within the protocol are defined | |||
extremely likely that this is indicative of a serious error with | in order to guarantee consistency and correctness of information | |||
either the process of attribute packing, or buffer usage on the | between two neighbouring speakers. The assumption is made that each | |||
remote BGP speaker. In this case, clearly, it is not possible to | AS operates with many adjacencies, each propagating a relatively | |||
apply any error handling mechanism that is limited to a specific set | small amount of routing information. Through focusing on information | |||
of NLRI, since an implementation has no knowledge of the NLRI | consistency, the protocol specification prefers failure of an | |||
included within the UPDATE message. In addition, such errors are | individual routing adjacency to maintaining reachability to all NLRI | |||
considered to be relatively fundamental to the operation of a BGP | received from a particular neighbour, with the expectation that | |||
implementation, and hence may indicate a case whereby significant | alternate, less direct, paths can be selected where a failure occurs. | |||
system errors have occurred. The current BGP-4 standard results in a | The assumptions of the nature of BGP deployments resulted in the | |||
BGP speaker restarting a session with the remote BGP speaker. | specification made in [RFC4271] whereby the receipt of an erroneous | |||
However where such an error does occur, it is required that a | UPDATE message is reacted to by sending a NOTIFICATION message, and | |||
graceful mechanism is utilised to provide a lower impact to network | tearing down the adjacency with the remote speaker from whom the | |||
operation. The requirements for enhancements of this nature to BGP-4 | error was observed. | |||
are outlined in Section 5, with the requirements outlined therein | ||||
focused on providing a means by which system integrity can be | ||||
restored whilst allowing for continued network operation. | ||||
2.1. Classifying BGP Errors and Expected Error Handling | Historically, a network would deploy an interior gateway protocol | |||
(IGP) to carry infrastructure and customer routes, and utilise an | ||||
external gateway protocol (EGP) such as BGP to propagate routes to | ||||
other autonomous systems. However, BGP's deployments have evolved | ||||
with the growth of IP-based services. To ensure route convergence | ||||
within an AS is within acceptable time bounds, the amount of | ||||
information within the IGP has been minimised (typically to only | ||||
infrastructure routes). iBGP is then utilised to carry both internal, | ||||
customer and external routes within an AS. As such, this has | ||||
resulted in BGP having become an IGP, with traditional IGPs providing | ||||
only reachability between nodes within the AS for packet forwarding | ||||
and to establish iBGP sessions. This change in role within the | ||||
overall architecture of an AS has resulted in an increased robustness | ||||
requirement for BGP, with the expectation of a similar level of | ||||
robustness to that of an IGP being set. The loss of an iBGP session | ||||
can result in significant levels of unreachability internally to an | ||||
AS, especially since there are typically limited (when compared to | ||||
the Internet) signalling and forwarding paths available. | ||||
It is clearly of advantage for BGP-4 implementations to utilise a | In parallel with this change of deployment, the volume and nature of | |||
consistent set of error handling mechanisms for the different types | the information carried within BGP has also changed. BGP has become | |||
of errors that are described in Section 2, and provide consistent | the ubiquitous means through which service information can be | |||
nomenclature to refer to them. It is therefore suggested that errors | propagated between devices. For instance, it is utilised to carry | |||
that are indicative of larger scale failures of a BGP speaker, and | IP/MPLS service information such as Layer 3 IP VPN routes [RFC4364], | |||
hence require some error handling at the session level are referred | and Layer 2 Virtual Private LAN Service device membership [RFC4761]. | |||
to as 'critical' errors, whilst those errors that are identified | Since these extensions to the protocol allow signalling of multiple | |||
based on incorrect content of one or more attributes of a message are | services (represented by address families within BGP), and multiple | |||
referred to as 'semantic' errors. | customer topologies (i.e., subsets of routes within each address | |||
family) via the BGP protocol, the impact of session failure is | ||||
increased. The tear down of a single BGP session can result in a | ||||
complete outage to all customer services signalled via the session, | ||||
even where the triggering event is related to only one service or | ||||
topology being carried - reflecting a disproportional impact to all | ||||
other services and routing topologies. | ||||
The errors identified within the following sections consider only | The convergence of services to IP and BGP's changing deployment have | |||
those errors within the specifications at the time of writing; it is | resulted in a significant growth in the volume of routing information | |||
recommended that in the definition of future extensions to the BGP-4 | carried in the protocol. In numerous networks, the RIB size of | |||
specification, the error handling behaviour (and the category within | individual BGP speakers can be of the order of millions of paths. | |||
which errors within the extension should be considered by an | Particularly large RIBs are observed at BGP speakers performing | |||
implementation) is defined. | aggregation and border roles (such as ASBR, or route reflector | |||
hierarchies). This increased volume of routes results not only in a | ||||
significant number of services being impacted during a protocol | ||||
failure, but also increases the time to recovery after re- | ||||
establishing a BGP session. The time taken to learn, compute and | ||||
distribute new paths increases the impact of failures on services | ||||
carried by the network - adding further weight to the requirement to | ||||
avoid failures, or limit the extent of their impact. Furthermore, | ||||
the impact of individual session failures is increased due to the | ||||
existence of a relatively small number of highly critical BGP | |||
sessions within Internet and multi-service network deployments. | |||
These sessions propagate a high proportion of the reachability | |||
information - for instance, providing an Internet AS with the global | ||||
routing table from upstream providers, or connecting IP/MPLS Provider | ||||
Edge devices to route reflector hierarchies from which they are | ||||
signalled reachability for services connected elsewhere within the | ||||
routing domain. In both cases, the failure of these sessions can | ||||
result in a significant outage to customer services. | ||||
2.1.1. Critical BGP Errors | For the current deployments of BGP, the behaviour described in | |||
[RFC4271] related to handling errors in UPDATE messages is | ||||
suboptimal, and results in significant disruption to services in | ||||
modern network deployments. This document defines a set of | ||||
requirements for protocol developments, and revisions to [RFC4271] to | ||||
address these concerns through a set of generalised definitions. It | ||||
should be noted that the scope of these requirements is limited to | ||||
the handling of UPDATE messages as, at the time of writing, there is | ||||
no operational requirement to amend the means by which error handling | ||||
in session establishment, or liveliness detection are performed. | ||||
As described in this document, it is of advantage to limit the number | 3. Critical and Non-Critical Errors | |||
of 'critical' errors that occur within the protocol, therefore, based | ||||
on analysis of the processing of BGP UPDATE messages, it is required | ||||
that 'critical' error handling behaviour is applied to: | ||||
o UPDATE Message Length errors - whereby the specified overall | As described in Section 2.1, the error handling behaviour described | |||
UPDATE message length is inconsistent with sum of the Total Path | in [RFC4271] is applied at a per-session level, affecting all NLRI | |||
Attribute and Withdrawn Routes length. In this case, this is | signalled via the adjacency on which an erroneous message is | |||
indicative of message packing failure, whereby the NLRI may not be | observed. In order to reduce the impact of error handling to those | |||
correctly extracted. | NLRI affected by an erroneous UPDATE, a BGP speaker MUST limit the | |||
error handling mechanisms implemented to those NLRI contained within | ||||
an erroneous UPDATE message where it is possible to do so. Clearly, | ||||
some errors within the formation of BGP UPDATE messages may result in | ||||
it being impossible to reliably extract NLRI from the received | ||||
message, and hence the same error handling procedures may not apply. | ||||
There is therefore a requirement to classify errors based on their | ||||
impact to the BGP UPDATE message, hence messages whereby the NLRI | ||||
attribute cannot be extracted or parsed are referred to throughout | ||||
this document as Critical errors. These Critical errors are limited | ||||
to: | ||||
o Errors Parsing the NLRI attributes of an UPDATE message - where | o UPDATE Message Length errors - where the specified UPDATE message | |||
NLRI is carried in either the IPv4-Unicast Advertised or Withdrawn | length is inconsistent with the sum of the Total Path Attribute | |||
routes, or in the MP_REACH_NLRI or MP_UNREACH_NLRI attributes | and Withdrawn Routes length. These errors relate to message | |||
[RFC2858], it is not possible to target error handling mechanisms | packing or framing, and result in cases whereby the NLRI attribute | |||
to specific NLRI, and hence session level mechanisms must be | cannot be correctly extracted from the message. | |||
utilised. | ||||
It is expected that those requirements outlined in Section 5 are | o Errors parsing the NLRI attribute of an UPDATE message - where the | |||
utilised to provide session-level handling of those errors identified | contents of the IPv4 Unicast Advertised or Withdrawn Routes | |||
as 'critical'. | attributes, or multi-protocol BGP NLRI attributes (MP_REACH_NLRI | |||
and/or MP_UNREACH_NLRI as defined in [RFC2858]), cannot be | ||||
successfully parsed. | ||||
2.1.2. Semantic BGP Errors | In the case of Critical errors it is expected that error handling is | |||
applied at a session level as per Section 5 of this document. | ||||
Where a BGP message is correctly formed, a number of cases exist | All errors whereby the contained NLRI can be extracted are referred | |||
whereby the contents of the UPDATE are not valid - in these cases, | to as Non-Critical. It is expected that the following cases fall | |||
this represents errors that can be identified to affect specific | within this category: | |||
NLRI. The following cases are expected to be classified as semantic | ||||
errors: | ||||
o Zero or invalid length errors in path attributes excluding those | o Zero or invalid length errors in path attributes, excluding those | |||
containing NLRI, or where the length of all path attributes | containing NLRI, or where the length of all path attributes | |||
contained within the UPDATE does not correspond to the total path | contained within the UPDATE does not correspond to the total path | |||
attributes length. In this case, the NLRI can be correctly | attribute length. | |||
extracted, and hence acted upon. | ||||
o Messages where invalid data or flags are contained in a path | o Messages where invalid data or flags are contained in a path | |||
attribute that does not relate to the NLRI. | attribute that does not relate to the NLRI. | |||
o UPDATE messages missing mandatory attributes, unrecognised non- | o UPDATE messages missing mandatory attributes, unrecognised non- | |||
optional attributes or those that contain duplicate or invalid | optional attributes, or those that contain duplicate or invalid | |||
attributes (be they unsupported or unexpected). | attributes (be they unsupported, or unexpected). | |||
o Those messages where the NEXT_HOP, or MP_REACH next-hop values are | ||||
missing, length zero, or invalid for the relevant AFI/SAFI. | ||||
In these cases, it is expected that these errors can be handled | ||||
gracefully, following the requirements detailed in Section 3 and | ||||
Section 4 of this memo. | ||||
3. Avoiding use of NOTIFICATION | ||||
The error handling behaviour defined in RFC4271 is problematic due to | ||||
the limited options that are available to an implementation. When an | ||||
erroneous BGP message is received, at the current time, the | ||||
implementation must either ignore the error, or send a NOTIFICATION | ||||
message, after which it is mandatory to terminate the BGP session. | ||||
It is apparent that this requirement is at odds with that of protocol | ||||
robustness. | ||||
There is significant complexity to this requirement. The mechanism | ||||
defined in [I-D.chen-ebgp-error-handling] describes a means by which | ||||
no NOTIFICATION message is generated for all cases whereby NLRI can | ||||
be extracted from an UPDATE. The NLRI contained within the erroneous | ||||
UPDATE message is considered as though the remote BGP speaker has | ||||
provided an UPDATE marking it as withdrawn. This results in a limit | ||||
in the propagation of the invalid routing information, whilst also | ||||
ensuring that no traffic is forwarded via a previously-known path | ||||
that may no longer be valid. This mechanism is referred to as | ||||
"treat-as-withdraw". | ||||
Whilst this behaviour results in avoiding a NOTIFICATION message, | ||||
keeping other routing information advertised by the remote BGP | ||||
speaker within the RIB, it may result in unreachability for a sub-set | ||||
of the NLRI advertised by the remote speaker. Two cases should be | ||||
considered - that where the entry for a route in the Adj-RIB-In of | ||||
the neighbour propagating an erroneous packet is utilised, and that | ||||
where the route installed in the device's RIB is learnt from another | ||||
BGP speaker. In the former case, should the identified NLRI not be | ||||
treated as withdrawn, the original NLRI is utilised within the global | ||||
RIB. However, this information is potentially now invalid (i.e. it | ||||
no longer provides a valid forwarding path), whilst an alternate | ||||
(valid) path may exist in another Adj-RIB-In. By continuing to | ||||
utilise the NLRI for which the UPDATE was considered invalid, traffic | ||||
may be forwarded via an invalid path, resulting in routing loops, or | ||||
black-holing. In the second case, no impact to the forwarding of | ||||
traffic, or global RIB, is incurred, yet where treat-as-withdraw is | ||||
implemented, possibly stale routing information is purged from the | ||||
Adj-RIB-In of the neighbour propagating errors. | ||||
Whilst mechanisms such as "treat-as-withdraw" are currently | ||||
documented, the proposals are limited in their scope - particularly | ||||
in terms of restrictions to implementation only on eBGP sessions. | ||||
This limitation is made based on the view that the BGP RIB must be | ||||
consistent across an autonomous system. By implementing | ||||
treat-as-withdraw for an iBGP session, one or more routers within the | ||||
Autonomous System may not have reachability to a route, and hence | ||||
blackholing of traffic, or routing loops, may occur. It should, | ||||
however, be considered if this view is valid, in light of the manner | ||||
in which BGP is utilised within operator networks. Inconsistency in | ||||
a RIB based on a single UPDATE being treated as withdrawn may cause an | ||||
inconsistency in a single sub-topology (e.g. Layer 3 VPN service), | ||||
or a service not operating completely (in the case of an UPDATE | ||||
carrying service membership information). Where a NOTIFICATION and | ||||
teardown is utilised, this is destructive to all sub-topologies in all | ||||
address family identifiers (AFIs) carried by the session in question. | ||||
Even where mechanisms such as multi-session BGP are utilised, a whole | ||||
AFI is affected by such a NOTIFICATION message. In terms of routing | ||||
operation, it is therefore far less costly to endure a situation | ||||
where a limited sub-set of routing information within an AS is | ||||
invalid, than to consider all routing information as invalid based on | ||||
a single trigger. | ||||
At the time of writing, error handling mechanisms related to | ||||
optional, transitive attributes - such as | ||||
[I-D.ietf-idr-optional-transitive] are restricted to handling only a | ||||
subset of attribute errors - whereas the operational requirement is | ||||
to expand this coverage to the widest set of errors possible (i.e., | ||||
all semantic errors within UPDATE messages). Additionally, where | ||||
approaches applicable to a greater number of attributes are proposed | ||||
(e.g., [I-D.chen-ebgp-error-handling]), these are limited to | ||||
deployment in eBGP applications only, where requirements also exist | ||||
in intra-domain cases. As such, it is envisaged that if extended to | ||||
cover these expanded cases, these mechanisms provide a means to avoid | ||||
the transmission of a NOTIFICATION message to a remote BGP speaker, | ||||
based on a single erroneous message, where at all possible, and hence | ||||
meet this requirement. Critical errors, including those whereby the | ||||
NLRI cannot be extracted from the UPDATE message, represent cases | ||||
whereby the receiving system cannot handle the error gracefully based | ||||
on this mechanism. | ||||
4. Recovering RIB Consistency | ||||
The recommendations described in Section 3 may result in the RIB for | ||||
a topology within an AS being inconsistent across the AS' internal | ||||
routers. Alternatively, where such mechanisms are deployed at an AS | ||||
boundary, interconnects between two ASes may be inconsistent with | ||||
each other. There are therefore risks of traffic blackholing, due to | ||||
missing routing information, or forwarding loops. Whilst this is | ||||
deemed an acceptable compromise in the short term, clearly, it is | ||||
suboptimal. Therefore, a requirement exists to provide mechanisms by | ||||
which a BGP speaker is able to recover the consistency of the Adj- | ||||
RIB-In for a particular neighbour. | ||||
In the general case, the consistency of the BGP RIB can be recovered | ||||
by requesting that the entire Adj-RIB-Out of a remote BGP speaker is | ||||
re-advertised. A mechanism to achieve this re-advertisement is | ||||
defined within the ROUTE-REFRESH specification [RFC2918]. It is | ||||
envisaged that by requesting a refresh of all NLRI advertised by a | ||||
BGP speaker, any NLRI which has been withdrawn due to being contained | ||||
within an invalid UPDATE message is re-learnt. Where a ROUTE-REFRESH | ||||
is used to directly perform a consistency check between the Adj-RIB- | ||||
Out of a remote device, and the Adj-RIB-In of the local BGP speaker, | ||||
a demarcation between the ROUTE-REFRESH, and normal UPDATE messages | ||||
is required (in order that an "end" of the refresh can be used to | ||||
identify any 'stale' NLRI) - | ||||
[I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which | ||||
the ROUTE-REFRESH mechanism can be extended to meet this requirement. | ||||
Whilst re-advertisement of the whole BGP RIB provides a means by | ||||
which withdrawn NLRI can be re-advertised, there are some scaling | ||||
implications that must be considered. In the case that a ROUTE- | ||||
REFRESH is generated, all NLRI must be re-packed into UPDATE messages | ||||
and advertised by one speaker on the BGP session, whilst the other | ||||
must receive all UPDATE messages, and validate the RIB's consistency. | ||||
In order to avoid the control-plane load, it is therefore a | ||||
requirement to utilise targeted mechanisms where possible, rather | ||||
than incurring the additional load on both the advertising and | ||||
receiving speaker of building and processing UPDATEs for the entire | ||||
contents of the RIB. | ||||
It is envisaged that during routing inconsistencies caused by | ||||
utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is | ||||
aware that some routing information was not able to be processed - | ||||
due to the fact that an UPDATE message was not parsed correctly. | ||||
Since this mechanism (as discussed in Section 3) requires the local | ||||
BGP speaker to have determined the set of NLRI for which an erroneous | ||||
UPDATE message was received, it is possible to use a targeted | ||||
mechanism to re-request the specific NLRI that was contained within | ||||
the erroneous UPDATE message. Re-requesting provides the | ||||
remote BGP speaker an opportunity to re-transmit the NLRI - possibly | ||||
providing an opportunity to leverage alternative methods to build the | ||||
UPDATE message. Such a request requires extension to the existing | ||||
BGP-4 protocol, in terms of specific UPDATE generation filters with a | ||||
transient lifetime. It is envisaged that the work within | ||||
[I-D.zeng-idr-one-time-prefix-orf] provides a mechanism allowing | ||||
targeted elements of the Adj-RIB-In for a BGP neighbour to be | ||||
recovered. | ||||
It is of particular note for both means of recovering RIB consistency | ||||
described that these are effective only when considering transient | ||||
errors within an implementation - for instance, should an RFC | ||||
interpretation error within an implementation be present, regardless | ||||
of the number of times a specific UPDATE is generated, it is likely | ||||
that this error condition will persist (as it may with the existing | ||||
behaviour defined by [RFC4271]). For this reason, there is a | ||||
requirement to consider the means by which such consistency recovery | ||||
mechanisms are utilised. It is not advisable that a dynamic filter | ||||
and advertisement mechanism is triggered by all error handling events | ||||
due to the load this is likely to place on the neighbour receiving | ||||
such a request. Where this BGP speaker is a relatively centralised | ||||
device - a route reflector (as described by [RFC4456]) for example - | ||||
the act of generation of UPDATE messages with such frequency is | ||||
likely to cause disproportionate load. It is therefore an | ||||
operational requirement of such mechanisms that means of request | ||||
dampening be required by any such extension. | ||||
In cases whereby the consistency of the Adj-RIB-In is to be restored | ||||
(e.g., following the 'treat-as-withdraw' behaviour described in | ||||
Section 3), and mechanisms such as those described herein are | ||||
triggered, such a condition should be noted to an operator by means | ||||
of a specific flag, SNMP trap, or other logging mechanism. In order | ||||
to identify the subset of NLRI that are considered to be | ||||
inconsistent, this information is of operational benefit and hence | ||||
should be logged. | ||||
5. Reducing the Impact of Session Reset | ||||
Even where protocol enhancements allow errors in the BGP-4 protocol | ||||
to cease to trigger NOTIFICATION messages, and hence reset a BGP | ||||
session, it is clear that some error conditions may not be exited. | ||||
In particular, errors due to existing state, or memory structures, | ||||
associated with a specific BGP session will not be handled. It is | ||||
therefore important to consider how these error conditions are | ||||
currently handled by the protocol. It should be noted that the | ||||
following discussion and analysis considers only those NOTIFICATION | ||||
messages generated in response to errors in UPDATE messages (as | ||||
defined by Section 6.3 in [RFC4271]). | ||||
The existing NOTIFICATION behaviour triggers a reset of all elements | ||||
of the BGP-4 session, as described in Section 6 of [RFC4271]. It is | ||||
expected that session teardown requires an implementation to re- | ||||
initialise all structures and state required for session maintenance. | ||||
Clearly, there is some utility to this requirement, as error | ||||
conditions in BGP are, in general, exited from. However, this | ||||
definition is responsible for the forwarding outages within networks | ||||
utilising BGP for propagation of routing or service when each error | ||||
is experienced. The requirement described in Section 3 is intended | ||||
to reduce the cases whereby a NOTIFICATION is required; however, any | ||||
mechanism implemented as a response to this requirement by definition | ||||
cannot provide a session reset to the extent of that achieved by the | ||||
current behaviour. | ||||
In order to address this, there is a requirement for a means by which | ||||
a BGP speaker can signal that an unhandled error condition in an | ||||
UPDATE message occurred - requiring a session reset - yet also | ||||
continue to utilise the paths advertised by the neighbour that are | ||||
currently in use within the RIB. In this case, the Adj-RIB-In | ||||
received from the neighbour is not considered invalid, despite a | ||||
NOTIFICATION, and session reset, being required. This set of | ||||
requirements is akin to those answered by the BGP Graceful Restart | ||||
mechanism described in [RFC4724]. Since the operational requirement | ||||
in this case is to provide a means to achieve a complete session | ||||
restart without disrupting the forwarding path of those routes in use | ||||
within a BGP speaker's RIB, it is expected that utilising a procedure | ||||
similar to the Graceful Restart mechanism meets the error handling | ||||
requirement. Responding to an error condition (repeated or | ||||
otherwise) with a message indicating that an error that cannot be | ||||
handled has occurred, forcing session reset, whilst retaining | ||||
forwarding information within the RIB allows forwarding to all routes | ||||
within a system's RIB to continue during the period in which the | ||||
session restarts. It is envisaged that the additional complexity | ||||
introduced by such a mechanism can be limited by | ||||
extending existing BGP messages - one such approach is proposed in | ||||
[I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the | ||||
restart lifetime, should an error condition not be transient - for | ||||
example, should an error have occurred with the BGP process, rather | ||||
than with the specific BGP session - the remote BGP speaker is still | ||||
detected as an invalid device for forwarding. | ||||
In some cases, the erroneous condition may be due to corruption of | ||||
the Adj-RIB-Out on the advertising BGP speaker - rather than caused | ||||
by the receiving speaker's state. In these cases, where existing | ||||
structures are replayed whilst performing graceful restart | ||||
functionality, the error condition is not necessarily resolved. | ||||
Therefore, it is recommended that during a session restart event, as | ||||
described within this section, the advertising speaker purge and | ||||
rebuild RIB structures, in order to resolve any corruption within | ||||
these structures. | ||||
It should be noted that a protocol enhancement meeting this | o Those messages where the NEXT_HOP, or the MP_REACH_NLRI next-hop | |||
requirement is not able to solve all error conditions - however, a | values are missing, zero-length, or invalid for the relevant | |||
complete restart of the BGP and TCP session between two BGP speakers | address family. | |||
implements an identical recovery mechanism to that which is achieved | ||||
by the existing behaviour. Where an error condition such as memory | ||||
or configuration corruption has occurred in a BGP implementation, it | ||||
is expected that a mechanism meeting this requirement continues to | ||||
detect this, by means of a bound on time for session restart to | ||||
occur. Whilst there may be some consideration that packets continue | ||||
to be forwarded through a device which can be in a failure mode of | ||||
this nature for a longer period due to this requirement, the | ||||
architecture of modern IP routers should be considered. A divided | ||||
forwarding and control plane is common in many devices, as well as | ||||
process separation for software-based devices - corruption of a | ||||
specific protocol daemon does not necessarily imply forwarding is | ||||
affected. Indeed, where forwarding behaviour of a device is | ||||
affected, it is envisaged that a failure detection mechanism (be it | ||||
Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets) | ||||
will detect such a failure in almost all cases, with the symptomatic | ||||
behaviour of such a failure being an invalid UPDATE message in very | ||||
few other cases. | ||||
6. Operational Toolset for Monitoring BGP | For these Non-Critical errors, the NLRI-targeted error handling | |||
requirements described in Section 4 should be followed. | ||||
A significant complexity that is introduced through the requirements | In order to maximise the number of cases whereby the NLRI attributes | |||
defined in this document is that of monitoring BGP session status for | can be reliably extracted from a received message, where a BGP | |||
an operator. Although the existing error handling behaviour causes a | speaker supports multi-protocol extensions, the MP_REACH_NLRI and | |||
disproportionate failure, session failure is extremely visible to | MP_UNREACH_NLRI attributes SHOULD be utilised for all address | |||
most operational personnel within a Network Operator due to both | families (including IPv4 Unicast) and these attributes should be the | |||
existing definitions of SNMP trap mechanisms for BGP, along with the | first attribute contained within the UPDATE message. | |||
forwarding impact typically caused by such a failure. By introducing | ||||
mechanisms by which errors of this nature are not as visible, this is | ||||
no longer the case. There is a requirement that where subsets of the | ||||
RIB on a device are no longer reachable from a BGP speaker, or indeed | ||||
an AS, some visibility of this situation, alongside a mechanism | ||||
to determine the cause, is available to an operator. Whilst, to some | ||||
extent, this can be solved by mandating a sub-requirement of each of | ||||
the aforementioned requirements that a BGP speaker must log where | ||||
such errors occur, and are hence handled, this does not solve all | ||||
cases. In order to clarify this requirement, the example of the | ||||
transmission of an erroneous Optional Transitive attribute can be | ||||
considered. Since, by definition, there is no requirement for all | ||||
BGP speakers to parse such an attribute, a receiving router may treat | ||||
NLRI as withdrawn based on an erroneous attribute not examined by its | ||||
neighbour. In this case, the upstream device or network, propagating | ||||
the UPDATE, has no visibility of this error. Operationally, however, | ||||
it is of interest to the upstream router operator that such invalid | ||||
information was propagated. | ||||
The requirement for logging of error conditions in transmitted BGP | Where attributes are introduced by future extensions to the BGP | |||
messages, which are visible to only the receiver, cannot be achieved | protocol, the error handling behaviour applied MUST be assumed to be that | |||
by any existing BGP message, or capability. It is envisaged that | applied to Non-Critical errors, unless otherwise specified within the | |||
each erroneous event should be transmitted to the remote peer - | per-extension memo, or the attribute relates directly to carrying | |||
including the information as to the set of NLRI that were considered | NLRI. Authors of future BGP extensions SHOULD specify the error | |||
invalid. Whilst with some mechanisms this is achieved by default | handling behaviour required for new attributes in terms of the | |||
(for example, One-Time Prefix ORF [I-D.zeng-idr-one-time-prefix-orf] | classification into a Critical or Non-Critical error on a per- | |||
(Outbound Route Filtering) will transmit the set of routes that are | attribute error basis. | |||
required), the operator requirement is to know which routes may have | ||||
been unreachable in all cases. It is envisaged that an extension to | ||||
meet this requirement will allow for such information to be | ||||
transmitted between peers, and hence logged. Such a mechanism may | ||||
provide further utility as either a diagnostic or logging toolset. | ||||
As such, it is possible to divide the messages that are required in | 4. Error Handling for Non-Critical Errors | |||
order to provide further visibility into BGP for an operator. Such a | ||||
division can be made both due to the required means of message | ||||
transmission, alongside the criticality of each request. | ||||
o Messages required to replace NOTIFICATION - In cases where the | 4.1. NLRI-level Error Handling Requirements | |||
error handling mechanisms defined by [RFC4271] currently result in | ||||
a NOTIFICATION message being generated, a number of the | ||||
requirements detailed within this document result in this message | ||||
being suppressed. Despite this change, the error condition's | ||||
occurrence is still of interest to an operator in order to provide | ||||
both monitoring and troubleshooting capabilities, since some form | ||||
of invalid data has been received on a session. It is therefore | ||||
considered that an implementation must generate a message both | ||||
locally, and for transmission to the remote peer, based on such a | ||||
condition. Where such a message is transmitted to the remote | ||||
peer, it is considered that the BGP session via which the | ||||
erroneous UPDATE message was received should be used as transport | ||||
to the remote peer. The information transmitted in such a message | ||||
should be minimised to allow identification of the paths which | ||||
were considered erroneous (i.e. restricting the information to | ||||
that which is directly relevant to a network operator in the case | ||||
of an error condition occurring). Any delay to convergence on the | ||||
session in question is considered to be acceptable, given the | ||||
suboptimal nature of the reception of invalid routing information | ||||
via a BGP session. Further concerns regarding such a mechanism | ||||
relate to the load generated on the BGP speaker in question; | ||||
however, it must be considered that in the case of an erroneous | ||||
UPDATE being received, and the 'treat-as-withdraw' mechanism being | ||||
utilised, where the erroneous path is removed from the Loc-RIB, | ||||
there is likely to be a requirement to generate UPDATE messages | ||||
withdrawing the route from all further BGP speakers to which the | ||||
prefix is advertised. The load generated by the generation of | ||||
such UPDATEs is likely to be much greater than that of | ||||
transmitting error information via a logging message type back to | ||||
the speaker from which it was received. It is envisaged that | ||||
light-weight BGP message-based signalling mechanisms such as the | ||||
ADVISORY message types detailed in | ||||
[I-D.ietf-idr-operational-message] provide a suitable means to | ||||
satisfy this requirement. | ||||
o Additional Diagnostic Capabilities for BGP - In a number of cases, | When a Non-Critical error is detected within an UPDATE message, a BGP | |||
there is an operational requirement to further debug erroneous BGP | speaker MUST NOT send a NOTIFICATION message to the remote neighbour. | |||
UPDATE messages, along with the particulars of the state of a BGP | Instead, the NLRI contained within the message MUST be considered as | |||
speaker. For instance, where an invalid BGP UPDATE message is | no longer viable until they are updated by a subsequent UPDATE | |||
transmitted between two BGP speakers, the exact format of the | message, thus treating the NLRI as withdrawn as per the treat-as- | |||
UPDATE message is of interest to an operator, as this information | withdraw mechanism described in [I-D.chen-ebgp-error-handling]. | |||
provides a clear indication of a message considered to be | ||||
erroneous by the BGP speaker to which it was transmitted. In this | ||||
case, it is considered of great utility that the entire UPDATE | ||||
message is transmitted back to the advertising speaker, in order | ||||
to allow for further debugging to occur. Whilst such information | ||||
is particularly useful to an operator, it clearly provides | ||||
information that is not key to protocol operation - for this | ||||
reason, it is expected that some of the additional complexity, and | ||||
load, to which a BGP speaker is subjected | ||||
is not acceptable. For this reason, it is required that where | ||||
mechanisms are developed to support this requirement, messages of | ||||
this nature can be supported both within an existing BGP session, | ||||
and via a dedicated separate session, be it BGP carrying messages | ||||
such as those defined in [I-D.ietf-idr-operational-message] or a | ||||
dedicated monitoring protocol akin to BMP described in | ||||
[I-D.ietf-grow-bmp]. | ||||
Whilst the operational requirement for such monitoring tools to allow | Network operators SHOULD recognise that where such behaviour is | |||
for visibility into BGP is clearly agreed upon, the means by which | implemented, black-holing or looping of traffic may occur in the | |||
such messages are transmitted between two BGP speakers is likely to | period between the NLRI being treated as withdrawn, and subsequent | |||
be dependent upon both the positions of the speakers in question (for | updates, dependent upon the routing topology. It SHOULD be noted | |||
instance, the requirements for such a protocol may differ where a | that such periods of RIB inconsistency (where one speaker has | |||
session is between two ASBRs under separate administration). The | advertised a prefix, which has been treated as withdrawn by the | |||
introduction of additional message types to the BGP protocol clearly | receiving speaker) may be relatively long lived, based on situations | |||
introduces further complexity - and leaves room for further | such as an erroneous implementation at the receiver, or the error | |||
implementation and standardisation errors that may compromise the | occurring within an optional, transitive attribute not examined by | |||
robustness of the BGP protocol. In addition, the queuing and | the advertising device. In order to allow operators to select | |||
scheduling of these BGP messages must be interleaved with the | sessions on which this risk of inconsistency is acceptable, an | |||
transmission of the key protocol messages - such as KEEPALIVE and | implementation SHOULD provide means by which NLRI-level error | |||
UPDATE packets. It is therefore a concern that should a large number | handling for Non-Critical errors can be disabled on a per-session | |||
of messages specifically for operational visibility be transmitted, | basis. | |||
this will delay the transmission of UPDATE packets, and hence | ||||
adversely affect the end-to-end convergence time for NLRI carried | ||||
within BGP. The operational reasons why it is advantageous for | ||||
messages to be in-band to a protocol should also be considered. | ||||
In particular, it should be noted that where such information is to | ||||
be transmitted between administrative boundaries a BGP session | ||||
represents an existing channel between the two ASes. This channel is | ||||
considered to be secure insofar as the routing information, and | ||||
requests sent via the session are considered to come from a trusted | ||||
source. Since error information both relates to a particular | ||||
attachment, and is key to ensuring that such a session is operating | ||||
as expected, it is considered of great operational benefit that this | ||||
information is transmitted over this channel. In addition, the | ||||
overall system scalability is improved by such in-band transmission. | ||||
It is expected that erroneous information resulting in the 'treat-as- | ||||
withdraw' mechanism being utilised is relatively infrequently | ||||
transmitted between two peers (when compared to the frequency of | ||||
UPDATE message transmission). The impact of including an additional | ||||
BGP message type for such operational visibility is relatively small | ||||
from a resource utilisation perspective - additional processing | ||||
overhead is only experienced when such a message is received. Where | ||||
a separate session is maintained, particular network elements within | ||||
a service provider topology may require hundreds, or thousands, of | ||||
additional sessions for the transmission of this information. Such | ||||
a resource consumption overhead is likely to be unacceptable to some | ||||
network operators. | ||||
For the reasons explained above, it is expected that mechanisms | Since the Non-Critical error handling required within this section | |||
specified to meet the requirements for event visibility consider the | results in no NOTIFICATION message being transmitted, the fact that | |||
relative impacts of additional monitoring sessions, or message | an error has occurred and hence there may be inconsistency between | |||
inclusion in band to BGP in order not to compromise the security, | the local and remote BGP speaker MUST be flagged to the network | |||
scalability and robustness of the BGP-4 protocol. | operator through standard operational interfaces (e.g., SNMP, | |||
syslog). The information highlighted MUST include the NLRI | ||||
identified to be contained within the error message, and SHOULD | ||||
contain an exact copy of the received message for further analysis. | ||||
7. Operational Complexities Introduced by Altering RFC4271 | In order that the operator of the BGP speaker from whom an erroneous | |||
UPDATE message has been advertised is aware of the fact that some | ||||
NLRI advertised to the remote speaker have been considered withdrawn | ||||
due to being contained within an erroneous UPDATE, a BGP speaker | ||||
SHOULD support mechanisms to report the occurrence of Non-Critical | ||||
error handling to the remote speaker. The receiving speaker SHOULD | ||||
transmit the NLRI contained within the erroneous message to the | ||||
advertising speaker. An exact copy of the received UPDATE message | ||||
SHOULD also be sent. | ||||
The existing NOTIFICATION and subsequent teardown of a BGP session | The exchange of information related to events occurring as a result | |||
upon encountering an error has the advantage that a consistent | of BGP messages is not currently supported by any extension to the | |||
approach to error handling is required of all implementations of the | protocol. Clearly, where the two speakers reside within the same | |||
BGP-4 protocol. This is of operational advantage as it provides a | administrative domain, shared logging infrastructure can be utilised | |||
clear expectation of the behaviour of the protocol. The requirements | to identify the root cause of errors, however, in many cases | |||
defined herein add further complexity to the error-handling within | neighbouring BGP speakers reside within separate administrative | |||
BGP, and hence are liable to compromise the existing deterministic | domains (e.g., are ASBRs for Internet or private networks). In this | |||
protocol behaviour. It is therefore deemed that there is a further | case, mechanisms allowing transmission in-band to the BGP session | |||
requirement to define a set of recommended behaviours based on the | SHOULD be utilised (e.g., the OPERATIONAL message described in | |||
reception of a particular class of erroneous UPDATE message, | [I-D.ietf-idr-operational-message]). Such an in-band channel is | |||
alongside highlighting some of the implementation complexities that | preferred based on the BGP session representing a pre-established | |||
may need to be handled in the case that particular recommendations | trusted channel which is related to a specific BGP-speaking device | |||
made within this memo are deployed. | within a network. It is expected that the overall system scalability | |||
of a BGP speaker is improved through utilising the existing channel, | ||||
rather than incurring overhead for maintaining many additional | ||||
logging-specific protocol sessions for relatively infrequent | ||||
messaging events when errors occur. However, the extensions | ||||
providing such a channel MUST consider their impact to base BGP | ||||
protocol functions such as the transmission of UPDATE or KEEPALIVE | ||||
messages, and SHOULD limit the volume of messaging to direct | ||||
reactions to Non-Critical errors occurring. These considerations | ||||
SHOULD be made in order to ensure that no compromise is made to the | ||||
security, scalability and robustness of BGP. Where additional BGP | ||||
monitoring information that is not suitable to be carried in-band is | ||||
required, out-of-band mechanisms such as the BMP protocol described | ||||
in [I-D.ietf-grow-bmp] could be utilised to provide further | ||||
information relating to erroneous messages. | ||||
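
The local reporting requirement above (the affected NLRI plus an exact copy of the received message) might be sketched as follows. This is only an illustration in Python: the record layout, the field names and the helpers build_event and to_log_line are assumptions made for this example and are not defined by the draft or by any BGP implementation.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class NonCriticalErrorEvent:
    """Hypothetical log record for an UPDATE handled as a Non-Critical error."""
    peer: str             # address of the speaker that sent the UPDATE
    nlri: List[str]       # prefixes treated as withdrawn
    raw_update_hex: str   # exact copy of the received message, hex-encoded


def build_event(peer: str, nlri: List[str], raw_update: bytes) -> NonCriticalErrorEvent:
    # Keep the message bytes unmodified so that they can be analysed later.
    return NonCriticalErrorEvent(peer=peer, nlri=list(nlri),
                                 raw_update_hex=raw_update.hex())


def to_log_line(event: NonCriticalErrorEvent) -> str:
    # One JSON object per line is easy to ship via syslog or a similar interface.
    return json.dumps(asdict(event), sort_keys=True)
```

The same record could, in principle, be the payload handed to an in-band reporting mechanism; how that is encoded on the wire is outside the scope of this sketch.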
Utilising the classes of erroneous UPDATE message described in | 4.2. Recovering RIB Consistency following NLRI-level Error Handling | |||
Section 2, the recommended behaviour for a BGP-4 implementation can | ||||
be divided into two branches. Primarily, where a semantic error is | ||||
identified, an implementation is expected to utilise the reduced- | ||||
impact error handling approach, as described in Section 3. In the | ||||
case that such an approach results in known NLRI being withdrawn from | ||||
the BGP speaker's RIB, and an implementation provides functionality | ||||
such that these errors are recovered from through an automatically | ||||
triggered means, such as those described within Section 4, some | ||||
consideration of the scalability of these recovery mechanisms is | ||||
required. Clearly, there is a computational and bandwidth overhead | ||||
associated with the re-advertisement of NLRI between two BGP speakers | ||||
- both due to the generation of UPDATE messages, their transmission | ||||
between the two speakers, and the parsing and processing into the RIB | ||||
required. This overhead is directly proportional to the number of | ||||
UPDATE messages that are required. Where a semantic error is | ||||
experienced, by definition the NLRI contained within the UPDATE can | ||||
be extracted. It is therefore possible to minimise the proportion of | ||||
the RIB that is re-advertised by targeting any recovery mechanism on | ||||
the NLRI contained within the erroneous UPDATE. Such a targeted | ||||
mechanism can be achieved through a means such as One-Time ORF, or | ||||
other means of targeting UPDATE messages not discussed within this | ||||
memo. It is recommended that where available, any automatic (or | ||||
manual) triggered recovery mechanism behaviour utilises such targeted | ||||
means in preference to any whole RIB refresh mechanism (such as | ||||
ROUTE-REFRESH). | ||||
In the case that an erroneous UPDATE has been processed through a | Following NLRI being treated as withdrawn due to Non-Critical error | |||
means such as treat-as-withdraw (described within Section 3), a | handling, inconsistencies exist between the Adj-RIB-Out of the | |||
recovering mechanism may be considered superfluous, if the assumption | advertising BGP speaker, and the Adj-RIB-In of the receiving device. | |||
is made that the RIB inconsistency will only be recovered from based | These inconsistencies may result in forwarding loops or blackholing | |||
on a path re-convergence (or change in BGP attribute) for the | of traffic in some routing topologies. In order to ensure that such | |||
advertising BGP speaker. However, where this assumption is not | cases can be recovered from, a means by which a validation and | |||
considered to provide adequate recovery behaviour, and a mechanism to | recovery of consistency can be achieved SHOULD be provided to an | |||
restore RIB consistency automatically is implemented, some | operator. This function may be provided through enhancing the ROUTE- | |||
consideration must be made for where repeated erroneous messages | REFRESH [RFC2918] mechanism to add means to identify the beginning | |||
occur. In this case, in order to limit the impact to the BGP | and end of a replay of the entire Adj-RIB-Out of the advertising | |||
speaker's network operation, at a pre-defined point it is recommended | speaker (as per the suggestion in | |||
that such automatic recovery mechanisms towards the BGP speaker from | [I-D.ietf-idr-bgp-enhanced-route-refresh]). | |||
which erroneous UPDATEs are repeatedly received are suppressed, and | ||||
the fact that such suppression has occurred is highlighted to an | ||||
operator. The point at which such behaviour is suppressed is to be | ||||
defined on a per-implementation basis, taking into account feedback | ||||
from the Network Operator community based on the deployment of the | ||||
recommendations described in this document. It is expected that such | ||||
trigger points are dependent upon the mechanisms implemented for a | ||||
particular BGP-4 implementation, and the impact upon the speaker of | ||||
these means of RIB recovery. | ||||
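
The suppression of automatic recovery after repeated erroneous UPDATEs could be sketched as a simple sliding-window limiter. The class name RecoveryDampener, the default of three triggers and the 60 second window are assumptions chosen for illustration; the draft leaves the actual trigger point to each implementation.

```python
from collections import deque


class RecoveryDampener:
    """Hypothetical limiter for automatically triggered RIB recovery per peer."""

    def __init__(self, max_triggers: int = 3, window_seconds: float = 60.0):
        self.max_triggers = max_triggers
        self.window_seconds = window_seconds
        self._triggers = deque()   # timestamps of recent recovery attempts
        self.suppressed = False    # True once recovery has been refused

    def allow_recovery(self, now: float) -> bool:
        # Forget attempts that have aged out of the window.
        while self._triggers and now - self._triggers[0] >= self.window_seconds:
            self._triggers.popleft()
        if len(self._triggers) >= self.max_triggers:
            # Too many recent attempts: suppress, and let the caller
            # flag the condition to the operator.
            self.suppressed = True
            return False
        self._triggers.append(now)
        self.suppressed = False
        return True
```

A caller would invoke allow_recovery() each time a Non-Critical error is handled, and raise an operator-visible event when it returns False.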
Where critical errors are experienced, such that a session reset is | As Non-Critical error handling is localised to the NLRI contained | |||
required, the mechanism discussed in Section 5 should be used. | within the erroneous UPDATE message, a targeted recovery mechanism | |||
Again, since such a mechanism results in a restart of a BGP session, | MAY be provided allowing a speaker to request re-advertisement of a | |||
it is expected that all NLRI carried over the session is re-advertised | particular subset of the Adj-RIB-Out. Where such targeted refresh | |||
as it is re-established, incurring processing overhead on both the | functions are available, they SHOULD be preferred to mechanisms | |||
advertising and receiving BGP speaker. In order to minimise the | requesting re-advertisement of the whole Adj-RIB-Out based on their | |||
consumption of control-plane computational resource on both speakers, | more limited use of CPU and network resources. | |||
it is recommended that mechanisms allowing a reduced set of BGP | ||||
UPDATE messages to be re-transmitted between two speakers are | ||||
employed wherever possible - for instance through employing | ||||
mechanisms such as those described in [I-D.ietf-idr-enhanced-gr]. | ||||
In the case that repeated critical errors occur, the overhead of | A BGP speaker may automatically trigger recovery mechanisms such as | |||
performing any mechanism implemented based on the requirements in | those described in this section following the receipt of an erroneous | |||
Section 5 is incurred following each erroneous UPDATE message. Since | UPDATE message identified as Non-Critical to expedite recovery. It | |||
these mechanisms are, by definition, performed automatically in | should be noted that if automatic recovery mechanisms trigger only | |||
response to the erroneous message being received similar | re-advertisement of an identical erroneous message, they are likely | |||
considerations as to the impact to the BGP speaker must be taken into | to be ineffective. Additionally, where the best-path to be | |||
account. As such, it is expected that after a certain trigger level, | advertised by the remote speaker changes, this will be advertised | |||
the ongoing receipt of critical errors within BGP UPDATE messages is | directly, without a requirement for a request from the receiver. | |||
deemed to be indicative of a long-lasting failure, and a session no | However, in some cases, RIB consistency recovery mechanisms may | |||
longer considered viable. Where such a case is experienced, it is | prompt alternate UPDATE message packing, and hence allow quicker | |||
expected that the BGP session reverts to the standard session failure | recovery. Where such mechanisms are implemented, mechanisms focused | |||
behaviour, as described in [RFC4271] and documents updating this base | to smaller sets of NLRI SHOULD be preferred over those requesting the | |||
standard. Where such a reversion is implemented this condition | entire RIB. In addition, such mechanisms SHOULD have dampening | |||
should be flagged to a network operator. The number of restart | mechanisms to ensure that their impact to computational and network | |||
attempts before the session reverts to being shut down should be | resources is limited. | |||
determined based on the overhead of the recovery mechanisms | ||||
implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is | ||||
implemented, the impact of session restart may be significantly | ||||
lower), and operational experience of the deployment of the | ||||
recommendations described in this document. | ||||
Since repeated erroneous UPDATE messages which experience critical | 5. Error Handling for Critical Errors | |||
errors may be indicative of long-lasting failure modes, it is | ||||
recommended that a back-off from restarting BGP sessions experiencing | ||||
such behaviour is implemented. As such, this is not applicable to | ||||
restart behaviour through means such as those described in Section 5 | ||||
since such restarts are time-bound based on the period for which the | ||||
Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when | ||||
considering BGP Graceful Restart, such restarts are time-bound by the | ||||
Restart Time described in [RFC4724]). However, following a session | ||||
reverting to being pulled down based on repeated error conditions, it | ||||
is recommended that following restart attempts are subject to an | ||||
exponentially increasing interval between subsequent attempts. It is | ||||
therefore recommended that in such cases an implementation implements | ||||
the increasing values of IdleHoldTimer as described in the BGP-4 FSM | ||||
documented in [RFC4271]. | ||||
7.1. Reducing the Network Impact of Session Teardown | Where an UPDATE message containing a Critical error is received, | |||
since the NLRI cannot be extracted, error handling mechanisms must be | ||||
applied at the per-session level. In order to limit the impact to | ||||
network operation, these session-level mechanisms MUST be applied in | ||||
a manner which allows the paths NLRI received from the remote speaker | ||||
to continue to be utilised for forwarding during the session reset | ||||
and re-establishment. It is envisaged that this requirement may be | ||||
met through extension of the BGP Graceful Restart mechanism | ||||
([RFC4724]) to be triggered by NOTIFICATION messages indicating the | ||||
occurrence of a Critical error. Such an extension allows a restart | ||||
of the TCP and BGP sessions between two speakers, in a similar manner | ||||
to the current session restart behaviour triggered by a NOTIFICATION | ||||
message. In order to maximise the level of re-initialisation which | ||||
occurs during such a restart triggered by a Critical error, BGP | ||||
speakers MAY re-initialise memory structures related to the | ||||
Adj-RIB-In and Adj-RIB-Out associated with the session on which the | ||||
erroneous UPDATE was observed. | ||||
As discussed within the preceding section, where repeated critical | Where such a restart event occurs, the continued liveness of the | |||
UPDATE message errors are received, it is recommended that the impact | remote device MAY be verified by BGP KEEPALIVE packets or other OAM | |||
to both the advertising and receiving BGP-4 speakers be limited by | functions such as Bidirectional Forwarding Detection ([RFC5880]). In | |||
reverting to tearing the BGP-4 session experiencing such errors down. | cases where the observed Critical BGP error is indicative of a wider | |||
The BGP-4 specification presented in [RFC4271] achieves such a | device failure of the remote speaker, it is expected that a BGP | |||
session shutdown by sending a NOTIFICATION message, however, this has | session will not re-establish correctly. Each BGP speaker SHOULD | |||
the net result that all downstream BGP speakers (i.e. those to whom | maintain a limited time window in which session restart is expected | |||
the routes carried over the now ceased BGP session were readvertised) | in order to mitigate this possibility. | |||
must withdraw these routes from their RIB, and perform a best-path | ||||
selection if required. In some cases, there may be no alternate path | ||||
available, and hence a period of time for which no valid BGP route | ||||
exists. Particularly, this is very likely to occur where an upstream | ||||
BGP speaker performs a best-path selection and advertises only a | ||||
single path to its neighbours - there is a requirement for the | ||||
upstream speaker to perform a best-path selection, and re-advertise a | ||||
new set of NLRI before the downstream system is able to converge to a | ||||
new path. It should be noted that where UPDATE messages withdrawing | ||||
NLRI are not subject to the BGP session's configured | ||||
MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements | ||||
are, this may result in a BGP speaker being without a path for a | ||||
period up to the MRAI. | ||||
Clearly, it is advantageous to avoid this period of time for which | When a Critical error occurs, the network operator MUST be made aware | |||
there may be no reachability for a set of routes, especially since | of its occurrence through local logging mechanisms (e.g., SNMP traps | |||
the BGP speaker terminating a particular session is doing so due to a | or syslog). The BGP speaker receiving an UPDATE message identified | |||
particular error handling policy. The graceful shutdown mechanism | as a Critical error MUST log its occurrence and a copy of the UPDATE | |||
detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a | message. Where an inter-device messaging mechanism is implemented (as | |||
BGP speaker is able to signal that a set of routes are to be | discussed in Section 4.1) a copy of the erroneous UPDATE | |||
withdrawn, and hence allow downstream systems to pre-emptively | message SHOULD be transmitted to the remote speaker. Both BGP | |||
perform a best-path selection, and hence advertise new reachability | speakers MUST indicate to an operator that the cause of a session restart | |||
information in a make-before-break manner. | was a Critical error in an UPDATE message. | |||
It is therefore envisaged that, where a session is to be shut down, | Since repeated critical errors (and session restarts) may have an | |||
based on a trigger relating to erroneous UPDATE messages being | impact in overall device scaling if the failure condition is not | |||
received (be they repeated or not) that the graceful shutdown | resolved by session restart, a BGP speaker MAY choose to revert to | |||
procedure is utilised, so as to reduce the forwarding impact of | the session tear down behaviour described in the base BGP | |||
routes received on the session being withdrawn. | specification. This reversion SHOULD only be utilised after a number | |||
of attempts which SHOULD be controllable by the network operator. | ||||
Where a session is shut down, the implementation MAY utilise a back- | ||||
off from session restart attempts (as per the IdleHoldTimer described | ||||
in the BGP FSM [RFC4271]). Where reversion to tearing down the BGP | ||||
session is performed, a speaker SHOULD limit the impact of | ||||
withdrawing prefixes from downstream speakers where possible. It is | ||||
envisaged that this can be achieved by utilising a mechanism such as | ||||
the BGP Graceful Shutdown procedure as described in | ||||
[I-D.ietf-grow-bgp-gshut]. | ||||
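
The reversion behaviour described above (a bounded number of reduced-impact restarts, then session teardown with an increasing hold-off) could be sketched as below. The class CriticalErrorPolicy, its default values and the doubling schedule are assumptions made for illustration; [RFC4271] describes an exponentially increasing IdleHoldTimer but the concrete numbers here are not taken from it.

```python
class CriticalErrorPolicy:
    """Hypothetical per-session policy for repeated Critical errors."""

    def __init__(self, max_restarts: int = 3,
                 base_hold: float = 1.0, max_hold: float = 64.0):
        self.max_restarts = max_restarts  # operator-controllable attempt limit
        self.base_hold = base_hold        # first hold-off after a teardown (s)
        self.max_hold = max_hold          # upper bound on the hold-off (s)
        self.restarts = 0
        self.teardowns = 0

    def on_critical_error(self) -> str:
        # Prefer the restart that keeps forwarding state until the
        # configured number of attempts has been used up.
        if self.restarts < self.max_restarts:
            self.restarts += 1
            return "graceful-restart"
        self.teardowns += 1
        return "teardown"

    def idle_hold_seconds(self) -> float:
        # Double the hold-off after each teardown, up to max_hold.
        if self.teardowns == 0:
            return 0.0
        return min(self.base_hold * 2 ** (self.teardowns - 1), self.max_hold)
```

An implementation following this pattern would also log each transition to "teardown", and could apply a graceful shutdown procedure before the session is closed.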
8. IANA Considerations | 6. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
9. Security Considerations | 7. Security Considerations | |||
The requirements outlined in this document provide mechanisms by | The requirements outlined in this document provide mechanisms which | |||
which erroneous BGP messages may be responded to with limited impact | limit the overall impact of the response to an error in a BGP UPDATE | |||
to forwarding operation. This is of benefit to the security of a BGP | message. This is of benefit to the security of a BGP speaker. | |||
speaker in general. Where UPDATE messages may have been propagated | Without these mechanisms, where erroneous UPDATE messages relating to | |||
by a single malicious Autonomous System or router within a network | a single NLRI entry can be propagated to a BGP speaker, all other | |||
(or the Internet default free zone - DFZ), which are then propagated | NLRI carried via the same session are affected by the resulting | |||
to all devices within the same routing domain, all other NLRI | session tear-down. This may result in an AS being isolated from | |||
available over the same session become unreachable. This mechanism | particular routing domains (such as the Internet) should an UPDATE | |||
may provide means by which an Autonomous System can be isolated from | message be propagated via targeted specific paths. It is envisaged | |||
required routing domains (such as the Internet), should the relevant | by reducing the impact of the reaction of the receiving speaker to | |||
UPDATE messages be propagated via specific paths. By reducing the | these messages, the isolation can be constrained to specific sets of | |||
impact of such failures, it is envisaged that this possibility may be | NLRI, or a specific topology. | |||
constrained to a specific set of NLRI, or a specific topology. | ||||
Some mechanisms meeting the requirements specified in this document, | A number of the mechanisms meeting the requirements specified within | |||
particularly those within Section 6 may provide further security | the document (particularly those relating to operational monitoring) | |||
concerns, however, it is envisaged that these are addressed in per- | may raise further security concerns. Such concerns will be addressed | |||
enhancement memos. | during the specification of these mechanisms. | |||
10. Acknowledgements | 8. Acknowledgements | |||
The author would like to thank the following network operators for | The author would like to thank the following network operators for | |||
their insight, and valuable input in defining the requirements for a | their insight, and valuable input into defining the requirements for | |||
variety of operational deployments of the BGP-4 protocol; Shane | a variety of deployments of the BGP protocol: Shane Amante, Bruno | |||
Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom | Decraene, Rob Evans, David Freedman, Wes George, Tom Hodgson, Sven | |||
Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom | Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and | |||
Scholl and Ilya Varlashkin. | Ilya Varlashkin. | |||
In addition, many thanks are extended to Jeff Haas, Wim Hendrickx, | In addition, many thanks are extended to Jeff Haas, Wim Hendrickx, | |||
Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert | Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert | |||
Raszuk for their expertise relating to implementations of the BGP-4 | Raszuk for their expertise relating to implementations of the BGP | |||
protocol. | protocol. | |||
11. References | 9. References | |||
11.1. Normative References | 9.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
Requirement Levels", BCP 14, RFC 2119, March 1997. | ||||
[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, | [RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, | |||
"Multiprotocol Extensions for BGP-4", RFC 2858, June 2000. | "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000. | |||
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, | [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, | |||
September 2000. | September 2000. | |||
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway | [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway | |||
Protocol 4 (BGP-4)", RFC 4271, January 2006. | Protocol 4 (BGP-4)", RFC 4271, January 2006. | |||
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
Networks (VPNs)", RFC 4364, February 2006. | Networks (VPNs)", RFC 4364, February 2006. | |||
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route | ||||
Reflection: An Alternative to Full Mesh Internal BGP | ||||
(IBGP)", RFC 4456, April 2006. | ||||
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. | [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. | |||
Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, | Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, | |||
January 2007. | January 2007. | |||
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, | [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service | |||
"Multiprotocol Extensions for BGP-4", RFC 4760, | (VPLS) Using BGP for Auto-Discovery and Signaling", | |||
January 2007. | RFC 4761, January 2007. | |||
11.2. Informational References | [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | |||
(BFD)", RFC 5880, June 2010. | ||||
9.2. Informational References | ||||
[I-D.chen-ebgp-error-handling] | [I-D.chen-ebgp-error-handling] | |||
Chen, E., Mohapatra, P., and K. Patel, "Revised Error | Chen, E., Mohapatra, P., and K. Patel, "Revised Error | |||
Handling for BGP Updates from External Neighbors", | Handling for BGP Updates from External Neighbors", | |||
draft-chen-ebgp-error-handling-01 (work in progress), | draft-chen-ebgp-error-handling-01 (work in progress), | |||
September 2011. | September 2011. | |||
[I-D.ietf-grow-bgp-gshut] | [I-D.ietf-grow-bgp-gshut] | |||
Francois, P., Decraene, B., Pelsser, C., Patel, K., and C. | Francois, P., Decraene, B., Pelsser, C., Patel, K., and C. | |||
Filsfils, "Graceful BGP session shutdown", | Filsfils, "Graceful BGP session shutdown", | |||
draft-ietf-grow-bgp-gshut-03 (work in progress), | draft-ietf-grow-bgp-gshut-04 (work in progress), | |||
December 2011. | October 2012. | |||
[I-D.ietf-grow-bmp] | [I-D.ietf-grow-bmp] | |||
Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring | Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring | |||
Protocol", draft-ietf-grow-bmp-06 (work in progress), | Protocol", draft-ietf-grow-bmp-07 (work in progress), | |||
December 2011. | October 2012. | |||
[I-D.ietf-idr-bgp-enhanced-route-refresh] | [I-D.ietf-idr-bgp-enhanced-route-refresh] | |||
Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced | Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced | |||
Route Refresh Capability for BGP-4", | Route Refresh Capability for BGP-4", | |||
draft-ietf-idr-bgp-enhanced-route-refresh-02 (work in | draft-ietf-idr-bgp-enhanced-route-refresh-03 (work in | |||
progress), June 2012. | progress), December 2012. | |||
[I-D.ietf-idr-bgp-gr-notification] | ||||
Patel, K., Fernando, R., and J. Scudder, "Notification | ||||
Message support for BGP Graceful Restart", | ||||
draft-ietf-idr-bgp-gr-notification-00 (work in progress), | ||||
December 2011. | ||||
[I-D.ietf-idr-enhanced-gr] | ||||
Patel, K., Chen, E., Fernando, R., and J. Scudder, | ||||
"Accelerated Routing Convergence for BGP Graceful | ||||
Restart", draft-ietf-idr-enhanced-gr-01 (work in | ||||
progress), June 2012. | ||||
[I-D.ietf-idr-operational-message] | [I-D.ietf-idr-operational-message] | |||
Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL | Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL | |||
Message", draft-ietf-idr-operational-message-00 (work in | Message", draft-ietf-idr-operational-message-00 (work in | |||
progress), March 2012. | progress), March 2012. | |||
[I-D.ietf-idr-optional-transitive] | ||||
Scudder, J., Chen, E., Mohapatra, P., and K. Patel, | ||||
"Revised Error Handling for BGP UPDATE Messages", | ||||
draft-ietf-idr-optional-transitive-04 (work in progress), | ||||
October 2011. | ||||
[I-D.zeng-idr-one-time-prefix-orf] | ||||
Zeng, Q., Dong, J., Heitz, J., Patel, K., Shakir, R., and | ||||
Z. Huang, "One-time Address-Prefix Based Outbound Route | ||||
Filter for BGP-4", draft-zeng-idr-one-time-prefix-orf-02 | ||||
(work in progress), July 2012. | ||||
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | ||||
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, | ||||
June 2010. | ||||
Author's Address | Author's Address | |||
Rob Shakir | Rob Shakir | |||
BT | BT | |||
pp C3L | pp C3L, BT Centre | |||
BT Centre | ||||
81, Newgate Street | 81, Newgate Street | |||
London EC1A 7AJ | London EC1A 7AJ | |||
UK | UK | |||
Email: rob.shakir@bt.com | Email: rob.shakir@bt.com | |||
URI: http://www.bt.com/ | URI: http://www.bt.com/ | |||
End of changes. 66 change blocks. | ||||
926 lines changed or deleted | 373 lines changed or added | |||