draft-ietf-grow-ops-reqs-for-bgp-error-handling-05.txt   draft-ietf-grow-ops-reqs-for-bgp-error-handling-06.txt 
Internet Engineering Task Force R. Shakir Internet Engineering Task Force R. Shakir
Internet-Draft BT Internet-Draft BT
Intended status: Informational July 30, 2012 Intended status: Informational December 27, 2012
Expires: January 31, 2013 Expires: June 30, 2013
Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
draft-ietf-grow-ops-reqs-for-bgp-error-handling-05 draft-ietf-grow-ops-reqs-for-bgp-error-handling-06
Abstract Abstract
BGP-4 is utilised as a key intra- and inter-Autonomous System routing BGP is utilised as a key intra- and inter-autonomous system routing
protocol in modern IP networks. The failure modes as defined by the protocol in modern IP networks. The failure modes, as defined by the
original protocol standards are based on a number of assumptions original protocol standards, are based on a number of assumptions
around the impact of session failure. Numerous incidents both in the around the impact of session failure. Numerous incidents both in the
global Internet routing table and within Service Provider networks global Internet routing table and within service provider networks
have been caused by strict handling of a single invalid UPDATE have been caused by strict handling of a single invalid UPDATE
message causing large-scale failures in one or more Autonomous message causing large-scale failures in one or more autonomous
Systems. systems.
This memo describes the current use of BGP-4 within Service Provider This memo describes the current use of BGP within service provider
networks, and outlines a set of requirements for further work to networks, and outlines a set of requirements for further work to
enhance the mechanisms available to a BGP-4 implementation when enhance the mechanisms available to a BGP implementation when
erroneous data is detected. Whilst this document does not provide erroneous data is detected. Whilst this document does not provide
specification of any standard, it is intended as an overview of a set specification of any standard, it is intended as an overview of a set
of enhancements to BGP-4 to improve the protocol's robustness to suit of enhancements to BGP to improve the protocol's robustness to suit
its current deployment. its current deployment.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 31, 2013. This Internet-Draft will expire on June 30, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Requirements Language . . . . . . . . . . . . . . . . . . . . 3
1.1. Role of BGP-4 in Service Provider Networks . . . . . . . . 3 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Overview of Operator Requirements for BGP-4 Error 2.1. Role of BGP-4 in Service Provider Networks . . . . . . . . 4
Handling . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Critical and Non-Critical Errors . . . . . . . . . . . . . . . 7
2. Errors within BGP-4 UPDATE Messages . . . . . . . . . . . . . 7 4. Error Handling for Non-Critical Errors . . . . . . . . . . . . 9
2.1. Classifying BGP Errors and Expected Error Handling . . . . 8 4.1. NLRI-level Error Handling Requirements . . . . . . . . . . 9
2.1.1. Critical BGP Errors . . . . . . . . . . . . . . . . . 9 4.2. Recovering RIB Consistency following NLRI-level Error
2.1.2. Semantic BGP Errors . . . . . . . . . . . . . . . . . 9 Handling . . . . . . . . . . . . . . . . . . . . . . . . . 10
3. Avoiding use of NOTIFICATION . . . . . . . . . . . . . . . . . 11 5. Error Handling for Critical Errors . . . . . . . . . . . . . . 12
4. Recovering RIB Consistency . . . . . . . . . . . . . . . . . . 13 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
5. Reducing the Impact of Session Reset . . . . . . . . . . . . . 15 7. Security Considerations . . . . . . . . . . . . . . . . . . . 15
6. Operational Toolset for Monitoring BGP . . . . . . . . . . . . 17 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16
7. Operational Complexities Introduced by Altering RFC4271 . . . 21 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.1. Reducing the Network Impact of Session Teardown . . . . . 23 9.1. Normative References . . . . . . . . . . . . . . . . . . . 17
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 9.2. Informational References . . . . . . . . . . . . . . . . . 17
9. Security Considerations . . . . . . . . . . . . . . . . . . . 26 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 19
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28
11.1. Normative References . . . . . . . . . . . . . . . . . . . 28
11.2. Informational References . . . . . . . . . . . . . . . . . 28
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 30
1. Introduction
Where BGP-4 [RFC4271] is deployed in the Internet and Service
Provider networks, numerous incidents have been recorded due to the
manner in which [RFC4271] specifies errors in routing information
should be handled. Whilst the behaviour defined in the existing
standards retains utility, the deployments of the protocol have
changed within modern networks, resulting in significantly different
demands for protocol robustness. Whilst a number of Internet Drafts
have been written to begin to enhance the behaviour of BGP-4 in terms
of the handling of erroneous messages, this memo intends to define a
set of requirements for ongoing work. These requirements are
considered from the perspective of a Network Operator, and hence this
draft does not intend to define the protocol mechanisms by which such
error handling behaviour is to be implemented.
1.1. Role of BGP-4 in Service Provider Networks
BGP was designed as an inter-Autonomous System (AS) routing protocol
and hence many of the error handling mechanisms within the protocol
specification are designed to be conducive to this role. In general,
this consideration as an inter-AS routing propagation mechanism
results in the view that a BGP session propagates a relatively small
amount of network-layer reachability information (NLRI) between two
ASes. In this case, it is the expectation of session resilience for
those adjacencies that are key to routing continuity (for example, it
is expected that two networks peering via BGP would connect multiple
times in order to safeguard equipment or protocol failure). In
addition, there is some expectation of multiple paths to a particular
NLRI being available - it would be expected that a network can fall
back to utilising alternate, less direct, paths where a failure of a
more direct path occurs.
Traditional network architectures would deploy an Interior Gateway
Protocol (IGP) to carry infrastructure and customer routes, with an
Exterior Gateway Protocol (EGP) such as BGP being utilised to
propagate these routes to other Autonomous Systems. However, with
the growth of IP-based services, this is no longer considered best
practice. In order to ensure that convergence is within acceptable
time bounds, the amount of routing information carried within the IGP
is significantly reduced - and tends to be only infrastructure
routes. iBGP is then utilised to propagate both customer, and
external routes within an AS. As such, BGP has become an IGP, with
traditional IGPs acting as a means by which to propagate the routing
information which is required to establish a BGP session, and reach
the egress node within the local routing domain. This change in role
presents different requirements for the robustness of BGP as a
routing protocol - with the expectation of similar level of
robustness to that of an IGP being set.
Along with this change in role, the nature of the IP routing
information that is carried has changed. BGP has become a ubiquitous
means by which service information can be propagated between devices.
For instance, BGP is utilised to carry routing information for IP/
MPLS VPN services as described in [RFC4364]. Since there is an
existing deployment of the protocol between PE devices in numerous
networks, it has been adapted to propagate this routing information,
as its use limits the number of routing protocols required on each
device. This additional information being propagated represents a
large change in requirement for the error handling of the protocol -
where session failure occurs, it is likely a complete service outage
for at least a subset of a network's customers is experienced where
an erroneous packet may have occurred within a different sub-topology
or even service (a different address family for example). For this
reason, there is a significant demand to avoid service affecting
failures that may be triggered by routing information within a single
sub-topology or service.
The combination of the increased number of deployments of BGP-4 as an
intra-AS routing protocol, its use for the propagation of additional
types of routing and service information, and the growth of IP
services has resulted in a substantial increase in the volume of
information carried within BGP-4. In numerous networks, RIB sizes of
the order of millions of entries exist within individual BGP
speakers, with particularly high-scale points exhibited at BGP
speakers performing aggregation or functionality designed improve
utilisation of network resources (e.g., route reflector hierarchies).
Clearly an increase in the amount routing information carried in BGP
results in greater impact to services during failures, which is only
amplified by a corresponding increase in recovery times. Following a
failure, there is a substantial recovery time to learn, compute and
distribute new paths, which results in a greater observed impact to
services affected, and hence adds further weight to the requirement
to avoid failures altogether or, at least, mitigate their impact to
the narrowest scope possible, (e.g., a specific NLRI). Whilst an
argument could be made that convergence time of BGP-4 could
potentially be reduced through deployment of additional computational
resource, it is notable that solution is not necessarily
straightforward from an implementation or deployment perspective,
(e.g., scaling computation resources within a single address-family
is difficult). Thus, significant challenges continue to exist for
operators when scaling BGP-4 deployments, and hence mechanisms which
improve the scalability of BGP-4 are very important.
Both within Internet and multi-service routing architectures, a
number of BGP sessions propagate a large proportion of the required
routing information for network operation. For Internet routing,
these are typically BGP sessions which propagate the global routing
table to an AS - failure of these sessions may have a large impact on
network service, based on a single erroneous update. In an multi-
service environment, typical deployments utilise a small number of
core-facing BGP sessions, typically towards route reflector devices.
Failure of these sessions may also result in a large impact to
network operation. Clearly, the avoidance of conditions requiring
these sessions to fail is of great utility to any network operator,
and provides further motivation for the revision of the existing
behaviour.
Whilst the behaviour in [RFC4271] is suited to ensuring that BGP
messages with erroneous routing information in are limited in scope
(by means of session reset), with the above considerations, it is
clear that this mechanism is not suited to all deployments. It
should, however, be noted that the change in scope affects the
handling only of errors occurring after BGP session establishment.
There is no current operational requirement to amend the means by
which error handling in session establishment, or liveliness
detection, are performed.
1.2. Overview of Operator Requirements for BGP-4 Error Handling
It is the intention of this document to define a set of criteria for
the manner in which a revised error handling mechanism in BGP-4 is
required to conform. The motivation for the definition of these
requirements can be summarised based on certain behaviour currently
present in the protocol that is not deemed acceptable within current
operational deployments, or where there is a short-fall in the tool
set available to an operator. These key requirements can be
summarised as follows:
o It is unacceptable within modern deployments of the BGP-4 protocol
that a single erroneous UPDATE packet affects routes that it does
not carry. This requirement therefore requires some modification
to the means by which erroneous UPDATE packets are handled, and
reacted to - with a particular focus on avoiding the use of the
NOTIFICATION message.
o It is recognised that some error conditions may occur within the
BGP-4 protocol may not always be handled gracefully, and may
result in conditions whereby an implementation cannot recover. In
these (and similar) cases, it is undesirable for an operator that
this reset of the BGP-4 session results in interruption to
forwarding packets (by means of withdrawing routes installed by
BGP-4 into a device's RIB, and subsequently FIB). To this end,
there is a requirement to define a session reset mechanism which
provides session re-initialisation in a non-destructive manner.
o Further to the requirements to provide a more robust protocol, the
current visibility into error conditions within the BGP-4 protocol
is extremely limited - where further modifications to this
behaviour are to be made, complexity is likely to be added. Thus,
to ensure that BGP-4 is manageable, there are requirements for
mechanisms by which the protocol can be examined and monitored.
This document describes each of these requirements in further depth,
along with an overview of means by which they are expected to be
achieved. In addition, the mechanism by which the enhancements
meeting these requirements are to interact is discussed.
2. Errors within BGP-4 UPDATE Messages 1. Requirements Language
Both through analysis of incidents occurring with the Internet DFZ, The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
and multi-service environments utilising BGP-4 to signal service or "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
routing information, a number of different classes of errors within document are to be interpreted as described in RFC 2119 [RFC2119].
BGP-4 UPDATE messages have been observed. In order to consider the
applicability of enhanced error handling mechanisms, it is possible
to divide these errors into a number of sub-classes, particularly
focusing around the location of the error within the UPDATE message.
Where an UPDATE message is considered invalid by a BGP speaker due to 2. Problem Statement
an error within a path attribute that is not the NLRI (where the
definition of NLRI includes reachability information encoded in the
MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in
[RFC4760]) it is a requirement of any enhanced error handling
mechanism to handle the error in a manner focused on the NLRI
contained within the message found to be erroneous. Since in this
case, the message received from the remote peer is syntactically
valid, it is considered that such an UPDATE is indicative of
erroneous data within one or more path attributes. The impact of the
current behaviour defined within the protocol makes the implication
that the BGP speaker from whom the message is received is now an
invalid path for all NLRI announced via the session - which results
in a disproportionate impact to overall network operation. In
particular scenarios (such as networks with centralised BGP route
reflection) such action can result in a loss of all reachability to a
network. In other contexts (such as the Internet DFZ), it cannot be
assumed that the BGP speaker from whom the UPDATE message is received
is directly responsible for the erroneous information contained
within the message.
Two further error cases exist within UPDATE messages, both of which BGP has become a key intra- and inter-domain routing protocol,
are related to the mechanisms that are applicable to messages deployed within both the Internet and private networks. The
received where some difficulty exists in parsing the entire BGP increased reliance on the protocol has resulted in increased demand
message. The two cases concern those cases where a valid NLRI for robustness - with the error handling behaviour defined in
attribute can be extracted, and those where such an attribute is not [RFC4271] having been shown to have caused numerous incidents within
able to be parsed. In these cases, errors in the packing of live network deployments. This document provides an overview of the
attributes within a BGP message may have occurred. Such errors are current deployment cases for BGP-4, and define a set of requirements
likely indicative of an error specifically caused by the remote BGP (from the perspective of a network operator) for enhancing error
speaker. It is, however, desirable to an operator that such errors handling within the protocol.
are handled without affecting all NLRI across a BGP session. As
such, there is a key requirement to maximise the number of cases in
which it is possible to extract NLRI from a BGP UPDATE message. To
this end, it is required that where possible the MP_REACH_NLRI and
MP_UNREACH_NLRI attributes are utilised for encoding all NLRI
(including IPv4 Unicast), and that this attribute is included as the
first attribute of a BGP UPDATE message (as originally recommended in
[I-D.chen-ebgp-error-handling]). Such a change to the order of
inclusion of this attribute maximises the number of cases in which
NLRI can be extracted from an UPDATE. Where this is possible, it is
again required that the error handling mechanisms utilised should be
directly applied to the NLRI included in the UPDATE.
For all cases whereby NLRI can be obtained from an UPDATE message, it 2.1. Role of BGP-4 in Service Provider Networks
is expected that the requirements outlined in Section 3 should be
considered by any enhancement to the BGP-4 protocol.
In the case that it is not possible to completely parse the NLRI BGP was designed as an inter-autonomous system (AS) routing protocol.
attribute from the UPDATE message received from a peer, it is Many of the error handling mechanisms within the protocol are defined
extremely likely that this is indicative of a serious error with in order to be guarantee consistency, and correctness of information
either the process of attribute packing, or buffer usage on the between two neighbouring speakers. The assumption is made that each
remote BGP speaker. In this case, clearly, it is not possible to AS operates with many adjacencies, each propagating a relatively
apply any error handling mechanism that is limited to a specific set small amount of routing information. Through focusing on information
of NLRI, since an implementation has no knowledge of the NLRI consistency, the protocol specification prefers failure of an
included within the UPDATE message. In addition, such errors are individual routing adjacency to maintaining reachability to all NLRI
considered to be relatively fundamental to the operation of a BGP received from a particular neighbour, with the expectation that
implementation, and hence may indicate a case whereby significant alternate, less direct, paths can be selected where a failure occurs.
system errors have occurred. The current BGP-4 standard results in a The assumptions of the nature of BGP deployments resulted in the
BGP speaker restarting a session with the remote BGP speaker. specification made in [RFC4271] whereby the receipt of an erroneous
However where such an error does occur, it is required that a UPDATE message is reacted to by sending a NOTIFICATION message, and
graceful mechanism is utilised to provide a lower impact to network tearing down the adjacency with the remote speaker from whom the
operation. The requirements for enhancements of this nature to BGP-4 error was observed.
are outlined in Section 5, with the requirements outlined therein
focused on providing a means by which system integrity can be
restored whilst allowing for continued network operation.
2.1. Classifying BGP Errors and Expected Error Handling Historically, a network would deploy an interior gateway protocol
(IGP) to carry infrastructure and customer routes, and utilise an
external gateway protocol (EGP) such as BGP to propagate routes to
other autonomous systems. However, BGP's deployments have evolved
with the growth of IP-based services. To ensure route convergence
within an AS is within acceptable time bounds the amount of
information within the IGP has been minimised (typically to only
infrastructure routes). iBGP is then utilised to carry both internal,
customer and external routes within an AS. As such, this has
resulted in BGP having become an IGP, with traditional IGPs providing
only reachability between nodes within the AS for packet forwarding
and to establish iBGP sessions. This change in role within the
overall architecture of an AS has resulted in an increased robustness
requirement for BGP, with the expectation of a similar level of
robustness to that of an IGP being set. The loss of an iBGP session
can result in significant levels of unreachability internally to an
AS, especially since there are typically limited (when compared to
the Internet) signalling and forwarding paths available.
It is clearly of advantage for BGP-4 implementations to utilise a In parallel with this change of deployment, the volume and nature of
consistent set of error handling mechanisms for the different types the information carried within BGP has also changed. BGP has become
of errors that are described in Section 2, and provide consistent the ubiquitous means through which service information can be
nomenclature to refer to them. It is therefore suggested that errors propagated between devices. For instance, being utilised to carry
that are indicative of larger scale failures of a BGP speaker, and IP/MPLS service information such as Layer 3 IP VPN routes [RFC4364] ,
hence require some error handling at the session level are referred and Layer 2 Virtual Private LAN Service device membership [RFC4761].
to as 'critical' errors, whilst those errors that are identified Since these extensions to the protocol allow signalling of multiple
based on incorrect content of one of more attributes of a message are services (represented by address families within BGP), and multiple
referred to as 'semantic' errors. customer topologies (i.e., subsets of routes within each address
family) via the BGP protocol, the impact of session failure is
increased. The tear down of a single BGP session can result in a
complete outage to all customer services signalled via the session,
even where the triggering event is related to only one service or
topology being carried - reflecting a disproportional impact to all
other services and routing topologies.
The errors identified within the following sections consider only The convergence of services to IP, and BGP's changing deployment has
those errors within the specifications at the time of writing, it is resulted in a significant growth in the volume of routing information
recommended that in the definition of future extensions to the BGP-4 carried in the protocol. In numerous networks, the RIB size of
specification, the error handling behaviour (and the category within individual BGP speakers can be of the order of millions of paths.
which errors within the extension should be considered by an Particularly large RIBs are observed at BGP speakers performing
implementation) is defined. aggregation and border roles (such as ASBR, or route reflector
hierarchies). This increased volume of routes results not only in a
significant number of services being impacted during a protocol
failure, but also increases the time to recovery after re-
establishing a BGP session. The time taken to learn, compute and
distribute new paths increases the impact of failures on services
carried by the network - adding further weight to the requirement to
avoid failures, or limit the extent of their impact. Furthermore,
the impact of individual session failures is increased due to the
existence of a relatively small number of highly-critical BGP
sessions within Internet and multi-service network deployments.
These sessions propagate a high-proportion of the reachability
information - for instance, providing an Internet AS with the global
routing table from upstream providers, or connecting IP/MPLS Provider
Edge devices to route reflector hierarchies from which they are
signalled reachability for services connected elsewhere within the
routing domain. In both cases, the failure of these sessions can
result in a significant outage to customer services.
2.1.1. Critical BGP Errors For the current deployments of BGP, the behaviour described in
[RFC4271] related to handling errors in UPDATE messages is
suboptimal, and results in significant disruption to services in
modern network deployments. This document defines a set of
requirements for protocol developments, and revisions to [RFC4271] to
address these concerns through a set of generalised definitions. It
should be noted that the scope of these requirements is limited to
the handling of UPDATE messages as, at the time of writing, there is
no operational requirement to amend the means by which error handling
in session establishment, or liveliness detection are performed.
As described in this document, it is of advantage to limit the number 3. Critical and Non-Critical Errors
of 'critical' errors that occur within the protocol, therefore, based
on analysis of the processing of BGP UPDATE messages, it is required
that 'critical' error handling behaviour is applied to:
o UPDATE Message Length errors - whereby the specified overall As described in Section 2.1, the error handling behaviour described
UPDATE message length is inconsistent with sum of the Total Path in [RFC4271] is applied at a per-session level, affecting all NLRI
Attribute and Withdrawn Routes length. In this case, this is signalled via the adjacency on which an erroneous message is
indicative of message packing failure, whereby the NLRI may not be observed. In order to reduce the impact of error handling to those
correctly extracted. NLRI affected by an erroneous UPDATE, a BGP speaker MUST limit the
error handling mechanisms implemented to those NLRI contained within
an erroneous UPDATE message where it is possible to do so. Clearly,
some errors within the formation of BGP UPDATE messages may result in
it being impossible to reliably extract NLRI from the received
message, and hence the same error handling procedures may not apply.
There is therefore a requirement to classify errors based on their
impact to the BGP UPDATE message, hence messages whereby the NLRI
attribute cannot be extracted or parsed are referred to throughout
this document as Critical errors. These Critical errors are limited
to:
o Errors Parsing the NLRI attributes of an UPDATE message - where o UPDATE Message Length errors - where the specified UPDATE message
NLRI is carried in either the IPv4-Unicast Advertised or Withdrawn length is inconsistent with the sum of the Total Path Attribute
routes, or in the MP_REACH_NLRI or MP_UNREACH_NLRI attributes and Withdrawn Routes length. These errors relate to message
[RFC2858], it is not possible to target error handling mechanisms packing or framing, and result in cases whereby the NLRI attribute
to specific NLRI, and hence session level mechanisms must be cannot be correctly extracted from the message.
utilised.
It is expected that those requirements outlined in Section 5 are o Errors parsing the NLRI attribute of an UPDATE message - where the
utilised to provide session-level handling of those errors identified contents of the IPv4 Unicast Advertised or Withdrawn Routes
as 'critical'. attributes, or multi-protocol BGP NLRI attributes (MP_REACH_NLRI
and/or MP_UNREACH_NLRI as defined in [RFC2858]), cannot be
successfully parsed.
2.1.2. Semantic BGP Errors In the case of Critical errors is expected that error handling is
applied at a session level as per Section 5 of this document.
Where a BGP message is correctly formed, a number of cases exist All errors whereby the contained NLRI can be extracted, are referred
whereby the contents of the UPDATE are not valid - in these cases, to as Non-Critical. It is expected that the following cases fall
this represents errors that can be identified to affect specific within this category:
NLRI. The following cases are expected to be classified as semantic
errors:
o Zero or invalid length errors in path attributes excluding those o Zero or invalid length errors in path attributes, excluding those
containing NLRI, or where the length of all path attributes containing NLRI, or where the length of all path attributes
contained within the UPDATE does not correspond to the total path contained within the UPDATE does not correspond to the total path
attributes length. In this case, the NLRI can be correctly attribute length.
extracted, and hence acted upon.
o Messages where invalid data or flags are contained in a path o Messages where invalid data or flags are contained in a path
attribute that does not relate to the NLRI. attribute that does not relate to the NLRI.
o UPDATE messages missing mandatory attributes, unrecognised non- o UPDATE messages missing mandatory attributes, unrecognised non-
optional attributes or those that contain duplicate or invalid optional attributes, or those that contain duplicate or invalid
attributes (be they unsupported or unexpected). attributes (be they unsupported, or unexpected).
o Those messages where the NEXT_HOP, or MP_REACH next-hop values are
missing, length zero, or invalid for the relevant AFI/SAFI.
In these cases, it is expected that these errors can be handled
gracefully, following the requirements detailed in Section 3 and
Section 4 of this memo.
3. Avoiding use of NOTIFICATION
The error handling behaviour defined in RFC4271 is problematic due to
the limited options that are available to an implementation. When an
erroneous BGP message is received, at the current time, the
implementation must either ignore the error, or send a NOTIFICATION
message, after which it is mandatory to terminate the BGP session.
It is apparent that this requirement is at odds with that of protocol
robustness.
There is significant complexity to this requirement. The mechanism
defined in [I-D.chen-ebgp-error-handling] describes a means by which
no NOTIFICATION message is generated for all cases whereby NLRI can
be extracted from an UPDATE. The NLRI contained within the erroneous
UPDATE message is considered as though the remote BGP speaker has
provided an UPDATE marking it as withdrawn. This results in a limit
in the propagation of the invalid routing information, whilst also
ensuring that no traffic is forwarded via a previously-known path
that may no longer be valid. This mechanism is referred to as
"treat-as-withdraw".
Whilst this behaviour results in avoiding a NOTIFICATION message,
keeping other routing information advertised by the remote BGP
speaker within the RIB, it may result in unreachability for a sub-set
of the NLRI advertised by the remote speaker. Two cases should be
considered - that where the entry for a route in the Adj-RIB-In of
the neighbour propagating an erroneous packet is utilised, and that
where the route installed in the device's RIB is learnt from another
BGP speaker. In the former case, should the identified NLRI not be
treated as withdrawn, the original NLRI is utilised within the global
RIB. However, this information is potentially now invalid (i.e. it
no longer provides a valid forwarding path), whilst an alternate
(valid) path may exist in another Adj-RIB-In. By continuing to
utilise the NLRI for which the UPDATE was considered invalid, traffic
may be forwarded via an invalid path, resulting in routing loops, or
black-holing. In the second case, no impact to the forwarding of
traffic, or global RIB, is incurred, yet where treat-as-withdraw is
implemented, possibly stale routing information is purged from the
Adj-RIB-In of the neighbour propagating errors.
Whilst mechanisms such as "treat-as-withdraw" are currently
documented, the proposals are limited in their scope - particularly
in terms of restrictions to implementation only on eBGP sessions.
This limitation is made based on the view that the BGP RIB must be
consistent across an autonomous system. By implementing treat-as-
withdraw for a iBGP session, one or more routers within the
Autonomous System may not have reachability to a route, and hence
blackholing of traffic, or routing loops, may occur. It should,
however, be considered if this view is valid, in light of the manner
in which BGP is utilised within operator networks. Inconsistency in
a RIB based on a single UPDATE being treated as withdrawn may cause a
inconsistency in a single sub-topology (e.g. Layer 3 VPN service),
or a service not operating completely (in the case of an UPDATE
carrying service membership information). Where a NOTIFICATION and
teardown is utilised this is destructive to all sub-topologies in all
address family identifiers (AFIs) carried by the session in question.
Even where mechanisms such as multi-session BGP are utilised, a whole
AFI is affected by such a NOTIFICATION message. In terms of routing
operation, it is therefore far less costly to endure a situation
where a limited sub-set of routing information within an AS is
invalid, than to consider all routing information as invalid based on
a single trigger.
At the time of writing, error handling mechanisms related to
optional, transitive attributes - such as
[I-D.ietf-idr-optional-transitive] are restricted to handling only a
subset of attribute errors - whereas the operational requirement is
to expand this coverage to the widest set of errors possible (i.e.,
all semantic errors within UPDATE messages). Additionally, where
approaches applicable to a greater number of attributes are proposed
(e.g., [I-D.chen-ebgp-error-handling]), these are limited to
deployment in eBGP applications only, where requirements also exist
in intra-domain cases. As such, it is envisaged that if extended to
cover these expanded cases, these mechanisms provide a means to avoid
the transmission of a NOTIFICATION message to a remote BGP speaker,
based on a single erroneous message, where at all possible, and hence
meet this requirement. Critical errors, including those whereby the
NLRI cannot be extracted from the UPDATE message, represent cases
whereby the receiving system cannot handle the error gracefully based
on this mechanism.
4. Recovering RIB Consistency
The recommendations described in Section 3 may result in the RIB for
a topology within an AS being inconsistent across the AS' internal
routers. Alternatively, where such mechanisms are deployed at an AS
boundary, interconnects between two ASes may be inconsistent with
each other. There are therefore risks of traffic blackholing, due to
missing routing information, or forwarding loops. Whilst this is
deemed an acceptable compromise in the short term, clearly, it is
suboptimal. Therefore, a requirement exists to provide mechanisms by
which a BGP speaker is able to recover the consistency of the Adj-
RIB-In for a particular neighbour.
In the general case, the consistency of the BGP RIB can be recovered
by re-requesting the entire Adj-RIB-Out of a remote BGP speaker is
re-advertised. A mechanism to achieve this re-advertisement is
defined within the ROUTE-REFRESH specification [RFC2918]. It is
envisaged that by requesting a refresh of all NLRI advertised by a
BGP speaker, any NLRI which has been withdrawn due to being contained
within an invalid UPDATE message is re-learnt. Where a ROUTE REFRESH
is used to directly perform a consistency check between the Adj-RIB-
Out of a remote device, and the Adj-RIB-In of the local BGP speaker,
a demarcation between the ROUTE-REFRESH, and normal UPDATE messages
is required (in order that an "end" of the refresh can be used to
identify any 'stale' NLRI) -
[I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which
the ROUTE-REFRESH mechanism can be extended to meet this requirement.
Whilst re-advertisement of the whole BGP RIB provides a means by
which withdrawn NLRI can be re-advertised, there are some scaling
implications that must be considered. In the case that a ROUTE-
REFRESH is generated, all NLRI must be re-packed into UPDATE messages
and advertised by one speaker on the BGP session, whilst the other
must receive all UPDATE messages, and validate the RIB's consistency.
In order to avoid the control-plane load, it is therefore a
requirement to utilise targeted mechanisms where possible, rather
than incurring the additional load on both the advertising and
receiving speaker of building and processing UPDATEs for the entire
contents of the RIB.
It is envisaged that during routing inconsistencies caused by
utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is
aware that some routing information was not able to be processed -
due to the fact that an UPDATE message was not parsed correctly.
Since this mechanism (as discussed in Section 3) requires the local
BGP speaker to have determined the set of NLRI for which an erroneous
UPDATE message was received, it is possible to use a targeted
mechanisms to re-request the specific NLRI that was contained within
the erroneous UPDATE message. By re-requesting, this provides the
remote BGP speaker an opportunity to re-transmit the NLRI - possibly
providing an opportunity to leverage alternative methods to build the
UPDATE message. Such a request requires extension to the existing
BGP-4 protocol, in terms of specific UPDATE generation filters with a
transient lifetime. It is envisaged that the work within
[I-D.zeng-idr-one-time-prefix-orf] provides a mechanism allowing
targeted elements of the Adj-RIB-In for a BGP neighbour to be
recovered.
It is of particular note for both means of recovering RIB consistency
described that these are effective only when considering transient
errors within an implementation - for instance, should an RFC
interpretation error within an implementation be present, regardless
of the number of times a specific UPDATE is generated, it is likely
that this error condition will persist (as it may with the existing
behaviour defined by [RFC4271]). For this reason, there is an
requirement to consider the means by which such consistency recovery
mechanisms are utilised. It is not advisable that a dynamic filter
and advertisement mechanism is triggered by all error handling events
due to the load this is likely to place on the neighbour receiving
such a request. Where this BGP speaker is a relatively centralised
device - a route reflector (as described by [RFC4456]) for example -
the act of generation of UPDATE messages with such frequency is
likely to cause disproportionate load. It is therefore an
operational requirement of such mechanisms that means of request
dampening be required by any such extension.
In cases whereby the consistency of the Adj-RIB-In is to be restored
(e.g., following the 'treat-as-withdraw' behaviour described in
Section 3), and mechanisms such as those described herein are
triggered, such a condition should be noted to an operator by means
of a specific flag, SNMP trap, or other logging mechanism. In order
to identify the subset of NLRI that are considered to be
inconsistent, this information is of operational benefit and hence
should be logged.
5. Reducing the Impact of Session Reset
Even where protocol enhancements allow errors in the BGP-4 protocol
to cease to trigger NOTIFICATION messages, and hence reset a BGP
session, it is clear that some error conditions may not be exited.
In particular, errors due to existing state, or memory structures,
associated with a specific BGP session will not be handled. It is
therefore important to consider how these error conditions are
currently handled by the protocol. It should be noted that the
following discussion and analysis considers only those NOTIFICATION
messages generated in response to errors in UPDATE messages (as
defined by Section 6.3 in [RFC4271]).
The existing NOTIFICATION behaviour triggers a reset of all elements
of the BGP-4 session, as described in Section 6 of [RFC4271]. It is
expected that session teardown requires an implementation to re-
initialise all structures and state required for session maintenance.
Clearly, there is some utility to this requirement, as error
conditions in BGP are, in general, exited from. However, this
definition is responsible for the forwarding outages within networks
utilising BGP for propagation of routing or service when each error
is experienced. The requirement described in Section 3 is intended
to reduce the cases whereby a NOTIFICATION is required, however, any
mechanism implemented as a response to this requirement by definition
cannot provide a session reset to the extent of that achieved by the
current behaviour.
In order to address this, there is a requirement for a means by which
a BGP speaker can signal that an unhandled error condition in an
UPDATE message occurred - requiring a session reset - yet also
continue to utilise the paths advertised by the neighbour that are
currently in use within the RIB. In this case, the Adj-RIB-In
received from the neighbour is not considered invalid, despite a
NOTIFICATION, and session reset, being required. This set of
requirements is akin to those answered by the BGP Graceful Restart
mechanism described in [RFC4724]. Since the operational requirement
in this case is to provide a means to achieve a complete session
restart without disrupting the forwarding path of those routes in use
within a BGP speaker's RIB, it is expected that utilising a procedure
similar to the Graceful Restart mechanism meets the error handling
requirement. By responding to an error condition (repeated or
otherwise) with a message indicating that an error that cannot be
handled has occurred, forcing session reset, whilst retaining
forwarding information within the RIB allows forwarding to all routes
within a system's RIB to continue during the period in which the
session restarts. It is envisaged that the additional complexity
introduced by the introduction of such a mechanism can be limited by
extending existing BGP messages - one such approach is proposed in
[I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the
restart lifetime, should an error condition not be transient - for
example, should an error have occurred with the BGP process, rather
than a specific of the BGP session - the remote BGP speaker is still
detected as an invalid device for forwarding.
In some cases, the erroneous condition may be due to corruption of
the Adj-RIB-Out on the advertising BGP speaker - rather than caused
by the receiving speaker's state. In these cases, where existing
structures are replayed whilst performing graceful restart
functionality, the error condition is not necessarily resolved.
Therefore, it is recommended that during a session restart event, as
described within this section, the advertising speaker purge and
rebuild RIB structures, in order to resolve any corruption within
these structures.
It should be noted that a protocol enhancement meeting this o Those messages where the NEXT_HOP, the MP_REACH_NLRI next-hop
requirement is not able to solve all error conditions - however, a values are missing, zero-length, or invalid for the relevant
complete restart of the BGP and TCP session between two BGP speakers address family.
implements an identical recovery mechanism to that which is achieved
by the existing behaviour. Where an error condition such as memory
or configuration corruption has occurred in a BGP implementation, it
is expected that a mechanism meeting this requirement continues to
detect this, by means of a bound on time for session restart to
occur. Whilst there may be some consideration that packets continue
to be forwarded through a device which can be in an failure mode of
this nature for a longer period due to this requirement, the
architecture of modern IP routers should be considered. A divided
forwarding and control plane is common in many devices, as well as
process separation for software-based devices - corruption of a
specific protocol daemon does not necessarily imply forwarding is
affected. Indeed, where forwarding behaviour of a device is
affected, it is envisaged that a failure detection mechanism (be it
Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets)
will detect such a failure in almost all cases, with the symptomatic
behaviour of such a failure being an invalid UPDATE message in very
few other cases.
6. Operational Toolset for Monitoring BGP For these Non-Critical errors, the NLRI-targeted error handling
requirements described in Section 4 should be followed.
A significant complexity that is introduced through the requirements In order to maximise the number of cases whereby the NLRI attributes
defined in this document is that of monitoring BGP session status for can be reliably extracted from a received message, where a BGP
an operator. Although the existing error handling behaviour causes a speaker supports multi-protocol extensions, the MP_REACH_NLRI and
disproportionate failure, session failure is extremely visible to MP_UNREACH_NLRI attributes SHOULD be utilised for all address
most operational personnel within a Network Operator due to both families (including IPv4 Unicast) and these attributes should be the
existing definitions of SNMP trap mechanisms for BGP, along with the first attribute contained within the UPDATE message.
forwarding impact typically caused by such a failure. By introducing
mechanisms by which errors of this nature are not as visible, this is
no longer the case. There is a requirement that where subsets of the
RIB on a device are no longer reachable from a BGP speaker, or indeed
an AS, that some visibility of this situation, alongside a mechanism
to determine the cause is available to an operator. Whilst, to some
extent, this can be solved by mandating a sub-requirement of each of
the aforementioned requirements that a BGP speaker must log where
such errors occur, and are hence handled, this does not solve all
cases. In order to clarify this requirement, the example of the
transmission of an erroneous Optional Transitive attribute can be
considered. Since, by definition, there is no requirement for all
BGP speakers to parse such an attribute, a receiving router may treat
NLRI as withdrawn based on an erroneous attribute not examined by its
neighbour. In this case, the upstream device or network, propagating
the UPDATE, has no visibility of this error. Operationally, however,
it is of interest to the upstream router operator that such invalid
information was propagated.
The requirement for logging of error conditions in transmitted BGP Where attributes are introduced by future extensions to the BGP
messages, which are visible to only the receiver, cannot be achieved protocol the error handling behaviour applied MUST be assumed that
by any existing BGP message, or capability. It is envisaged that applied to Non-Critical errors, unless otherwise specified within the
each erroneous event should be transmitted to the remote peer - per-extension memo, or the attribute relates directly to carrying
including the information as to the set of NLRI that were considered NLRI. Authors of future BGP extensions SHOULD specify the error
invalid. Whilst with some mechanisms this is achieved by default handling behaviour required for new attributes in terms of the
(for example, One-Time Prefix ORF [I-D.zeng-idr-one-time-prefix-orf] classification into a Critical or Non-Critical error on a per-
(Outbound Route Filtering) will transmit the set of routes that are attribute error basis.
required), the operator requirement is to know which routes may have
been unreachable in all cases. It is envisaged that an extension to
meet this requirement will allow for such information to be
transmitted between peers, and hence logged. Such a mechanism may
provide further utility as a either a diagnostic, or logging toolset.
As such, it is possible to divide the messages that are required in 4. Error Handling for Non-Critical Errors
order to provide further visibility into BGP for an operator. Such a
division can be made both due to the required means of message
transmission, alongside the criticality of each request.
o Messages required to replace NOTIFICATION - In cases where the 4.1. NLRI-level Error Handling Requirements
error handling mechanisms defined by [RFC4271] currently result in
a NOTIFICATION message being generated, a number of the
requirements detailed within this document result this message
being suppressed. Despite this change, the error condition's
occurrence is still of interest to an operator in order to provide
both monitoring and troubleshooting capabilities, since some form
of invalid data has been received on a session. It therefore
considered that an implementation must generate a message both
locally, and transmitted to the remote peer, based on the such a
condition. Where such a message is transmitted to the remote
peer, it is considered that the BGP session via which the
erroneous UPDATE message was received should be used as transport
to the remote peer. The information transmitted in such a message
should be minimised to allow identification of the paths which
were considered erroneous (i.e. restricting the information to
that which is directly relevant to a network operator in the case
of an error condition occurring). Any delay to convergence on the
session in question is considered to be acceptable, given the
suboptimal nature of the reception of invalid routing information
via a BGP session. Further concerns regarding such a mechanism
relate to the load generated on the BGP speaker in question,
however, it must be considered that in the case of an erroneous
UPDATE being received, and the 'treat-as-withdraw' mechanism being
utilised, where the erroneous path is removed from the Loc-RIB,
there is likely to be a requirement to generate UPDATE messages
withdrawing the route from all further BGP speakers to which the
prefix is advertised. The load generated by the generation of
such UPDATEs is likely to be much greater than that of
transmitting error information via a logging message type back to
the speaker from which it was received. It is envisaged that
light-weight BGP message-based signalling mechanisms such as the
ADVISORY message types detailed in
[I-D.ietf-idr-operational-message] provide a suitable means to
satisfy this requirement.
o Additional Diagnostic Capabilities for BGP - In a number of cases, When a Non-Critical error is detected within an UPDATE message a BGP
there is an operational requirement to further debug erroneous BGP speaker MUST NOT send a NOTIFICATION message to the remote neighbour.
UPDATE messages, along with the particulars of the state of a BGP Instead, the NLRI contained within the message MUST be considered as
speaker. For instance, where an invalid BGP UPDATE message is no longer viable until they are updated by a subsequent UPDATE
transmitted between two BGP speakers, the exact format of the message, thus treating the NLRI as withdrawn as per the treat-as-
UPDATE message is of interest to an operator, as this information withdraw mechanism described in [I-D.chen-ebgp-error-handling].
provides a clear indication of an message considered to be
erroneous by the BGP speaker to which it was transmitted. In this
case, it is considered of great utility that the entire UPDATE
message is transmitted back to the advertising speaker, in order
to allow for further debugging to occur. Whilst such information
is particularly useful to an operator, it clearly provides
information that is not key to protocol operation - for this
reason, it is expected that some of the concerns regarding the
additional complexity, and load that a BGP speaker is subjected to
is not acceptable. For this reason, it is required that where
mechanisms are developed to support this requirement, messages of
this nature can be supported both within an existing BGP session,
and via a dedicated separate session, be it BGP carrying messages
such as those defined in [I-D.ietf-idr-operational-message] or a
dedicated monitoring protocol akin to BMP described in
[I-D.ietf-grow-bmp].
Whilst the operational requirement for such monitoring tools to allow Network operators SHOULD recognise that where such behaviour is
for visibility into BGP is clearly agreed upon, the means by which implemented black-holing or looping of traffic may occur in the
such messages are transmitted between two BGP speakers is likely to period between the NLRI being treated as withdrawn, and subsequent
be dependent upon both the positions of the speakers in question (for updates, dependent upon the routing topology. It SHOULD be noted
instances, the requirements for such a protocol may differ where a that such periods of RIB inconsistency (where one speaker has
session is between two ASBRs under separate administration). The advertised a prefix, which has been treated as withdrawn by the
introduction of additional message types to the BGP protocol clearly receiving speaker) may be relatively long lived, based on situations
introduces further complexity - and leaves room for further such as an erroneous implementation at the receiver, or the error
implementation and standardisation errors that may compromise the occurring within an optional, transitive attribute not examined by
robustness of the BGP protocol. In addition, the queuing and the advertising device. In order to allow operators to select
scheduling of these BGP messages must be interleaved with the sessions on which this risk of inconsistency is acceptable, an
transmission of the key protocol messages - such as KEEPALIVE and implementation SHOULD provide means by which NLRI-level error
UPDATE packets. It is therefore a concern that should a large number handling for Non-Critical errors can be disabled on a per-session
of messages specifically for operational visibility be transmitted, basis.
this will delay the transmission of UPDATE packets, and hence
adversely affect the end-to-end convergence time for NLRI carried
within BGP. The operational requirement for why messages are
advantageous to be in-band to a protocol should also be considered.
In particular, it should be noted that where such information is to
be transmitted between administrative boundaries a BGP session
represents an existing channel between the two ASes. This channel is
considered to be secure insofar as the routing information, and
requests sent via the session are considered to come from a trusted
source. Since error information relates to both a particular
attachment, and is key to ensuring that such a session is operating
as expected, it is considered of great operational benefit that this
information is transmitted over this channel. In addition, the
overall system scalability is improved by such in-band transmission.
It is expected that erroneous information resulting in the 'treat-as-
withdraw' mechanism being utilised is relatively infrequently
transmitted between two peers (when compared to the frequency of
UPDATE messages transmission). The impact of including an additional
BGP message type for such operational visibility is relatively small
from a resource utilisation perspective - additional processing
overhead is only experienced when such a message is received. Where
a separate session is maintained, particular network elements within
a service provider topology may require hundreds, or thousands, of
additional sessions for the transmission of this information. Such
an resource consumption overhead is likely to be unacceptable to some
network operators.
For the reasons explained above, it is expected that mechanisms Since the Non-Critical error handling required within this section
specified to meet the requirements for event visibility consider the results in no NOTIFICATION message being transmitted, the fact that
relative impacts of additional monitoring sessions, or message an error has occurred and hence there may be inconsistency between
inclusion in band to BGP in order not to compromise the security, the local and remote BGP speaker MUST be flagged to the network
scalability and robustness of the BGP-4 protocol. operator through standard operational interfaces (e.g., SNMP,
syslog). The information highlighted MUST include the NLRI
identified to be contained within the error message, and SHOULD
contain a exact copy of the received message for further analysis.
7. Operational Complexities Introduced by Altering RFC4271 In order that the operator of the BGP speaker from whom an erroneous
UPDATE message has been advertised is aware of the fact that some
NLRI advertised to the remote speaker have been considered withdrawn
due to being contained within an erroneous UPDATE, a BGP speaker
SHOULD support mechanisms to report the occurrence of Non-Critical
error handling to the remote speaker. The receiving speaker SHOULD
transmit the NLRI contained within the erroneous message to the
advertising speaker. An exact copy of the received UPDATE message
SHOULD also be sent.
The existing NOTIFICATION and subsequent teardown of a BGP session The exchange of information related to events occurring as a result
upon encountering an error has the advantage that a consistent of BGP messages is not currently supported by any extension to the
approach to error handling is required of all implementations of the protocol. Clearly, where the two speakers reside within the same
BGP-4 protocol. This is of operational advantage as it provides a administrative domain, shared logging infrastructure can be utilised
clear expectation of the behaviour of the protocol. The requirements to identify the root cause of errors, however, in many cases
defined herein add further complexity to the error-handling within neighbouring BGP speakers reside within separate administrative
BGP, and hence are liable to compromise the existing deterministic domains (e.g., are ASBRs for Internet or private networks). In this
protocol behaviour. It is therefore deemed that there is a further case, mechanisms allowing transmission in-band to the BGP session
requirement to define a set of recommended behaviours based on the SHOULD be utilised (e.g., the OPERATIONAL message described in
reception of a particular class of erroneous UPDATE message, [I-D.ietf-idr-operational-message]). Such an in-band channel is
alongside highlighting some of the implementation complexities that preferred based on the BGP session representing a pre-established
may need to be handled in the case that particular recommendations trusted channel which is related to a specific BGP-speaking device
made within this memo are deployed. within a network. It is expected that the overall system scalability
of a BGP speaker is improved through utilising the existing channel,
rather than incurring overhead for maintaining many additional
logging-specific protocol sessions for relatively infrequent
messaging events when errors occur. However, the extensions
providing such a channel MUST consider their impact to base BGP
protocol functions such as the transmission of UPDATE or KEEPALIVE
messages, and SHOULD limit the volume of messaging to direct
reactions to Non-Critical errors occurring. These considerations
SHOULD be made in order to ensure that no compromise is made to the
security, scalability and robustness of BGP. Where additional BGP
monitoring information that is not suitable to be carried in-band is
required, out-of-band mechanisms such as the BMP protocol described
in [I-D.ietf-grow-bmp] could be utilised to provide further
information relating to erroneous messages.
Utilising the classes of erroneous UPDATE message described in 4.2. Recovering RIB Consistency following NLRI-level Error Handling
Section 2, the recommended behaviour for a BGP-4 implementation can
be divided into two branches. Primarily, where a semantic error is
identified, an implementation is expected to utilise the reduced-
impact error handling approach, as described in Section 3. In the
case that such an approach results in known NLRI being withdrawn from
the BGP speaker's RIB, and an implementation provides functionality
such that these errors are recovered from through an automatically
triggered means, such as those described within Section 4, some
consideration of the scalability of these recovery mechanisms is
required. Clearly, there is an computational and bandwidth overhead
associated with the re-advertisement of NLRI between two BGP speakers
- both due to the generation of UPDATE messages, their transmission
between the two speakers, and the parsing and processing into the RIB
required. This overhead is directly proportional to the number of
UPDATE messages that are required. Where a semantic error is
experienced, by definition the NLRI contained within the UPDATE can
be extracted. It is therefore possible to minimise the proportion of
the RIB that is re-advertised by targeting any recovery mechanism on
the NLRI contained within the erroneous UPDATE. Such a targeted
mechanism can be achieved through a means such as One-Time ORF, or
other means of targeting UPDATE messages not discussed within this
memo. It is recommended that where available, any automatic (or
manual) triggered recovery mechanism behaviour utilises such targeted
means in preference to any whole RIB refresh mechanism (such as
ROUTE-REFRESH).
In the case that an erroneous UPDATE has been processed through a Following NLRI being treated as withdrawn due to Non-Critical error
means such as treat-as-withdraw (described within Section 3), a handling, inconsistencies exist between the Adj-RIB-Out of the
recovering mechanism may be considered superfluous, if the assumption advertising BGP speaker, and the Adj-RIB-In of the receiving device.
is made that the RIB inconsistency will only be recovered from based These inconsistencies may result in forwarding loops or blackholing
on a path re-convergence (or change in BGP attribute) for the of traffic in some routing topologies. In order to ensure that such
advertising BGP speaker. However, where this assumption is not cases can be recovered from a means by which a validation and
considered to provide adequate recovery behaviour, and a mechanism to recovery of consistency can be achieved SHOULD be provided to an
restore RIB consistency automatically is implemented, some operator. This function may be provided through enhancing the ROUTE-
consideration must be made for where repeated erroneous messages REFRESH [RFC2918] mechanism to add means to identify the beginning
occur. In this case, in order to limit the impact to the BGP and end of a replay of the entire Adj-RIB-Out of the advertising
speaker's network operation, at a pre-defined point it is recommended speaker (as per the suggestion in
that such automatic recovery mechanisms towards the BGP speaker from [I-D.ietf-idr-bgp-enhanced-route-refresh]).
which erroneous UPDATEs are repeatedly received are suppressed, and
the fact that such suppression has occurred is highlighted to an
operator. The point at which such behaviour is suppressed is to be
defined on a per-implementation basis, taking into account feedback
from the Network Operator community based on the deployment of the
recommendations described in this document. It is expected that such
trigger points are dependent upon the mechanisms implemented for a
particular BGP-4 implementations, and the impact upon the speaker of
these means of RIB recovery.
Where critical errors are experienced, such that a session reset is As Non-Critical error handling is localised to the NLRI contained
required, the mechanism discussed in Section 5 should be used. within the erroneous UPDATE message, a targeted recovery mechanism
Again, since such a mechanism results in a restart of a BGP session, MAY be provided allowing a speaker to request re-advertisement of a
it expected that all NLRI carried over the session is re-advertised particular subset of the Adj-RIB-Out. Where such targeted refresh
as it is re-established, incurring processing overhead on both the functions are available, they SHOULD be preferred to mechanisms
advertising and receiving BGP speaker. In order to minimise the requesting re-advertisement of the whole Adj-RIB-Out based on their
consumption of control-plane computational resource on both speakers, more limited use of CPU and network resources.
it is recommended that mechanisms allowing a reduced set of BGP
UPDATE messages to be re-transmitted between two speakers are
employed wherever possible - for instance through employing
mechanisms such as those described in [I-D.ietf-idr-enhanced-gr].
In the case that repeated critical errors occur, the overhead of A BGP speaker may automatically trigger recovery mechanisms such as
performing any mechanism implemented based on the requirements in those described in this section following the receipt of an erroneous
Section 5 is incurred following each erroneous UPDATE message. Since UPDATE message identified as Non-Critical to expedite recovery. It
these mechanisms are, by definition, performed automatically in should be noted that if automatic recovery mechanisms trigger only
response to the erroneous message being received similar re-advertisement of an identical erroneous message, they are likely
considerations as to the impact to the BGP speaker must be taken into to be ineffective. Additionally, where the best-path to be
account. As such, it is expected that after a certain trigger level, advertised by remote speaker changes, this will be advertised
the ongoing receipt of critical errors within BGP UPDATE messages is directly, without a requirement for a request from the receiver.
deemed to be indicative of a long-lasting failure, and a session no However, in some cases, RIB consistency recovery mechanisms may
longer considered viable. Where such an case is experienced, it is prompt alternate UPDATE message packing, and hence allow quicker
expected that the BGP session reverts to the standard session failure recovery. Where such mechanisms are implemented, mechanisms focused
behaviour, as described in [RFC4271] and documents updating this base to smaller sets of NLRI SHOULD be preferred over those requesting the
standard. Where such a reversion is implemented this condition entire RIB. In addition, such mechanisms SHOULD have dampening
should be flagged to an network operator. The number of restart mechanisms to ensure that their impact to computational and network
attempts before the session reverts to being shut down should be resources is limited.
determined based on the overhead of the recovery mechanisms
implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is
implemented, the impact of session restart may be significantly
lower), and operational experience of the deployment of the
recommendations described in this document.
Since repeated erroneous UPDATE messages which experience critical 5. Error Handling for Critical Errors
errors may be indicative of long-lasting failure modes, it is
recommended that a back-off from restarting BGP sessions experiencing
such behaviour is implemented. As such, this is not applicable to
restart behaviour through means such as those described in Section 5
since such restarts are time-bound based on the period for which the
Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when
considering BGP Graceful Restart, such restarts are time-bound by the
Restart Time described in [RFC4724]). However, following a session
reverting to being pulled down based on repeated error conditions, it
is recommended that following restart attempts are subject to an
exponentially increasing interval between subsequent attempts. It is
therefore recommended that in such cases an implementation implements
the increasing values of IdleHoldTimer as described in the BGP-4 FSM
documented in [RFC4271].
7.1. Reducing the Network Impact of Session Teardown Where an UPDATE message containing a Critical error is received,
since the NLRI cannot be extracted, error handling mechanisms must be
applied at the per-session level. In order to limit the impact to
network operation, these session-level mechanisms MUST be applied in
a manner which allows the paths NLRI received from the remote speaker
to continue to be utilised for forwarding during the session reset
and re-establishment. It is envisaged that this requirement may be
met through extension of the BGP Graceful Restart mechanism
([RFC4724]) to be triggered by NOTIFICATION messages indicating the
occurrence of a Critical error. Such an extension allows a restart
of the TCP and BGP sessions between two speakers, in a similar manner
to the current session restart behaviour triggered by a NOTIFICATION
message. In order to maximise the level of re-initialisation which
occurs during such a restart triggered by a Critical error, BGP
speakers MAY re-initialise memory structures related to the
Adj-RIB-In and Adj-RIB-Out associated with the session on which the
erroneous UPDATE was observed.
As discussed within the preceding section, where repeated critical Where such a restart event occurs, the continued liveliness of the
UPDATE message errors are received, it is recommended that the impact remote device MAY be verified by BGP KEEPALIVE packets or other OAM
to the both advertising and receiving BGP-4 speakers be limited by functions such as Bidirectional Forwarding Detection ([RFC5880]). In
reverting to tearing the BGP-4 session experiencing such errors down. cases where the observed Critical BGP error is indicative of a wider
The BGP-4 specification presented in [RFC4271] achieves such a device failure of the remote speaker, it is expected that a BGP
session shutdown by sending a NOTIFICATION message, however, this has sessions will not re-establish correctly. Each BGP speaker SHOULD
the net result that all downstream BGP speakers (i.e. those to whom maintain a limited time window in which session restart is expected
the routes carried over the now ceased BGP session was readvertised) in order to mitigate this possibility.
must withdraw this route from their RIB, and perform a best-path
selection if required. In some cases, there may be no alternate path
available, and hence a period of time for which no valid BGP route
exists. Particularly, this is very likely to occur where an upstream
BGP speaker performs a best-path selection and advertises only a
single path to its neighbours - there is a requirement for the
upstream speaker to perform a best-path selection, and re-advertise a
new set of NLRI before the downstream system is able to converge to a
new path. It should be noted that where UPDATE messages withdrawing
NLRI are not subject to the BGP session's configured
MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements
are, this may result in a BGP speaker being without a path for a
period up to the MRAI.
Clearly, it is advantageous to avoid this period of time for which When a Critical error occurs, the network operator MUST be made aware
there may be no reachability for a set of routes, especially since of its occurrence through local logging mechanisms (e.g., SNMP traps
the BGP speaker terminating a particular session is doing so due to a or syslog). The BGP speaker receiving an UPDATE message identified
particular error handling policy. The graceful shutdown mechanism as a Critical error MUST log its occurrence and a copy of the UPDATE
detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a message. Where a inter-device messaging mechanism is implemented (as
BGP speaker is able to signal that a set of routes are to be discussed in Section Section 4.1) a copy of the erroneous UPDATE
withdrawn, and hence allow downstream systems to pre-emptively message SHOULD be transmitted to the remote speaker. Both BGP
perform a best-path selection, and hence advertise new reachability speakers MUST indicate to an operator the cause of a session restart
information in a make-before-break manner. was a Critical error in an UPDATE message.
It is therefore envisaged, that where a session is to be shutdown, Since repeated critical errors (and session restarts) may have an
based on a trigger relating to erroneous UPDATE messages being impact in overall device scaling if the failure condition is not
received (be they repeated or not) that the graceful shutdown resolved by session restart, a BGP speaker MAY choose to revert to
procedure in utilised, so as to reduce the forwarding impact of the session tear down behaviour described in the base BGP
routes received on the session being withdrawn. specification. This reversion SHOULD only be utilised after a number
of attempts which SHOULD be controllable by the network operator.
Where a session is shut down, the implementation MAY utilise a back-
off from session restart attempts (as per the IdleHoldTimer described
in the BGP FSM [RFC4271]). Where reversion to tearing down the BGP
session is performed, a speaker SHOULD limit the impact of
withdrawing prefixes from downstream speakers where possible. It is
envisaged that this can be achieved by utilising a mechanism such as
the BGP Graceful Shutdown procedure as described in
[I-D.ietf-grow-bgp-gshut].
8. IANA Considerations 6. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
9. Security Considerations 7. Security Considerations
The requirements outlined in this document provide mechanisms by The requirements outlined in this document provide mechanisms which
which erroneous BGP messages may be responded to with limited impact limit the overall impact of the response to an error in a BGP UPDATE
to forwarding operation. This is of benefit to the security of a BGP message. This is of benefit to the security of a BGP speaker.
speaker in general. Where UPDATE messages may have been propagated Without these mechanisms, where erroneous UPDATE messages relating to
by a single malicious Autonomous System or router within a network a single NLRI entry can be propagated to a BGP speaker, all other
(or the Internet default free zone - DFZ), which are then propagated NLRI carried via the same session are affected by the resulting
to all devices within the same routing domain, all other NLRI session tear-down. This may result in an AS being isolated from
available over the same session become unreachable. This mechanism particular routing domains (such as the Internet) should an UPDATE
may provide means by which an Autonomous System can be isolated from message be propagated via targeted specific paths. It is envisaged
required routing domains (such as the Internet), should the relevant by reducing the impact of the reaction of the receiving speaker to
UPDATE messages be propagated via specific paths. By reducing the these messages, the isolation can be constrained to specific sets of
impact of such failures, it is envisaged that this possibility may be NLRI, or a specific topology.
constrained to a specific set of NLRI, or a specific topology.
Some mechanisms meeting the requirements specified in this document, A number of the mechanisms meeting the requirements specified within
particularly those within Section 6 may provide further security the document (particularly those relating to operational monitoring)
concerns, however, it is envisaged that these are addressed in per- may raise further security concerns. Such concerns will be addressed
enhancement memos. during the specification of these mechanisms.
10. Acknowledgements 8. Acknowledgements
The author would like to thank the following network operators for The author would like to thank the following network operators for
their insight, and valuable input in defining the requirements for a their insight, and valuable input into defining the requirements for
variety of operational deployments of the BGP-4 protocol; Shane a variety of deployments of the BGP protocol: Shane Amante, Bruno
Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom Decraene, Rob Evans, David Freedman, Wes George, Tom Hodgson, Sven
Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and
Scholl and Ilya Varlashkin. Ilya Varlashkin.
In addition, many thanks are extended to Jeff Haas, Wim Hendrickx, In addition, many thanks are extended to Jeff Haas, Wim Hendrickx,
Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert
Raszuk for their expertise relating to implementations of the BGP-4 Raszuk for their expertise relating to implementations of the BGP
protocol. protocol.
11. References 9. References
11.1. Normative References 9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, [RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
"Multiprotocol Extensions for BGP-4", RFC 2858, June 2000. "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
September 2000. September 2000.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006. Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006. Networks (VPNs)", RFC 4364, February 2006.
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
Reflection: An Alternative to Full Mesh Internal BGP
(IBGP)", RFC 4456, April 2006.
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
January 2007. January 2007.
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
"Multiprotocol Extensions for BGP-4", RFC 4760, (VPLS) Using BGP for Auto-Discovery and Signaling",
January 2007. RFC 4761, January 2007.
11.2. Informational References [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, June 2010.
9.2. Informational References
[I-D.chen-ebgp-error-handling] [I-D.chen-ebgp-error-handling]
Chen, E., Mohapatra, P., and K. Patel, "Revised Error Chen, E., Mohapatra, P., and K. Patel, "Revised Error
Handling for BGP Updates from External Neighbors", Handling for BGP Updates from External Neighbors",
draft-chen-ebgp-error-handling-01 (work in progress), draft-chen-ebgp-error-handling-01 (work in progress),
September 2011. September 2011.
[I-D.ietf-grow-bgp-gshut] [I-D.ietf-grow-bgp-gshut]
Francois, P., Decraene, B., Pelsser, C., Patel, K., and C. Francois, P., Decraene, B., Pelsser, C., Patel, K., and C.
Filsfils, "Graceful BGP session shutdown", Filsfils, "Graceful BGP session shutdown",
draft-ietf-grow-bgp-gshut-03 (work in progress), draft-ietf-grow-bgp-gshut-04 (work in progress),
December 2011. October 2012.
[I-D.ietf-grow-bmp] [I-D.ietf-grow-bmp]
Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
Protocol", draft-ietf-grow-bmp-06 (work in progress), Protocol", draft-ietf-grow-bmp-07 (work in progress),
December 2011. October 2012.
[I-D.ietf-idr-bgp-enhanced-route-refresh] [I-D.ietf-idr-bgp-enhanced-route-refresh]
Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
Route Refresh Capability for BGP-4", Route Refresh Capability for BGP-4",
draft-ietf-idr-bgp-enhanced-route-refresh-02 (work in draft-ietf-idr-bgp-enhanced-route-refresh-03 (work in
progress), June 2012. progress), December 2012.
[I-D.ietf-idr-bgp-gr-notification]
Patel, K., Fernando, R., and J. Scudder, "Notification
Message support for BGP Graceful Restart",
draft-ietf-idr-bgp-gr-notification-00 (work in progress),
December 2011.
[I-D.ietf-idr-enhanced-gr]
Patel, K., Chen, E., Fernando, R., and J. Scudder,
"Accelerated Routing Convergence for BGP Graceful
Restart", draft-ietf-idr-enhanced-gr-01 (work in
progress), June 2012.
[I-D.ietf-idr-operational-message] [I-D.ietf-idr-operational-message]
Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL
Message", draft-ietf-idr-operational-message-00 (work in Message", draft-ietf-idr-operational-message-00 (work in
progress), March 2012. progress), March 2012.
[I-D.ietf-idr-optional-transitive]
Scudder, J., Chen, E., Mohapatra, P., and K. Patel,
"Revised Error Handling for BGP UPDATE Messages",
draft-ietf-idr-optional-transitive-04 (work in progress),
October 2011.
[I-D.zeng-idr-one-time-prefix-orf]
Zeng, Q., Dong, J., Heitz, J., Patel, K., Shakir, R., and
Z. Huang, "One-time Address-Prefix Based Outbound Route
Filter for BGP-4", draft-zeng-idr-one-time-prefix-orf-02
(work in progress), July 2012.
[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
June 2010.
Author's Address Author's Address
Rob Shakir Rob Shakir
BT BT
pp C3L pp C3L, BT Centre
BT Centre
81, Newgate Street 81, Newgate Street
London EC1A 7AJ London EC1A 7AJ
UK UK
Email: rob.shakir@bt.com Email: rob.shakir@bt.com
URI: http://www.bt.com/ URI: http://www.bt.com/
 End of changes. 66 change blocks. 
926 lines changed or deleted 373 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/