Internet Engineering Task Force                                R. Shakir
Internet-Draft                                                        BT
Intended status: Informational                             July 30, 2012
Expires: January 31, 2013

Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
draft-ietf-grow-ops-reqs-for-bgp-error-handling-05

Abstract

BGP-4 is utilised as a key intra- and inter-Autonomous System routing protocol in modern IP networks. The failure modes as defined by the original protocol standards are based on a number of assumptions around the impact of session failure. Numerous incidents both in the global Internet routing table and within Service Provider networks have been caused by strict handling of a single invalid UPDATE message causing large-scale failures in one or more Autonomous Systems.

This memo describes the current use of BGP-4 within Service Provider networks, and outlines a set of requirements for further work to enhance the mechanisms available to a BGP-4 implementation when erroneous data is detected. Whilst this document does not provide specification of any standard, it is intended as an overview of a set of enhancements to BGP-4 to improve the protocol's robustness to suit its current deployment.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 31, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. Role of BGP-4 in Service Provider Networks
  1.2. Overview of Operator Requirements for BGP-4 Error Handling
2. Errors within BGP-4 UPDATE Messages
  2.1. Classifying BGP Errors and Expected Error Handling
    2.1.1. Critical BGP Errors
    2.1.2. Semantic BGP Errors
3. Avoiding use of NOTIFICATION
4. Recovering RIB Consistency
5. Reducing the Impact of Session Reset
6. Operational Toolset for Monitoring BGP
7. Operational Complexities Introduced by Altering RFC4271
  7.1. Reducing the Network Impact of Session Teardown
8. IANA Considerations
9. Security Considerations
10. Acknowledgements
11. References
  11.1. Normative References
  11.2. Informational References
Author's Address

1. Introduction

Where BGP-4 [RFC4271] is deployed within both the Internet and Service Provider networks, numerous incidents have been recorded due to the manner in which [RFC4271] specifies that errors in routing information should be handled. Whilst the error handling behaviour defined in the existing standards retains utility, the deployments of the protocol have changed within modern networks, resulting in significantly different demands for protocol robustness. Whilst a number of Internet Drafts have been written to begin to enhance the behaviour of BGP-4 in terms of the handling of erroneous messages, this memo intends to define a set of requirements for ongoing work. These requirements are considered from the perspective of a Network Operator, and hence this draft does not intend to define the protocol mechanisms by which such error handling behaviour is to be implemented.

1.1. Role of BGP-4 in Service Provider Networks

BGP was designed as an inter-Autonomous System (AS) routing protocol and hence many of the error handling mechanisms within the protocol specification are designed in order to be conducive to this role. In general, this consideration as an inter-AS routing propagation mechanism results in the view that a BGP session propagates a relatively small amount of network-layer reachability information (NLRI) between two ASes. In this case, session resilience is expected for those adjacencies that are key to routing continuity (for example, it is expected that two networks peering via BGP would connect multiple times in order to safeguard against equipment or protocol failure). In addition, there is some expectation of multiple paths to a particular NLRI being available - it would be expected that a network can fall back to utilising alternate, less direct, paths where a failure of a more direct path occurs.

Traditional network architectures would deploy an Interior Gateway Protocol (IGP) to carry infrastructure and customer routes, with an Exterior Gateway Protocol (EGP) such as BGP being utilised to propagate these routes to other Autonomous Systems.
However, with the growth of IP-based services, this is no longer considered best practice. In order to ensure that route convergence within an AS is within acceptable time bounds, the amount of routing information carried within the IGP is significantly reduced - and tends to be only infrastructure routes. iBGP is then utilised to propagate both customer and external routes within an AS. As such, BGP has become an IGP, with traditional IGPs acting as a means by which to propagate the routing information which is required to establish a BGP session, and reach the egress node within the local routing domain.

This change in role presents different requirements for the robustness of BGP as a routing protocol - with the expectation of a similar level of robustness to that of an IGP being set.

Along with this change in role, the nature of the IP routing information that is carried has also changed. BGP has become a ubiquitous means by which service information can be propagated between devices. For instance, BGP is utilised to carry routing information for IP/MPLS VPN services as described in [RFC4364].
Since there is an existing deployment of the protocol between PE devices in numerous networks, it has been adapted to propagate this routing information, as its use limits the number of routing protocols required on each device. This additional information being propagated represents a large change in requirement for the error handling of the protocol - where session failure occurs, it is likely that a complete service outage is experienced by at least a subset of a network's customers, even where the erroneous packet occurred within a different sub-topology or even a different service (a different address family, for example). For this reason, there is a significant demand to avoid service-affecting failures that may be triggered by routing information within a single sub-topology or service.

The combination of the increased number of deployments of BGP-4 as an intra-AS routing protocol, its use for the propagation of additional types of routing and service information, and the growth of IP services has resulted in a substantial increase in the volume of routing information carried within BGP-4. In numerous networks, RIB sizes of the order of millions of entries exist within individual BGP speakers, with particularly high-scale points exhibited at BGP speakers performing aggregation functionality designed to improve utilisation of network resources (e.g., route reflector hierarchies).

Clearly, an increase in the amount of routing information carried in BGP results in greater impact to services during failures, which is only amplified by a corresponding increase in recovery times. Following a failure, there is a substantial time to learn, compute and distribute new paths, which results in a greater observed impact to the services affected, and hence adds further weight to the requirement to avoid failures altogether or, at least, mitigate their impact to the narrowest scope possible (e.g., a specific NLRI).

Whilst an argument could be made that the convergence time of BGP-4 could potentially be reduced through deployment of additional computational resource, it is notable that this solution is not necessarily straightforward from an implementation or deployment perspective (e.g., scaling computation resources within a single address family is difficult). Thus, significant challenges continue to exist for operators when scaling BGP-4 deployments, and hence mechanisms which improve the scalability of BGP-4 are very important.

Both within Internet and multi-service routing architectures, a relatively small number of highly-critical BGP sessions propagate a large proportion of the required routing information for network operation. For Internet routing, these are typically BGP sessions which propagate the global routing table to an AS - failure of these sessions may have a large impact on network service, based on a single erroneous update. In a multi-service environment, typical deployments utilise a small number of core-facing BGP sessions, typically towards route reflector devices. Failure of these sessions may also result in a large impact to network operation. Clearly, the avoidance of conditions requiring these sessions to fail is of great utility to any network operator, and provides further motivation for the revision of the existing behaviour.

Whilst the behaviour described in [RFC4271] is suited to ensuring that BGP messages with erroneous routing information are limited in scope (by means of session reset), with the above considerations, it is clear that this mechanism is not suited to all deployments. It should, however, be noted that the change in scope affects the handling only of errors occurring after BGP session establishment. There is no current operational requirement to amend the means by which error handling in session establishment, or liveliness detection, are performed.

1.2. Overview of Operator Requirements for BGP-4 Error Handling

It is the intention of this document to define a set of criteria to which a revised error handling mechanism in BGP-4 is required to conform. The motivation for the definition of these requirements can be summarised based on certain behaviour currently present in the protocol that is not deemed acceptable within current operational deployments, or where there is a short-fall in the tool set available to an operator. These key requirements can be summarised as follows:

o It is unacceptable within modern deployments of the BGP-4 protocol that a single erroneous UPDATE packet affects routes that it does not carry. This requirement therefore requires some modification to the means by which erroneous UPDATE packets are handled, and reacted to, with a particular focus on avoiding the use of the NOTIFICATION message.

o It is recognised that some error conditions that occur within the BGP-4 protocol may not always be handled gracefully, and may result in conditions from which an implementation cannot recover. In these (and similar) cases, it is undesirable for an operator that a reset of the BGP-4 session results in interruption to forwarding packets (by means of withdrawing routes installed by BGP-4 into a device's RIB, and subsequently FIB). To this end, there is a requirement to define a session reset mechanism which provides session re-initialisation in a non-destructive manner.

o Further to the requirements to provide a more robust protocol, the current visibility into error conditions within the BGP-4 protocol is extremely limited - where further modifications to this behaviour are to be made, complexity is likely to be added. Thus, to ensure that BGP-4 is manageable, there are requirements for mechanisms by which the protocol can be examined and monitored.

This document describes each of these requirements in further depth, along with an overview of means by which they are expected to be achieved. In addition, the mechanism by which the enhancements meeting these requirements interact is discussed.

2. Errors within BGP-4 UPDATE Messages

Both through analysis of incidents occurring within the Internet DFZ, and multi-service environments utilising BGP-4 to signal service or routing information, a number of different classes of errors within BGP-4 UPDATE messages have been observed. In order to consider the applicability of enhanced error handling mechanisms, it is possible to divide these errors into a number of sub-classes, particularly focusing around the location of the error within the UPDATE message.

Where an UPDATE message is considered invalid by a BGP speaker due to an error within a path attribute that is not the NLRI (where the definition of NLRI includes reachability information encoded in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in [RFC4760]), it is a requirement of any enhanced error handling mechanism to handle the error in a manner focused on the NLRI contained within the message found to be erroneous. Since, in this case, the message received from the remote peer is syntactically valid, it is considered that such an UPDATE is indicative of erroneous data within one or more path attributes. The current behaviour defined within the protocol implies that the BGP speaker from whom the message is received is now an invalid path for all NLRI announced via the session - which results in a disproportionate impact to overall network operation. In particular scenarios (such as networks with centralised BGP route reflection) such action can result in a loss of all reachability to a network. In other contexts (such as the Internet DFZ), it cannot be assumed that the BGP speaker from whom the UPDATE message is received is directly responsible for the erroneous information contained within the message.
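
The distinction drawn above, between an error inside a path attribute and an error that prevents the NLRI from being located at all, can be illustrated with a minimal Python sketch. It assumes the UPDATE body layout of [RFC4271] (Withdrawn Routes Length, Withdrawn Routes, Total Path Attribute Length, Path Attributes, NLRI) and handles IPv4 prefixes only. The names split_update, parse_prefixes and CriticalError are hypothetical and are not taken from this memo or from any implementation.

```python
import struct


class CriticalError(Exception):
    """Hypothetical marker: the NLRI cannot be located or parsed."""


def parse_prefixes(data):
    """Decode a run of (length-in-bits, prefix) IPv4 entries."""
    prefixes, i = [], 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8
        if bits > 32 or i + 1 + nbytes > len(data):
            raise CriticalError("prefix runs past the end of the field")
        raw = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)
        prefixes.append((".".join(str(b) for b in raw), bits))
        i += 1 + nbytes
    return prefixes


def split_update(body):
    """Split an UPDATE body (the octets after the 19-octet BGP header).

    Returns (withdrawn, raw_path_attributes, nlri). The path attributes
    are returned unparsed: an error inside them does not stop the NLRI
    from being extracted, which is the case this section focuses on.
    """
    if len(body) < 4:
        raise CriticalError("body too short for both length fields")
    (wlen,) = struct.unpack_from("!H", body, 0)
    if 2 + wlen + 2 > len(body):
        raise CriticalError("Withdrawn Routes Length exceeds message")
    (alen,) = struct.unpack_from("!H", body, 2 + wlen)
    attrs_start = 2 + wlen + 2
    if attrs_start + alen > len(body):
        raise CriticalError("Total Path Attribute Length exceeds message")
    withdrawn = parse_prefixes(body[2:2 + wlen])
    attrs = body[attrs_start:attrs_start + alen]
    nlri = parse_prefixes(body[attrs_start + alen:])
    return withdrawn, attrs, nlri
```

In this sketch an UPDATE whose path attributes contain arbitrary invalid octets still yields its NLRI, so any subsequent handling can be scoped to those prefixes, whereas inconsistent length fields raise CriticalError because the position of the NLRI is no longer known.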

Two further error cases exist within UPDATE messages, both of which relate to messages where some difficulty exists in parsing the entire BGP message. These are the case where a valid NLRI attribute can be extracted, and the case where such an attribute cannot be parsed. In these cases, errors in the packing of attributes within a BGP message may have occurred. Such errors are likely indicative of an error specifically caused by the remote BGP speaker. It is, however, desirable to an operator that such errors are handled without affecting all NLRI across a BGP session.

As such, there is a key requirement to maximise the number of cases in which it is possible to extract NLRI from a BGP UPDATE message. To this end, it is required that where possible the MP_REACH_NLRI and MP_UNREACH_NLRI attributes are utilised for encoding all NLRI (including IPv4 Unicast), and that such an attribute is included as the first attribute of a BGP UPDATE message (as originally recommended in [I-D.chen-ebgp-error-handling]). Such a change to the order of inclusion of this attribute maximises the number of cases in which NLRI can be extracted from an UPDATE. Where this is possible, it is again required that the error handling mechanisms utilised should be directly applied to the NLRI included in the UPDATE. For all cases whereby NLRI can be obtained from an UPDATE message, it is expected that the requirements outlined in Section 3 should be considered by any enhancement to the BGP-4 protocol.

In the case that it is not possible to completely parse the NLRI attribute from the UPDATE message received from a peer, it is extremely likely that this is indicative of a serious error with either the process of attribute packing, or buffer usage on the remote BGP speaker.
In this case, clearly, it is not possible to apply any error handling mechanism that is limited to a specific set of NLRI, since an implementation has no knowledge of the NLRI included within the UPDATE message. In addition, such errors are considered to be relatively fundamental to the operation of a BGP implementation, and hence may indicate a case whereby significant system errors have occurred. The current BGP-4 standard results in a BGP speaker restarting a session with the remote BGP speaker. However, where such an error does occur, it is required that a graceful mechanism is utilised to provide a lower impact to network operation. The requirements for enhancements of this nature to BGP-4 are outlined in Section 5, with the requirements outlined therein focused on providing a means by which system integrity can be restored whilst allowing for continued network operation.

2.1. Classifying BGP Errors and Expected Error Handling

It is clearly of advantage for BGP-4 implementations to utilise a consistent set of error handling mechanisms for the different types of errors that are described in Section 2, and provide consistent nomenclature to refer to them. It is therefore suggested that errors that are indicative of larger scale failures of a BGP speaker, and hence require some error handling at the session level, are referred to as 'critical' errors, whilst those errors that are identified based on incorrect content of one or more attributes of a message are referred to as 'semantic' errors.

The errors identified within the following sections consider only those errors within the specifications at the time of writing. It is recommended that in the definition of future extensions to the BGP-4 specification, the error handling behaviour (and the category within which errors within the extension should be considered by an implementation) is defined.

2.1.1. Critical BGP Errors

As described in this document, it is of advantage to limit the number of 'critical' errors that occur within the protocol. Therefore, based on analysis of the processing of BGP UPDATE messages, it is required that 'critical' error handling behaviour is applied to:

o UPDATE Message Length errors - whereby the specified overall UPDATE message length is inconsistent with the sum of the Total Path Attribute and Withdrawn Routes lengths. This is indicative of a message packing failure, whereby the NLRI may not be correctly extracted.

o Errors parsing the NLRI attributes of an UPDATE message - where NLRI carried in either the IPv4 Unicast Advertised or Withdrawn Routes, or in the MP_REACH_NLRI or MP_UNREACH_NLRI attributes [RFC2858], cannot be parsed, it is not possible to target error handling mechanisms to specific NLRI, and hence session-level mechanisms must be utilised.

It is expected that those requirements outlined in Section 5 are utilised to provide session-level handling of those errors identified as 'critical'.

2.1.2. Semantic BGP Errors

Where a BGP message is correctly formed, a number of cases exist whereby the contents of the UPDATE are not valid - these cases represent errors that can be identified as affecting specific NLRI. The following cases are expected to be classified as semantic errors:

o Zero or invalid length errors in path attributes excluding those containing NLRI, or where the length of all path attributes contained within the UPDATE does not correspond to the total path attributes length. In this case, the NLRI can be correctly extracted, and hence acted upon.

o Messages where invalid data or flags are contained in a path attribute that does not relate to the NLRI.

o UPDATE messages missing mandatory attributes, unrecognised non-optional attributes, or those that contain duplicate or invalid attributes (be they unsupported or unexpected).

o Those messages where the NEXT_HOP, or MP_REACH next-hop values are missing, length zero, or invalid for the relevant AFI/SAFI.

In these cases, it is expected that these errors can be handled gracefully, following the requirements detailed in Section 3 and Section 4 of this memo.

3. Avoiding use of NOTIFICATION

The error handling behaviour defined in RFC4271 is problematic due to the limited options that are available to an implementation. When an erroneous BGP message is received, at the current time, the implementation must either ignore the error, or send a NOTIFICATION message, after which it is mandatory to terminate the BGP session. It is apparent that this requirement is at odds with that of protocol robustness.

There is significant complexity to this requirement. The mechanism defined in [I-D.chen-ebgp-error-handling] describes a means by which no NOTIFICATION message is generated for all cases whereby NLRI can be extracted from an UPDATE. The NLRI contained within the erroneous UPDATE message is considered as though the remote BGP speaker has provided an UPDATE marking it as withdrawn. This limits the propagation of the invalid routing information, whilst also ensuring that no traffic is forwarded via a previously-known path that may no longer be valid. This mechanism is referred to as "treat-as-withdraw".

Whilst this behaviour avoids a NOTIFICATION message, keeping other routing information advertised by the remote BGP speaker within the RIB, it may result in unreachability for a sub-set of the NLRI advertised by the remote speaker. Two cases should be considered - that where the entry for a route in the Adj-RIB-In of the neighbour propagating an erroneous packet is utilised, and that where the route installed in the device's RIB is learnt from another BGP speaker. In the former case, should the identified NLRI not be treated as withdrawn, the original NLRI is utilised within the global RIB.
However, this information is potentially now invalid (i.e., it no longer provides a valid forwarding path), whilst an alternate (valid) path may exist in another Adj-RIB-In. By continuing to utilise the NLRI for which the UPDATE was considered invalid, traffic may be forwarded via an invalid path, resulting in routing loops, or black-holing. In the second case, no impact to the forwarding of traffic, or global RIB, is incurred, yet where treat-as-withdraw is implemented, possibly stale routing information is purged from the Adj-RIB-In of the neighbour propagating errors.

Whilst mechanisms such as "treat-as-withdraw" are currently documented, the proposals are limited in their scope - particularly in terms of restrictions to implementation only on eBGP sessions. This limitation is made based on the view that the BGP RIB must be consistent across an autonomous system. By implementing treat-as-withdraw for an iBGP session, one or more routers within the Autonomous System may not have reachability to a route, and hence blackholing of traffic, or routing loops, may occur. It should, however, be considered whether this view is valid, in light of the manner in which BGP is utilised within operator networks. Inconsistency in a RIB based on a single UPDATE being treated as withdrawn may cause an inconsistency in a single sub-topology (e.g., a Layer 3 VPN service), or a service not operating completely (in the case of an UPDATE carrying service membership information). Where a NOTIFICATION and teardown is utilised, this is destructive to all sub-topologies in all address family identifiers (AFIs) carried by the session in question. Even where mechanisms such as multi-session BGP are utilised, a whole AFI is affected by such a NOTIFICATION message. In terms of routing operation, it is therefore far less costly to endure a situation where a limited sub-set of routing information within an AS is invalid, than to consider all routing information as invalid based on a single trigger.
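
The interaction between the 'critical' and 'semantic' classes of Section 2.1 and the "treat-as-withdraw" behaviour discussed above can be sketched in Python as follows. This is an illustration only: the error labels, the dictionary used as an Adj-RIB-In, and the function names are assumptions made for the example, and the sketch is not the procedure specified in [I-D.chen-ebgp-error-handling].

```python
from enum import Enum


class ErrorClass(Enum):
    CRITICAL = "critical"
    SEMANTIC = "semantic"


# Hypothetical labels for the error cases listed in Sections 2.1.1 and 2.1.2.
CRITICAL_KINDS = {"update_length_mismatch", "nlri_unparseable"}
SEMANTIC_KINDS = {
    "attribute_length_invalid",
    "invalid_attribute_flags",
    "missing_mandatory_attribute",
    "invalid_next_hop",
}


def classify(kind):
    """Map an error label onto the two classes used in this memo."""
    if kind in CRITICAL_KINDS:
        return ErrorClass.CRITICAL
    if kind in SEMANTIC_KINDS:
        return ErrorClass.SEMANTIC
    raise ValueError("unknown error kind: " + kind)


def handle_erroneous_update(adj_rib_in, nlri, kind):
    """Apply the narrowest handling the error class allows.

    adj_rib_in maps prefix -> path for one neighbour (an assumption).
    Semantic errors remove only the NLRI carried by the erroneous UPDATE,
    as if the neighbour had withdrawn them; no NOTIFICATION is sent.
    Critical errors are left to session-level handling (Section 5).
    """
    if classify(kind) is ErrorClass.CRITICAL:
        return "session-level"
    for prefix in nlri:
        adj_rib_in.pop(prefix, None)  # treat-as-withdraw for this prefix
    return "treat-as-withdraw"
```

For a semantic error, every other route learnt from the neighbour stays in place, which is the property this section argues for on both eBGP and iBGP sessions.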

At the time of writing, error handling mechanisms related to optional, transitive attributes - such as [I-D.ietf-idr-optional-transitive] - are restricted to handling only a subset of attribute errors, whereas the operational requirement is to expand this coverage to the widest set of errors possible (i.e., all semantic errors within UPDATE messages). Additionally, where approaches applicable to a greater number of attributes are proposed (e.g., [I-D.chen-ebgp-error-handling]), these are limited to deployment in eBGP applications only, whereas requirements also exist in intra-domain cases. As such, it is envisaged that, if extended to cover these expanded cases, these mechanisms provide a means to avoid the transmission of a NOTIFICATION message to a remote BGP speaker, based on a single erroneous message, where at all possible, and hence meet this requirement. Critical errors, including those whereby the NLRI cannot be extracted from the UPDATE message, represent cases whereby the receiving system cannot handle the error gracefully based on this mechanism.

4. Recovering RIB Consistency

The recommendations described in Section 3 may result in the RIB for a topology within an AS being inconsistent across the AS's internal routers. Alternatively, where such mechanisms are deployed at an AS boundary, interconnects between two ASes may be inconsistent with each other. There are therefore risks of traffic blackholing, due to missing routing information, or forwarding loops. Whilst this is deemed an acceptable compromise in the short term, clearly, it is suboptimal. Therefore, a requirement exists to provide mechanisms by which a BGP speaker is able to recover the consistency of the Adj-RIB-In for a particular neighbour.

In the general case, the consistency of the BGP RIB can be recovered by requesting that the entire Adj-RIB-Out of a remote BGP speaker is re-advertised. A mechanism to achieve this re-advertisement is defined within the ROUTE-REFRESH specification [RFC2918].
It is envisaged that by requesting a refresh of all NLRI advertised by a BGP speaker, any NLRI which has been withdrawn due to being contained within an invalid UPDATE message is re-learnt. Where a ROUTE-REFRESH is used to directly perform a consistency check between the Adj-RIB-Out of a remote device, and the Adj-RIB-In of the local BGP speaker, a demarcation between the ROUTE-REFRESH, and normal UPDATE messages is required (in order that an "end" of the refresh can be used to identify any 'stale' NLRI) - [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which the ROUTE-REFRESH mechanism can be extended to meet this requirement.

Whilst re-advertisement of the whole BGP RIB provides a means by which withdrawn NLRI can be re-advertised, there are some scaling implications that must be considered. In the case that a ROUTE-REFRESH is generated, all NLRI must be re-packed into UPDATE messages and advertised by one speaker on the BGP session, whilst the other must receive all UPDATE messages, and validate the RIB's consistency. In order to avoid the control-plane load, it is therefore a requirement to utilise targeted mechanisms where possible, rather than incurring the additional load on both the advertising and receiving speaker of building and processing UPDATEs for the entire contents of the RIB.

It is envisaged that during routing inconsistencies caused by utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is aware that some routing information was not able to be processed - due to the fact that an UPDATE message was not parsed correctly. Since this mechanism (as discussed in Section 3) requires the local BGP speaker to have determined the set of NLRI for which an erroneous UPDATE message was received, it is possible to use a targeted mechanism to re-request the specific NLRI that was contained within the erroneous UPDATE message.
Re-requesting provides the remote BGP speaker with an opportunity to re-transmit the NLRI - possibly providing an opportunity to leverage alternative methods to build the UPDATE message. Such a request requires extension to the existing BGP-4 protocol, in terms of specific UPDATE generation filters with a transient lifetime. It is envisaged that the work within [I-D.zeng-idr-one-time-prefix-orf] provides a mechanism allowing targeted elements of the Adj-RIB-In for a BGP neighbour to be recovered.

It is of particular note for both means of recovering RIB consistency described that these are effective only when considering transient errors within an implementation - for instance, should an RFC interpretation error within an implementation be present, regardless of the number of times a specific UPDATE is generated, it is likely that this error condition will persist (as it may with the existing behaviour defined by [RFC4271]). For this reason, there is a requirement to consider the means by which such consistency recovery mechanisms are utilised. It is not advisable that a dynamic filter and advertisement mechanism is triggered by all error handling events, due to the load this is likely to place on the neighbour receiving such a request. Where this BGP speaker is a relatively centralised device - a route reflector (as described by [RFC4456]) for example - the act of generation of UPDATE messages with such frequency is likely to cause disproportionate load. It is therefore an operational requirement that any such extension provides a means of request dampening.

In cases whereby the consistency of the Adj-RIB-In is to be restored (e.g., following the 'treat-as-withdraw' behaviour described in Section 3), and mechanisms such as those described herein are triggered, such a condition should be noted to an operator by means of a specific flag, SNMP trap, or other logging mechanism.
The subset of NLRI that are considered to be inconsistent is of operational benefit when identifying the inconsistency, and hence should be logged.

5. Reducing the Impact of Session Reset

Even where protocol enhancements allow errors in the BGP-4 protocol to cease to trigger NOTIFICATION messages, and hence reset a BGP session, it is clear that some error conditions may not be exited. In particular, errors due to existing state, or memory structures, associated with a specific BGP session will not be handled. It is therefore important to consider how these error conditions are currently handled by the protocol. It should be noted that the following discussion and analysis considers only those NOTIFICATION messages generated in response to errors in UPDATE messages (as defined by Section 6.3 in [RFC4271]).

The existing NOTIFICATION behaviour triggers a reset of all elements of the BGP-4 session, as described in Section 6 of [RFC4271]. It is expected that session teardown requires an implementation to re-initialise all structures and state required for session maintenance. Clearly, there is some utility to this requirement, as error conditions in BGP are, in general, exited from. However, this definition is responsible for the forwarding outages within networks utilising BGP for propagation of routing or service when each error is experienced. The requirement described in Section 3 is intended to reduce the cases whereby a NOTIFICATION is required; however, any mechanism implemented as a response to this requirement by definition cannot provide a session reset to the extent of that achieved by the current behaviour.

In order to address this, there is a requirement for a means by which a BGP speaker can signal that an unhandled error condition in an UPDATE message occurred - requiring a session reset - yet also continue to utilise the paths advertised by the neighbour that are currently in use within the RIB.
In this case, the Adj-RIB-In received from the neighbour is not considered invalid, despite a NOTIFICATION, and session reset, being required. This set of requirements is akin to those answered by the BGP Graceful Restart mechanism described in [RFC4724].

Since the operational requirement in this case is to provide a means to achieve a complete session restart without disrupting the forwarding path of those routes in use within a BGP speaker's RIB, it is expected that utilising a procedure similar to the Graceful Restart mechanism meets the error handling requirement. Responding to an error condition (repeated or otherwise) with a message indicating that an error that cannot be handled has occurred, forcing session reset, whilst retaining forwarding information within the RIB, allows forwarding to all routes within a system's RIB to continue during the period in which the session restarts. It is envisaged that the additional complexity introduced by such a mechanism can be limited by extending existing BGP messages - one such approach is proposed in [I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the restart lifetime, should an error condition not be transient - for example, should an error have occurred with the BGP process, rather than with a specific BGP session - the remote BGP speaker is still detected as an invalid device for forwarding.

In some cases, the erroneous condition may be due to corruption of the Adj-RIB-Out on the advertising BGP speaker - rather than caused by the receiving speaker's state. In these cases, where existing structures are replayed whilst performing graceful restart functionality, the error condition is not necessarily resolved. Therefore, it is recommended that during a session restart event, as described within this section, the advertising speaker purge and rebuild RIB structures, in order to resolve any corruption within these structures.
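The time-bounded retention of routes during a session restart, as described above, can be sketched as follows. This is an illustrative model only, loosely following the stale-route idea of Graceful Restart; the `NeighbourState` class, the function names and the 120 second restart time used in the usage note are assumptions for illustration and are not taken from [RFC4724] or from any implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NeighbourState:
    """Hypothetical per-neighbour state: prefix -> next hop, plus restart bookkeeping."""
    routes: dict = field(default_factory=dict)
    stale: set = field(default_factory=set)
    restart_deadline: Optional[float] = None


def begin_restart(n: NeighbourState, now: float, restart_time: float) -> None:
    """Session reset after an unhandled error: keep routes for forwarding, mark them stale."""
    n.stale = set(n.routes)
    n.restart_deadline = now + restart_time


def receive_route(n: NeighbourState, prefix: str, next_hop: str) -> None:
    """A route re-advertised on the re-established session is no longer stale."""
    n.routes[prefix] = next_hop
    n.stale.discard(prefix)


def end_of_rib(n: NeighbourState) -> None:
    """Re-advertisement complete: purge anything the neighbour did not send again."""
    for prefix in n.stale:
        n.routes.pop(prefix, None)
    n.stale.clear()
    n.restart_deadline = None


def expire_if_overdue(n: NeighbourState, now: float) -> bool:
    """If the restart did not complete within the time bound, stop using the neighbour."""
    if n.restart_deadline is not None and now >= n.restart_deadline:
        n.routes.clear()
        n.stale.clear()
        n.restart_deadline = None
        return True
    return False
```

Calling `begin_restart(n, now=0.0, restart_time=120.0)` leaves forwarding untouched, whereas `expire_if_overdue(n, now=120.0)` discards every retained route if the session has not recovered by then, which corresponds to the non-transient failure case discussed above.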
It should be noted that a protocol enhancement meeting this requirement is not able to solve all error conditions - however, a complete restart of the BGP and TCP session between two BGP speakers implements an identical recovery mechanism to that which is achieved by the existing behaviour. Where an error condition such as memory or configuration corruption has occurred in a BGP implementation, it is expected that a mechanism meeting this requirement continues to detect this, by means of a bound on time for session restart to occur. Whilst there may be some consideration that packets continue to be forwarded through a device which can be in a failure mode of this nature for a longer period due to this requirement, the architecture of modern IP routers should be considered. A divided forwarding and control plane is common in many devices, as well as process separation for software-based devices - corruption of a specific protocol daemon does not necessarily imply forwarding is affected. Indeed, where forwarding behaviour of a device is affected, it is envisaged that a failure detection mechanism (be it Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets) will detect such a failure in almost all cases, with the symptomatic behaviour of such a failure being an invalid UPDATE message in very few other cases.

6. Operational Toolset for Monitoring BGP

A significant complexity that is introduced through the requirements defined in this document is that of monitoring BGP session status for an operator. Although the existing error handling behaviour causes a disproportionate failure, session failure is extremely visible to most operational personnel within a Network Operator due to both existing definitions of SNMP trap mechanisms for BGP, along with the forwarding impact typically caused by such a failure. By introducing mechanisms by which errors of this nature are not as visible, this is no longer the case.
There is a requirement that, where subsets of the RIB on a device are no longer reachable from a BGP speaker, or indeed an AS, some visibility of this situation, alongside a mechanism to determine the cause, is available to an operator. Whilst, to some extent, this can be solved by mandating a sub-requirement of each of the aforementioned requirements that a BGP speaker must log where such errors occur, and are hence handled, this does not solve all cases. In order to clarify this requirement, the example of the transmission of an erroneous Optional Transitive attribute can be considered. Since, by definition, there is no requirement for all BGP speakers to parse such an attribute, a receiving router may treat NLRI as withdrawn based on an erroneous attribute not examined by its neighbour. In this case, the upstream device or network, propagating the UPDATE, has no visibility of this error. Operationally, however, it is of interest to the upstream router operator that such invalid information was propagated.

The requirement for logging of error conditions in transmitted BGP messages, which are visible to only the receiver, cannot be achieved by any existing BGP message, or capability. It is envisaged that each erroneous event should be transmitted to the remote peer - including the information as to the set of NLRI that were considered invalid. Whilst with some mechanisms this is achieved by default (for example, One-Time Prefix ORF (Outbound Route Filtering) [I-D.zeng-idr-one-time-prefix-orf] will transmit the set of routes that are required), the operator requirement is to know which routes may have been unreachable in all cases. It is envisaged that an extension to meet this requirement will allow for such information to be transmitted between peers, and hence logged. Such a mechanism may provide further utility as either a diagnostic, or logging toolset.
As such, it is possible to divide the messages that are required in order to provide further visibility into BGP for an operator. Such a division can be made both due to the required means of message transmission, alongside the criticality of each request.

o Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the requirements detailed within this document result in this message being suppressed. Despite this change, the error condition's occurrence is still of interest to an operator in order to provide both monitoring and troubleshooting capabilities, since some form of invalid data has been received on a session. It is therefore considered that an implementation must generate a message both locally, and transmitted to the remote peer, based on such a condition. Where such a message is transmitted to the remote peer, it is considered that the BGP session via which the erroneous UPDATE message was received should be used as transport to the remote peer. The information transmitted in such a message should be minimised to allow identification of the paths which were considered erroneous (i.e., restricting the information to that which is directly relevant to a network operator in the case of an error condition occurring). Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question; however, it must be considered that in the case of an erroneous UPDATE being received, and the 'treat-as-withdraw' mechanism being utilised, where the erroneous path is removed from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages withdrawing the route from all further BGP speakers to which the prefix is advertised.
The load generated by the generation of such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which it was received. It is envisaged that light-weight BGP message-based signalling mechanisms such as the ADVISORY message types detailed in [I-D.ietf-idr-operational-message] provide a suitable means to satisfy this requirement.

o Additional Diagnostic Capabilities for BGP - In a number of cases, there is an operational requirement to further debug erroneous BGP UPDATE messages, along with the particulars of the state of a BGP speaker. For instance, where an invalid BGP UPDATE message is transmitted between two BGP speakers, the exact format of the UPDATE message is of interest to an operator, as this information provides a clear indication of a message considered to be erroneous by the BGP speaker to which it was transmitted. In this case, it is considered of great utility that the entire UPDATE message is transmitted back to the advertising speaker, in order to allow for further debugging to occur. Whilst such information is particularly useful to an operator, it clearly provides information that is not key to protocol operation - for this reason, it is expected that some of the additional complexity, and load, that a BGP speaker is subjected to is not acceptable. For this reason, it is required that where mechanisms are developed to support this requirement, messages of this nature can be supported both within an existing BGP session, and via a dedicated separate session, be it BGP carrying messages such as those defined in [I-D.ietf-idr-operational-message] or a dedicated monitoring protocol akin to BMP described in [I-D.ietf-grow-bmp].
Whilst the operational requirement for such monitoring tools to allow for visibility into BGP is clearly agreed upon, the means by which such messages are transmitted between two BGP speakers is likely to be dependent upon both the positions of the speakers in question (for instances, the requirements for such a protocol may differ where a session is between two ASBRs under separate administration). The introduction of additional message types to the BGP protocol clearly introduces further complexity - and leaves room for further implementation and standardisation errors that may compromise the robustness of the BGP protocol. In addition,next-hop values are missing, zero-length, or invalid for thequeuing and scheduling ofrelevant address family. For theseBGP messages must be interleaved with the transmission ofNon-Critical errors, thekey protocol messages - such as KEEPALIVE and UPDATE packets. It is therefore a concern thatNLRI-targeted error handling requirements described in Section 4 shoulda large number of messages specifically for operational visibilitybetransmitted, this will delayfollowed. In order to maximise thetransmissionnumber ofUPDATE packets, and hence adversely affectcases whereby theend-to-end convergence time forNLRIcarried within BGP. The operational requirement for why messages are advantageous toattributes can bein-band toreliably extracted from aprotocol should also be considered. In particular, it should be noted thatreceived message, wheresuch information is to be transmitted between administrative boundariesa BGPsession represents an existing channel between the two ASes. This channel is considered to be secure insofar as the routing information, and requests sent viaspeaker supports multi-protocol extensions, thesession are considered to come from a trusted source. 
Since error information relates to both a particular attachment,MP_REACH_NLRI andis key to ensuring that such a session is operating as expected, it is considered of great operational benefit that this information is transmitted over this channel. In addition, the overall system scalability is improved by such in-band transmission. It is expected that erroneous information resulting in the 'treat-as- withdraw' mechanism beingMP_UNREACH_NLRI attributes SHOULD be utilisedis relatively infrequently transmitted between two peers (when compared to the frequency of UPDATE messages transmission). The impact of including an additional BGP message type for such operational visibility is relatively small from a resource utilisation perspective - additional processing overhead is only experienced when such a message is received. Where a separate session is maintained, particular network elements within a service provider topology may require hundreds, or thousands, of additional sessionsforthe transmission of this information. Such an resource consumption overhead is likely toall address families (including IPv4 Unicast) and these attributes should beunacceptablethe first attribute contained within the UPDATE message. Where attributes are introduced by future extensions tosome network operators. Forthereasons explained above, it is expectedBGP protocol the error handling behaviour applied MUST be assumed thatmechanisms specifiedapplied tomeet the requirements for event visibility considerNon-Critical errors, unless otherwise specified within therelative impacts of additional monitoring sessions,per-extension memo, ormessage inclusion in bandthe attribute relates directly to carrying NLRI. Authors of future BGPin order not to compromiseextensions SHOULD specify thesecurity, scalability and robustnesserror handling behaviour required for new attributes in terms of theBGP-4 protocol. 7. 
Operational Complexities Introduced by Altering RFC4271 The existing NOTIFICATION and subsequent teardown ofclassification into aBGP session upon encountering anCritical or Non-Critical errorhas the advantage thaton aconsistent approach toper- attribute error basis. 4. Error Handling for Non-Critical Errors 4.1. NLRI-level Error Handling Requirements When a Non-Critical errorhandlingisrequired of all implementations ofdetected within an UPDATE message a BGP speaker MUST NOT send a NOTIFICATION message to theBGP-4 protocol. This is of operational advantageremote neighbour. Instead, the NLRI contained within the message MUST be considered asit providesno longer viable until they are updated by aclear expectation ofsubsequent UPDATE message, thus treating the NLRI as withdrawn as per the treat-as- withdraw mechanism described in [I-D.chen-ebgp-error-handling]. Network operators SHOULD recognise that where such behaviour is implemented black-holing or looping of traffic may occur in theprotocol. The requirements defined herein add further complexity toperiod between theerror-handling within BGP,NLRI being treated as withdrawn, andhence are liable to compromisesubsequent updates, dependent upon theexisting deterministic protocol behaviour.routing topology. Itis therefore deemedSHOULD be noted thatthere is a further requirement to define a set of recommended behaviours based on the receptionsuch periods of RIB inconsistency (where one speaker has advertised aparticular class of erroneous UPDATE message, alongside highlighting some ofprefix, which has been treated as withdrawn by the receiving speaker) may be relatively long lived, based on situations such as an erroneous implementationcomplexities that may need to be handled inat thecase that particular recommendations madereceiver, or the error occurring withinthis memo are deployed. Utilisingan optional, transitive attribute not examined by theclassesadvertising device. 
In order to allow operators to select sessions on which this risk oferroneous UPDATE message described in Section 2, the recommended behaviour for a BGP-4inconsistency is acceptable, an implementation SHOULD provide means by which NLRI-level error handling for Non-Critical errors can bedivided into two branches. Primarily, wheredisabled on asemantic error is identified, an implementation is expected to utiliseper-session basis. Since thereduced- impactNon-Critical error handlingapproach, as described in Section 3. In the case that such an approachrequired within this section results inknown NLRIno NOTIFICATION message beingwithdrawn fromtransmitted, theBGP speaker's RIB, and an implementation provides functionality suchfact thatthese errors are recovered from through an automatically triggered means, such as those described within Section 4, some consideration of the scalability of these recovery mechanisms is required. Clearly, there isancomputationalerror has occurred andbandwidth overhead associated with the re-advertisement of NLRIhence there may be inconsistency betweentwothe local and remote BGPspeakers - both duespeaker MUST be flagged to thegeneration of UPDATE messages, their transmission betweennetwork operator through standard operational interfaces (e.g., SNMP, syslog). The information highlighted MUST include thetwo speakers, andNLRI identified to be contained within theparsingerror message, andprocessing intoSHOULD contain a exact copy of theRIB required. This overhead is directly proportional toreceived message for further analysis. In order that thenumberoperator of the BGP speaker from whom an erroneous UPDATEmessagesmessage has been advertised is aware of the fact thatare required. 
Wheresome NLRI advertised to the remote speaker have been considered withdrawn due to being contained within an erroneous UPDATE, asemanticBGP speaker SHOULD support mechanisms to report the occurrence of Non-Critical erroris experienced, by definitionhandling to the remote speaker. The receiving speaker SHOULD transmit the NLRI contained within theUPDATE can be extracted. It is therefore possibleerroneous message tominimisetheproportionadvertising speaker. An exact copy of theRIB thatreceived UPDATE message SHOULD also be sent. The exchange of information related to events occurring as a result of BGP messages isre-advertisednot currently supported bytargetinganyrecovery mechanism onextension to theNLRI containedprotocol. Clearly, where the two speakers reside within theerroneous UPDATE. Such a targeted mechanismsame administrative domain, shared logging infrastructure can beachieved through a means such as One-Time ORF, or other meansutilised to identify the root cause oftargeting UPDATE messages not discussederrors, however, in many cases neighbouring BGP speakers reside within separate administrative domains (e.g., are ASBRs for Internet or private networks). In thismemo. It is recommended that where available, any automatic (or manual) triggered recovery mechanism behaviour utilises such targeted means in preferencecase, mechanisms allowing transmission in-band toany whole RIB refresh mechanism (such as ROUTE-REFRESH). Inthecase thatBGP session SHOULD be utilised (e.g., the OPERATIONAL message described in [I-D.ietf-idr-operational-message]). Such anerroneous UPDATE has been processed throughin-band channel is preferred based on the BGP session representing ameans such as treat-as-withdraw (describedpre-established trusted channel which is related to a specific BGP-speaking device withinSection 3),arecovering mechanism may be considered superfluous, if the assumptionnetwork. 
It ismadeexpected that theRIB inconsistency will only be recovered from based onoverall system scalability of apath re-convergence (or change inBGPattribute) forspeaker is improved through utilising theadvertising BGP speaker.existing channel, rather than incurring overhead for maintaining many additional logging-specific protocol sessions for relatively infrequent messaging events when errors occur. However,where this assumption is not consideredthe extensions providing such a channel MUST consider their impact toprovide adequate recovery behaviour,base BGP protocol functions such as the transmission of UPDATE or KEEPALIVE messages, anda mechanismSHOULD limit the volume of messaging to direct reactions torestore RIB consistency automatically is implemented, some consideration mustNon-Critical errors occurring. These considerations SHOULD be madefor where repeated erroneous messages occur. In this case,in order tolimit the impactensure that no compromise is made to the security, scalability and robustness of BGP. Where additional BGPspeaker's network operation, at a pre-defined point it is recommendedmonitoring information thatsuch automatic recoveryis not suitable to be carried in-band is required, out-of-band mechanismstowardssuch as theBGP speaker from whichBMP protocol described in [I-D.ietf-grow-bmp] could be utilised to provide further information relating to erroneousUPDATEs are repeatedly received are suppressed,messages. 4.2. Recovering RIB Consistency following NLRI-level Error Handling Following NLRI being treated as withdrawn due to Non-Critical error handling, inconsistencies exist between the Adj-RIB-Out of the advertising BGP speaker, and thefactAdj-RIB-In of the receiving device. These inconsistencies may result in forwarding loops or blackholing of traffic in some routing topologies. 
In order to ensure that suchsuppression has occurred is highlightedcases can be recovered from a means by which a validation and recovery of consistency can be achieved SHOULD be provided to an operator.The point at which such behaviour is suppressed is toThis function may bedefined on a per-implementation basis, taking into account feedback fromprovided through enhancing theNetwork Operator community based onROUTE- REFRESH [RFC2918] mechanism to add means to identify thedeploymentbeginning and end ofthe recommendations described in this document. It is expected that such trigger points are dependent upon the mechanisms implemented foraparticular BGP-4 implementations, andreplay of theimpact uponentire Adj-RIB-Out of the advertising speakerof these means of RIB recovery. Where critical errors are experienced, such that a session reset is required,(as per themechanism discussedsuggestion inSection 5 should be used. Again, since such[I-D.ietf-idr-bgp-enhanced-route-refresh]). As Non-Critical error handling is localised to the NLRI contained within the erroneous UPDATE message, a targeted recovery mechanismresults inMAY be provided allowing arestartspeaker to request re-advertisement of aBGP session, it expected that all NLRI carried over the session is re-advertised as it is re-established, incurring processing overhead on bothparticular subset of theadvertising and receiving BGP speaker. In orderAdj-RIB-Out. Where such targeted refresh functions are available, they SHOULD be preferred tominimise the consumptionmechanisms requesting re-advertisement ofcontrol-plane computational resourcethe whole Adj-RIB-Out based onboth speakers, it is recommended that mechanisms allowing a reduced settheir more limited use of CPU and network resources. A BGPUPDATE messages to be re-transmitted between two speakers are employed wherever possible - for instance through employingspeaker may automatically trigger recovery mechanisms such as those described in[I-D.ietf-idr-enhanced-gr]. 
In the case that repeated critical errors occur,this section following theoverheadreceipt ofperforming any mechanism implemented based on the requirements in Section 5 is incurred following eachan erroneous UPDATEmessage. Since these mechanisms are, by definition, performed automatically in response to the erroneousmessagebeing received similar considerationsidentified as Non-Critical tothe impact to the BGP speaker mustexpedite recovery. It should betaken into account. As such, it is expectednoted thatafter a certainif automatic recovery mechanisms triggerlevel, the ongoing receiptonly re-advertisement ofcritical errors within BGP UPDATE messages is deemedan identical erroneous message, they are likely to beindicative ofineffective. Additionally, where the best-path to be advertised by remote speaker changes, this will be advertised directly, without along-lasting failure, andrequirement for asession no longer considered viable.request from the receiver. However, in some cases, RIB consistency recovery mechanisms may prompt alternate UPDATE message packing, and hence allow quicker recovery. Where suchan case is experienced, it is expected that the BGP session revertsmechanisms are implemented, mechanisms focused to smaller sets of NLRI SHOULD be preferred over those requesting thestandard session failure behaviour, as described in [RFC4271]entire RIB. In addition, such mechanisms SHOULD have dampening mechanisms to ensure that their impact to computational anddocuments updating this base standard.network resources is limited. 5. Error Handling for Critical Errors Wheresuchan UPDATE message containing areversionCritical error isimplemented this condition shouldreceived, since the NLRI cannot beflaggedextracted, error handling mechanisms must be applied at the per-session level. In order toan network operator. 
The number of restart attempts beforelimit thesession revertsimpact tobeing shut down shouldnetwork operation, these session-level mechanisms MUST bedetermined based onapplied in a manner which allows theoverhead ofpaths NLRI received from therecovery mechanisms implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is implemented,remote speaker to continue to be utilised for forwarding during theimpact ofsessionrestart may be significantly lower),reset andoperational experience of the deployment of the recommendations described in this document. Since repeated erroneous UPDATE messages which experience critical errors may be indicative of long-lasting failure modes, itre-establishment. It isrecommendedenvisaged thata back-off from restarting BGP sessions experiencing such behaviour is implemented. As such,thisis not applicable to restart behaviourrequirement may be met throughmeans such as those described in Section 5 since such restarts are time-bound based on the period for whichextension of theAdj-RIB-In from a BGP speaker is maintained as valid (e.g., when consideringBGP GracefulRestart, such restarts are time-bound by theRestartTime described in [RFC4724]). However, following a session reverting to being pulled down based on repeated error conditions, it is recommended that following restart attempts are subjectmechanism ([RFC4724]) toan exponentially increasing interval between subsequent attempts. It is therefore recommended that in such cases an implementation implementsbe triggered by NOTIFICATION messages indicating theincreasing valuesoccurrence ofIdleHoldTimer as described in the BGP-4 FSM documented in [RFC4271]. 7.1. Reducing the Network Impacta Critical error. 
Such an extension allows a restart of the TCP and BGP sessions between two speakers, in a similar manner to the current session restart behaviour triggered by a NOTIFICATION message. In order to maximise the level of re-initialisation which occurs during such a restart triggered by a Critical error, BGP speakers MAY re-initialise memory structures related to the Adj-RIB-In and Adj-RIB-Out associated with the session on which the erroneous UPDATE was observed.

Where such a restart event occurs, the continued liveness of the remote device MAY be verified by BGP KEEPALIVE packets or other OAM functions such as Bidirectional Forwarding Detection ([RFC5880]). In cases where the observed Critical BGP error is indicative of a wider device failure of the remote speaker, it is expected that the BGP session will not re-establish correctly. Each BGP speaker SHOULD maintain a limited time window in which session restart is expected in order to mitigate this possibility.

When a Critical error occurs, the network operator MUST be made aware of its occurrence through local logging mechanisms (e.g., SNMP traps or syslog).
The BGP speaker receiving an UPDATE message identified as a Critical error MUST log its occurrence and a copy of the UPDATE message. Where an inter-device messaging mechanism is implemented (as discussed in Section 4.1), a copy of the erroneous UPDATE message SHOULD be transmitted to the remote speaker. Both BGP speakers MUST indicate to an operator that the cause of a session restart was a Critical error in an UPDATE message.

Since repeated critical errors (and session restarts) may have an impact on overall device scaling if the failure condition is not resolved by session restart, a BGP speaker MAY choose to revert to the session tear-down behaviour described in the base BGP specification. This reversion SHOULD only be utilised after a number of attempts which SHOULD be controllable by the network operator. Where a session is shut down, the implementation MAY utilise a back-off from session restart attempts (as per the IdleHoldTimer described in the BGP FSM [RFC4271]). Where reversion to tearing down the BGP session is performed, a speaker SHOULD limit the impact of withdrawing prefixes from downstream speakers where possible.
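The reversion and back-off behaviour above can be illustrated with a minimal Python sketch. It is not drawn from any implementation: the class name, the action strings, the default timer values and the choice of a doubling (exponential) back-off are all assumptions made for illustration. Only the operator-controlled attempt limit and the use of a back-off in the style of the IdleHoldTimer come from the text.

```python
from dataclasses import dataclass


@dataclass
class CriticalErrorPolicy:
    """Hypothetical per-session policy; all names and defaults are illustrative."""

    max_restart_attempts: int = 3       # operator-controllable attempt limit
    base_idle_hold_secs: float = 5.0    # assumed initial back-off value
    max_idle_hold_secs: float = 300.0   # assumed cap on the back-off
    critical_errors: int = 0            # Critical errors seen on this session
    teardowns: int = 0                  # reversions to base tear-down behaviour

    def on_critical_error(self) -> str:
        """Return the action to take for a Critical error on this session."""
        self.critical_errors += 1
        if self.critical_errors <= self.max_restart_attempts:
            # Restart the session while received paths stay in use for forwarding.
            return "restart-retaining-paths"
        # Attempt limit exceeded: revert to the base specification tear-down.
        self.teardowns += 1
        return "teardown"

    def idle_hold_secs(self) -> float:
        """Back-off before the next restart attempt; doubles per tear-down."""
        if self.teardowns == 0:
            return 0.0
        delay = self.base_idle_hold_secs * 2 ** (self.teardowns - 1)
        return min(delay, self.max_idle_hold_secs)
```

With `max_restart_attempts=2`, the first two Critical errors yield a restart that retains paths, and each later error yields a tear-down with a back-off that doubles until it reaches the assumed cap.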
It is envisaged that limiting the impact of withdrawing prefixes can be achieved by utilising a mechanism such as the BGP Graceful Shutdown procedure described in [I-D.ietf-grow-bgp-gshut].

6. IANA Considerations

This memo includes no request to IANA.

7. Security Considerations

The requirements outlined in this document provide mechanisms which limit the overall impact of the response to an error in a BGP UPDATE message. This is of benefit to the security of a BGP speaker. Without these mechanisms, where erroneous UPDATE messages relating to a single NLRI entry can be propagated to a BGP speaker, all other NLRI carried via the same session are affected by the resulting session tear-down. This may result in an AS being isolated from particular routing domains (such as the Internet) should an UPDATE message be propagated via targeted specific paths. It is envisaged that, by reducing the impact of the receiving speaker's reaction to these messages, the isolation can be constrained to specific sets of NLRI, or a specific topology. A number of the mechanisms meeting the requirements specified within the document (particularly those relating to operational monitoring) may raise further security concerns.
Such concerns will be addressed during the specification of these mechanisms.

8. Acknowledgements

The author would like to thank the following network operators for their insight, and valuable input into defining the requirements for a variety of deployments of the BGP protocol: Shane Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and Ilya Varlashkin.

In addition, many thanks are extended to Jeff Haas, Wim Hendrickx, Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert Raszuk for their expertise relating to implementations of the BGP protocol.

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

[RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, January 2007.

[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, June 2010.

9.2. Informational References

[I-D.chen-ebgp-error-handling] Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP Updates from External Neighbors", draft-chen-ebgp-error-handling-01 (work in progress), September 2011.

[I-D.ietf-grow-bgp-gshut] Francois, P., Decraene, B., Pelsser, C., Patel, K., and C. Filsfils, "Graceful BGP session shutdown", draft-ietf-grow-bgp-gshut-04 (work in progress), October 2012.

[I-D.ietf-grow-bmp] Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-07 (work in progress), October 2012.

[I-D.ietf-idr-bgp-enhanced-route-refresh] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4", draft-ietf-idr-bgp-enhanced-route-refresh-03 (work in progress), December 2012.

[I-D.ietf-idr-bgp-gr-notification] Patel, K., Fernando, R., and J. Scudder, "Notification Message support for BGP Graceful Restart", draft-ietf-idr-bgp-gr-notification-00 (work in progress), December 2011.

[I-D.ietf-idr-enhanced-gr] Patel, K., Chen, E., Fernando, R., and J. Scudder, "Accelerated Routing Convergence for BGP Graceful Restart", draft-ietf-idr-enhanced-gr-01 (work in progress), June 2012.

[I-D.ietf-idr-operational-message] Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL Message", draft-ietf-idr-operational-message-00 (work in progress), March 2012.

[I-D.ietf-idr-optional-transitive] Scudder, J., Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", draft-ietf-idr-optional-transitive-04 (work in progress), October 2011.

[I-D.zeng-idr-one-time-prefix-orf] Zeng, Q., Dong, J., Heitz, J., Patel, K., Shakir, R., and Z. Huang, "One-time Address-Prefix Based Outbound Route Filter for BGP-4", draft-zeng-idr-one-time-prefix-orf-02 (work in progress), July 2012.

[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

Author's Address

Rob Shakir
BT
C3L, BT Centre
81, Newgate Street
London EC1A 7AJ
UK

Email: rob.shakir@bt.com
URI: http://www.bt.com/