Internet Engineering Task Force                                R. Shakir
Internet-Draft                                                       C&W
Intended status: Informational                             June 28, 2011
Expires: December 30, 2011

Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
draft-ietf-grow-ops-reqs-for-bgp-error-handling-01

Abstract

BGP-4 is utilised as a key intra- and inter-Autonomous System routing protocol in modern IP networks. The failure modes as defined by the original protocol standards are based on a number of assumptions around the impact of session failure. Numerous incidents both in the global Internet routing table and within Service Provider networks have been caused by strict handling of a single invalid UPDATE message causing large-scale failures in one or more Autonomous Systems.

This memo describes the current use of BGP-4 within Service Provider networks, and outlines a set of requirements for further work to enhance the mechanisms available to a BGP-4 implementation when erroneous data is detected. Whilst this document does not provide specification of any standard, it is intended as an overview of a set of enhancements to BGP-4 to improve the protocol's robustness to suit its current deployment.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 30, 2011.
Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of BGP-4 in Service Provider Networks
     1.2.  Overview of Operator Requirements for BGP-4 Error Handling
   2.  Errors within BGP-4 UPDATE Messages
   3.  Avoiding use of NOTIFICATION
   4.  Recovering RIB Consistency
   5.  Reducing the Impact of Session Reset
   6.  Operational Toolset for Monitoring BGP
   7.  Operational Complexities Introduced by Altering RFC4271
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informational References
   Author's Address

1. Introduction

Where BGP-4 [RFC4271] is deployed in the Internet and Service Provider networks, numerous incidents have been recorded due to the manner in which [RFC4271] specifies that errors in routing information should be handled. Whilst the behaviour defined in the existing standards retains utility, the deployments of the protocol have changed within modern networks, resulting in significantly different demands for protocol robustness.

Whilst a number of Internet Drafts have been written to begin to enhance the behaviour of BGP-4 in terms of the handling of erroneous messages, this memo intends to define a set of requirements for ongoing work. These requirements are considered from the perspective of a Network Operator, and hence this draft does not intend to define the protocol mechanisms by which such error handling behaviour is to be implemented.

1.1. Role of BGP-4 in Service Provider Networks

BGP was designed as an inter-Autonomous System (AS) routing protocol and hence many of the error handling mechanisms within the protocol specification are designed to be conducive to this role. In general, this consideration as an inter-AS routing propagation mechanism results in the view that a BGP session propagates a relatively small amount of network-layer reachability information (NLRI) between two ASes. In this case, there is an expectation of session resilience for those adjacencies that are key to routing continuity (for example, it is expected that two networks peering via BGP would connect multiple times in order to safeguard against equipment or protocol failure). In addition, there is some expectation of multiple paths to a particular NLRI being available - it would be expected that a network can fall back to utilising alternate, less direct, paths where a failure of a more direct path occurs.
Traditional network architectures would deploy an Interior Gateway Protocol (IGP) to carry infrastructure and customer prefixes, with an Exterior Gateway Protocol (EGP) such as BGP being utilised to propagate these prefixes to other Autonomous Systems. However, with the growth of IP-based services, this is no longer considered best practice. In order to ensure that convergence is within acceptable time bounds, the amount of routing information carried within the IGP is significantly reduced - and tends to be only infrastructure prefixes. iBGP is then utilised to propagate both customer and external prefixes within an AS. As such, BGP has become an IGP, with traditional IGPs acting as a means by which to propagate the routing information which is required to establish a BGP session, and reach the egress node within the local routing domain. This change in role presents different requirements for the robustness of BGP as a routing protocol - with the expectation of a similar level of robustness to that of an IGP being set.

Along with this change in role, the nature of the IP routing information that is carried has changed. BGP has become a ubiquitous means by which service information can be propagated between devices. For instance, BGP is utilised to carry routing information for IP/MPLS VPN services as described in [RFC4364]. Since there is an existing deployment of the protocol between PE devices in numerous networks, it has been adapted to propagate this routing information, as its use limits the number of routing protocols required on each device. This additional information being propagated represents a large change in requirement for the error handling of the protocol - where session failure occurs, it is likely that a complete service outage is experienced for at least a subset of a network's customers, even where the erroneous packet occurred within a different sub-topology or even service (a different address family, for example).
For this reason, there is a significant demand to avoid service affecting failures that may be triggered by routing information within a single sub-topology or service.

Both within Internet and multi-service routing architectures, a number of BGP sessions propagate a large proportion of the required routing information for network operation. For Internet routing, these are typically BGP sessions which propagate the global routing table to an AS - failure of these sessions may have a large impact on network service, based on a single erroneous update. In a multi-service environment, typical deployments utilise a small number of core-facing BGP sessions, typically towards route reflector devices. Failure of these sessions may also result in a large impact to network operation. Clearly, the avoidance of conditions requiring these sessions to fail is of great utility to any network operator, and provides further motivation for the revision of the existing behaviour.

Whilst the behaviour in [RFC4271] is suited to ensuring that the impact of BGP messages containing erroneous routing information is limited in scope (by means of session reset), with the above considerations, it is clear that this mechanism is not suited to all deployments. It should, however, be noted that the change in scope affects the handling only of errors occurring after BGP session establishment. There is no current operational requirement to amend the means by which error handling in session establishment, or liveliness detection, is performed.

1.2. Overview of Operator Requirements for BGP-4 Error Handling

It is the intention of this document to define a set of criteria to which a revised error handling mechanism in BGP-4 is required to conform.
The motivation for the definition of these requirements can be summarised based on certain behaviour currently present in the protocol that is not deemed acceptable within current operational deployments, or where there is a short-fall in the tool set available to an operator. These key requirements can be summarised as follows:

   o  It is unacceptable within modern deployments of the BGP-4 protocol that a single erroneous UPDATE packet affects prefixes that it does not carry. This requirement therefore requires some modification to the means by which erroneous UPDATE packets are handled, and reacted to - with a particular focus on avoiding the use of the NOTIFICATION message.

   o  It is recognised that some error conditions that may occur within the BGP-4 protocol may not always be handled gracefully, and may result in conditions whereby an implementation cannot recover. In these (and similar) cases, it is unacceptable for an operator that a reset of the BGP-4 session results in interruption to forwarding packets (by means of withdrawing prefixes installed by BGP-4 into a device's RIB, and subsequently FIB). To this end, there is a requirement to define a session reset mechanism which provides session re-initialisation in a non-destructive manner.

   o  Further to the requirements to provide a more robust protocol, the current visibility into error conditions within the BGP-4 protocol is extremely limited - where further modifications to this behaviour are to be made, complexity is likely to be added. Thus, to ensure that BGP-4 is manageable, there are requirements for mechanisms by which the protocol can be examined and monitored.

This document describes each of these requirements in further depth, along with an overview of means by which they are expected to be achieved. In addition, the mechanism by which the enhancements meeting these requirements are to interact is discussed.
2. Errors within BGP-4 UPDATE Messages

Both through analysis of incidents occurring with the Internet DFZ, and multi-service environments utilising BGP-4 to signal service or routing information, a number of different classes of errors within BGP-4 UPDATE messages have been observed. In order to consider the applicability of enhanced error handling mechanisms, it is possible to divide these errors into a number of sub-classes, particularly focusing around the location of the error within the UPDATE message.

Where an UPDATE message is considered invalid by a BGP speaker due to an error within a path attribute that is not the NLRI (where the definition of NLRI includes reachability information encoded in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in [RFC4760]) it is a requirement of any enhanced error handling mechanism to handle the error in a manner focused on the NLRI contained within the message. Since in this case, the message received from the remote peer is syntactically valid, it is considered that such an UPDATE is indicative of erroneous data within a path attribute - as such, it cannot be assumed that the BGP speaker from whom the message was received is directly responsible for the erroneous information - and hence affecting all NLRI received via a specific session is disproportionate.

Two further error cases exist within UPDATE messages, both of which are related to the mechanisms that are applicable to messages received where some difficulty exists in parsing the entire BGP message. The two cases concern those cases where a valid NLRI attribute can be extracted, and those where such an attribute is not able to be parsed. In these cases, errors in the packing of attributes within a BGP message may have occurred. Such errors are likely indicative of an error specifically caused by the remote BGP speaker. It is, however, desirable to an operator that such errors are handled without affecting all NLRI across a BGP session.

As such, there is a key requirement to maximise the number of cases in which it is possible to extract NLRI from a BGP UPDATE message. To this end, it is required that where possible the MP_REACH and MP_UNREACH attributes are utilised for encoding all NLRI (including IPv4 Unicast), and that this attribute is included as the first attribute of a BGP UPDATE message (as originally recommended in [I-D.chen-ebgp-error-handling]). Such a change to the order of inclusion of this attribute maximises the number of cases in which NLRI can be extracted from an UPDATE. Where this is possible, it is again required that the error handling mechanisms utilised should be directly applied to the NLRI included in the UPDATE. For all cases whereby NLRI can be obtained from an UPDATE message, it is expected that the requirements outlined in Section 3 should be considered by any enhancement to the BGP-4 protocol.

In the case that it is not possible to completely parse the NLRI attribute from the UPDATE message received from a peer, it is extremely likely that this is indicative of a serious error with either the process of attribute packing, or buffer usage on the remote BGP speaker. In this case, clearly, it is not possible to apply any error handling mechanism that is limited to a specific set of NLRI, since an implementation has no knowledge of the NLRI included within the UPDATE message. In addition, such errors are considered to be relatively fundamental to the operation of a BGP implementation, and hence may indicate a case whereby significant system errors have occurred. The current BGP-4 standard results in a BGP speaker restarting a session with the remote BGP speaker. However where such an error does occur, it is required that a graceful mechanism is utilised to provide a lower impact to network operation. The requirements for enhancements of this nature to BGP-4 are outlined in Section 5, with the requirements outlined therein focused on providing a means by which system integrity can be restored whilst allowing for continued network operation.

3. Avoiding use of NOTIFICATION

The error handling behaviour defined in RFC4271 is problematic due to the limited options that are available to an implementation. When an erroneous BGP message is received, at the current time, the implementation must either ignore the error, or send a NOTIFICATION message, after which it is mandatory to terminate the BGP session. It is apparent that this requirement is at odds with that of protocol robustness.
There is significant complexity to this requirement. The mechanism defined in [I-D.chen-ebgp-error-handling] describes a means by which no NOTIFICATION message is generated for all cases whereby NLRI can be extracted from an UPDATE. The NLRI contained within the erroneous UPDATE message is considered as though the remote BGP speaker has provided an UPDATE marking it as withdrawn. This results in a limit in the propagation of the invalid routing information, whilst also ensuring that no traffic is forwarded via a previously-known path that may no longer be valid. This mechanism is referred to as "treat-as-withdraw".

Whilst this behaviour results in avoiding a NOTIFICATION message, keeping other routing information advertised by the remote BGP speaker within the RIB, it may result in unreachability for a sub-set of the NLRI advertised by the remote speaker. Two cases should be considered - that where the entry for a prefix in the Adj-RIB-In of the neighbour propagating an erroneous packet is utilised, and that where the prefix installed in the device's RIB is learnt from another BGP speaker. In the former case, should the identified NLRI not be treated as withdrawn, the original NLRI is utilised within the global RIB. However, this information is potentially now invalid (i.e. it no longer provides a valid forwarding path), whilst an alternate (valid) path may exist in another Adj-RIB-In. By continuing to utilise the NLRI for which the UPDATE was considered invalid, traffic may be forwarded via an invalid path, resulting in routing loops, or black-holing. In the second case, no impact to the forwarding of traffic, or global RIB, is incurred, yet where treat-as-withdraw is implemented, possibly stale routing information is purged from the Adj-RIB-In of the neighbour propagating errors.

Whilst mechanisms such as "treat-as-withdraw" are currently documented, the proposals are limited in their scope - particularly in terms of restrictions to implementation only on eBGP sessions. This limitation is made based on the view that the BGP RIB must be consistent across an autonomous system. By implementing treat-as-withdraw for an iBGP session, one or more routers within the Autonomous System may not have reachability to a prefix, and hence blackholing of traffic, or routing loops, may occur. It should, however, be considered if this view is valid, in light of the manner in which BGP is utilised within operator networks. Inconsistency in a RIB based on a single UPDATE being treated as withdrawn may cause an inconsistency in a single sub-topology (e.g. Layer 3 VPN service), or a service not operating completely (in the case of an UPDATE carrying service membership information). Where a NOTIFICATION and teardown is utilised, this is destructive to all sub-topologies in all address family identifiers (AFIs) carried by the session in question. Even where mechanisms such as multi-session BGP are utilised, a whole AFI is affected by such a NOTIFICATION message. In terms of routing operation, it is therefore far less costly to endure a situation where a limited sub-set of routing information within an AS is invalid, than to consider all routing information as invalid based on a single trigger.

It is considered that, if extended to cover iBGP, the mechanisms described in [I-D.chen-ebgp-error-handling] and [I-D.ietf-idr-optional-transitive] provide a means to avoid the transmission of a NOTIFICATION to a remote BGP speaker based on a single erroneous message, where at all possible, and hence meet this requirement. The failure cases whereby NLRI cannot be extracted from the UPDATE message represent a case whereby the receiving system cannot handle the error gracefully based on this mechanism.

4. Recovering RIB Consistency

The recommendations described in Section 3 may result in the RIB for a topology within an AS being inconsistent across the AS' internal routers. Alternatively, where such mechanisms are deployed at an AS boundary, interconnects between two ASes may be inconsistent with each other. There are therefore risks of traffic blackholing, due to missing routing information, or forwarding loops. Whilst this is deemed an acceptable compromise in the short term, clearly, it is suboptimal. Therefore, a requirement exists to provide mechanisms by which a BGP speaker is able to recover the consistency of the Adj-RIB-In for a particular neighbour.

It is envisaged that during such routing inconsistencies, the local BGP speaker is aware that some routing information was not able to be processed - due to the fact that an UPDATE message was not parsed correctly. If the 'treat-as-withdraw' mechanism described within Section 3 is utilised, it is also possible for the local BGP speaker to have determined the set of NLRI for which an erroneous UPDATE message was received. In this scenario, by utilising targeted mechanisms to re-request the specific NLRI that was unreachable, this routing information can be re-transmitted from the remote BGP speaker. Such a request requires extension to the existing BGP-4 protocol, in terms of specific UPDATE generation filters with a transient lifetime. It is envisaged that the work within [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted elements of the Adj-RIB-In for a BGP neighbour to be recovered.

In addition to such cases where specific routing information is known to be erroneous, the more general case where either a large amount of the Adj-RIB-In is contained in UPDATE messages subject to treat-as-withdraw, or the specific prefixes are unknown to the local BGP speaker must be considered. In this case, there is a requirement for a BGP speaker to re-request the entire RIB advertised by a remote neighbour. In this case, where such re-advertisement is required, it is envisaged that a ROUTE-REFRESH as per the description in [RFC2918] is utilised. [I-D.keyur-bgp-enhanced-route-refresh] provides a means by which the ROUTE-REFRESH mechanism can be extended in order to meet this requirement.

It is of particular note for both means of recovering RIB consistency described that these are effective only when considering transient errors within an implementation - for instance, should an RFC interpretation error within an implementation be present, regardless of the number of times a specific UPDATE is generated, it is likely that this error condition will persist. For this reason, there is a requirement to consider the means by which such consistency recovery mechanisms are utilised.
It is not advisable that a transient filter and advertisement mechanism is triggered by all error handling events due to the load this is likely to place on the neighbour receiving such a request. Where this BGP speaker is a relatively centralised device - a route reflector (as described by [RFC4456]) for example - the act of generation of UPDATE messages with such frequency is likely to cause disproportionate load. It is therefore an operational requirement of such mechanisms that means of request dampening be required by any such extension.

5. Reducing the Impact of Session Reset

Even where protocol enhancements allow errors in the BGP-4 protocol to cease to trigger NOTIFICATION messages, and hence reset a BGP session, it is clear that some error conditions may not be exited. In particular, errors due to existing state, or memory structures, associated with a specific BGP session will not be handled. It is therefore important to consider how these error conditions are currently handled by the protocol. It should be noted that the following discussion and analysis considers only those NOTIFICATION messages generated in response to errors in UPDATE messages (as defined by Section 6.3 in [RFC4271]).

The existing NOTIFICATION behaviour triggers a reset of all elements of the BGP-4 session, as described in Section 6 of [RFC4271]. It is expected that session teardown requires an implementation to re-initialise all structures and state required for session maintenance. Clearly, there is some utility to this requirement, as error conditions in BGP are, in general, exited from. However, this definition is responsible for the forwarding outages within networks utilising BGP for route propagation when each error is experienced.
The requirement described in Section 3 is intended to reduce the cases whereby a NOTIFICATION is required. However, any mechanism implemented as a response to this requirement by definition cannot provide a session reset to the extent of that achieved by the current behaviour. In order to address this, there is a requirement for a means by which a BGP speaker can signal that an unhandled error condition in an UPDATE message occurred - requiring a session reset - yet also continue to utilise the paths advertised by the neighbour that are currently in use within the RIB. In this case, the Adj-RIB-In received from the neighbour is not considered invalid, despite a NOTIFICATION, and session reset, being required.

This set of requirements is akin to those answered by the BGP Graceful Restart mechanism described in [RFC4724]. Since the operational requirement in this case is to provide a means to achieve a complete session restart without disrupting the forwarding path of those prefixes in use within a BGP speaker's RIB, it is expected that utilising a procedure similar to the Graceful Restart mechanism meets the error handling requirement.

Responding to an error condition (repeated or otherwise) with a message indicating that an error that cannot be handled has occurred, and forcing a session reset whilst retaining forwarding information within the RIB, allows forwarding to all prefixes within a system's RIB to continue whilst the session restarts. By placing a time bound on the restart lifetime, should an error condition not be transient - for example, should an error have occurred with the BGP process, rather than with state specific to the BGP session - the remote BGP speaker is still detected as an invalid device for forwarding.
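The retention behaviour described above can be illustrated with a minimal sketch. It assumes a simplified in-memory Adj-RIB-In for a single neighbour; the class names, method names and the restart_time parameter are hypothetical and are not taken from [RFC4724] or from any implementation. Routes are kept for forwarding when the session is reset, marked stale, and purged only if they have not been re-advertised once the time bound expires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: str
    next_hop: str
    stale: bool = False


@dataclass
class AdjRibIn:
    """Routes learnt from one neighbour (hypothetical, simplified model)."""
    routes: Dict[str, Route] = field(default_factory=dict)
    stale_deadline: Optional[float] = None

    def session_reset_on_error(self, now: float, restart_time: float) -> None:
        # Retain every route for forwarding, but mark it stale and
        # start the time bound on the restart.
        for route in self.routes.values():
            route.stale = True
        self.stale_deadline = now + restart_time

    def readvertised(self, route: Route) -> None:
        # A route received (again) from the neighbour replaces any stale copy.
        route.stale = False
        self.routes[route.prefix] = route

    def tick(self, now: float) -> None:
        # Once the time bound expires, routes that were never refreshed
        # are removed, so a persistent fault is still detected.
        if self.stale_deadline is not None and now >= self.stale_deadline:
            self.routes = {p: r for p, r in self.routes.items() if not r.stale}
            self.stale_deadline = None

    def usable_prefixes(self) -> List[str]:
        return sorted(self.routes)


rib = AdjRibIn()
rib.readvertised(Route("192.0.2.0/24", "198.51.100.1"))
rib.readvertised(Route("203.0.113.0/24", "198.51.100.1"))

rib.session_reset_on_error(now=0.0, restart_time=120.0)
rib.tick(now=60.0)                       # within the bound: nothing is purged
rib.readvertised(Route("192.0.2.0/24", "198.51.100.1"))
rib.tick(now=120.0)                      # bound reached: unrefreshed route purged
print(rib.usable_prefixes())
```

In this sketch the time bound is the only safeguard against a neighbour that never recovers, which mirrors the role the restart lifetime plays in the text.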
It should, however, be noted that a protocol enhancement meeting this requirement is not able to solve all error conditions. A complete restart of the BGP and TCP session between two BGP speakers does, however, implement an identical recovery mechanism to that which is achieved by the existing behaviour. Where an error condition such as memory or configuration corruption has occurred in a BGP implementation, it is expected that a mechanism meeting this requirement continues to detect this, by means of a bound on the time for session restart to occur. Whilst there may be some concern that, due to this requirement, packets continue to be forwarded for a longer period through a device which may be in a failure mode of this nature, the architecture of modern IP routers should be considered. A divided forwarding and control plane is common in many devices, as is process separation for software-based devices - corruption of a specific protocol daemon does not necessarily imply forwarding is affected. Indeed, where forwarding behaviour of a device is affected, it is envisaged that a failure detection mechanism (be it Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets) will detect such a failure in almost all cases, with the symptomatic behaviour of such a failure being an invalid UPDATE message in very few other cases.

6. Operational Toolset for Monitoring BGP

A significant complexity that is introduced through the requirements defined in this document is that of monitoring BGP session status for an operator. Although the existing error handling behaviour causes a disproportionate failure, session failure is extremely visible to most operational personnel within a Network Operator, due both to existing definitions of SNMP trap mechanisms for BGP and to the forwarding impact typically caused by such a failure. By introducing mechanisms by which errors of this nature are not as visible, this is no longer the case. There is a requirement that where subsets of the RIB on a device are no longer reachable from a BGP speaker, or indeed an AS, some mechanism to determine the cause is available to an operator. Whilst, to some extent, this can be solved by mandating a sub-requirement of each of the aforementioned requirements that a BGP speaker must log where such errors occur, and are hence handled, this does not solve all cases.

In order to clarify this requirement, the example of the transmission of an erroneous Optional Transitive attribute can be considered. Since, by definition, there is no requirement for all BGP speakers to parse such an attribute, a receiving router may treat NLRI as withdrawn based on an erroneous attribute not examined by its neighbour. In this case, the upstream device or network, propagating the UPDATE, has no visibility of this error. Operationally, however, it is of interest to the upstream router operator that such invalid information was propagated.

The requirement for logging of error conditions in transmitted BGP messages, which are visible to only the receiver, cannot be achieved by any existing BGP message, or capability. It is envisaged that each erroneous event should be transmitted to the remote peer - including the information as to the set of NLRI that were considered invalid. Whilst with some mechanisms this is achieved by default (for example, One-Time Prefix ORF [I-D.zeng-one-time-prefix-orf] (Outbound Route Filtering) will transmit the set of prefixes that are required), the operator requirement is to know which prefixes may have been unreachable in all cases. It is envisaged that an extension to meet this requirement will allow for such information to be transmitted between peers, and hence logged. Such a mechanism may provide further utility as either a diagnostic, or logging toolset.

As such, it is possible to divide the messages that are required in order to provide further visibility into BGP for an operator. Such a division can be made both by the required means of message transmission and by the criticality of each request.

o Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the requirements detailed within this document result in this message being suppressed. Despite this change, the error condition's occurrence is still of interest to an operator, since some form of invalid data has been received on a session. In order to provide both monitoring and troubleshooting capabilities, it is therefore considered that an implementation must generate a message both locally, and transmitted to the remote peer, based on such a condition. Where such a message is transmitted to the remote peer, it is considered that the BGP session via which the erroneous UPDATE message was received should serve as the transport to the remote peer. The information transmitted in such a message should be minimised to allow identification of the paths which were considered erroneous (i.e. restricting the information to that which is directly relevant to a network operator in the case of an error condition occurring). Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question. However, it must be considered that in the case of an erroneous UPDATE being received, and the 'treat-as-withdraw' mechanism being utilised, where the erroneous path is removed from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages withdrawing the prefix from all further BGP speakers to which the prefix is advertised. The load generated by such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which it was received. It is envisaged that light-weight BGP message-based signalling mechanisms such as [I-D.ietf-idr-advisory] provide a suitable means to satisfy this requirement.

o Additional Diagnostic Capabilities for BGP - In a number of cases, there is an operational requirement to further debug erroneous BGP UPDATE messages, along with the particulars of the state of a BGP speaker.
For instance, where an invalid BGP UPDATE message is transmitted between two BGP speakers, the exact format of the UPDATE message is of interest to an operator, as this information provides a clear indication of a message considered to be erroneous by the BGP speaker to which it was transmitted. In this case, it is considered of great utility that the entire UPDATE message is transmitted back to the advertising speaker, in order to allow for further debugging to occur. Whilst such information is particularly useful to an operator, it clearly provides information that is not key to protocol operation - for this reason, it is expected that the additional complexity, and load, to which a BGP speaker is subjected is in some cases not acceptable. For this reason, it is required that where mechanisms are developed to support this requirement, messages of this nature can be supported both within an existing BGP session, and via a dedicated separate session, be it BGP carrying messages such as DIAGNOSTIC [I-D.raszuk-bgp-diagnostic-message] or ADVISORY [I-D.ietf-idr-advisory] or a dedicated monitoring protocol akin to BMP described in [I-D.ietf-grow-bmp].

Whilst the operational requirement for such monitoring tools to allow for visibility into BGP is clearly agreed upon, the means by which such messages are transmitted between two BGP speakers is likely to be dependent upon the positions of the speakers in question (for instance, the requirements for such a protocol may differ where a session is between two ASBRs under separate administration). The introduction of additional message types to the BGP protocol clearly introduces further complexity - and leaves room for further implementation and standardisation errors that may compromise the robustness of the BGP protocol. In addition, the queuing and scheduling of these BGP messages must be interleaved with the transmission of key protocol messages - such as KEEPALIVE and UPDATE packets. It is therefore a concern that should a large number of messages specifically for operational visibility be transmitted, this will delay the transmission of UPDATE packets, and hence adversely affect the end-to-end convergence time for NLRI carried within BGP.

The operational reasons for which in-band messages are advantageous should also be considered. In particular, it should be noted that where such information is to be transmitted between administrative boundaries, a BGP session represents an existing channel between the two ASes. This channel is considered to be secure insofar as the routing information, and requests, sent via the session are considered to come from a trusted source. Since error information both relates to a particular attachment, and is key to ensuring that such a session is operating as expected, it is considered of great operational benefit that this information is transmitted over this channel. In addition, the overall system scalability is improved by such in-band transmission.
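The queuing concern raised above, that messages for operational visibility must not delay KEEPALIVE and UPDATE transmission, can be illustrated with a minimal sketch of one possible per-session transmit queue. This is an assumption made for illustration and is not drawn from any of the referenced drafts: the message class named "OPERATIONAL", the priority ordering and the max_operational bound are all hypothetical.

```python
import heapq
import itertools

# Lower value is transmitted first. The three classes and their ordering
# are assumptions for this sketch, not protocol definitions.
PRIORITY = {"KEEPALIVE": 0, "UPDATE": 1, "OPERATIONAL": 2}


class OutboundQueue:
    """Per-session transmit queue that never lets visibility messages
    overtake, or build a backlog in front of, core protocol messages."""

    def __init__(self, max_operational: int = 100):
        self._heap = []
        self._seq = itertools.count()      # preserves FIFO order within a class
        self._max_operational = max_operational
        self._operational = 0              # visibility messages currently queued

    def enqueue(self, kind: str, payload: str) -> bool:
        """Queue a message; returns False if a visibility message was dropped."""
        if kind == "OPERATIONAL":
            if self._operational >= self._max_operational:
                return False               # drop rather than queue without bound
            self._operational += 1
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))
        return True

    def dequeue(self):
        """Return the next (kind, payload) pair to transmit."""
        _, _, kind, payload = heapq.heappop(self._heap)
        if kind == "OPERATIONAL":
            self._operational -= 1
        return kind, payload

    def __len__(self) -> int:
        return len(self._heap)
```

Under this design choice a burst of error reports costs at most a bounded amount of queue memory, and convergence of NLRI carried in UPDATE messages is unaffected by the volume of reports.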
It is expected that erroneous information resulting in the 'treat-as-withdraw' mechanism being utilised is relatively infrequently transmitted between two peers (when compared to the frequency of UPDATE message transmission). The impact of including an additional BGP message type for such operational visibility is relatively small from a resource utilisation perspective - additional processing overhead is only experienced when such a message is received. Where a separate session is maintained, particular network elements within a service provider topology may require hundreds, or thousands, of additional sessions for the transmission of this information. Such a resource consumption overhead is likely to be unacceptable to some network operators.

For the reasons explained above, it is expected that mechanisms specified to meet the requirements for event visibility consider the relative impacts of additional monitoring sessions, or message inclusion in-band to BGP, in order not to compromise the security, scalability and robustness of the BGP-4 protocol.

7. Operational Complexities Introduced by Altering RFC4271

The existing NOTIFICATION and subsequent teardown of a BGP session upon encountering an error has the advantage that a consistent approach to error handling is required of all implementations of the BGP-4 protocol. This is of operational advantage, as it provides a clear expectation of the behaviour of the protocol. The requirements defined herein add further complexity to the error-handling within BGP, and hence are liable to compromise the existing deterministic protocol behaviour. It is therefore deemed that there is a further requirement to provide a clear method by which an erroneous UPDATE should be reacted to, in order that all protocol implementations provide a consistent means by which recovery is achieved.

A further complexity is introduced due to the disparate nature of the work items altering the BGP error handling behaviour - since all items are likely to be implemented as a BGP capability [RFC5492], situations are likely to occur between devices (especially those with different BGP implementations), where some of the mechanisms referenced are unsupported. This adds further barriers to a standard definition of the BGP-4 error handling behaviour.

In general, the approach considered ideal upon encountering an erroneous UPDATE message can be divided into two cases - those where the NLRI can be determined from the message, and those where it cannot be.

The latter case is the simpler of the two. In this case, there is a requirement for the implementation to reset the BGP session, utilising the reduced-impact approach described in Section 5. In the case where the remote BGP speaker is in a transient error condition related to specific peer data structures, or state, a single instance of this behaviour is likely to exit the error condition.
In the case of implementation errors, it is possible that the BGP session in question may enter a continuous loop of being reset, with a partial RIB being held by one or more of the BGP speakers due to a non-deterministic order of UPDATE propagation. It is therefore a requirement that within this reduced-impact procedure any subsequent UPDATE messages that would result in further session resets are ignored. Whilst this results in a condition where an undetermined amount of the RIB is inconsistent, partial reachability is maintained. In this case, the operational toolsets discussed in Section 6 are likely to provide mechanisms by which this condition can be brought to the attention of the relevant operators. This requirement to accept a partial RIB, which results in potentially invalid traffic forwarding, is a direct result of the deployments of BGP-4, as described in Section 1.1.

The case where NLRI can be determined from an erroneous UPDATE provides further complexities. In this case, a BGP speaker is aware of the sub-set of the RIB which has been identified as being contained within invalid UPDATE messages. This allows a local BGP speaker to re-request single prefixes, utilising a mechanism such as "one-time prefix ORF". However, a similar result is achieved by re-requesting the entire RIB - albeit with greater resource requirements. It is therefore expected that the process of recovery utilises a staged set of mechanisms to attempt to restore consistency of the RIB:

1. Where available, a mechanism capable of requesting only the NLRI determined to have been contained within an invalid UPDATE should be utilised. However, since it is possible that such an error condition can be transient in nature, it is likely that more than one request is to be transmitted (assuming the first does not return a valid UPDATE message). In order to allow a deterministic process, there is a requirement for a limit on the number of specific requests transmitted to be defined.

2. Where a specific refresh mechanism is not available, a peer should re-request the entire RIB. Again, there is a requirement to limit the number of complete RIB requests that should be sent by an implementation, in order to provide a bound both on the expected level of load a device may experience, and on the time for which the RIB may be inconsistent.

3. Finally, a session reset should be performed, as per the reduced-impact NOTIFICATION requirement defined in Section 5. At this point, a similar challenge to that discussed above exists, should the error condition persist. In this case, as defined above, there is a requirement to ignore those UPDATE messages that continue to be erroneous.

It is envisaged that where limits are required, these will be defined on a per-memo basis, or within a further revision of the requirements described herein.

Whilst the approach described above provides a standard means by which error recovery may be handled on a per-UPDATE basis, further complexities are raised where multiple errors occur. Clearly, following this procedure causes control-plane load on both the BGP speakers - for this reason, consideration is required of how repeated use of the mechanisms discussed in this document is handled. It is notable that errors may not occur with UPDATE messages relating to only a single NLRI; independent errors in multiple NLRIs may be experienced. For this reason, it is required that an implementation rate limits the number of error handling events sourced towards a particular neighbour. It is expected that such rate limiting, or event suppression, is achieved on a per-session basis, where state information is already held, rather than on a per-prefix basis, as it is envisaged that such behaviour presents significant scaling problems, and introduces further state requirements for an implementation of the protocol.
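The staged recovery procedure enumerated above can be sketched as a small per-session state machine. The sketch is illustrative only: the two limits are arbitrary values (this document leaves them to be defined elsewhere), the action names are hypothetical, and the counters are held per session rather than per UPDATE as a simplification consistent with the per-session state discussed above.

```python
from dataclasses import dataclass

# Illustrative limits only; the actual values are left to later memos.
MAX_TARGETED_REQUESTS = 3
MAX_FULL_REFRESHES = 2


@dataclass
class SessionRecovery:
    """Tracks which recovery stage a session has reached (hypothetical)."""
    targeted_supported: bool     # e.g. a one-time prefix ORF style mechanism
    targeted_sent: int = 0
    full_sent: int = 0
    reset_done: bool = False

    def next_action(self, nlri_known: bool) -> str:
        """Return the action to take for one further erroneous UPDATE."""
        # NLRI cannot be determined: only the reduced-impact reset applies.
        if not nlri_known:
            return self._reset_or_ignore()
        # Stage 1: re-request only the affected NLRI, up to a limit.
        if self.targeted_supported and self.targeted_sent < MAX_TARGETED_REQUESTS:
            self.targeted_sent += 1
            return "request-prefixes"
        # Stage 2: re-request the entire RIB, up to a limit.
        if self.full_sent < MAX_FULL_REFRESHES:
            self.full_sent += 1
            return "refresh-full-rib"
        # Stage 3: reduced-impact session reset.
        return self._reset_or_ignore()

    def _reset_or_ignore(self) -> str:
        # After one reset, further erroneous UPDATEs are ignored so that
        # the session does not enter a continuous reset loop.
        if self.reset_done:
            return "ignore-update"
        self.reset_done = True
        return "reduced-impact-reset"
```

With the limits above, a session supporting targeted requests issues three prefix requests and two full refreshes before resetting once; any erroneous UPDATE after that reset is ignored.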
It is recommended that where a flag indicative of erroneous behaviour is implemented, the state of such a value is maintained independently of session establishment.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

The requirements outlined in this document provide mechanisms by which erroneous BGP messages may be responded to with limited impact to forwarding operation. This is of benefit to the security of a BGP speaker in general. Where UPDATE messages may have been propagated by a single malicious Autonomous System or router within a network (or the Internet default free zone - DFZ), which are then propagated to all devices within the same routing domain, all other NLRI available over the same session become unreachable. This mechanism may provide means by which an Autonomous System can be isolated from required routing domains (such as the Internet), should the relevant UPDATE messages be propagated via specific paths. By reducing the impact of such failures, it is envisaged that this possibility may be constrained to a specific set of NLRI, or a specific topology.

Some mechanisms meeting the requirements specified in this document, particularly those within Section 6, may provide further security concerns; however, it is envisaged that these are addressed in per-enhancement memos.

10. Acknowledgements

The author would like to thank Shane Amante, Bruno Decraene, Rob Evans, David Freedman, Tom Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and Ilya Varlashkin for their review and valuable feedback.

11. References

11.1. Normative References

[I-D.chen-ebgp-error-handling] Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP Updates from External Neighbors", draft-chen-ebgp-error-handling-00 (work in progress), September 2010.

[I-D.ietf-grow-bmp] Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-05 (work in progress), December 2010.

[I-D.ietf-idr-advisory] Scholl, T., Scudder, J., Steenbergen, R., and D. Freedman, "BGP Advisory Message", draft-ietf-idr-advisory-00 (work in progress), October 2009.

[I-D.ietf-idr-optional-transitive] Scudder, J. and E. Chen, "Error Handling for Optional Transitive BGP Attributes", draft-ietf-idr-optional-transitive-03 (work in progress), September 2010.

[I-D.keyur-bgp-enhanced-route-refresh] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4", draft-keyur-bgp-enhanced-route-refresh-02 (work in progress), March 2011.

[I-D.raszuk-bgp-diagnostic-message] Raszuk, R., Chen, E., and B. Decraene, "BGP Diagnostic Message", draft-raszuk-bgp-diagnostic-message-02 (work in progress), March 2011.

[I-D.zeng-one-time-prefix-orf] Zeng, Q. and J. Dong, "One-time Address-Prefix Based Outbound Route Filter for BGP-4", draft-zeng-one-time-prefix-orf-01 (work in progress), October 2010.

[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

[RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, February 2009.

11.2. Informational References

[RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.
Author's Address

Rob Shakir
Cable&Wireless Worldwide
London
UK

Email: rjs@cw.net
URI: http://www.cw.com/