--- 1/draft-ietf-grow-ops-reqs-for-bgp-error-handling-04.txt 2012-07-30 21:14:11.109425588 +0200 +++ 2/draft-ietf-grow-ops-reqs-for-bgp-error-handling-05.txt 2012-07-30 21:14:11.161425752 +0200 @@ -1,18 +1,18 @@ Internet Engineering Task Force R. Shakir Internet-Draft BT -Intended status: Informational June 6, 2012 -Expires: December 8, 2012 +Intended status: Informational July 30, 2012 +Expires: January 31, 2013 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 - draft-ietf-grow-ops-reqs-for-bgp-error-handling-04 + draft-ietf-grow-ops-reqs-for-bgp-error-handling-05 Abstract BGP-4 is utilised as a key intra- and inter-Autonomous System routing protocol in modern IP networks. The failure modes as defined by the original protocol standards are based on a number of assumptions around the impact of session failure. Numerous incidents both in the global Internet routing table and within Service Provider networks have been caused by strict handling of a single invalid UPDATE message causing large-scale failures in one or more Autonomous @@ -34,21 +34,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on December 8, 2012. + This Internet-Draft will expire on January 31, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -56,38 +56,38 @@ to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Role of BGP-4 in Service Provider Networks . . . . . . . . 3 1.2. Overview of Operator Requirements for BGP-4 Error - Handling . . . . . . . . . . . . . . . . . . . . . . . . . 4 - 2. Errors within BGP-4 UPDATE Messages . . . . . . . . . . . . . 6 - 2.1. Classifying BGP Errors and Expected Error Handling . . . . 7 - 2.1.1. Critical BGP Errors . . . . . . . . . . . . . . . . . 8 - 2.1.2. Semantic BGP Errors . . . . . . . . . . . . . . . . . 8 - 3. Avoiding use of NOTIFICATION . . . . . . . . . . . . . . . . . 10 - 4. Recovering RIB Consistency . . . . . . . . . . . . . . . . . . 12 - 5. Reducing the Impact of Session Reset . . . . . . . . . . . . . 14 - 6. Operational Toolset for Monitoring BGP . . . . . . . . . . . . 16 - 7. Operational Complexities Introduced by Altering RFC4271 . . . 20 - 7.1. Reducing the Network Impact of Session Teardown . . . . . 22 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 - 9. Security Considerations . . . . . . . . . . . . . . . . . . . 25 - 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 26 - 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 - 11.1. Normative References . . . . . . . . . . . . . . . . . . . 27 - 11.2. Informational References . . . . . . . . . . . . . . . . . 27 - Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 29 + Handling . . . . . . . . . . . . . . . . . . . . . . . . . 
5 + 2. Errors within BGP-4 UPDATE Messages . . . . . . . . . . . . . 7 + 2.1. Classifying BGP Errors and Expected Error Handling . . . . 8 + 2.1.1. Critical BGP Errors . . . . . . . . . . . . . . . . . 9 + 2.1.2. Semantic BGP Errors . . . . . . . . . . . . . . . . . 9 + 3. Avoiding use of NOTIFICATION . . . . . . . . . . . . . . . . . 11 + 4. Recovering RIB Consistency . . . . . . . . . . . . . . . . . . 13 + 5. Reducing the Impact of Session Reset . . . . . . . . . . . . . 15 + 6. Operational Toolset for Monitoring BGP . . . . . . . . . . . . 17 + 7. Operational Complexities Introduced by Altering RFC4271 . . . 21 + 7.1. Reducing the Network Impact of Session Teardown . . . . . 23 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 + 9. Security Considerations . . . . . . . . . . . . . . . . . . . 26 + 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 + 11.1. Normative References . . . . . . . . . . . . . . . . . . . 28 + 11.2. Informational References . . . . . . . . . . . . . . . . . 28 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 30 1. Introduction Where BGP-4 [RFC4271] is deployed in the Internet and Service Provider networks, numerous incidents have been recorded due to the manner in which [RFC4271] specifies errors in routing information should be handled. Whilst the behaviour defined in the existing standards retains utility, the deployments of the protocol have changed within modern networks, resulting in significantly different demands for protocol robustness. Whilst a number of Internet Drafts @@ -109,54 +109,80 @@ ASes. In this case, it is the expectation of session resilience for those adjacencies that are key to routing continuity (for example, it is expected that two networks peering via BGP would connect multiple times in order to safeguard equipment or protocol failure). 
In addition, there is some expectation of multiple paths to a particular NLRI being available - it would be expected that a network can fall back to utilising alternate, less direct, paths where a failure of a more direct path occurs. Traditional network architectures would deploy an Interior Gateway - Protocol (IGP) to carry infrastructure and customer prefixes, with an + Protocol (IGP) to carry infrastructure and customer routes, with an Exterior Gateway Protocol (EGP) such as BGP being utilised to - propagate these prefixes to other Autonomous Systems. However, with + propagate these routes to other Autonomous Systems. However, with the growth of IP-based services, this is no longer considered best practice. In order to ensure that convergence is within acceptable time bounds, the amount of routing information carried within the IGP is significantly reduced - and tends to be only infrastructure - prefixes. iBGP is then utilised to propagate both customer, and - external prefixes within an AS. As such, BGP has become an IGP, with + routes. iBGP is then utilised to propagate both customer, and + external routes within an AS. As such, BGP has become an IGP, with traditional IGPs acting as a means by which to propagate the routing information which is required to establish a BGP session, and reach the egress node within the local routing domain. This change in role presents different requirements for the robustness of BGP as a routing protocol - with the expectation of similar level of robustness to that of an IGP being set. Along with this change in role, the nature of the IP routing information that is carried has changed. BGP has become a ubiquitous means by which service information can be propagated between devices. For instance, BGP is utilised to carry routing information for IP/ MPLS VPN services as described in [RFC4364]. 
Since there is an existing deployment of the protocol between PE devices in numerous networks, it has been adapted to propagate this routing information, - as its use limits number of routing protocols required on each + as its use limits the number of routing protocols required on each device. This additional information being propagated represents a large change in requirement for the error handling of the protocol - where session failure occurs, it is likely a complete service outage for at least a subset of a network's customers is experienced where an erroneous packet may have occurred within a different sub-topology or even service (a different address family for example). For this reason, there is a significant demand to avoid service affecting failures that may be triggered by routing information within a single sub-topology or service. + The combination of the increased number of deployments of BGP-4 as an + intra-AS routing protocol, its use for the propagation of additional + types of routing and service information, and the growth of IP + services has resulted in a substantial increase in the volume of + information carried within BGP-4. In numerous networks, RIB sizes of + the order of millions of entries exist within individual BGP + speakers, with particularly high-scale points exhibited at BGP + speakers performing aggregation or functionality designed to improve + utilisation of network resources (e.g., route reflector hierarchies). + Clearly an increase in the amount of routing information carried in BGP + results in greater impact to services during failures, which is only + amplified by a corresponding increase in recovery times. 
Following a + failure, there is a substantial recovery time to learn, compute and + distribute new paths, which results in a greater observed impact to + services affected, and hence adds further weight to the requirement + to avoid failures altogether or, at least, mitigate their impact to + the narrowest scope possible, (e.g., a specific NLRI). Whilst an + argument could be made that convergence time of BGP-4 could + potentially be reduced through deployment of additional computational + resource, it is notable that this solution is not necessarily + straightforward from an implementation or deployment perspective, + (e.g., scaling computation resources within a single address-family + is difficult). Thus, significant challenges continue to exist for + operators when scaling BGP-4 deployments, and hence mechanisms which + improve the scalability of BGP-4 are very important. + Both within Internet and multi-service routing architectures, a number of BGP sessions propagate a large proportion of the required routing information for network operation. For Internet routing, these are typically BGP sessions which propagate the global routing table to an AS - failure of these sessions may have a large impact on network service, based on a single erroneous update. In a multi-service environment, typical deployments utilise a small number of core-facing BGP sessions, typically towards route reflector devices. Failure of these sessions may also result in a large impact to network operation. Clearly, the avoidance of conditions requiring @@ -179,32 +205,32 @@ It is the intention of this document to define a set of criteria for the manner in which a revised error handling mechanism in BGP-4 is required to conform. 
The motivation for the definition of these requirements can be summarised based on certain behaviour currently present in the protocol that is not deemed acceptable within current operational deployments, or where there is a short-fall in the tool set available to an operator. These key requirements can be summarised as follows: o It is unacceptable within modern deployments of the BGP-4 protocol - that a single erroneous UPDATE packet affects prefixes that it - does not carry. This requirement therefore requires some - modification to the means by which erroneous UPDATE packets are - handled, and reacted to - with a particular focus on avoiding the - use of the NOTIFICATION message. + that a single erroneous UPDATE packet affects routes that it does + not carry. This requirement therefore requires some modification + to the means by which erroneous UPDATE packets are handled, and + reacted to - with a particular focus on avoiding the use of the + NOTIFICATION message. o It is recognised that some error conditions that may occur within the BGP-4 protocol may not always be handled gracefully, and may result in conditions whereby an implementation cannot recover. In these (and similar) cases, it is undesirable for an operator that this reset of the BGP-4 session results in interruption to - forwarding packets (by means of withdrawing prefixes installed by + forwarding packets (by means of withdrawing routes installed by BGP-4 into a device's RIB, and subsequently FIB). To this end, there is a requirement to define a session reset mechanism which provides session re-initialisation in a non-destructive manner. o Further to the requirements to provide a more robust protocol, the current visibility into error conditions within the BGP-4 protocol is extremely limited - where further modifications to this behaviour are to be made, complexity is likely to be added. 
Thus, to ensure that BGP-4 is manageable, there are requirements for mechanisms by which the protocol can be examined and monitored. @@ -223,33 +249,34 @@ applicability of enhanced error handling mechanisms, it is possible to divide these errors into a number of sub-classes, particularly focusing around the location of the error within the UPDATE message. Where an UPDATE message is considered invalid by a BGP speaker due to an error within a path attribute that is not the NLRI (where the definition of NLRI includes reachability information encoded in the MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in [RFC4760]) it is a requirement of any enhanced error handling mechanism to handle the error in a manner focused on the NLRI - contained within the message. Since in this case, the message - received from the remote peer is syntactically valid, it is - considered that such an UPDATE is indicative of erroneous data within - a path attribute. The impact of the current behaviour defined within - the protocol makes the implication that the BGP speaker from whom the - message is received is now an invalid path for all NLRI announced via - the session - which results in a disproportionate impact to overall - network operation. In particular scenarios (such as networks with - centralised BGP route reflection) such action can result in a loss of - all reachability to a network. In other contexts (such as the - Internet DFZ), it cannot be assumed that the BGP speaker from whom - the UPDATE message is received is directly responsible for the - erroneous information contained within the message. + contained within the message found to be erroneous. Since in this + case, the message received from the remote peer is syntactically + valid, it is considered that such an UPDATE is indicative of + erroneous data within one or more path attributes. 
The impact of the + current behaviour defined within the protocol makes the implication + that the BGP speaker from whom the message is received is now an + invalid path for all NLRI announced via the session - which results + in a disproportionate impact to overall network operation. In + particular scenarios (such as networks with centralised BGP route + reflection) such action can result in a loss of all reachability to a + network. In other contexts (such as the Internet DFZ), it cannot be + assumed that the BGP speaker from whom the UPDATE message is received + is directly responsible for the erroneous information contained + within the message. Two further error cases exist within UPDATE messages, both of which are related to the mechanisms that are applicable to messages received where some difficulty exists in parsing the entire BGP message. The two cases concern those cases where a valid NLRI attribute can be extracted, and those where such an attribute is not able to be parsed. In these cases, errors in the packing of attributes within a BGP message may have occurred. Such errors are likely indicative of an error specifically caused by the remote BGP speaker. It is, however, desirable to an operator that such errors @@ -378,66 +405,75 @@ provided an UPDATE marking it as withdrawn. This results in a limit in the propagation of the invalid routing information, whilst also ensuring that no traffic is forwarded via a previously-known path that may no longer be valid. This mechanism is referred to as "treat-as-withdraw". Whilst this behaviour results in avoiding a NOTIFICATION message, keeping other routing information advertised by the remote BGP speaker within the RIB, it may result in unreachability for a sub-set of the NLRI advertised by the remote speaker. 
Two cases should be - considered - that where the entry for a prefix in the Adj-RIB-In of + considered - that where the entry for a route in the Adj-RIB-In of the neighbour propagating an erroneous packet is utilised, and that - where the prefix installed in the device's RIB is learnt from another + where the route installed in the device's RIB is learnt from another BGP speaker. In the former case, should the identified NLRI not be treated as withdrawn, the original NLRI is utilised within the global RIB. However, this information is potentially now invalid (i.e. it no longer provides a valid forwarding path), whilst an alternate (valid) path may exist in another Adj-RIB-In. By continuing to utilise the NLRI for which the UPDATE was considered invalid, traffic may be forwarded via an invalid path, resulting in routing loops, or black-holing. In the second case, no impact to the forwarding of traffic, or global RIB, is incurred, yet where treat-as-withdraw is implemented, possibly stale routing information is purged from the Adj-RIB-In of the neighbour propagating errors. Whilst mechanisms such as "treat-as-withdraw" are currently documented, the proposals are limited in their scope - particularly in terms of restrictions to implementation only on eBGP sessions. This limitation is made based on the view that the BGP RIB must be consistent across an autonomous system. By implementing treat-as-withdraw for an iBGP session, one or more routers within the - Autonomous System may not have reachability to a prefix, and hence + Autonomous System may not have reachability to a route, and hence blackholing of traffic, or routing loops, may occur. It should, however, be considered if this view is valid, in light of the manner in which BGP is utilised within operator networks. Inconsistency in a RIB based on a single UPDATE being treated as withdrawn may cause an inconsistency in a single sub-topology (e.g. 
Layer 3 VPN service), or a service not operating completely (in the case of an UPDATE carrying service membership information). Where a NOTIFICATION and teardown is utilised this is destructive to all sub-topologies in all address family identifiers (AFIs) carried by the session in question. Even where mechanisms such as multi-session BGP are utilised, a whole AFI is affected by such a NOTIFICATION message. In terms of routing operation, it is therefore far less costly to endure a situation where a limited sub-set of routing information within an AS is invalid, than to consider all routing information as invalid based on a single trigger. - It is considered that, if extended to cover iBGP, the mechanisms - described in [I-D.chen-ebgp-error-handling] and - [I-D.ietf-idr-optional-transitive] provide a means to avoid the - transmission of a NOTIFICATION to a remote BGP speaker based on a - single erroneous message, where at all possible, and hence meet this - requirement. The failure cases whereby NLRI cannot be extracted from - the UPDATE message represent a case whereby the receiving system - cannot handle the error gracefully based on this mechanism. + At the time of writing, error handling mechanisms related to + optional, transitive attributes - such as + [I-D.ietf-idr-optional-transitive] are restricted to handling only a + subset of attribute errors - whereas the operational requirement is + to expand this coverage to the widest set of errors possible (i.e., + all semantic errors within UPDATE messages). Additionally, where + approaches applicable to a greater number of attributes are proposed + (e.g., [I-D.chen-ebgp-error-handling]), these are limited to + deployment in eBGP applications only, where requirements also exist + in intra-domain cases. 
As such, it is envisaged that if extended to + cover these expanded cases, these mechanisms provide a means to avoid + the transmission of a NOTIFICATION message to a remote BGP speaker, + based on a single erroneous message, where at all possible, and hence + meet this requirement. Critical errors, including those whereby the + NLRI cannot be extracted from the UPDATE message, represent cases + whereby the receiving system cannot handle the error gracefully based + on this mechanism. 4. Recovering RIB Consistency The recommendations described in Section 3 may result in the RIB for a topology within an AS being inconsistent across the AS' internal routers. Alternatively, where such mechanisms are deployed at an AS boundary, interconnects between two ASes may be inconsistent with each other. There are therefore risks of traffic blackholing, due to missing routing information, or forwarding loops. Whilst this is deemed an acceptable compromise in the short term, clearly, it is @@ -459,57 +495,71 @@ identify any 'stale' NLRI) - [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which the ROUTE-REFRESH mechanism can be extended to meet this requirement. Whilst re-advertisement of the whole BGP RIB provides a means by which withdrawn NLRI can be re-advertised, there are some scaling implications that must be considered. In the case that a ROUTE- REFRESH is generated, all NLRI must be re-packed into UPDATE messages and advertised by one speaker on the BGP session, whilst the other must receive all UPDATE messages, and validate the RIB's consistency. - Clearly, it is advantageous to avoid this work where possible. + In order to avoid the control-plane load, it is therefore a + requirement to utilise targeted mechanisms where possible, rather + than incurring the additional load on both the advertising and + receiving speaker of building and processing UPDATEs for the entire + contents of the RIB. 
It is envisaged that during routing inconsistencies caused by utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is aware that some routing information was not able to be processed - due to the fact that an UPDATE message was not parsed correctly. Since this mechanism (as discussed in Section 3) requires the local BGP speaker to have determined the set of NLRI for which an erroneous UPDATE message was received, it is possible to use a targeted mechanism to re-request the specific NLRI that was contained within the erroneous UPDATE message. By re-requesting, this provides the remote BGP speaker an opportunity to re-transmit the NLRI - possibly providing an opportunity to leverage alternative methods to build the UPDATE message. Such a request requires extension to the existing BGP-4 protocol, in terms of specific UPDATE generation filters with a transient lifetime. It is envisaged that the work within - [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted - elements of the Adj-RIB-In for a BGP neighbour to be recovered. + [I-D.zeng-idr-one-time-prefix-orf] provides a mechanism allowing + targeted elements of the Adj-RIB-In for a BGP neighbour to be + recovered. It is of particular note for both means of recovering RIB consistency - described that these are effective only when considering transitive + described that these are effective only when considering transient errors within an implementation - for instance, should an RFC interpretation error within an implementation be present, regardless of the number of times a specific UPDATE is generated, it is likely that this error condition will persist (as it may with the existing behaviour defined by [RFC4271]). For this reason, there is a requirement to consider the means by which such consistency recovery - mechanisms are utilised. 
It is not advisable that a transitive - filter and advertisement mechanism is triggered by all error handling - events due to the load this is likely to place on the neighbour - receiving such a request. Where this BGP speaker is a relatively - centralised device - a route reflector (as described by [RFC4456]) - for example - the act of generation of UPDATE messages with such - frequency is likely to cause disproportionate load. It is therefore - an operational requirement of such mechanisms that means of request + mechanisms are utilised. It is not advisable that a dynamic filter + and advertisement mechanism is triggered by all error handling events + due to the load this is likely to place on the neighbour receiving + such a request. Where this BGP speaker is a relatively centralised + device - a route reflector (as described by [RFC4456]) for example - + the act of generation of UPDATE messages with such frequency is + likely to cause disproportionate load. It is therefore an + operational requirement of such mechanisms that means of request dampening be required by any such extension. + In cases whereby the consistency of the Adj-RIB-In is to be restored + (e.g., following the 'treat-as-withdraw' behaviour described in + Section 3), and mechanisms such as those described herein are + triggered, such a condition should be noted to an operator by means + of a specific flag, SNMP trap, or other logging mechanism. In order + to identify the subset of NLRI that are considered to be + inconsistent, this information is of operational benefit and hence + should be logged. + 5. Reducing the Impact of Session Reset Even where protocol enhancements allow errors in the BGP-4 protocol to cease to trigger NOTIFICATION messages, and hence reset a BGP session, it is clear that some error conditions may not be exited. In particular, errors due to existing state, or memory structures, associated with a specific BGP session will not be handled. 
It is therefore important to consider how these error conditions are currently handled by the protocol. It should be noted that the following discussion and analysis considers only those NOTIFICATION @@ -533,38 +583,48 @@ In order to address this, there is a requirement for a means by which a BGP speaker can signal that an unhandled error condition in an UPDATE message occurred - requiring a session reset - yet also continue to utilise the paths advertised by the neighbour that are currently in use within the RIB. In this case, the Adj-RIB-In received from the neighbour is not considered invalid, despite a NOTIFICATION, and session reset, being required. This set of requirements is akin to those answered by the BGP Graceful Restart mechanism described in [RFC4724]. Since the operational requirement in this case is to provide a means to achieve a complete session - restart without disrupting the forwarding path of those prefixes in - use within a BGP speaker's RIB, it is expected that utilising a - procedure similar to the Graceful Restart mechanism meets the error - handling requirement. By responding to an error condition (repeated - or otherwise) with a message indicating that an error that cannot be + restart without disrupting the forwarding path of those routes in use + within a BGP speaker's RIB, it is expected that utilising a procedure + similar to the Graceful Restart mechanism meets the error handling + requirement. By responding to an error condition (repeated or + otherwise) with a message indicating that an error that cannot be handled has occurred, forcing session reset, whilst retaining - forwarding information within the RIB allows forwarding to all - prefixes within a system's RIB to continue during the period in which - the session restarts. It is envisaged that the additional complexity + forwarding information within the RIB allows forwarding to all routes + within a system's RIB to continue during the period in which the + session restarts. 
It is envisaged that the additional complexity introduced by the introduction of such a mechanism can be limited by extending existing BGP messages - one such approach is proposed in [I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the restart lifetime, should an error condition not be transient - for example, should an error have occurred with the BGP process, rather than a specific of the BGP session - the remote BGP speaker is still detected as an invalid device for forwarding. + In some cases, the erroneous condition may be due to corruption of + the Adj-RIB-Out on the advertising BGP speaker - rather than caused + by the receiving speaker's state. In these cases, where existing + structures are replayed whilst performing graceful restart + functionality, the error condition is not necessarily resolved. + Therefore, it is recommended that during a session restart event, as + described within this section, the advertising speaker purge and + rebuild RIB structures, in order to resolve any corruption within + these structures. + It should be noted that a protocol enhancement meeting this requirement is not able to solve all error conditions - however, a complete restart of the BGP and TCP session between two BGP speakers implements an identical recovery mechanism to that which is achieved by the existing behaviour. Where an error condition such as memory or configuration corruption has occurred in a BGP implementation, it is expected that a mechanism meeting this requirement continues to detect this, by means of a bound on time for session restart to occur. Whilst there may be some consideration that packets continue to be forwarded through a device which can be in a failure mode of @@ -606,26 +666,26 @@ the UPDATE, has no visibility of this error. Operationally, however, it is of interest to the upstream router operator that such invalid information was propagated. 
The requirement for logging of error conditions in transmitted BGP messages, which are visible to only the receiver, cannot be achieved by any existing BGP message, or capability. It is envisaged that each erroneous event should be transmitted to the remote peer - including the information as to the set of NLRI that were considered invalid. Whilst with some mechanisms this is achieved by default - (for example, One-Time Prefix ORF [I-D.zeng-one-time-prefix-orf] - (Outbound Route Filtering) will transmit the set of prefixes that are - required), the operator requirement is to know which prefixes may - have been unreachable in all cases. It is envisaged that an - extension to meet this requirement will allow for such information to - be transmitted between peers, and hence logged. Such a mechanism may + (for example, One-Time Prefix ORF [I-D.zeng-idr-one-time-prefix-orf] + (Outbound Route Filtering) will transmit the set of routes that are + required), the operator requirement is to know which routes may have + been unreachable in all cases. It is envisaged that an extension to + meet this requirement will allow for such information to be + transmitted between peers, and hence logged. Such a mechanism may provide further utility as either a diagnostic, or logging toolset. As such, it is possible to divide the messages that are required in order to provide further visibility into BGP for an operator. Such a division can be made both due to the required means of message transmission, alongside the criticality of each request. o Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the @@ -645,21 +705,21 @@ that which is directly relevant to a network operator in the case of an error condition occurring). 
Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question, however, it must be considered that in the case of an erroneous UPDATE being received, and the 'treat-as-withdraw' mechanism being utilised, where the erroneous path is removed from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages - withdrawing the prefix from all further BGP speakers to which the + withdrawing the route from all further BGP speakers to which the prefix is advertised. The load generated by the generation of such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which it was received. It is envisaged that light-weight BGP message-based signalling mechanisms such as the ADVISORY message types detailed in [I-D.ietf-idr-operational-message] provide a suitable means to satisfy this requirement. o Additional Diagnostic Capabilities for BGP - In a number of cases, @@ -698,40 +758,40 @@ scheduling of these BGP messages must be interleaved with the transmission of the key protocol messages - such as KEEPALIVE and UPDATE packets. It is therefore a concern that should a large number of messages specifically for operational visibility be transmitted, this will delay the transmission of UPDATE packets, and hence adversely affect the end-to-end convergence time for NLRI carried within BGP. The operational requirement for why messages are advantageous to be in-band to a protocol should also be considered. In particular, it should be noted that where such information is to be transmitted between administrative boundaries a BGP session - represents an existing channel exists between the two ASes. 
This
- channel is considered to be secure insofar as the routing
- information, and requests sent via the session are considered to come
- from a trusted source. Since error information relates to both a
- particular attachment, and is key to ensuring that such a session is
- operating as expected, it is considered of great operational benefit
- that this information is transmitted over this channel. In addition,
- the overall system scalability is improved by such in-band
- transmission. It is expected that erroneous information resulting in
- the 'treat-as-withdraw' mechanism being utilised is relatively
- infrequently transmitted between two peers (when compared to the
- frequency of UPDATE messages transmission). The impact of including
- an additional BGP message type for such operational visibility is
- relatively small from a resource utilisation perspective - additional
- processing overhead is only experienced when such a message is
- received. Where a separate session is maintained, particular network
- elements within a service provider topology may require hundreds, or
- thousands, of additional sessions for the transmission of this
- information. Such an resource consumption overhead is likely to be
- unacceptable to some network operators.
+ represents an existing channel between the two ASes. This channel is
+ considered to be secure insofar as the routing information, and
+ requests sent via the session are considered to come from a trusted
+ source. Since error information relates to both a particular
+ attachment, and is key to ensuring that such a session is operating
+ as expected, it is considered of great operational benefit that this
+ information is transmitted over this channel. In addition, the
+ overall system scalability is improved by such in-band transmission.
+ It is expected that erroneous information resulting in the 'treat-as-
+ withdraw' mechanism being utilised is relatively infrequently
+ transmitted between two peers (when compared to the frequency of
+ UPDATE messages transmission). The impact of including an additional
+ BGP message type for such operational visibility is relatively small
+ from a resource utilisation perspective - additional processing
+ overhead is only experienced when such a message is received. Where
+ a separate session is maintained, particular network elements within
+ a service provider topology may require hundreds, or thousands, of
+ additional sessions for the transmission of this information. Such
+ an resource consumption overhead is likely to be unacceptable to some
+ network operators.
  For the reasons explained above, it is expected that mechanisms specified to meet the requirements for event visibility consider the relative impacts of additional monitoring sessions, or message inclusion in band to BGP in order not to compromise the security, scalability and robustness of the BGP-4 protocol.
  7. Operational Complexities Introduced by Altering RFC4271
  The existing NOTIFICATION and subsequent teardown of a BGP session
@@ -848,50 +908,50 @@
  7.1. Reducing the Network Impact of Session Teardown
  As discussed within the preceding section, where repeated critical UPDATE message errors are received, it is recommended that the impact to the both advertising and receiving BGP-4 speakers be limited by reverting to tearing the BGP-4 session experiencing such errors down. The BGP-4 specification presented in [RFC4271] achieves such a session shutdown by sending a NOTIFICATION message, however, this has the net result that all downstream BGP speakers (i.e.
  those to whom
- the NLRI carried over the now ceased BGP session was readvertised)
- must withdraw this NLRI from their RIB, and perform a best-path
+ the routes carried over the now ceased BGP session was readvertised)
+ must withdraw this route from their RIB, and perform a best-path
  selection if required. In some cases, there may be no alternate path
- being available, and hence a period of time for which no valid BGP
- route exists. Particularly, this is very likely to occur where an
- upstream BGP speaker performs a best-path selection and advertises
- only a single path to its neighbours - there is a requirement for the
+ available, and hence a period of time for which no valid BGP route
+ exists. Particularly, this is very likely to occur where an upstream
+ BGP speaker performs a best-path selection and advertises only a
+ single path to its neighbours - there is a requirement for the
  upstream speaker to perform a best-path selection, and re-advertise a new set of NLRI before the downstream system is able to converge to a new path. It should be noted that where UPDATE messages withdrawing NLRI are not subject to the BGP session's configured MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements are, this may result in a BGP speaker being without a path for a period up to the MRAI. Clearly, it is advantageous to avoid this period of time for which
- there may be no reachability for a set of NLRI, especially since the
- BGP speaker terminating a particular session is doing so due to a
+ there may be no reachability for a set of routes, especially since
+ the BGP speaker terminating a particular session is doing so due to a
  particular error handling policy.
  The graceful shutdown mechanism detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a
- BGP speaker is able to signal that a set of NLRI is to be withdrawn,
- and hence allow downstream systems to pre-emptively perform a best-
- path selection, and hence advertise new reachability information in a
- make-before-break manner.
+ BGP speaker is able to signal that a set of routes are to be
+ withdrawn, and hence allow downstream systems to pre-emptively
+ perform a best-path selection, and hence advertise new reachability
+ information in a make-before-break manner.
  It is therefore envisaged, that where a session is to be shutdown, based on a trigger relating to erroneous UPDATE messages being received (be they repeated or not) that the graceful shutdown
- procedure in utilised, so as to reduce the forwarding impact of NLRI
- received on the session being withdrawn.
+ procedure in utilised, so as to reduce the forwarding impact of
+ routes received on the session being withdrawn.
  8. IANA Considerations
  This memo includes no request to IANA.
  9. Security Considerations
  The requirements outlined in this document provide mechanisms by which erroneous BGP messages may be responded to with limited impact to forwarding operation. This is of benefit to the security of a BGP
@@ -968,51 +1028,51 @@
  December 2011.
  [I-D.ietf-grow-bmp] Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring Protocol", draft-ietf-grow-bmp-06 (work in progress), December 2011.
  [I-D.ietf-idr-bgp-enhanced-route-refresh] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced Route Refresh Capability for BGP-4",
- draft-ietf-idr-bgp-enhanced-route-refresh-01 (work in
- progress), December 2011.
+ draft-ietf-idr-bgp-enhanced-route-refresh-02 (work in
+ progress), June 2012.
  [I-D.ietf-idr-bgp-gr-notification] Patel, K., Fernando, R., and J. Scudder, "Notification Message support for BGP Graceful Restart", draft-ietf-idr-bgp-gr-notification-00 (work in progress), December 2011.
  [I-D.ietf-idr-enhanced-gr] Patel, K., Chen, E., Fernando, R., and J. Scudder, "Accelerated Routing Convergence for BGP Graceful
- Restart", draft-ietf-idr-enhanced-gr-00 (work in
- progress), December 2011.
+ Restart", draft-ietf-idr-enhanced-gr-01 (work in
+ progress), June 2012.
  [I-D.ietf-idr-operational-message] Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL Message", draft-ietf-idr-operational-message-00 (work in progress), March 2012.
  [I-D.ietf-idr-optional-transitive] Scudder, J., Chen, E., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", draft-ietf-idr-optional-transitive-04 (work in progress), October 2011.
- [I-D.zeng-one-time-prefix-orf]
- Zeng, Q. and J. Dong, "One-time Address-Prefix Based
- Outbound Route Filter for BGP-4",
- draft-zeng-one-time-prefix-orf-01 (work in progress),
- October 2010.
+ [I-D.zeng-idr-one-time-prefix-orf]
+ Zeng, Q., Dong, J., Heitz, J., Patel, K., Shakir, R., and
+ Z. Huang, "One-time Address-Prefix Based Outbound Route
+ Filter for BGP-4", draft-zeng-idr-one-time-prefix-orf-02
+ (work in progress), July 2012.
  [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.
  Author's Address
  Rob Shakir BT pp C3L