--- 1/draft-ietf-grow-ops-reqs-for-bgp-error-handling-03.txt 2012-06-06 17:14:12.129341388 +0200 +++ 2/draft-ietf-grow-ops-reqs-for-bgp-error-handling-04.txt 2012-06-06 17:14:12.177343330 +0200 @@ -1,18 +1,18 @@ Internet Engineering Task Force R. Shakir Internet-Draft BT -Intended status: Informational March 27, 2012 -Expires: September 28, 2012 +Intended status: Informational June 6, 2012 +Expires: December 8, 2012 Operational Requirements for Enhanced Error Handling Behaviour in BGP-4 - draft-ietf-grow-ops-reqs-for-bgp-error-handling-03 + draft-ietf-grow-ops-reqs-for-bgp-error-handling-04 Abstract BGP-4 is utilised as a key intra- and inter-Autonomous System routing protocol in modern IP networks. The failure modes as defined by the original protocol standards are based on a number of assumptions around the impact of session failure. Numerous incidents both in the global Internet routing table and within Service Provider networks have been caused by strict handling of a single invalid UPDATE message causing large-scale failures in one or more Autonomous @@ -34,21 +34,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on September 28, 2012. + This Internet-Draft will expire on December 8, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -59,35 +59,35 @@ described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Role of BGP-4 in Service Provider Networks . . . . . . . . 3 1.2. Overview of Operator Requirements for BGP-4 Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Errors within BGP-4 UPDATE Messages . . . . . . . . . . . . . 6 2.1. Classifying BGP Errors and Expected Error Handling . . . . 7 - 2.1.1. Critical BGP Errors . . . . . . . . . . . . . . . . . 7 + 2.1.1. Critical BGP Errors . . . . . . . . . . . . . . . . . 8 2.1.2. Semantic BGP Errors . . . . . . . . . . . . . . . . . 8 - 3. Avoiding use of NOTIFICATION . . . . . . . . . . . . . . . . . 9 - 4. Recovering RIB Consistency . . . . . . . . . . . . . . . . . . 11 - 5. Reducing the Impact of Session Reset . . . . . . . . . . . . . 13 - 6. Operational Toolset for Monitoring BGP . . . . . . . . . . . . 15 - 7. Operational Complexities Introduced by Altering RFC4271 . . . 19 - 7.1. Reducing the Network Impact of Session Teardown . . . . . 21 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 - 9. Security Considerations . . . . . . . . . . . . . . . . . . . 23 - 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 24 - 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 - 11.1. Normative References . . . . . . . . . . . . . . . . . . . 25 - 11.2. Informational References . . . . . . . . . . . . . . . . . 25 - Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 27 + 3. Avoiding use of NOTIFICATION . . . . . . . . . . . . . . . . . 10 + 4. Recovering RIB Consistency . . . . . . . . . . . . . . . . . . 12 + 5. Reducing the Impact of Session Reset . . . . . . . . . . 
. . . 14 + 6. Operational Toolset for Monitoring BGP . . . . . . . . . . . . 16 + 7. Operational Complexities Introduced by Altering RFC4271 . . . 20 + 7.1. Reducing the Network Impact of Session Teardown . . . . . 22 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 + 9. Security Considerations . . . . . . . . . . . . . . . . . . . 25 + 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 26 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 + 11.1. Normative References . . . . . . . . . . . . . . . . . . . 27 + 11.2. Informational References . . . . . . . . . . . . . . . . . 27 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 29 1. Introduction Where BGP-4 [RFC4271] is deployed in the Internet and Service Provider networks, numerous incidents have been recorded due to the manner in which [RFC4271] specifies errors in routing information should be handled. Whilst the behaviour defined in the existing standards retains utility, the deployments of the protocol have changed within modern networks, resulting in significantly different demands for protocol robustness. Whilst a number of Internet Drafts @@ -188,21 +188,21 @@ o It is unacceptable within modern deployments of the BGP-4 protocol that a single erroneous UPDATE packet affects prefixes that it does not carry. This requirement therefore requires some modification to the means by which erroneous UPDATE packets are handled, and reacted to - with a particular focus on avoiding the use of the NOTIFICATION message. o It is recognised that some error conditions may occur within the BGP-4 protocol may not always be handled gracefully, and may result in conditions whereby an implementation cannot recover. 
In - these (and similar) cases, it is unacceptable for an operator that + these (and similar) cases, it is undesirable for an operator that this reset of the BGP-4 session results in interruption to forwarding packets (by means of withdrawing prefixes installed by BGP-4 into a device's RIB, and subsequently FIB). To this end, there is a requirement to define a session reset mechanism which provides session re-initialisation in a non-destructive manner. o Further to the requirements to provide a more robust protocol, the current visibility into error conditions within the BGP-4 protocol is extremely limited - where further modifications to this behaviour are to be made, complexity is likely to be added. Thus, @@ -249,24 +249,24 @@ received where some difficulty exists in parsing the entire BGP message. The two cases concern those cases where a valid NLRI attribute can be extracted, and those where such an attribute is not able to be parsed. In these cases, errors in the packing of attributes within a BGP message may have occurred. Such errors are likely indicative of an error specifically caused by the remote BGP speaker. It is, however, desirable to an operator that such errors are handled without affecting all NLRI across a BGP session. As such, there is a key requirement to maximise the number of cases in which it is possible to extract NLRI from a BGP UPDATE message. To - this end, it is required that where possible the MP_REACH and - MP_UNREACH attributes are utilised for encoding all NLRI (including - IPv4 Unicast), and that this attribute is included as the first - attribute of a BGP UPDATE message (as originally recommended in + this end, it is required that where possible the MP_REACH_NLRI and + MP_UNREACH_NLRI attributes are utilised for encoding all NLRI + (including IPv4 Unicast), and that this attribute is included as the + first attribute of a BGP UPDATE message (as originally recommended in [I-D.chen-ebgp-error-handling]). 
Such a change to the order of inclusion of this attribute maximises the number of cases in which NLRI can be extracted from an UPDATE. Where this is possible, it is again required that the error handling mechanisms utilised should be directly applied to the NLRI included in the UPDATE. For all cases whereby NLRI can be obtained from an UPDATE message, it is expected that the requirements outlined in Section 3 should be considered by any enhancement to the BGP-4 protocol. @@ -294,20 +294,27 @@ It is clearly of advantage for BGP-4 implementations to utilise a consistent set of error handling mechanisms for the different types of errors that are described in Section 2, and provide consistent nomenclature to refer to them. It is therefore suggested that errors that are indicative of larger scale failures of a BGP speaker, and hence require some error handling at the session level are referred to as 'critical' errors, whilst those errors that are identified based on incorrect content of one or more attributes of a message are referred to as 'semantic' errors. + The errors identified within the following sections consider only + those errors within the specifications at the time of writing. It is + recommended that in the definition of future extensions to the BGP-4 + specification, the error handling behaviour (and the category within + which errors within the extension should be considered by an + implementation) is defined. + 2.1.1. Critical BGP Errors As described in this document, it is of advantage to limit the number of 'critical' errors that occur within the protocol, therefore, based on analysis of the processing of BGP UPDATE messages, it is required that 'critical' error handling behaviour is applied to: o UPDATE Message Length errors - whereby the specified overall UPDATE message length is inconsistent with the sum of the Total Path Attribute and Withdrawn Routes length. 
In this case, this is @@ -323,21 +330,21 @@ It is expected that those requirements outlined in Section 5 are utilised to provide session-level handling of those errors identified as 'critical'. 2.1.2. Semantic BGP Errors Where a BGP message is correctly formed, a number of cases exist whereby the contents of the UPDATE are not valid - in these cases, this represents errors that can be identified to affect specific - NLRI. The following cases are expected to be classified a semantic + NLRI. The following cases are expected to be classified as semantic errors: o Zero or invalid length errors in path attributes excluding those containing NLRI, or where the length of all path attributes contained within the UPDATE does not correspond to the total path attributes length. In this case, the NLRI can be correctly extracted, and hence acted upon. o Messages where invalid data or flags are contained in a path attribute that does not relate to the NLRI. @@ -442,23 +449,23 @@ by re-requesting the entire Adj-RIB-Out of a remote BGP speaker is re-advertised. A mechanism to achieve this re-advertisement is defined within the ROUTE-REFRESH specification [RFC2918]. It is envisaged that by requesting a refresh of all NLRI advertised by a BGP speaker, any NLRI which has been withdrawn due to being contained within an invalid UPDATE message is re-learnt. Where a ROUTE REFRESH is used to directly perform a consistency check between the Adj-RIB- Out of a remote device, and the Adj-RIB-In of the local BGP speaker, a demarcation between the ROUTE-REFRESH, and normal UPDATE messages is required (in order that an "end" of the refresh can be used to - identify any 'stale' NLRI) - [I-D.keyur-bgp-enhanced-route-refresh] - provides a means by which the ROUTE-REFRESH mechanism can be extended - to meet this requirement. + identify any 'stale' NLRI) - + [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which + the ROUTE-REFRESH mechanism can be extended to meet this requirement. 
Whilst re-advertisement of the whole BGP RIB provides a means by which withdrawn NLRI can be re-advertised, there are some scaling implications that must be considered. In the case that a ROUTE- REFRESH is generated, all NLRI must be re-packed into UPDATE messages and advertised by one speaker on the BGP session, whilst the other must receive all UPDATE messages, and validate the RIB's consistency. Clearly, it is advantageous to avoid this work where possible. It is envisaged that during routing inconsistencies caused by @@ -509,66 +516,66 @@ messages generated in response to errors in UPDATE messages (as defined by Section 6.3 in [RFC4271]). The existing NOTIFICATION behaviour triggers a reset of all elements of the BGP-4 session, as described in Section 6 of [RFC4271]. It is expected that session teardown requires an implementation to re- initialise all structures and state required for session maintenance. Clearly, there is some utility to this requirement, as error conditions in BGP are, in general, exited from. However, this definition is responsible for the forwarding outages within networks - utilising BGP for route propagation when each error is experienced. - The requirement described in Section 3 is intended to reduce the - cases whereby a NOTIFICATION is required, however, any mechanism - implemented as a response to this requirement by definition cannot - provide a session reset to the extent of that achieved by the current - behaviour. + utilising BGP for propagation of routing or service when each error + is experienced. The requirement described in Section 3 is intended + to reduce the cases whereby a NOTIFICATION is required, however, any + mechanism implemented as a response to this requirement by definition + cannot provide a session reset to the extent of that achieved by the + current behaviour. 
In order to address this, there is a requirement for a means by which a BGP speaker can signal that an unhandled error condition in an UPDATE message occurred - requiring a session reset - yet also continue to utilise the paths advertised by the neighbour that are currently in use within the RIB. In this case, the Adj-RIB-In received from the neighbour is not considered invalid, despite a NOTIFICATION, and session reset, being required. This set of requirements is akin to those answered by the BGP Graceful Restart mechanism described in [RFC4724]. Since the operational requirement in this case is to provide a means to achieve a complete session restart without disrupting the forwarding path of those prefixes in use within a BGP speaker's RIB, it is expected that utilising a procedure similar to the Graceful Restart mechanism meets the error handling requirement. By responding to an error condition (repeated or otherwise) with a message indicating that an error that cannot be handled has occurred, forcing session reset, whilst retaining forwarding information within the RIB allows forwarding to all - prefixes within a system's RIB to continue, whilst the session - restarts. It is envisaged that the additional complexity introduced - by the introduction of such a mechanism can be limited by extending - existing BGP messages - one such approach is proposed in + prefixes within a system's RIB to continue during the period in which + the session restarts. It is envisaged that the additional complexity + introduced by the introduction of such a mechanism can be limited by + extending existing BGP messages - one such approach is proposed in - [I-D.keyupate-idr-bgp-gr-notification]. By placing a time bound on - the restart lifetime, should an error condition not be transient - - for example, should an error have occurred with the BGP process, - rather than a specific of the BGP session - the remote BGP speaker is - still detected as an invalid device for forwarding. 
+ [I-D.ietf-idr-bgp-gr-notification]. By placing a time bound on the + restart lifetime, should an error condition not be transient - for + example, should an error have occurred with the BGP process, rather + than a specific of the BGP session - the remote BGP speaker is still + detected as an invalid device for forwarding. - It should, however, be noted that a protocol enhancement meeting this + It should be noted that a protocol enhancement meeting this requirement is not able to solve all error conditions - however, a complete restart of the BGP and TCP session between two BGP speakers implements an identical recovery mechanism to that which is achieved by the existing behaviour. Where an error condition such as memory or configuration corruption has occurred in a BGP implementation, it is expected that a mechanism meeting this requirement continues to detect this, by means of a bound on time for session restart to occur. Whilst there may be some consideration that packets continue to be forwarded through a device which can be in a failure mode of - this nature for a longer period, due to this requirement, the + this nature for a longer period due to this requirement, the architecture of modern IP routers should be considered. A divided forwarding and control plane is common in many devices, as well as process separation for software-based devices - corruption of a specific protocol daemon does not necessarily imply forwarding is affected. Indeed, where forwarding behaviour of a device is affected, it is envisaged that a failure detection mechanism (be it Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets) will detect such a failure in almost all cases, with the symptomatic behaviour of such a failure being an invalid UPDATE message in very few other cases. @@ -617,49 +624,49 @@ As such, it is possible to divide the messages that are required in order to provide further visibility into BGP for an operator. 
Such a division can be made both due to the required means of message transmission, alongside the criticality of each request. o Messages required to replace NOTIFICATION - In cases where the error handling mechanisms defined by [RFC4271] currently result in a NOTIFICATION message being generated, a number of the requirements detailed within this document result in this message being suppressed. Despite this change, the error condition's - occurrence is still of interest to an operator, since some form of - invalid data has been received on a session in order to provide - both monitoring and troubleshooting capabilities. It is therefore + occurrence is still of interest to an operator in order to provide + both monitoring and troubleshooting capabilities, since some form + of invalid data has been received on a session. It is therefore considered that an implementation must generate a message both locally, and transmitted to the remote peer, based on such a condition. Where such a message is transmitted to the remote peer, it is considered that the BGP session via which the - erroneous UPDATE message was received as transport to the remote - peer. The information transmitted in such a message should be - minimised to allow identification of the paths which were - considered erroneous (i.e. restricting the information to that - which is directly relevant to a network operator in the case of an - error condition occurring). Any delay to convergence on the + erroneous UPDATE message was received should be used as transport + to the remote peer. The information transmitted in such a message + should be minimised to allow identification of the paths which + were considered erroneous (i.e. restricting the information to + that which is directly relevant to a network operator in the case + of an error condition occurring). 
Any delay to convergence on the session in question is considered to be acceptable, given the suboptimal nature of the reception of invalid routing information via a BGP session. Further concerns regarding such a mechanism relate to the load generated on the BGP speaker in question, however, it must be considered that in the case of an erroneous UPDATE being received, and the 'treat-as-withdraw' mechanism being utilised, where the erroneous path is removed from the Loc-RIB, there is likely to be a requirement to generate UPDATE messages withdrawing the prefix from all further BGP speakers to which the prefix is advertised. The load generated by the generation of such UPDATEs is likely to be much greater than that of transmitting error information via a logging message type back to the speaker from which it was received. It is envisaged that light-weight BGP message-based signalling mechanisms such as the ADVISORY message types detailed in - [I-D.frs-bgp-operational-message] provide a suitable means to + [I-D.ietf-idr-operational-message] provide a suitable means to satisfy this requirement. o Additional Diagnostic Capabilities for BGP - In a number of cases, there is an operational requirement to further debug erroneous BGP UPDATE messages, along with the particulars of the state of a BGP speaker. For instance, where an invalid BGP UPDATE message is transmitted between two BGP speakers, the exact format of the UPDATE message is of interest to an operator, as this information provides a clear indication of a message considered to be erroneous by the BGP speaker to which it was transmitted. In this @@ -667,21 +674,21 @@ message is transmitted back to the advertising speaker, in order to allow for further debugging to occur. 
Whilst such information is particularly useful to an operator, it clearly provides information that is not key to protocol operation - for this reason, it is expected that some of the concerns regarding the additional complexity, and load that a BGP speaker is subjected to is not acceptable. For this reason, it is required that where mechanisms are developed to support this requirement, messages of this nature can be supported both within an existing BGP session, and via a dedicated separate session, be it BGP carrying messages - such as those defined in [I-D.frs-bgp-operational-message] or a + such as those defined in [I-D.ietf-idr-operational-message] or a dedicated monitoring protocol akin to BMP described in [I-D.ietf-grow-bmp]. Whilst the operational requirement for such monitoring tools to allow for visibility into BGP is clearly agreed upon, the means by which such messages are transmitted between two BGP speakers is likely to be dependent upon both the positions of the speakers in question (for instance, the requirements for such a protocol may differ where a session is between two ASBRs under separate administration). The introduction of additional message types to the BGP protocol clearly @@ -723,148 +730,158 @@ specified to meet the requirements for event visibility consider the relative impacts of additional monitoring sessions, or message inclusion in band to BGP in order not to compromise the security, scalability and robustness of the BGP-4 protocol. 7. Operational Complexities Introduced by Altering RFC4271 The existing NOTIFICATION and subsequent teardown of a BGP session upon encountering an error has the advantage that a consistent approach to error handling is required of all implementations of the - BGP-4 protocol. This is of operational advantage, as it provides a + BGP-4 protocol. This is of operational advantage as it provides a clear expectation of the behaviour of the protocol. 
The requirements defined herein add further complexity to the error-handling within BGP, and hence are liable to compromise the existing deterministic protocol behaviour. It is therefore deemed that there is a further - requirement to provide a clear method by which an erroneous UPDATE - should be reacted to, in order that all protocol implementations - provide a consistent means by which recovery is achieved. A further - complexity is introduced due to the disparate nature of the work - items altering the BGP error handling behaviour - since all items are - likely to be implemented as a BGP capability [RFC5492], situations - are likely to occur between devices (especially those with different - BGP implementations), where some of the mechanisms referenced are - unsupported. This adds further barriers to a standard definition of - the BGP-4 error handling behaviour. - - In general, the approach considered ideal upon encountering an - erroneous UPDATE message can be divided into two cases - those where - the NLRI can be determined from the message, and those where it - cannot be. The latter case is the simpler of the two. In this case, - there is a requirement for the implementation to reset the BGP - session, utilising the reduced-impact approach, described in - Section 5. In the case where the remote BGP speaker is in a - transient error condition related to specific peer data structures, - or state, a single instance of this behaviour is likely to exit the - error condition. In the case of implementation errors, it is - possible that the BGP session in question may enter a continuous loop - of being reset, with a partial RIB being held by one or more of the - BGP speakers due to an non-deterministic order of UPDATE propagation. - It is therefore a requirement that within this reduced-impact - procedure any subsequent UPDATE messages that would result in further - session resets are ignored. 
Whilst this results in a condition where - an undetermined amount of the RIB is inconsistent, partial - reachability is maintained. In this case, the operational toolsets - discussed in Section 6 is likely to provide mechanisms by which this - condition can be brought to the attention of the relevant operators. - This requirement to accept a partial RIB, which results in potential - invalid traffic forwarding is a direct result of the deployments of - BGP-4, as described in Section 1.1. - - The case where NLRI can be determined from an erroneous UPDATE - provides further complexities. In this case, a BGP speaker is aware - of the sub-set of the RIB which have been identified as being - contained within invalid UPDATE messages. This allows a local BGP - speaker to re-request single prefixes, utilising a mechanism such as - "one-time prefix ORF". However, a similar result is achieved by re- - requesting the entire RIB - albeit with greater resource - requirements. It is therefore expected that the process of recovery - utilises a staged set of mechanisms to attempt to restore consistency - of the RIB: + requirement to define a set of recommended behaviours based on the + reception of a particular class of erroneous UPDATE message, + alongside highlighting some of the implementation complexities that + may need to be handled in the case that particular recommendations + made within this memo are deployed. - 1. Where available, a mechanism capable of requesting only the NLRI - determined to have been contained within a invalid UPDATE should - be utilised. However, since it is possible that such an error - condition can be transient in nature, it is likely that more than - one request is to be transmitted (assuming the first does not - return a valid UPDATE message). In order to allow a - deterministic process, there is a requirement for a limit on the - number of specific requests transmitted to be defined. 
+ Utilising the classes of erroneous UPDATE message described in + Section 2, the recommended behaviour for a BGP-4 implementation can + be divided into two branches. Primarily, where a semantic error is + identified, an implementation is expected to utilise the reduced- + impact error handling approach, as described in Section 3. In the + case that such an approach results in known NLRI being withdrawn from + the BGP speaker's RIB, and an implementation provides functionality + such that these errors are recovered from through an automatically + triggered means, such as those described within Section 4, some + consideration of the scalability of these recovery mechanisms is + required. Clearly, there is a computational and bandwidth overhead + associated with the re-advertisement of NLRI between two BGP speakers + - both due to the generation of UPDATE messages, their transmission + between the two speakers, and the parsing and processing into the RIB + required. This overhead is directly proportional to the number of + UPDATE messages that are required. Where a semantic error is + experienced, by definition the NLRI contained within the UPDATE can + be extracted. It is therefore possible to minimise the proportion of + the RIB that is re-advertised by targeting any recovery mechanism on + the NLRI contained within the erroneous UPDATE. Such a targeted + mechanism can be achieved through a means such as One-Time ORF, or + other means of targeting UPDATE messages not discussed within this + memo. It is recommended that where available, any automatic (or + manual) triggered recovery mechanism behaviour utilises such targeted + means in preference to any whole RIB refresh mechanism (such as + ROUTE-REFRESH). - 2. Where a specific refresh mechanism is not available, a peer - should re-request the entire RIB. 
Again, there is a requirement - to limit the number of complete RIB requests that should be sent - via an implementation, in order to provide a bound both on the - expected level of load a device may experience, and on the time - for which the RIB may be inconsistent. + In the case that an erroneous UPDATE has been processed through a + means such as treat-as-withdraw (described within Section 3), a + recovering mechanism may be considered superfluous, if the assumption + is made that the RIB inconsistency will only be recovered from based + on a path re-convergence (or change in BGP attribute) for the + advertising BGP speaker. However, where this assumption is not + considered to provide adequate recovery behaviour, and a mechanism to + restore RIB consistency automatically is implemented, some + consideration must be made for where repeated erroneous messages + occur. In this case, in order to limit the impact to the BGP + speaker's network operation, at a pre-defined point it is recommended + that such automatic recovery mechanisms towards the BGP speaker from + which erroneous UPDATEs are repeatedly received are suppressed, and + the fact that such suppression has occurred is highlighted to an + operator. The point at which such behaviour is suppressed is to be + defined on a per-implementation basis, taking into account feedback + from the Network Operator community based on the deployment of the + recommendations described in this document. It is expected that such + trigger points are dependent upon the mechanisms implemented for a + particular BGP-4 implementation, and the impact upon the speaker of + these means of RIB recovery. - 3. Finally, a session reset should be performed, as per the reduced- - impact NOTIFICATION requirement defined in Section 5. At this - point, a similar challenge to that discussed above exists, should - the error condition persist. 
In this case, as defined above, - there is a requirement to ignore those UPDATE messages that - continue to be erroneous. + Where critical errors are experienced, such that a session reset is + required, the mechanism discussed in Section 5 should be used. + Again, since such a mechanism results in a restart of a BGP session, + it is expected that all NLRI carried over the session is re-advertised + as it is re-established, incurring processing overhead on both the + advertising and receiving BGP speaker. In order to minimise the + consumption of control-plane computational resource on both speakers, + it is recommended that mechanisms allowing a reduced set of BGP + UPDATE messages to be re-transmitted between two speakers are + employed wherever possible - for instance through employing + mechanisms such as those described in [I-D.ietf-idr-enhanced-gr]. - It is envisaged that where limits are required, these will be defined - on a per memo-basis, or within a further revision of the requirements - described herein. + In the case that repeated critical errors occur, the overhead of + performing any mechanism implemented based on the requirements in + Section 5 is incurred following each erroneous UPDATE message. Since + these mechanisms are, by definition, performed automatically in + response to the erroneous message being received, similar + considerations as to the impact to the BGP speaker must be taken into + account. As such, it is expected that after a certain trigger level, + the ongoing receipt of critical errors within BGP UPDATE messages is + deemed to be indicative of a long-lasting failure, and a session is no + longer considered viable. Where such a case is experienced, it is + expected that the BGP session reverts to the standard session failure + behaviour, as described in [RFC4271] and documents updating this base + standard. Where such a reversion is implemented this condition + should be flagged to a network operator. 
+ The number of restart
+ attempts before the session reverts to being shut down should be
+ determined based on the overhead of the recovery mechanisms
+ implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is
+ implemented, the impact of session restart may be significantly
+ lower), and operational experience of the deployment of the
+ recommendations described in this document.

- Whilst the approach described above provides a standard means by
- which error recovery may be handled on a per UPDATE basis, further
- complexities are raised where multiple errors occur. Clearly,
- following this procedure causes control-plane load on both the BGP
- speakers - for this reason, consideration of how repeated use of the
- mechanisms discussed in this document is required. It is notable
- that errors may not occur with UPDATE messages relating to only a
- single NLRI, independent errors in multiple NLRIs may be experienced.
- For this reason, it is required that an implementation rate limits
- the number of error handling events sourced towards a particular
- neighbour. It is expected that such rate limiting, or event
- suppression is achieved on a per-session basis, where state
- information is already held, rather than on a per-prefix basis as it
- is envisaged that such behaviour presents significant scaling
- problems, and introduces further state requirements for an
- implementation of the protocol. It is recommended that where a flag
- indicative of erroneous behaviour is implemented, the state of such a
- value is maintained independently of session establishment.
+ Since repeated erroneous UPDATE messages which experience critical
+ errors may be indicative of long-lasting failure modes, it is
+ recommended that a back-off from restarting BGP sessions experiencing
+ such behaviour is implemented.
+ As such, this is not applicable to
+ restart behaviour through means such as those described in Section 5
+ since such restarts are time-bound based on the period for which the
+ Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when
+ considering BGP Graceful Restart, such restarts are time-bound by the
+ Restart Time described in [RFC4724]). However, following a session
+ reverting to being pulled down based on repeated error conditions, it
+ is recommended that following restart attempts are subject to an
+ exponentially increasing interval between subsequent attempts. It is
+ therefore recommended that in such cases an implementation implements
+ the increasing values of IdleHoldTimer as described in the BGP-4 FSM
+ documented in [RFC4271].

 7.1. Reducing the Network Impact of Session Teardown

- In some cases, where repeated erroneous UPDATE messages are received
- on a BGP-4 session, it is desirable that a BGP speaker disconnects
- completely from the remote peer without performing a restart, in
- order to avoid the control-plane overhead of repeated session
- establishment, and subsequent reset events. This behaviour may be
- required after a per-session flag indicating erroneous behaviour is
- set, as discussed in Section 7. The BGP-4 specification presented in
- [RFC4271] achieves such a session shutdown by sending a NOTIFICATION
- message, however, this has the net result that all downstream BGP
- speakers (i.e. those to whom the NLRI carried over the now ceased BGP
- session was readvertised) must withdraw this NLRI from their RIB, and
- perform a best-path selection if required. In some cases, there may
- be no alternate path being available, and hence a period of time for
- which no valid BGP route exists.
- Particularly, this is very likely
- to occur where an upstream BGP speaker performs a best-path selection
- and advertises only a single path to its neighbours - there is a
- requirement for the upstream speaker to perform a best-path
- selection, and re-advertise a new set of NLRI before the downstream
- system is able to converge to a new path. It should be noted that
- where UPDATE messages withdrawing NLRI are not subject to the BGP
- session's configured MinRouteAdvertisementInterval (MRAI) [RFC4271],
- but re-advertisements are, this may result in a BGP speaker being
- without a path for a period up to the MRAI.
+ As discussed within the preceding section, where repeated critical
+ UPDATE message errors are received, it is recommended that the impact
+ to both the advertising and receiving BGP-4 speakers be limited by
+ reverting to tearing down the BGP-4 session experiencing such errors.
+ The BGP-4 specification presented in [RFC4271] achieves such a
+ session shutdown by sending a NOTIFICATION message, however, this has
+ the net result that all downstream BGP speakers (i.e. those to whom
+ the NLRI carried over the now ceased BGP session was readvertised)
+ must withdraw this NLRI from their RIB, and perform a best-path
+ selection if required. In some cases, there may be no alternate path
+ available, and hence a period of time for which no valid BGP
+ route exists. Particularly, this is very likely to occur where an
+ upstream BGP speaker performs a best-path selection and advertises
+ only a single path to its neighbours - there is a requirement for the
+ upstream speaker to perform a best-path selection, and re-advertise a
+ new set of NLRI before the downstream system is able to converge to a
+ new path.
+ It should be noted that where UPDATE messages withdrawing
+ NLRI are not subject to the BGP session's configured
+ MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements
+ are, this may result in a BGP speaker being without a path for a
+ period up to the MRAI.

 Clearly, it is advantageous to avoid this period of time for which
 there may be no reachability for a set of NLRI, especially since the
 BGP speaker terminating a particular session is doing so due to a
 particular error handling policy. The graceful shutdown mechanism
- detailed in [I-D.francois-bgp-gshut] provides a mechanism by which a
+ detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a
 BGP speaker is able to signal that a set of NLRI is to be withdrawn,
 and hence allow downstream systems to pre-emptively perform a
 best-path selection, and hence advertise new reachability information
 in a make-before-break manner. It is therefore envisaged that where a
 session is to be shut down, based on a trigger relating to erroneous
 UPDATE messages being received (be they repeated or not), the
 graceful shutdown procedure is utilised, so as to reduce the
 forwarding impact of NLRI received on the session being withdrawn.
@@ -892,27 +909,27 @@
 Some mechanisms meeting the requirements specified in this document,
 particularly those within Section 6 may provide further security
 concerns, however, it is envisaged that these are addressed in
 per-enhancement memos.

 10. Acknowledgements

 The author would like to thank the following network operators for
 their insight, and valuable input in defining the requirements for a
 variety of operational deployments of the BGP-4 protocol; Shane
- Amante, Bruno Decraene, Rob Evans, David Freedman, Tom Hodgson, Sven
- Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom Scholl and
- Ilya Varlashkin.
+ Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom
+ Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom
+ Scholl and Ilya Varlashkin.

 In addition, many thanks are extended to Jeff Haas, Wim Hendrickx,
- Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert Raszuk
- for their expertise relating to implementations of the BGP-4
+ Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert
+ Raszuk for their expertise relating to implementations of the BGP-4
 protocol.

 11. References

 11.1. Normative References

 [RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
    "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

 [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
@@ -929,65 +946,68 @@
    (IBGP)", RFC 4456, April 2006.

 [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
    Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
    January 2007.

 [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
    "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

- [RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement
-    with BGP-4", RFC 5492, February 2009.
-
 11.2. Informational References

 [I-D.chen-ebgp-error-handling]
    Chen, E., Mohapatra, P., and K. Patel, "Revised Error
    Handling for BGP Updates from External Neighbors",
    draft-chen-ebgp-error-handling-01 (work in progress),
    September 2011.

- [I-D.francois-bgp-gshut]
-    Francois, P., Decraene, B., pelsser, c., and C. Filsfils,
-    "Graceful BGP session shutdown",
-    draft-francois-bgp-gshut-01 (work in progress),
-    March 2009.
-
- [I-D.frs-bgp-operational-message]
-    Raszuk, R., Shakir, R., and D. Freedman, "BGP OPERATIONAL
-    Message", draft-frs-bgp-operational-message-00 (work in
-    progress), July 2011.
+ [I-D.ietf-grow-bgp-gshut]
+    Francois, P., Decraene, B., Pelsser, C., Patel, K., and C.
+    Filsfils, "Graceful BGP session shutdown",
+    draft-ietf-grow-bgp-gshut-03 (work in progress),
+    December 2011.

 [I-D.ietf-grow-bmp]
    Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
    Protocol", draft-ietf-grow-bmp-06 (work in progress),
    December 2011.

+ [I-D.ietf-idr-bgp-enhanced-route-refresh]
+    Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
+    Route Refresh Capability for BGP-4",
+    draft-ietf-idr-bgp-enhanced-route-refresh-01 (work in
+    progress), December 2011.
+
+ [I-D.ietf-idr-bgp-gr-notification]
+    Patel, K., Fernando, R., and J. Scudder, "Notification
+    Message support for BGP Graceful Restart",
+    draft-ietf-idr-bgp-gr-notification-00 (work in progress),
+    December 2011.
+
+ [I-D.ietf-idr-enhanced-gr]
+    Patel, K., Chen, E., Fernando, R., and J. Scudder,
+    "Accelerated Routing Convergence for BGP Graceful
+    Restart", draft-ietf-idr-enhanced-gr-00 (work in
+    progress), December 2011.
+
+ [I-D.ietf-idr-operational-message]
+    Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL
+    Message", draft-ietf-idr-operational-message-00 (work in
+    progress), March 2012.
+
 [I-D.ietf-idr-optional-transitive]
    Scudder, J., Chen, E., Mohapatra, P., and K. Patel,
    "Revised Error Handling for BGP UPDATE Messages",
    draft-ietf-idr-optional-transitive-04 (work in progress),
    October 2011.

- [I-D.keyupate-idr-bgp-gr-notification]
-    Patel, K., Fernando, R., Scudder, J., and J. Haas,
-    "Notification Message support for BGP Graceful Restart",
-    draft-keyupate-idr-bgp-gr-notification-00 (work in
-    progress), July 2011.
-
- [I-D.keyur-bgp-enhanced-route-refresh]
-    Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
-    Route Refresh Capability for BGP-4",
-    draft-keyur-bgp-enhanced-route-refresh-02 (work in
-    progress), March 2011.
-
 [I-D.zeng-one-time-prefix-orf]
    Zeng, Q. and J. Dong, "One-time Address-Prefix Based
    Outbound Route Filter for BGP-4",
    draft-zeng-one-time-prefix-orf-01 (work in progress),
    October 2010.

 [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
    (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.
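
As an illustration only, the restart policy recommended in the revised Section 7 text of this diff might be sketched as below: a bounded number of reduced-impact restarts, then reversion to the [RFC4271] session teardown with an operator alert, then exponentially increasing IdleHoldTimer values. The class name, the action strings, the trigger level of three restarts and the timer values are all assumptions made for this sketch; the draft leaves these points to be defined on a per-implementation basis.

```python
from dataclasses import dataclass


@dataclass
class SessionErrorPolicy:
    """Hypothetical per-session policy for repeated critical UPDATE errors."""

    max_reduced_impact_restarts: int = 3   # assumed trigger level
    initial_idle_hold: float = 60.0        # assumed first IdleHoldTimer, seconds
    max_idle_hold: float = 3600.0          # assumed upper bound, seconds
    critical_errors: int = 0
    idle_hold: float = 0.0                 # 0 until the session has reverted

    def on_critical_error(self) -> str:
        """Record one critical error and return the action to take."""
        self.critical_errors += 1
        if self.critical_errors <= self.max_reduced_impact_restarts:
            # Below the trigger level: restart as per Section 5.
            return "reduced-impact-restart"
        if self.idle_hold == 0.0:
            # Trigger level exceeded for the first time: revert to the
            # standard teardown and flag the condition to the operator.
            self.idle_hold = self.initial_idle_hold
            return "teardown-and-alert-operator"
        # Already reverted: double the wait before the next attempt.
        self.idle_hold = min(self.idle_hold * 2, self.max_idle_hold)
        return "teardown"
```

With the assumed defaults, the fourth critical error on a session causes the reversion with a 60 second hold, and each further error doubles the hold until the cap is reached. State is kept per session rather than per prefix.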