3 October 2016

Modifying / Upgrading / Reloading a Cisco Router or Switch

There are many ways to place routers and switches into "maintenance mode". Unlike on other systems, however, this "maintenance" state has little to do with the device itself and much more to do with what all the other devices on the network think of the device being worked on.

There is no one-size-fits-all approach, i.e. you have to do whatever your infrastructure and the protocols in use require in order to divert all network traffic away from and around the device.

Other mechanisms exist, including ISSU (In-Service Software Upgrade), NSF/SSO (Non-Stop Forwarding with Stateful Switchover) with GR (Graceful Restart), and NSR (Non-Stop Routing). These licensed technologies allow a router to be upgraded or reloaded without downtime, and are single-node HA (High Availability) techniques.

The method described here does not require the hardware and software licenses for these premium features. Instead it relies on having a well-designed network with redundancy: you simply push the production traffic onto other routers, leaving you free to take the device offline for as long as needed.

This rather coarse technique is especially useful if you want to make changes to the topology and neighbour relationships that a router might have. 
 

        1) Backup Config and Firmware. (scp preferred)
        2) Place router into maintenance mode. (divert all traffic away from and around this router)
        3) Upgrade router firmware and Reload. (if upgrading)
        4) Make architecture/config changes. (service affecting changes)
        5) Checks: BFD peers, OSPF adjacencies, tunnels, MPLS neighbors, etc.
        6) Take Router out of maintenance mode. (OSPF and BGP)


1) Backup Config and Firmware

a) copy running-config startup-config <- Saves config to NVRAM/Flash

b) Enable the SCP server on the Cisco device:
This assumes you already have SSH version 2 working. For SCP file copies to work, you need to be able to log in and be dropped directly into privileged mode (without having to type enable).

aaa new-model 
aaa authentication login default local
username <user> privilege 15 secret <secretpassword>
aaa authorization exec default local <- Drops you at '<host>#' without requiring 'enable' (recommended to remove this after works are complete)
ip scp server enable <- Enable SCP (recommended to remove this after works are complete)

c) Back up the existing firmware, configuration and any other important files

scp admin@172.16.1.1:flash:/c3825-adventerprisek9-mz.150-1.M1.bin /var/tmp
scp admin@172.16.1.1:flash:/vlan.dat /var/tmp
scp admin@172.16.1.1:nvram:/startup-config /var/tmp

NB: On many platforms the startup-config is stored in flash: rather than nvram: (the running-config lives in system:). Use the command dir all to find yours.
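If you back up several devices, or the same device repeatedly, it can help to script the copies above. A minimal Python sketch, assuming OpenSSH's scp is on your PATH; the host and file list are placeholders copied from the examples above, the hostname r1 in the usage line is made up, and the helper names are hypothetical:

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Placeholder values copied from the examples above; adjust for your device.
ROUTER = "admin@172.16.1.1"
REMOTE_FILES = [
    "flash:/c3825-adventerprisek9-mz.150-1.M1.bin",
    "flash:/vlan.dat",
    "nvram:/startup-config",
]


def backup_dir(base: str, hostname: str, when: datetime) -> Path:
    """One directory per device per run, e.g. /var/tmp/r1-20161003-1400."""
    return Path(base) / f"{hostname}-{when:%Y%m%d-%H%M}"


def backup_commands(router: str, remote_files: list, dest: Path) -> list:
    """Build one scp argument list per file; scp keeps the remote file name."""
    return [["scp", f"{router}:{remote}", str(dest)] for remote in remote_files]


def run_backup(router: str, remote_files: list, dest: Path) -> None:
    """Create the destination directory and copy each file in turn."""
    dest.mkdir(parents=True, exist_ok=True)
    for argv in backup_commands(router, remote_files, dest):
        # check=True raises CalledProcessError on the first failed copy
        subprocess.run(argv, check=True)
```

E.g. run_backup(ROUTER, REMOTE_FILES, backup_dir("/var/tmp", "r1", datetime.now())). Passing an argument list to subprocess.run means no shell is involved, and the timestamped directory stops a later run from overwriting an earlier backup.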

d) Upload new firmware (if required)

scp /var/tmp/c3825-adventerprisek9-mz.151-4.M10.bin admin@172.16.1.1:flash:/c3825-adventerprisek9-mz.151-4.M10.bin

e) Verify that your uploaded firmware's MD5 matches the one shown on the Cisco site:

verify /md5 filesystem:filename [md5-hash]
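It's also worth running the same check on your workstation copy before step d, so a corrupted download never reaches flash. A small Python sketch using hashlib; the function names are just illustrative:

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so a large image is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: str, published_md5: str) -> bool:
    """True if the local file matches the published hash (surrounding whitespace and case ignored)."""
    return file_md5(path) == published_md5.strip().lower()
```

E.g. image_matches("/var/tmp/c3825-adventerprisek9-mz.151-4.M10.bin", "<md5 from the Cisco download page>").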

2) Place Cisco router into Maintenance mode

 The general idea behind preparing a router for maintenance work is to make all of the router's routes less preferable at the control plane, whilst keeping the forwarding plane fully functional (thus minimising change impact and route flapping).

 For IGPs (OSPF in this example), this is done by increasing all the route costs/metrics, causing other routers to rerun SPF and converge traffic gracefully, without loss, via other paths where available. For BGP, this is done by gracefully shutting down all BGP neighbour sessions.

 If you have combined routers and switches (e.g. Cisco ME3600Xs) and you are using Layer 2 FHRPs like VRRP, you will also need to move the FHRP master roles away (and possibly any connected customer equipment roles too).

 Should the maintenance expose some priority traffic that is still flowing over the router, pause the works: it's an ideal opportunity to analyse and fix any broken redundancy before doing any reloads and actually breaking the forwarding plane ;)


Before you begin:
 It is recommended to perform as many checks as possible on the router being worked on, and on all the routers connected to it, to ensure that all IGP and EGP peerings are up and that all route-maps, prefix-lists, redistributions etc. are correct, so that other routers can take over all the traffic when the router under maintenance goes down.

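 It is best to connect to the router's loopback address for maintenance work, since (assuming the loopback is advertised in your IGP) it stays reachable via any remaining path while individual interfaces go down.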

 It's also a good idea to use "configure terminal revert timer 10" before you start, and "configure confirm" once you are happy (and before reloading), so that your changes are rolled back automatically should something bad happen.


 a) BGP (Layer 3 EGP):
router bgp <ASNumber>
 bgp graceful-shutdown all neighbors 200 local-preference 10 <- Send GSHUT, set Local Preference on all routes to 10, and close neighbour sessions after 200 seconds
 bgp graceful-shutdown all neighbors activate <- Activate the graceful-shutdown config above, allowing lossless re-convergence to other paths while this router is still forwarding.


NB: Wait at least 200 seconds before making the router unavailable or unable to route packets, to ensure that no neighbours are still holding routes towards this router.
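To see why the lowered local preference (and the GSHUT community) moves traffic away, here is a simplified Python model of what a neighbour does with the routes it receives. It assumes an iBGP peer that sees the lowered local preference directly, and an eBGP peer (local preference is never sent over eBGP) whose inbound policy matches the well-known GRACEFUL_SHUTDOWN community 65535:0 and sets local preference 0, as RFC 8326 recommends. The next-hop addresses are made up, and real best-path selection has many more tie-breakers:

```python
from dataclasses import dataclass, field

GSHUT = (65535, 0)        # well-known GRACEFUL_SHUTDOWN community (RFC 8326)
DEFAULT_LOCAL_PREF = 100  # IOS default local preference


@dataclass
class BgpPath:
    next_hop: str
    local_pref: int = DEFAULT_LOCAL_PREF
    communities: frozenset = field(default_factory=frozenset)


def apply_gshut_policy(path: BgpPath) -> BgpPath:
    """Receiver-side inbound policy: paths tagged with GSHUT get local-pref 0."""
    if GSHUT in path.communities:
        return BgpPath(path.next_hop, 0, path.communities)
    return path


def best_path(paths: list) -> BgpPath:
    """Simplified selection: highest local-pref wins. max() keeps the first path on a tie,
    standing in for the remaining best-path steps (AS-path length, MED, ...)."""
    return max(paths, key=lambda p: p.local_pref)
```

With a path via the router under maintenance (10.0.0.1) and an alternate (10.0.0.2), both at the default local preference, the first path is kept. Once the iBGP copy arrives with local-preference 10, or the eBGP copy arrives tagged with GSHUT and is demoted by the policy, best_path picks the path via 10.0.0.2, while the old path is still usable during the 200-second window.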


 b) OSPF (Layer 3 IGP):
router ospf 1
 max-metric router-lsa <- Advertise the maximum metric on this router's transit links in its Router LSA
 auto-cost reference-bandwidth 4294967 <- Raise the cost of all interfaces (4294967 Mbps is the maximum reference bandwidth)
 no default-information originate metric-type 1
 no default-information originate <- Assuming you are originating a default route into the IGP on this router
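A rough Python sketch of the arithmetic behind the reference-bandwidth trick, assuming IOS-style integer truncation of reference bandwidth (in Mbps) divided by interface bandwidth (in kbps), clamped to 1..65535. Both commands only change this router's own outgoing costs, which is what matters for traffic transiting it. Note that max-metric router-lsa sets transit links to 0xFFFF but by default leaves stub networks at their normal cost, and auto-cost has no effect on interfaces with a manual ip ospf cost, so check those separately. The link costs in the example are illustrative:

```python
MAX_INTERFACE_COST = 65535  # OSPF interface cost is a 16-bit value
MAX_METRIC = 0xFFFF         # metric that 'max-metric router-lsa' puts on transit links


def auto_cost(ref_bw_mbps: int, if_bw_kbps: int) -> int:
    """Interface cost = reference bandwidth / interface bandwidth, truncated and clamped to 1..65535.
    The reference is configured in Mbps, the interface 'bandwidth' in kbps."""
    cost = (ref_bw_mbps * 1000) // if_bw_kbps
    return max(1, min(cost, MAX_INTERFACE_COST))


def path_cost(hop_costs: list) -> int:
    """An OSPF path cost is the sum of the outgoing interface costs along the path."""
    return sum(hop_costs)
```

With the reference bandwidth at 10000 (the value restored in step 6), a 1 Gbps link costs auto_cost(10000, 1_000_000) = 10. At 4294967 the same link costs 4294, so a two-hop path through this router jumps from 20 to 4304 and a three-hop alternative at cost 30 wins.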

 c) VRRP (Layer 2 FHRP):
Verify that all VRRP states are "Backup":
 show vrrp
 show vrrp brief

If the router is "Master", you will need to reduce its VRRP priority,
e.g. for the VRRP IP on a Vlan 26 SVI:
interface Vlan 26
 vrrp 26 priority 1

You may also need to check that customer devices which use these VRRP next hops have also moved their master roles away from the switches being worked on.
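Lowering the priority works because of how VRRP elects its master: the highest priority wins, and a tie goes to the highest primary IP address. A minimal Python model, assuming preemption is enabled (the IOS default for VRRP) so the change takes effect straight away; the switch names and addresses are hypothetical:

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VrrpRouter:
    name: str
    priority: int    # 1-254 configurable; 255 is reserved for the address owner
    primary_ip: str  # the router's own address on the VRRP interface


def elect_master(routers: list) -> VrrpRouter:
    """Highest priority wins; a tie is broken by the highest primary IP address.
    Assumes preemption is enabled, so a better router takes over immediately."""
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))
```

E.g. with sw1 at priority 110 and sw2 at the default 100, sw1 is master; after vrrp 26 priority 1 on sw1, elect_master returns sw2.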



3/4) Make configuration changes / Upgrade / Reload router

Do your thang.
 ... reload (etc)

5) Checks

show ip interface brief
show interface status
show bfd neighbors
show ip ospf neighbor
show ip interface tunnel 0
show mpls ldp neighbor
etc...
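Comparing against a capture taken before step 2 makes it much easier to spot an adjacency that didn't come back. A small Python sketch that diffs two saved copies of show ip ospf neighbor output; it assumes the usual IOS column layout (neighbor ID first, state third, interface last), so adjust the parsing for other platforms, and the function names are hypothetical:

```python
import ipaddress


def parse_ospf_neighbors(output: str) -> dict:
    """Map (neighbor ID, interface) -> state keyword (FULL, 2WAY, INIT, ...)
    from 'show ip ospf neighbor' text. Lines not starting with an IPv4 router ID are skipped."""
    neighbors = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue  # header or other non-neighbour line
        state = fields[2].split("/")[0]  # "FULL/DR" -> "FULL"
        neighbors[(fields[0], fields[-1])] = state
    return neighbors


def adjacency_changes(before: str, after: str) -> list:
    """Report adjacencies that disappeared or changed state since the pre-maintenance capture."""
    pre, post = parse_ospf_neighbors(before), parse_ospf_neighbors(after)
    problems = []
    for (nbr, intf), state in sorted(pre.items()):
        if (nbr, intf) not in post:
            problems.append(f"{nbr} on {intf}: missing")
        elif post[(nbr, intf)] != state:
            problems.append(f"{nbr} on {intf}: {state} -> {post[(nbr, intf)]}")
    return problems
```

Because states are compared with the earlier capture rather than with FULL, neighbours that legitimately sit in 2WAY (DROTHER to DROTHER) are only flagged if their state actually changes.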

6) Take Router out of maintenance mode 

router ospf 1
 default-information originate metric-type 1
 auto-cost reference-bandwidth 10000
 no max-metric router-lsa
 max-metric router-lsa on-startup wait-for-bgp <- On future reloads, advertise the maximum metric until BGP has settled down and received all routes

router bgp 60868
 no bgp graceful-shutdown all neighbors activate <- Bring back all BGP sessions


Don't forget to restore your Layer 2 HSRP/VRRP priorities.



http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/sec_usr_ssh/configuration/15-s/sec-usr-ssh-15-s-book/sec-secure-copy.html 
http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/configuration/xe-3s/irg-xe-3s-book/configuring-bgp-graceful-shutdown.pdf
http://www.cisco.com/c/en/us/about/security-center/ios-image-verification.html

