Cisco NX-OS: vPC & Failures

vPC Orphan Ports

– Traffic from a remote orphan port is allowed to cross the Peer Link and exit via a local Member Port
– Traffic from a remote Member Port is allowed to cross the Peer Link and exit via a local orphan port
– Orphan ports should be avoided wherever possible because the Peer Link (PL) is a bottleneck of the system

Ideal: vPC Peers only have vPC Member Ports and all downstream devices are dual attached
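
To check how close a pair is to that ideal, the orphan ports on each peer can be listed. This is a minimal sketch; the NXOS1 prompt is reused from the session later in these notes, and the output is omitted because it depends on the topology.

```
! List ports that carry vPC VLANs but are not vPC Member Ports
NXOS1# show vpc orphan-ports
```

Running the same command on the other peer gives the full picture, since each switch reports only its own orphan ports.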

vPC Consistency Checks

  • Type 1 Global and Interface Consistency Check
    • if global mismatch – vPC fails to form
    • if interface mismatch – VLANs are suspended
  • Type 2 Consistency Check
    • if mismatch – a log message is generated but the vPC does not fail; data plane failures are still possible
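
The parameters compared by these checks can be inspected on either peer. A hedged sketch: port-channel 10 is a hypothetical vPC member port-channel, not one taken from these notes.

```
! Global parameters compared between the two peers
NXOS1# show vpc consistency-parameters global
! Per-interface parameters for one vPC (port-channel 10 is an assumed example)
NXOS1# show vpc consistency-parameters interface port-channel 10
```

Comparing the local and peer values in the output should point to the mismatched parameter.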

Failure: vPC peer-link failure (link loss)

  • Secondary waits for the hold-timeout and keepalive timeout to expire while trying to reach the Primary over the Keepalive link
  • After timers expire
    • if vPC Primary is alive:
      • disable Member port on Secondary
      • disable SVI on Secondary
      • => Secondary is disabled => force all traffic to go over Primary
    • if vPC Primary is dead:
      • promote vPC Secondary to Operational Primary
      • traffic over new vPC Primary

1. if vPC Primary is alive:

NXOS1(config)# int po50
NXOS1(config-if)# shutdown
2019 Oct 22 05:15:26 NXOS1 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: 
Peer-link going down, suspending all vPCs on secondary. 
If vfc is bound to vPC, then only ethernet vlans of that VPC shall be down.
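
To confirm the resulting state, the following show commands could be run on the Secondary. This is only a sketch; output is not reproduced here because it varies by platform and release.

```
! Peer-link status and the state of each vPC (suspended on the Secondary)
NXOS1# show vpc brief
! Keepalive status and the configured keepalive timers
NXOS1# show vpc peer-keepalive
```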

If the vPC Primary then fails completely (failure, reboot, or Keepalive link down):

  • Secondary already has Member Ports suspended and they don’t come back
  • Secondary does not continually check for vPC Primary
  • Result: now both vPC Primary and Secondary are disabled

Solution: vPC Auto-Recovery

Allows the vPC Secondary to assume the Primary role in certain failure scenarios:

  • Power outage
    • both vPC peers are rebooted
    • one vPC peer does not come up; the second peer is ready to form the vPC, but there is nothing on the other end
    • with Auto-Recovery: if the Peer Link does not come up before the Auto-Recovery timeout, the peer promotes itself to Primary and brings up its Member Ports
  • vPC Peer Link goes down
    • Secondary waits for the hold-timeout and keepalive timeout to expire
    • keepalive received => suspend vPC Member Ports
    • Primary then fails completely or reboots
    • with Auto-Recovery: the vPC Secondary keeps checking for keepalives; when they stop arriving, it promotes itself to Primary and un-suspends its vPC Member Ports

No preemption! vPC has no role preemption, so the original Primary does not reclaim the role when it comes back.

Strong Recommendation: Always enable vPC auto-recovery on both vPC peer devices
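
A minimal configuration sketch for that recommendation, using domain ID 1 as it appears in these notes. The same lines would be entered on the other peer as well. The command also accepts an optional reload-delay value, which is left at its default here.

```
NXOS1(config)# vpc domain 1
NXOS1(config-vpc-domain)# auto-recovery
NXOS1(config-vpc-domain)# end
! Check that auto-recovery now appears under the vPC domain
NXOS1# show running-config vpc
```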

Auto-Recovery Problem: if both the Peer Link and the Keepalive link go down while both vPC peers are still running, the Secondary promotes itself to Primary => Split Brain => data traffic issues

Resolve: power off the Secondary or run no feature vpc on it
=> build redundancy into everything (power, supervisors, etc.)
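
A hedged sketch of the second option. It assumes the commands are run on the peer that was Secondary before the failure; the NXOS1 prompt is only a placeholder. Note that disabling the feature removes the vPC configuration from that switch, so it would need to be reapplied once the links are restored.

```
! In a split brain, both peers report the Primary role
NXOS1# show vpc role
! Take this peer out of the vPC domain (this removes its vPC configuration)
NXOS1# configure terminal
NXOS1(config)# no feature vpc
```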

vpc domain 1
Both Peer and Keep-alive Links are down