NCS5500 QoS Part 2 - Verifying Buffering in Lab and Live Networks

You can find more content related to the NCS5500, including routing memory management, VRF, uRPF, ACLs, and Netflow, by following this link.

You can also find the first part of this post here:
https://xrdocs.io/ncs5500/tutorials/ncs5500-qos-part-1-understanding-packet-buffering/

Checking Buffering in action

This second blog post will take concrete examples to illustrate the concepts covered in the first part.
The NCS5500 is based on a VOQ-only, single-lookup and ingress-buffering forwarding architecture.
We will use a lab example to illustrate how the system handles bursts, then we will present the monitoring tools / counters we can use to measure where packets are buffered, and finally we will present the data collected on 500+ NPUs in production.
This should answer frequently asked questions and clarify all potential doubts.

Video

We recommend watching this short YouTube video first:
https://www.youtube.com/watch?v=1qXD70_cLK8

Lab test

For this first part, and following customer requests, we built a large test bed:

  • NCS5508 with two line cards 36x100G-A-SE (each card is made of 4x NPU Jericho+, each one handling 9 ports 100G)
  • 27 tester ports 100GE (Spirent) connected to 27 router ports
    • 9 ports on LC 4 NPU 0
    • 9 ports on LC 4 NPU 1
    • 9 ports on LC 6 NPU 0

We generate constant background traffic at 80% of line rate (80 Gbps on each port) between two NPUs. This bi-directional traffic is displayed in purple in the diagram above.
Then we use the remaining 9 ports to generate traffic peaks targeted at the ports Hu0/6/0/0-8 (shown in red in the diagram).
These bursts last 100 ms and repeat every second.

On the tester, we didn’t make any specific PPM adjustment and used the internal clock.

On the router, no specific configuration either: interfaces are simply configured with IPv4 addresses (no QoS).

The tests performed are the following:

  • test 1: 80% background and all the ports bursting at 20%. That means:
    • 900 ms at 80% LR
    • 100 ms at 100% LR
      We verify that no packets are dropped on the background or the bursts, and we also make sure with the counters that packets are handled exclusively in the OCB.
  • test 2:
    • 80% background
    • other ports bursting at 20%
    • one single port bursting at 25%, creating a 5 Gbps saturation for 100 ms
      Here again, we verify that no packets are dropped on the background or the bursts, but we also verify that packets are sent to the DRAM.
      This is expected, since one queue exceeds the threshold and is evicted to the external buffer.
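
The arithmetic behind test 2 can be sketched in a few lines. This is a back-of-the-envelope check, not a platform specification: the exact per-VOQ eviction threshold is release dependent and is an assumption here.

```python
# Back-of-the-envelope arithmetic for test 2: one 100GE port is offered
# 105% of line rate (80% background + 25% burst) for 100 ms.
line_rate_gbps = 100
offered_gbps = 80 + 25
burst_ms = 100

excess_gbps = offered_gbps - line_rate_gbps              # 5 Gbps of oversubscription
backlog_bits = excess_gbps * 10**9 * burst_ms // 1000    # bits queued during the burst
backlog_bytes = backlog_bits // 8

print(backlog_bytes)  # 62500000 -> 62.5 MB of backlog for that single queue
# 62.5 MB far exceeds what the on-chip buffer can dedicate to one queue,
# so the VOQ crosses its eviction threshold and is moved to the external DRAM.
# (Exact threshold values are platform/release dependent -- an assumption here.)
```

This is why test 2, unlike test 1, must show packets counted toward the DRAM.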

This test is basic, but several customers requested it to verify that no drops occur in such situations. It proves there are none, as designed.

Metrology

In the previous test, we used specific counters to verify the buffering behavior.
Let’s review them.

We will collect the following Broadcom counters:

  • IQM_EnqueuePktCnt: total number of packets handled by the NPU
  • IDR_MMU_CREDITS: total number of packets moved to DRAM
  • IQM_EnqueueDscrdPktCnt: total number of packets dropped because of taildrop
  • IQM_RejectDramIneligiblePktCnt: total number of packets dropped because the DRAM was not accessible, typically when the bandwidth to the DRAM is saturated
  • and potentially also IDR_FullDramRejectPktsCnt and IDR_PartialDramRejectPktsCnt

From the CLI “show controller npu stats counters-all instance all location all”, we can extract ENQUEUE_PKT_CNT, MMU_IDR_PACKET_COUNTER and ENQ_DISCARDED_PACKET_COUNTER:


RP/0/RP0/CPU0:ROUTER#show controller npu stats counters-all instance all location all

FIA Statistics Rack: 0, Slot: 0, Asic instance: 0

Per Block Statistics:

Ingress:

NBI RX:
  RX_TOTAL_BYTE_COUNTER          = 161392268790033002
  RX_TOTAL_PKT_COUNTER           = 164628460653364

IRE:
  CPU_PACKET_COUNTER             = 0
  NIF_PACKET_COUNTER             = 164628460651867
  OAMP_PACKET_COUNTER            = 32771143
  OLP_PACKET_COUNTER             = 4787508
  RCY_PACKET_COUNTER             = 67452938
  IRE_FDT_INTRFACE_CNT           = 192

IDR:
  MMU_IDR_PACKET_COUNTER         = 697231761913
  IDR_OCB_PACKET_COUNTER         = 1

IQM:
  ENQUEUE_PKT_CNT                = 164640311902277
  DEQUEUE_PKT_CNT                = 164640311902198
  DELETED_PKT_CNT                = 0
  ENQ_DISCARDED_PACKET_COUNTER   = 90015441
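
For repeated collections, the relevant counters can be scraped from this show command output with a short script. This is a sketch (function name is mine); it picks up the first ASIC instance found in the text:

```python
import re

# Counters of interest in the output of
# "show controller npu stats counters-all instance all location all".
COUNTERS = (
    "ENQUEUE_PKT_CNT",
    "MMU_IDR_PACKET_COUNTER",
    "ENQ_DISCARDED_PACKET_COUNTER",
)

def parse_npu_counters(cli_output):
    """Return {counter_name: value} for the first ASIC instance in the output."""
    values = {}
    for name in COUNTERS:
        # Lines look like "  ENQUEUE_PKT_CNT                = 164640311902277"
        m = re.search(rf"\b{name}\s*=\s*(\d+)", cli_output)
        if m:
            values[name] = int(m.group(1))
    return values
```

Fed with the sample output above, it returns ENQUEUE_PKT_CNT = 164640311902277, MMU_IDR_PACKET_COUNTER = 697231761913 and ENQ_DISCARDED_PACKET_COUNTER = 90015441.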

To get the DRAM reject counters, we will use:

  • show contr npu stats counters-all detail instance all location all
    or if the IOS XR version doesn’t support the “detail” option, use the following instead:
  • show controllers fia diagshell 0 “diag counters” loc 0/x/CPU0

RP/0/RP0/CPU0:ROUTER#show contr npu stats counters-all detail instance all location all | i Dram

  IDR FullDramRejectPktsCnt            :                0
  IDR FullDramRejectBytesCnt           :                0
  IDR PartialDramRejectPktsCnt         :                0
  IDR PartialDramRejectBytesCnt        :                0
  IQM0 RjctDramIneligiblePktCnt        :                0
  IQM1 RjctDramIneligiblePktCnt        :                0
  IDR FullDramRejectPktsCnt            :                0
  IDR FullDramRejectBytesCnt           :                0
  IDR PartialDramRejectPktsCnt         :                0
  IDR PartialDramRejectBytesCnt        :                0
  IQM0 RjctDramIneligiblePktCnt        :                0
  IQM1 RjctDramIneligiblePktCnt        :                0

--%--SNIP--%--SNIP--%--
  

None of these counters are available through SNMP / MIBs, but you can use streaming telemetry instead:

From https://github.com/YangModels/yang/blob/master/vendor/cisco/xr/653/Cisco-IOS-XR-fretta-bcm-dpa-hw-resources-oper-sub2.yang

you’ll find:

ENQUEUE_PKT_CNT: iqm-enqueue-pkt-cnt


    leaf iqm-enqueue-pkt-cnt {
      type uint64;
      description "Counts enqueued packets";
    }

MMU_IDR_PACKET_COUNTER: idr-mmu-if-cnt


    leaf idr-mmu-if-cnt {
      type uint64;
      description "Performance counter of the MMU interface";
    }

ENQ_DISCARDED_PACKET_COUNTER: iqm-enq-discarded-pkt-cnt


    leaf iqm-enq-discarded-pkt-cnt {
      type uint64;
      description "Counts all packets discarded at the ENQ pipe";
    }

At the moment (Apr 2019), RjctDramIneligiblePktCnt / FullDramRejectPktsCnt / PartialDramRejectPktsCnt are not available in the data models and therefore can’t be streamed.
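
When building a telemetry collector, the CLI-counter-to-YANG-leaf correspondence above can be kept as a small lookup table (leaf names taken from the model cited above; the dictionary name is mine):

```python
# Mapping of Broadcom counter names (as printed by the CLI) to leaf names in
# Cisco-IOS-XR-fretta-bcm-dpa-hw-resources-oper-sub2.yang.
CLI_TO_YANG = {
    "ENQUEUE_PKT_CNT": "iqm-enqueue-pkt-cnt",
    "MMU_IDR_PACKET_COUNTER": "idr-mmu-if-cnt",
    "ENQ_DISCARDED_PACKET_COUNTER": "iqm-enq-discarded-pkt-cnt",
}
```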

Auditing real production routers

Now that we have these counters available, we asked multiple customers (25+) to collect data from their production routers.
In total, we had information for 550 NPUs transporting live traffic in multiple network positions:

  • IP core
  • MPLS core (P/LSR)
  • Internet border (transit / peering)
  • CDN (connected to FB, Akamai, Google Cache, Netflix, …)
  • PE (L2VPN and L3VPN)
  • Aggregation
  • SPDC / ToR leaf

The aggregated data is helpful since it gives a vision of what is happening in reality.
The total amount of traffic measured is tremendous: 24,526,679,839,376,100 packets!
Not in a lab, not in academic models or simulations, but in real routers.

With the show commands described in the previous section, we extracted:

  • ENQUEUE_PKT_CNT: packets transmitted in the NPU
  • MMU_IDR_PACKET_COUNTER: packets passed to DRAM
  • ENQ_DISCARDED_PACKET_COUNTER: packets taildropped
  • RjctDramIneligiblePktCnt: packets dropped because of DRAM bandwidth saturation

Dividing MMU_IDR_PACKET_COUNTER by ENQUEUE_PKT_CNT, we can compute the ratio of packets moved to DRAM.
–> 0,151%
This number is an average value and should be considered as such. It shows that indeed, the vast majority of the traffic is handled in OCB (inside the NPU).

Dividing ENQ_DISCARDED_PACKET_COUNTER by ENQUEUE_PKT_CNT, we can compute the ratio of packets taildropped.
–> 0,0358%
Having drops is normal in the life of a router. Multiple reasons here, from TCP windowing to temporary congestion situations.

Finally, RjctDramIneligiblePktCnt tells us whether the link from the NPU to the DRAM ever got saturated and dropped packets with production traffic.
–> not a single packet was discarded in this scenario.
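
As an illustration, the two ratios can be recomputed for a single NPU, using the sample counters from the CLI output shown earlier in this post (this NPU also appears in the chart below):

```python
# Counters taken from the sample CLI output earlier in this post.
enqueue_pkt_cnt   = 164_640_311_902_277  # packets handled by the NPU
mmu_idr_pkt_cnt   = 697_231_761_913      # packets moved to DRAM
enq_discarded_cnt = 90_015_441           # packets taildropped

dram_ratio_pct = mmu_idr_pkt_cnt / enqueue_pkt_cnt * 100
drop_ratio_pct = enq_discarded_cnt / enqueue_pkt_cnt * 100

print(f"{dram_ratio_pct:.6f}")  # 0.423488 -> ~0.42% of the packets went to DRAM
print(f"{drop_ratio_pct:.6f}")  # 0.000055 -> taildrop ratio
```

This particular NPU (a CDN-facing one) evicts more to DRAM than the 0,151% average, while staying far below any level of concern.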


LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | wc -l
    1570
LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | grep " 0" | wc -l
    1570
LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | grep -v " 0" | wc -l
       0
LAPTOP: nicolas$
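
The grep checks above can also be done with a short script across all the collected files (a sketch; the function name and directory layout are hypothetical):

```python
import re
from pathlib import Path

def audit_dram_rejects(directory):
    """Return [(filename, value)] for every non-zero RjctDramIneligiblePktCnt line."""
    pattern = re.compile(r"RjctDramIneligiblePktCnt\s*:\s*(\d+)")
    offenders = []
    for path in Path(directory).glob("*"):
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            m = pattern.search(line)
            if m and int(m.group(1)) != 0:
                offenders.append((path.name, int(m.group(1))))
    return offenders  # an empty list means no DRAM-bandwidth drops anywhere
```

On the 25+ customer collections, the equivalent check returned an empty list: every RjctDramIneligiblePktCnt was zero.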

In this chart, we sort by ENQUEUE_PKT_CNT: it shows the most active ASICs in terms of packets handled.

| Rank | ENQUEUE_PKT_CNT | MMU_IDR | ENQ_DISC | RjctDram | Ratio DRAM % | Ratio drops % | Network role |
|------|-----------------|---------|----------|----------|--------------|---------------|--------------|
| 1 | 527 787 533 239 280 | 7 369 339 005 | 1 705 600 246 | 0 | 0,001396 | 0,000323 | IP Core |
| 2 | 527 731 939 299 538 | 7 637 629 256 | 1 692 666 188 | 0 | 0,001447 | 0,000321 | IP Core |
| 3 | 392 487 675 531 358 | 111 916 953 940 | 24 771 334 182 | 0 | 0,028515 | 0,006311 | Peering |
| 4 | 348 026 620 119 625 | 1 610 856 619 | 781 841 479 | 0 | 0,000463 | 0,000225 | IP Core |
| 5 | 342 309 183 713 774 | 1 348 042 248 | 855 820 846 | 0 | 0,000394 | 0,000250 | IP Core |
| 6 | 327 474 089 745 397 | 906 227 869 | 871 575 599 | 0 | 0,000277 | 0,000266 | IP Core |
| 7 | 312 691 087 570 935 | 149 450 319 | 13 211 540 367 | 0 | 0,000048 | 0,004225 | Peering |
| 8 | 309 754 368 573 783 | 217 802 498 | 7 194 870 813 | 0 | 0,000070 | 0,002323 | Peering |
| 9 | 286 534 007 041 471 | 7 937 852 208 | 208 639 446 202 | 0 | 0,002770 | 0,072815 | IP Core |
| 10 | 285 891 289 921 804 | 17 316 635 372 | 208 339 055 090 | 0 | 0,006057 | 0,072874 | IP Core |
| 11 | 282 857 716 700 675 | 4 619 576 899 | 213 929 762 107 | 0 | 0,001633 | 0,075632 | IP Core |
| 12 | 281 159 664 612 018 | 13 756 617 725 | 448 617 960 | 0 | 0,004893 | 0,000160 | MPLS Core |
| 13 | 263 035 976 927 915 | 6 226 947 181 | 191 620 663 098 | 0 | 0,002367 | 0,072850 | IP Core |
| 14 | 253 469 417 751 706 | 4 811 103 101 | 1 078 118 413 | 0 | 0,001898 | 0,000425 | IP Core |
| 15 | 253 388 455 606 099 | 4 808 169 387 | 1 080 970 708 | 0 | 0,001898 | 0,000427 | IP Core |
| 16 | 249 198 998 283 888 | 723 211 883 | 432 154 974 | 0 | 0,000290 | 0,000173 | IP Core |
| 17 | 245 552 977 886 019 | 1 672 035 279 | 1 336 139 650 | 0 | 0,000681 | 0,000544 | IP Core |
| 18 | 244 787 789 764 429 | 1 422 882 240 | 1 211 167 690 | 0 | 0,000581 | 0,000495 | IP Core |
| 19 | 244 639 576 497 565 | 1 347 623 535 | 1 122 306 072 | 0 | 0,000551 | 0,000459 | IP Core |
| 20 | 244 604 586 809 869 | 1 935 267 429 | 1 384 016 976 | 0 | 0,000791 | 0,000566 | IP Core |
| 21 | 243 908 249 497 111 | 586 491 218 | 410 239 247 | 0 | 0,000240 | 0,000168 | IP Core |
| 22 | 243 639 692 775 431 | 17 237 153 490 | 1 434 003 860 | 0 | 0,007075 | 0,000589 | Peering |
| 23 | 237 152 936 875 785 | 448 662 754 | 384 071 603 | 0 | 0,000189 | 0,000162 | IP Core |
| 24 | 224 477 013 789 647 | 13 369 954 892 | 1 319 318 749 | 0 | 0,005956 | 0,000588 | MPLS Core |
| 25 | 219 820 911 786 839 | 1 068 821 932 | 647 226 846 | 0 | 0,000486 | 0,000294 | IP Core |
| 26 | 205 119 650 216 462 | 766 453 141 | 568 411 772 | 0 | 0,000374 | 0,000277 | IP Core |
| 27 | 203 306 915 869 451 | 61 364 621 713 | 12 422 238 060 | 0 | 0,030183 | 0,006110 | Peering |
| 28 | 194 981 015 445 738 | 19 793 539 645 | 117 282 213 | 0 | 0,010152 | 0,000060 | MPLS Core |
| 29 | 182 104 629 921 870 | 163 108 290 | 12 704 233 685 | 0 | 0,000090 | 0,006976 | Peering |
| 30 | 180 871 118 289 426 | 1 362 976 328 269 | 38 037 126 715 | 0 | 0,753562 | 0,021030 | P+PE |
| 31 | 173 166 873 157 959 | 397 471 140 | 353 417 929 | 0 | 0,000230 | 0,000204 | IP Core |
| 32 | 167 311 796 856 352 | 1 282 666 496 069 | 36 120 954 409 | 0 | 0,766632 | 0,021589 | P+PE |
| 33 | 164 640 767 868 235 | 697 231 782 299 | 90 015 446 | 0 | 0,423487 | 0,000055 | CDN |
| 34 | 164 640 311 902 277 | 697 231 761 913 | 90 015 441 | 0 | 0,423488 | 0,000055 | CDN |
| 35 | 160 506 138 361 929 | 851 487 161 | 1 826 760 067 | 0 | 0,000531 | 0,001138 | IP Core |
| 36 | 158 521 030 661 438 | 1 391 033 571 161 | 3 987 222 205 | 0 | 0,877507 | 0,002515 | CDN |
| 37 | 157 286 154 450 629 | 14 699 938 426 | 44 060 250 | 0 | 0,009346 | 0,000028 | Peering |
| 38 | 154 081 895 058 387 | 623 038 967 | 483 267 224 | 0 | 0,000404 | 0,000314 | IP Core |
| 39 | 143 902 175 998 205 | 205 944 615 947 | 595 151 004 | 0 | 0,143114 | 0,000414 | Peering |
| 40 | 143 686 937 442 122 | 12 638 644 | 130 005 734 | 0 | 0,000009 | 0,000090 | Peering |
| 41 | 142 498 738 296 176 | 649 883 065 | 1 404 348 505 | 0 | 0,000456 | 0,000986 | IP Core |
| 42 | 142 426 983 443 239 | 645 597 568 | 1 417 441 644 | 0 | 0,000453 | 0,000995 | IP Core |
| 43 | 138 083 165 878 093 | 2 778 355 335 | 54 030 686 | 0 | 0,002012 | 0,000039 | Peering |
| 44 | 130 425 299 235 308 | 235 149 102 989 | 117 322 562 | 0 | 0,180294 | 0,000090 | CDN |
| 45 | 125 379 522 379 915 | 219 241 802 484 | 77 781 184 | 0 | 0,174863 | 0,000062 | CDN |
| 46 | 122 178 283 814 177 | 122 250 106 | 2 168 387 244 | 0 | 0,000100 | 0,001775 | Peering |
| 47 | 121 842 623 410 092 | 419 677 747 284 | 419 677 747 284 | 0 | 0,344442 | 0,344442 | P+Peering |
| 48 | 121 842 227 567 846 | 419 677 746 356 | 419 677 746 356 | 0 | 0,344444 | 0,344444 | P+Peering |
| 49 | 119 048 492 308 148 | 19 756 851 882 | 1 468 594 303 | 0 | 0,016596 | 0,001234 | Peering |
| 50 | 118 902 447 078 432 | 20 140 676 286 | 1 437 569 523 | 0 | 0,016939 | 0,001209 | Peering |

For the busiest NPUs collected, we see that the DRAM ratio and the taildrop ratio are actually much smaller than the aggregated numbers.

How to read these numbers?

First of all, it clearly demonstrates that most of the packets are handled inside the ASIC, with only a very small portion of the traffic being evicted to the DRAM.

Second, with RjctDramIneligiblePktCnt being zero in EVERY data collection, we prove that the bandwidth from the NPU to the DRAM (900 Gbps unidirectional) is correctly dimensioned. It handles the real burstiness of the traffic without a single drop.

Last, the data collected represents a snapshot. It is recommended to collect these counters regularly and to analyze them together with the network activity during the interval.
Higher numbers in your network may be correlated to a particular outage or specific situation.
Small numbers, on the other hand, are much easier to read (no drops being… no drops).

Conclusion

In conclusion, the ingress-buffering / VOQ-only model is well suited to real networks.

We have seen “academic” studies trying to prove the contrary, but the numbers speak for themselves here.

A sandbox or an imaginary model is not a relevant approach.

Production networks deployed all around the world, in different positions/roles, transporting petabytes of traffic for multiple years, prove the efficiency of this architecture.
