NCS5500 QoS Part 2 - Verifying Buffering in Lab and Live Networks

You can find more content related to the NCS5500, including routing memory management, VRF, uRPF, ACLs, and Netflow, by following this link.

You can also find the first part of this post here:
https://xrdocs.io/ncs5500/tutorials/ncs5500-qos-part-1-understanding-packet-buffering/

Checking Buffering in action

This second blog post will take concrete examples to illustrate the concepts covered in the first part.
The NCS5500 is based on a VOQ-only, single-lookup and ingress-buffering forwarding architecture.
We will use a lab example to illustrate how the system handles bursts, then we will present the monitoring tools / counters we can use to measure where packets are buffered, and finally we will present the data collected on 500+ NPUs in production.
This should answer frequently asked questions and clarify all potential doubts.

Video

We recommend watching this short YouTube video first:
https://www.youtube.com/watch?v=1qXD70_cLK8

Lab test

For this first part, and following customer requests, we built a large test bed:

  • NCS5508 with two line cards 36x100G-A-SE (each card is made of 4x NPU Jericho+, each one handling 9 ports 100G)
  • 27 tester ports 100GE (Spirent) connected to 27 router ports
    • 9 ports on LC 4 NPU 0
    • 9 ports on LC 4 NPU 1
    • 9 ports on LC 6 NPU 0

We generate constant background traffic at 80% of line rate (80 Gbps on each port) between two NPUs. This bi-directional traffic is displayed in purple in the diagram above.
Then we use the remaining 9 ports to generate traffic peaks targeted at the ports Hu0/6/0/0-8 (shown in red in the diagram).
These bursts last 100 ms and repeat every second.

On the tester, we didn’t make any specific PPM adjustment and used the internal clock.

On the router, no specific configuration either: interfaces are simply configured with IPv4 addresses (no QoS).

The tests performed are the following:

  • test 1: 80% background and all the ports bursting at 20%. That means:
    • 900 ms at 80% LR
    • 100 ms at 100% LR
      We verify that no packets are dropped on the background or the bursts, and we also make sure with the counters that packets are handled exclusively in the OCB.
  • test 2:
    • 80% background
    • other ports bursting at 20%
    • one single port bursting at 25%, creating a 5 Gbps saturation for 100 ms
      Here again, we verify that no packets are dropped on the background or the bursts, but we also verify that packets are sent to the DRAM.
      This is expected, since one queue exceeds the threshold and is evicted to the external buffer.
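
The arithmetic behind test 2 can be sketched in a few lines. This is a back-of-the-envelope check, not a platform specification: the exact per-VOQ eviction threshold is release dependent and is an assumption here.

```python
# Back-of-the-envelope arithmetic for test 2: one 100GE port is offered
# 105% of line rate (80% background + 25% burst) for 100 ms.
line_rate_gbps = 100
offered_gbps = 80 + 25
burst_ms = 100

excess_gbps = offered_gbps - line_rate_gbps              # 5 Gbps of oversubscription
backlog_bits = excess_gbps * 10**9 * burst_ms // 1000    # bits queued during the burst
backlog_bytes = backlog_bits // 8

print(backlog_bytes)  # 62500000 -> 62.5 MB of backlog for that single queue
# 62.5 MB far exceeds what the on-chip buffer can dedicate to one queue,
# so the VOQ crosses its eviction threshold and is moved to the external DRAM.
# (Exact threshold values are platform/release dependent -- an assumption here.)
```

This is why test 2, unlike test 1, must show packets counted toward the DRAM.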

This test is basic, but several customers requested it to verify that no drops occur in such situations. It proves there are none, as designed.

Metrology

In the previous test, we used specific counters to verify the buffering behavior.
Let’s review them.

We will collect the following Broadcom counters:

  • IQM_EnqueuePktCnt: total number of packets handled by the NPU
  • IDR_MMU_CREDITS: total number of packets moved to DRAM
  • IQM_EnqueueDscrdPktCnt: total number of packets dropped because of taildrop
  • IQM_RejectDramIneligiblePktCnt: total number of packets dropped because the DRAM was not accessible, typically when the bandwidth to the DRAM is saturated
  • and potentially also IDR_FullDramRejectPktsCnt and IDR_PartialDramRejectPktsCnt

From the CLI “show controller npu stats counters-all instance all location all”, we can extract ENQUEUE_PKT_CNT, MMU_IDR_PACKET_COUNTER and ENQ_DISCARDED_PACKET_COUNTER:


RP/0/RP0/CPU0:ROUTER#show controller npu stats counters-all instance all location all

FIA Statistics Rack: 0, Slot: 0, Asic instance: 0

Per Block Statistics:

Ingress:

NBI RX:
  RX_TOTAL_BYTE_COUNTER          = 161392268790033002
  RX_TOTAL_PKT_COUNTER           = 164628460653364

IRE:
  CPU_PACKET_COUNTER             = 0
  NIF_PACKET_COUNTER             = 164628460651867
  OAMP_PACKET_COUNTER            = 32771143
  OLP_PACKET_COUNTER             = 4787508
  RCY_PACKET_COUNTER             = 67452938
  IRE_FDT_INTRFACE_CNT           = 192

IDR:
  MMU_IDR_PACKET_COUNTER         = 697231761913
  IDR_OCB_PACKET_COUNTER         = 1

IQM:
  ENQUEUE_PKT_CNT                = 164640311902277
  DEQUEUE_PKT_CNT                = 164640311902198
  DELETED_PKT_CNT                = 0
  ENQ_DISCARDED_PACKET_COUNTER   = 90015441
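
For repeated collections, the relevant counters can be scraped from this show command output with a short script. This is a sketch (function name is mine); it picks up the first ASIC instance found in the text:

```python
import re

# Counters of interest in the output of
# "show controller npu stats counters-all instance all location all".
COUNTERS = (
    "ENQUEUE_PKT_CNT",
    "MMU_IDR_PACKET_COUNTER",
    "ENQ_DISCARDED_PACKET_COUNTER",
)

def parse_npu_counters(cli_output):
    """Return {counter_name: value} for the first ASIC instance in the output."""
    values = {}
    for name in COUNTERS:
        # Lines look like "  ENQUEUE_PKT_CNT                = 164640311902277"
        m = re.search(rf"\b{name}\s*=\s*(\d+)", cli_output)
        if m:
            values[name] = int(m.group(1))
    return values
```

Fed with the sample output above, it returns ENQUEUE_PKT_CNT = 164640311902277, MMU_IDR_PACKET_COUNTER = 697231761913 and ENQ_DISCARDED_PACKET_COUNTER = 90015441.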

To get the DRAM reject counters, we will use:

  • show contr npu stats counters-all detail instance all location all
    or if the IOS XR version doesn’t support the “detail” option, use the following instead:
  • show controllers fia diagshell 0 “diag counters” loc 0/x/CPU0

RP/0/RP0/CPU0:ROUTER#show contr npu stats counters-all detail instance all location all | i Dram

  IDR FullDramRejectPktsCnt            :                0
  IDR FullDramRejectBytesCnt           :                0
  IDR PartialDramRejectPktsCnt         :                0
  IDR PartialDramRejectBytesCnt        :                0
  IQM0 RjctDramIneligiblePktCnt        :                0
  IQM1 RjctDramIneligiblePktCnt        :                0
  IDR FullDramRejectPktsCnt            :                0
  IDR FullDramRejectBytesCnt           :                0
  IDR PartialDramRejectPktsCnt         :                0
  IDR PartialDramRejectBytesCnt        :                0
  IQM0 RjctDramIneligiblePktCnt        :                0
  IQM1 RjctDramIneligiblePktCnt        :                0

--%--SNIP--%--SNIP--%--
  

None of these counters are available through SNMP / MIBs, but you can use streaming telemetry instead:

From https://github.com/YangModels/yang/blob/master/vendor/cisco/xr/653/Cisco-IOS-XR-fretta-bcm-dpa-hw-resources-oper-sub2.yang

you’ll find:

ENQUEUE_PKT_CNT: iqm-enqueue-pkt-cnt


    leaf iqm-enqueue-pkt-cnt {
      type uint64;
      description "Counts enqueued packets";
    }

MMU_IDR_PACKET_COUNTER: idr-mmu-if-cnt


    leaf idr-mmu-if-cnt {
      type uint64;
      description "Performance counter of the MMU interface";
    }

ENQ_DISCARDED_PACKET_COUNTER: iqm-enq-discarded-pkt-cnt


    leaf iqm-enq-discarded-pkt-cnt {
      type uint64;
      description "Counts all packets discarded at the ENQ pipe";
    }

At the moment (Apr 2019), RjctDramIneligiblePktCnt / FullDramRejectPktsCnt / PartialDramRejectPktsCnt are not available in the data models and therefore can’t be streamed.
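
When building a telemetry collector, the CLI-counter-to-YANG-leaf correspondence above can be kept as a small lookup table (leaf names taken from the model cited above; the dictionary name is mine):

```python
# Mapping of Broadcom counter names (as printed by the CLI) to leaf names in
# Cisco-IOS-XR-fretta-bcm-dpa-hw-resources-oper-sub2.yang.
CLI_TO_YANG = {
    "ENQUEUE_PKT_CNT": "iqm-enqueue-pkt-cnt",
    "MMU_IDR_PACKET_COUNTER": "idr-mmu-if-cnt",
    "ENQ_DISCARDED_PACKET_COUNTER": "iqm-enq-discarded-pkt-cnt",
}
```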

Auditing real production routers

Now that we have these counters available, we asked multiple customers (25+) to collect data from their production routers.
In total, we had information for 550 NPUs transporting live traffic in multiple network positions:

  • IP core
  • MPLS core (P/LSR)
  • Internet border (transit / peering)
  • CDN (connected to FB, Akamai, Google Cache, Netflix, …)
  • PE (L2VPN and L3VPN)
  • Aggregation
  • SPDC / ToR leaf

The aggregated data is helpful since it gives a vision of what is happening in reality.
The total amount of traffic measured is tremendous: 24,526,679,839,376,100 packets!
Not in a lab, not in academic models or simulations, but in real routers.

With the show commands described in the previous section, we extracted:

  • ENQUEUE_PKT_CNT: packets transmitted in the NPU
  • MMU_IDR_PACKET_COUNTER: packets passed to DRAM
  • ENQ_DISCARDED_PACKET_COUNTER: packets taildropped
  • RjctDramIneligiblePktCnt: packets dropped because of DRAM bandwidth saturation

Dividing MMU_IDR_PACKET_COUNTER by ENQUEUE_PKT_CNT, we can compute the ratio of packets moved to DRAM.
–> 0,151%
This number is an average value and should be considered as such. It shows that indeed, the vast majority of the traffic is handled in OCB (inside the NPU).

Dividing ENQ_DISCARDED_PACKET_COUNTER by ENQUEUE_PKT_CNT, we can compute the ratio of packets taildropped.
–> 0,0358%
Having drops is normal in the life of a router. Multiple reasons here, from TCP windowing to temporary congestion situations.

Finally, RjctDramIneligiblePktCnt tells us whether the link from the NPU to the DRAM ever got saturated and dropped packets with production traffic.
–> not a single packet was discarded in this scenario.
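
As an illustration, the two ratios can be recomputed for a single NPU, using the sample counters from the CLI output shown earlier in this post (this NPU also appears in the chart below):

```python
# Counters taken from the sample CLI output earlier in this post.
enqueue_pkt_cnt   = 164_640_311_902_277  # packets handled by the NPU
mmu_idr_pkt_cnt   = 697_231_761_913      # packets moved to DRAM
enq_discarded_cnt = 90_015_441           # packets taildropped

dram_ratio_pct = mmu_idr_pkt_cnt / enqueue_pkt_cnt * 100
drop_ratio_pct = enq_discarded_cnt / enqueue_pkt_cnt * 100

print(f"{dram_ratio_pct:.6f}")  # 0.423488 -> ~0.42% of the packets went to DRAM
print(f"{drop_ratio_pct:.6f}")  # 0.000055 -> taildrop ratio
```

This particular NPU (a CDN-facing one) evicts more to DRAM than the 0,151% average, while staying far below any level of concern.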


LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | wc -l
    1570
LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | grep " 0" | wc -l
    1570
LAPTOP: nicolas$ grep RjctDramIneligiblePktCnt * | grep -v " 0" | wc -l
       0
LAPTOP: nicolas$
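
The grep checks above can also be done with a short script across all the collected files (a sketch; the function name and directory layout are hypothetical):

```python
import re
from pathlib import Path

def audit_dram_rejects(directory):
    """Return [(filename, value)] for every non-zero RjctDramIneligiblePktCnt line."""
    pattern = re.compile(r"RjctDramIneligiblePktCnt\s*:\s*(\d+)")
    offenders = []
    for path in Path(directory).glob("*"):
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            m = pattern.search(line)
            if m and int(m.group(1)) != 0:
                offenders.append((path.name, int(m.group(1))))
    return offenders  # an empty list means no DRAM-bandwidth drops anywhere
```

On the 25+ customer collections, the equivalent check returned an empty list: every RjctDramIneligiblePktCnt was zero.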

In this chart, we sort by ENQUEUE_PKT_CNT: it shows the most active ASICs in terms of packets handled.

| Rank | ENQUEUE_PKT_CNT | MMU_IDR | ENQ_DISC | RjctDram | Ratio DRAM % | Ratio drops % | Network role |
|------|-----------------|---------|----------|----------|--------------|---------------|--------------|
| 1 | 527 787 533 239 280 | 7 369 339 005 | 1 705 600 246 | 0 | 0,001396 | 0,000323 | IP Core |
| 2 | 527 731 939 299 538 | 7 637 629 256 | 1 692 666 188 | 0 | 0,001447 | 0,000321 | IP Core |
| 3 | 392 487 675 531 358 | 111 916 953 940 | 24 771 334 182 | 0 | 0,028515 | 0,006311 | Peering |
| 4 | 348 026 620 119 625 | 1 610 856 619 | 781 841 479 | 0 | 0,000463 | 0,000225 | IP Core |
| 5 | 342 309 183 713 774 | 1 348 042 248 | 855 820 846 | 0 | 0,000394 | 0,000250 | IP Core |
| 6 | 327 474 089 745 397 | 906 227 869 | 871 575 599 | 0 | 0,000277 | 0,000266 | IP Core |
| 7 | 312 691 087 570 935 | 149 450 319 | 13 211 540 367 | 0 | 0,000048 | 0,004225 | Peering |
| 8 | 309 754 368 573 783 | 217 802 498 | 7 194 870 813 | 0 | 0,000070 | 0,002323 | Peering |
| 9 | 286 534 007 041 471 | 7 937 852 208 | 208 639 446 202 | 0 | 0,002770 | 0,072815 | IP Core |
| 10 | 285 891 289 921 804 | 17 316 635 372 | 208 339 055 090 | 0 | 0,006057 | 0,072874 | IP Core |
| 11 | 282 857 716 700 675 | 4 619 576 899 | 213 929 762 107 | 0 | 0,001633 | 0,075632 | IP Core |
| 12 | 281 159 664 612 018 | 13 756 617 725 | 448 617 960 | 0 | 0,004893 | 0,000160 | MPLS Core |
| 13 | 263 035 976 927 915 | 6 226 947 181 | 191 620 663 098 | 0 | 0,002367 | 0,072850 | IP Core |
| 14 | 253 469 417 751 706 | 4 811 103 101 | 1 078 118 413 | 0 | 0,001898 | 0,000425 | IP Core |
| 15 | 253 388 455 606 099 | 4 808 169 387 | 1 080 970 708 | 0 | 0,001898 | 0,000427 | IP Core |
| 16 | 249 198 998 283 888 | 723 211 883 | 432 154 974 | 0 | 0,000290 | 0,000173 | IP Core |
| 17 | 245 552 977 886 019 | 1 672 035 279 | 1 336 139 650 | 0 | 0,000681 | 0,000544 | IP Core |
| 18 | 244 787 789 764 429 | 1 422 882 240 | 1 211 167 690 | 0 | 0,000581 | 0,000495 | IP Core |
| 19 | 244 639 576 497 565 | 1 347 623 535 | 1 122 306 072 | 0 | 0,000551 | 0,000459 | IP Core |
| 20 | 244 604 586 809 869 | 1 935 267 429 | 1 384 016 976 | 0 | 0,000791 | 0,000566 | IP Core |
| 21 | 243 908 249 497 111 | 586 491 218 | 410 239 247 | 0 | 0,000240 | 0,000168 | IP Core |
| 22 | 243 639 692 775 431 | 17 237 153 490 | 1 434 003 860 | 0 | 0,007075 | 0,000589 | Peering |
| 23 | 237 152 936 875 785 | 448 662 754 | 384 071 603 | 0 | 0,000189 | 0,000162 | IP Core |
| 24 | 224 477 013 789 647 | 13 369 954 892 | 1 319 318 749 | 0 | 0,005956 | 0,000588 | MPLS Core |
| 25 | 219 820 911 786 839 | 1 068 821 932 | 647 226 846 | 0 | 0,000486 | 0,000294 | IP Core |
| 26 | 205 119 650 216 462 | 766 453 141 | 568 411 772 | 0 | 0,000374 | 0,000277 | IP Core |
| 27 | 203 306 915 869 451 | 61 364 621 713 | 12 422 238 060 | 0 | 0,030183 | 0,006110 | Peering |
| 28 | 194 981 015 445 738 | 19 793 539 645 | 117 282 213 | 0 | 0,010152 | 0,000060 | MPLS Core |
| 29 | 182 104 629 921 870 | 163 108 290 | 12 704 233 685 | 0 | 0,000090 | 0,006976 | Peering |
| 30 | 180 871 118 289 426 | 1 362 976 328 269 | 38 037 126 715 | 0 | 0,753562 | 0,021030 | P+PE |
| 31 | 173 166 873 157 959 | 397 471 140 | 353 417 929 | 0 | 0,000230 | 0,000204 | IP Core |
| 32 | 167 311 796 856 352 | 1 282 666 496 069 | 36 120 954 409 | 0 | 0,766632 | 0,021589 | P+PE |
| 33 | 164 640 767 868 235 | 697 231 782 299 | 90 015 446 | 0 | 0,423487 | 0,000055 | CDN |
| 34 | 164 640 311 902 277 | 697 231 761 913 | 90 015 441 | 0 | 0,423488 | 0,000055 | CDN |
| 35 | 160 506 138 361 929 | 851 487 161 | 1 826 760 067 | 0 | 0,000531 | 0,001138 | IP Core |
| 36 | 158 521 030 661 438 | 1 391 033 571 161 | 3 987 222 205 | 0 | 0,877507 | 0,002515 | CDN |
| 37 | 157 286 154 450 629 | 14 699 938 426 | 44 060 250 | 0 | 0,009346 | 0,000028 | Peering |
| 38 | 154 081 895 058 387 | 623 038 967 | 483 267 224 | 0 | 0,000404 | 0,000314 | IP Core |
| 39 | 143 902 175 998 205 | 205 944 615 947 | 595 151 004 | 0 | 0,143114 | 0,000414 | Peering |
| 40 | 143 686 937 442 122 | 12 638 644 | 130 005 734 | 0 | 0,000009 | 0,000090 | Peering |
| 41 | 142 498 738 296 176 | 649 883 065 | 1 404 348 505 | 0 | 0,000456 | 0,000986 | IP Core |
| 42 | 142 426 983 443 239 | 645 597 568 | 1 417 441 644 | 0 | 0,000453 | 0,000995 | IP Core |
| 43 | 138 083 165 878 093 | 2 778 355 335 | 54 030 686 | 0 | 0,002012 | 0,000039 | Peering |
| 44 | 130 425 299 235 308 | 235 149 102 989 | 117 322 562 | 0 | 0,180294 | 0,000090 | CDN |
| 45 | 125 379 522 379 915 | 219 241 802 484 | 77 781 184 | 0 | 0,174863 | 0,000062 | CDN |
| 46 | 122 178 283 814 177 | 122 250 106 | 2 168 387 244 | 0 | 0,000100 | 0,001775 | Peering |
| 47 | 121 842 623 410 092 | 419 677 747 284 | 419 677 747 284 | 0 | 0,344442 | 0,344442 | P+Peering |
| 48 | 121 842 227 567 846 | 419 677 746 356 | 419 677 746 356 | 0 | 0,344444 | 0,344444 | P+Peering |
| 49 | 119 048 492 308 148 | 19 756 851 882 | 1 468 594 303 | 0 | 0,016596 | 0,001234 | Peering |
| 50 | 118 902 447 078 432 | 20 140 676 286 | 1 437 569 523 | 0 | 0,016939 | 0,001209 | Peering |

For the busiest NPUs collected, we see that the DRAM ratio and the taildrop ratio are actually much smaller than the aggregated numbers.

How to read these numbers?

First of all, it clearly demonstrates that most of the packets are handled inside the ASIC, with only a very small portion of the traffic being evicted to the DRAM.

Second, with RjctDramIneligiblePktCnt being zero in EVERY data collection, we prove that the bandwidth from the NPU to the DRAM (900 Gbps unidirectional) is correctly dimensioned. It handles the real burstiness of the traffic without a single drop.

Last, the data collected represents a snapshot. It is recommended to collect these counters regularly and to analyze them together with the network activity during the interval.
Higher numbers in your network may be correlated to a particular outage or specific situation.
Small numbers, on the other hand, are much easier to read (no drops being… no drops).

Conclusion

In conclusion, the ingress-buffering / VOQ-only model is well suited to real networks.

We have seen “academic” studies trying to prove the contrary, but the numbers speak for themselves here.

A sandbox or an imaginary model is not a relevant approach.

Production networks deployed all around the world, in different positions/roles, transporting petabytes of traffic for multiple years, prove the efficiency of this architecture.
