The round-robin rr_tx_counter was shared across CPUs leading to
significant cache thrashing at high packet rates. This patch switches
the round-robin packet counter to use a per-cpu variable to decide
the destination slave.
On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
(-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
this patch.
"perf top -e cache_misses" before:
12.31% [bonding] [k] bond_xmit_roundrobin_slave_get
10.59% [sch_fq_codel] [k] fq_codel_dequeue
9.34% [kernel] [k] skb_release_data
after:
15.42% [sch_fq_codel] [k] fq_codel_dequeue
10.06% [kernel] [k] __memset
9.12% [kernel] [k] skb_release_data
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
||
|---|---|---|
| .. | ||
| acpi | ||
| asm-generic | ||
| clocksource | ||
| crypto | ||
| drm | ||
| dt-bindings | ||
| keys | ||
| kunit | ||
| kvm | ||
| linux | ||
| math-emu | ||
| media | ||
| memory | ||
| misc | ||
| net | ||
| pcmcia | ||
| ras | ||
| rdma | ||
| scsi | ||
| soc | ||
| sound | ||
| target | ||
| trace | ||
| uapi | ||
| vdso | ||
| video | ||
| xen | ||