twx-linux/include/uapi/linux
Eric Dumazet afe4fd0624 pkt_sched: fq: Fair Queue packet scheduler
- Uses perfect flow match (not stochastic hash like SFQ/FQ_codel)
- Uses the new_flow/old_flow separation from FQ_codel
- New flows get an initial credit allowing IW10 without added delay.
- Special FIFO queue for high prio packets (no need for PRIO + FQ)
- Uses a hash table of RB trees to locate the flows at enqueue() time
- Smart on demand gc (at enqueue() time, RB tree lookup evicts old
  unused flows)
- Dynamic memory allocations.
- Designed to allow millions of concurrent flows per Qdisc.
- Small memory footprint : ~8K per Qdisc, and 104 bytes per flow.
- Single high resolution timer for throttled flows (if any).
- One RB tree to link throttled flows.
- Ability to have a max rate per flow. We might add a socket option
  to add per socket limitation.
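
As a rough illustration of the footprint figures above, the following sketch estimates memory use for a given number of flows. It assumes the ~8K per Qdisc is dominated by the bucket array (1024 default buckets at 8 bytes each, as listed under the tunables) and that each flow costs the quoted 104 bytes; the function name is hypothetical and the real accounting in the kernel may differ.

```python
# Figures quoted in the commit message; treat them as approximate.
BYTES_PER_FLOW = 104
BYTES_PER_BUCKET = 8


def fq_memory_estimate(flows, buckets=1024):
    """Rough estimate in bytes: bucket array plus per-flow structures."""
    return buckets * BYTES_PER_BUCKET + flows * BYTES_PER_FLOW


print(fq_memory_estimate(0))          # bucket array only
print(fq_memory_estimate(1_000_000))  # one million concurrent flows
```

Under these assumptions, a million flows cost on the order of 100 MB, which is what makes "millions of concurrent flows per Qdisc" plausible on a server.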

Attempts have been made to add TCP pacing to the TCP stack, but this
seems to add complex code to an already complex stack.

TCP pacing is welcome for flows having idle times, as the cwnd
permits the TCP stack to queue a possibly large number of packets.

This removes the need for the 'slow start after idle' choice, which
badly hurts large BDP flows and applications delivering chunks of
data, such as video streams.

Nicely spaced packets :
Here interface is 10Gbit, but flow bottleneck is ~20Mbit

cwnd is big, yet FQ avoids the typical bursts generated by TCP
(as in netperf TCP_RR -- -r 100000,100000)

15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115>
15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115>
15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115>
15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115>
15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115>
15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115>
15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115>
15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>

TCP timestamps show that most packets from B were queued in the same ms
timeframe (TSval 1159799{3,4}), but FQ managed to send them right
in time to avoid a big burst.
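
The spacing can be checked by hand from the capture. The sketch below takes the first four timestamps of the 2896-byte segments sent by B and derives the mean gap and the resulting payload rate. The helper names are hypothetical, and the figure counts TCP payload only, so the on-wire rate is somewhat higher.

```python
from datetime import datetime

# Timestamps of the first four 2896-byte segments sent by B in the capture.
stamps = [
    "15:01:23.554234",
    "15:01:23.555620",
    "15:01:23.557005",
    "15:01:23.558390",
]
PAYLOAD_BYTES = 2896


def mean_gap_seconds(timestamps):
    """Average time between consecutive packets, in seconds."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def payload_rate_mbit(timestamps, payload_bytes):
    """Payload throughput implied by the packet spacing, in Mbit/s."""
    return payload_bytes * 8 / mean_gap_seconds(timestamps) / 1e6


print(f"mean gap: {mean_gap_seconds(stamps) * 1e3:.3f} ms")
print(f"payload rate: {payload_rate_mbit(stamps, PAYLOAD_BYTES):.1f} Mbit/s")
```

The gaps are about 1.385 ms each, which works out to a little under 17 Mbit/s of payload, the same order as the ~20Mbit bottleneck mentioned above.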

In slow start or steady state, very few packets are throttled [1]

FQ has the following tunables:

  limit : max number of packets on whole Qdisc (default 10000)

  flow_limit : max number of packets per flow (default 100)

  quantum : the credit per RR round (default is 2 MTU)

  initial_quantum : initial credit for new flows (default is 10 MTU)

  maxrate : max per flow rate (default : unlimited)

  buckets : number of RB trees (default : 1024) in hash table.
               (consumes 8 bytes per bucket)

  [no]pacing : disable/enable pacing (enabled by default)

All of them can be changed on a live qdisc.

$ tc qd add dev eth0 root fq help
Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
              [ quantum BYTES ] [ initial_quantum BYTES ]
              [ maxrate RATE  ] [ buckets NUMBER ]
              [ [no]pacing ]
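
For scripted setups, the usage line above maps naturally onto an argument list. The following is a minimal sketch that only builds the command and does not run it; the function name `fq_command` and its keyword interface are hypothetical, and the option names are taken from the usage text.

```python
# Option names copied from the fq usage text; order follows that text.
ALLOWED = ("limit", "flow_limit", "quantum", "initial_quantum",
           "maxrate", "buckets")


def fq_command(dev, op="add", pacing=True, **tunables):
    """Build an argv list for `tc qdisc <op> dev <dev> root fq ...`."""
    unknown = set(tunables) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown fq tunables: {sorted(unknown)}")
    argv = ["tc", "qdisc", op, "dev", dev, "root", "fq"]
    for name in ALLOWED:
        if name in tunables:
            argv += [name, str(tunables[name])]
    if not pacing:
        argv.append("nopacing")
    return argv


print(" ".join(fq_command("eth0", maxrate="20mbit", flow_limit=200)))
```

Passing `op="change"` would produce the form used to adjust a live qdisc; the resulting list can be handed to `subprocess.run` by a caller with the required privileges.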

$ tc -s -d qd
qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140
 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
 backlog 0b 0p requeues 14
  511 flows, 511 inactive, 0 throttled
  110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit
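
The quantum and initial_quantum values in this output line up with the documented defaults of 2 and 10 MTU if one "MTU" is taken to be 1514 bytes. That reading (a 1500-byte payload plus a 14-byte Ethernet header) is an assumption made for this sketch, not something the message states.

```python
# Assumption: "MTU" here means a 1500-byte payload plus a 14-byte Ethernet header.
FRAME_BYTES = 1500 + 14


def default_quanta(frame_bytes=FRAME_BYTES):
    """Return (quantum, initial_quantum) in bytes: 2 and 10 frames."""
    return 2 * frame_bytes, 10 * frame_bytes


print(default_quanta())
```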

[1] Except if the initial srtt is overestimated, for example when using
a cached srtt from tcp metrics. We'll provide a fix for this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 21:38:31 -04:00
byteorder
caif caif: Remove my bouncing email address. 2013-04-23 13:25:51 -04:00
can
dvb [media] demux.h: Remove duplicated enum 2013-04-08 06:53:15 -03:00
hdlc
hsi
isdn
mmc
netfilter netfilter: add SYNPROXY core/target 2013-08-28 00:27:54 +02:00
netfilter_arp
netfilter_bridge uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
netfilter_ipv4 uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
netfilter_ipv6 netfilter: fix struct ip6t_frag field description 2013-04-02 12:25:57 +02:00
nfsd
raid UAPI: fix endianness conditionals in linux/raid/md_p.h 2013-03-13 15:21:49 -07:00
spi
sunrpc
tc_act
tc_ematch
usb USB: move the definition of USB_MAXCHILDREN 2013-07-16 15:33:02 -07:00
wimax uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
a.out.h
acct.h UAPI: fix endianness conditionals in linux/acct.h 2013-03-13 15:21:48 -07:00
adb.h
adfs_fs.h
affs_hardblocks.h
agpgart.h
aio_abi.h UAPI: fix endianness conditionals in linux/aio_abi.h 2013-03-13 15:21:48 -07:00
apm_bios.h
arcfb.h
atalk.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_tcp.h
atm_zatm.h
atm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
audit.h audit: Make testing for a valid loginuid explicit. 2013-05-07 22:27:15 -04:00
auto_fs4.h
auto_fs.h
auxvec.h powerpc: Add HWCAP2 aux entry 2013-04-26 16:08:16 +10:00
ax25.h
b1lli.h
baycom.h
bcm933xx_hcs.h MIPS: BCM63XX: recognize Cable Modem firmware format 2013-07-01 15:10:53 +02:00
bfs_fs.h
binfmts.h
blkpg.h
blktrace_api.h
bpqether.h
bsg.h
btrfs.h btrfs: device delete to get errors from the kernel 2013-06-14 11:29:53 -04:00
can.h
capability.h
capi.h
cciss_defs.h
cciss_ioctl.h
cdrom.h
cgroupstats.h
chio.h
cm4000_cs.h
cn_proc.h connector: Added coredumping event to the process connector 2013-03-20 13:23:21 -04:00
coda_psdev.h
coda.h
coff.h
connector.h Drivers: hv: Add a new driver to support host initiated backup 2013-03-15 12:12:36 -07:00
const.h linux/const.h: Add _BITUL() and _BITULL() 2013-06-25 15:50:04 -07:00
cramfs_fs.h
cuda.h
cyclades.h
cycx_cfm.h
dcbnl.h
dccp.h
dlm_device.h
dlm_netlink.h
dlm_plock.h
dlm.h
dlmconstants.h
dm-ioctl.h dm: optimize use SRCU and RCU 2013-07-10 23:41:18 +01:00
dm-log-userspace.h
dn.h uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
dqblk_xfs.h
edd.h
efs_fs_sb.h
elf-em.h
elf-fdpic.h
elf.h metag: ptrace: Implement NT_METAG_TLS 2013-03-27 14:37:47 +00:00
elfcore.h
errno.h
errqueue.h
ethtool.h net: ethtool: disambiguate XCVR_* meaning 2013-05-27 22:42:50 -07:00
eventpoll.h
fadvise.h
falloc.h
fanotify.h
fb.h
fcntl.h
fd.h
fdreg.h
fib_rules.h fib_rules: fix suppressor names and default values 2013-08-03 10:40:23 -07:00
fiemap.h
filter.h filter: add ANC_PAY_OFFSET instruction for loading payload start offset 2013-03-20 13:15:45 -04:00
firewire-cdev.h firewire: fix libdc1394/FlyCap2 iso event regression 2013-07-27 20:24:36 +02:00
firewire-constants.h
flat.h
fs.h mm: make snapshotting pages for stable writes a per-bio operation 2013-04-29 15:54:33 -07:00
fsl_hypervisor.h
fuse.h fuse: add flag to turn on async direct IO 2013-05-01 14:37:21 +02:00
futex.h
gameport.h
gen_stats.h net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
genetlink.h
gfs2_ondisk.h
gigaset_dev.h
hdlc.h
hdlcdrv.h
hdreg.h
hid.h
hiddev.h
hidraw.h
hpet.h
hw_breakpoint.h
hysdn_if.h
i2c-dev.h
i2c.h
i2o-dev.h
i8k.h
icmp.h
icmpv6.h
if_addr.h
if_addrlabel.h
if_alg.h
if_arcnet.h
if_arp.h net: if_arp: add ARPHRD_NETLINK type 2013-06-24 16:39:05 -07:00
if_bonding.h
if_bridge.h uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
if_cablemodem.h if_cablemodem.h: Add parenthesis around ioctl macros 2013-05-08 13:13:30 -07:00
if_eql.h
if_ether.h net: add ETH_P_802_3_MIN 2013-03-28 01:20:42 -04:00
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h rtnl: export physical port id via RT netlink 2013-07-30 17:31:24 -07:00
if_ltalk.h
if_packet.h net: packet: add randomized fanout scheduler 2013-08-29 16:43:29 -04:00
if_phonet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h pptp: fix byte order warnings 2013-08-13 15:10:22 -07:00
if_slip.h
if_team.h
if_tun.h tun: Get skfilter layout 2013-08-21 12:21:45 -07:00
if_tunnel.h
if_vlan.h
if_x25.h
if.h
igmp.h
in6.h
in_route.h
in.h
inet_diag.h
inotify.h
input.h Input: make gamepad API keycodes more clear 2013-06-27 11:54:51 +02:00
ioctl.h
ip6_tunnel.h
ip_vs.h ipvs: SH fallback and L4 hashing 2013-06-26 18:01:46 +09:00
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-26 16:37:08 -04:00
ipc.h
ipmi_msgdefs.h
ipmi.h ipmi: remove superfluous kernel/userspace explanation 2013-02-27 19:10:21 -08:00
ipsec.h
ipv6_route.h
ipv6.h ipv6: drop fragmented ndisc packets by default (RFC 6980) 2013-08-29 15:32:08 -04:00
ipx.h
irda.h
irqnr.h
isdn_divertif.h
isdn_ppp.h
isdn.h
isdnif.h
iso_fs.h
ivtv.h
ivtvfb.h
ixjuser.h
jffs2.h
joystick.h
Kbuild Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-07-13 14:52:21 -07:00
kd.h
kdev_t.h
kernel-page-flags.h
kernel.h
kernelcapi.h
kexec.h
keyboard.h
keyctl.h
kvm_para.h
kvm.h Main features: 2013-07-03 10:31:38 -07:00
l2tp.h
limits.h
llc.h
loop.h
lp.h
magic.h hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h> 2013-05-04 15:48:44 -04:00
major.h
map_to_7segment.h
matroxfb.h
mdio.h
media.h [media] media: add support for decoder as one of media entity types 2013-03-21 14:05:31 -03:00
mei.h
mempolicy.h
meye.h
mii.h
minix_fs.h
mman.h
mmtimer.h
module.h
mqueue.h
mroute6.h
mroute.h
msdos_fs.h fatfs: add FAT_IOCTL_GET_VOLUME_ID 2013-07-09 10:33:25 -07:00
msg.h
mtio.h
n_r3964.h
nbd.h nbd: support FLUSH requests 2013-02-27 19:10:22 -08:00
ncp_fs.h
ncp_mount.h
ncp_no.h
ncp.h
neighbour.h vxlan: generalize forwarding tables 2013-03-17 12:23:46 -04:00
net_dropmon.h
net_tstamp.h
net.h
netconf.h
netdevice.h
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h
netfilter_ipv6.h
netfilter.h
netlink_diag.h netlink: add RX/TX-ring support to netlink diag 2013-04-19 14:57:58 -04:00
netlink.h netlink: mmaped netlink: ring setup 2013-04-19 14:57:57 -04:00
netrom.h
nfc.h NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD 2013-07-31 01:19:43 +02:00
nfs2.h
nfs3.h
nfs4_mount.h
nfs4.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfs.h
nfsacl.h
nl80211.h nl80211/cfg80211: add channel switch command 2013-08-01 18:30:28 +02:00
nubus.h
nvram.h
omap3isp.h
omapfb.h
oom.h
openvswitch.h openvswitch: Add SCTP support 2013-08-26 14:03:13 -07:00
packet_diag.h sock_diag: allow to dump bpf filters 2013-04-29 13:21:30 -04:00
param.h
parport.h
patchkey.h
pci_regs.h PCI: Fix comment typo for PCI_EXP_LNKCAP_CLKPM 2013-05-29 14:46:24 -06:00
pci.h
perf_event.h perf/x86/intel: Support Haswell/v4 LBR format 2013-06-19 14:43:35 +02:00
personality.h
pfkeyv2.h
pg.h
phantom.h
phonet.h
pkt_cls.h
pkt_sched.h pkt_sched: fq: Fair Queue packet scheduler 2013-08-29 21:38:31 -04:00
pktcdvd.h
pmu.h
poll.h
posix_types.h
ppdev.h
ppp_defs.h
ppp-comp.h
ppp-ioctl.h
pps.h
prctl.h
ptp_clock.h
ptrace.h ptrace: add ability to get/set signal-blocked mask 2013-07-03 16:08:01 -07:00
qnx4_fs.h
qnxtypes.h
quota.h
radeonfb.h
random.h
raw.h
rds.h
reboot.h
reiserfs_fs.h
reiserfs_xattr.h
resource.h
rfkill.h rfkill: Add NFC to the list of supported radios 2013-04-12 16:54:38 +02:00
romfs_fs.h
rose.h
route.h
rtc.h
rtnetlink.h tcp: introduce a per-route knob for quick ack 2013-06-19 23:06:51 -07:00
scc.h
sched.h
screen_info.h
sctp.h net: sctp: trivial: update mailing list address 2013-07-24 17:53:38 -07:00
sdla.h
seccomp.h
securebits.h
selinux_netlink.h
sem.h
serial_core.h ARM SoC late changes 2013-07-02 14:42:51 -07:00
serial_reg.h
serial.h
serio.h
shm.h
signal.h
signalfd.h
snmp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
sock_diag.h
socket.h
sockios.h
som.h
sonet.h
sonypi.h
sound.h
soundcard.h
stat.h
stddef.h
string.h
suspend_ioctls.h
swab.h
synclink.h
sysctl.h
sysinfo.h
taskstats.h
tcp_metrics.h
tcp.h tcp: TCP_NOTSENT_LOWAT socket option 2013-07-24 17:54:48 -07:00
telephony.h
termios.h
time.h timekeeping: Add CLOCK_TAI clockid 2013-03-22 16:19:59 -07:00
times.h
timex.h
tiocl.h
tipc_config.h tipc: update code comments to reflect new uapi header path 2013-06-17 15:53:00 -07:00
tipc.h tipc: update code comments to reflect new uapi header path 2013-06-17 15:53:00 -07:00
toshiba.h
tty_flags.h
tty.h
types.h
udf_fs_i.h
udp.h
uhid.h
uinput.h
uio.h
ultrasound.h
un.h
unistd.h
unix_diag.h net: fix *_DIAG_MAX constants 2013-03-21 12:36:33 -04:00
usbdevice_fs.h
utime.h
utsname.h
uuid.h
uvcvideo.h
v4l2-common.h
v4l2-controls.h [media] v4l2-controls.h: fix copy-and-paste error in comment 2013-06-28 15:07:16 -03:00
v4l2-dv-timings.h [media] v4l2-dv-timings.h: add 480i59.94 and 576i50 CEA-861-E timings 2013-04-14 19:56:36 -03:00
v4l2-mediabus.h [media] soc_camera: Add RGB666 & RGB888 formats 2013-04-04 19:40:08 -03:00
v4l2-subdev.h
veth.h
vfio.h vfio Updates for v3.11 2013-07-10 14:50:08 -07:00
vhost.h tcm_vhost: header split up 2013-05-02 13:40:15 +03:00
videodev2.h [media] v4l2-core: remove support for obsolete VIDIOC_DBG_G_CHIP_IDENT 2013-06-21 10:46:44 -03:00
virtio_9p.h
virtio_balloon.h virtio: do not export "u16" and "u64" to userspace 2013-04-02 16:42:58 +10:30
virtio_blk.h
virtio_config.h virtio: VIRTIO_F_ANY_LAYOUT feature 2013-07-09 10:47:45 +09:30
virtio_console.h Simple warning fix for module sections. If too late to pull, no big deal. 2013-07-03 13:09:06 -07:00
virtio_ids.h caif_virtio: Introduce caif over virtio 2013-03-20 14:06:06 +10:30
virtio_net.h uapi: Convert some uses of 6 to ETH_ALEN 2013-08-02 12:33:54 -07:00
virtio_pci.h virtio_pci: better macro exported in uapi 2013-05-20 12:08:09 +09:30
virtio_ring.h
virtio_rng.h
vm_sockets.h VSOCK: Split vm_sockets.h into kernel/uapi 2013-03-08 12:24:48 -05:00
vt.h
wait.h
wanrouter.h
watchdog.h
wimax.h
wireless.h
x25.h
xattr.h hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes 2013-02-27 19:10:10 -08:00
xfrm.h xfrm: allow to avoid copying DSCP during encapsulation 2013-03-06 07:02:45 +01:00