twx-linux/include/net
Talal Ahmad f1a456f8f3 net: avoid double accounting for pure zerocopy skbs
Track skbs with only zerocopy data and avoid charging them to kernel
memory to correctly account the memory utilization for msg_zerocopy.
All of the data in such skbs is held in user pages which are already
accounted to user. Before this change, they are charged again in
kernel in __zerocopy_sg_from_iter. The charging in kernel is
excessive because data is not being copied into skb frags. This
excessive charging can lead to kernel going into memory pressure
state which impacts all sockets in the system adversely. Mark pure
zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
charge/uncharge for data in such skbs.

Initially, an skb is marked pure zerocopy when it is empty and in
zerocopy path. skb can then change from a pure zerocopy skb to mixed
data skb (zerocopy and copy data) if it is at tail of write queue and
there is room available in it and non-zerocopy data is being sent in
the next sendmsg call. At this time sk_mem_charge is done for the pure
zerocopied data and the pure zerocopy flag is unmarked. We found that
this happens very rarely on workloads that pass MSG_ZEROCOPY.

A pure zerocopy skb can later be coalesced into normal skb if they are
next to each other in queue but this patch prevents coalescing from
happening. This avoids complexity of charging when skb downgrades from
pure zerocopy to mixed. This is also rare.

In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
for SKB_TRUESIZE(MAX_TCP_HEADER) is done for sk_mem_charge in
tcp_skb_entail for an skb without data.

Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
with zerocopy showed that before this patch the 'sock' variable in
memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
change it is 0. This is due to no charge to sk_forward_alloc for
zerocopy data and shows memory utilization for kernel is lowered.

Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 16:33:27 -07:00
..
9p 9p: apply review requests for fid refcounting 2020-11-19 17:21:34 +01:00
bluetooth Bluetooth: Rename driver .prevent_wake to .wakeup 2021-10-01 15:46:15 -07:00
caif net: remove the caif_hsi driver 2021-07-01 13:19:48 -07:00
iucv net/af_iucv: don't track individual TX skbs for TRANS_HIPER sockets 2021-01-28 20:36:21 -08:00
netfilter netfilter: nft_payload: support for inner header matching / mangling 2021-11-01 09:31:03 +01:00
netns netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 2021-09-28 13:04:55 +02:00
nfc nfc: hci: pass callback data param as pointer in nci_request() 2021-08-02 15:11:37 +01:00
phonet
sctp sctp: fix transport encap_port update in sctp_vtag_verify 2021-10-15 11:21:10 +01:00
tc_act net/sched: act_vlan: Fix modify to allow 0 2021-06-01 16:54:42 -07:00
6lowpan.h 6lowpan: Replace zero-length array with flexible-array member 2020-02-28 14:51:30 +01:00
act_api.h net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types 2021-10-18 12:54:41 +01:00
addrconf.h net: bridge: mcast: fix broken length + header check for MRDv6 Adv. 2021-04-27 14:02:06 -07:00
af_ieee802154.h
af_rxrpc.h afs: Don't truncate iter during data fetch 2021-04-23 10:17:26 +01:00
af_unix.h af_unix: Add unix_stream_proto for sockmap 2021-08-16 18:43:39 -07:00
af_vsock.h af_vsock: rest of SEQPACKET support 2021-06-11 13:32:46 -07:00
ah.h
amt.h amt: add mld report message handler 2021-11-01 13:36:09 +00:00
arp.h net: avoid potential false sharing in neighbor related code 2019-11-06 16:14:48 -08:00
atmclip.h
ax25.h ax25: constify dev_addr passing 2021-10-13 09:40:45 -07:00
ax88796.h ax88796: export ax_NS8390_init() hook 2021-08-03 13:05:25 +01:00
bareudp.h bareudp: Reverted support to enable & disable rx metadata collection 2020-07-21 18:30:47 -07:00
bond_3ad.h bonding: add new option lacp_active 2021-08-03 11:50:22 +01:00
bond_alb.h bonding/alb: Add helper functions to get the xmit slave 2020-05-01 12:15:37 -07:00
bond_options.h bonding: add new option lacp_active 2021-08-03 11:50:22 +01:00
bonding.h bonding: remove extraneous definitions from bonding.h 2021-08-11 14:57:31 -07:00
bpf_sk_storage.h bpf: struct sock is declared twice in bpf_sk_storage header 2021-03-26 17:43:55 +01:00
busy_poll.h net: avoid dirtying sk->sk_napi_id 2021-10-25 18:02:12 -07:00
calipso.h
cfg80211-wext.h
cfg80211.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
cfg802154.h cfg802154: Replace zero-length array with flexible-array member 2020-02-29 14:39:08 +01:00
checksum.h csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() 2021-06-10 11:45:14 -04:00
cipso_ipv4.h cipso: Remove unused inline functions 2020-07-15 07:45:24 -07:00
cls_cgroup.h bpf: Allow to retrieve cgroup v1 classid from v2 hooks 2020-03-27 19:40:38 -07:00
codel_impl.h fq_codel: generalise ce_threshold marking for subset of traffic 2021-10-20 15:24:36 -07:00
codel_qdisc.h
codel.h fq_codel: generalise ce_threshold marking for subset of traffic 2021-10-20 15:24:36 -07:00
compat.h net/ipv4/ipv6: Replace one-element arraya with flexible-array members 2021-08-05 11:46:42 +01:00
datalink.h llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
dcbevent.h
dcbnl.h
devlink.h ethtool: don't drop the rtnl_lock half way thru the ioctl 2021-11-01 13:26:07 +00:00
dn_dev.h
dn_fib.h net: convert fib_treeref from int to refcount_t 2021-07-30 15:33:24 +02:00
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h decnet: constify dev_addr passing 2021-10-13 09:40:46 -07:00
dsa.h net: dsa: populate supported_interfaces member 2021-11-01 13:06:32 +00:00
dsfield.h ipv6: Annotate bitwise IPv6 dsfield pointer cast 2019-12-16 16:09:44 -08:00
dst_cache.h
dst_metadata.h net: validate lwtstate->data before returning from skb_tunnel_info() 2021-07-09 13:55:53 -07:00
dst_ops.h net/dst: use a smaller percpu_counter batch for dst entries accounting 2020-05-08 21:33:33 -07:00
dst.h sk_buff: track dst status in slow_gro 2021-07-29 12:18:11 +01:00
erspan.h erspan: Add type I version 0 support. 2020-05-05 13:23:29 -07:00
esp.h ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
espintcp.h xfrm: espintcp: save and call old ->sk_destruct 2020-04-20 07:34:16 +02:00
ethoc.h
failover.h
fib_notifier.h ipv6: Remove old route notifications and convert listeners 2019-12-24 22:37:30 -08:00
fib_rules.h fib: use indirect call wrappers in the most common fib_rules_ops 2020-07-28 17:42:31 -07:00
firewire.h
flow_dissector.h flow_dissector: constify raw input data argument 2021-03-14 14:46:32 -07:00
flow_offload.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
flow.h flow: fix object-size-mismatch warning in flowi{4,6}_to_flowi_common() 2021-09-02 11:44:19 +01:00
fou.h
fq_impl.h net/fq_impl: do not maintain a backlog-sorted list of flows 2021-01-21 13:33:45 +01:00
fq.h net/fq_impl: do not maintain a backlog-sorted list of flows 2021-01-21 13:33:45 +01:00
garp.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
gen_stats.h net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
genetlink.h mptcp: avoid lock_fast usage in accept path 2021-02-12 16:31:46 -08:00
geneve.h
gre.h ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
gro_cells.h
gro.h gro: add combined call_gro_receive() + INDIRECT_CALL_INET() helper 2021-03-18 19:51:12 -07:00
gtp.h
gue.h GUE: Fix a typo 2020-06-22 21:12:44 -07:00
hwbm.h net: hwbm: if CONFIG_NET_HWBM unset, make stub functions static 2019-10-25 16:24:32 -07:00
icmp.h ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
ieee80211_radiotap.h mac80211: Use flex-array for radiotap header bitmap 2021-08-13 09:58:25 +02:00
ieee802154_netdev.h
if_inet6.h ipv6: add IFLA_INET6_RA_MTU to expose mtu value 2021-08-27 17:29:18 -07:00
ife.h
ila.h
inet6_connection_sock.h
inet6_hashtables.h net: Track socket refcounts in skb_steal_sock() 2020-03-30 13:45:04 -07:00
inet_common.h bpf: Allow rewriting to ports under ip_unprivileged_port_start 2021-01-27 18:18:15 -08:00
inet_connection_sock.h tcp: switch orphan_count to bare per-cpu counters 2021-10-15 11:28:34 +01:00
inet_ecn.h net: add skb_get_dsfield() helper 2021-10-15 11:33:08 +01:00
inet_frag.h inet: frags: batch fqdir destroy works 2020-12-12 15:08:54 -08:00
inet_hashtables.h tcp: seq_file: Replace listening_hash with lhash2 2021-07-23 16:44:57 -07:00
inet_sock.h tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex 2021-10-25 18:02:12 -07:00
inet_timewait_sock.h tcp: honor SO_PRIORITY in TIME_WAIT state 2019-09-27 12:05:02 +02:00
inetpeer.h
ioam6.h ipv6: ioam: Distinguish input and output for hop-limit 2021-10-04 12:53:35 +01:00
ip6_checksum.h tcp: remove indirect calls for icsk->icsk_af_ops->send_check 2020-06-20 17:47:53 -07:00
ip6_fib.h ipv6: correct comments about fib6_node sernum 2021-08-24 09:59:01 +01:00
ip6_route.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-05 15:08:47 -07:00
ip6_tunnel.h
ip_fib.h net: ipv4: Fix rtnexthop len when RTA_FLOW is present 2021-09-24 14:07:10 +01:00
ip_tunnels.h ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
ip_vs.h ipvs: add sysctl_run_estimation to support disable estimation 2021-10-07 19:52:58 +02:00
ip.h ipv4: guard IP_MINTTL with a static key 2021-10-25 18:02:14 -07:00
ipcomp.h
ipconfig.h
ipv6_frag.h ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module 2020-11-19 10:49:50 -08:00
ipv6_stubs.h ipv6: add ipv6_dev_find to stubs 2021-03-30 13:29:39 -07:00
ipv6.h ipv6: guard IPV6_MINHOPCOUNT with a static key 2021-10-25 18:02:13 -07:00
iw_handler.h
kcm.h
l3mdev.h l3mdev: add infrastructure for table to VRF mapping 2020-06-20 17:22:22 -07:00
lag.h
lapb.h net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: fix sk_buff leak in llc_conn_service() 2019-10-08 13:23:05 -07:00
llc_if.h llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
llc_pdu.h net: llc: fix skb_over_panic 2021-07-27 13:05:56 +01:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
lwtunnel.h netfilter: add netfilter hooks to SRv6 data plane 2021-08-30 01:51:36 +02:00
mac80211.h Quite a few changes: 2021-10-22 10:20:56 -07:00
mac802154.h
macsec.h net: macsec: fix the length used to copy the key for offloading 2021-06-24 12:41:12 -07:00
mctp.h mctp: Pass flow data & flow release events to drivers 2021-10-29 13:23:51 +01:00
mctpdevice.h mctp: Pass flow data & flow release events to drivers 2021-10-29 13:23:51 +01:00
mip6.h net: mip6: Replace zero-length array with flexible-array member 2020-03-02 11:16:27 -08:00
mld.h mld: add new workqueues for process mld events 2021-03-26 15:14:56 -07:00
mpls_iptunnel.h net: mpls: Replace zero-length array with flexible-array member 2020-02-28 12:08:37 -08:00
mpls.h net: Make mpls_entry_encode() available for generic users 2020-05-29 21:20:20 -07:00
mptcp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
mrp.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
ncsi.h
ndisc.h ipv6: constify dev_addr passing 2021-10-13 09:40:46 -07:00
neighbour.h net: annotate data-race in neigh_output() 2021-10-26 13:44:18 +01:00
net_failover.h
net_namespace.h netfilter: remove xt pernet data 2021-08-01 12:00:51 +02:00
net_ratelimit.h
netevent.h
netlabel.h
netlink.h net: netlink: add the case when nlh is NULL 2021-07-27 11:43:50 +01:00
netprio_cgroup.h netprio: use css ID instead of cgroup ID 2019-11-12 08:18:03 -08:00
netrom.h
nexthop.h net: ipv4: Fix rtnexthop len when RTA_FLOW is present 2021-09-24 14:07:10 +01:00
nl802154.h
nsh.h
p8022.h
page_pool.h page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA 2021-10-15 10:54:20 +01:00
pie.h pie: realign comment 2020-03-04 13:25:55 -08:00
ping.h
pkt_cls.h net: sch_tbf: Add a graft command 2021-10-19 12:24:51 +01:00
pkt_sched.h net: prevent user from passing illegal stab size 2021-09-26 11:09:07 +01:00
pptp.h
protocol.h net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
psample.h psample: Add a fwd declaration for skbuff 2021-08-09 15:34:21 -07:00
psnap.h
raw.h
rawv6.h
red.h sch_red: fix off-by-one checks in red_check_params() 2021-03-25 17:40:43 -07:00
regulatory.h net/wireless: regulatory.h: drop duplicate word in comment 2020-07-31 09:24:23 +02:00
request_sock.h tcp: bpf: Optionally store mac header in TCP_SAVE_SYN 2020-08-24 14:35:00 -07:00
rose.h rose: constify dev_addr passing 2021-10-13 09:40:45 -07:00
route.h lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
rpl.h net: ipv6: Use struct_size() helper and kcalloc() 2020-06-23 20:27:09 -07:00
rsi_91x.h
rtnetlink.h net: add extack arg for link ops 2021-08-04 10:01:26 +01:00
rtnh.h
sch_generic.h net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() 2021-10-27 17:09:26 -07:00
scm.h fs: Move __scm_install_fd() to __receive_fd() 2020-07-13 11:03:44 -07:00
secure_seq.h
seg6_hmac.h
seg6_local.h
seg6.h seg6: fix seg6_validate_srh() to avoid slab-out-of-bounds 2020-06-04 15:39:32 -07:00
selftests.h net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
slhc_vj.h
smc.h net/smc: introduce CHID callback for ISM devices 2020-09-28 15:19:03 -07:00
snmp.h net/tls: add skeleton of MIB statistics 2019-10-05 16:29:00 -07:00
sock_reuseport.h tcp: Add reuseport_migrate_sock() to select a new listener. 2021-06-15 18:01:05 +02:00
sock.h tcp: rename sk_wmem_free_skb 2021-11-01 16:33:27 -07:00
Space.h wan: remove sbni/granch driver 2021-08-03 13:05:26 +01:00
stp.h
strparser.h
switchdev.h net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device 2021-10-27 14:54:02 +01:00
tcp_states.h
tcp.h net: avoid double accounting for pure zerocopy skbs 2021-11-01 16:33:27 -07:00
timewait_sock.h
tipc.h
tls_toe.h net/tls: rename tls_hw_* functions tls_toe_* 2019-10-04 14:07:07 -07:00
tls.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-10-28 10:43:58 -07:00
transp_v6.h tcp: move ipv4_specific to tcp include file 2020-06-23 20:10:15 -07:00
tso.h net: tso: cache transport header length 2020-06-18 20:46:23 -07:00
tun_proto.h
udp_tunnel.h udp: call udp_encap_enable for v6 sockets when enabling encap 2021-02-04 18:37:14 -08:00
udp.h net: multicast: calculate csum of looped-back and forwarded packets 2021-10-26 13:09:22 +01:00
udplite.h
vsock_addr.h vsock: remove include/linux/vm_sockets.h file 2019-11-14 18:12:17 -08:00
vxlan.h net: sched: only keep the available bits when setting vxlan md->gbp 2020-09-14 16:49:39 -07:00
wext.h
x25.h net/x25: add new state X25_STATE_5 2019-12-09 10:28:43 -08:00
x25device.h
xdp_priv.h page_pool: do not release pool until inflight == 0. 2019-11-16 12:39:10 -08:00
xdp_sock_drv.h xsk: Batched buffer allocation for the pool 2021-09-28 00:18:34 +02:00
xdp_sock.h xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
xdp.h bpf, xdp, docs: Correct some English grammar and spelling 2021-09-30 23:23:49 +02:00
xfrm.h xfrm: Add possibility to set the default to block if we have no policy 2021-07-21 09:49:19 +02:00
xsk_buff_pool.h xsk: Optimize for aligned case 2021-09-28 00:18:35 +02:00