twx-linux/include/net
Tom Herbert 47d3d7ac65 ipv6: Implement limits on Hop-by-Hop and Destination options
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).

By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.

  25.38%  [kernel]                    [k] __fib6_clean_all
  21.63%  [kernel]                    [k] ip6_parse_tlv
   4.21%  [kernel]                    [k] __local_bh_enable_ip
   2.18%  [kernel]                    [k] ip6_pol_route.isra.39
   1.98%  [kernel]                    [k] fib6_walk_continue
   1.88%  [kernel]                    [k] _raw_write_lock_bh
   1.65%  [kernel]                    [k] dst_release

This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
  - Limit the number of options in a Hop-by-Hop or Destination options
    extension header.
  - Limit the byte length of a Hop-by-Hop or Destination options
    extension header.
  - Disallow unrecognized options in a Hop-by-Hop or Destination
    options extension header.

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.

If a limit is exceeded when processing an extension header the packet is
dropped.

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

These limits have being proposed in draft-ietf-6man-rfc6434-bis.

Tested (by Martin Lau)

I tested out 1 thread (i.e. one raw_udp process).

I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.

v2;
  - Code and documention cleanup.
  - Change references of RFC2460 to be RFC8200.
  - Add reference to RFC6434-bis where the limits will be in standard.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 09:50:22 +09:00
..
9p 9p: Implement show_options 2017-07-11 06:08:58 -04:00
bluetooth Bluetooth: Use bt_dev_err and bt_dev_info when possible 2017-10-30 12:25:45 +02:00
caif
iucv
netfilter netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" 2017-09-08 18:55:50 +02:00
netns ipv6: Implement limits on Hop-by-Hop and Destination options 2017-11-03 09:50:22 +09:00
nfc NFC: Add nfc_dbg() macro 2017-04-05 10:15:20 +02:00
phonet net: phonet: mark phonet_protocol as const 2017-10-07 23:15:08 +01:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-30 21:09:24 +09:00
tc_act sched: act: ife: update parameters via rcu handling 2017-10-12 22:23:03 -07:00
6lowpan.h 6lowpan: Fix IID format for Bluetooth 2017-04-12 22:02:36 +02:00
act_api.h net: sched: introduce per-egress action device callbacks 2017-10-11 20:15:43 -07:00
addrconf.h net: display hw address of source machine during ipv6 DAD failure 2017-11-01 20:53:49 +09:00
af_ieee802154.h
af_rxrpc.h rxrpc: Provide functions for allowing cleaner handling of signals 2017-10-18 11:42:48 +01:00
af_unix.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-21 03:38:43 +01:00
af_vsock.h VSOCK: use TCP state constants for sk_state 2017-10-05 18:44:17 -07:00
ah.h
arp.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
atmclip.h
ax25.h net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
ax88796.h
bond_3ad.h bonding: 3ad: apply ad_actor settings changes immediately 2016-02-09 04:45:49 -05:00
bond_alb.h
bond_options.h bonding: Prevent duplicate userspace notification 2017-05-27 18:51:41 -04:00
bonding.h bonding: remove rtmsg_ifinfo called after bond_lower_state_changed 2017-10-25 10:54:39 +09:00
busy_poll.h net: fix compilation when busy poll is not enabled 2017-08-11 14:59:24 -07:00
calipso.h net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
cfg80211-wext.h
cfg80211.h cfg80211/nl80211: add a port authorized event 2017-10-02 14:08:27 +02:00
cfg802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
checksum.h csum: eliminate sparse warning in remcsum_unadjust() 2017-01-20 12:12:13 -05:00
cipso_ipv4.h net, ipv4: convert cipso_v4_doi.refcount from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
cls_cgroup.h cls_cgroup: get sk_classid only from full sockets 2016-04-19 20:09:25 -04:00
codel_impl.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
codel_qdisc.h net_sched: fq_codel: cache skb->truesize into skb->cb 2016-06-25 12:19:35 -04:00
codel.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
compat.h packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
datalink.h
dcbevent.h
dcbnl.h
devlink.h devlink: Add IPv6 header for dpipe 2017-08-31 14:42:19 -07:00
dn_dev.h
dn_fib.h net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
dn_neigh.h
dn_nsp.h net/decnet: Convert timers to use timer_setup() 2017-10-18 12:39:36 +01:00
dn_route.h
dn.h net/decnet: Convert timers to use timer_setup() 2017-10-18 12:39:36 +01:00
dsa.h net: dsa: remove port masks 2017-10-28 00:00:09 +09:00
dsfield.h
dst_cache.h net: add dst_cache support 2016-02-16 20:21:48 -05:00
dst_metadata.h bpf: don't rely on the verifier lock for metadata_dst allocation 2017-10-10 12:30:16 -07:00
dst_ops.h net: add confirm_neigh method to dst_ops 2017-02-07 13:07:46 -05:00
dst.h net: updating dst lastusage is an unlikely event. 2017-10-27 11:52:52 +09:00
erspan.h gre: introduce native tunnel support for ERSPAN 2017-08-22 14:29:30 -07:00
esp.h esp6: Reorganize esp_output 2017-04-14 10:06:42 +02:00
ethoc.h
fib_notifier.h net: Add extack to fib_notifier_info 2017-11-01 11:50:43 +09:00
fib_rules.h net: fib_rules: Implement notification logic in core 2017-08-03 15:35:59 -07:00
firewire.h
flow_dissector.h flow_dissector: Cleanup control flow 2017-09-05 11:40:08 -07:00
flow.h net: Extend struct flowi6 with multipath hash 2017-08-24 18:21:17 -07:00
fou.h fou: Add encap ops for IPv6 tunnels 2016-05-20 18:03:16 -04:00
fq_impl.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-30 21:09:24 +09:00
fq.h fq: support filtering a given tin 2017-10-11 09:49:34 +02:00
garp.h
gen_stats.h net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
genetlink.h genetlink: remove ops_list from genetlink header. 2017-06-05 10:54:55 -04:00
geneve.h net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
gre.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-18 01:17:32 -04:00
gro_cells.h gro_cells: move to net/core/gro_cells.c 2017-02-08 14:38:18 -05:00
gtp.h gtp: #define #define _GTP_H_ and not #define _GTP_H 2016-07-25 17:55:43 -07:00
gue.h
hwbm.h net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
icmp.h net: snmp: kill STATS_BH macros 2016-04-27 22:48:25 -04:00
ieee80211_radiotap.h wireless: radiotap: rewrite the radiotap header file 2017-01-25 16:00:33 +01:00
ieee802154_netdev.h mac802154: constify ieee802154_llsec_ops structure 2016-01-04 20:40:41 +01:00
if_inet6.h net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
ife.h net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ila.h ila: Add generic ILA translation facility 2015-12-15 23:25:20 -05:00
inet6_connection_sock.h inet: drop ->bind_conflict 2017-01-18 13:04:28 -05:00
inet6_hashtables.h net: ipv6: add second dif to inet6 socket lookups 2017-08-07 11:39:22 -07:00
inet_common.h net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_connection_sock.h inet/connection_sock: Convert timers to use timer_setup() 2017-10-18 12:39:55 +01:00
inet_ecn.h net-ipv6: remove unused IP6_ECN_clear() function 2017-10-01 17:55:54 -07:00
inet_frag.h inet: frags: Convert timers to use timer_setup() 2017-10-18 12:39:55 +01:00
inet_hashtables.h net: ipv4: add second dif to inet socket lookups 2017-08-07 11:39:21 -07:00
inet_sock.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-30 21:09:24 +09:00
inet_timewait_sock.h ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.h inetpeer: remove AVL implementation in favor of RB tree 2017-07-17 08:59:01 -07:00
ip6_checksum.h ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short 2016-03-13 23:55:13 -04:00
ip6_fib.h ipv6: take care of rt6_stats 2017-10-07 21:22:58 +01:00
ip6_route.h ipv6: prepare fib6_age() for exception table 2017-10-07 21:22:57 +01:00
ip6_tunnel.h ip6_tunnel: Allow policy-based routing through tunnels 2017-04-21 13:21:30 -04:00
ip_fib.h net: ipv4: remove fib_weight 2017-09-29 06:19:32 +01:00
ip_tunnels.h ipv4: speedup ipv6 tunnels dismantle 2017-09-19 16:32:24 -07:00
ip_vs.h ipvs: remove unused function ip_vs_set_state_timeout 2017-04-28 12:00:10 +02:00
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
ipcomp.h
ipconfig.h
ipv6.h ipv6: Implement limits on Hop-by-Hop and Destination options 2017-11-03 09:50:22 +09:00
ipx.h net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
iw_handler.h wext: uninline stream addition functions 2017-01-13 09:38:42 +01:00
kcm.h kcm: Use stream parser 2016-08-17 19:36:23 -04:00
l3mdev.h net: ipv4: Do not drop to make_route if oif is l3mdev 2016-10-13 12:05:26 -04:00
lapb.h net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
lib80211.h
llc_c_ac.h net: LLC: Convert timers to use timer_setup() 2017-10-25 12:06:25 +09:00
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
lwtunnel.h net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
mac80211.h mac80211: add documentation to ieee80211_rx_ba_offl() 2017-09-21 11:42:00 +02:00
mac802154.h ieee802154: cleanup WARN_ON for fc fetch 2016-07-08 13:23:12 +02:00
mip6.h
mld.h
mpls_iptunnel.h net: mpls: Increase max number of labels for lwt encap 2017-04-01 20:21:44 -07:00
mpls.h openvswitch: use mpls_hdr 2016-10-03 02:00:22 -04:00
mrp.h
ncsi.h net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
ndisc.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
neighbour.h neigh: make strucrt neigh_table::entry_size unsigned int 2017-09-25 20:36:17 -07:00
net_namespace.h net: core: Make the FIB notification chain generic 2017-08-03 15:35:59 -07:00
net_ratelimit.h
netevent.h neigh: Send a notification when DELAY_PROBE_TIME changes 2016-07-05 09:06:29 -07:00
netlabel.h net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
netlink.h netlink: fix nla_put_{u8,u16,u32} for KASAN 2017-09-25 20:18:27 -07:00
netprio_cgroup.h net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
netrom.h net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nexthop.h
nl802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
nsh.h net: add NSH header structures and helpers 2017-08-29 15:16:52 -07:00
p8022.h
ping.h net: ping: make ping_v6_sendmsg static 2016-03-23 22:09:58 -04:00
pkt_cls.h net: sched: remove ndo_setup_tc check from tc_can_offload 2017-11-02 16:10:39 +09:00
pkt_sched.h net/sched: Add support for HW offloading for CBS 2017-10-27 09:49:24 -07:00
pptp.h pptp: Refactor the struct and macros of PPTP codes 2016-08-15 10:55:53 -07:00
protocol.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
psample.h net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
psnap.h
raw.h net: ipv4: add second dif to raw socket lookups 2017-08-07 11:39:21 -07:00
rawv6.h net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
red.h ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
regulatory.h
request_sock.h tcp: socket option to set TCP fast open key 2017-10-20 13:21:36 +01:00
rose.h
route.h udp: perform source validation for mcast early demux 2017-10-01 03:55:47 +01:00
rtnetlink.h rtnetlink: remove __rtnl_af_unregister 2017-10-04 10:33:59 -07:00
sch_generic.h net: sched: Identify hardware traffic classes using classid 2017-10-31 10:45:45 -07:00
scm.h sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
secure_seq.h tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
seg6_hmac.h ipv6: sr: add core files for SR HMAC support 2016-11-09 20:40:06 -05:00
seg6.h ipv6: sr: add support for ip4ip6 encapsulation 2017-08-25 17:10:23 -07:00
slhc_vj.h
smc.h smc: netlink interface for SMC sockets 2017-01-09 16:07:41 -05:00
snmp.h net: snmp: fix 64bit stats on 32bit arches 2016-04-28 11:49:45 -04:00
sock_reuseport.h soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.h net/sock: Update sk rcu iterator macro. 2017-10-24 18:46:22 +09:00
Space.h
stp.h
strparser.h strparser: Use delayed work instead of timer for msg timeout 2017-10-25 10:37:11 +09:00
switchdev.h net: bridge: Notify on bridge device mrouter state changes 2017-10-09 10:18:11 -07:00
tcp_states.h
tcp.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-02 15:23:39 +09:00
timewait_sock.h
tls.h tls: kernel TLS support 2017-06-15 12:12:40 -04:00
transp_v6.h ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
tso.h net: define the TSO header size in net/tso.h 2017-08-23 20:42:09 -07:00
tun_proto.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
udp_tunnel.h net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udp.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
udplite.h udp: use a separate rx queue for packet reception 2017-05-16 15:41:29 -04:00
vsock_addr.h
vxlan.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
wext.h dev_ioctl: copy only the smaller struct iwreq for wext 2017-06-14 13:52:44 +02:00
wimax.h
x25.h net, x25: convert x25_neigh.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
x25device.h
xfrm.h xfrm: make xfrm_replay_state_esn_len() return unsigned int 2017-09-25 07:14:06 +02:00