twx-linux/include/net
Vladimir Oltean 161ca59d39 net: dsa: reference count the MDB entries at the cross-chip notifier level
Ever since the cross-chip notifiers were introduced, the design was
meant to be simplistic and just get the job done without worrying too
much about dangling resources left behind.

For example, somebody installs an MDB entry on sw0p0 in this daisy chain
topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
sw1p4 and sw2p4.

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]

Then the same person deletes that MDB entry. The cross-chip notifier for
deletion only matches sw0p0:

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]

Why?

Because the DSA links are 'trunk' ports, if we just go ahead and delete
the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
entries when they are still needed. Just consider the fact that somebody
does:

- add a multicast MAC address towards sw0p0 [ via the cross-chip
  notifiers it gets installed on the DSA links too ]
- add the same multicast MAC address towards sw0p1 (another port of that
  same switch)
- delete the same multicast MAC address from sw0p0.

At this point, if we deleted the MAC address from the DSA links, it
would be flooded, even though there is still an entry on switch 0 which
needs it not to.

So that is why deletions only match the targeted source port and nothing
on DSA links. Of course, dangling resources means that the hardware
tables will eventually run out given enough additions/removals, but hey,
at least it's simple.

But there is a bigger concern which needs to be addressed, and that is
our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
on the upstream port, and a similar thing happens on deletion:
dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
upstream port.

When there are 2 VLAN-unaware bridges spanning the same switch (which is
a use case DSA proudly supports), each bridge will install its own
SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
user ports enslaved to br0 and the user ports enslaved to br1. Not good.
The host-trapped multicast addresses installed by br1 will be deleted
when any state changes in br0 (IGMP timers expire, or ports leave, etc).

To avoid this, we could of course go the route of the zero-sum game and
delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
design is to just admit that on shared ports like DSA links and CPU
ports, we should be reference counting calls, even if this consumes some
dynamic memory which DSA has traditionally avoided. On the flip side,
the hardware tables of switches are limited in size, so it would be good
if the OS managed them properly instead of having them eventually
overflow.

To address the memory usage concern, we only apply the refcounting of
MDB entries on ports that are really shared (CPU ports and DSA links)
and not on user ports. In a typical single-switch setup, this means only
the CPU port (and the host MDB entries are not that many, really).

The name of the newly introduced data structures (dsa_mac_addr) is
chosen in such a way that will be reusable for host FDB entries (next
patch).

With this change, we can finally have the same matching logic for the
MDB additions and deletions, as well as for their host-trapped variants.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29 10:46:23 -07:00
..
9p 9p: apply review requests for fid refcounting 2020-11-19 17:21:34 +01:00
bluetooth Bluetooth: Fix Set Extended (Scan Response) Data 2021-06-26 07:12:44 +02:00
caif net: caif: add proper error handling 2021-06-03 15:05:06 -07:00
iucv net/af_iucv: don't track individual TX skbs for TRANS_HIPER sockets 2021-01-28 20:36:21 -08:00
netfilter netfilter: conntrack: pass hook state to log functions 2021-06-18 14:47:43 +02:00
netns Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git 2021-06-28 13:17:16 -07:00
nfc NFC: nci: fix memory leak in nci_allocate_device 2021-05-17 13:56:29 -07:00
phonet
sctp sctp: do black hole detection in search complete state 2021-06-24 12:58:03 -07:00
tc_act net/sched: act_vlan: Fix modify to allow 0 2021-06-01 16:54:42 -07:00
6lowpan.h
act_api.h net: sched: fix err handler in tcf_action_init() 2021-04-08 13:47:33 -07:00
addrconf.h net: bridge: mcast: fix broken length + header check for MRDv6 Adv. 2021-04-27 14:02:06 -07:00
af_ieee802154.h
af_rxrpc.h afs: Don't truncate iter during data fetch 2021-04-23 10:17:26 +01:00
af_unix.h
af_vsock.h af_vsock: rest of SEQPACKET support 2021-06-11 13:32:46 -07:00
ah.h
arp.h
atmclip.h
ax25.h
ax88796.h
bareudp.h
bond_3ad.h
bond_alb.h
bond_options.h
bonding.h net: bonding: Use per-cpu rr_tx_counter 2021-06-15 11:26:15 -07:00
bpf_sk_storage.h bpf: struct sock is declared twice in bpf_sk_storage header 2021-03-26 17:43:55 +01:00
busy_poll.h net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error 2020-12-01 15:51:19 +01:00
calipso.h
cfg80211-wext.h
cfg80211.h Lots of changes: 2021-06-28 13:06:12 -07:00
cfg802154.h
checksum.h saner calling conventions for csum_and_copy_..._user() 2020-08-20 15:45:15 -04:00
cipso_ipv4.h
cls_cgroup.h
codel_impl.h
codel_qdisc.h
codel.h
compat.h compat: always include linux/compat.h from net/compat.h 2020-11-23 13:31:54 -08:00
datalink.h
dcbevent.h
dcbnl.h
devlink.h net: core: devlink: add dropped stats traps field 2021-06-14 13:04:25 -07:00
dn_dev.h
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h
dsa.h net: dsa: reference count the MDB entries at the cross-chip notifier level 2021-06-29 10:46:23 -07:00
dsfield.h
dst_cache.h
dst_metadata.h
dst_ops.h
dst.h net: Consolidate common blackhole dst ops 2021-03-10 12:24:18 -08:00
erspan.h
esp.h
espintcp.h
ethoc.h
failover.h
fib_notifier.h
fib_rules.h
firewire.h
flow_dissector.h flow_dissector: constify raw input data argument 2021-03-14 14:46:32 -07:00
flow_offload.h net: flow_offload: add FLOW_ACTION_PPPOE_PUSH 2021-03-24 12:48:39 -07:00
flow.h flow: remove spi key from flowi struct 2021-04-19 12:25:11 +02:00
fou.h
fq_impl.h net/fq_impl: do not maintain a backlog-sorted list of flows 2021-01-21 13:33:45 +01:00
fq.h net/fq_impl: do not maintain a backlog-sorted list of flows 2021-01-21 13:33:45 +01:00
garp.h
gen_stats.h
genetlink.h mptcp: avoid lock_fast usage in accept path 2021-02-12 16:31:46 -08:00
geneve.h
gre.h ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
gro_cells.h
gro.h gro: add combined call_gro_receive() + INDIRECT_CALL_INET() helper 2021-03-18 19:51:12 -07:00
gtp.h
gue.h
hwbm.h
icmp.h ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
ieee80211_radiotap.h mac80211: add radiotap flag to assure frames are not reordered 2020-11-06 11:01:01 +01:00
ieee802154_netdev.h
if_inet6.h mld: add mc_lock for protecting per-interface mld data 2021-03-26 15:14:56 -07:00
ife.h
ila.h
inet6_connection_sock.h
inet6_hashtables.h
inet_common.h bpf: Allow rewriting to ports under ip_unprivileged_port_start 2021-01-27 18:18:15 -08:00
inet_connection_sock.h tcp: relookup sock for RST+ACK packets handled by obsolete req sock 2021-03-15 14:34:29 -07:00
inet_ecn.h inet_ecn: Use csum16_add() helper for IP_ECN_set_* helpers 2020-12-14 18:38:58 -08:00
inet_frag.h inet: frags: batch fqdir destroy works 2020-12-12 15:08:54 -08:00
inet_hashtables.h tcp: fix race condition when creating child sockets from syncookies 2020-11-23 16:32:33 -08:00
inet_sock.h inet: remove inet_sk_copy_descendant() 2020-08-26 07:33:19 -07:00
inet_timewait_sock.h
inetpeer.h
ip6_checksum.h
ip6_fib.h IPv6: Add "offload failed" indication to routes 2021-02-08 16:47:03 -08:00
ip6_route.h net: allow user to set metric on default route learned via Router Advertisement 2021-01-26 18:39:45 -08:00
ip6_tunnel.h
ip_fib.h ipv4: Add a sysctl to control multipath hash fields 2021-05-18 13:27:32 -07:00
ip_tunnels.h Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-11-19 19:08:46 -08:00
ip_vs.h netfilter: move handlers to net/ip_vs.h 2021-02-04 18:37:57 -08:00
ip.h inet: constify inet_sdif() argument 2020-11-10 17:56:54 -08:00
ipcomp.h
ipconfig.h
ipv6_frag.h ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module 2020-11-19 10:49:50 -08:00
ipv6_stubs.h ipv6: add ipv6_dev_find to stubs 2021-03-30 13:29:39 -07:00
ipv6.h ipv6: Add a sysctl to control multipath hash fields 2021-05-18 13:27:32 -07:00
ipx.h
iw_handler.h
kcm.h
l3mdev.h
lag.h
lapb.h net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
lwtunnel.h
mac80211.h mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154.h
macsec.h
mip6.h
mld.h mld: add new workqueues for process mld events 2021-03-26 15:14:56 -07:00
mpls_iptunnel.h
mpls.h
mptcp.h mptcp: add allow_join_id0 in mptcp_out_options 2021-06-22 14:36:01 -07:00
mrp.h
ncsi.h
ndisc.h ipv6: ndisc: adjust ndisc_ifinfo_sysctl_change prototype 2020-08-24 06:40:07 -07:00
neighbour.h net: Exempt multicast addresses from five-second neighbor lifetime 2020-11-13 14:24:39 -08:00
net_failover.h
net_namespace.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
net_ratelimit.h
netevent.h
netlabel.h
netlink.h treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
netprio_cgroup.h
netrom.h
nexthop.h nexthop: Rename artifacts related to legacy multipath nexthop groups 2021-03-28 17:53:39 -07:00
nl802154.h
nsh.h
p8022.h
page_pool.h page_pool: Allow drivers to hint on SKB recycling 2021-06-07 14:11:47 -07:00
pie.h
ping.h
pkt_cls.h net: zero-initialize tc skb extension on allocation 2021-05-25 15:36:42 -07:00
pkt_sched.h net: sched: fix tx action rescheduling issue during deactivation 2021-05-14 15:05:46 -07:00
pptp.h
protocol.h net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
psample.h psample: Add additional metadata attributes 2021-03-14 15:00:43 -07:00
psnap.h
raw.h
rawv6.h
red.h sch_red: fix off-by-one checks in red_check_params() 2021-03-25 17:40:43 -07:00
regulatory.h
request_sock.h tcp: bpf: Optionally store mac header in TCP_SAVE_SYN 2020-08-24 14:35:00 -07:00
rose.h
route.h lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
rpl.h
rsi_91x.h
rtnetlink.h rtnetlink: add alloc() method to rtnl_link_ops 2021-06-12 13:16:45 -07:00
rtnh.h
sch_generic.h net: sched: remove qdisc->empty for lockless qdisc 2021-06-23 12:17:35 -07:00
scm.h
secure_seq.h
seg6_hmac.h
seg6_local.h
seg6.h
selftests.h net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
slhc_vj.h
smc.h net/smc: introduce CHID callback for ISM devices 2020-09-28 15:19:03 -07:00
snmp.h
sock_reuseport.h tcp: Add reuseport_migrate_sock() to select a new listener. 2021-06-15 18:01:05 +02:00
sock.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
Space.h
stp.h
strparser.h
switchdev.h net: switchdev: add a context void pointer to struct switchdev_notifier_info 2021-06-28 14:09:03 -07:00
tcp_states.h
tcp.h tcp: export timestamp helpers for mptcp 2021-06-04 14:08:09 -07:00
timewait_sock.h
tipc.h
tls_toe.h
tls.h net/tls: Remove the __TLS_DEC_STATS() macro. 2021-06-23 13:47:55 -07:00
transp_v6.h
tso.h
tun_proto.h
udp_tunnel.h udp: call udp_encap_enable for v6 sockets when enabling encap 2021-02-04 18:37:14 -08:00
udp.h skmsg: Pass psock pointer to ->psock_update_sk_prot() 2021-04-12 17:34:27 +02:00
udplite.h
vsock_addr.h
vxlan.h net: sched: only keep the available bits when setting vxlan md->gbp 2020-09-14 16:49:39 -07:00
wext.h
x25.h
x25device.h
xdp_priv.h
xdp_sock_drv.h xsk: Introduce batched Tx descriptor interfaces 2020-11-17 22:07:40 +01:00
xdp_sock.h xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
xdp.h xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00
xfrm.h Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git 2021-06-28 13:17:16 -07:00
xsk_buff_pool.h xsk: Fix race in SKB mode transmit with shared cq 2020-12-18 16:10:21 +01:00