twx-linux/net
Vladimir Oltean 8ca07176ab net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
Currently, DSA has an issue with FDB entries pointing towards the bridge,
which shows up because br_fdb_replay() is called at port join and leave
time.

In particular, each bridge port will ask for a replay for the FDB
entries pointing towards the bridge when it joins, and for another
replay when it leaves.

This means that, for example, a bridge with 4 switch ports will notify
DSA 4 times of the bridge MAC address.

But if the MAC address of the bridge changes during the normal runtime
of the system, the bridge notifies switchdev [ once ] of the deletion of
the old MAC address as a local FDB towards the bridge, and of the
insertion [ again once ] of the new MAC address as a local FDB.

This is a problem, because DSA keeps the old MAC address as a host FDB
entry with refcount 4 (4 ports asked for it using br_fdb_replay()). A
single deletion only drops that refcount to 3, so the old MAC address
will not be deleted. Additionally, the new MAC address will only be
installed with refcount 1, and when the first switch port leaves the
bridge (leaving 3 others still members), the br_fdb_replay() call on
deletion will bring the entry's refcount from 1 to 0 and remove the new
MAC address of the bridge from the local FDB entries kept by DSA.

So the problem, really, is that the number of br_fdb_replay() calls
does not match the refcount with which a host FDB entry is offloaded to
DSA during normal runtime.
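
The mismatch can be reproduced with a small user-space model. This is
only a sketch: host_fdb_table, host_fdb_add(), host_fdb_del() and
host_fdb_refcount() are hypothetical names invented for illustration,
not DSA's data structures or functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted host FDB table (not DSA code). */
#define HOST_FDB_MAX 8

struct host_fdb_entry {
	unsigned char addr[6];
	int refcount;		/* 0 means the slot is unused */
};

struct host_fdb_table {
	struct host_fdb_entry entries[HOST_FDB_MAX];
};

/* Return the live entry for addr, or NULL if it is not installed. */
static struct host_fdb_entry *host_fdb_find(struct host_fdb_table *t,
					    const unsigned char *addr)
{
	for (size_t i = 0; i < HOST_FDB_MAX; i++)
		if (t->entries[i].refcount > 0 &&
		    !memcmp(t->entries[i].addr, addr, 6))
			return &t->entries[i];
	return NULL;
}

/* Take one reference; returns the new refcount, or -1 if the table is full. */
int host_fdb_add(struct host_fdb_table *t, const unsigned char *addr)
{
	struct host_fdb_entry *e = host_fdb_find(t, addr);

	if (e)
		return ++e->refcount;

	for (size_t i = 0; i < HOST_FDB_MAX; i++) {
		if (t->entries[i].refcount == 0) {
			memcpy(t->entries[i].addr, addr, 6);
			t->entries[i].refcount = 1;
			return 1;
		}
	}
	return -1;
}

/* Drop one reference; returns the remaining refcount, or -1 if not found. */
int host_fdb_del(struct host_fdb_table *t, const unsigned char *addr)
{
	struct host_fdb_entry *e = host_fdb_find(t, addr);

	return e ? --e->refcount : -1;
}

/* Current refcount of addr; 0 if the address is not installed. */
int host_fdb_refcount(struct host_fdb_table *t, const unsigned char *addr)
{
	struct host_fdb_entry *e = host_fdb_find(t, addr);

	return e ? e->refcount : 0;
}
```

In this model, four host_fdb_add() calls for the old MAC address (one
per port replay) followed by a single host_fdb_del() leave the entry at
refcount 3, while the new MAC address is added once and disappears on
the first host_fdb_del(), even though three ports are still members.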

An elegant way to solve the problem would be to make the switchdev
notification emitted by br_fdb_change_mac_address() result in a host FDB
kept by DSA which has a refcount exactly equal to the number of ports
under that bridge. Then, no matter how many DSA ports join or leave that
bridge, the host FDB entry will always be deleted when exactly zero DSA
switch ports remain members of the bridge.

To implement the proposed solution, we remember that the switchdev
objects and port attributes have some helpers provided by switchdev,
which can be optionally called by drivers:
switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set.
These helpers:
- fan out a switchdev object/attribute emitted for the bridge towards
  all the lower interfaces that pass the check_cb().
- fan out a switchdev object/attribute emitted for a bridge port that is
  a LAG towards all the lower interfaces that pass the check_cb().

In other words, this is the model we need for the FDB events too:
something that will keep an FDB entry emitted towards a physical port as
it is, but translate an FDB entry emitted towards the bridge into N FDB
entries, one per physical port.

Of course, there are many differences between fanning out a switchdev
object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB
entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards
a LAG should be treated specially, because FDB entries are unicast: we
can't just install the same address towards 3 destinations. It is
conceivable that drivers might want to treat this case specifically, so
create separate methods for this case and do not recurse into the LAG
lower ports, just the bridge ports.
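
The fanout model described above can be sketched in user space as
follows. This is a simplified illustration under stated assumptions:
struct dev, enum dev_kind, fdb_cb, fdb_count_event() and
fdb_event_fanout() are hypothetical names, not the helper introduced by
this patch, and foreign interfaces are left out of the model entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified device model (not struct net_device). */
enum dev_kind { DEV_PORT, DEV_LAG, DEV_BRIDGE };

struct dev {
	enum dev_kind kind;
	struct dev *lowers[8];	/* bridge ports of a bridge, or LAG members */
	size_t n_lowers;
	int fdb_calls;		/* FDB events delivered to this device */
};

typedef int (*fdb_cb)(struct dev *dev);

/* Example callback: record that the device received one FDB event. */
int fdb_count_event(struct dev *dev)
{
	dev->fdb_calls++;
	return 0;
}

/*
 * Deliver one FDB event:
 * - towards a physical port: pass it through unchanged;
 * - towards a LAG: call the dedicated LAG method once and do NOT
 *   recurse into the LAG's lower ports;
 * - towards the bridge: translate it into one event per bridge port.
 */
int fdb_event_fanout(struct dev *dev, fdb_cb port_cb, fdb_cb lag_cb)
{
	int err;

	switch (dev->kind) {
	case DEV_PORT:
		return port_cb(dev);
	case DEV_LAG:
		return lag_cb ? lag_cb(dev) : 0;
	case DEV_BRIDGE:
		for (size_t i = 0; i < dev->n_lowers; i++) {
			err = fdb_event_fanout(dev->lowers[i], port_cb, lag_cb);
			if (err)
				return err;
		}
		return 0;
	}
	return 0;
}
```

With a bridge that has two physical ports and one LAG as bridge ports,
a single event towards the bridge reaches each of the three bridge
ports exactly once, and the LAG's own member ports see nothing.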

DSA also listens for FDB entries on "foreign" interfaces, aka interfaces
bridged with us which are not part of our hardware domain: think an
Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA
installs host FDB entries. However, there we have the same problem
(those host FDB entries are installed with a refcount of only 1) and an
even bigger one which we did not have with FDB entries towards the
bridge:

br_fdb_replay() is currently not called for FDB entries on foreign
interfaces, just for the physical port and for the bridge itself.

So when DSA sniffs an address learned by the software bridge towards a
foreign interface like an e1000 port, and then that e1000 leaves the
bridge, DSA is left with a dangling host FDB entry. That will be
fixed separately by replaying all FDB entries and not just the ones
towards the port and the bridge.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 07:04:27 -07:00
6lowpan
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net/802/garp: fix memleak in garp_request_join() 2021-07-01 11:21:57 -07:00
8021q memcg: enable accounting for VLAN group array 2021-07-20 06:00:38 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bpf bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUN 2021-07-07 19:51:13 -07:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: vlan: add mcast snooping control 2021-07-20 05:41:20 -07:00
caif net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core memcg: enable accounting for scm_fp_list objects 2021-07-20 06:00:38 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp memcg: enable accounting for inet_bin_bucket cache 2021-07-20 06:00:38 -07:00
decnet decnet: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
dns_resolver
dsa net: switchdev: introduce helper for checking dynamically learned FDB entries 2021-07-20 07:04:27 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool net: sock: extend SO_TIMESTAMPING for PHC binding 2021-07-01 13:08:18 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 ieee802154: fix error return code in ieee802154_llsec_getparams() 2021-06-03 10:59:49 +02:00
ife
ipv4 Merge branch 'veth-flexible-channel-numbers' 2021-07-20 06:23:13 -07:00
ipv6 memcg: ipv6/sit: account and don't WARN on ip_tunnel_prl structs allocation 2021-07-20 06:00:38 -07:00
iucv s390: iucv: Avoid field over-reading memcpy() 2021-07-01 15:54:01 -07:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter netfilter: nft_last: incorrect arithmetics when restoring last used 2021-07-06 14:15:13 +02:00
netlabel netlabel: Fix memory leak in netlbl_mgmt_add_common 2021-06-15 11:19:04 -07:00
netlink netlink: Deal with ESRCH error in nlmsg_notify() 2021-07-20 11:45:09 +02:00
netrom net: netrom: Fix fall-through warnings for Clang 2021-05-17 19:57:08 -05:00
nfc TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
nsh
openvswitch openvswitch: Introduce per-cpu upcall dispatch 2021-07-16 11:06:33 -07:00
packet Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
phonet
psample
qrtr net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
rds Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
sched net/sched: Remove unnecessary if statement 2021-07-16 10:46:35 -07:00
sctp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
smc net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
switchdev net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE 2021-07-20 07:04:27 -07:00
tipc tipc: keep the skb in rcv queue until the whole data is read 2021-07-16 17:28:09 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-15 22:40:10 -07:00
vmw_vsock Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
wireless cfg80211: Support hidden AP discovery over 6GHz band 2021-06-23 13:05:09 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile
socket.c net: socket: support hardware timestamp conversion to PHC bound 2021-07-01 13:08:18 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00