Message-ID: <CACwEKLp42TwpK_3FEp85bq81eA1zg3777guNMonW9cm2i7aN2Q@mail.gmail.com>
Date: Mon, 22 Dec 2025 17:20:02 +0900
From: 小池悠生 <yuki.koike@...-cybersecurity.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: 戸田晃太 <kota.toda@...-cybersecurity.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH net] bonding: Fix header_ops type confusion
Hello, Eric and other maintainers,
I'm deeply sorry for having left this patch untouched for such a long time.
I became extremely busy, and that took a toll on my health, forcing me to
take sick leave for nearly half a year (my colleague Kota had been waiting
for me to come back). Fortunately I have now recovered and returned to work,
so I would like to move this matter forward again.
Recalling the issue Eric raised, my understanding is that it concerns
potential race conditions around the `bond_header_ops` and
`header_slave_dev` members that our patch adds to `struct bonding`. For
example, one could imagine `header_slave_dev` being rewritten to a device of
a different type while, at that exact moment, a function from the old
`bond_header_ops` is being called, or vice versa.
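To make the feared sequence concrete, here is a purely illustrative reader,
shaped like the `cache_update` wrapper in our patch quoted below (this is
hypothetical code, not something I am proposing to add): the worry would be
`bond_setup_by_slave()` rewriting both fields between the two loads.
```
/* Purely illustrative: not part of the patch. */
static void racy_cache_update(struct bonding *bond, struct hh_cache *hh,
			      const unsigned char *haddr)
{
	struct net_device *slave_dev = bond->header_slave_dev;	/* load #1 */

	/* If bond_setup_by_slave() could run here and switch the bond to a
	 * slave of a different ARPHRD type, the ops dereferenced below
	 * might no longer match the device loaded above.
	 */
	if (slave_dev->header_ops && slave_dev->header_ops->cache_update)
		slave_dev->header_ops->cache_update(hh, slave_dev, haddr); /* load #2 */
}
```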
However, I am skeptical that this can actually happen, because
`bond_setup_by_slave()` is only called when the bond has no slaves at all:
```
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
		 struct netlink_ext_ack *extack)
{
	...
	if (!bond_has_slaves(bond)) {
		...
		if (slave_dev->type != ARPHRD_ETHER)
			bond_setup_by_slave(bond_dev, slave_dev);
```
In other words, to trigger a race one would first need to remove the
existing slave so that the slave list becomes empty. However, as shown
below, when the slave list becomes empty, `bond_release_and_destroy()`
removes the bond interface itself, which appears to make it impossible to
"quickly remove a slave and re-register it":
```
static int bond_slave_netdev_event(unsigned long event,
				   struct net_device *slave_dev)
...
	switch (event) {
	case NETDEV_UNREGISTER:
		if (bond_dev->type != ARPHRD_ETHER)
			bond_release_and_destroy(bond_dev, slave_dev);
...
	}
...

/* First release a slave and then destroy the bond if no more slaves are left.
 * Must be under rtnl_lock when this function is called.
 */
static int bond_release_and_destroy(struct net_device *bond_dev,
				    struct net_device *slave_dev)
{
	struct bonding *bond = netdev_priv(bond_dev);
	int ret;

	ret = __bond_release_one(bond_dev, slave_dev, false, true);
	if (ret == 0 && !bond_has_slaves(bond) &&
	    bond_dev->reg_state != NETREG_UNREGISTERING) {
		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
		netdev_info(bond_dev, "Destroying bond\n");
		bond_remove_proc_entry(bond);
		unregister_netdevice(bond_dev);
	}
	return ret;
}
```
Moreover, as noted in the comment above, these functions are executed under
the rtnl (netlink-side) lock. My conclusion is therefore that a race
condition cannot actually occur here. The fact that, even before our patch,
these code paths had almost no explicit locking also serves as
circumstantial evidence for this view. As Kota said, as far as I can tell
the past syzbot reports only show NULL pointer dereferences caused by the
root cause we reported, and this patch should fix them.
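As a minimal illustration of the serialization I am relying on (this is only
a sketch of the invariant, not a claim about what the current code literally
contains), one could make the assumption explicit with the kernel's standard
RTNL assertion:
```
static void bond_setup_by_slave(struct net_device *bond_dev,
				struct net_device *slave_dev)
{
	/* The enslave and release paths are entered with rtnl_lock held,
	 * so this update cannot run concurrently with another enslave
	 * or release of the same bond.
	 */
	ASSERT_RTNL();
	...
}
```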
That said, I agree that the kind of countermeasures Eric suggests are worth
applying if they do not hurt execution speed or code size. My concern is
that addressing this with READ_ONCE() or RCU would imply a fairly large
amount of rewriting.
`header_ops` is designed so that various device types can be handled in an
object-oriented way, and as such it is used throughout many parts of the
Linux kernel. Using READ_ONCE() or RCU at every header_ops access simply
because we are worried about a race in bonding's header_ops would seem to
imply changes like the following, for example:
```
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 92dc1f1788de..d9aad38746ad 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1538,7 +1538,7 @@ static void neigh_hh_init(struct neighbour *n)
 	 * hh_cache entry.
 	 */
 	if (!hh->hh_len)
-		dev->header_ops->cache(n, hh, prot);
+		READ_ONCE(dev->header_ops->cache)(n, hh, prot);
 	write_unlock_bh(&n->lock);
 }
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3131,35 +3131,41 @@ static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 				  unsigned short type,
 				  const void *daddr, const void *saddr,
 				  unsigned int len)
 {
-	if (!dev->header_ops || !dev->header_ops->create)
+	int (*create)(struct sk_buff *skb, struct net_device *dev,
+		      unsigned short type, const void *daddr,
+		      const void *saddr, unsigned int len);
+	if (!dev->header_ops || !(create = READ_ONCE(dev->header_ops->create)))
 		return 0;
-	return dev->header_ops->create(skb, dev, type, daddr, saddr, len);
+	return create(skb, dev, type, daddr, saddr, len);
 }
 static inline int dev_parse_header(const struct sk_buff *skb,
 				   unsigned char *haddr)
 {
+	int (*parse)(const struct sk_buff *skb, unsigned char *haddr);
 	const struct net_device *dev = skb->dev;
-	if (!dev->header_ops || !dev->header_ops->parse)
+	if (!dev->header_ops || !(parse = READ_ONCE(dev->header_ops->parse)))
 		return 0;
-	return dev->header_ops->parse(skb, haddr);
+	return parse(skb, haddr);
 }
... (and so on)
```
It looks like we would end up rewriting on the order of a dozen places with
this kind of pattern. From the maintainers' perspective (or in terms of
Linux kernel culture), would that be considered an acceptable change?
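Alternatively, if RCU is the preferred direction, my rough understanding is
that the reader side would end up looking something like the sketch below
(only a sketch, under the assumption that dev->header_ops becomes
__rcu-annotated and writers switch to rcu_assign_pointer(); I have not
tested or measured this):
```
static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
				  unsigned short type, const void *daddr,
				  const void *saddr, unsigned int len)
{
	const struct header_ops *ops;
	int ret = 0;

	/* Assumes dev->header_ops is __rcu-annotated and published with
	 * rcu_assign_pointer(); readers then take one consistent snapshot
	 * of the whole ops structure instead of per-member READ_ONCE().
	 */
	rcu_read_lock();
	ops = rcu_dereference(dev->header_ops);
	if (ops && ops->create)
		ret = ops->create(skb, dev, type, daddr, saddr, len);
	rcu_read_unlock();

	return ret;
}
```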
If this differs from what you intended, please correct me.
Best regards,
Yuki Koike
On Thu, May 29, 2025 at 0:10 Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, May 28, 2025 at 7:36 AM 戸田晃太 <kota.toda@...-cybersecurity.com> wrote:
> >
> > Thank you for your review.
> >
> > On Mon, May 26, 2025 at 17:23 Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Sun, May 25, 2025 at 10:08 PM 戸田晃太 <kota.toda@...-cybersecurity.com> wrote:
> > > >
> > > > In bond_setup_by_slave(), the slave’s header_ops are unconditionally
> > > > copied into the bonding device. As a result, the bonding device may invoke
> > > > the slave-specific header operations on itself, causing
> > > > netdev_priv(bond_dev) (a struct bonding) to be incorrectly interpreted
> > > > as the slave's private-data type.
> > > >
> > > > This type-confusion bug can lead to out-of-bounds writes into the skb,
> > > > resulting in memory corruption.
> > > >
> > > > This patch adds two members to struct bonding, bond_header_ops and
> > > > header_slave_dev, to avoid type-confusion while keeping track of the
> > > > slave's header_ops.
> > > >
> > > > Fixes: 1284cd3a2b740 (bonding: two small fixes for IPoIB support)
> > > > Signed-off-by: Kota Toda <kota.toda@...-cybersecurity.com>
> > > > Signed-off-by: Yuki Koike <yuki.koike@...-cybersecurity.com>
> > > > Co-Developed-by: Yuki Koike <yuki.koike@...-cybersecurity.com>
> > > > Reviewed-by: Paolo Abeni <pabeni@...hat.com>
> > > > Reported-by: Kota Toda <kota.toda@...-cybersecurity.com>
> > > > ---
> > > > drivers/net/bonding/bond_main.c | 61
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > include/net/bonding.h | 5 +++++
> > > > 2 files changed, 65 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > > > index 8ea183da8d53..690f3e0971d0 100644
> > > > --- a/drivers/net/bonding/bond_main.c
> > > > +++ b/drivers/net/bonding/bond_main.c
> > > > @@ -1619,14 +1619,65 @@ static void bond_compute_features(struct bonding *bond)
> > > > netdev_change_features(bond_dev);
> > > > }
> > > >
> > > > +static int bond_hard_header(struct sk_buff *skb, struct net_device *dev,
> > > > + unsigned short type, const void *daddr,
> > > > + const void *saddr, unsigned int len)
> > > > +{
> > > > + struct bonding *bond = netdev_priv(dev);
> > > > + struct net_device *slave_dev;
> > > > +
> > > > + slave_dev = bond->header_slave_dev;
> > > > +
> > > > + return dev_hard_header(skb, slave_dev, type, daddr, saddr, len);
> > > > +}
> > > > +
> > > > +static void bond_header_cache_update(struct hh_cache *hh, const struct net_device *dev,
> > > > + const unsigned char *haddr)
> > > > +{
> > > > + const struct bonding *bond = netdev_priv(dev);
> > > > + struct net_device *slave_dev;
> > > > +
> > > > + slave_dev = bond->header_slave_dev;
> > >
> > > I do not see any barrier ?
> > >
> > > > +
> > > > + if (!slave_dev->header_ops || !slave_dev->header_ops->cache_update)
> > > > + return;
> > > > +
> > > > + slave_dev->header_ops->cache_update(hh, slave_dev, haddr);
> > > > +}
> > > > +
> > > > static void bond_setup_by_slave(struct net_device *bond_dev,
> > > > struct net_device *slave_dev)
> > > > {
> > > > + struct bonding *bond = netdev_priv(bond_dev);
> > > > bool was_up = !!(bond_dev->flags & IFF_UP);
> > > >
> > > > dev_close(bond_dev);
> > > >
> > > > - bond_dev->header_ops = slave_dev->header_ops;
> > > > + /* Some functions are given dev as an argument
> > > > + * while others not. When dev is not given, we cannot
> > > > + * find out what is the slave device through struct bonding
> > > > + * (the private data of bond_dev). Therefore, we need a raw
> > > > + * header_ops variable instead of its pointer to const header_ops
> > > > + * and assign slave's functions directly.
> > > > + * For the other case, we set the wrapper functions that pass
> > > > + * slave_dev to the wrapped functions.
> > > > + */
> > > > + bond->bond_header_ops.create = bond_hard_header;
> > > > + bond->bond_header_ops.cache_update = bond_header_cache_update;
> > > > + if (slave_dev->header_ops) {
> > > > + bond->bond_header_ops.parse = slave_dev->header_ops->parse;
> > > > + bond->bond_header_ops.cache = slave_dev->header_ops->cache;
> > > > + bond->bond_header_ops.validate = slave_dev->header_ops->validate;
> > > > + bond->bond_header_ops.parse_protocol = slave_dev->header_ops->parse_protocol;
> > >
> > > All these updates probably need WRITE_ONCE(), and corresponding
> > > READ_ONCE() on reader sides, at a very minimum ...
> > >
> > > RCU would even be better later.
> > >
> > I believe that locking is not necessary in this patch. The update of
> > `header_ops` only happens when a slave is newly enslaved to a bond.
> > Under such circumstances, members of `header_ops` are not called in
> > parallel with updating. Therefore, there is no possibility of race
> > conditions occurring.
>
> bond_dev can certainly be live, and packets can flow.
>
> I have seen enough syzbot reports hinting at this precise issue.