Message-ID: <20251021020533.1234755-1-lizhi.xu@windriver.com>
Date: Tue, 21 Oct 2025 10:05:33 +0800
From: Lizhi Xu <lizhi.xu@...driver.com>
To: <dan.carpenter@...aro.org>
CC: <lizhi.xu@...driver.com>, <davem@...emloft.net>, <edumazet@...gle.com>,
<horms@...nel.org>, <kuba@...nel.org>, <linux-hams@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>,
<syzbot+2860e75836a08b172755@...kaller.appspotmail.com>,
<syzkaller-bugs@...glegroups.com>
Subject: Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > Task0 Task1 Task2
> > > > ===== ===== =====
> > > > [97] nr_add_node()
> > > > [113] nr_neigh_get_dev() [97] nr_add_node()
> > > > [214] nr_node_lock()
> > > > [245] nr_node->routes[2].neighbour->count--
> > > > [246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > [248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > [283] nr_node_unlock()
> > > > [214] nr_node_lock()
> > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > [254] nr_neigh_hold(nr_neigh); [97] nr_add_node()
> > > > [XXX] nr_neigh_put()
> > > > ^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > use after free happens on line [248]. Do we really need three threads to
> > > > make this race work?
> > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > executing [XXX]nr_neigh_put().
> > Execution Order:
> > 1 -> Task0
> > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> >
> > 2 -> Task1
> > [246] nr_neigh_put(nr_node->routes[2].neighbour); // After execution, the refcount value is 2
> > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> >
> > 3 -> Task0
> > [253] nr_node->routes[2].neighbour = nr_neigh // nr_neigh's refcount value is 1 and add it to routes[2]
> >
> > 4 -> Task2
> > [XXX] nr_neigh_put(nr_node->routes[2].neighbour) // After execution, the neighbour is freed
> > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked) // UAF occurs on this line when accessing neighbour->count
>
> Let's step back a bit and look at the bigger picture design. (Which is
> completely undocumented so we're just guessing).
>
> When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> reference count and nr_neigh->count++, then when we remove it from
> ->routes[] we drop the reference and do nr_neigh->count--.
>
> If it's the last reference (and we are not holding ->locked) then we
> remove it from the &nr_neigh_list and drop the reference count again and
> free it. So we drop the reference count twice. This is a complicated
> design with three variables: nr_neigh_hold(), nr_neigh->count and
> ->locked. Why can it not just be one counter nr_neigh_hold(). So
> instead of setting locked = true we would just take an extra reference?
> The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
The three fields serve different purposes:
locked controls whether the neighbor quality can be updated automatically;
count is the number of routes that currently reference the neighbor;
refcount only manages the neighbor's lifetime.
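
To make the bookkeeping concrete, here is a minimal userspace sketch of how
the three fields interact when a neighbor is put into and taken out of a
route slot. It is only a model under assumptions: struct model_neigh, the
model_*() helpers, the on_list flag and the freed flag (standing in for
kfree()) are all hypothetical names and are not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct nr_neigh. */
struct model_neigh {
    int refcount;   /* lifetime: object is "freed" when this reaches 0 */
    int count;      /* number of route slots pointing at this neighbour */
    bool locked;    /* pinned: never removed automatically */
    bool on_list;   /* still linked on the (modelled) neighbour list */
    bool freed;     /* stands in for kfree() so the demo stays inspectable */
};

static void model_hold(struct model_neigh *n)
{
    assert(!n->freed);
    n->refcount++;
}

static void model_put(struct model_neigh *n)
{
    assert(!n->freed && n->refcount > 0);
    if (--n->refcount == 0)
        n->freed = true;
}

/* Put the neighbour into a route slot: one reference plus count++. */
static void model_route_attach(struct model_neigh *n)
{
    model_hold(n);
    n->count++;
}

/* Take it out of a route slot. This may drop TWO references. */
static void model_route_detach(struct model_neigh *n)
{
    n->count--;
    model_put(n);                   /* the slot's reference */
    /* The freed flag guards the read in this model; in real code, reading
     * count after a final put would be a use-after-free. */
    if (!n->freed && n->count == 0 && !n->locked && n->on_list) {
        n->on_list = false;
        model_put(n);               /* the list's reference */
    }
}

/* Returns the refcount left after one attach/detach cycle while the
 * caller still holds its own lookup reference. */
int model_demo(bool locked)
{
    struct model_neigh n = { .refcount = 1, .count = 0, .locked = locked,
                             .on_list = true, .freed = false };
    model_hold(&n);          /* caller's lookup reference: refcount 2 */
    model_route_attach(&n);  /* slot reference: refcount 3, count 1 */
    model_route_detach(&n);  /* drops one or two references */
    return n.refcount;
}
```

In this model an unlocked neighbor loses both the slot reference and the
list reference in a single detach, so only the caller's lookup reference
keeps it alive; a locked neighbor loses just the slot reference.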
>
> Because that's fundamentally the problem, right? We call
> nr_neigh_get_dev() so we think we're holding a reference and we're
> safe, but we don't realize that calling neighbour->count-- can
> result in dropping two references.
Once nr_neigh_get_dev() has returned a neighbor, no other nr_add_node()
call should still be partway through modifying that neighbor's route entry.
Therefore, we need to take a lock before calling nr_neigh_get_dev(), so
that the lookup and the insertion into the routing table happen atomically.
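
As an illustration of that idea only, the following userspace sketch holds
one mutex across the lookup, the eviction of the old entry and the
installation of the new one. Everything in it is an assumption made for the
example: the sk_* names, the single route slot and the pthread mutex are
hypothetical and do not show which lock the actual patch takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model types; the real code uses struct nr_neigh / nr_node. */
struct sk_neigh { int refcount; int count; };

/* Two neighbours; the modelled list owns one reference to each. */
static struct sk_neigh sk_list[2] = { { 1, 0 }, { 1, 0 } };
static struct sk_neigh *sk_slot;          /* models one routes[] entry */
static pthread_mutex_t sk_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup that returns a held reference. */
static struct sk_neigh *sk_get(int id)
{
    sk_list[id].refcount++;
    return &sk_list[id];
}

static void sk_put(struct sk_neigh *n)
{
    assert(n->refcount > 0);
    n->refcount--;
}

/* Lookup, eviction and install all happen under one lock. */
static void sk_add_route(int id)
{
    pthread_mutex_lock(&sk_lock);
    struct sk_neigh *n = sk_get(id);
    if (sk_slot) {              /* evict the current occupant */
        sk_slot->count--;
        sk_put(sk_slot);
    }
    sk_slot = n;                /* install the new neighbour */
    n->count++;
    n->refcount++;              /* the slot's own reference */
    sk_put(n);                  /* drop the lookup reference */
    pthread_mutex_unlock(&sk_lock);
}

static void *sk_worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 10000; i++)
        sk_add_route(id);
    return NULL;
}

/* Returns 1 if the bookkeeping is consistent after two racing writers. */
int sk_run(void)
{
    pthread_t t[2];
    int ids[2] = { 0, 1 };
    int started = 0;

    for (; started < 2; started++)
        if (pthread_create(&t[started], NULL, sk_worker, &ids[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started != 2)
        return 0;

    /* Exactly one neighbour occupies the slot, and every neighbour holds
     * one list reference plus one reference per slot it occupies. */
    return sk_list[0].count + sk_list[1].count == 1 &&
           sk_list[0].refcount == 1 + sk_list[0].count &&
           sk_list[1].refcount == 1 + sk_list[1].count;
}
```

Because the lock is taken before the lookup, a second writer cannot evict or
release the neighbor between the lookup and the install, which is the window
described in the execution order above.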
BR,
Lizhi