[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <160839008460.3141.80891004725293385@kwain.local>
Date: Sat, 19 Dec 2020 16:01:24 +0100
From: Antoine Tenart <atenart@...nel.org>
To: Alexander Duyck <alexander.duyck@...il.com>,
Jakub Kicinski <kuba@...nel.org>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net 1/4] net-sysfs: take the rtnl lock when storing xps_cpus
Hi Jakub, Alexander,
Quoting Alexander Duyck (2020-12-19 02:41:08)
> On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > Two things: (a) is the datapath not exposed to a similar problem?
> > __get_xps_queue_idx() uses dev->tc_num in a very similar fashion.
>
> I think we are shielded from this by the fact that if you change the
> number of tc the Tx path has to be torn down and rebuilt since you are
> normally changing the qdisc configuration anyway.
That's right. But there's nothing preventing users to call functions
using the xps maps in between. There are a few functions being exposed.
One (similar) example of that is another bug I reproduced, were the old
and the new map in __netif_set_xps_queue do not have the same size,
because num_tc was updated in between two calls to this function. The
root cause is the same: the size of the map is not embedded in it and
whenever we access it we can make an out of bound access.
> > Should we perhaps make the "num_tcs" part of the XPS maps which is
> > under RCU protection rather than accessing the netdev copy?
Yes, I have a local patch (untested, still WIP) doing exactly that. The
idea is we can't make sure a num_tc update will trigger an xps
reallocation / reconfiguration of the map; but at least we can make sure
the map won't be accessed out of bounds.
It's a different issue though: not being able to access a map out of
bound once it has been allocated whereas this patch wants to prevent an
update of num_tc while the xps map allocation/setup is in progress.
> So it looks like the issue is the fact that we really need to
> synchronize netdev_reset_tc, netdev_set_tc_queue, and
> netdev_set_num_tc with __netif_set_xps_queue.
>
> > (b) if we always take rtnl_lock, why have xps_map_mutex? Can we
> > rearrange things so that xps_map_mutex is sufficient?
>
> It seems like the quick and dirty way would be to look at updating the
> 3 functions I called out so that they were holding the xps_map_mutex
> while they were updating things, and for __netif_set_xps_queue to
> expand out the mutex to include the code starting at "if (dev->num_tc)
> {".
That should do the trick. The only downside is xps_map_mutex is only
defined with CONFIG_XPS while netdev_set_num_tc is not, adding more
ifdef to it. But that's probably a better compromise than taking the
rtnl lock.
Thanks for the review and suggestions!
Antoine
Powered by blists - more mailing lists