Message-ID: <896b88ce-f86c-4f00-8404-cedc6a202729@redhat.com>
Date: Fri, 6 Sep 2024 16:49:32 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>,
Madhu Chittim <madhu.chittim@...el.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
Simon Horman <horms@...nel.org>, John Fastabend <john.fastabend@...il.com>,
Sunil Kovvuri Goutham <sgoutham@...vell.com>,
Jamal Hadi Salim <jhs@...atatu.com>, Donald Hunter
<donald.hunter@...il.com>, anthony.l.nguyen@...el.com,
przemyslaw.kitszel@...el.com, intel-wired-lan@...ts.osuosl.org,
edumazet@...gle.com
Subject: Re: [PATCH v6 net-next 07/15] net-shapers: implement shaper cleanup
on queue deletion
On 9/5/24 20:02, Paolo Abeni wrote:
> On 9/5/24 03:33, Jakub Kicinski wrote:
>> On Wed, 4 Sep 2024 15:53:39 +0200 Paolo Abeni wrote:
>>> + net_shaper_set_real_num_tx_queues(dev, txq);
>>> +
>>> dev_qdisc_change_real_num_tx(dev, txq);
>>>
>>> dev->real_num_tx_queues = txq;
>>
>> The dev->lock has to be taken here, around those three lines,
>> and then set / group must check QUEUE ids against
>> dev->real_num_tx_queues, no? Otherwise the work
>> net_shaper_set_real_num_tx_queues() does is prone to races?
>
> Yes, I think such a race exists, but I'm not sure that taking the lock
> around the above code will be enough.
>
> e.g. if the relevant device has 16 channel queues, the set() races
> with a channel reconfiguration on a different CPU:
>
> CPU 1 CPU 2
>
> set_channels(8)
>
> driver_set_channel()
> // actually change the number of queues to
> // 8, dev->real_num_tx_queues is still 16
> // dev->lock is not held yet because the
> // driver still has to call
> // netif_set_real_num_tx_queues()
> set(QUEUE_15,...)
> // will pass validation
> // but queue 15 does not
> // exist anymore
>
> Acquiring dev->lock around set_channels() will not be enough: some
> drivers change the channel count on their own, e.g. when enabling XDP.
>
> I think/fear we need to replace the dev->lock with the rtnl lock to
> solve the race for good.
I forgot to mention that there is another, easier alternative: keep the
max queue id check in the drivers. The driver will have to acquire and
hold the relevant driver-specific lock in the shaper callbacks -
'crit_lock', in the iavf case.

Would you be ok with such a 2nd option?
Side note: I think iavf would have to acquire such a lock in the
callbacks no matter what, or accesses/writes to the ring info could be
racy.
Thanks,
Paolo