[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7bf3d0e4-1c76-4170-a417-51c9286385a9@intel.com>
Date: Thu, 10 Apr 2025 16:04:14 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Leon Romanovsky <leon@...nel.org>
CC: Larysa Zaremba <larysa.zaremba@...el.com>,
<intel-wired-lan@...ts.osuosl.org>, Tony Nguyen <anthony.l.nguyen@...el.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, "Simon
Horman" <horms@...nel.org>, Jonathan Corbet <corbet@....net>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>, Jiri Pirko <jiri@...nulli.us>, "Tatyana
Nikolova" <tatyana.e.nikolova@...el.com>, Andrew Lunn
<andrew+netdev@...n.ch>, Michael Ellerman <mpe@...erman.id.au>, Maciej
Fijalkowski <maciej.fijalkowski@...el.com>, Lee Trager <lee@...ger.us>,
"Madhavan Srinivasan" <maddy@...ux.ibm.com>, Sridhar Samudrala
<sridhar.samudrala@...el.com>, Jacob Keller <jacob.e.keller@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>, Mateusz Polchlopek
<mateusz.polchlopek@...el.com>, Ahmed Zaki <ahmed.zaki@...el.com>,
<netdev@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, "Karlsson, Magnus"
<magnus.karlsson@...el.com>, Emil Tantilov <emil.s.tantilov@...el.com>,
"Madhu Chittim" <madhu.chittim@...el.com>, Josh Hay <joshua.a.hay@...el.com>,
"Milena Olech" <milena.olech@...el.com>, <pavan.kumar.linga@...el.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>, Phani R Burra
<phani.r.burra@...el.com>
Subject: Re: [PATCH iwl-next 05/14] libeth: add control queue support
From: Leon Romanovsky <leon@...nel.org>
Date: Thu, 10 Apr 2025 16:58:43 +0300
> On Thu, Apr 10, 2025 at 03:33:40PM +0200, Alexander Lobakin wrote:
>> From: Leon Romanovsky <leon@...nel.org>
>> Date: Thu, 10 Apr 2025 16:27:06 +0300
>>
>>> On Thu, Apr 10, 2025 at 02:58:28PM +0200, Larysa Zaremba wrote:
>>>> On Thu, Apr 10, 2025 at 02:23:49PM +0300, Leon Romanovsky wrote:
>>>>> On Thu, Apr 10, 2025 at 12:44:33PM +0200, Larysa Zaremba wrote:
>>>>>> On Thu, Apr 10, 2025 at 11:21:37AM +0300, Leon Romanovsky wrote:
>>>>>>> On Tue, Apr 08, 2025 at 02:47:51PM +0200, Larysa Zaremba wrote:
>>>>>>>> From: Phani R Burra <phani.r.burra@...el.com>
>>>>>>>>
>>>>>>>> Libeth will now support control queue setup and configuration APIs.
>>>>>>>> These are mainly used for mailbox communication between drivers and
>>>>>>>> control plane.
>>>>>>>>
>>>>>>>> Make use of the page pool support for managing controlq buffers.
>>>>>>>
>>>>>>> <...>
>>>>>>>
>>>>>>>> libeth-y := rx.o
>>>>>>>>
>>>>>>>> +obj-$(CONFIG_LIBETH_CP) += libeth_cp.o
>>>>>>>> +
>>>>>>>> +libeth_cp-y := controlq.o
>>>>>>>
>>>>>>> So why did you create separate module for it?
>>>>>>> Now you have pci -> libeth -> libeth_cp -> ixd, with the potential races between ixd and libeth, am I right?
>>>>>>>
>>>>>>
>>>>>> I am not sure what kind of races do you mean, all libeth modules themselves are
>>>>>> stateless and will stay this way [0], all used data is owned by drivers.
>>>>>
>>>>> Somehow such separation doesn't truly work. There are multiple syzkaller
>>>>> reports per-cycle where module A tries to access module C, which already
>>>>> doesn't exist because it was proxied through module B.
>>>>
>>>> Are there similar reports for libeth and libie modules when iavf is enabled?
>>>
>>> To get such report, syzkaller should run on physical iavf, it looks like it doesn't.
>>> Did I miss it here?
>>> https://syzkaller.appspot.com/upstream/s/net
>>>
>>>> It is basically the same hierarchy. (iavf uses both libeth and libie, libie
>>>> depends on libeth).
>>>>
>>>> I am just trying to understand, is this a regular situation or did I just mess
>>>> smth up?
>>>
>>> My review comment was general one. It is almost impossible to review
>>> this newly proposed architecture split for correctness.
>>>
>>>>
>>>>>
>>>>>>
>>>>>> As for the module separation, I think there is no harm in keeping it modular.
>>>>>
>>>>> Syzkaller reports disagree with you.
>>>>>
>>>>
>>>> Could you please share them?
>>>
>>> It is not an easy question to answer, because all these reports are complaining
>>> about some wrong locking order or NULL-pointer access. You will never know if
>>> it is because of programming or design error.
>>>
>>> As an approximate example, see commits a27c6f46dcec ("RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifier")
>>> and f0df225d12fc ("RDMA/bnxt_re: Add sanity checks on rdev validity").
>>> At the first glance, they look unrelated to our discussion, however
>>> they can serve as an example or races between deinit/disable paths in
>>> parent module vs. child.
>>
>> Unrelated. At first, you were talking about module dependencies, now
>> you're talking about struct device etc dependencies, which is a
>> completely different thing.
>>
>> As already said, libeth is stateless, so the latter one can't happen.
>> The former one is impossible at all. As long as at least 1 child module
>> is loaded, you can't unload the parent. And load/unload is serialized,
>> see module core code.
>
> It is not only module load/unload. It is bind/unbind, devlink operations
> e.t.c, everything that can cause to reload driver in module C.
You still mix module deps vs struct device/driver deps.
Atomic modules are not anyhow tied to the latter, esp. given that the
whole libeth is stateless and is just a bunch of helpers.
Thanks,
Olek
Powered by blists - more mailing lists