Message-ID: <3e119daf-ff7c-d509-4409-9551ce3403fa@amd.com>
Date: Fri, 4 Jul 2025 16:15:35 +0530
From: Abhijit Gangurde <abhijit.gangurde@....com>
To: Leon Romanovsky <leon@...nel.org>
Cc: shannon.nelson@....com, brett.creeley@....com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, corbet@....net,
jgg@...pe.ca, andrew+netdev@...n.ch, allen.hubbe@....com,
nikhil.agarwal@....com, linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
Andrew Boyer <andrew.boyer@....com>
Subject: Re: [PATCH v3 09/14] RDMA/ionic: Create device queues to support
admin operations
On 7/3/25 14:11, Leon Romanovsky wrote:
> On Thu, Jul 03, 2025 at 12:29:45PM +0530, Abhijit Gangurde wrote:
>> On 7/1/25 15:54, Leon Romanovsky wrote:
>>> On Tue, Jun 24, 2025 at 05:43:10PM +0530, Abhijit Gangurde wrote:
>>>> Setup RDMA admin queues using device command exposed over
>>>> auxiliary device and manage these queues using ida.
>>>>
>>>> Co-developed-by: Andrew Boyer <andrew.boyer@....com>
>>>> Signed-off-by: Andrew Boyer <andrew.boyer@....com>
>>>> Co-developed-by: Allen Hubbe <allen.hubbe@....com>
>>>> Signed-off-by: Allen Hubbe <allen.hubbe@....com>
>>>> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@....com>
>>>> ---
>>>> v2->v3
>>>> - Fixed lockdep warning
>>>> - Used IDA for resource id allocation
>>>> - Removed rw locks around xarrays
> <...>
>
>>>> + list_for_each_entry_safe(wr, wr_next, &aq->wr_prod, aq_ent) {
>>>> + INIT_LIST_HEAD(&wr->aq_ent);
>>>> + aq->q_wr[wr->status].wr = NULL;
>>>> + wr->status = aq->admin_state;
>>>> + complete_all(&wr->work);
>>>> + }
>>>> + INIT_LIST_HEAD(&aq->wr_prod);
>>> <...>
>>>
>>>> + if (do_reset)
>>>> + /* Reset device on a timeout */
>>>> + ionic_admin_timedout(bad_aq);
>>> I wonder why RDMA driver resets device and not the one who owns PCI.
>> RDMA driver is requesting the reset via eth driver which holds the
>> privilege.
> I wonder if the one who owns CMD interface should decide and reset device
> and not the clients.
To be precise, this operation resets the RDMA logical interface built on
top of the base device, and does not affect the PCI device or the
Ethernet driver's interface. Apologies for the lack of clarity in the
previous comment. I will update the comment to reflect this accurately
in the next version.
> <...>
>
>>>> + old_state = atomic_cmpxchg(&dev->admin_state, IONIC_ADMIN_ACTIVE,
>>>> + IONIC_ADMIN_PAUSED);
>>>> + if (old_state != IONIC_ADMIN_ACTIVE)
>>> In all these places you are mixing enum_admin_state and atomic_t for the
>>> same values, but in different variables. Please choose either atomic_t or enum.
>> admin_state within the admin queues is protected by the spinlock,
>> hence it is used as enum_admin_state. However, the device's admin_state
>> is used as an atomic to avoid a race during reset.
> The issue is in mixing types.
I will correct this.
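
For illustration only, here is a rough userspace sketch of one way the
two could be kept consistent: the storage stays atomic, but every access
goes through a helper that takes and returns the enum. It uses C11
<stdatomic.h> rather than the kernel's atomic_t, and the enum tag, the
struct, and both helper names are hypothetical; only IONIC_ADMIN_ACTIVE
and IONIC_ADMIN_PAUSED come from the patch. This is not necessarily what
the next version will look like.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical enum tag; only these two values appear in the quoted hunk. */
enum demo_admin_state {
	IONIC_ADMIN_ACTIVE,
	IONIC_ADMIN_PAUSED,
};

/* Hypothetical stand-in for the device structure. */
struct demo_dev {
	atomic_int admin_state;	/* always holds an enum demo_admin_state */
};

/*
 * Typed compare-and-exchange: returns the value that was stored before
 * the call, mirroring the return convention of the kernel's
 * atomic_cmpxchg().
 */
static enum demo_admin_state
demo_admin_state_cmpxchg(struct demo_dev *dev,
			 enum demo_admin_state old_state,
			 enum demo_admin_state new_state)
{
	int expected = old_state;

	/* On failure, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(&dev->admin_state, &expected,
				       new_state);
	return (enum demo_admin_state)expected;
}

/* ACTIVE -> PAUSED; returns true only for the caller that won the race. */
static bool demo_admin_pause(struct demo_dev *dev)
{
	return demo_admin_state_cmpxchg(dev, IONIC_ADMIN_ACTIVE,
					IONIC_ADMIN_PAUSED) ==
	       IONIC_ADMIN_ACTIVE;
}
```

With this shape, callers only ever see the enum type, and the atomic
representation stays a detail of the helpers.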
>
> <...>
>
>>>> +
>>>> + if (!cq) {
>>> Is it possible?
>> Possible when HCA goes bad.
> Do you have errata for that? Generally speaking, kernel is not written
> to be protected from broken HW. The overall assumption is that HW works
> correctly.
>
> Thanks
There is no known hw issue around this. This was added just as a
precautionary check so that a wrong cqid does not result in a kernel
panic. I can remove this check if it is unwarranted.
Thanks,
Abhijit