Message-ID: <20250703084102.GN6278@unreal>
Date: Thu, 3 Jul 2025 11:41:02 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Abhijit Gangurde <abhijit.gangurde@....com>
Cc: shannon.nelson@....com, brett.creeley@....com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
corbet@....net, jgg@...pe.ca, andrew+netdev@...n.ch,
allen.hubbe@....com, nikhil.agarwal@....com,
linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
Andrew Boyer <andrew.boyer@....com>
Subject: Re: [PATCH v3 09/14] RDMA/ionic: Create device queues to support admin operations
On Thu, Jul 03, 2025 at 12:29:45PM +0530, Abhijit Gangurde wrote:
>
> On 7/1/25 15:54, Leon Romanovsky wrote:
> > On Tue, Jun 24, 2025 at 05:43:10PM +0530, Abhijit Gangurde wrote:
> > > Setup RDMA admin queues using device command exposed over
> > > auxiliary device and manage these queues using ida.
> > >
> > > Co-developed-by: Andrew Boyer <andrew.boyer@....com>
> > > Signed-off-by: Andrew Boyer <andrew.boyer@....com>
> > > Co-developed-by: Allen Hubbe <allen.hubbe@....com>
> > > Signed-off-by: Allen Hubbe <allen.hubbe@....com>
> > > Signed-off-by: Abhijit Gangurde <abhijit.gangurde@....com>
> > > ---
> > > v2->v3
> > > - Fixed lockdep warning
> > > - Used IDA for resource id allocation
> > > - Removed rw locks around xarrays
<...>
> >
> > > + list_for_each_entry_safe(wr, wr_next, &aq->wr_prod, aq_ent) {
> > > +	INIT_LIST_HEAD(&wr->aq_ent);
> > > +	aq->q_wr[wr->status].wr = NULL;
> > > +	wr->status = aq->admin_state;
> > > +	complete_all(&wr->work);
> > > + }
> > > + INIT_LIST_HEAD(&aq->wr_prod);
> > <...>
> >
> > > + if (do_reset)
> > > +	/* Reset device on a timeout */
> > > +	ionic_admin_timedout(bad_aq);
> > I wonder why RDMA driver resets device and not the one who owns PCI.
>
> RDMA driver is requesting the reset via eth driver which holds the
> privilege.
I wonder whether the owner of the CMD interface, rather than its clients,
should decide on and perform the device reset.
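
To illustrate the split being asked about, here is a minimal single-threaded
sketch in plain C. All names (cmd_owner, cmd_owner_report_timeout,
cmd_owner_reset_done) are hypothetical and do not come from the ionic
drivers. The assumption is that clients only report a timeout, and the
owner of the command interface decides whether a reset is actually started.

```c
#include <stdbool.h>

/* Hypothetical owner of the command interface (not an ionic structure). */
struct cmd_owner {
	bool reset_pending;	/* a reset was started and has not finished */
	unsigned int reports;	/* timeouts reported by clients */
	unsigned int resets;	/* resets the owner actually started */
};

/*
 * Clients call this when one of their commands times out.
 * They never reset the device themselves; the owner decides.
 */
static void cmd_owner_report_timeout(struct cmd_owner *owner)
{
	owner->reports++;

	/* Several clients may report the same failure: reset only once. */
	if (owner->reset_pending)
		return;

	owner->reset_pending = true;
	owner->resets++;
}

/* Called by the owner once the (hypothetical) reset has completed. */
static void cmd_owner_reset_done(struct cmd_owner *owner)
{
	owner->reset_pending = false;
}
```

The sketch has no locking; a real driver would need to serialize these
paths. The only point is where the decision lives.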
>
<...>
> > > + old_state = atomic_cmpxchg(&dev->admin_state, IONIC_ADMIN_ACTIVE,
> > > + IONIC_ADMIN_PAUSED);
> > > + if (old_state != IONIC_ADMIN_ACTIVE)
> > In all these places you are mixing enum_admin_state and atomic_t for the
> > same values, but in different variables. Please choose either atomic_t
> > or enum.
>
> admin_state within the admin queues is protected by the spinlock,
> hence it is used as enum_admin_state. However, the device's admin_state
> is used as an atomic to avoid a race on reset.
The issue is in mixing types.
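
One possible way to keep a single type for the state, shown as a userspace
sketch with C11 atomics rather than the kernel's atomic_t. The names
(admin_state, demo_dev, demo_admin_state, demo_admin_pause) are
hypothetical, and this is only one reading of the comment above: the state
is stored in one atomic variable, and callers see the enum only through
accessors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values, loosely modelled on the names in the patch. */
enum admin_state {
	ADMIN_ACTIVE,
	ADMIN_PAUSED,
	ADMIN_KILLED,
};

/* The enum values are stored in an atomic_int, so they must fit in int. */
static_assert(sizeof(enum admin_state) <= sizeof(int),
	      "admin_state must fit in atomic_int");

/* Hypothetical device: the state lives in exactly one atomic variable. */
struct demo_dev {
	atomic_int admin_state;
};

/* Read the current state; the conversion to enum happens only here. */
static inline enum admin_state demo_admin_state(struct demo_dev *dev)
{
	return (enum admin_state)atomic_load(&dev->admin_state);
}

/*
 * Try to move ACTIVE -> PAUSED.
 * Returns true only for the caller that performs the transition.
 */
static bool demo_admin_pause(struct demo_dev *dev)
{
	int expected = ADMIN_ACTIVE;

	return atomic_compare_exchange_strong(&dev->admin_state, &expected,
					      ADMIN_PAUSED);
}
```

With this shape, no call site compares a raw atomic value against an enum
constant directly.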
>
<...>
> > > +
> > > + if (!cq) {
> > Is it possible?
>
> Possible when HCA goes bad.
Do you have errata for that? Generally speaking, the kernel is not written
to protect itself against broken HW. The overall assumption is that HW works
correctly.
Thanks