netdev - Re: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CADEbmW3WjQFQk=g5ESAdMWjoQGHzr3ndL_0AQ8O7-QE2XRVS9w@mail.gmail.com>
Date: Tue, 5 Dec 2023 10:00:24 +0100
From: Michal Schmidt <mschmidt@...hat.com>
To: Shinas Rasheed <srasheed@...vell.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Haseeb Gani <hgani@...vell.com>, 
	Vimlesh Kumar <vimleshk@...vell.com>, "egallen@...hat.com" <egallen@...hat.com>, 
	"pabeni@...hat.com" <pabeni@...hat.com>, "horms@...nel.org" <horms@...nel.org>, 
	"kuba@...nel.org" <kuba@...nel.org>, "davem@...emloft.net" <davem@...emloft.net>, 
	"wizhao@...hat.com" <wizhao@...hat.com>, "konguyen@...hat.com" <konguyen@...hat.com>, 
	Veerasenareddy Burru <vburru@...vell.com>, Sathesh B Edara <sedara@...vell.com>, 
	Eric Dumazet <edumazet@...gle.com>
Subject: Re: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks
 before using APIs

On Tue, Dec 5, 2023 at 7:50 AM Shinas Rasheed <srasheed@...vell.com> wrote:
> > -----Original Message-----
> > > On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@...vell.com>
> > wrote:
> > > > Do INIT_WORK for the various workqueue tasks before the first
> > > > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > > > was called before the control net receive work task was even
> > > > initialised, the function call wasn't returning actual firmware
> > > > info queried from Octeon.
> > >
> > > It might be more accurate to say that octep_ctrl_net_get_info depends
> > > on the processing of OEI events. This happens in intr_poll_task.
> > > That's why intr_poll_task needs to be queued earlier.
>
> Intr_poll_task is queued only when the interface is down and the PF cannot catch IRQs as they have been torn down.
> Elsewise, OEI events will trigger the OEI IRQ and consequently its handler.

Right. octep_ctrl_net_get_info is called from the probe function, and
at this point the netdev is not even registered yet. Hence the need
for intr_poll_task.
The reason I started wondering about intr_poll_task is that the commit
message talks about the INIT_WORK, but the patch also moves the
queue_delayed_work call and the reasoning for that move was missing in
the message.
I think the move is correct, but please expand the description.

> Nevertheless, your point is correct in that it
> needs to be queued earlier, but I think subsequently since it calls the control mbox task, that is more relevant and necessary as if it
> is not initialized, it cannot be scheduled even if OEI interrupts have been caught.

OK.

> > > Did octep_send_mbox_req previously always fail with EAGAIN after
> >           ^^^^^^^^^^^^^^^^^^^^^
> > I meant octep_ctrl_net_get_info here.
> >
> > > running into the 500 ms timeout in octep_send_mbox_req?
>
> Yes it did, but as it was silent (note that we're not checking any error value), it didn't stop operation. I think I might have to update this patch
> to catch the error values as well (This was a relic from the original code which spawned an extra thread to setup device and hence couldn't give back
> an error value. That implementation was discouraged and we setup things at probe itself in the upstreamed code and can check error values)

Yes, please, catch that error value.

> > > Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> > > the ctrl_req_wait_list after sending the request to the hardware. If
> > > the response arrives quickly, "d" might not yet be on the list when
> > > process_mbox_resp looks for it.
> > > Also, what protects ctrl_req_wait_list from concurrent access?
>
> Such a race condition is, I also think, valid, but is not currently occurring as response, after due processing from Octeon application,
> wouldn't arrive that quickly. Regarding concurrent access, there is currently no protection for ctrl_req_wait_list. Concurrent access here,
> can only happen if either two requests manage to get hold of the ctrl_req_wait_list or a request and a response manages to get hold of the
> ctrl_req_wait_list (the case you stated above).
>
> In the first case, since locks are implemented atop the control mbox itself, requests would have to in effect wait for their chance to
> queue their wait data "d" to ctrl_req_wait_list, avoiding concurrent access.
>
> The second case is valid, but as I stated, wouldn't happen practically. But I suppose we do have to handle all theoretical cases and perhaps
> locking can be done. I suppose a separate patch for it might be better.

Yes, fixing this should be a separate patch.

Thanks,
Michal