netdev - RE: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: 
 <PH0PR18MB47342E45E84C86A274F78074C785A@PH0PR18MB4734.namprd18.prod.outlook.com>
Date: Tue, 5 Dec 2023 06:50:13 +0000
From: Shinas Rasheed <srasheed@...vell.com>
To: Michal Schmidt <mschmidt@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Haseeb Gani
	<hgani@...vell.com>, Vimlesh Kumar <vimleshk@...vell.com>,
        "egallen@...hat.com" <egallen@...hat.com>,
        "pabeni@...hat.com"
	<pabeni@...hat.com>,
        "horms@...nel.org" <horms@...nel.org>,
        "kuba@...nel.org"
	<kuba@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "wizhao@...hat.com" <wizhao@...hat.com>,
        "konguyen@...hat.com"
	<konguyen@...hat.com>,
        Veerasenareddy Burru <vburru@...vell.com>,
        Sathesh B
 Edara <sedara@...vell.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: RE: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks
 before using APIs

> -----Original Message-----
> > On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@...vell.com>
> wrote:
> > > Do INIT_WORK for the various workqueue tasks before the first
> > > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > > was called before the control net receive work task was even
> > > initialised, the function call wasn't returning actual firmware
> > > info queried from Octeon.
> >
> > It might be more accurate to say that octep_ctrl_net_get_info depends
> > on the processing of OEI events. This happens in intr_poll_task.
> > That's why intr_poll_task needs to be queued earlier.

Intr_poll_task is queued only when the interface is down and the PF cannot catch IRQs as they have been torn down.
Elsewise, OEI events will trigger the OEI IRQ and consequently its handler. Nevertheless, your point is correct in that it
needs to be queued earlier, but I think subsequently since it calls the control mbox task, that is more relevant and necessary as if it
is not initialized, it cannot be scheduled even if OEI interrupts have been caught.

> > Did octep_send_mbox_req previously always fail with EAGAIN after
>           ^^^^^^^^^^^^^^^^^^^^^
> I meant octep_ctrl_net_get_info here.
> 
> > running into the 500 ms timeout in octep_send_mbox_req?

Yes it did, but as it was silent (note that we're not checking any error value), it didn't stop operation. I think I might have to update this patch
to catch the error values as well (This was a relic from the original code which spawned an extra thread to setup device and hence couldn't give back
an error value. That implementation was discouraged and we setup things at probe itself in the upstreamed code and can check error values)

> > Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> > the ctrl_req_wait_list after sending the request to the hardware. If
> > the response arrives quickly, "d" might not yet be on the list when
> > process_mbox_resp looks for it.
> > Also, what protects ctrl_req_wait_list from concurrent access?

Such a race condition is, I also think, valid, but is not currently occurring as response, after due processing from Octeon application,
wouldn't arrive that quickly. Regarding concurrent access, there is currently no protection for ctrl_req_wait_list. Concurrent access here,
can only happen if either two requests manage to get hold of the ctrl_req_wait_list or a request and a response manages to get hold of the
ctrl_req_wait_list (the case you stated above). 

In the first case, since locks are implemented atop the control mbox itself, requests would have to in effect wait for their chance to
queue their wait data "d" to ctrl_req_wait_list, avoiding concurrent access. 

The second case is valid, but as I stated, wouldn't happen practically. But I suppose we do have to handle all theoretical cases and perhaps
locking can be done. I suppose a separate patch for it might be better.