netdev - Re: [PATCH v3 net 1/4] pds_core: Prevent possible adminq overflow/stuck condition


Message-ID: <00b3b95c-7108-4fa8-9de8-ae19c94fe94e@intel.com>
Date: Wed, 16 Apr 2025 16:34:28 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: "Nelson, Shannon" <shannon.nelson@....com>, <andrew+netdev@...n.ch>,
	<brett.creeley@....com>, <davem@...emloft.net>, <edumazet@...gle.com>,
	<kuba@...nel.org>, <pabeni@...hat.com>, <michal.swiatkowski@...ux.intel.com>,
	<horms@...nel.org>, <linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v3 net 1/4] pds_core: Prevent possible adminq
 overflow/stuck condition



On 4/16/2025 1:49 PM, Nelson, Shannon wrote:
> On 4/16/2025 1:13 PM, Jacob Keller wrote:
>>
>> On 4/15/2025 4:29 PM, Shannon Nelson wrote:
>>> From: Brett Creeley <brett.creeley@....com>
>>>
>>> The pds_core's adminq is protected by the adminq_lock, which prevents
>>> more than one command from being posted onto it at any one time. This
>>> means the client drivers cannot simultaneously post adminq commands.
>>> However, the completions happen in a different context, which means
>>> multiple adminq commands can be posted sequentially and all waiting
>>> on completion.
>>>
>>> On the FW side, the backing adminq request queue is only 16 entries
>>> long and the retry mechanism and/or overflow/stuck prevention is
>>> lacking. This can cause the adminq to get stuck, so commands are no
>>> longer processed and completions are no longer sent by the FW.
>>>
>>> As an initial fix, prevent more than 16 outstanding adminq commands so
>>> there's no way for the adminq to get stuck. This works
>>> because the backing adminq request queue will never have more than 16
>>> pending adminq commands, so it will never overflow. This is done by
>>> reducing the adminq depth to 16.
>>>
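A minimal userspace sketch of the depth-limiting idea, for illustration only; it is not the pds_core code. The names struct adminq, adminq_post(), adminq_complete() and ADMINQ_DEPTH are hypothetical, and a pthread mutex stands in for adminq_lock. Posting is serialized by the lock, completions free slots later, and a 17th outstanding command is refused with -ENOSPC instead of being pushed onto a full firmware queue.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical: mirrors the 16-entry FW adminq request queue. */
#define ADMINQ_DEPTH 16

/* Hypothetical model of the adminq; not the pds_core structures. */
struct adminq {
	pthread_mutex_t lock;     /* stand-in for adminq_lock */
	unsigned int head;        /* next slot to post into */
	unsigned int tail;        /* oldest slot still awaiting completion */
	unsigned int outstanding; /* posted but not yet completed */
	int cmds[ADMINQ_DEPTH];
};

static void adminq_init(struct adminq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = 0;
	q->tail = 0;
	q->outstanding = 0;
}

/* Post one command; refuse with -ENOSPC instead of overflowing the FW queue. */
static int adminq_post(struct adminq *q, int cmd)
{
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	if (q->outstanding == ADMINQ_DEPTH) {
		ret = -ENOSPC;
	} else {
		q->cmds[q->head] = cmd;
		q->head = (q->head + 1) % ADMINQ_DEPTH;
		q->outstanding++;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/*
 * Completion path: retire the oldest command and free its slot.
 * Assumption: this sketch takes the same lock to keep the counters
 * consistent. Returns the retired command, or -1 if none was pending.
 */
static int adminq_complete(struct adminq *q)
{
	int cmd = -1;

	pthread_mutex_lock(&q->lock);
	if (q->outstanding > 0) {
		cmd = q->cmds[q->tail];
		q->tail = (q->tail + 1) % ADMINQ_DEPTH;
		q->outstanding--;
	}
	pthread_mutex_unlock(&q->lock);
	return cmd;
}
```

With 16 commands outstanding, adminq_post() keeps failing with -ENOSPC until adminq_complete() retires the oldest one.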
>>
>> What happens if a client driver tries to enqueue a request when the
>> adminq is full? Does it just block until there is space, presumably
>> holding the adminq_lock the entire time to prevent someone else from
>> inserting?
> 
> Right now we will return -ENOSPC and it is up to the client to decide 
> whether or not it wants to do a retry.
> 
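Returning -ENOSPC leaves retries to the client. A hedged sketch of what a bounded client-side retry could look like, assuming a post callback that returns 0, -ENOSPC when the queue is full, or another negative errno on a hard failure. adminq_post_retry(), adminq_post_fn and busy_stub are hypothetical names, and the fixed nanosleep() delay is an arbitrary choice, not what the patch referenced below does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical post callback: 0, -ENOSPC if full, other -errno if fatal. */
typedef int (*adminq_post_fn)(void *ctx, int cmd);

/* Bounded retry: only -ENOSPC is treated as transient. */
static int adminq_post_retry(adminq_post_fn post, void *ctx, int cmd,
			     int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int ret = -ENOSPC;
	int i;

	for (i = 0; i < max_tries; i++) {
		ret = post(ctx, cmd);
		if (ret != -ENOSPC)
			break; /* success or a non-retryable error */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL); /* let completions free slots */
	}
	return ret;
}

/* Hypothetical demo stub: reports "full" for the first busy_posts calls. */
struct busy_stub {
	int busy_posts;
	int calls;
};

static int busy_stub_post(void *ctx, int cmd)
{
	struct busy_stub *s = ctx;

	(void)cmd;
	s->calls++;
	return s->calls <= s->busy_posts ? -ENOSPC : 0;
}
```

A stub that is busy for two calls succeeds on the third attempt; one that stays busy longer than max_tries still surfaces -ENOSPC to the caller.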
> We have another patch that has pdsc_adminq_post() doing a limited retry 
> loop which was part of the original posting [1], but Kuba suggested 
> using a semaphore instead.  That sent us down a redesign branch that we 
> haven't been able to spend time on.  We would have liked to keep the retry
> loop patch until then to at least mitigate the situation, but the
> discussion got dropped.
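The semaphore suggestion can be sketched in userspace with a POSIX counting semaphore initialized to the queue depth: each post takes a token and each completion returns one, so outstanding commands can never exceed 16. This assumes POSIX sem_t as a stand-in for whatever kernel primitive a redesign would actually use; struct adminq_gate and its helpers are hypothetical.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical: mirrors the 16-entry FW adminq request queue. */
#define ADMINQ_DEPTH 16

/* Hypothetical gate: one semaphore token per free FW request slot. */
struct adminq_gate {
	sem_t slots;
};

static int adminq_gate_init(struct adminq_gate *g)
{
	/* pshared = 0: shared between threads of this process only. */
	return sem_init(&g->slots, 0, ADMINQ_DEPTH);
}

/* Blocking acquire: sleeps until a slot is free instead of failing. */
static int adminq_gate_acquire(struct adminq_gate *g)
{
	while (sem_wait(&g->slots) == -1) {
		if (errno != EINTR)
			return -errno;
	}
	return 0;
}

/* Non-blocking acquire: maps "no token" to -ENOSPC, like the current fix. */
static int adminq_gate_try_acquire(struct adminq_gate *g)
{
	if (sem_trywait(&g->slots) == 0)
		return 0;
	return errno == EAGAIN ? -ENOSPC : -errno;
}

/* Completion path: return the slot so a waiting poster can proceed. */
static void adminq_gate_release(struct adminq_gate *g)
{
	sem_post(&g->slots);
}

static void adminq_gate_destroy(struct adminq_gate *g)
{
	sem_destroy(&g->slots);
}
```

The blocking acquire turns the -ENOSPC error into a wait, while the try variant keeps the behavior of the current fix; a kernel version would still need to decide whether sleeping is acceptable in every caller's context.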

Sure. This fix makes sense in that context.

Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>

> 
> sln
> 
> [1] https://lore.kernel.org/netdev/20250129004337.36898-3-shannon.nelson@amd.com/