Message-ID: <37c6ac9b-305d-42a1-984d-69f19fc44112@windriver.com>
Date: Thu, 25 Sep 2025 09:23:06 +0800
From: "Chang, Jianpeng (CN)" <Jianpeng.Chang.CN@...driver.com>
To: Wei Fang <wei.fang@....com>
Cc: "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Claudiu Manoil <claudiu.manoil@....com>,
Vladimir Oltean <vladimir.oltean@....com>,
Clark Wang
<xiaoning.wang@....com>,
"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com"
<pabeni@...hat.com>,
Alexandru Marginean <alexandru.marginean@....com>
Subject: Re: [PATCH] net: enetc: fix the deadlock of enetc_mdio_lock
On 9/24/2025 4:39 PM, Wei Fang wrote:
>
> Hi Jianpeng,
>
> This patch should target the net tree; please add the target tree to the
> subject, for example, "[PATCH net]".
>
>> After applying the workaround for ERR050089, the LS1028A platform
>> experiences RCU stalls on the RT kernel. The issue is caused by
>> recursive acquisition of the read lock enetc_mdio_lock. Listed below
>> are some of the call stacks identified under the enetc_poll path that
>> may lead to a deadlock:
>>
>> enetc_poll
>>   -> enetc_lock_mdio
>>   -> enetc_clean_rx_ring OR napi_complete_done
>>      -> napi_gro_receive
>>         -> enetc_start_xmit
>>            -> enetc_lock_mdio
>>            -> enetc_map_tx_buffs
>>            -> enetc_unlock_mdio
>>   -> enetc_unlock_mdio
>>
>> After enetc_poll acquires the read lock, a higher-priority writer
>> attempts to take the lock and preempts the reader. The writer sees
>> that the read lock is already held and is scheduled out to wait. From
>> that point on, the readers nested under enetc_poll can no longer
>> acquire the read lock, because a writer is already waiting, and the
>> thread hangs.
>>
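To make the failure mode concrete: enetc_lock_mdio() essentially wraps
read_lock(&enetc_mdio_lock), and on PREEMPT_RT a reader that takes the
same rwlock again while a writer is queued blocks behind that writer.
A minimal sketch of the recursion (illustrative only, not the driver
code verbatim; napi_poll_path() is a made-up name):

    /* Illustrative sketch only -- not the actual driver code. */
    #include <linux/spinlock.h>

    static DEFINE_RWLOCK(enetc_mdio_lock);

    static void napi_poll_path(void)
    {
            read_lock(&enetc_mdio_lock);    /* enetc_lock_mdio() in enetc_poll() */

            /*
             * A writer (e.g. an MDIO access applying the ERR050089
             * workaround on another CPU) starts waiting here; new
             * readers now queue behind it.
             */

            read_lock(&enetc_mdio_lock);    /* nested enetc_lock_mdio() via
                                             * napi_gro_receive() ->
                                             * enetc_start_xmit(); blocks
                                             * behind the writer -> deadlock
                                             */
            read_unlock(&enetc_mdio_lock);
            read_unlock(&enetc_mdio_lock);
    }
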
>> The deadlock is avoided by adjusting the placement of the
>> enetc_lock_mdio()/enetc_unlock_mdio() calls so that the read lock is
>> never taken recursively.
>>
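The resulting pattern in the RX paths can be sketched compactly
(illustrative, mirroring the hunks that follow): hold the MDIO read
lock across descriptor processing, but drop it around
napi_gro_receive(), which may recurse into enetc_start_xmit() and take
the lock again:

    enetc_lock_mdio();
    while (rx_frm_cnt < work_limit) {
            /* ... fetch the next RX BD and build an skb ... */

            enetc_unlock_mdio();
            napi_gro_receive(napi, skb);    /* may re-enter enetc_start_xmit() */
            enetc_lock_mdio();
    }
    enetc_unlock_mdio();
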
>> Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue")
> Thanks for catching this issue. We also found a similar issue on i.MX95, but
> i.MX95 does not have the hardware erratum, so we removed the lock for i.MX95,
> see 86831a3f4cd4 ("net: enetc: remove ERR050089 workaround for i.MX95").
> We also suspected that the LS1028A would have the same issue, but we did
> not reproduce it on LS1028A and did not debug the issue further.
>
> Claudiu previously suspected that this issue was introduced by 6d36ecdbc441
> ("net: enetc: take the MDIO lock only once per NAPI poll cycle"). Your patch
> is basically the same as reverting that commit. Therefore, the blamed commit
> may be 6d36ecdbc441, not fd5736bf9f23; please help check it.
Thank you for your response. I reproduced this issue while running an
iperf test on an LS1028A.

You are correct: commit fd5736bf9f23 did not introduce the issue
directly. What I found somewhat puzzling is that, after 6d36ecdbc441,
subsequent commits along the current call path also introduced
problematic changes, which is why I referenced fd5736bf9f23 here. I
will correct the Fixes: tag and incorporate all the other changes you
pointed out in the v2 patch.

Thanks,
Jianpeng
>
>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@...driver.com>
>> ---
>> drivers/net/ethernet/freescale/enetc/enetc.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
>> index e4287725832e..164d2e9ec68c 100644
>> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
>> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
>> @@ -1558,6 +1558,8 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
>>          /* next descriptor to process */
>>          i = rx_ring->next_to_clean;
>>
>> +        enetc_lock_mdio();
>> +
>>          while (likely(rx_frm_cnt < work_limit)) {
>>                  union enetc_rx_bd *rxbd;
>>                  struct sk_buff *skb;
>> @@ -1593,7 +1595,9 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
>>                  rx_byte_cnt += skb->len + ETH_HLEN;
>>                  rx_frm_cnt++;
>>
>> +                enetc_unlock_mdio();
>>                  napi_gro_receive(napi, skb);
>> +                enetc_lock_mdio();
>>          }
>>
>>          rx_ring->next_to_clean = i;
>> @@ -1601,6 +1605,7 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
>>          rx_ring->stats.packets += rx_frm_cnt;
>>          rx_ring->stats.bytes += rx_byte_cnt;
>>
>> +        enetc_unlock_mdio();
> Nit: please add a blank line before "return".
>
>>          return rx_frm_cnt;
>>  }
>>
>> @@ -1910,6 +1915,8 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
>>          /* next descriptor to process */
>>          i = rx_ring->next_to_clean;
>>
>> +        enetc_lock_mdio();
>> +
>>          while (likely(rx_frm_cnt < work_limit)) {
>>                  union enetc_rx_bd *rxbd, *orig_rxbd;
>>                  struct xdp_buff xdp_buff;
>> @@ -1973,7 +1980,9 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
>>                           */
>>                          enetc_bulk_flip_buff(rx_ring, orig_i, i);
>>
>> +                        enetc_unlock_mdio();
>>                          napi_gro_receive(napi, skb);
>> +                        enetc_lock_mdio();
>>                          break;
>>                  case XDP_TX:
>>                          tx_ring = priv->xdp_tx_ring[rx_ring->index];
>> @@ -2038,6 +2047,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
>>          enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring) -
>>                               rx_ring->xdp.xdp_tx_in_flight);
>>
>> +        enetc_unlock_mdio();
> Nit: same as above.
>
>>          return rx_frm_cnt;
>>  }
>