Message-ID: <6852b406-1301-4570-b448-6fd58694d219@linux.vnet.ibm.com>
Date: Tue, 31 Oct 2023 18:18:14 -0500
From: Thinh Tran <thinhtr@...ux.vnet.ibm.com>
To: Pavan Chebbi <pavan.chebbi@...adcom.com>
Cc: netdev@...r.kernel.org, siva.kallam@...adcom.com, prashant@...adcom.com,
mchan@...adcom.com, drc@...ux.vnet.ibm.com,
Venkata Sai Duggi <venkata.sai.duggi@....com>
Subject: Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
Thanks for the review and I apologize for the delayed response. I had
some trouble accessing the system, which delayed my investigation.
On 10/2/2023 11:34 PM, Pavan Chebbi wrote:
>
> Can you elaborate on the race condition please? Are you saying
> tg3_reset_task_cancel() cleared the flag and tg3_tx_recover() set it
> again and the reset task never got a chance to run?
> Is that what is leading to TX stall?
This code path was triggered only once, and after updating both the system
and adapter firmware, I haven't encountered it again. However, the race
condition issue still persists, causing the interfaces to go down.
Even with the memory barrier smp_mb__after_atomic() in place, as
suggested by Michael Chan, the intermittent problem still persists. Upon
closer investigation, I identified the root cause; the details will be
in the next version of the patch. When I commented out the call to
tg3_dump_state() in tg3_tx_timeout(), the issue occurred more quickly.
I'm working on submitting v2 of the patch.
Regards,
Thinh Tran