Message-ID: <97ec8df6-0690-4158-be44-ef996746d734@linux.ibm.com>
Date: Fri, 17 Jan 2025 15:05:24 -0600
From: Eddie James <eajames@...ux.ibm.com>
To: Paul Fertser <fercerpav@...il.com>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, horms@...nel.org,
pabeni@...hat.com, edumazet@...gle.com, davem@...emloft.net,
sam@...dozajonas.com, Ivan Mikhaylov <fr0st61te@...il.com>
Subject: Re: [PATCH] net/ncsi: Fix NULL pointer dereference if CIS arrives before SP
On 1/15/25 17:01, Paul Fertser wrote:
> Hi Jakub,
>
> On Tue, Jan 14, 2025 at 02:49:32PM -0800, Jakub Kicinski wrote:
>> Any thoughts on this fix?
> This indeed looks related to what we discussed!
>
>> On Fri, 10 Jan 2025 13:41:33 -0600 Eddie James wrote:
>>> If a Clear Initial State response packet is received before the
>>> Select Package response, then the channel setup will dereference
>>> the NULL package pointer. Fix this by setting up the package
>>> in the CIS handler if it's not found.
> My current understanding is that the responses can't normally be
> reordered (we are supposed to send the next command only after
> receiving the response to the previous one), so any surprising event
> like that signifies that the FSM got out of sync. Unfortunately, it's
> written in such a way that it switches to the "next state" based on
> the number of responses the current state expected, not on their
> actual content, which is rather fragile.
>
> Sending the "Select Package" command is the first thing performed
> after package discovery completes, so problems in that area suggest
> that the cause might be that the response to the last "Package
> Deselect" command is not processed: receiving it would advance the
> state machine prematurely. It's not quite clear to me how the SP
> response could be lost altogether, or what else happens in the
> failure case. Unfortunately, it's not reproducible on my system, so
> I can't just add more debugging to see all responses and state
> transitions as they happen.
>
> Eddie, how easy is it to reproduce the issue in your setup? Can you
> please check whether the change in [0] makes a difference?
I am able to reproduce the panic at will, and unfortunately your patch
does not prevent the issue.
However, I suspect this issue may be unique to my setup, so my patch may
not be necessary. I found some issues in my userspace, and fixing them
prevented the panic.
Thanks,
Eddie
>
> [0] https://lore.kernel.org/all/Z4ZewoBHkHyNuXT5@home.paul.comp/
>
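Paul's point that the FSM advances on the number of responses rather than on their content can be illustrated with a toy model. This is a minimal sketch, not the kernel's net/ncsi code: the `rsp_type` and `fsm_state` enums, the `struct fsm` type and the `advance_by_count()` / `advance_by_content()` helpers are all hypothetical, and `package_selected` merely stands in for the package pointer that the real driver dereferences. It assumes each state waits for exactly one response, and shows how a stale Deselect Package response, arriving while the machine waits for Select Package, pushes the count-based version into the CIS stage with no package set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response types (SP, DP, CIS); not the kernel's NC-SI definitions. */
enum rsp_type { RSP_SP, RSP_DP, RSP_CIS };

/* Hypothetical states; each waiting state expects exactly one response. */
enum fsm_state { ST_WAIT_SP, ST_WAIT_CIS, ST_DONE };

struct fsm {
	enum fsm_state state;
	bool package_selected; /* stands in for a non-NULL package pointer */
};

/* The single response type each waiting state expects. */
static enum rsp_type expected_rsp(enum fsm_state s)
{
	return s == ST_WAIT_SP ? RSP_SP : RSP_CIS;
}

/* Count-based: any response satisfies the current state, so a stale
 * Deselect Package response is enough to move on. */
static void advance_by_count(struct fsm *f, enum rsp_type rsp)
{
	assert(f != NULL);
	if (f->state == ST_DONE)
		return;
	if (rsp == RSP_SP)
		f->package_selected = true;
	f->state = (enum fsm_state)(f->state + 1);
}

/* Content-based: only the expected response advances the state; anything
 * else is ignored. Returns true if the state changed. */
static bool advance_by_content(struct fsm *f, enum rsp_type rsp)
{
	assert(f != NULL);
	if (f->state == ST_DONE || rsp != expected_rsp(f->state))
		return false;
	if (rsp == RSP_SP)
		f->package_selected = true;
	f->state = (enum fsm_state)(f->state + 1);
	return true;
}
```

Starting from `struct fsm f = { ST_WAIT_SP, false };`, calling `advance_by_count(&f, RSP_DP)` leaves `f.state == ST_WAIT_CIS` with `f.package_selected` still false, which is the toy equivalent of reaching channel setup with a NULL package. `advance_by_content(&f, RSP_DP)` instead returns false and leaves the state untouched. Note that the content-based variant simply keeps waiting, so it would still need a timeout for the case where the SP response is lost altogether.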