Message-ID:
<CH2PR12MB389559F06323672319B696ABD7D7A@CH2PR12MB3895.namprd12.prod.outlook.com>
Date: Mon, 16 Oct 2023 17:45:13 +0000
From: Asmaa Mnebhi <asmaa@...dia.com>
To: Florian Fainelli <f.fainelli@...il.com>, "davem@...emloft.net"
<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"olteanv@...il.com" <olteanv@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, David Thompson
<davthompson@...dia.com>
Subject: RE: [PATCH v3 2/3] mlxbf_gige: Fix intermittent no ip issue
> > > Although the link is up, there is no ip assigned on a setup with
> > > high background traffic. Nothing is transmitted nor received.
> > > The RX error count keeps on increasing. After several minutes, the
> > > RX error count stagnates and the GigE interface finally gets an ip.
> > >
> > > The issue is in the mlxbf_gige_rx_init function. As soon as the RX
> > > DMA is enabled, the RX CI reaches the max of 128, and it becomes
> > > equal to RX PI. RX CI doesn't decrease since the code hasn't run
> > > phy_start yet.
> > >
> > > The solution is to move the rx init after phy_start.
> > >
> > > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet
> > > driver")
> > > Signed-off-by: Asmaa Mnebhi <asmaa@...dia.com>
> > > Reviewed-by: David Thompson <davthompson@...dia.com>
> >
> > This seems fine, but your description of the problem still looks like
> > there may be a more fundamental ordering issue when you enable your RX
> > pipe here.
> >
> > It seems to me like you should enable it from "inner" as in closest to
> > the CPU/DMA subsystem towards "outer" which is the MAC and finally the
> > PHY.
> >
> > It should be fine to enable your RX DMA as long as you keep the MAC's
> > RX disabled, and then you can enable your MAC's RX enable and later
> > start the PHY.
>
> Thanks for your feedback Florian. I will take a look and address your
> comments shortly. Sorry for the delayed response, I was OOO.
Hi Florian,
We would like to keep the code as is, because the RX DMA must be enabled only after the MAC RX filters and the RX rings are set up (in mlxbf_gige_rx_init()).
The PHY start logic needs to run before that. Otherwise, there is a chance of hitting this bug, where our MAC RX consumer index (CI) equals our MAC RX production index (PI), leaving the MAC in a state that cannot be resolved until we clean up the MAC again. Note that this bug is difficult to reproduce: our QA had to run the reboot test on a setup with really high background traffic.
Thanks.
Asmaa
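
For readers following the thread, the ordering argued for above can be sketched roughly as below. This is only an illustration, not the actual mlxbf_gige code: every sketch_* function is a hypothetical stand-in that just records its name, and the split of mlxbf_gige_rx_init() into three steps is an assumption drawn from the description in this message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the steps named in the thread.
 * They only log their own name; no hardware is touched. */
static char order_log[128];

static void log_step(const char *name)
{
    size_t room = sizeof(order_log) - strlen(order_log) - 1;

    if (order_log[0] != '\0') {
        strncat(order_log, " -> ", room);
        room = sizeof(order_log) - strlen(order_log) - 1;
    }
    strncat(order_log, name, room);
}

static void sketch_phy_start(void)     { log_step("phy_start"); }
static void sketch_rx_filters(void)    { log_step("rx_filters"); }
static void sketch_rx_rings(void)      { log_step("rx_rings"); }
static void sketch_rx_dma_enable(void) { log_step("rx_dma_enable"); }

/* Assumed shape of the RX init step: filters and rings are set up
 * first, and the RX DMA is enabled last. */
static void sketch_rx_init(void)
{
    sketch_rx_filters();
    sketch_rx_rings();
    sketch_rx_dma_enable();
}

/* Ordering proposed by the patch: start the PHY, then run RX init. */
static const char *sketch_open(void)
{
    order_log[0] = '\0';
    sketch_phy_start();
    sketch_rx_init();
    return order_log;
}
```

Calling sketch_open() and printing the result (for example with puts()) shows the sequence in which the steps ran, with the RX DMA enable coming last.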