Open Source and information security mailing list archives
 
Date: Mon, 16 Oct 2023 17:45:13 +0000
From: Asmaa Mnebhi <asmaa@...dia.com>
To: Florian Fainelli <f.fainelli@...il.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	"olteanv@...il.com" <olteanv@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, David Thompson
	<davthompson@...dia.com>
Subject: RE: [PATCH v3 2/3] mlxbf_gige: Fix intermittent no ip issue

> > > Although the link is up, no IP address is assigned on a setup with
> > > high background traffic. Nothing is transmitted or received.
> > > The RX error count keeps increasing. After several minutes, the
> > > RX error count stagnates and the GigE interface finally gets an IP address.
> > >
> > > The issue is in the mlxbf_gige_rx_init function. As soon as the RX
> > > DMA is enabled, the RX CI reaches the max of 128 and becomes
> > > equal to RX PI. RX CI doesn't decrease because the code hasn't run
> > > phy_start yet.
> > >
> > > The solution is to move the RX init after phy_start.
> > >
> > > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> > > Signed-off-by: Asmaa Mnebhi <asmaa@...dia.com>
> > > Reviewed-by: David Thompson <davthompson@...dia.com>
> >
> > This seems fine, but your description of the problem still looks like
> > there may be a more fundamental ordering issue when you enable your RX
> > pipe here.
> >
> > It seems to me like you should enable it from "inner" as in closest to
> > the CPU/DMA subsystem towards "outer" which is the MAC and finally the
> > PHY.
> >
> > It should be fine to enable your RX DMA as long as you keep the MAC's
> > RX disabled, and then you can enable your MAC's RX enable and later
> > start the PHY.
> 
> Thanks for your feedback, Florian. I will take a look and address your
> comments shortly. Sorry for the delayed response; I was OOO.
Hi Florian,

We would like to keep the code as is because we need to enable the RX DMA after the MAC RX filters and the RX rings are set up (in mlxbf_gige_rx_init()).
The PHY start logic needs to run before that; otherwise, there is a chance we would hit this bug, where our MAC RX consumer index (CI) equals our MAC RX producer index (PI), which results in a MAC state that cannot be recovered from until we clean up the MAC again. Note that this bug is difficult to reproduce: our QA had to run the reboot test on a setup with very high background traffic.

Thanks.
Asmaa
