Date:   Wed, 12 Aug 2020 20:37:56 +0000
From:   Asmaa Mnebhi <asmaa@...dia.com>
To:     Andrew Lunn <andrew@...n.ch>
CC:     David Thompson <dthompson@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        Jiri Pirko <jiri@...lanox.com>,
        "Asmaa Mnebhi" <Asmaa@...lanox.com>
Subject: RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver



> -----Original Message-----
> From: Andrew Lunn <andrew@...n.ch>
> Sent: Tuesday, August 11, 2020 4:07 PM
> To: Asmaa Mnebhi <asmaa@...dia.com>
> Cc: David Thompson <dthompson@...lanox.com>;
> netdev@...r.kernel.org; davem@...emloft.net; kuba@...nel.org; Jiri
> Pirko <jiri@...lanox.com>; Asmaa Mnebhi <Asmaa@...lanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet
> driver
> 
> On Tue, Aug 11, 2020 at 07:53:35PM +0000, Asmaa Mnebhi wrote:
> > Hi Andrew,
> >
> > Thanks again for your feedback.
> >
> > > > +	/* Finally check if this interrupt is from PHY device.
> > > > +	 * Return if it is not.
> > > > +	 */
> > > > +	val = readl(priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CAUSE_EVTEN0);
> > > > +	if (!(val & priv->phy_int_gpio_mask))
> > > > +		return IRQ_NONE;
> > > > +
> > > > +	/* Clear interrupt when done, otherwise, no further interrupt
> > > > +	 * will be triggered.
> > > > +	 * Writing 0x1 to the clear cause register also clears the
> > > > +	 * following registers:
> > > > +	 * cause_gpio_arm_coalesce0
> > > > +	 * cause_rsh_coalesce0
> > > > +	 */
> > > > +	val = readl(priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > > > +	val |= priv->phy_int_gpio_mask;
> > > > +	writel(val, priv->gpio_io +
> > > > +			MLXBF_GIGE_GPIO_CAUSE_OR_CLRCAUSE);
> > >
> > > Shouldn't there be a call into the PHY driver at this point?
> > >
> > > > +
> > > > +	return IRQ_HANDLED;
> > > > +}
> > >
> > > So these last three functions seem to be an interrupt controller?
> > > So why not model it as a Linux interrupt controller?
> >
> > Apologies for the confusion. The plan is to remove support for polling and
> > instead support the HW interrupt as follows (from the probe):
> > irq = platform_get_irq(pdev, MLXBF_GIGE_PHY_INT_N);
> >          if (irq < 0) {
> >                  dev_err(dev, "Failed to retrieve irq 0x%x\n", irq);
> >                  return -ENODEV;
> >          }
> >          priv->mdiobus->irq[phy_addr] = irq;
> 
> O.K, that is one way to do it. The other is via the MAC driver calling
> phy_mac_interrupt().
> 
> > I guess my question is should we model it as a linux interrupt
> > controller rather than use phy_connect_direct ?
> 
> It seems like there are other interrupt sources, not just the PHY. Do you plan
> to use any of them? It can be easier to debug issues if you have an interrupt
> controller, can see counters in /proc/interrupts, etc. Also, if you need to
> export the lines to some other driver, e.g. SFP, it is easier to do when there is
> an interrupt controller.
> 
> > Using phy_connect_direct to register my interrupt handler, I have
> > encountered a particular issue where the PHY interrupt is triggered
> > before the phy link status bit (reg 0x1 of the PHY device) is set to
> > 1 (indicating link is up).
> 
> So the hardware is broken :-(
> 
> What about the other way, link down? Same problem?
> 
> Polling is probably your best bet, since it is robust against broken interrupts.
> If i remember correctly, this is an off the shelf 1G PHY?
> Microchip? Is there an errata for this? Maybe the errata suggests a work
> around?

So let me explain further; I would greatly appreciate your input.
Technically, when this driver gets loaded, we shouldn't need the interrupt to bring up the link for the first time, should we?
Correct me if I am wrong, but phy_start() should bring up the link. phy_start() calls phy_start_aneg(), which eventually calls phy_check_link_status().
phy_check_link_status() reads the link status bit of the BMSR register (only twice) and, based on that, decides whether to bring the link up or down. In our case, that bit is still 0 when the read is done. A little bit later, it gets set to 1.

This is why polling works in this case: phy_start() fails to bring up the link, but the polling eventually brings it up. If we choose to use the interrupt, we should make sure that the interrupt is enabled a little bit after phy_start(); otherwise, it would just be wasted.

Best,
Asmaa

> 
>      Andrew
