Date:   Mon, 3 Aug 2020 14:23:19 +0000
From:   Asmaa Mnebhi <Asmaa@...lanox.com>
To:     Andrew Lunn <andrew@...n.ch>
CC:     David Thompson <dthompson@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>, Jiri Pirko <jiri@...lanox.com>
Subject: RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver



> -----Original Message-----
> From: Andrew Lunn <andrew@...n.ch>
> Sent: Friday, July 31, 2020 9:15 PM
> To: Asmaa Mnebhi <Asmaa@...lanox.com>
> Cc: David Thompson <dthompson@...lanox.com>;
> netdev@...r.kernel.org; davem@...emloft.net; kuba@...nel.org; Jiri Pirko
> <jiri@...lanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
> 
> > > > > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add, int phy_reg) {
> > > > > > +         struct mlxbf_gige *priv = bus->priv;
> > > > > > +         u32 cmd;
> > > > > > +         u32 ret;
> > > > > > +
> > > > > > +         /* If the lock is held by something else, drop the request.
> > > > > > +         * If the lock is cleared, that means the busy bit was cleared.
> > > > > > +         */
> > > > >
> > > > > How can this happen? The mdio core has a mutex which prevents
> > > > > parallel access?
> > > >
> > > >
> > > >
> > > > > This is a HW lock. It is an actual register. So another HW entity
> > > > > can be holding that lock and reading/changing the values in the HW
> > > > > registers.
> > >
> > > > You have not explained how that can happen. Is there something in the
> > > > driver I missed which takes a backdoor to read/write MDIO transactions?
> >
> > > Ah ok! There is a HW entity (called YU) within the BlueField which is
> > > connected to the PHY device.
> > > I think the YU is what you are calling "backdoor" here. The YU
> > > contains several registers which control reads/writes to the PHY. So
> > > it is like an extra layer for reading MDIO registers. One of the YU
> > > registers is the gateway register (aka GW or MLXBF_GIGE_MDIO_GW_OFFSET
> > > in the code). If the GW register's LOCK bit is not cleared, we cannot
> > > write anything to the actual PHY MDIO registers.
> > > Did I answer your question?
> 
> Nope.
> 
> How can two transactions happen at the same time, causing this lock bit to
> be locked, given that the MDIO core has a mutex and serialises all
> transactions? How can the lock bit ever be set?

Ah, I see what you are saying. SW takes care of it, so HW would never fall into this scenario. That will make things cleaner and faster then! OK, I will change it, test it and report back.

> 
> > > > > > +         ret = mlxbf_gige_mdio_poll_bit(priv, MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> > > > > > +         if (ret)
> > > > > > +                       return -EBUSY;
> > > > >
> > > > PHY drivers are not going to like that. They are not going to retry.
> > > > What is likely to happen is that phylib moves into the ERROR
> > > > state, and the PHY driver grinds to a halt.
> > > >
> > > >
> > > >
> > > > > This is a fairly quick HW transaction. So I don’t think it would
> > > > > cause an issue for the PHY drivers. In this case, we use the
> > > > > Micrel KSZ9031. We haven’t seen issues.
> > >
> > > > So you are happy to debug hard to find and reproduce issues when it
> > > does happen? Or would you like to spend a little bit of time now and
> > > just prevent it happening at all?
> >
> > I think I misunderstood your comment. Did you ask why we are polling
> > here? Or that we shouldn't be returning -EBUSY?
> 
> I think you should not be returning EBUSY. If it ever happens, bad things will
> happen.
> 
> This lock bit seems to serve no purpose. Software will ensure that
> transactions are serialized. If it serves no purpose, just ensure it is unlocked
> at probe time, and then ignore it. If you ignore it, you will never return
> -EBUSY and so bad things will never happen.
> 
> Just because hardware exists does not mean you have to use it or that it
> adds any value.

Sounds good.
> 
>        Andrew
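
For illustration only, here is a minimal user-space sketch of the approach suggested above: clear the lock bit once at probe time, then ignore it in the read path so that -EBUSY is never returned. It is not the driver code from the patch. Every name in it (fake_gige, fake_mdio_probe, fake_mdio_read, fake_hw_step, the GW_* masks and their bit positions, POLL_LIMIT) is hypothetical, and the "hardware" is simulated by a plain struct. It also assumes the caller already serialises bus access, as the MDIO core's mutex does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch; not taken from the patch. */
#define GW_LOCK_MASK  (1u << 31)
#define GW_BUSY_MASK  (1u << 30)
#define GW_DATA_MASK  0xffffu
#define POLL_LIMIT    100

/* Stand-in for the device: one gateway register plus the PHY registers. */
struct fake_gige {
	uint32_t gw;
	uint16_t phy_regs[32];
};

/* Probe-time step: force the lock bit clear once. */
static void fake_mdio_probe(struct fake_gige *priv)
{
	priv->gw &= ~GW_LOCK_MASK;
}

/* Simulated hardware: finish the transaction, clear BUSY, latch the data. */
static void fake_hw_step(struct fake_gige *priv, int phy_reg)
{
	priv->gw = priv->phy_regs[phy_reg] & GW_DATA_MASK;
}

/*
 * Read path that never inspects the lock bit. Callers are assumed to be
 * serialised already, so the only thing to wait for is the busy bit.
 */
static int fake_mdio_read(struct fake_gige *priv, int phy_reg)
{
	int i;

	assert(phy_reg >= 0 && phy_reg < 32);

	priv->gw = GW_BUSY_MASK;		/* start the transaction */
	for (i = 0; i < POLL_LIMIT; i++) {
		fake_hw_step(priv, phy_reg);
		if (!(priv->gw & GW_BUSY_MASK))
			return (int)(priv->gw & GW_DATA_MASK);
	}
	return -ETIMEDOUT;			/* busy bit never cleared */
}
```

In this sketch the only error the read path can report is a timeout on the busy bit; a stale lock bit left over from before probe has no effect on later reads.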
