linux-kernel - Re: [RFC PATCH v2 3/8] net: sparx5: add hostmode with phylink support

Message-ID: <20201223132903.gkle4552uahgqk55@mchp-dev-shegelun>
Date:   Wed, 23 Dec 2020 14:29:03 +0100
From:   Steen Hegelund <steen.hegelund@...rochip.com>
To:     Andrew Lunn <andrew@...n.ch>
CC:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Russell King <linux@...linux.org.uk>,
        Lars Povlsen <lars.povlsen@...rochip.com>,
        Bjarni Jonasson <bjarni.jonasson@...rochip.com>,
        Microchip Linux Driver Support <UNGLinuxDriver@...rochip.com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Madalin Bucur <madalin.bucur@....nxp.com>,
        Nicolas Ferre <nicolas.ferre@...rochip.com>,
        Mark Einon <mark.einon@...il.com>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Arnd Bergmann <arnd@...db.de>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC PATCH v2 3/8] net: sparx5: add hostmode with phylink support

Hi Andrew,

On 22.12.2020 15:41, Andrew Lunn wrote:
>On Tue, Dec 22, 2020 at 10:46:12AM +0100, Steen Hegelund wrote:
>> Hi Andrew,
>>
>> On Sat, 2020-12-19 at 20:51 +0100, Andrew Lunn wrote:
>> > > +     /* Create a phylink for PHY management.  Also handles SFPs */
>> > > +     spx5_port->phylink_config.dev = &spx5_port->ndev->dev;
>> > > +     spx5_port->phylink_config.type = PHYLINK_NETDEV;
>> > > +     spx5_port->phylink_config.pcs_poll = true;
>> > > +
>> > > +     /* phylink needs a valid interface mode to parse dt node */
>> > > +     if (phy_mode == PHY_INTERFACE_MODE_NA)
>> > > +             phy_mode = PHY_INTERFACE_MODE_10GBASER;
>> >
>> > Maybe just enforce a valid value in DT?
>>
>> Maybe I need to clarify that you must choose between an Ethernet cuPHY
>> and an SFP, so it is optional.
>
>But you also need to watch out for somebody putting a copper module
>in an SFP port. phylink will then set the mode to SGMII for a 1G
>copper module, etc.
>
SFPs with an embedded cuPHY are handled by phylink out of the box,
provided the kernel includes the driver for that particular cuPHY; all
that is needed is to specify the SFP phandle.
So here we just need to know whether the user has attached a cuPHY
directly or an SFP.

The phylink_of_phy_connect function provides a way to add a cuPHY
directly to the PHYLINK instance, but I have not found a way to
specify a particular cuPHY embedded in an SFP, so in that case PHYLINK
determines which PHY (driver) is appropriate.

Could this be done in a simpler way?

>> > > +/* Configuration */
>> > > +static inline bool sparx5_use_cu_phy(struct sparx5_port *port)
>> > > +{
>> > > +     return port->conf.phy_mode != PHY_INTERFACE_MODE_NA;
>> > > +}
>> >
>> > That is a rather odd definition of copper.
>>
>> Should I rather use a bool property to select between the two options
>> (cuPHY or SFP)?
>
>I guess you are trying to distinguish between a hard-wired copper
>PHY and an SFP cage? You have some sort of MII switch which allows the
>MAC to be connected to either the QSGMII PHY, or an SFP cage? But
>since the SFP cage could be populated with a copper PHY, and PHYLINK
>will then instantiate a phylib copper PHY driver for it, looking at
>phy_mode is not reliable. You need a property which selects the port,
>not the technology.

Yes, the intention was to be able to distinguish between the hardwired
cuPHY case and the SFP case.

I am OK with adding a property to distinguish between the two cases, but
in the tests I have done so far, whenever the SFP handle is present,
PHYLINK has been able to handle an embedded cuPHY (if its driver is
available) and use it. So my thinking was that if a phy handle is
present, then the user wants a directly attached cuPHY, not an SFP.
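To illustrate keying the choice on the port's wiring rather than on phy_mode, here is a small userspace model of the rule described above: a phy handle means a directly attached cuPHY, otherwise the SFP cage. The struct, enum and function names are hypothetical. In a real driver the two flags would presumably come from device-tree lookups of the `phy-handle` and `sfp` properties (for example via of_parse_phandle()), or from a dedicated port-select property as Andrew suggests.

```c
#include <stdbool.h>

/* Hypothetical per-port view of the device tree. */
struct port_dt_info {
	bool has_phy_handle; /* "phy-handle" present: hard-wired cuPHY */
	bool has_sfp_handle; /* "sfp" phandle present: SFP cage */
};

enum port_media {
	PORT_MEDIA_NONE, /* neither connector described */
	PORT_MEDIA_PHY,  /* MAC routed to the on-board cuPHY */
	PORT_MEDIA_SFP,  /* MAC routed to the SFP cage (module may be copper) */
};

/* Decide which connector the MAC uses from the DT description alone.
 * phy_mode is not consulted, so a copper SFP module (which phylink may
 * run in SGMII) does not turn the port into a "cuPHY" port.
 * A phy handle takes precedence if both are described. */
static enum port_media port_select_media(const struct port_dt_info *dt)
{
	if (dt->has_phy_handle)
		return PORT_MEDIA_PHY;
	if (dt->has_sfp_handle)
		return PORT_MEDIA_SFP;
	return PORT_MEDIA_NONE;
}
```

The point of the design is that the answer stays stable when a module is hot-plugged: only the DT description matters, not whichever interface mode phylink later picks.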

>
>> > > +static int sparx5_port_open(struct net_device *ndev)
>> > > +{
>> > > +     struct sparx5_port *port = netdev_priv(ndev);
>> > > +     int err = 0;
>> > > +
>> > > +     err = phylink_of_phy_connect(port->phylink, port->of_node, 0);
>> > > +     if (err) {
>> > > +             netdev_err(ndev, "Could not attach to PHY\n");
>> > > +             return err;
>> > > +     }
>> > > +
>> > > +     phylink_start(port->phylink);
>> > > +
>> > > +     if (!ndev->phydev) {
>> >
>> > Humm. When is ndev->phydev set? I don't think phylink ever sets it.
>>
>> Indirectly: phylink_of_phy_connect uses phy_attach_direct and that sets
>> the phydev.
>
>Ah, O.K. But watch out for a copper SFP module!

Hmm, my expectation is that we have this covered by now.

>
>> > > +static void sparx5_xtr_grp(struct sparx5 *sparx5, u8 grp, bool byte_swap)
>> > > +{
>> > > +     int i, byte_cnt = 0;
>> > > +     bool eof_flag = false, pruned_flag = false, abort_flag = false;
>> > > +     u32 ifh[IFH_LEN];
>> > > +     struct sk_buff *skb;
>> > > +     struct frame_info fi;
>> > > +     struct sparx5_port *port;
>> > > +     struct net_device *netdev;
>> > > +     u32 *rxbuf;
>> > > +
>> > > +     /* Get IFH */
>> > > +     for (i = 0; i < IFH_LEN; i++)
>> > > +             ifh[i] = spx5_rd(sparx5, QS_XTR_RD(grp));
>> > > +
>> > > +     /* Decode IFH (what's needed) */
>> > > +     sparx5_ifh_parse(ifh, &fi);
>> > > +
>> > > +     /* Map to port netdev */
>> > > +     port = fi.src_port < SPX5_PORTS ?
>> > > +             sparx5->ports[fi.src_port] : NULL;
>> > > +     if (!port || !port->ndev) {
>> > > +             dev_err(sparx5->dev, "Data on inactive port %d\n", fi.src_port);
>> > > +             sparx5_xtr_flush(sparx5, grp);
>> > > +             return;
>> > > +     }
>> > > +
>> > > +     /* Have netdev, get skb */
>> > > +     netdev = port->ndev;
>> > > +     skb = netdev_alloc_skb(netdev, netdev->mtu + ETH_HLEN);
>> > > +     if (!skb) {
>> > > +             sparx5_xtr_flush(sparx5, grp);
>> > > +             dev_err(sparx5->dev, "No skb allocated\n");
>> > > +             return;
>> > > +     }
>> > > +     rxbuf = (u32 *)skb->data;
>> > > +
>> > > +     /* Now, pull frame data */
>> > > +     while (!eof_flag) {
>> > > +             u32 val = spx5_rd(sparx5, QS_XTR_RD(grp));
>> > > +             u32 cmp = val;
>> > > +
>> > > +             if (byte_swap)
>> > > +                     cmp = ntohl((__force __be32)val);
>> > > +
>> > > +             switch (cmp) {
>> > > +             case XTR_NOT_READY:
>> > > +                     break;
>> > > +             case XTR_ABORT:
>> > > +                     /* No accompanying data */
>> > > +                     abort_flag = true;
>> > > +                     eof_flag = true;
>> > > +                     break;
>> > > +             case XTR_EOF_0:
>> > > +             case XTR_EOF_1:
>> > > +             case XTR_EOF_2:
>> > > +             case XTR_EOF_3:
>> > > +                     /* This assumes STATUS_WORD_POS == 1, Status
>> > > +                      * just after last data
>> > > +                      */
>> > > +                     byte_cnt -= (4 - XTR_VALID_BYTES(val));
>> > > +                     eof_flag = true;
>> > > +                     break;
>> > > +             case XTR_PRUNED:
>> > > +                     /* But get the last 4 bytes as well */
>> > > +                     eof_flag = true;
>> > > +                     pruned_flag = true;
>> > > +                     fallthrough;
>> > > +             case XTR_ESCAPE:
>> > > +                     *rxbuf = spx5_rd(sparx5, QS_XTR_RD(grp));
>> > > +                     byte_cnt += 4;
>> > > +                     rxbuf++;
>> > > +                     break;
>> > > +             default:
>> > > +                     *rxbuf = val;
>> > > +                     byte_cnt += 4;
>> > > +                     rxbuf++;
>> > > +             }
>> > > +     }
>> > > +
>> > > +     if (abort_flag || pruned_flag || !eof_flag) {
>> > > +             netdev_err(netdev, "Discarded frame: abort:%d pruned:%d eof:%d\n",
>> > > +                        abort_flag, pruned_flag, eof_flag);
>> > > +             kfree_skb(skb);
>> > > +             return;
>> > > +     }
>> > > +
>> > > +     if (!netif_oper_up(netdev)) {
>> > > +             netdev_err(netdev, "Discarded frame: Interface not up\n");
>> > > +             kfree_skb(skb);
>> > > +             return;
>> > > +     }
>> >
>> > Why is it sending frames when it is not up?
>>
>> This is intended for received frames: it covers the situation where the
>> lower layers have been enabled correctly but the port has not.
>
>But why should that happen? It suggests you have the order wrong. The
>lower level should only be enabled once the port is opened.

Yes, on second thought I think this was added to capture an error
situation with a particular cuPHY that we were testing.
It should be removed now.
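For reference, the escape-coded word stream consumed by the quoted sparx5_xtr_grp() loop can be modeled in userspace. The marker values and names below are hypothetical stand-ins: the real XTR_* constants and XTR_VALID_BYTES() live in the driver headers and are not reproduced here, and EOF_n is assumed to encode n unused bytes in the last data word. Unlike the quoted loop, which advances rxbuf without checking it against the allocated skb size, the model also bounds writes to the output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker encoding, for this model only. */
#define M_EOF_BASE  0x80000000u /* EOF_n = M_EOF_BASE + n, n = unused bytes */
#define M_PRUNED    0x80000004u
#define M_ABORT     0x80000005u
#define M_ESCAPE    0x80000006u
#define M_NOT_READY 0x80000007u

enum xtr_result { XTR_OK, XTR_ABORTED, XTR_WAS_PRUNED, XTR_TRUNCATED };

/* Copy frame words from 'in' (n_in words read from the queue) into 'out'
 * and report the frame length in bytes. A data word that happens to equal
 * a marker value is escaped: M_ESCAPE is followed by the literal word. */
static enum xtr_result model_xtr(const uint32_t *in, size_t n_in,
				 uint32_t *out, size_t out_words,
				 int *byte_cnt)
{
	size_t i = 0, o = 0;

	*byte_cnt = 0;
	while (i < n_in) {
		uint32_t val = in[i++];

		if (val == M_NOT_READY)
			continue;               /* no data yet, read again */
		if (val == M_ABORT)
			return XTR_ABORTED;     /* no accompanying data */
		if (val >= M_EOF_BASE && val <= M_EOF_BASE + 3) {
			/* Status follows the last data word: drop the
			 * unused tail bytes of that word. */
			*byte_cnt -= (int)(val - M_EOF_BASE);
			return XTR_OK;
		}
		if (val == M_PRUNED || val == M_ESCAPE) {
			if (i >= n_in || o >= out_words)
				return XTR_TRUNCATED;
			out[o++] = in[i++];     /* next word is literal data */
			*byte_cnt += 4;
			if (val == M_PRUNED)
				return XTR_WAS_PRUNED;
			continue;
		}
		if (o >= out_words)
			return XTR_TRUNCATED;   /* frame larger than buffer */
		out[o++] = val;
		*byte_cnt += 4;
	}
	return XTR_TRUNCATED;                   /* input ended before EOF */
}
```

For example, two data words followed by EOF_2 yield a 6-byte frame, and an escaped word that looks like a marker is stored as ordinary data.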
>
>> > No DMA? What sort of performance do you get? Enough for the odd BPDU,
>> > IGMP frame etc, but i guess you don't want any real bulk data to be
>> > sent this way?
>>
>> Yes, the register-based injection/extraction is not going to be fast,
>> but the FDMA and its driver will be sent later as a separate series to
>> keep the size of this review down.
>
>FDMA?

Ah, I should qualify this a bit more: A "Frame DMA" to transfer rx/tx
frames via CPU ports instead of the register based injection/extraction
that is in the driver at the moment.
>
>I need a bit more background here, just to make sure this should be a
>pure switchdev driver and not a DSA driver.
>
It is not a DSA driver (if I have understood the principle correctly).
>>
>> >
>> > > +irqreturn_t sparx5_xtr_handler(int irq, void *_sparx5)
>> > > +{
>> > > +     struct sparx5 *sparx5 = _sparx5;
>> > > +
>> > > +     /* Check data in queue */
>> > > +     while (spx5_rd(sparx5, QS_XTR_DATA_PRESENT) & BIT(XTR_QUEUE))
>> > > +             sparx5_xtr_grp(sparx5, XTR_QUEUE, false);
>> > > +
>> > > +     return IRQ_HANDLED;
>> > > +}
>> >
>> > Is there any sort of limit how many times this will loop? If somebody
>> > is blasting 10Gbps at the CPU, will it ever get out of this loop?
>>
>> Hmmm, not at the moment but this is because the FDMA driver is intended
>> to be used in these scenarios.
>
>So throwing out an idea, which might be terrible. How about limiting
>it to 64 loops, the same as the NAPI poll? That might allow the
>machine to get some work done before the next interrupt? Does the
>hardware do interrupt coalescing? But if this is going to be quickly
>thrown away and replaced with FDMA, don't spend too much time on it.

I agree with you. I will put a cap on the number of loop iterations.
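A minimal sketch of the cap Andrew suggested, modeled in userspace with hypothetical names: the handler processes at most 64 frames per call, matching the default NAPI weight (NAPI_POLL_WEIGHT). It assumes the extraction interrupt is raised again while data is still present; if the hardware does not re-assert it, leftover frames would have to be picked up some other way, for example by moving extraction into a NAPI poll.

```c
#include <stdbool.h>

/* Per-interrupt budget, same value as the default NAPI weight. */
#define XTR_BUDGET 64

/* Hypothetical stand-ins for reading QS_XTR_DATA_PRESENT and calling
 * sparx5_xtr_grp(): here the queue is just a count of pending frames. */
struct model_queue {
	unsigned int pending;
};

static bool model_data_present(const struct model_queue *q)
{
	return q->pending > 0;
}

static void model_xtr_one(struct model_queue *q)
{
	q->pending--; /* "extract" one frame */
}

/* Drain at most XTR_BUDGET frames per call and return how many were
 * handled; anything left is expected to be handled on the next call. */
static int model_xtr_handler(struct model_queue *q)
{
	int work = 0;

	while (work < XTR_BUDGET && model_data_present(q)) {
		model_xtr_one(q);
		work++;
	}
	return work;
}
```

With 150 frames queued, three successive calls handle 64, 64 and 22 frames, so a flood of traffic can no longer keep the CPU inside a single invocation.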

>
>         Andrew

BR
Steen

---------------------------------------
Steen Hegelund
steen.hegelund@...rochip.com