netdev - Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK8P3a0u53rHSW=72CnnbhrY28Z+9f=Yv2K-bbj5OD+2Ds4unA@mail.gmail.com>
Date:   Fri, 15 May 2020 14:04:30 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Bartosz Golaszewski <brgl@...ev.pl>
Cc:     Jonathan Corbet <corbet@....net>, Rob Herring <robh+dt@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Matthias Brugger <matthias.bgg@...il.com>,
        John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Fabien Parent <fparent@...libre.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Edwin Peer <edwin.peer@...adcom.com>,
        DTML <devicetree@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "moderated list:ARM/Mediatek SoC..." 
        <linux-mediatek@...ts.infradead.org>,
        Stephane Le Provost <stephane.leprovost@...iatek.com>,
        Pedro Tsai <pedro.tsai@...iatek.com>,
        Andrew Perepech <andrew.perepech@...iatek.com>,
        Bartosz Golaszewski <bgolaszewski@...libre.com>
Subject: Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
>
> czw., 14 maj 2020 o 18:19 Arnd Bergmann <arnd@...db.de> napisał(a):
> >
> > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
> > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > +{
> > > +       unsigned int val;
> > > +
> > > +       regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > +       regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > +
> > > +       return val;
> > > +}
> >
> > Do you actually need to read the register? That is usually a relatively
> > expensive operation, so if possible try to use clear the bits when
> > you don't care which bits were set.
> >
>
> I do care, I'm afraid. The returned value is being used in the napi
> poll callback to see which ring to process.

I suppose the other callers are not performance critical.

For the rx and tx processing, it should be better to just always look at
the queue directly and ignore the irq status, in particular when you
are already in polling mode: suppose you receive ten frames at once
and only process five but clear the irq flag.

When the poll function is called again, you still need to process the
others, but I would assume that the status tells you that nothing
new has arrived so you don't process them until the next interrupt.

For the statistics, I assume you do need to look at the irq status,
but this doesn't have to be done in the poll function. How about
something like:

- in hardirq context, read the irq status word
- irq rx or tx irq pending, call napi_schedule
- if stats irq pending, schedule a work function
- in napi poll, process both queues until empty or
  budget exhausted
- if packet processing completed in poll function
  ack the irq and check again, call napi_complete
- in work function, handle stats irq, then ack it

> > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > +{
> > > +       struct mtk_mac_ring *ring = &priv->tx_ring;
> > > +       struct net_device *ndev = priv->ndev;
> > > +       int ret;
> > > +
> > > +       for (;;) {
> > > +               mtk_mac_lock(priv);
> > > +
> > > +               if (!mtk_mac_ring_descs_available(ring)) {
> > > +                       mtk_mac_unlock(priv);
> > > +                       break;
> > > +               }
> > > +
> > > +               ret = mtk_mac_tx_complete_one(priv);
> > > +               if (ret) {
> > > +                       mtk_mac_unlock(priv);
> > > +                       break;
> > > +               }
> > > +
> > > +               if (netif_queue_stopped(ndev))
> > > +                       netif_wake_queue(ndev);
> > > +
> > > +               mtk_mac_unlock(priv);
> > > +       }
> > > +}
> >
> > It looks like most of the stuff inside of the loop can be pulled out
> > and only done once here.
> >
>
> I did that in one of the previous submissions but it was pointed out
> to me that a parallel TX path may fill up the queue before I wake it.

Right, I see you plugged that hole, however the way you hold the
spinlock across the expensive DMA management but then give it
up in each loop iteration feels like this is not the most efficient
way.

The easy way would be to just hold the lock across the entire
loop and then be sure you do it right. Alternatively you could
minimize the locking and only do the wakeup after up do the final
update to the tail pointer, at which point you know the queue is not
full because you have just freed up at least one entry.

> > > +static int mtk_mac_poll(struct napi_struct *napi, int budget)
> > > +{
> > > +       struct mtk_mac_priv *priv;
> > > +       unsigned int status;
> > > +       int received = 0;
> > > +
> > > +       priv = container_of(napi, struct mtk_mac_priv, napi);
> > > +
> > > +       status = mtk_mac_intr_read_and_clear(priv);
> > > +
> > > +       /* Clean up TX */
> > > +       if (status & MTK_MAC_BIT_INT_STS_TNTC)
> > > +               mtk_mac_tx_complete_all(priv);
> > > +
> > > +       /* Receive up to $budget packets */
> > > +       if (status & MTK_MAC_BIT_INT_STS_FNRC)
> > > +               received = mtk_mac_process_rx(priv, budget);
> > > +
> > > +       /* One of the counter reached 0x8000000 - update stats and reset all
> > > +        * counters.
> > > +        */
> > > +       if (status & MTK_MAC_REG_INT_STS_MIB_CNT_TH) {
> > > +               mtk_mac_update_stats(priv);
> > > +               mtk_mac_reset_counters(priv);
> > > +       }
> > > +
> > > +       if (received < budget)
> > > +               napi_complete_done(napi, received);
> > > +
> > > +       mtk_mac_intr_unmask_all(priv);
> > > +
> > > +       return received;
> > > +}
> >
> > I think you want to leave (at least some of) the interrupts masked
> > if your budget is exhausted, to avoid generating unnecessary
> > irqs.
> >
>
> The networking stack shouldn't queue any new TX packets if the queue
> is stopped - is this really worth complicating the code? Looks like
> premature optimization IMO.

Avoiding IRQs is one of the central aspects of using NAPI -- the idea
is that either you know there is more work to do and you will be called
again in the near future with additional budget, or you enable interrupts
and the irq handler calls napi_schedule, but not both.

This is mostly about RX processing, which is limited by the budget,
for TX you already free all descriptors regardless of the budget.

     Arnd