Message-ID: <CAMRc=Mf_vYt1J-cc6aZ2-Qv_YDEymVoC7ZiwuG9BrXoGMsXepw@mail.gmail.com>
Date: Fri, 15 May 2020 14:56:09 +0200
From: Bartosz Golaszewski <brgl@...ev.pl>
To: Arnd Bergmann <arnd@...db.de>
Cc: Jonathan Corbet <corbet@....net>, Rob Herring <robh+dt@...nel.org>,
"David S . Miller" <davem@...emloft.net>,
Matthias Brugger <matthias.bgg@...il.com>,
John Crispin <john@...ozen.org>,
Sean Wang <sean.wang@...iatek.com>,
Mark Lee <Mark-MC.Lee@...iatek.com>,
Jakub Kicinski <kuba@...nel.org>,
Fabien Parent <fparent@...libre.com>,
Heiner Kallweit <hkallweit1@...il.com>,
Edwin Peer <edwin.peer@...adcom.com>,
DTML <devicetree@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Networking <netdev@...r.kernel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
"moderated list:ARM/Mediatek SoC..."
<linux-mediatek@...ts.infradead.org>,
Stephane Le Provost <stephane.leprovost@...iatek.com>,
Pedro Tsai <pedro.tsai@...iatek.com>,
Andrew Perepech <andrew.perepech@...iatek.com>,
Bartosz Golaszewski <bgolaszewski@...libre.com>
Subject: Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver
On Fri, 15 May 2020 at 14:04, Arnd Bergmann <arnd@...db.de> wrote:
>
> On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
> >
> > On Thu, 14 May 2020 at 18:19, Arnd Bergmann <arnd@...db.de> wrote:
> > >
> > > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
> > > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > > +{
> > > > + unsigned int val;
> > > > +
> > > > + regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > > + regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > > +
> > > > + return val;
> > > > +}
> > >
> > > Do you actually need to read the register? That is usually a relatively
> > > expensive operation, so if possible just clear the bits when
> > > you don't care which ones were set.
> > >
> >
> > I do care, I'm afraid. The returned value is being used in the napi
> > poll callback to see which ring to process.
>
> I suppose the other callers are not performance critical.
>
> For the rx and tx processing, it should be better to just always look at
> the queue directly and ignore the irq status, in particular when you
> are already in polling mode: suppose you receive ten frames at once
> and only process five but clear the irq flag.
>
> When the poll function is called again, you still need to process the
> others, but I would assume that the status tells you that nothing
> new has arrived so you don't process them until the next interrupt.
>
> For the statistics, I assume you do need to look at the irq status,
> but this doesn't have to be done in the poll function. How about
> something like:
>
> - in hardirq context, read the irq status word
> - if rx or tx irq pending, call napi_schedule
> - if stats irq pending, schedule a work function
> - in napi poll, process both queues until empty or
> budget exhausted
> - if packet processing completed in poll function
> ack the irq and check again, call napi_complete
> - in work function, handle stats irq, then ack it
>
I see your point. I'll try to come up with something and send a new
version on Monday.
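
To make sure I read the scheme right, the hardirq fan-out could look
roughly like this simplified userspace model (the bit names, values and
the irq_dispatch struct are made up for illustration, not the driver's
actual defines; schedule_napi/schedule_stats stand in for
napi_schedule() and schedule_work()):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative status bits; the real register layout and bit names
 * in the driver may differ. */
#define INT_STS_TNTC (1u << 0) /* tx complete */
#define INT_STS_FNRC (1u << 1) /* rx complete */
#define INT_STS_MIB  (1u << 2) /* stats counters */

struct irq_dispatch {
	bool schedule_napi;  /* stands in for napi_schedule() */
	bool schedule_stats; /* stands in for schedule_work() */
};

/* Model of the hardirq path: read the status word once, schedule NAPI
 * for rx/tx work and a work item for stats; acking the rx/tx bits is
 * left to the poll function, the stats bit to the work function. */
static struct irq_dispatch irq_dispatch(unsigned int status)
{
	struct irq_dispatch d = { false, false };

	if (status & (INT_STS_TNTC | INT_STS_FNRC))
		d.schedule_napi = true;
	if (status & INT_STS_MIB)
		d.schedule_stats = true;

	return d;
}
```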
> > > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > > +{
> > > > + struct mtk_mac_ring *ring = &priv->tx_ring;
> > > > + struct net_device *ndev = priv->ndev;
> > > > + int ret;
> > > > +
> > > > + for (;;) {
> > > > + mtk_mac_lock(priv);
> > > > +
> > > > + if (!mtk_mac_ring_descs_available(ring)) {
> > > > + mtk_mac_unlock(priv);
> > > > + break;
> > > > + }
> > > > +
> > > > + ret = mtk_mac_tx_complete_one(priv);
> > > > + if (ret) {
> > > > + mtk_mac_unlock(priv);
> > > > + break;
> > > > + }
> > > > +
> > > > + if (netif_queue_stopped(ndev))
> > > > + netif_wake_queue(ndev);
> > > > +
> > > > + mtk_mac_unlock(priv);
> > > > + }
> > > > +}
> > >
> > > It looks like most of the stuff inside of the loop can be pulled out
> > > and only done once here.
> > >
> >
> > I did that in one of the previous submissions but it was pointed out
> > to me that a parallel TX path may fill up the queue before I wake it.
>
> Right, I see you plugged that hole, however the way you hold the
> spinlock across the expensive DMA management but then give it
> up in each loop iteration feels like this is not the most efficient
> way.
>
Maybe my thinking is wrong here, but I assumed that with a spinlock
it's better to give other threads the chance to run in between each
iteration. I didn't benchmark it though.
> The easy way would be to just hold the lock across the entire
> loop and then be sure you do it right. Alternatively you could
> minimize the locking and only do the wakeup after you do the final
> update to the tail pointer, at which point you know the queue is not
> full because you have just freed up at least one entry.
>
Makes sense, I'll see what I can do.
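
Just to check my understanding, the single-lock variant would be
something along these lines (a simplified userspace model, not the
driver's real ring structures; queue_stopped stands in for
netif_queue_stopped()/netif_wake_queue()):

```c
#include <assert.h>
#include <stdbool.h>

/* Model: reclaim all completed tx descriptors in one pass while
 * holding the lock, then wake the queue once if anything was freed.
 * Field names are made up for illustration. */
struct tx_ring {
	unsigned int next_done; /* next descriptor to reclaim */
	unsigned int hw_done;   /* descriptors completed by hardware */
	bool queue_stopped;
};

static unsigned int tx_complete_all(struct tx_ring *ring)
{
	unsigned int freed = 0;

	/* spinlock would be taken here, once for the whole loop */
	while (ring->next_done != ring->hw_done) {
		ring->next_done++; /* reclaim one descriptor */
		freed++;
	}
	/* single wakeup at the end: if at least one entry was freed,
	 * the queue cannot still be full */
	if (freed && ring->queue_stopped)
		ring->queue_stopped = false;
	/* spinlock would be released here */

	return freed;
}
```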
Bartosz