lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161202123241.GA5869@amd>
Date:   Fri, 2 Dec 2016 13:32:41 +0100
From:   Pavel Machek <pavel@....cz>
To:     Giuseppe CAVALLARO <peppe.cavallaro@...com>
Cc:     alexandre.torgue@...com, David Miller <davem@...emloft.net>,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.

Hi!

> >Well, if you have a workload that sends and receive packets, it tends
> >to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
> >like that -- it is "sending packets at 3MB/sec, receiving none". So
> >the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
> >and then we run out of transmit descriptors, and then 40msec passes,
> >and then we clean them. Bad.
> >
> >And that's why low-res timers do not cut it.
> 
> in that case, I expect that the tuning of the driver could help you.
> I mean, by using ethtool, it could be enough to set the IC bit on all
> the descriptors. You should touch the tx_coal_frames.
> 
> Then you can use ethtool -S to monitor the status.

Yes, I did something similar. Unfortnunately that meant crash within
minutes, at least with 4.4 kernel. (If you know what was fixed between
4.4 and 4.9, that would be helpful).

> We had experimented this tuning on STB IP where just datagrams
> had to send externally. To be honest, although we had seen
> better results w/o any timer, we kept this approach enabled
> because the timer was fast enough to cover our tests on SH4 boxes.

Please reply to David, and explain how it is supposed to
work... because right now it does not. 40 msec delays are not
acceptable in default configuration.

> >>In the ring, some descriptors can raise the irq (according to a
> >>threshold) and set the IC bit. In this path, the NAPI  poll will be
> >>scheduled.
> >
> >Not NAPI poll but stmmac_tx_timer(), right?
> 
> in the xmit according the the threshold the timer is started or the
> interrupt is set inside the descriptor.
> Then stmmac_tx_clean will be always called and, if you see the flow,
> no irqlock protection is needed!

Agreed that no irqlock protection is needed if we rely on napi and timers.

> >>Concerning the lock protection, we had reviewed long time ago and
> >>IIRC, no raise condition should be present. Open to review it,
> >>again!
...
> >There's nothing that protect stmmac_poll() from running concurently
> >with stmmac_dma_interrupt(), right?
> 
> This is not necessary.

dma_interrupt accesses shared priv->xstats; variables are of type
unsigned long (not atomic_t), yet they are accesssed from interrupt
context and from stmmac_ethtool without any locking. That can result
in broken statistics AFAICT.

Please take another look. As far as I can tell, you can have two cpus
at #1 and #2 in the code, at the same time. It looks like napi_... has
some atomic opertions inside so that looks safe at the first look. But
I'm not sure if they also include enough memory barriers to make it
safe...?


static void stmmac_dma_interrupt(struct stmmac_priv *priv)
{
...
        status = priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats);
        if (likely((status & handle_rx)) || (status & handle_tx)) {
                if (likely(napi_schedule_prep(&priv->napi))) {
#1
                        stmmac_disable_dma_irq(priv);
                        __napi_schedule(&priv->napi);
                }
        }


static int stmmac_poll(struct napi_struct *napi, int budget)
{
        struct stmmac_priv *priv = container_of(napi, struct stmmac_priv, napi);
        int work_done = 0;

        priv->xstats.napi_poll++;
        stmmac_tx_clean(priv);

        work_done = stmmac_rx(priv, budget);
	if (work_done < budget) {
                napi_complete(napi);
#2
	        stmmac_enable_dma_irq(priv);
        }
        return work_done;
}


Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Download attachment "signature.asc" of type "application/pgp-signature" (182 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ