Date:   Mon, 6 Feb 2017 18:46:19 +0000
From:   Will Deacon <will.deacon@....com>
To:     Brian Starkey <brian.starkey@....com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Potapenko <glider@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux@...linux.org.uk
Subject: Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let
 ksoftirqd do its job"

Hi all,

I've also stumbled over this issue with the ARM fastmodel and, somewhat
embarrassingly, blamed the model developers for the regression. I'm using
NFS and copying a ~14MB file from it to a virtio-blk device, which takes
over 20 minutes with 4cd13c21b207 but <1 min with it reverted.

I also think I've figured out what's going on. See below.

On Fri, Nov 25, 2016 at 01:14:03PM +0000, Brian Starkey wrote:
> On Wed, Nov 23, 2016 at 12:03:28PM -0800, Eric Dumazet wrote:
> >On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <brian.starkey@....com> wrote:
> >
> >>This patch didn't help.
> >>
> >>I did get some new traces though - I've attached the diff for the
> >>trace_printks I added.
> >>
> >>Before 4cd13c21b207:
> >>https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
> >>After 4cd13c21b207:
> >>https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
> >>
> >>It looks like the difference is that after 4cd13c21b207 the RX softirq
> >>isn't running, and RX interrupts don't call softirq_raise anymore -
> >>presumably because there's one pending, but I didn't have time to
> >>track that down to a code-path.
> >>
> >>Cheers,
> >>-Brian
> >>
> >
> >Hi Brian
> >
> >Looks like netif_rx() drops the incoming packets then ?
> >
> >Maybe netif_running() is not happy :(
> >
> >Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)
> 
> Some packets are dropped, but not very many:
> 
>   $ grep NET_RX_SUCCESS trace_netif_rx.txt | wc -l
>   14399
>   $ grep NET_RX_DROP trace_netif_rx.txt | wc -l
>   22
> 
> Without the ksoftirqd change there were zero NET_RX_DROPs.

The SMC91x has an on-chip 8KB FIFO (i.e. there's no DMA going on here).
When the FIFO is full (every 4 TCP packets in my case), we get an
interrupt and run down the smc_rcv path. There, we allocate an skb for
the data (netdev_alloc_skb) and copy the data out of the FIFO
(SMC_PULL_DATA) into the buffer, which we hand over to the network core via
netif_rx.
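
Roughly, the receive path looks like this (paraphrasing smc_rcv() from
drivers/net/ethernet/smsc/smc91x.c, with the FIFO bookkeeping and error
handling elided, so not the exact code):

static inline void smc_rcv(struct net_device *dev)
{
	struct smc_local *lp = netdev_priv(dev);
	unsigned int packet_len;
	struct sk_buff *skb;
	unsigned char *data;

	/* ... read the packet number/status/length out of the FIFO ... */

	/* atomic allocation -- we're in hardirq context here */
	skb = netdev_alloc_skb(dev, packet_len);
	if (unlikely(!skb)) {
		dev->stats.rx_dropped++;
		return;
	}

	data = skb_put(skb, packet_len);
	SMC_PULL_DATA(lp, data, packet_len);	/* PIO copy out of the FIFO */

	skb->protocol = eth_type_trans(skb, dev);
	netif_rx(skb);				/* hand off to the network core */
}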

The problem is that netif_rx defers to ksoftirqd to process the packet
and, more crucially, to *free* the skb after it's been consumed. Since the
skb was allocated in IRQ context, we end up exhausting our GFP_ATOMIC
memory: the tiny FIFO depth means ksoftirqd is interrupted so frequently
that buffers are allocated at a much higher rate than they are freed. This
may be exaggerated by the relative speed of the model's emulated CPU with
respect to the network interface, but I'd expect it to be reproducible on
real hardware too (rmk, cc'd, was going to give that a go).
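
To spell that out: in IRQ context netdev_alloc_skb() is a GFP_ATOMIC
allocation, and netif_rx() doesn't consume the skb at all -- it just
queues it and raises NET_RX_SOFTIRQ, so nothing gets freed until the
softirq actually runs (condensed from include/linux/skbuff.h and
net/core/dev.c, not the exact code):

static inline struct sk_buff *netdev_alloc_skb(struct net_device *dev,
					       unsigned int length)
{
	return __netdev_alloc_skb(dev, length, GFP_ATOMIC);
}

/*
 * netif_rx()
 *   -> enqueue_to_backlog()          skb parked on the per-cpu backlog
 *     -> ____napi_schedule()
 *       -> __raise_softirq_irqoff(NET_RX_SOFTIRQ)
 *
 * ...and only later, in softirq (now ksoftirqd) context:
 *
 * net_rx_action() -> __netif_receive_skb() -> ... -> consume_skb()
 */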

Prior to 4cd13c21b207, we'd always run softirqs synchronously on the
hardirq exit path and therefore have a chance to free some skbs before
actually EOI'ing the hardirq and allowing the FIFO-full interrupt to
interrupt us again.
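
As far as I can tell, that's exactly the behaviour 4cd13c21b207 changes:
invoke_softirq() (and do_softirq()) now bail out whenever the ksoftirqd
thread is runnable, leaving the pending NET_RX_SOFTIRQ for the thread
instead of running it on hardirq exit. Condensed from kernel/softirq.c
after the commit:

static bool ksoftirqd_running(void)
{
	struct task_struct *tsk = __this_cpu_read(ksoftirqd);

	return tsk && (tsk->state == TASK_RUNNING);
}

static inline void invoke_softirq(void)
{
	if (ksoftirqd_running())
		return;		/* defer everything to ksoftirqd */

	/* ... otherwise run the pending softirqs here, on hardirq exit ... */
}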

Converting the smc91x driver over to NAPI would probably solve this problem,
but given the "vintage" of this code, I'd be more tempted by a simpler
point fix if only I could think of one.
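
For reference, a NAPI conversion would look something like the sketch
below (completely untested; smc_rx_pending(), smc_{disable,enable}_rx_irq()
and the napi/dev plumbing in struct smc_local are hypothetical, and the
receive path would need netif_receive_skb() instead of netif_rx()):

/* Untested sketch -- the smc_* helpers, lp->napi and lp->dev are made up. */
static int smc_poll(struct napi_struct *napi, int budget)
{
	struct smc_local *lp = container_of(napi, struct smc_local, napi);
	int work_done = 0;

	while (work_done < budget && smc_rx_pending(lp)) {
		smc_rcv(lp->dev);	/* as today, but netif_receive_skb() */
		work_done++;
	}

	if (work_done < budget) {
		napi_complete(napi);
		smc_enable_rx_irq(lp);
	}

	return work_done;
}

/* ...and in the interrupt handler, instead of calling smc_rcv() directly: */
	smc_disable_rx_irq(lp);
	napi_schedule(&lp->napi);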

Any ideas?

Will
