lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140822082728.GN30401@n2100.arm.linux.org.uk>
Date:	Fri, 22 Aug 2014 09:27:29 +0100
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Mattis Lorentzon <Mattis.Lorentzon@...oliv.com>,
	Fabio Estevam <festevam@...il.com>
Cc:	Fredrik Noring <fredrik.noring@...oliv.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: Oops: 17 SMP ARM (v3.16-rc2)

On Thu, Aug 14, 2014 at 02:43:56PM +0000, Mattis Lorentzon wrote:
> Fabio and Russell,
> 
> > A working theory is that the switch (3Com Switch 4400) triggers the
> > degeneration of the network stack from which Linux does not seem to
> > recover, even if we later bypass the switch and directly connect the board to
> > the server machine.
> 
> After a few more tests we have finally been able to trigger the exact
> same stalls on the Sabrelite board with a direct network connection
> (i.e. without the switch).

That's a setup which I can't reproduce, as all my MX6 hardware runs
root-NFS, so using a direct connection to a machine to test will
result in the MX6 losing its root filesystem.

That said, on SolidRun hardware, there is some investigation going on
at the moment concerning poor UDP performance - this is an on-going
problem that has been present for a long time.

What we find is that TCP performance achieves around the 600mbps mark,
but UDP performance can be extremely poor with high packet loss.
Adding a udelay(210) into the fec_enet_rx() can perversely (on
multi-core SoCs) increase UDP performance to around 500mbps at the
expense of a reduction in TCP performance.

This "solution" was tripped over while trying to debug this problem,
and it was found that adding printk()s to the driver increased UDP
performance - so subsituting udelay() for printk() was then tried.

I tried to run perf on the kernel yesterday to find out what's going
on, but for some reason, perf gave me impossible call traces, so I
gave up with that idea.  For example, perf told me that there was a
high hit rate in memcpy() being called from net_rx_action(), but
net_rx_action() doesn't call memcpy(), nor do any of the called
functions as a tail-call.

That said, I don't think perf could tell us what's going on - what
we need is a trace of the CPU's execution while iperf is running,
*without* affecting the CPU itself.  This is something I can't do
with the hardware I have.

My suspicion (unproven) is that a batch of packets get processed in
the softirq handler called during the FEC interrupt exit path.  Then,
because there's more work to be done, ksoftirqd is scheduled, but it
takes time for ksoftirqd to start running - during which time we drop
a lot of packets.  ksoftirqd processes some packets, but then finds
that it can't complete the NAPI "work budget", and so stops running,
resulting in the packet processing being triggered by the next FEC
interrupt, and the cycle repeats.

TCP notices this, and adjusts its sending rate to match, whereas UDP
just carries on regardless, resulting in lots of packets dropped each
time we switch from the tail of hardirq processing to ksoftirqd.

With the udelay() in place, processing takes enough time that it gets
bounced onto ksoftirqd, where it stays.

I'm adding this to this thread in case it has any bearing on the
problem(s) you're seeing - yes, it seems like a different problem, but
could it be related...

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ