Message-ID: <CAOuSAjeUX_tvzr9K8B4T-TdXUDcYqFxbtYTa=ke-ePhMJo5NVQ@mail.gmail.com>
Date:	Tue, 7 Jun 2016 17:19:43 -0700
From:	Alison Chaiken <alison@...oton-tech.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Clark Williams <williams@...hat.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	David Miller <davem@...emloft.net>
Subject: Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

I wrote:
>>We've applied Sebastian's commit "softirq: split timer softirqs out of
>>ksoftirqd," which improved event loop stability substantially when we

Sebastian Andrzej Siewior <bigeasy@...utronix.de> replied:
>Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
>that works but v4.1.15-rt18 had this patch included. Also "net: provide
>a way to delegate processing a softirq to ksoftirqd" should be applied
>(which is also part of v4.1.15-rt18).

Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

> What I remember from testing the two patches on am335x was that before, a
> ping flood on gbit froze the serial console, but with them the ping
> flood was not noticed.

I compiled a kernel from upstream d060a36 "Merge branch
'ti-linux-4.1.y' of git.ti.com:ti-linux-kernel/ti-linux-kernel into
ti-rt-linux-4.1.y", which is unpatched except for using a
board-appropriate device-tree. The serial console is responsive
with all our RT userspace applications running alongside a rapid
external ping. However, our main event loop frequently misses its
deadline as soon as a ping faster than 'ping -i 0.0002' is run. mpstat
shows that the sum of the hard IRQ rates over one second is precisely
equal to the NET_RX rate, which is ~3400/s. Does the fact that
3400 < (1/0.0002) already mean that some packets are dropped? ftrace
shows that cpsw_rx_poll() is called even when there is essentially no
network traffic, so I'm not sure how to tell if NAPI is working as
intended.
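
For reference, the arithmetic behind that question can be sketched as
below. This is only an illustration: it assumes ping actually achieves
the requested interval, and the function names are made up for this
example. Under NAPI a single NET_RX softirq may process more than one
packet, so a NET_RX rate below the nominal packet rate does not by
itself prove that packets were dropped.

```python
# Rough comparison of the nominal ping rate with the observed NET_RX rate.
# Assumption: ping really sends one packet every `interval_s` seconds.
# All names here are hypothetical and used only for illustration.

def nominal_rate(interval_s: float) -> float:
    """Packets per second implied by 'ping -i interval_s'."""
    return 1.0 / interval_s


def packets_per_softirq(nominal_pps: float, net_rx_per_s: float) -> float:
    """Average packets each NET_RX softirq would need to handle
    if no packets were dropped."""
    return nominal_pps / net_rx_per_s


interval_s = 0.0002      # from 'ping -i 0.0002'
net_rx_per_s = 3400.0    # NET_RX rate reported by mpstat

nominal_pps = nominal_rate(interval_s)
ratio = packets_per_softirq(nominal_pps, net_rx_per_s)

print(f"nominal rate: {nominal_pps:.0f} packets/s")
print(f"packets per NET_RX softirq if nothing is dropped: {ratio:.2f}")
```

A ratio modestly above 1 is compatible with NAPI batching; comparing
the interface's RX packet and drop counters before and after the test
would be a more direct check.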

I tried running the wakeup_rt tracer, but it loads the system too
much. With ftrace capturing IRQ, scheduler and net events, we're
writing markers into the trace buffer when the event loop makes
its deadline and when it misses, so that we can compare the normal
and long-latency intervals, but there doesn't appear to be a smoking
gun in the difference between the two.
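
A minimal sketch of that kind of marker, assuming the markers go
through ftrace's trace_marker file and that debugfs is mounted at
/sys/kernel/debug. The message does not say how the markers are
actually written, and the labels and function names below are
hypothetical.

```python
# Sketch: annotate the ftrace buffer from userspace via trace_marker.
# Assumptions: tracing is mounted under /sys/kernel/debug/tracing and the
# process may write to trace_marker. Labels and names are illustrative only.

TRACE_MARKER = "/sys/kernel/debug/tracing/trace_marker"


def write_marker(message: str, path: str = TRACE_MARKER) -> None:
    """Write one annotation line to the trace buffer."""
    # Unbuffered binary mode so the marker goes out in a single write().
    with open(path, "wb", buffering=0) as f:
        f.write(message.encode() + b"\n")


def mark_iteration(elapsed_s: float, deadline_s: float,
                   path: str = TRACE_MARKER) -> str:
    """Tag one event-loop iteration as having met or missed its deadline."""
    label = "deadline_met" if elapsed_s <= deadline_s else "deadline_missed"
    write_marker(f"{label} elapsed_us={elapsed_s * 1e6:.0f}", path)
    return label
```

Calling mark_iteration() at the end of each loop iteration leaves a
line in the trace next to the IRQ, scheduler and net events, so met and
missed intervals can be located by searching for the two labels. A real
RT loop would more likely keep the file descriptor open rather than
reopening it each time.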

Thanks for all your help,
Alison
