netdev - Re: [PATCH][RT] netpoll: Always take poll

Message-ID: <CAOuSAjeUX_tvzr9K8B4T-TdXUDcYqFxbtYTa=ke-ePhMJo5NVQ@mail.gmail.com>
Date:	Tue, 7 Jun 2016 17:19:43 -0700
From:	Alison Chaiken <alison@...oton-tech.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Clark Williams <williams@...hat.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	David Miller <davem@...emloft.net>
Subject: Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

I wrote:
>>We've applied Sebastian's commit "softirq: split timer softirqs out of
>>ksoftirqd," which improved event loop stability substantially when we

Sebastian Andrzej Siewior <bigeasy@...utronix.de> replied:
>Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
>that works but v4.1.15-rt18 had this patch included. Also "net: provide
>a way to delegate processing a softirq to ksoftirqd" should be applied
>(which is also part of v4.1.15-rt18).

Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

> What I remember from testing the two patches on am335x was that before a
> ping flood on gbit froze the serial console but with them it the ping
> flood was not noticed.

I compiled a kernel from upstream d060a36 "Merge branch
'ti-linux-4.1.y' of git.ti.com:ti-linux-kernel/ti-linux-kernel into
ti-rt-linux-4.1.y", which is unpatched except for using a
board-appropriate device-tree. The serial console stays responsive
with all our RT userspace applications running alongside a rapid
external ping. However, our main event loop frequently misses its
deadline as soon as a ping faster than 'ping -i 0.0002' is run.
mpstat shows that the sum of the hard IRQ rates over one second is
exactly equal to the NET_RX rate, which is ~3400/s. Does the fact
that 3400 < (1/0.0002) already mean that some packets are dropped?
ftrace shows that cpsw_rx_poll() is called even when there is
essentially no network traffic, so I'm not sure how to tell whether
NAPI is working as intended.
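
To make the arithmetic behind that question explicit, here is a rough
sketch. It assumes the sender really achieves the requested 0.0002 s
interval and that only received echo requests count; both are
assumptions, not measurements. The function names are made up for
illustration.

```python
def offered_rate(interval_s):
    """Packets per second implied by a 'ping -i <interval_s>' interval."""
    return 1.0 / interval_s


def packets_per_softirq(offered_pps, net_rx_per_s):
    """Mean packets each NET_RX run would need to handle for zero loss."""
    return offered_pps / net_rx_per_s


# Figures taken from the message above; nothing here is measured.
offered = offered_rate(0.0002)   # about 5000 packets/s
net_rx = 3400.0                  # NET_RX rate reported by mpstat
ratio = packets_per_softirq(offered, net_rx)

print(f"offered: {offered:.0f} pkt/s, NET_RX: {net_rx:.0f}/s, "
      f"mean packets per NET_RX run: {ratio:.2f}")
```

A ratio above 1 does not by itself show loss: with NAPI, one poll can
pick up several packets, so fewer softirqs than packets is expected
under load. The interface drop counters (for example from
'ip -s link') would be a more direct check.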

I tried running the wakeup_rt tracer, but it loads the system too
much. With ftrace capturing IRQ, scheduler and net events, we write
markers into the trace buffer when the event loop makes its deadline
and when it misses, so that we can compare the normal and
long-latency intervals, but there doesn't appear to be a smoking gun
in the difference between the two.
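
For anyone who wants to reproduce the marker approach, a minimal
sketch of writing markers through ftrace's trace_marker file follows.
It is not our actual code: the tag names, the latency field and the
helper names are hypothetical, and the path assumes debugfs is
mounted at /sys/kernel/debug.

```python
import os

# Assumed location; it depends on where debugfs/tracefs is mounted.
TRACE_MARKER = "/sys/kernel/debug/tracing/trace_marker"


def format_marker(met_deadline, latency_us):
    """Build one marker line; tag names are hypothetical."""
    tag = "loop_ok" if met_deadline else "loop_miss"
    return f"{tag} latency_us={latency_us}\n"


def write_marker(fd, met_deadline, latency_us):
    """Write one marker to an already-open descriptor.

    Returns the number of bytes written. Keeping the descriptor open
    across iterations avoids an open()/close() pair on every loop pass.
    """
    return os.write(fd, format_marker(met_deadline, latency_us).encode("ascii"))
```

The intended use is to open the file once with
os.open(TRACE_MARKER, os.O_WRONLY) and then call write_marker() at the
end of each loop iteration, so that the "loop_miss" lines can be
searched for in the trace alongside the IRQ, scheduler and net events.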

Thanks for all your help,
Alison