Message-ID: <OF70711DAC.356CE873-ON88257DE7.007C4A72-88257DE8.00776286@selinc.com>
Date:	Tue, 10 Feb 2015 13:43:59 -0800
From:	trevor_davenport@...inc.com
To:	linux-kernel@...r.kernel.org
Subject: process_backlog interruptions with 3.10.47-rt50

After upgrading from 3.0.57-rt82 to 3.10.47-rt50, I've run into a problem 
where process_backlog gets interrupted and does not resume for a while, so 
packets are not processed in time.  I see net_rx_action call 
process_backlog (the poll method that processes the backlog of packets 
queued up by netif_rx), but after the interruption it does not finish for 
about 5ms.  In the older kernel it would finish based on the priority of 
ksoftirqd.  This is no longer the case.
 
I have priorities configured so that hard interrupts are highest and 
ksoftirqd is next (both are SCHED_FIFO), while my program is currently 
SCHED_OTHER, yet I still do not see the rx softirq finish before my 
program runs.
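
A minimal sketch, assuming Linux with glibc, of how the policy and 
real-time priority of a task (for example a ksoftirqd thread or the 
application itself) could be double-checked from user space.  The helper 
names policy_name and print_sched are hypothetical; only 
sched_getscheduler and sched_getparam are real library calls.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Map a scheduling policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "other/unknown";
    }
}

/*
 * Print the policy and real-time priority of one task.
 * pid 0 means the calling process.  Returns 0 on success, -1 on error.
 */
int print_sched(pid_t pid)
{
    struct sched_param sp;
    int policy = sched_getscheduler(pid);

    if (policy < 0 || sched_getparam(pid, &sp) < 0) {
        perror("sched_getscheduler/sched_getparam");
        return -1;
    }
    printf("pid %d: %s, rt priority %d\n",
           (int)pid, policy_name(policy), sp.sched_priority);
    return 0;
}
```

Calling print_sched with the pid of the ksoftirqd thread and then with 0 
for the program itself should show whether the intended ordering is 
actually in effect.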

This is all on a single-core PowerPC device.  I do not see these problems 
with a net device that uses NAPI directly (so I'm updating my driver to 
use NAPI), but it seems like there is a real bug here somewhere.  I have 
not been able to find any mention of similar problems (perhaps few people 
are using netif_rx these days).

I've attached a recording from perf which shows the problem.  Specifically, 
you see net_rx_action run at time 213.079014, and then it doesn't finish 
until about 5ms later at time 213.084953, which is not the case on the 
older kernels.  It seems something has changed in softirq handling, or 
process_backlog needs to be adapted for it.  My suspicion is that this has 
something to do with the work in commit 
210dc110063cf040d3209fddf766f6fcafccdc34, but I'm not an expert in this 
area of the kernel.
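
As a quick check of the gap quoted above, here is a small sketch that 
computes the elapsed time between the two perf timestamps using integer 
microseconds.  The helper names ts_to_us and elapsed_us are hypothetical, 
and splitting each timestamp into seconds and microseconds by hand is an 
assumption about how one would feed the values in.

```c
#include <stdint.h>

/* Convert a timestamp given as seconds plus microseconds to microseconds. */
uint64_t ts_to_us(uint64_t sec, uint32_t usec)
{
    return sec * 1000000ULL + usec;
}

/* Elapsed microseconds between two timestamps (end must not precede start). */
uint64_t elapsed_us(uint64_t start_us, uint64_t end_us)
{
    return end_us - start_us;
}
```

For the two timestamps above, elapsed_us(ts_to_us(213, 79014), 
ts_to_us(213, 84953)) gives 5939 microseconds, i.e. just under 6ms.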

Any tips or suggestions would be appreciated.  Please CC me as I am not 
subscribed to lkml.

Thanks,
Trevor

View attachment "perf_script.txt" of type "text/plain" (16824 bytes)