lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 22 Jul 2015 16:14:53 +0200
From:	Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To:	Florian Westphal <fw@...len.de>
Cc:	Frank Schreuder <fschreuder@...nsip.nl>,
	Johan Schuijt <johan@...nsip.nl>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"nikolay@...hat.com" <nikolay@...hat.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"chutzpah@...too.org" <chutzpah@...too.org>,
	Robin Geuze <robing@...nsip.nl>,
	netdev <netdev@...r.kernel.org>
Subject: Re: reproducable panic eviction work queue

On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote:
> On 07/22/2015 03:58 PM, Florian Westphal wrote:
>> Nikolay Aleksandrov <nikolay@...ulusnetworks.com> wrote:
>>> On 07/22/2015 10:17 AM, Frank Schreuder wrote:
>>>> I got some additional information from syslog:
>>>>
>>>> Jul 22 09:49:33 dommy0 kernel: [  675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42]
>>>> Jul 22 09:49:42 dommy0 kernel: [  685.114033] INFO: rcu_sched self-detected stall on CPU { 3}  (t=39918 jiffies g=988 c=987 q=23168)
>>>>
>>>> Thanks,
>>>> Frank
>>>>
>>>>
>>>
>>> Hi,
>>> It looks like it's happening because of the evict_again logic, I think we should also
>>> add Florian's first suggestion about simplifying it to the patch and just skip the
>>> entry if we can't delete its timer otherwise we can restart the eviction and see
>>> entries that already had their timer stopped by us and can keep restarting for
>>> a long time.
>>> Here's an updated patch that removes the evict_again logic.
>>
>> Thanks Nik.  I'm afraid this adds bug when netns is exiting.
>>
>> Currently, we wait until timer has finished, but after the change
>> we might destroy percpu counter while a timer is still executing on
>> another cpu.
>>
>> I pushed a patch series to
>> https://git.breakpoint.cc/cgit/fw/net.git/log/?h=inetfrag_fixes_02
>>
>> It includes this patch with a small change -- deferral of the percpu
>> counter subtraction until after queue has been free'd.
>>
>> Frank -- it would be great if you could test with the four patches in
>> that series applied.
>>
>> I'll then add your tested-by Tag to all of them before submitting this.
>>
>> Thanks again for all your help in getting this fixed!
>>
> 
> Sure, I didn't think it through, just supplied it for the test. :-)
> Thanks for fixing it up!
> 

Patches look great, even the INET_FRAG_EVICTED flag will not be accidentally cleared 
this way. I'll give them a try.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists