netdev - Re: reproducable panic eviction work queue

Message-ID: <5FD5C17E-B321-404E-80A2-EE46BB8AA746@transip.nl>
Date:	Sat, 18 Jul 2015 09:01:40 +0000
From:	Johan Schuijt <johan@...nsip.nl>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	"nikolay@...hat.com" <nikolay@...hat.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"fw@...len.de" <fw@...len.de>,
	"chutzpah@...too.org" <chutzpah@...too.org>,
	Robin Geuze <robing@...nsip.nl>,
	Frank Schreuder <fschreuder@...nsip.nl>,
	netdev <netdev@...r.kernel.org>
Subject: Re: reproducable panic eviction work queue

Yes, we had already found these commits and they are included in our kernel, but even with these patches applied we still get the panic.

- Johan


> On 18 Jul 2015, at 10:56, Eric Dumazet <eric.dumazet@...il.com> wrote:
> 
> On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
>> Hey guys, 
>> 
>> 
>> We’re currently running into a reproducible panic in the eviction work
>> queue code when we pin all our eth* IRQs to different CPU cores (in
>> order to scale our networking performance for our virtual servers).
>> This only occurs in kernels >= 3.17 and is a result of the following
>> change:
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
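The IRQ pinning described above is normally done by writing a CPU bitmask to /proc/irq/<N>/smp_affinity for each NIC interrupt. The report does not say how the pinning was scripted, so the following is only a minimal sketch under that assumption: it assigns each IRQ a single-CPU mask round-robin and prints the corresponding commands. The IRQ numbers and the helper names `affinity_mask` and `pin_round_robin` are hypothetical; the real IRQ numbers come from /proc/interrupts.

```python
# Sketch: compute /proc/irq/<N>/smp_affinity masks that pin each NIC IRQ
# to its own CPU, round-robin. IRQ and CPU numbers used here are hypothetical.


def affinity_mask(cpu: int) -> str:
    """Return the hex bitmask selecting a single CPU, formatted as
    comma-separated 32-bit words (most significant first), a format
    that smp_affinity accepts on machines with more than 32 CPUs."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    mask = 1 << cpu
    words = []
    while True:
        words.append(mask & 0xFFFFFFFF)  # take the lowest 32 bits
        mask >>= 32
        if mask == 0:
            break
    head = format(words[-1], "x")  # most significant word, no zero padding
    tail = [format(w, "08x") for w in reversed(words[:-1])]  # lower words padded
    return ",".join([head] + tail)


def pin_round_robin(irqs: list[int], cpus: list[int]) -> dict[int, str]:
    """Map each IRQ to a single-CPU mask, cycling through the given CPUs."""
    if not cpus:
        raise ValueError("need at least one CPU")
    return {irq: affinity_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}


if __name__ == "__main__":
    # Hypothetical IRQ numbers for eth0's queues.
    plan = pin_round_robin([120, 121, 122, 123], cpus=[0, 1, 2, 3])
    for irq, mask in plan.items():
        # Print the commands instead of writing /proc directly (that needs root).
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

If the irqbalance daemon is running it may rewrite these affinities, so it is typically stopped while pinning IRQs by hand.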
>> 
>> 
>> The race/panic we see seems to be the same as, or similar to:
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
>> 
>> 
>> We can confirm that this is directly exposed by the IRQ pinning, since
>> with the pinning disabled we can no longer reproduce this case :)
>> 
>> 
>> How to reproduce: in our test setup we have 4 machines generating UDP
>> packets which are sent to the vulnerable host. These all have an MTU of
>> 100 (for test purposes) and send UDP packets of 256 bytes.
>> Within half an hour you will see the following panic:
>> 
>> 
>> crash> bt
>> PID: 56     TASK: ffff885f3d9fc210  CPU: 9   COMMAND: "kworker/9:0"
>> #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>> #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>> #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>> #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>>    [exception RIP: inet_evict_bucket+281]
>>    RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>>    RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX: ffff885f3da03d08
>>    RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI: dead0000001000a8
>>    RBP: 0000000000000002   R8: 0000000000000286   R9: ffff88302f401640
>>    R10: 0000000080000000  R11: ffff88602ec0c138  R12: ffffffff81a8d8c0
>>    R13: ffff885f3da03d70  R14: 0000000000000000  R15: ffff881d6efe1a00
>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>> #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>> #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>> #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>> #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
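With an MTU of 100 and 256-byte UDP packets, every test datagram leaves the senders as several IPv4 fragments, so the receiving host is constantly reassembling datagrams in the fragment queues that inet_frag_worker and inet_evict_bucket (in the backtrace above) operate on. The arithmetic can be sketched as follows, assuming IPv4 without options (20-byte header) and reading "256 bytes" as the UDP payload plus an 8-byte UDP header; neither assumption is stated in the report, and `ipv4_fragment_sizes` is a hypothetical helper.

```python
# Sketch: how many IPv4 fragments one test datagram becomes.
# Assumptions: 20-byte IPv4 header (no options), 256 bytes = UDP payload,
# plus the 8-byte UDP header.


def ipv4_fragment_sizes(udp_payload: int, mtu: int,
                        ip_header: int = 20, udp_header: int = 8) -> list[int]:
    """Return the number of data bytes carried by each IPv4 fragment."""
    data = udp_payload + udp_header  # bytes that follow the IP header
    room = mtu - ip_header           # space for data in a single packet
    if data <= room:
        return [data]                # fits in one packet, no fragmentation
    # Fragment offsets are counted in 8-byte units, so every non-final
    # fragment must carry a multiple of 8 data bytes.
    per_frag = room // 8 * 8
    if per_frag <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    sizes = []
    while data > per_frag:
        sizes.append(per_frag)
        data -= per_frag
    sizes.append(data)               # final fragment carries the remainder
    return sizes


if __name__ == "__main__":
    frags = ipv4_fragment_sizes(udp_payload=256, mtu=100)
    print(frags, "->", len(frags), "fragments per datagram")
```

Under these assumptions each datagram becomes four fragments carrying 80, 80, 80 and 24 data bytes, so four sending machines generate a large volume of fragment-queue activity on the receiver.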
>> 
>> 
>> We would love to receive your input on this matter.
>> 
>> 
>> Thx in advance,
>> 
>> 
>> - Johan
> 
> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
> d70127e8a942364de8dd140fe73893efda363293
> 
> Also please send your mails in plain text, not HTML, and CC netdev (I
> did here)