Message-ID: <CAOMGZ=GsA4_Zz1z6mae2WzmAeL5p9r-7mXC8ksE-uEF7YP_g_A@mail.gmail.com>
Date:   Fri, 6 Jul 2018 10:31:24 +0200
From:   Vegard Nossum <vegard.nossum@...il.com>
To:     NeilBrown <neilb@...e.com>
Cc:     David Howells <dhowells@...hat.com>, linux-cachefs@...hat.com,
        kiran.modukuri@...il.com, Lei Xue <carmark.dlut@...il.com>,
        LKML <linux-kernel@...r.kernel.org>, aderobertis@...rics.net,
        dja@...ens.net
Subject: Re: [PATCH 1/4] cachefiles: Fix assertion "6 == 5 is false" at fs/fscache/operation.c:494

On 6 July 2018 at 01:45, NeilBrown <neilb@...e.com> wrote:
> On Thu, Jul 05 2018, David Howells wrote:
>
>> From: kiran modukuri <kiran.modukuri@...il.com>
>>
>> There is a potential race in fscache operation enqueuing for reading and
>> copying multiple pages from cachefiles to netfs.
>> On a heavily loaded system, it happens very often.
>>
>> If this race occurs, an oops similar to the following is seen:
>>
>>  kernel BUG at fs/fscache/operation.c:69!
>>  invalid opcode: 0000 [#1] SMP
>>  ...
>>  #0 [ffff883fff0838d8] machine_kexec at ffffffff81051beb
>>  #1 [ffff883fff083938] crash_kexec at ffffffff810f2542
>>  #2 [ffff883fff083a08] oops_end at ffffffff8163e1a8
>>  #3 [ffff883fff083a30] die at ffffffff8101859b
>>  #4 [ffff883fff083a60] do_trap at ffffffff8163d860
>>  #5 [ffff883fff083ab0] do_invalid_op at ffffffff81015204
>>  #6 [ffff883fff083b60] invalid_op at ffffffff8164701e
>>     [exception RIP: fscache_enqueue_operation+246]
>>     RIP: ffffffffa0b793c6  RSP: ffff883fff083c18  RFLAGS: 00010046
>>     RAX: 0000000000000019  RBX: ffff8832ed1a9ec0  RCX: 0000000000000006
>>     RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
>>     RBP: ffff883fff083c20   R8: 0000000000000086   R9: 000000000000178f
>>     R10: ffffffff816aeb00  R11: ffff883fff08392e  R12: ffff8802f0525620
>>     R13: ffff88407ffc01d8  R14: 0000000000000000  R15: 0000000000000003
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>>  #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
>>  #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
>>  #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
>>
>> Reported-by: Lei Xue <carmark.dlut@...il.com>
>> Reported-by: Vegard Nossum <vegard.nossum@...il.com>
>> Reported-by: Anthony DeRobertis <aderobertis@...rics.net>
>> Reported-by: NeilBrown <neilb@...e.com>
>> Reported-by: Daniel Axtens <dja@...ens.net>
>> Reported-by: KiranKumar Modukuri <kiran.modukuri@...il.com>
>> Signed-off-by: David Howells <dhowells@...hat.com>
>> ---

[...]

> Thanks - I like this approach.  Taking the extra reference makes it a
> lot more clear what is happening and why.

The changelog is a bit sparse, no? We have more info here:

https://lkml.org/lkml/2018/5/8/520
https://lkml.org/lkml/2018/7/3/1184

Why not crib some of that and explain the issue properly (or at
minimum link to the previous threads)?

Thanks,

Vegard