Message-ID: <CAOMGZ=GsA4_Zz1z6mae2WzmAeL5p9r-7mXC8ksE-uEF7YP_g_A@mail.gmail.com>
Date: Fri, 6 Jul 2018 10:31:24 +0200
From: Vegard Nossum <vegard.nossum@...il.com>
To: NeilBrown <neilb@...e.com>
Cc: David Howells <dhowells@...hat.com>, linux-cachefs@...hat.com,
kiran.modukuri@...il.com, Lei Xue <carmark.dlut@...il.com>,
LKML <linux-kernel@...r.kernel.org>, aderobertis@...rics.net,
dja@...ens.net
Subject: Re: [PATCH 1/4] cachefiles: Fix assertion "6 == 5 is false" at fs/fscache/operation.c:494
On 6 July 2018 at 01:45, NeilBrown <neilb@...e.com> wrote:
> On Thu, Jul 05 2018, David Howells wrote:
>
>> From: kiran modukuri <kiran.modukuri@...il.com>
>>
>> There is a potential race in fscache operation enqueuing for reading and
>> copying multiple pages from cachefiles to netfs.
>> Under heavy system load, it happens very often.
>>
>> If this race occurs, an oops similar to the following is seen:
>>
>> kernel BUG at fs/fscache/operation.c:69!
>> invalid opcode: 0000 [#1] SMP
>> ...
>> #0 [ffff883fff0838d8] machine_kexec at ffffffff81051beb
>> #1 [ffff883fff083938] crash_kexec at ffffffff810f2542
>> #2 [ffff883fff083a08] oops_end at ffffffff8163e1a8
>> #3 [ffff883fff083a30] die at ffffffff8101859b
>> #4 [ffff883fff083a60] do_trap at ffffffff8163d860
>> #5 [ffff883fff083ab0] do_invalid_op at ffffffff81015204
>> #6 [ffff883fff083b60] invalid_op at ffffffff8164701e
>> [exception RIP: fscache_enqueue_operation+246]
>> RIP: ffffffffa0b793c6 RSP: ffff883fff083c18 RFLAGS: 00010046
>> RAX: 0000000000000019 RBX: ffff8832ed1a9ec0 RCX: 0000000000000006
>> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
>> RBP: ffff883fff083c20 R8: 0000000000000086 R9: 000000000000178f
>> R10: ffffffff816aeb00 R11: ffff883fff08392e R12: ffff8802f0525620
>> R13: ffff88407ffc01d8 R14: 0000000000000000 R15: 0000000000000003
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
>> #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
>> #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
>> #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
>>
>> Reported-by: Lei Xue <carmark.dlut@...il.com>
>> Reported-by: Vegard Nossum <vegard.nossum@...il.com>
>> Reported-by: Anthony DeRobertis <aderobertis@...rics.net>
>> Reported-by: NeilBrown <neilb@...e.com>
>> Reported-by: Daniel Axtens <dja@...ens.net>
>> Reported-by: KiranKumar Modukuri <kiran.modukuri@...il.com>
>> Signed-off-by: David Howells <dhowells@...hat.com>
>> ---
[...]
> Thanks - I like this approach. Taking the extra reference makes it a
> lot more clear what is happening and why.
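
The "extra reference" approach praised above follows a common pattern. A code path that does not own a reference pins the object before a window in which another thread may drop the last reference, and releases the pin afterwards. Below is a minimal userspace sketch of that general pattern, assuming C11 atomics. struct op, op_get(), op_put(), waiter_enqueue() and the racer callback are hypothetical names. They are not the fscache API, and this is not the patch itself. The concurrent thread is modelled as a callback so that the interleaving is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted operation; NOT the real fscache structures. */
struct op {
    atomic_int usage;  /* reference count */
    int enqueued;      /* times the op was handed to the work queue */
};

static int ops_freed;  /* lets a caller observe when the op is released */

static struct op *op_alloc(void)
{
    struct op *op = calloc(1, sizeof(*op));
    if (op)
        atomic_init(&op->usage, 1);  /* caller owns the first reference */
    return op;
}

static void op_get(struct op *op)
{
    atomic_fetch_add(&op->usage, 1);
}

/* Drop one reference; the last put frees the op. */
static void op_put(struct op *op)
{
    int old = atomic_fetch_sub(&op->usage, 1);
    assert(old > 0);  /* catch a put without a matching get */
    if (old == 1) {
        free(op);
        ops_freed++;
    }
}

/* Hypothetical stand-in for another thread (e.g. a copier) that can run
 * as soon as the waiter has published its work item, and that may drop
 * the reference it holds. */
typedef void (*racer_fn)(struct op *op);

/* Waiter path that does not own a reference to op. Without the
 * temporary op_get(), the racer's op_put() could free op before the
 * enqueue below, turning it into a use-after-free. */
static void waiter_enqueue(struct op *op, racer_fn racer)
{
    op_get(op);      /* pin op for the duration of this function */
    racer(op);       /* window: the other side drops its reference */
    op->enqueued++;  /* still valid: our pin keeps op alive */
    op_put(op);      /* release the pin; frees op if it was the last ref */
}
```

With the pin in place, the op is freed exactly once, by whichever put happens last. In this sketch that is the waiter's own op_put(), not the racer's.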
The changelog is a bit sparse, no? We have more info here:
https://lkml.org/lkml/2018/5/8/520
https://lkml.org/lkml/2018/7/3/1184
Why not crib some of that and explain the issue properly (or at
minimum link the previous threads)?
Thanks,
Vegard