[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <474b0519-b354-4370-84ac-411fd3d6d14b@suse.cz>
Date: Sat, 21 Sep 2024 23:25:55 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Guenter Roeck <linux@...ck-us.net>,
KUnit Development <kunit-dev@...glegroups.com>,
Brendan Higgins <brendanhiggins@...gle.com>, David Gow <davidgow@...gle.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
Christoph Lameter <cl@...ux.com>, David Rientjes <rientjes@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
Julia Lawall <Julia.Lawall@...ia.fr>, Jakub Kicinski <kuba@...nel.org>,
"Jason A. Donenfeld" <Jason@...c4.com>,
"Uladzislau Rezki (Sony)" <urezki@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
Alexander Potapenko <glider@...gle.com>, Marco Elver <elver@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>, kasan-dev@...glegroups.com,
Jann Horn <jannh@...gle.com>, Mateusz Guzik <mjguzik@...il.com>
Subject: Re: [PATCH v2 7/7] kunit, slub: add test_kfree_rcu() and
test_leak_destroy()
On 9/21/24 23:08, Guenter Roeck wrote:
> On 9/21/24 13:40, Vlastimil Babka wrote:
>> +CC kunit folks
>>
>> On 9/20/24 15:35, Guenter Roeck wrote:
>>> Hi,
>>
>> Hi,
>>
>>> On Wed, Aug 07, 2024 at 12:31:20PM +0200, Vlastimil Babka wrote:
>>>> Add a test that will create cache, allocate one object, kfree_rcu() it
>>>> and attempt to destroy it. As long as the usage of kvfree_rcu_barrier()
>>>> in kmem_cache_destroy() works correctly, there should be no warnings in
>>>> dmesg and the test should pass.
>>>>
>>>> Additionally add a test_leak_destroy() test that leaks an object on
>>>> purpose and verifies that kmem_cache_destroy() catches it.
>>>>
>>>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>>>
>>> This test case, when run, triggers a warning traceback.
>>>
>>> kmem_cache_destroy TestSlub_kfree_rcu: Slab cache still has objects when called from test_leak_destroy+0x70/0x11c
>>> WARNING: CPU: 0 PID: 715 at mm/slab_common.c:511 kmem_cache_destroy+0x1dc/0x1e4
>>
>> Yes that should be suppressed like the other slub_kunit tests do. I have
>> assumed it's not that urgent because for example the KASAN kunit tests all
>> produce tons of warnings and thus assumed it's in some way acceptable for
>> kunit tests to do.
>>
>
> I have all tests which generate warning backtraces disabled. Trying to identify
> which warnings are noise and which warnings are on purpose doesn't scale,
> so it is all or nothing for me. I tried earlier to introduce a patch series
> which would enable selective backtrace suppression, but that died the death
> of architecture maintainers not caring and people demanding it to be perfect
> (meaning it only addressed WARNING: backtraces and not BUG: backtraces,
> and apparently that wasn't good enough).
Ah, didn't know, too bad.
> If the backtrace is intentional (and I think you are saying that it is),
> I'll simply disable the test. That may be a bit counter-productive, but
> there is really no alternative for me.
It's intentional in the sense that the test intentionally triggers a
condition that normally produces a warning. Many if the slub kunit test do
that, but are able to suppress printing the warning when it happens in the
kunit context. I forgot to do that for the new test initially as the warning
there happens from a different path that those that already have the kunit
suppression, but we'll implement that suppression there too ASAP.
>>> That is, however, not the worst of it. It also causes boot stalls on
>>> several platforms and architectures (various arm platforms, arm64,
>>> loongarch, various ppc, and various x86_64). Reverting it fixes the
>>> problem. Bisect results are attached for reference.
>>
>> OK, this part is unexpected. I assume you have the test built-in and not a
>> module, otherwise it can't affect boot? And by stall you mean a delay or a
>
> Yes.
>
>> complete lockup? I've tried to reproduce that with virtme, but it seemed
>> fine, maybe it's .config specific?
>
> It is a complete lockup.
>
>>
>> I do wonder about the placement of the call of kunit_run_all_tests() from
>> kernel_init_freeable() as that's before a bunch of initialization finishes.
>>
>> For example, system_state = SYSTEM_RUNNING; and rcu_end_inkernel_boot() only
>> happens later in kernel_init(). I wouldn't be surprised if that means
>> calling kfree_rcu() or rcu_barrier() or kvfree_rcu_barrier() as part of the
>> slub tests is too early.
>>
>> Does the diff below fix the problem? Any advice from kunit folks? I could
>> perhaps possibly make the slab test module-only instead of tristate or do
>> some ifdef builtin on the problematic tests, but maybe it wouldn't be
>> necessary with kunit_run_all_tests() happening a bit later.
>>
>
> It does, at least based on my limited testing. However, given that the
> backtrace is apparently intentional, it doesn't really matter - I'll disable
> the test instead.
Right, thanks. I will notify you when the suppression is done so the test
can be re-enabled.
But as the lockup should not be caused by the warning itself, we will still
have to address it, as others configuring kunit tests built-in might see the
lockup, or you would again see it as well, once the warning is addressed and
test re-enabled.
Your testing suggests moving the kunit_run_all_tests() call might be the
right fix so let's see what kunit folks think. I would hope that no other
kunit test would become broken by being executed later in boot, because when
compiled as modules, it already happens even much later already.
Thanks!
Vlastimil
> Thanks,
> Guenter
>
Powered by blists - more mailing lists