linux-kernel - Re: [PATCH 3/3] kfence: test: try to avoid test_gfpzero trigger rcu_stall


Message-ID: <1dfeea09-cd4a-39fc-18f4-775bec99afa4@linuxfoundation.org>
Date:   Wed, 9 Mar 2022 14:39:28 -0700
From:   Shuah Khan <skhan@...uxfoundation.org>
To:     Peng Liu <liupeng256@...wei.com>, brendanhiggins@...gle.com,
        glider@...gle.com, elver@...gle.com, dvyukov@...gle.com,
        akpm@...ux-foundation.org, linux-kselftest@...r.kernel.org,
        kunit-dev@...glegroups.com, linux-kernel@...r.kernel.org,
        kasan-dev@...glegroups.com, linux-mm@...ck.org
Cc:     wangkefeng.wang@...wei.com, Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH 3/3] kfence: test: try to avoid test_gfpzero trigger
 rcu_stall

On 3/8/22 6:47 PM, Peng Liu wrote:
> When CONFIG_KFENCE_DYNAMIC_OBJECTS is set to a large number, the kfence
> KUnit test case test_gfpzero consumes nearly all of a CPU's time, and
> an RCU stall is reported, as in the following log excerpt taken from a
> physical server.
> 
>    rcu: INFO: rcu_sched self-detected stall on CPU
>    rcu: 	68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002
>    softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019)
>    Task dump for CPU 68:
>    task:kunit_try_catch state:R  running task
>    stack:    0 pid: 9728 ppid:     2 flags:0x0000020a
>    Call trace:
>     dump_backtrace+0x0/0x1e4
>     show_stack+0x20/0x2c
>     sched_show_task+0x148/0x170
>     ...
>     rcu_sched_clock_irq+0x70/0x180
>     update_process_times+0x68/0xb0
>     tick_sched_handle+0x38/0x74
>     ...
>     gic_handle_irq+0x78/0x2c0
>     el1_irq+0xb8/0x140
>     kfree+0xd8/0x53c
>     test_alloc+0x264/0x310 [kfence_test]
>     test_gfpzero+0xf4/0x840 [kfence_test]
>     kunit_try_run_case+0x48/0x20c
>     kunit_generic_run_threadfn_adapter+0x28/0x34
>     kthread+0x108/0x13c
>     ret_from_fork+0x10/0x18
> 
> To avoid the RCU stall and the unacceptable latency, add a scheduling
> point (cond_resched()) to the retry loop in test_gfpzero.
> 
> Signed-off-by: Peng Liu <liupeng256@...wei.com>
> ---
>   mm/kfence/kfence_test.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index caed6b4eba94..1b50f70a4c0f 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *test)
>   			kunit_warn(test, "giving up ... cannot get same object back\n");
>   			return;
>   		}
> +		cond_resched();

This looks like a band-aid. Is there a better way to fix this?

>   	}
>   
>   	for (i = 0; i < size; i++)
> 

thanks,
-- Shuah
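
Note on the pattern under discussion: the patch places a scheduling point inside a long-running retry loop so that other tasks get a chance to run between iterations. A rough userspace analogy, not the kernel code, might look like the sketch below. It assumes sched_yield() as a stand-in for cond_resched(); the two are not equivalent, since cond_resched() only reschedules when the kernel has flagged that a reschedule is needed, while sched_yield() always offers up the CPU. The function name alloc_loop_with_yield and its parameters are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/*
 * Userspace analogy only: sched_yield() stands in for the kernel's
 * cond_resched(). The function name and parameters are hypothetical.
 *
 * Repeatedly allocates and frees a buffer, yielding the CPU once per
 * iteration, and returns the number of completed iterations.
 */
static long alloc_loop_with_yield(long iterations, size_t size)
{
	long done = 0;

	assert(iterations >= 0 && size > 0);

	for (long i = 0; i < iterations; i++) {
		void *buf = malloc(size);

		if (!buf)
			break;	/* give up early, like the test's early return */
		free(buf);

		/* Offer the CPU to other runnable tasks before looping again. */
		sched_yield();
		done++;
	}
	return done;
}
```

Calling alloc_loop_with_yield(1000, 32) runs 1000 allocate/free rounds with a yield after each one. Whether yielding on every iteration is the right granularity, or whether the loop bound itself should be reduced, is the open question raised in the review comment above.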