lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210204220427.GM2743@paulmck-ThinkPad-P72>
Date:   Thu, 4 Feb 2021 14:04:27 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Uladzislau Rezki (Sony)" <urezki@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Daniel Axtens <dja@...ens.net>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [PATCH 2/2] kvfree_rcu: Use same set of flags as for
 single-argument

On Fri, Jan 29, 2021 at 09:05:05PM +0100, Uladzislau Rezki (Sony) wrote:
> Running an rcuscale stress-suite can lead to "Out of memory"
> of a system. This can happen under high memory pressure with
> a small amount of physical memory.
> 
> For example a KVM test configuration with 64 CPUs and 512 megabytes
> can lead to of memory after running rcuscale with below parameters:
> 
> ../kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig CONFIG_NR_CPUS=64 \
> --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 \
>   rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" --trust-make
> 
> <snip>
> [   12.054448] kworker/1:1H invoked oom-killer: gfp_mask=0x2cc0(GFP_KERNEL|__GFP_NOWARN), order=0, oom_score_adj=0
> [   12.055303] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
> [   12.055416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> [   12.056485] Workqueue: events_highpri fill_page_cache_func
> [   12.056485] Call Trace:
> [   12.056485]  dump_stack+0x57/0x6a
> [   12.056485]  dump_header+0x4c/0x30a
> [   12.056485]  ? del_timer_sync+0x20/0x30
> [   12.056485]  out_of_memory.cold.47+0xa/0x7e
> [   12.056485]  __alloc_pages_slowpath.constprop.123+0x82f/0xc00
> [   12.056485]  __alloc_pages_nodemask+0x289/0x2c0
> [   12.056485]  __get_free_pages+0x8/0x30
> [   12.056485]  fill_page_cache_func+0x39/0xb0
> [   12.056485]  process_one_work+0x1ed/0x3b0
> [   12.056485]  ? process_one_work+0x3b0/0x3b0
> [   12.060485]  worker_thread+0x28/0x3c0
> [   12.060485]  ? process_one_work+0x3b0/0x3b0
> [   12.060485]  kthread+0x138/0x160
> [   12.060485]  ? kthread_park+0x80/0x80
> [   12.060485]  ret_from_fork+0x22/0x30
> [   12.062156] Mem-Info:
> [   12.062350] active_anon:0 inactive_anon:0 isolated_anon:0
> [   12.062350]  active_file:0 inactive_file:0 isolated_file:0
> [   12.062350]  unevictable:0 dirty:0 writeback:0
> [   12.062350]  slab_reclaimable:2797 slab_unreclaimable:80920
> [   12.062350]  mapped:1 shmem:2 pagetables:8 bounce:0
> [   12.062350]  free:10488 free_pcp:1227 free_cma:0
> ...
> [   12.101610] Out of memory and no killable processes...
> [   12.102042] Kernel panic - not syncing: System is deadlocked on memory
> [   12.102583] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
> [   12.102600] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> <snip>
> 
> Having a fallback mechanism we should not go with "GFP_KERNEL | __GFP_NOWARN"
> that implies a "hard" page request involving OOM killer. Replace such set with
> the same as the one used for a single argument.
> 
> Thus it will follow same rules:
>     a) minimize a fallback hitting;
>     b) avoid of OOM invoking;
>     c) do a light-wait page request;
>     d) avoid of dipping into the emergency reserves.
> 
> With this change an rcuscale and the parameters which are in question
> never runs into "Kernel panic".
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>

I did have some misgivings about this one, but after a closer look at
the GFP flags you suggest along with offlist discussions it looks like
what needs to happen.  So thank you for persisting!  ;-)

I did the usual wordsmithing as shown below, so please check to make
sure that I did not mess anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit f60c420f9de536e7fc397f0ad8d6677d782d0141
Author: Uladzislau Rezki (Sony) <urezki@...il.com>
Date:   Fri Jan 29 21:05:05 2021 +0100

    kvfree_rcu: Use same set of GFP flags as does single-argument
    
    Running an rcuscale stress-suite can lead to "Out of memory" of a
    system. This can happen under high memory pressure with a small amount
    of physical memory.
    
    For example, a KVM test configuration with 64 CPUs and 512 megabytes
    can result in OOM when running rcuscale with below parameters:
    
    ../kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig CONFIG_NR_CPUS=64 \
    --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 \
      rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" --trust-make
    
    <snip>
    [   12.054448] kworker/1:1H invoked oom-killer: gfp_mask=0x2cc0(GFP_KERNEL|__GFP_NOWARN), order=0, oom_score_adj=0
    [   12.055303] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
    [   12.055416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
    [   12.056485] Workqueue: events_highpri fill_page_cache_func
    [   12.056485] Call Trace:
    [   12.056485]  dump_stack+0x57/0x6a
    [   12.056485]  dump_header+0x4c/0x30a
    [   12.056485]  ? del_timer_sync+0x20/0x30
    [   12.056485]  out_of_memory.cold.47+0xa/0x7e
    [   12.056485]  __alloc_pages_slowpath.constprop.123+0x82f/0xc00
    [   12.056485]  __alloc_pages_nodemask+0x289/0x2c0
    [   12.056485]  __get_free_pages+0x8/0x30
    [   12.056485]  fill_page_cache_func+0x39/0xb0
    [   12.056485]  process_one_work+0x1ed/0x3b0
    [   12.056485]  ? process_one_work+0x3b0/0x3b0
    [   12.060485]  worker_thread+0x28/0x3c0
    [   12.060485]  ? process_one_work+0x3b0/0x3b0
    [   12.060485]  kthread+0x138/0x160
    [   12.060485]  ? kthread_park+0x80/0x80
    [   12.060485]  ret_from_fork+0x22/0x30
    [   12.062156] Mem-Info:
    [   12.062350] active_anon:0 inactive_anon:0 isolated_anon:0
    [   12.062350]  active_file:0 inactive_file:0 isolated_file:0
    [   12.062350]  unevictable:0 dirty:0 writeback:0
    [   12.062350]  slab_reclaimable:2797 slab_unreclaimable:80920
    [   12.062350]  mapped:1 shmem:2 pagetables:8 bounce:0
    [   12.062350]  free:10488 free_pcp:1227 free_cma:0
    ...
    [   12.101610] Out of memory and no killable processes...
    [   12.102042] Kernel panic - not syncing: System is deadlocked on memory
    [   12.102583] CPU: 1 PID: 377 Comm: kworker/1:1H Not tainted 5.11.0-rc3+ #510
    [   12.102600] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
    <snip>
    
    Because kvfree_rcu() has a fallback path, memory allocation failure is
    not the end of the world.  Furthermore, the added overhead of aggressive
    GFP settings must be balanced against the overhead of the fallback path,
    which is a cache miss for double-argument kvfree_rcu() and a call to
    synchronize_rcu() for single-argument kvfree_rcu().  The current choice
    of GFP_KERNEL|__GFP_NOWARN can result in longer latencies than a call
    to synchronize_rcu(), so less-tenacious GFP flags would be helpful.
    
    Here is the tradeoff that must be balanced:
        a) Minimize use of the fallback path,
        b) Avoid pushing the system into OOM,
        c) Bound allocation latency to that of synchronize_rcu(), and
        d) Leave the emergency reserves to use cases lacking fallbacks.
    
    This commit therefore changes GFP flags from GFP_KERNEL|__GFP_NOWARN to
    GFP_KERNEL|__GFP_NORETRY|__GFP_NOMEMALLOC|__GFP_NOWARN.  This combination
    leaves the emergency reserves alone and can initiate reclaim, but will
    not invoke the OOM killer.
    
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1e86212..2c9cf4d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3435,7 +3435,7 @@ static void fill_page_cache_func(struct work_struct *work)
 
 	for (i = 0; i < rcu_min_cached_objs; i++) {
 		bnode = (struct kvfree_rcu_bulk_data *)
-			__get_free_page(GFP_KERNEL | __GFP_NOWARN);
+			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
 		if (bnode) {
 			raw_spin_lock_irqsave(&krcp->lock, flags);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ