[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <597352bb-6afa-4fa4-a5ee-1f0aa14e61be@redhat.com>
Date: Fri, 1 Dec 2023 17:22:26 -0500
From: Waiman Long <longman@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>, linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] kmemleak: Avoid RCU stalls when freeing metadata for
per-CPU pointers
On 12/1/23 14:08, Catalin Marinas wrote:
> On systems with large number of CPUs, the following soft lockup splat
> might sometimes happen:
>
> [ 2656.001617] watchdog: BUG: soft lockup - CPU#364 stuck for 21s! [ksoftirqd/364:2206]
> :
> [ 2656.141194] RIP: 0010:_raw_spin_unlock_irqrestore+0x3d/0x70
> :
> 2656.241214] Call Trace:
> [ 2656.243971] <IRQ>
> [ 2656.246237] ? show_trace_log_lvl+0x1c4/0x2df
> [ 2656.251152] ? show_trace_log_lvl+0x1c4/0x2df
> [ 2656.256066] ? kmemleak_free_percpu+0x11f/0x1f0
> [ 2656.261173] ? watchdog_timer_fn+0x379/0x470
> [ 2656.265984] ? __pfx_watchdog_timer_fn+0x10/0x10
> [ 2656.271179] ? __hrtimer_run_queues+0x5f3/0xd00
> [ 2656.276283] ? __pfx___hrtimer_run_queues+0x10/0x10
> [ 2656.281783] ? ktime_get_update_offsets_now+0x95/0x2c0
> [ 2656.287573] ? ktime_get_update_offsets_now+0xdd/0x2c0
> [ 2656.293380] ? hrtimer_interrupt+0x2e9/0x780
> [ 2656.298221] ? __sysvec_apic_timer_interrupt+0x184/0x640
> [ 2656.304211] ? sysvec_apic_timer_interrupt+0x8e/0xc0
> [ 2656.309807] </IRQ>
> [ 2656.312169] <TASK>
> [ 2656.326110] kmemleak_free_percpu+0x11f/0x1f0
> [ 2656.331015] free_percpu.part.0+0x1b/0xe70
> [ 2656.335635] free_vfsmnt+0xb9/0x100
> [ 2656.339567] rcu_do_batch+0x3c8/0xe30
> [ 2656.363693] rcu_core+0x3de/0x5a0
> [ 2656.367433] __do_softirq+0x2d0/0x9a8
> [ 2656.381119] run_ksoftirqd+0x36/0x60
> [ 2656.385145] smpboot_thread_fn+0x556/0x910
> [ 2656.394971] kthread+0x2a4/0x350
> [ 2656.402826] ret_from_fork+0x29/0x50
> [ 2656.406861] </TASK>
>
> The issue is caused by kmemleak registering each per_cpu_ptr()
> corresponding to the __percpu pointer. This is unnecessary since such
> individual per-CPU pointers are not tracked anyway. Create a new
> object_percpu_tree_root rbtree that stores a single __percpu pointer
> together with an OBJECT_PERCPU flag for the kmemleak metadata. Scanning
> needs to be done for all per_cpu_ptr() pointers with a cond_resched()
> between each CPU iteration to avoid RCU stalls.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@....com>
> Reported-by: Waiman Long <longman@...hat.com>
> Link: https://lore.kernel.org/r/20231127194153.289626-1-longman@redhat.com
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
> The only difference from the inlined patch I posted previously is some updated
> comments to include the new object tree.
>
> mm/kmemleak.c | 178 +++++++++++++++++++++++++++-----------------------
> 1 file changed, 97 insertions(+), 81 deletions(-)
>
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 1eacca03bedd..eb6cdc3e9af2 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -14,17 +14,15 @@
> * The following locks and mutexes are used by kmemleak:
> *
> * - kmemleak_lock (raw_spinlock_t): protects the object_list as well as
> - * del_state modifications and accesses to the object_tree_root (or
> - * object_phys_tree_root). The object_list is the main list holding the
> - * metadata (struct kmemleak_object) for the allocated memory blocks.
> - * The object_tree_root and object_phys_tree_root are red
> - * black trees used to look-up metadata based on a pointer to the
> - * corresponding memory block. The object_phys_tree_root is for objects
> - * allocated with physical address. The kmemleak_object structures are
> - * added to the object_list and object_tree_root (or object_phys_tree_root)
> - * in the create_object() function called from the kmemleak_alloc() (or
> - * kmemleak_alloc_phys()) callback and removed in delete_object() called from
> - * the kmemleak_free() callback
> + * del_state modifications and accesses to the object trees
> + * (object_tree_root, object_phys_tree_root, object_percpu_tree_root). The
> + * object_list is the main list holding the metadata (struct
> + * kmemleak_object) for the allocated memory blocks. The object trees are
> + * red black trees used to look-up metadata based on a pointer to the
> + * corresponding memory block. The kmemleak_object structures are added to
> + * the object_list and the object tree root in the create_object() function
> + * called from the kmemleak_alloc() (or kmemleak_alloc_phys()) callback and
> + * removed in delete_object() called from the kmemleak_free() callback
Just a minor nit. For completeness, should we mention
kmemleak_alloc_percpu() and kmemleak_free_percpu() here?
Anyway, I won't mind if you want to keep it as it is.
Reviewed-by: Waiman Long <longman@...hat.com>
Powered by blists - more mailing lists