Message-ID: <66e526c8-9d06-460b-b5df-92697634106b@redhat.com>
Date: Tue, 28 Nov 2023 21:35:34 -0500
From: Waiman Long <longman@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
On 11/28/23 11:04, Catalin Marinas wrote:
> On Mon, Nov 27, 2023 at 02:41:53PM -0500, Waiman Long wrote:
>>  /**
>>   * kmemleak_free_percpu - unregister a previously registered __percpu object
>>   * @ptr:	__percpu pointer to beginning of the object
>>   *
>>   * This function is called from the kernel percpu allocator when an object
>> - * (memory block) is freed (free_percpu).
>> + * (memory block) is freed (free_percpu). Since this function is inherently
>> + * slow especially on systems with a large number of CPUs, defer the actual
>> + * removal of kmemleak objects associated with the percpu pointer to a
>> + * workqueue if it is not in a task context.
>>   */
>>  void __ref kmemleak_free_percpu(const void __percpu *ptr)
>>  {
>> -	unsigned int cpu;
>> -
>>  	pr_debug("%s(0x%px)\n", __func__, ptr);
>>
>> -	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
>> -		for_each_possible_cpu(cpu)
>> -			delete_object_full((unsigned long)per_cpu_ptr(ptr,
>> -								      cpu));
>> +	if (!kmemleak_free_enabled || !ptr || IS_ERR(ptr))
>> +		return;
>> +
>> +	if (!in_task()) {
>> +		struct kmemleak_percpu_addr *addr;
>> +
>> +		addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
>> +		if (addr) {
>> +			INIT_WORK(&addr->work, kmemleak_free_percpu_workfn);
>> +			addr->ptr = ptr;
>> +			queue_work(system_long_wq, &addr->work);
>> +			return;
>> +		}
> We can't defer this freeing. It can mess up the kmemleak metadata if the
> per-cpu pointer is re-allocated before kmemleak removed it from its
> object tree.
You are right. In fact, it is possible for kmemleak_free_percpu() to be
called from softIRQ context. And if the system has hundreds of CPUs, it
will take a long time to process all the free requests.
>
> The problem is looking up the object tree for each per-cpu offset. We
> can make the percpu pointer handling O(1) since freeing is only done by
> the main __percpu pointer, so that's the only one needing a look-up. So
> far the per-cpu pointers are not tracked for leaking, only scanned.
>
> We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
> object_tree_root but when scanning we don't have an inverse function to
> get the __percpu pointer back and calculate the pointers for the other
> CPUs (well, we could with some hacks but they are probably fragile).
We could keep a separate tree to track the percpu area. We will know the
max percpu offset in each percpu area. The base of the percpu area is
just per_cpu_ptr(0, cpu).
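
To make the idea concrete, here is a minimal user-space sketch of such a
separate tree. It is only a model under stated assumptions, not kmemleak
code and not the patch quoted below: the per-CPU areas are a flat array,
and NR_CPUS, AREA_SIZE, struct percpu_obj, model_per_cpu_ptr(),
percpu_track(), percpu_untrack() and percpu_find() are all hypothetical
names. Objects are keyed by their offset into the percpu area, so a free
needs one tree lookup no matter how many CPUs there are, and an address
seen while scanning maps back to its object by subtracting the base of
that CPU's area.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS   4	/* hypothetical CPU count for this model */
#define AREA_SIZE 4096	/* hypothetical size of each per-CPU area */

/* One tracked percpu allocation, keyed by its offset in the percpu area. */
struct percpu_obj {
	uintptr_t offset;
	size_t size;
	struct percpu_obj *left, *right;
};

static unsigned char areas[NR_CPUS][AREA_SIZE];	/* stand-in per-CPU areas */
static struct percpu_obj *percpu_root;		/* plain unbalanced BST */

/* Model of per_cpu_ptr(): base of this CPU's area plus the offset. */
static uintptr_t model_per_cpu_ptr(uintptr_t offset, int cpu)
{
	return (uintptr_t)areas[cpu] + offset;
}

/* Register an allocation once, regardless of the number of CPUs. */
static int percpu_track(uintptr_t offset, size_t size)
{
	struct percpu_obj **link = &percpu_root;
	struct percpu_obj *obj;

	while (*link) {
		if (offset == (*link)->offset)
			return -1;	/* already tracked */
		link = offset < (*link)->offset ? &(*link)->left
						: &(*link)->right;
	}
	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return -1;
	obj->offset = offset;
	obj->size = size;
	*link = obj;
	return 0;
}

/* Free path: a single lookup by offset instead of one lookup per CPU. */
static int percpu_untrack(uintptr_t offset)
{
	struct percpu_obj **link = &percpu_root;
	struct percpu_obj *obj;

	while (*link && (*link)->offset != offset)
		link = offset < (*link)->offset ? &(*link)->left
						: &(*link)->right;
	obj = *link;
	if (!obj)
		return -1;
	if (!obj->left) {
		*link = obj->right;
	} else if (!obj->right) {
		*link = obj->left;
	} else {
		/* Two children: splice in the in-order successor. */
		struct percpu_obj **succ = &obj->right;
		struct percpu_obj *s;

		while ((*succ)->left)
			succ = &(*succ)->left;
		s = *succ;
		*succ = s->right;
		s->left = obj->left;
		s->right = obj->right;
		*link = s;
	}
	free(obj);
	return 0;
}

/*
 * Scan path: map an address inside one CPU's area back to its object.
 * Assumes tracked ranges do not overlap.
 */
static struct percpu_obj *percpu_find(uintptr_t addr, int cpu)
{
	uintptr_t offset = addr - model_per_cpu_ptr(0, cpu);
	struct percpu_obj *obj = percpu_root;

	while (obj) {
		if (offset < obj->offset)
			obj = obj->left;
		else if (offset >= obj->offset + obj->size)
			obj = obj->right;
		else
			return obj;
	}
	return NULL;
}
```

A real implementation would presumably reuse kmemleak's existing rbtree
and locking rather than an unbalanced tree, and the kernel's percpu
areas are not laid out as a simple array like this.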
>
> What I came up with is a separate object_percpu_tree_root similar to the
> object_phys_tree_root. The only reason for these additional trees is to
> look up the kmemleak metadata when needed (usually freeing). They don't
> contain objects that are tracked for actual leaking, only scanned. A
> briefly tested patch below. I need to go through it again, update some
> comments and write a commit log:
That sounds like a good idea, and it is along the lines of what I
suggested above. I will do a more careful review of the change tomorrow,
as it is getting late for me today.
Cheers,
Longman