Message-ID: <de42b70a-c69c-4777-ab07-2921d34ecb85@redhat.com>
Date: Wed, 27 Mar 2024 14:58:43 -0400
From: Waiman Long <longman@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Audra Mitchell <aubaker@...hat.com>
Subject: Re: [PATCH v2] mm/kmemleak: Don't hold kmemleak_lock when calling
printk()
On 3/27/24 13:43, Catalin Marinas wrote:
> On Thu, Mar 07, 2024 at 11:46:30AM -0800, Andrew Morton wrote:
>> On Thu, 7 Mar 2024 13:47:07 -0500 Waiman Long <longman@...hat.com> wrote:
>>> When some error conditions happen (like OOM), some kmemleak functions
>>> call printk() to dump out some useful debugging information while holding
>>> the kmemleak_lock. This may cause a deadlock, as the printk() function
>>> may need to allocate additional memory, leading to a create_object()
>>> call acquiring kmemleak_lock again.
>>>
>>> An abbreviated lockdep splat is as follows:
>>>
>>> ...
>>>
>>> Fix this deadlock issue by making sure that printk() is only called
>>> after releasing the kmemleak_lock.
>>>
>>> ...
>>>
>>> @@ -427,9 +442,19 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
>>> else if (untagged_objp == untagged_ptr || alias)
>>> return object;
>>> else {
>>> + if (!get_object(object))
>>> + break;
>>> + /*
>>> + * Release kmemleak_lock temporarily to avoid deadlock
>>> + * in printk(). dump_object_info() is called without
>>> + * holding object->lock (race unlikely).
>>> + */
>>> + raw_spin_unlock(&kmemleak_lock);
>>> kmemleak_warn("Found object by alias at 0x%08lx\n",
>>> ptr);
>>> dump_object_info(object);
>>> + put_object(object);
>>> + raw_spin_lock(&kmemleak_lock);
>>> break;
>> Please include a full description of why this is safe. Once we've
>> dropped that lock, the tree is in an unknown state and we shouldn't
>> touch it again. This consideration should be added to the relevant
>> functions' interface documentation and the code should be reviewed to
>> ensure that we're actually adhering to this. Or something like that.
>>
>> To simply drop and reacquire a lock without supporting analysis and
>> comments does not inspire confidence!
> I agree it looks fragile. I think it works: the code tends to bail out
> on those errors and doesn't expect the protected data to have remained
> intact. But we may change it in the future and forget about this.
>
> I wonder whether we can actually make things slightly easier to reason
> about: defer the printing until unlock and store the details in some
> per-CPU variable. Another option would be to have a per-CPU array to
> store potential recursive kmemleak_*() callbacks during the critical
> regions. This should be bounded since interrupts are disabled. On
> unlock, we'd replay the array and add those pointers.
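The second option above (record recursive callbacks during the critical region, replay them on unlock) could look roughly like the user-space sketch below. It is a hypothetical illustration, not kmemleak code: a C11 _Thread_local buffer stands in for the per-CPU array (in the kernel, disabled interrupts are what keep the region on one CPU), and kml_lock(), kml_unlock(), track_object(), add_locked(), DEFERRED_MAX and the fixed-size tracked[] table are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFERRED_MAX 16 /* arbitrary bound chosen for the sketch */

static pthread_mutex_t kml_lock_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread state standing in for per-CPU variables. */
static _Thread_local bool in_critical;
static _Thread_local const void *deferred[DEFERRED_MAX];
static _Thread_local size_t n_deferred;
static _Thread_local size_t n_dropped;

/* "Tracked" objects, protected by kml_lock_mtx. */
static const void *tracked[64];
static size_t n_tracked;

/* Caller holds kml_lock_mtx; must not re-enter track_object(). */
static void add_locked(const void *ptr)
{
	if (n_tracked < sizeof(tracked) / sizeof(tracked[0]))
		tracked[n_tracked++] = ptr;
}

static void kml_lock(void)
{
	pthread_mutex_lock(&kml_lock_mtx);
	in_critical = true;
}

/* Unlock, then replay callbacks that arrived during the critical region. */
static void kml_unlock(void)
{
	in_critical = false;
	pthread_mutex_unlock(&kml_lock_mtx);

	if (n_deferred == 0)
		return;
	pthread_mutex_lock(&kml_lock_mtx);
	for (size_t i = 0; i < n_deferred; i++)
		add_locked(deferred[i]);
	n_deferred = 0;
	pthread_mutex_unlock(&kml_lock_mtx);
}

/* Hypothetical kmemleak_*() entry point. */
static void track_object(const void *ptr)
{
	if (in_critical) {
		/* Re-entered from inside the critical region: defer. */
		if (n_deferred < DEFERRED_MAX)
			deferred[n_deferred++] = ptr;
		else
			n_dropped++;
		return;
	}
	kml_lock();
	add_locked(ptr);
	kml_unlock();
}
```

Calls made while the thread is inside the region (for example from a printk() that allocates) are only recorded; kml_unlock() adds them after the original critical section has ended, so nothing recursive runs under the lock. What to do when the buffer overflows (only counted in n_dropped here) is left open.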

It looks like most of the callers of __lookup_object() will bail out
when an error happens, so there should be no harm in temporarily
releasing the lock. However, I do agree that it is fragile and that
future changes may break it. This patch certainly needs more work.

Cheers,
Longman