linux-kernel - Re: [PATCH v2] mm/kmemleak: Don't hold kmemleak

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <de42b70a-c69c-4777-ab07-2921d34ecb85@redhat.com>
Date: Wed, 27 Mar 2024 14:58:43 -0400
From: Waiman Long <longman@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Audra Mitchell <aubaker@...hat.com>
Subject: Re: [PATCH v2] mm/kmemleak: Don't hold kmemleak_lock when calling
 printk()

On 3/27/24 13:43, Catalin Marinas wrote:
> On Thu, Mar 07, 2024 at 11:46:30AM -0800, Andrew Morton wrote:
>> On Thu,  7 Mar 2024 13:47:07 -0500 Waiman Long <longman@...hat.com> wrote:
>>> When some error conditions happen (like OOM), some kmemleak functions
>>> call printk() to dump out some useful debugging information while holding
>>> the kmemleak_lock. This may cause deadlock as the printk() function
>>> may need to allocate additional memory leading to a create_object()
>>> call acquiring kmemleak_lock again.
>>>
>>> An abbreviated lockdep splat is as follows:
>>>
>>> ...
>>>
>>> Fix this deadlock issue by making sure that printk() is only called
>>> after releasing the kmemleak_lock.
>>>
>>> ...
>>>
>>> @@ -427,9 +442,19 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
>>>   		else if (untagged_objp == untagged_ptr || alias)
>>>   			return object;
>>>   		else {
>>> +			if (!get_object(object))
>>> +				break;
>>> +			/*
>>> +			 * Release kmemleak_lock temporarily to avoid deadlock
>>> +			 * in printk(). dump_object_info() is called without
>>> +			 * holding object->lock (race unlikely).
>>> +			 */
>>> +			raw_spin_unlock(&kmemleak_lock);
>>>   			kmemleak_warn("Found object by alias at 0x%08lx\n",
>>>   				      ptr);
>>>   			dump_object_info(object);
>>> +			put_object(object);
>>> +			raw_spin_lock(&kmemleak_lock);
>>>   			break;
>> Please include a full description of why this is safe.  Once we've
>> dropped that lock, the tree is in an unknown state and we shouldn't
>> touch it again.  This consideration should be added to the relevant
>> functions' interface documentation and the code should be reviewed to
>> ensure that we're actually adhering to this.  Or something like that.
>>
>> To simply drop and reacquire a lock without supporting analysis and
>> comments does not inspire confidence!
> I agree it looks fragile. I think it works, the code tends to bail out
> on those errors and doesn't expect the protected data to have remained
> intact. But we may change it in the future and forgot about this.
>
> I wonder whether we can actually make things slightly easier to reason
> about, defer the printing until unlock, store the details in some
> per-cpu variable. Another option would be to have a per-CPU array to
> store potential recursive kmemleak_*() callbacks during the critical
> regions. This should be bounded since the interrupts are disabled. On
> unlock, we'd replay the array and add those pointers.

It looks like most of the callers of __lookup_object() will bail out 
when an error happen. So there should be no harm in temporarily 
releasing the lock. However, I do agree that it is fragile and future 
changes may break it. This patch certainly need more work.

Cheers,
Longman

>