Message-ID: <e7795912-7d93-8f4e-b997-67c4ac1f3549@windriver.com>
Date: Wed, 5 Dec 2018 21:53:37 +0800
From: He Zhe <zhe.he@...driver.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
CC: <catalin.marinas@....com>, <tglx@...utronix.de>,
<rostedt@...dmis.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <linux-rt-users@...r.kernel.org>
Subject: Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT
On 2018/12/1 02:19, Sebastian Andrzej Siewior wrote:
> On 2018-11-24 22:26:46 [+0800], He Zhe wrote:
>> On latest v4.19.1-rt3, both of the call traces can be reproduced with kmemleak
>> enabled, and neither can be reproduced with kmemleak disabled.
> okay. So it needs attention.
>
>> On the latest mainline tree, neither can be reproduced, whether kmemleak is
>> enabled or disabled.
>>
>> I don't get why kfree from a preempt-disabled section should cause a warning
>> without kmemleak, since kfree can't sleep.
> it might. It will acquire a sleeping lock if it has to go down to the
> memory allocator to actually give memory back.
Got it. Thanks.
>
>> If I understand correctly, the call trace above is caused by trying to schedule
>> after preemption is disabled, which cannot happen in the mainline kernel. So
>> we might need to use a raw lock to keep preemption disabled.
> The buddy-allocator runs with spin locks so it is okay on !RT. So you
> can use kfree() with disabled preemption or disabled interrupts.
> I don't think that we want to use raw-locks in the buddy-allocator.
For call trace 1:
I went through the call paths inside kfree() and found that raw locks are
already used in several places, as follows.
1) in the path of kfree() itself
kfree -> slab_free -> do_slab_free -> __slab_free -> raw_spin_lock_irqsave
2) in the path of CONFIG_DEBUG_OBJECTS_FREE
kfree -> slab_free -> slab_free_freelist_hook -> slab_free_hook -> debug_check_no_obj_freed -> __debug_check_no_obj_freed -> raw_spin_lock_irqsave
3) in the path of CONFIG_LOCKDEP
kfree -> __free_pages -> free_unref_page -> free_unref_page_prepare -> free_pcp_prepare -> free_pages_prepare -> debug_check_no_locks_freed -> raw_local_irq_save
Since kmemleak is a debugging facility, it will most likely be used in
environments where we do not expect the same performance as without it. And
since kfree() already takes raw locks in its main path and in the other debug
paths above, I suppose it wouldn't hurt to switch kmemleak to raw locks too.
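Concretely, what I have in mind is along the lines of the sketch below. This
is illustrative only, not the actual patch; the real change has to touch every
lock/unlock site of kmemleak_lock in mm/kmemleak.c.

-static DEFINE_RWLOCK(kmemleak_lock);
+static DEFINE_RAW_SPINLOCK(kmemleak_lock);

 /* e.g. in scan_block(), the reader side would become: */
-	read_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	...
-	read_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);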
>
>> From what I found above, this is RT-only and happens on v4.18 and v4.19.
>>
>> The call trace above is caused by grabbing kmemleak_lock and then getting
>> scheduled and then re-grabbing kmemleak_lock. Using raw lock can also solve
>> this problem.
> But this is a reader / writer lock. And if I understand the other part
> of the thread then it needs multiple readers.
For call trace 2:
I don't get what "it needs multiple readers" exactly means here.
In this call trace, kmemleak_lock is grabbed as a write lock, the task is then
scheduled away, and the lock is grabbed again as a write lock from another
path. It's write->write locking, in contrast to the case discussed in the
other part of the thread.
This is essentially because kmemleak hooks into the very lowest-level memory
allocation and free operations. Once scheduled away, it can easily re-enter
itself.
We need raw locks to prevent this from happening.
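To illustrate, the sequence on RT is roughly the following (simplified):

  kmalloc()
    kmemleak_alloc() -> create_object()
      write_lock_irqsave(&kmemleak_lock, flags)  /* sleeping rwlock on RT */
      ... scheduled away while still holding kmemleak_lock ...
  /* another allocation path runs before the lock is released */
  kmalloc()
    kmemleak_alloc() -> create_object()
      write_lock_irqsave(&kmemleak_lock, flags)  /* write->write re-entry */

With a raw spinlock, preemption stays disabled while the lock is held, so the
task cannot be scheduled away in the middle and the re-entry cannot happen.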
> Couldn't we just get rid of that kfree() or move it somewhere else?
> I mean if we free() the memory on CPU-down and allocate it again on CPU-up,
> then we could skip that, right? Just allocate it and don't free it
> because the CPU will likely come up again.
For call trace 1:
I went through the CPU hotplug code and found that the allocation of the
problematic data, cpuc->shared_regs, is done in intel_pmu_cpu_prepare, and
the free is done in intel_pmu_cpu_dying. They are callbacks triggered by two
different CPU hotplug events.
It seems hard to come up with a convincing scheme that keeps the data around
while the CPU is offline and reuses it when the CPU comes back up. Raw locks
would be easy and good enough.
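For context, the pairing looks roughly like this. This is heavily simplified
from arch/x86/events/intel/core.c; the refcounting of shared_regs across HT
siblings and most of the error handling are elided.

static int intel_pmu_cpu_prepare(int cpu)
{
	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);

	/* CPUHP prepare step: runs in a context where sleeping is fine */
	cpuc->shared_regs = allocate_shared_regs(cpu);
	if (!cpuc->shared_regs)
		return -ENOMEM;
	return 0;
}

static void intel_pmu_cpu_dying(int cpu)
{
	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);

	/* dying step: runs with preemption disabled, so the kfree() here is
	 * what shows up in call trace 1 once kmemleak hooks into it */
	kfree(cpuc->shared_regs);
	cpuc->shared_regs = NULL;
}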
Thanks,
Zhe
>
>> Thanks,
>> Zhe
> Sebastian
>