netdev - Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e7748be1-8b0b-6613-159f-1c45f932c617@redhat.com>
Date:   Wed, 30 Jan 2019 21:48:27 -0500
From:   Waiman Long <longman@...hat.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net,
        daniel@...earbox.net, edumazet@...gle.com, jannh@...gle.com,
        netdev@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap

On 01/30/2019 09:01 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote:
>> On 01/30/2019 04:11 PM, Waiman Long wrote:
>>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>>>> Lockdep warns about false positive:
>>>>>>> This is not a false positive, and you probably also need to use
>>>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>>>
>>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>>>> relies on 'owner' tracking.
>>>>>> Can you point out in the code the spin bit?
>>>>>> As far as I can see sem->owner is debug only feature.
>>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>>>> No, sem->owner is mainly for performing optimistic spinning which is a
>>>>> performance feature to make rwsem writer-lock performs similar to mutex.
>>>>> The debugging part is just an add-on. It is not the reason for the
>>>>> presence of sem->owner.
>>>> I see. Got it.
>>>>
>>>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>>>> with this patch today.
>>>>>>
>>>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>>>> should be easy to do.
>>>> Yes, but looking through the code it's not clear to me that it's safe
>>>> to mix non_owner() versions with regular.
>>>> bpf/stackmap.c does down_read_trylock + up_read.
>>>> If we add new down_read_trylock_non_owner that set the owner to
>>>> NULL | RWSEM_* bits is this safe with conccurent read/write
>>>> that do regular versions?
>>>> rwsem_can_spin_on_owner() does:
>>>>         if (owner) {
>>>>                 ret = is_rwsem_owner_spinnable(owner) &&
>>>>                       owner_on_cpu(owner);
>>>> that looks correct.
>>>> For a second I thought there could be fault here due to non_owner.
>>>> But there could be other places where it's assumed that owner
>>>> is never null?
>>> The content of owner is not the cause of the lockdep warning. The
>>> lockdep code assumes that the task that acquires the lock will release
>>> it some time later. That is not the case when you need to acquire the
>>> lock by one task and released by another. In this case, you have to use
>>> the non_owner version of down/up_read which disable the lockdep
>>> acquire/release tracking. That will be the only difference between the
>>> two set of APIs.
>>>
>>> Cheers,
>>> Longman
>> BTW, you may want to do something like that to make sure that the lock
>> ownership is probably tracked.
>>
>> -Longman
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index d43b145..79eef9d 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct
>> bpf_stack_
>>         } else {
>>                 work->sem = &current->mm->mmap_sem;
>>                 irq_work_queue(&work->irq_work);
>> +
>> +               /*
>> +                * The irq_work will release the mmap_sem with
>> +                * up_read_non_owner(). The rwsem_release() is called
>> +                * here to release the lock from lockdep's perspective.
>> +                */
>> +               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
> are you saying the above is enough coupled with up_read_non_owner?
> Looking at how amdgpu is using this api... I think they just use non_owner
> version when doing things from different task.
> So I don't think pairing non_owner with non_owner is strictly necessary.
> It seems fine to use down_read_trylock() with above rwsem_release() hack
> plus up_read_non_owner().
> Am I missing something?
>
The statement above is to clear the state for the lockdep so that it
knows that the task no longer owns the lock. Without doing that, there
is a possibility of some other kind of incorrect lockdep warning may be
produced because the code will still think the task hold a read-lock on
the mmap_sem. It is also possible no warning will be reported.

The bpf code is kind of special that it acquires the mmap_sem. Later on,
it either releases it itself (non-NMI) or request irq_work to release it
(NMI). So either, you use the _non_owner versions for both acquire and
release or fake the release like the code segment above.

Peter, please chime in if you have other suggestion.

Cheers,
Longman