linux-kernel - Re: [syzbot] BUG: sleeping function called from invalid context in _copy_to


Message-ID: <0171777a-2f4b-2b0c-4887-86f6d8563bea@oracle.com>
Date:   Mon, 9 Aug 2021 13:18:17 -0700
From:   Shoaib Rao <rao.shoaib@...cle.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+8760ca6c1ee783ac4abd@...kaller.appspotmail.com>,
        andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
        christian.brauner@...ntu.com, cong.wang@...edance.com,
        daniel@...earbox.net, davem@...emloft.net, edumazet@...gle.com,
        jamorris@...ux.microsoft.com, john.fastabend@...il.com,
        kafai@...com, kpsingh@...nel.org, kuba@...nel.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        netdev@...r.kernel.org, shuah@...nel.org, songliubraving@...com,
        syzkaller-bugs@...glegroups.com, yhs@...com
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in
 _copy_to_iter


On 8/9/21 12:57 PM, Al Viro wrote:
> On Mon, Aug 09, 2021 at 12:16:27PM -0700, Shoaib Rao wrote:
>> On 8/9/21 11:06 AM, Dmitry Vyukov wrote:
>>> On Mon, 9 Aug 2021 at 19:33, Shoaib Rao <rao.shoaib@...cle.com> wrote:
>>>> This seems like a false positive. 1) The function will not sleep because
>>>> it only calls copy routine if the byte is present. 2). There is no
>>>> difference between this new call and the older calls in
>>>> unix_stream_read_generic().
>>> Hi Shoaib,
>>>
>>> Thanks for looking into this.
>>> Do you have any ideas on how to fix this tool's false positive? Tools
>>> with false positives are order of magnitude less useful than tools w/o
>>> false positives. E.g. do we turn it off on syzbot? But I don't
>>> remember any other false positives from "sleeping function called from
>>> invalid context" checker...
>> Before we take any action I would like to understand why the tool does not
>> single out other calls to recv_actor in unix_stream_read_generic(). The
>> context in all cases is the same. I also do not understand why the code
>> would sleep, Let's assume the user provided address is bad, the code will
>> return EFAULT, it will never sleep, if the kernel provided address is bad
>> the system will panic. The only difference I see is that the new code holds
>> 2 locks while the previous code held one lock, but the locks are acquired
>> before the call to copy.
>>
>> So please help me understand how the tool works. Even though I have
>> evaluated the code carefully, there is always a possibility that the tool is
>> correct.
> Huh???
>
> What do you mean "address is bad"?  "Address is inside an area mmapped from
> NFS file".  And it bloody well will sleep on attempt to read the page.
That is exactly what I said :-). There are times when the copying
thread/task may sleep because the page is not present, and it does not
have to be an NFS file: Linux supports mmap without backing memory, and
page faults occur with files all the time. By "bad address" I meant that
the user passes in an incorrect address.
>
> You should never, ever do copy_{to,from}_user() or equivalents while holding
> a spinlock, period.

Yes, a spinlock should not be held if the process can sleep. In this case
it won't, but there is no way to indicate that. Thanks for pointing that
out, as the second lock I am holding is indeed a spinlock (it is
accessed via unix_state_unlock, so I missed that it is a spinlock). I
will modify the code and resubmit. I am glad we found the root cause.

Shoaib
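
For illustration only, the rule Al states above (no copy_{to,from}_user()
under a spinlock) can be sketched in user-space C. This is not the actual
patch and not kernel code: struct demo_sock, demo_spin_lock(),
demo_spin_unlock(), demo_copy_to_user() and demo_recv_oob() are all
hypothetical stand-ins, and the spinlock is a plain C11 atomic_flag. The
idea shown is to snapshot the byte into a local while the lock is held,
drop the lock, and only then perform the copy that could fault and sleep.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-socket state (not the kernel's unix_sock). */
struct demo_sock {
    atomic_flag lock;   /* stand-in for a kernel spinlock */
    bool has_oob;       /* is an out-of-band byte pending? */
    char oob_byte;
};

static void demo_spin_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void demo_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for copy_to_user(): the real one may page-fault and sleep. */
static int demo_copy_to_user(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Returns 1 if a byte was delivered, 0 if none was pending, -1 on copy error. */
int demo_recv_oob(struct demo_sock *sk, char *user_buf)
{
    char tmp = 0;
    bool present;

    /* Critical section: touch only kernel-side state, never user memory. */
    demo_spin_lock(&sk->lock);
    present = sk->has_oob;
    if (present) {
        tmp = sk->oob_byte;
        sk->has_oob = false;
    }
    demo_spin_unlock(&sk->lock);

    if (!present)
        return 0;

    /* The potentially sleeping copy happens with no spinlock held. */
    return demo_copy_to_user(user_buf, &tmp, 1) == 0 ? 1 : -1;
}
```

One trade-off in this sketch: the byte is consumed before the copy is
known to have succeeded, so a real implementation would have to decide
what to do with the data if the copy fails.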