linux-kernel - Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

Message-ID: <816870e9-9354-ffbd-936b-40e38e4276a4@codethink.co.uk>
Date:   Fri, 12 Mar 2021 16:34:23 +0000
From:   Ben Dooks <ben.dooks@...ethink.co.uk>
To:     Dmitry Vyukov <dvyukov@...gle.com>
Cc:     syzbot <syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Benjamin Segall <bsegall@...gle.com>, dietmar.eggemann@....com,
        Juri Lelli <juri.lelli@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [syzbot] BUG: unable to handle kernel access to user memory in
 schedule_tail

On 12/03/2021 16:30, Ben Dooks wrote:
> On 12/03/2021 15:12, Dmitry Vyukov wrote:
>> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.dooks@...ethink.co.uk> wrote:
>>>
>>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
>>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot
>>>> <syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>>>>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>>>>> userspace arch: riscv64
>>>>>
>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com
>>>>
>>>> +riscv maintainers
>>>>
>>>> This is riscv64-specific.
>>>> I've seen similar crashes in put_user in other places. It looks like
>>>> put_user crashes if the user address is not mapped/protected (?).
>>>
>>> I've been having a look, and this seems to be down to access of the
>>> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
>>> bad address to clone?
>>>
>>> From looking at the code, the put_user() code should have set the
>>> relevant SR_SUM bit (the value for this, 1<<18, is in the s2
>>> register in the crash report), and from looking at the compiler
>>> output from my gcc-10, the code looks to be doing the relevant csrs
>>> and then csrc around the put_user.
>>>
>>> So currently I do not understand how the above could have happened
>>> other than something re-trying the code sequence and ending up
>>> retrying the faulting instruction without the SR_SUM bit set.
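A minimal userspace model of the put_user() bracket described above, assuming only what the thread states: SR_SUM is bit 18 (1<<18) of sstatus, csrs sets it before the user access and csrc clears it afterwards. Per the RISC-V privileged spec, a supervisor-mode access to a user page faults while SUM is clear. The model_* names and the model_sstatus variable are hypothetical stand-ins, not kernel code.

```c
/* SR_SUM is bit 18 of sstatus (1 << 18, the value seen in s2 in the report). */
#define SR_SUM (1UL << 18)

/* Hypothetical stand-in for the sstatus CSR; userspace model only. */
static unsigned long model_sstatus;

/* Model of "csrs sstatus, SR_SUM": permit S-mode access to user pages. */
static void model_csrs_sum(void) { model_sstatus |= SR_SUM; }

/* Model of "csrc sstatus, SR_SUM": forbid S-mode access to user pages again. */
static void model_csrc_sum(void) { model_sstatus &= ~SR_SUM; }

/* Returns nonzero if an S-mode store to a user page is permitted right now,
 * zero if it would fault (SUM clear). */
static int model_store_to_user_ok(void) { return (model_sstatus & SR_SUM) != 0; }

/* Model of the put_user() sequence: csrs, store, csrc.
 * Returns whether the store inside the bracket was permitted. */
static int model_put_user(void)
{
    model_csrs_sum();
    int ok = model_store_to_user_ok();
    model_csrc_sum();
    return ok;
}
```

Starting from model_sstatus == 0, model_put_user() returns 1, but calling model_store_to_user_ok() afterwards returns 0: re-executing the store once the csrc has run is exactly the "retried without the SR_SUM bit set" case.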
>>
>> I would maybe blame qemu for randomly resetting SR_SUM, but it's
>> strange that 99% of these crashes are in schedule_tail. If it were
>> qemu, they would be more evenly distributed...
>>
>> Another observation: looking at a dozen crash logs, in none of
>> these cases was the fuzzer actually trying to fuzz clone with some
>> insane arguments. So it looks like completely normal clone() calls
>> (e.g. coming from pthread_create) result in this crash.
>>
>> I also wonder why there is a ret_from_exception; is that normal? I see
>> that handle_exception disables SR_SUM:
>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
>>
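A sketch of the trap path in the same style, assuming the entry code saves the interrupted sstatus into pt_regs before clearing SR_SUM for the handler, and that the return path writes the saved value back. struct model_regs and the model_trap_* functions are hypothetical, cut-down stand-ins. Under that assumption, a fault taken inside the put_user() bracket resumes with SR_SUM set again.

```c
#define SR_SUM (1UL << 18)

/* Hypothetical, cut-down stand-in for struct pt_regs: only the saved status. */
struct model_regs {
    unsigned long status;
};

/* Hypothetical stand-in for the sstatus CSR. */
static unsigned long model_sstatus;

/* Model of trap entry: save the interrupted sstatus into regs->status,
 * then clear SUM so the handler itself runs without user access. */
static void model_trap_entry(struct model_regs *regs)
{
    regs->status = model_sstatus;
    model_sstatus &= ~SR_SUM;
}

/* Model of the exception return path: restore the saved status, so the
 * interrupted code resumes with whatever SUM value it had before the trap. */
static void model_trap_return(const struct model_regs *regs)
{
    model_sstatus = regs->status;
}
```

If entry and return really pair up like this, SR_SUM being clear while the handler runs is expected, and the faulting store should only be retried without it if the saved status was lost or overwritten somewhere in between.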
> 
> So I think that if SR_SUM is set, access to user memory faults, and
> the _user() routines clear it to allow them access.
> 
> I'm thinking there is at least one issue here:
> 
> - the test in the fault handler is the wrong way around for die_kernel_fault()
> - the handler only catches this if the page has yet to be mapped.
> 
> So I think the test should be:
> 
>          if (!user_mode(regs) && addr < TASK_SIZE &&
>                          unlikely(regs->status & SR_SUM))
> 
> This then should continue on and allow the rest of the handler to
> complete mapping the page if it is not there.
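The proposed condition can be written as a plain predicate next to its negation, which is what "the wrong way around" suggests the existing check is (an assumption in this sketch). MODEL_TASK_SIZE is a hypothetical placeholder, not the real riscv TASK_SIZE, and both function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)
/* Hypothetical placeholder; the real TASK_SIZE depends on the kernel config. */
#define MODEL_TASK_SIZE UINT64_C(0x4000000000)

/* Assumed shape of the existing check: die when SR_SUM is clear. */
static bool die_when_sum_clear(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < MODEL_TASK_SIZE && !(status & SR_SUM);
}

/* The condition proposed above: die when SR_SUM is set. */
static bool die_when_sum_set(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < MODEL_TASK_SIZE && (status & SR_SUM) != 0;
}
```

For a kernel-mode fault on a user address the two predicates always give opposite answers, while user-mode faults and kernel addresses never hit either one; so the choice only decides which SR_SUM state goes on to the rest of the handler.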
> 
> I have been trying to create a very simple clone test, but so far it
> has yet to actually trigger anything.

I should have added that there doesn't seem to be a good way to use
mmap() to allocate memory without a VM mapping being inserted after
the mmap() call.
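A minimal sketch of the kind of "very simple clone test" mentioned above, assuming glibc's clone() wrapper on Linux. It passes CLONE_CHILD_SETTID, so the kernel stores the child's TID through tsk->set_child_tid from schedule_tail(), and points that store into an anonymous mapping the test never touches beforehand. settid_child and settid_roundtrip are hypothetical names. On a working kernel it simply returns 0; it is not a known reproducer for this report.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* The child does nothing itself: the write under test is done by the kernel,
 * which stores the child's TID at the CLONE_CHILD_SETTID address. */
static int settid_child(void *arg)
{
    (void)arg;
    return 0;
}

/* Hypothetical helper: clone() a child with CLONE_CHILD_SETTID pointing into
 * an anonymous mapping this process has not written to, wait for the child,
 * and check that the kernel stored its TID there.
 * Returns 0 on success, -1 on any failure. */
static int settid_roundtrip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    pid_t *ctid = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ctid == MAP_FAILED) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }

    /* CLONE_VM shares the address space, so the parent can read the TID the
     * kernel stores for the child; SIGCHLD lets waitpid() reap it normally.
     * The stack grows down on riscv64 and x86-64, so pass the top of it. */
    pid_t pid = clone(settid_child, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_CHILD_SETTID | SIGCHLD, NULL,
                      NULL, NULL, ctid);

    int rc = -1;
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
            *ctid == pid)
            rc = 0;
    }

    munmap(ctid, (size_t)page);
    munmap(stack, CHILD_STACK_SIZE);
    return rc;
}
```

Calling settid_roundtrip() in a loop from main() is one way to exercise the schedule_tail() put_user() path repeatedly; there is no guarantee it triggers the crash.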


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html