linux-kernel - Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2267b15a-6ed1-f24e-32fe-7f905c532b00@ghiti.fr>
Date:   Tue, 16 Jun 2020 15:18:21 -0400
From:   Alex Ghiti <alex@...ti.fr>
To:     Palmer Dabbelt <palmer@...belt.com>, atishp@...shpatra.org
Cc:     Bjorn Topel <bjorn.topel@...il.com>, shorne@...il.com,
        walken@...gle.com, linux-riscv@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V

Le 6/16/20 à 2:07 PM, Palmer Dabbelt a écrit :
> On Tue, 16 Jun 2020 10:54:51 PDT (-0700), atishp@...shpatra.org wrote:
>> On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse <walken@...gle.com> 
>> wrote:
>>>
>>> I am also unable to reproduce the issue so far.
>>>
>>> I wanted to point to a few things in case this helps:
>>> - Commit 42fc541404f2 was bisected as the cause. This commit changes
>>> walk_page_range_novma() to use mmap_assert_locked() instead of
>>> lockdep_assert_held()
>>> - mmap_assert_locked() checks lockdep_assert_held(), but also checks
>>> that the rwsem itself is locked.
>>>
>>> Now how could lockdep think the lock is held, but the lock itself is
>>> not marked as locked ???
>>>
>>> I'm not sure if it helps at all, but a few commits earlier,
>>> 0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
>>> exclusively by stackmap, and does the opposite: it acquires the mmap
>>> lock without telling lockdep about it. I can't see any smoking gun
>>> linking this to our bug, but I thought it may be worth mentioning as
>>> it involves the same suspects (stackmap and the difference between
>>> owning the lock vs lockdep thinking we own the lock).
>>>
>>> I'm sorry, that's only how far I was able to go on this bug - I'm not
>>> sure how to investigate it further as I can not reproduce the issue...
>>>
>>> On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt <palmer@...belt.com> 
>>> wrote:
>>> >
>>> > On Mon, 15 Jun 2020 21:51:08 PDT (-0700), shorne@...il.com wrote:
>>> > > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
>>> > >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
>>> > >> > Hi,
>>> > >> > I encountered the following issue while booting 5.8-rc1 on 
>>> Qemu for RV64.
>>> > >> > I added additional dump_stack and observed that it's 
>>> happening in bpf free path.
>>> > >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
>>> > >> > compiled away without that.
>>> > >> > 
>>> ------------------------------------------------------------------------ 
>>>
>>> > >> > forked to background, child pid 113
>>> > >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
>>> > >> > 5.8.0-rc1-dirty #732
>>> > >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
>>> > >> > [   10.334133] Call Trace:
>>> > >> > [   10.338039] [<ffffffe000202698>] walk_stackframe+0x0/0xa4
>>> > >> > [   10.339988] [<ffffffe000202880>] show_stack+0x2e/0x3a
>>> > >> > [   10.340902] [<ffffffe00047074c>] dump_stack+0x72/0x8c
>>> > >> > [   10.341451] [<ffffffe0002db4ce>] 
>>> mmap_assert_locked.part.13+0x14/0x1c
>>> > >> > [   10.342131] [<ffffffe0002db330>] 
>>> walk_page_range_novma+0x0/0x4e
>>> > >> > [   10.342973] [<ffffffe000204f94>] 
>>> set_direct_map_invalid_noflush+0x66/0x6e
>>> > >> > [   10.343917] [<ffffffe0002e0706>] __vunmap+0xe8/0x212
>>> > >> > [   10.344680] [<ffffffe0002e0882>] __vfree+0x22/0x6e
>>> > >> > [   10.345270] [<ffffffe0002e0902>] vfree+0x34/0x56
>>> > >> > [   10.345834] [<ffffffe00027d752>] __bpf_prog_free+0x2c/0x36
>>> > >> > [   10.346529] [<ffffffe0002801a2>] 
>>> bpf_prog_free_deferred+0x74/0x8a
>>> > >> > [   10.347394] [<ffffffe000219c70>] process_one_work+0x13a/0x272
>>> > >> > [   10.348239] [<ffffffe00021a4b4>] worker_thread+0x50/0x2e4
>>> > >> > [   10.348900] [<ffffffe00021ed98>] kthread+0xfc/0x10a
>>> > >> > [   10.349470] [<ffffffe0002013da>] ret_from_exception+0x0/0xc
>>> > >> > [   10.354405] mm ffffffe001018600 mmap 0000000000000000 
>>> seqnum 0 task_size 0
>>> > >> > [   10.354405] get_unmapped_area 0000000000000000
>>> > >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
>>> > >> > [   10.354405] pgd ffffffe001074000 mm_users 2 mm_count 1
>>> > >> > pgtables_bytes 8192 map_count 0
>>> > >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
>>> > >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
>>> > >> > [   10.354405] start_code ffffffe000200000 end_code 
>>> ffffffe00084acc2
>>> > >> > start_data 0 end_data ffffffe00106dfe4
>>> > >> > [   10.354405] start_brk 0 brk ffffffe0010bd6d0 start_stack 0
>>> > >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
>>> > >> > [   10.354405] binfmt 0000000000000000 flags 0 core_state 
>>> 0000000000000000
>>> > >> > [   10.354405] ioctx_table 0000000000000000
>>> > >> > [   10.354405] exe_file 0000000000000000
>>> > >> > [   10.354405] tlb_flush_pending 0
>>> > >> > [   10.354405] def_flags: 0x0()
>>> > >> > [   10.369325] ------------[ cut here ]------------
>>> > >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
>>> > >> > [   10.375235] Kernel BUG [#1]
>>> > >> > [   10.377198] Modules linked in:
>>> > >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 
>>> 5.8.0-rc1-dirty #732
>>> > >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
>>> > >> > [   10.381270] epc: ffffffe0002db4d4 ra : ffffffe0002db4d4 sp 
>>> : ffffffe3eaea7c70
>>> > >> > [   10.382561]  gp : ffffffe00106d950 tp : ffffffe3ef752f80 t0 :
>>> > >> > ffffffe0010836e8
>>> > >> > [   10.383996]  t1 : 0000000000000064 t2 : 0000000000000000 s0 :
>>> > >> > ffffffe3eaea7c90
>>> > >> > [   10.385119]  s1 : ffffffe001018600 a0 : 0000000000000289 a1 :
>>> > >> > 0000000000000020
>>> > >> > [   10.386099]  a2 : 0000000000000005 a3 : 0000000000000000 a4 :
>>> > >> > ffffffe001012758
>>> > >> > [   10.387294]  a5 : 0000000000000000 a6 : 0000000000000102 a7 :
>>> > >> > 0000000000000006
>>> > >> > [   10.388265]  s2 : ffffffe3f00674c0 s3 : ffffffe00106e108 s4 :
>>> > >> > ffffffe00106e100
>>> > >> > [   10.389250]  s5 : ffffffe00106e908 s6 : 0000000000000000 s7 :
>>> > >> > 6db6db6db6db6db7
>>> > >> > [   10.390272]  s8 : 0000000000000001 s9 : ffffffe00021a4f8 s10:
>>> > >> > ffffffffffffffff
>>> > >> > [   10.391293]  s11: ffffffe3f0066600 t3 : 000000000001a7a8 t4 :
>>> > >> > 000000000001a7a8
>>> > >> > [   10.392314]  t5 : 0000000000000000 t6 : ffffffe00107b76b
>>> > >> > [   10.393096] status: 0000000000000120 badaddr: 
>>> 0000000000000000
>>> > >> > cause: 0000000000000003
>>> > >> > [   10.397755] ---[ end trace 861659596ac28841 ]---
>>> > >> > 
>>> ---------------------------------------------------------------------------------------------------
>>> > >> >
>>> > >> > I haven't had the chance to bisect to figure out which commit 
>>> caused
>>> > >> > the issue. Just wanted
>>> > >> > to check if it is a known issue already.
>>> > >>
>>> > >> Hi Atish,
>>> > >>
>>> > >> Note, I am getting the same (just now) when booting v5.8-rc1 on 
>>> OpenRISC.  If
>>> > >> you have any updates please post back.  I will try to look into 
>>> this today or
>>> > >> tomorrow.
>>> > >
>>> > > I have bisected this to, 42fc541404f249778e752ab39c8bc25fcb2dbe1e:
>>> > >
>>> > >   mmap locking API: add mmap_assert_locked() and 
>>> mmap_assert_write_locked()
>>> > >
>>> > > This should have just changed the existing lockdep api's but 
>>> something has
>>> > > changed.  I haven't had time to look at it yet.
>>> > >
>>> > > Ccing: Michel Lespinasse <walken@...gle.com>
>>> >
>>> > This isn't manifesting on boot for me, on either rc1 or that 
>>> commit.  I'm
>>> > running a simple buildroot-based userspace, so I doubt anything is 
>>> triggering
>>> > BPF.  I don't run the BPF selftests, as they're a bit of a pain 
>>> (IIRC they
>>> > don't cross compile and need LLVM) -- does anyone have a userspace 
>>> I can use to
>>> > trigger the bug?
>>>
>> I am also using buildroot based userspace but it's a bit bulky because
>> of my config.
>> You can access it from here:
>> https://wdc.box.com/s/r8j0d5ynp5gr27n2wo124xi9t8fp0tls
>
> I'm up to ~900M trying to get enough of a userspace to run the BPF 
> selftests,
> so this would be way better.  Unfortunately I'm not actually getting the
> crash with your rootfs.
>

I have what looks like the same bug by simply using the TEST_BPF config, 
with a very small rootfs
generated by buildroot if that can help.


>>
>> A defconfig build & boot in Qemu with above userspace is sufficient to
>> trigger the bug.
>>
>> FYI: I noticed the kernel bug message every time during ssh-key
>> generation. Not sure if that is related.
>>
>>>
>>>
>>> -- 
>>> Michel "Walken" Lespinasse
>>> A program is never fully debugged until the last user dies.
>