linux-kernel - Re: [syzbot] kernel panic: corrupted stack end in openat

Message-ID: <CACT4Y+Y+x=Lj6dC+ozvttq_RNrSUzX+pQmyg8N9ELokr5ce0Mg@mail.gmail.com>
Date:   Wed, 17 Mar 2021 09:50:20 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Russell King - ARM Linux admin <linux@...linux.org.uk>,
        syzbot <syzbot+0b06ef9b44d00d600183@...kaller.appspotmail.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
Subject: Re: [syzbot] kernel panic: corrupted stack end in openat

On Wed, Mar 17, 2021 at 9:32 AM Arnd Bergmann <arnd@...db.de> wrote:
> > > > > <linux@...linux.org.uk> wrote:
> > > > > > On Tue, Mar 16, 2021 at 04:44:45PM +0100, Arnd Bergmann wrote:
> > > > > > > On Tue, Mar 16, 2021 at 11:17 AM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > > > > > > > The compiler is gcc version 10.2.1 20210110 (Debian 10.2.1-6)
> > > > > > >
> > > > > > > Ok, building with Ubuntu 10.2.1-1ubuntu1 20201207 locally, that's
> > > > > > > the closest I have installed, and I think the Debian and Ubuntu versions
> > > > > > > are generally quite close in case of gcc since they are maintained by
> > > > > > > the same packagers.
> > > > > >
> > > > > > ... which shouldn't be a problem - that's just over 1/4 of the stack
> > > > > > space. Could it be the syzbot's gcc is doing something weird and
> > > > > > inflating the stack frames?
> > > > >
> > > > > > It's possible, but I think that's really unlikely given that it's just
> > > > > > Debian's gcc, which is as close to mainline as the version I was using.
> > > > >
> > > > > Uwe's DEBUG_STACKOVERFLOW patch from a while ago might
> > > > > help if this was the problem though:
> > > > > https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koenig@pengutronix.de/
> > > > >
> > > > > My best guess is something going wrong in the interrupt
> > > > > that triggered the preempt_schedule() which ended up calling
> > > > > task_stack_end_corrupted() in schedule_debug(), as you suggested
> > > > > earlier.
> > > >
> > > > FWIW I see slightly larger frames with the config:
> > > >
> > > > 8073ab64 <ima_calc_field_array_hash_tfm>:
> > > > 8073ab64:       e1a0c00d        mov     ip, sp
> > > > 8073ab68:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
> > > > 8073ab6c:       e24cb004        sub     fp, ip, #4
> > > > 8073ab70:       e24ddfa7        sub     sp, sp, #668    ; 0x29c
> > >
> > > Yes, this is the one that the compiler complained about when warning
> > > for stack over 600 bytes. It's not called in this call chain though.
> > >
> > > > page_alloc can also do reclaim, I had the impression that reclaim can
> > > > be quite heavy-weight in all respects.
> > >
> > > Yes, that is another possibility. What writable file systems or swap
> > > do you normally have mounted that it could be writing to, and on
> > > what storage device?
> >
> > The root fs is ext4 on virtio-blk.
> >
> > There are also several dozens of shrinkers that can be called during reclaim:
> > https://elixir.bootlin.com/linux/latest/C/ident/unregister_shrinker
>
> Right, unfortunately I don't see a smoking gun there either, unless you are
> also using NFS or devicemapper.
>
> Implementing VMAP_STACK as you suggested earlier is probably the
> best way to figure out if there is an actual overrun of the stack.
> Alternatively, adding support for GCC_PLUGIN_STACKLEAK might
> also help find out if we ever get close to the limit. This is probably
> less work, but it might not actually help in this case.

VMAP_STACK is quite intrusive as far as I understand. For KASAN I
considered a simpler option: a debug config that allocates an
extra page after the stack and mprotect's it. It wastes a physical
page per task (fine for a debug config), but I would assume it is
radically simpler to implement. In the end somebody implemented proper
VMAP_STACK support for KASAN, but I still think the extra page may be a
reasonable compromise between time investment and value.
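
To make the guard-page idea concrete, here is a minimal userspace sketch in
C. It is not kernel code and not what any of the patches mentioned in this
thread do. The names guarded_stack, guarded_stack_alloc, guarded_stack_free
and write_faults are hypothetical, and the sketch assumes a downward-growing
stack (as on ARM), so the extra page sits at the lowest address of the
mapping. A write that runs past the end of the usable area then faults
immediately, instead of silently corrupting neighbouring memory and being
noticed only at a later check.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical debug-only stack: a usable area plus one inaccessible page. */
struct guarded_stack {
    unsigned char *base;  /* start of the mapping, i.e. the guard page */
    unsigned char *low;   /* lowest usable byte, right above the guard */
    size_t size;          /* usable bytes */
    size_t page;          /* page size used for the guard */
};

/* Map size + one page and revoke all access to the lowest page.
 * Returns 0 on success, -1 on failure. */
int guarded_stack_alloc(struct guarded_stack *gs, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, size + (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* The guard: any load or store to this page now faults. */
    if (mprotect(p, (size_t)page, PROT_NONE) != 0) {
        munmap(p, size + (size_t)page);
        return -1;
    }

    gs->base = p;
    gs->low = (unsigned char *)p + page;
    gs->size = size;
    gs->page = (size_t)page;
    return 0;
}

void guarded_stack_free(struct guarded_stack *gs)
{
    munmap(gs->base, gs->size + gs->page);
}

/* Write one byte to addr in a forked child so the parent survives a fault.
 * Returns 1 if the child was killed by SIGSEGV/SIGBUS, 0 if the write
 * succeeded, -1 on any other outcome. */
int write_faults(volatile unsigned char *addr)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        *addr = 0xAA;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        return (sig == SIGSEGV || sig == SIGBUS) ? 1 : -1;
    }
    return 0;
}
```

After a successful guarded_stack_alloc(&gs, 16 * 1024), calling
write_faults(gs.low) should return 0 (the lowest usable byte is writable),
while write_faults(gs.low - 1) should return 1 (one byte further down is
inside the guard page). The cost is the same trade-off described above: one
extra page per stack in exchange for an immediate, precise fault.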