linux-kernel - Re: BUG in alloc_workqueue (linux-next)

Message-ID: <CAJhGHyBrm0iowGdX8=NDr=tBG8qM8rke2ouxWVhJRTP+pxXGJw@mail.gmail.com>
Date:   Fri, 9 Jul 2021 11:59:01 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Pavel Skripkin <paskripkin@...il.com>
Cc:     Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        Yang Yingliang <yangyingliang@...wei.com>,
        Xu Qiang <xuqiang36@...wei.com>
Subject: Re: BUG in alloc_workqueue (linux-next)

Hello, Pavel
Thanks for the report.

Huawei (CC-ed) is also dealing with the problem:
https://lore.kernel.org/lkml/20210708093136.2195752-1-yangyingliang@huawei.com/t/#u


Could you give the fix a try, please?

Thanks
Lai

On Thu, Jul 8, 2021 at 9:24 PM Pavel Skripkin <paskripkin@...il.com> wrote:

>
> I've spent some time trying to come up with a fix, but I gave
> up :( But! I have an idea about what's happening, maybe it will help
> somehow...
>
>
> So, all 3 reports have the same stack trace: alloc_workqueue() in
> loop_configure(). I skimmed through syzbot's log and found that syzbot injected
> a failure into alloc_unbound_pwq() in all 3 cases:
>
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> CPU: 1 PID: 17986 Comm: syz-executor.0 Tainted: G        W         5.13.0-next-20210706 #9
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> Call Trace:
>    dump_stack_lvl (lib/dump_stack.c:106 (discriminator 4))
>    should_fail.cold (lib/fault-inject.c:52 lib/fault-inject.c:146)
>    should_failslab (mm/slab_common.c:1327)
>    kmem_cache_alloc_node (mm/slab.h:487 mm/slub.c:2902 mm/slub.c:3017)
>    ? alloc_unbound_pwq (kernel/workqueue.c:3813)
>    alloc_unbound_pwq (kernel/workqueue.c:3813)
>    apply_wqattrs_prepare (kernel/workqueue.c:3963)
>    apply_workqueue_attrs_locked (kernel/workqueue.c:4041)
>    alloc_workqueue (kernel/workqueue.c:4078 kernel/workqueue.c:4201 kernel/workqueue.c:4309)
>
>
> So, if alloc_unbound_pwq() fails, apply_wqattrs_prepare() will jump to
> this code:
>
> out_free:
>         free_workqueue_attrs(tmp_attrs);
>         free_workqueue_attrs(new_attrs);
>         apply_wqattrs_cleanup(ctx);     <----|
>         return NULL;                         |
>                                              |
> put_pwq_unlocked() -> put_pwq() -> schedule_work(&pwq->unbound_release_work);
>
>
> and apply_wqattrs_cleanup() will schedule pwq_unbound_release_workfn()
> [2], but alloc_workqueue() will free the workqueue_struct in case of an
> alloc_unbound_pwq() error [1]. In that case we will get a UAF in
> pwq_unbound_release_workfn(), as in the 3rd report.
>
>
> Does the above make sense? :)
>
>
>
> With regards,
> Pavel Skripkin
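
The ordering described in the quoted analysis can be sketched in userspace C. This is only a toy model under stated assumptions, not kernel code: `toy_wq`, `toy_pwq` and every `toy_*` function are hypothetical stand-ins, the deferred work is modelled with a flag instead of schedule_work(), and the workqueue is marked as freed with a flag rather than actually freed, so the sketch itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for workqueue_struct. */
struct toy_wq {
	bool freed;		/* set instead of calling free() */
};

/* Hypothetical stand-in for pool_workqueue. */
struct toy_pwq {
	struct toy_wq *wq;	/* back-pointer to the owning workqueue */
	int refcnt;
	bool release_queued;	/* models a queued unbound_release_work */
};

/* Models put_pwq(): dropping the last reference only queues the release. */
static void toy_put_pwq(struct toy_pwq *pwq)
{
	assert(pwq->refcnt > 0);
	if (--pwq->refcnt == 0)
		pwq->release_queued = true;
}

/*
 * Models the deferred release work running later.
 * Returns true if it would dereference an already freed workqueue.
 */
static bool toy_release_workfn(struct toy_pwq *pwq)
{
	pwq->release_queued = false;
	return pwq->wq->freed;
}

/* Replays the error path in the order given in the quoted message. */
static bool toy_failed_alloc_hits_freed_wq(void)
{
	struct toy_wq wq = { .freed = false };
	struct toy_pwq pwq = { .wq = &wq, .refcnt = 1, .release_queued = false };

	toy_put_pwq(&pwq);	/* cleanup step: release work gets queued */
	wq.freed = true;	/* error path: the workqueue is freed */

	/* The queued work runs only now, after the owner is gone. */
	return pwq.release_queued && toy_release_workfn(&pwq);
}
```

In this model toy_failed_alloc_hits_freed_wq() returns true, which corresponds to the use-after-free suspected above: the release is deferred, while the free of the owner is immediate.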