netdev - Re: [PATCH net] net/sched: sch_taprio: fix possible use-after-free


Message-ID: <CANn89i+42Yk50N+D9KQmm+gvO84Wjnmk8WJa2mk++-kXy5CvEQ@mail.gmail.com>
Date:   Mon, 16 Jan 2023 10:36:28 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        eric.dumazet@...il.com, syzbot <syzkaller@...glegroups.com>,
        Alexander Potapenko <glider@...gle.com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>
Subject: Re: [PATCH net] net/sched: sch_taprio: fix possible use-after-free

On Mon, Jan 16, 2023 at 1:35 AM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> > syzbot reported a nasty crash [1] in net_tx_action() which
> > made little sense until we got a repro.
> >
> > This repro installs a taprio qdisc, but providing an
> > invalid TCA_RATE attribute.
> >
> > qdisc_create() has to destroy the just initialized
> > taprio qdisc, and taprio_destroy() is called.
> >
> > However, the hrtimer used by taprio had already fired,
> > therefore advance_sched() called __netif_schedule().
> >
> > Then net_tx_action was trying to use a destroyed qdisc.
> >
> > We can not undo the __netif_schedule(), so we must wait
> > until one cpu serviced the qdisc before we can proceed.
> >
>
> This workaround looks a bit ugly. I think we _may_ be able to make
> hrtimer_start() as the last step of the initialization, IOW, move other
> validations and allocations before it.
>

taprio_init() detects no error.

So moving around the hrtimer_start() inside it won't help.

The error comes later, when the invalid TCA_RATE attribute is processed:

static struct Qdisc *qdisc_create(...
...
    err = gen_new_estimator(...);
    if (err) {
        NL_SET_ERR_MSG(extack, "Failed to generate new estimator");
        goto err_out4;
    }

...

err_out4:
    qdisc_put_stab(rtnl_dereference(sch->stab));
    if (ops->destroy)
        ops->destroy(sch);
    goto err_out3;

This is why we need to make sure ->destroy will fully undo what ->init did,
including the possible fact that the hrtimer already fired.
This seems to be taprio specific.
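
To make the ordering concrete, here is a minimal single-threaded userspace
model of the sequence (not kernel code). Every name in it (model_qdisc,
model_replay(), ...) is hypothetical; the flags only stand in for the
hrtimer and for the pending __netif_schedule(). "Waiting" is modelled by
servicing the qdisc inline, since there is no second cpu in the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_qdisc {
    bool timer_armed;   /* stands in for the taprio hrtimer */
    bool sched_pending; /* stands in for a pending __netif_schedule() */
    bool destroyed;     /* set once ->destroy has run */
    int  uaf_count;     /* times the TX path touched a destroyed qdisc */
};

static void model_init(struct model_qdisc *q)
{
    *q = (struct model_qdisc){0};
    q->timer_armed = true;          /* init arms the timer right away */
}

/* Timer callback: schedules the qdisc for the TX path. */
static void model_timer_fire(struct model_qdisc *q)
{
    if (q->timer_armed)
        q->sched_pending = true;
}

/* TX path: services a scheduled qdisc, counting use after destroy. */
static void model_tx_action(struct model_qdisc *q)
{
    if (!q->sched_pending)
        return;
    if (q->destroyed)
        q->uaf_count++;
    q->sched_pending = false;
}

static void model_destroy(struct model_qdisc *q, bool wait_for_service)
{
    assert(!q->destroyed);
    q->timer_armed = false;         /* cancel the timer */
    if (wait_for_service) {
        /* The schedule request cannot be undone, so let it be serviced. */
        while (q->sched_pending)
            model_tx_action(q);
    }
    q->destroyed = true;
}

/* Replays: init, timer fires, a later step fails, destroy, TX path runs. */
static int model_replay(bool wait_for_service)
{
    struct model_qdisc q;

    model_init(&q);
    model_timer_fire(&q);
    model_destroy(&q, wait_for_service);
    model_tx_action(&q);
    return q.uaf_count;
}
```

In this model, model_replay(false) counts one access to the destroyed
object, and model_replay(true) counts none.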

Or we would need a new method, like ->post_init(), that would be
called once all steps have succeeded.
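
A rough userspace sketch of that idea, assuming a hypothetical optional
post_init hook in the ops table. The hook_* names and the later_step_err
parameter are invented for illustration, and the create function is a
simplified stand-in, not the real qdisc_create().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct hook_qdisc {
    int timer_started;
    int destroyed;
};

struct hook_ops {
    int  (*init)(struct hook_qdisc *q);
    void (*post_init)(struct hook_qdisc *q); /* hypothetical, optional */
    void (*destroy)(struct hook_qdisc *q);
};

static int hook_taprio_init(struct hook_qdisc *q)
{
    q->timer_started = 0;   /* init no longer starts the timer */
    q->destroyed = 0;
    return 0;
}

static void hook_taprio_post_init(struct hook_qdisc *q)
{
    q->timer_started = 1;   /* only reached when creation succeeded */
}

static void hook_taprio_destroy(struct hook_qdisc *q)
{
    q->destroyed = 1;
}

static const struct hook_ops hook_taprio_ops = {
    .init      = hook_taprio_init,
    .post_init = hook_taprio_post_init,
    .destroy   = hook_taprio_destroy,
};

/* later_step_err stands in for a failure after init (e.g. the estimator). */
static int hook_create(const struct hook_ops *ops, struct hook_qdisc *q,
                       int later_step_err)
{
    int err;

    assert(ops->init != NULL);
    err = ops->init(q);
    if (err)
        return err;

    if (later_step_err) {
        if (ops->destroy)
            ops->destroy(q);
        return later_step_err;
    }

    if (ops->post_init)
        ops->post_init(q);  /* all steps succeeded */
    return 0;
}
```

With this shape, a failure after init reaches destroy with the timer never
started, so there is nothing in flight to wait for.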

Or call hrtimer_start() at the first taprio_enqueue(), adding a
conditional in the fast path...
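
Sketched in userspace, assuming hypothetical lazy_* names (this is not the
taprio code), the cost is one extra test per enqueued packet:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the counters stand in for the timer and the queue. */
struct lazy_qdisc {
    bool timer_started;
    int  timer_starts;  /* how many times the timer was started */
    int  queued;        /* packets accepted so far */
};

static void lazy_init(struct lazy_qdisc *q)
{
    q->timer_started = false;   /* init leaves the timer alone */
    q->timer_starts = 0;
    q->queued = 0;
}

static void lazy_start_timer(struct lazy_qdisc *q)
{
    assert(!q->timer_started);
    q->timer_started = true;
    q->timer_starts++;
}

static int lazy_enqueue(struct lazy_qdisc *q)
{
    if (!q->timer_started)      /* the extra fast-path conditional */
        lazy_start_timer(q);
    q->queued++;
    return 0;
}
```

The timer starts exactly once, on the first packet, and a qdisc destroyed
before any enqueue never has a timer to cancel.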

> Can you share your reproducer?

Not publicly.

Although I think the bug is clear enough.