Message-ID: <CAM0EoM=qvBxXS_1eheyhCKbNMRbK_qTTFMa1fFBFQp_hRbzpQQ@mail.gmail.com>
Date: Fri, 16 Aug 2024 11:04:25 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: yangzhuorao <alex000young@...il.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
xiyou.wangcong@...il.com, jiri@...nulli.us, davem@...emloft.net,
security@...nel.org, xkaneiki@...il.com, hackerzheng666@...il.com
Subject: Re: [PATCH] net: sched: use-after-free in tcf_action_destroy
On Fri, Aug 16, 2024 at 12:06 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Thu, Aug 15, 2024 at 9:54 PM yangzhuorao <alex000young@...il.com> wrote:
> >
> > There is a use-after-free bug in net/sched/act_api.c.
> > Thread 1 calls tcf_action_init_1 [1] to allocate an action, which is
> > saved in the actions array. If a later allocation fails, it takes the
> > err path. Meanwhile, thread 2 calls tcf_del_walker to delete the
> > action from the idr and frees it [2]. Thread 1 then runs
> > tcf_action_destroy on the err path and dereferences the freed
> > action [3], causing a use-after-free read.
> >
> > Thread1                                     Thread2
> > tcf_action_init
> >   for (i = 0; i < TCA_ACT_MAX_PRIO; i++)
> >     act = tcf_action_init_1        // [1]
> >     if (IS_ERR(act))
> >       goto err
> >     actions[i] = act
> >                                             tcf_del_walker
> >                                               idr_for_each_entry_ul(idr, p, id)
> >                                                 __tcf_idr_release(p, false, true)
> >                                                   free_tcf(p)  // [2]
> > err:
> >   tcf_action_destroy
> >     a = actions[i]
> >     ops = a->ops                   // [3]
> >
> > We add lock and unlock calls in the tcf_action_init and
> > tcf_del_walker functions to avoid the race condition.
> >
Hi Alex,
Thanks for your valiant effort; unfortunately, there is nothing to fix
here in current kernels.
For your edification:
This may have been an issue on the 4.x kernels you were running, but I
don't see a valid code path that would create the kernel parallelism
scenario you describe above (currently, or ever, actually). Kernel
entry is synchronized from user space via the rtnetlink lock, meaning
you can't have two control-plane threads (as shown in the nice diagram
in your commit message) entering from user space in parallel to
trigger the bug.
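To illustrate the point, here is a toy user-space analogue (this is
not the kernel code; every name in it is made up): because both
requests take the same lock for the whole operation, the way the
rtnetlink lock serializes the control plane, the walker's free can
never land between [1] and [3] on the error path.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for rtnl_lock: one lock held for the whole request. */
static pthread_mutex_t fake_rtnl_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_act { int id; };
static struct fake_act *slot;	/* stand-in for the idr entry */

/* "Thread 1": init an action, then hit the error path and read it back. */
static void *init_then_err_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fake_rtnl_lock);
	slot = malloc(sizeof(*slot));	/* the tcf_action_init_1 step [1] */
	if (!slot) {
		pthread_mutex_unlock(&fake_rtnl_lock);
		return NULL;
	}
	slot->id = 1;
	/* pretend a later allocation failed -> error path [3] */
	printf("err path still sees id %d\n", slot->id);
	free(slot);
	slot = NULL;
	pthread_mutex_unlock(&fake_rtnl_lock);
	return NULL;
}

/* "Thread 2": the walker/flush request that frees the entry [2]. */
static void *walker_free(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fake_rtnl_lock);
	free(slot);	/* slot is always NULL here; free(NULL) is a no-op */
	slot = NULL;
	pthread_mutex_unlock(&fake_rtnl_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, init_then_err_path, NULL);
	pthread_create(&t2, NULL, walker_free, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

With the outer lock, the free in walker_free can only run before [1]
or after [3], never in between; without it (which is what your diagram
assumes) the read at [3] could indeed hit freed memory.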
I believe the reason it happens in 4.x is that there is likely a bug
(hand-waving here) where, within a short window, you a) create a batch
of actions of the same kind, b) partially update some of those
actions, and then c) flush that action kind. My theory is that the
flush triggers the bug. IOW, it is not parallel entry but a single
entry path. I didn't have time to look, but whatever this bug is, it
was most certainly fixed at some point; perhaps nobody bothered to
backport the fix. If this fix is important to you, please dig into
newer kernels and backport it.
There are other technical issues with your solution, but I hope the
above helps. The repro doesn't cause any issues in newer kernels, but
please verify that for yourself as well.
So from me, I am afraid, this is a NACK.
cheers,
jamal