netdev - Re: [PATCH] net: sched: use-after-free in tcf_action

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFC++j15p9Ey3qc4ZsY4CXBsL3LHn7TsFTi6=N9=H+_Yx_k=+Q@mail.gmail.com>
Date: Sat, 17 Aug 2024 17:27:17 +0800
From: Alex Young <alex000young@...il.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	xiyou.wangcong@...il.com, jiri@...nulli.us, davem@...emloft.net, 
	security@...nel.org, xkaneiki@...il.com, hackerzheng666@...il.com
Subject: Re: [PATCH] net: sched: use-after-free in tcf_action_destroy

Hi Jamal,

Thanks your mention. I have reviewed the latest kernel code.
I understand why these two tc function threads can enter the kernel at the same
time. It's because the request_module[2] function in tcf_action_init_1. When the
tc_action_init_1 function to add a new action, it will load the action
module. It will
call rtnl_unlock to let the Thread2 into the kernel space.

Thread1                                                 Thread2
rtnetlink_rcv_msg                                   rtnetlink_rcv_msg
 rtnl_lock();
 tcf_action_init
  for(i;i<TCA_ACT_MAX_PRIO;i++)
   act=tcf_action_init_1 //[1]
        if (rtnl_held)
           rtnl_unlock(); //[2]
        request_module("act_%s", act_name);

                                                                tcf_del_walker

idr_for_each_entry_ul(idr,p,id)

__tcf_idr_release(p,false,true)

 free_tcf(p) //[3]
if (rtnl_held)
rtnl_lock();

   if(IS_ERR(act))
    goto err
   actions[i] = act

  err:
   tcf_action_destroy
    a=actions[i]
    ops = a->ops //[4]
I know this time window is small, but it can indeed cause the bug. And
in the latest
kernel, it have fixed the bug. But version 4.19.x is still a
maintenance version.
Is there anyone who can introduce this change into version 4.19.

Best regards,
Alex

Jamal Hadi Salim <jhs@...atatu.com> 于2024年8月16日周五 23:04写道：

>
> On Fri, Aug 16, 2024 at 12:06 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > On Thu, Aug 15, 2024 at 9:54 PM yangzhuorao <alex000young@...il.com> wrote:
> > >
> > > There is a uaf bug in net/sched/act_api.c.
> > > When Thread1 call [1] tcf_action_init_1 to alloc act which saves
> > > in actions array. If allocation failed, it will go to err path.
> > > Meanwhile thread2 call tcf_del_walker to delete action in idr.
> > > So thread 1 in err path [3] tcf_action_destroy will cause
> > > use-after-free read bug.
> > >
> > > Thread1                            Thread2
> > >  tcf_action_init
> > >   for(i;i<TCA_ACT_MAX_PRIO;i++)
> > >    act=tcf_action_init_1 //[1]
> > >    if(IS_ERR(act))
> > >     goto err
> > >    actions[i] = act
> > >                                    tcf_del_walker
> > >                                     idr_for_each_entry_ul(idr,p,id)
> > >                                      __tcf_idr_release(p,false,true)
> > >                                       free_tcf(p) //[2]
> > >   err:
> > >    tcf_action_destroy
> > >     a=actions[i]
> > >     ops = a->ops //[3]
> > >
> > > We add lock and unlock in tcf_action_init and tcf_del_walker function
> > > to aviod race condition.
> > >
>
> Hi Alex,
>
> Thanks for your valiant effort, unfortunately there's nothing to fix
> here for the current kernels.
> For your edification:
>
> This may have been an issue on the 4.x kernels you ran but i dont see
> a valid codepath that would create the kernel parallelism scenario you
> mentioned above (currently or ever actually). Kernel entry is
> syncronized from user space via the rtnetlink lock - meaning you cant
> have two control plane threads (as you show in your nice diagram above
> in your commit) entering from user space in parallel to trigger the
> bug.
>
> I believe the reason it happens in 4.x is there is likely a bug(hand
> waving here) where within a short window upon a) creating a batch of
> actions of the same kind b) followed by partial updates of said action
> you then c) flush that action kind. Theory is the flush will trigger
> the bug. IOW, it is not parallel but rather a single entry. I didnt
> have time to look but whatever this bug is was most certainly fixed at
> some point - perhaps nobody bothered to backport. If this fix is so
> important to you please dig into newer kernels and backport.
>
> There are other technical issues with your solution but I hope the above helps.
> The repro doesnt cause any issues in newer kernels - but please verify
> for yourself as well.
>
> So from me, I am afraid, this is a nack
>
> cheers,
> jamal