lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+h21hrCScMMA9cm0fhF+eLRWda403pX=t3PKRoBhkE8rrR-rw@mail.gmail.com>
Date:   Tue, 16 Jun 2020 13:38:35 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Davide Caratti <dcaratti@...hat.com>
Cc:     Po Liu <Po.Liu@....com>, Cong Wang <xiyou.wangcong@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net v2 2/2] net/sched: act_gate: fix configuration of the
 periodic timer

Hi Davide,

On Tue, 16 Jun 2020 at 13:13, Davide Caratti <dcaratti@...hat.com> wrote:
>
> hello Vladimir,
>
> thanks a lot for reviewing this.
>
> On Tue, 2020-06-16 at 00:55 +0300, Vladimir Oltean wrote:
>
> [...]
>
> > > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> > > index 6775ccf355b0..3c529a4bcca5 100644
> > > --- a/net/sched/act_gate.c
> > > +++ b/net/sched/act_gate.c
> > > @@ -272,6 +272,27 @@ static int parse_gate_list(struct nlattr *list_attr,
> > >         return err;
> > >  }
> > >
> > > +static void gate_setup_timer(struct tcf_gate *gact, u64 basetime,
> > > +                            enum tk_offsets tko, s32 clockid,
> > > +                            bool do_init)
> > > +{
> > > +       if (!do_init) {
> > > +               if (basetime == gact->param.tcfg_basetime &&
> > > +                   tko == gact->tk_offset &&
> > > +                   clockid == gact->param.tcfg_clockid)
> > > +                       return;
> > > +
> > > +               spin_unlock_bh(&gact->tcf_lock);
> > > +               hrtimer_cancel(&gact->hitimer);
> > > +               spin_lock_bh(&gact->tcf_lock);
> >
> > I think it's horrible to do this just to get out of atomic context.
> > What if you split the "replace" functionality of gate_setup_timer into
> > a separate gate_cancel_timer function, which you could call earlier
> > (before taking the spin lock)?
>
> I think it would introduce the following 2 problems:
>
> problem #1) a race condition, see below:
>

I must have been living under a stone since I missed the entire
unlocked tc filter rework done by Vlad Buslov, I thought that
tcf_action_init always runs under rtnl. So it is clear now.

> > That change would look like this:
> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> > index 3c529a4bcca5..47c625a0e70c 100644
> > --- a/net/sched/act_gate.c
> > +++ b/net/sched/act_gate.c
> > @@ -273,19 +273,8 @@ static int parse_gate_list(struct nlattr *list_attr,
> >  }
> >
> >  static void gate_setup_timer(struct tcf_gate *gact, u64 basetime,
> > -                 enum tk_offsets tko, s32 clockid,
> > -                 bool do_init)
> > +                 enum tk_offsets tko, s32 clockid)
> >  {
> > -    if (!do_init) {
> > -        if (basetime == gact->param.tcfg_basetime &&
> > -            tko == gact->tk_offset &&
> > -            clockid == gact->param.tcfg_clockid)
> > -            return;
> > -
> > -        spin_unlock_bh(&gact->tcf_lock);
> > -        hrtimer_cancel(&gact->hitimer);
> > -        spin_lock_bh(&gact->tcf_lock);
> > -    }
> >      gact->param.tcfg_basetime = basetime;
> >      gact->param.tcfg_clockid = clockid;
> >      gact->tk_offset = tko;
> > @@ -293,6 +282,17 @@ static void gate_setup_timer(struct tcf_gate
> > *gact, u64 basetime,
> >      gact->hitimer.function = gate_timer_func;
> >  }
> >
> > +static void gate_cancel_timer(struct tcf_gate *gact, u64 basetime,
> > +                  enum tk_offsets tko, s32 clockid)
> > +{
> > +    if (basetime == gact->param.tcfg_basetime &&
> > +        tko == gact->tk_offset &&
> > +        clockid == gact->param.tcfg_clockid)
> > +        return;
> > +
> > +    hrtimer_cancel(&gact->hitimer);
> > +}
> > +
>
> the above function either cancels a timer, or does nothing: it depends on
> the value of the 3-ple {tcfg_basetime, tk_offset, tcfg_clockid}. If we run
> this function without holding tcf_lock, nobody will guarantee that
> {tcfg_basetime, tk_offset, tcfg_clockid} is not being concurrently
> rewritten by some other command like:
>
> # tc action replace action gate <parameters> index <x>
>
> >  static int tcf_gate_init(struct net *net, struct nlattr *nla,
> >               struct nlattr *est, struct tc_action **a,
> >               int ovr, int bind, bool rtnl_held,
> > @@ -381,6 +381,8 @@ static int tcf_gate_init(struct net *net, struct
> > nlattr *nla,
> >      gact = to_gate(*a);
> >      if (ret == ACT_P_CREATED)
> >          INIT_LIST_HEAD(&gact->param.entries);
> > +    else
> > +        gate_cancel_timer(gact, basetime, tk_offset, clockid);
> >
>
> IOW, the above line is racy unless we do spin_lock()/spin_unlock() around
> the
>
> if (<expression depending on gact-> members>)
>         return;
>
> statement before hrtimer_cancel(), which does not seem much different
> than what I did in gate_setup_timer().
>
> [...]
>
> > @@ -433,6 +448,11 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > >         if (goto_ch)
> > >                 tcf_chain_put_by_act(goto_ch);
> > >  release_idr:
> > > +       /* action is not in: hitimer can be inited without taking tcf_lock */
> > > +       if (ret == ACT_P_CREATED)
> > > +               gate_setup_timer(gact, gact->param.tcfg_basetime,
> > > +                                gact->tk_offset, gact->param.tcfg_clockid,
> > > +                                true);
>
> please note, here I felt the need to add a comment, because when ret ==
> ACT_P_CREATED the action is not inserted in any list, so there is no
> concurrent writer of gact-> members for that action.
>

Then please rephrase the comment. I had read it and it still wasn't
clear at all for me what you were talking about.

> > >         tcf_idr_release(*a, bind);
> > >         return err;
> > >  }
>
> problem #2) a functional issue that originates in how 'cycle_time' and
> 'entries' are validated (*). See below:
>
> On Tue, 2020-06-16 at 00:55 +0300, Vladimir Oltean wrote:
>
> > static int tcf_gate_init(struct net *net, struct nlattr *nla,
> >               struct nlattr *est, struct tc_action **a,
> >               int ovr, int bind, bool rtnl_held,
> > @@ -381,6 +381,8 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> >      gact = to_gate(*a);
> >      if (ret == ACT_P_CREATED)
> >          INIT_LIST_HEAD(&gact->param.entries);
> > +    else
> > +        gate_cancel_timer(gact, basetime, tk_offset, clockid);
>
> here you propose to cancel the timer, but few lines after we have this:
>
> 385         err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> 386         if (err < 0)
> 387                 goto release_idr;
> 388
>
> so, when users try the following commands:
>
> # tc action add action gate <good parameters> index 2
> # tc action replace action gate <other good parameters> goto chain 42 index 2
>
> and chain 42 does not exist, the second command will fail. But the timer
> is erroneously stopped, and never started again. So, the first rule is
> correctly inserted but it becomes no more functional after users try to
> replace it with another one having invalid control action.
>

Yes, correct.

> Moving the call to gate_cancel_timer() after the validation of the control
> action will not fix this problem, because 'cycle_time' and 'entries' are
> validated together, and with the spinlock taken. Because of this, we need
> to cancel that timer only when we know that we will not do
> tcf_idr_release() and return some error to the user.
>
> please let me know if you think my doubts are not well-founded.
>
> --
> davide
>
> (*) now that I see parse_gate_list() again, I noticed another potential
> issue with replace (that I need to verify first): apparently the list is
> not replaced, it's just "updated" with new entries appended at the end. I
> will try to write a fix for that (separate from this series).
>
>

I wonder, could you call tcf_gate_cleanup instead of just canceling the hrtimer?

Thanks,
-Vladimir

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ