lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <gwO399q2_pH73eDkd2OatN9SsR8aHtm8CbVIEPOf0nuxXjqbjY6vgCqNyFya75mMVicn8NwUgG8zaNFkR_JuZmFk0UcWNCHlpzSxFbaycf4=@1g4.org>
Date: Tue, 20 Jan 2026 22:47:19 +0000
From: Paul Moses <p@....org>
To: Victor Nogueira <victor@...atatu.com>
Cc: netdev@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, "David S . Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/2] net/sched: act_gate: fix schedule updates with RCU swap

act_gate is not just a list and a timer. It is a small state machine that keeps runtime gating insulated from control plane updates using tcf_lock, the hrtimer, and the PENDING bit. The live decision variables are protected, but the schedule backing store (the entries list) was still being mutated or freed on the control path, so updates were not transactional and could expose partially constructed schedules or freed nodes to concurrent readers. The fix is to treat the schedule as an immutable published object: validate and build off to the side, publish atomically, then retire after readers drain (RCU). That preserves the original goal of not interfering with the gate.

I’m trying to strike the right balance of input validation for stable. I kept this as one patch because the validate and build before publish path is part of making the RCU update model correct. I did try implementing the additional validation independently first, but it did not meaningfully reduce the size or complexity because the RCU swap still requires a fully formed schedule object before publish. Without preliminary validation, it is easy to corrupt the timer state or end up with undefined schedule behavior. At minimum, I think we need:
  - interval > 0: prevents zero length entries and immediate refire loops
  - at least one entry: avoids empty schedule deref and undefined state
  - cycle time > 0: required for division and close time computations
  - overflow checks: prevent wrap or underflow in close time arithmetic

I can do the conversion without these checks, but the resulting behavior would be fragile. If there is guidance on what validation level you would prefer here (minimal representable range checks vs stricter sane inputs), I am happy to align with it.

thanks
Paul


On Tuesday, January 20th, 2026 at 3:04 PM, Victor Nogueira <victor@...atatu.com> wrote:

> 
> 
> On 19/01/2026 21:48, Paul Moses wrote:
> 
> > Switch act_gate parameters to an RCU-protected pointer and update schedule
> > changes using a prepare-then-swap pattern. This avoids races between the
> > timer/data paths and configuration updates, and cancels the hrtimer
> > before swapping schedules.
> > 
> > A gate action replace could free and swap schedules while the hrtimer
> > callback or data path still dereferences the old entries, leaving a
> > use-after-free window during updates. The deferred swap and RCU free
> > close that window. A reproducer is available on request.
> > 
> > Also clear params on early error for newly created actions to avoid
> > leaving a dangling reference.
> > [...]
> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> > index c1f75f2727576..3ee07c3deaf97 100644
> > --- a/net/sched/act_gate.c
> > +++ b/net/sched/act_gate.c
> > @@ -6,6 +6,7 @@
> > #include <linux/kernel.h>
> > #include <linux/string.h>
> > #include <linux/errno.h>
> > +#include <linux/limits.h>
> 
> 
> Do you really need to include this?
> 
> > [...]
> > @@ -69,12 +71,14 @@ static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> > {
> > struct tcf_gate *gact = container_of(timer, struct tcf_gate,
> > hitimer);
> > - struct tcf_gate_params *p = &gact->param;
> > + struct tcf_gate_params *p;
> 
> 
> When adding/editing local variables, you should adhere to the
> reverse xmas tree style [1].
> 
> > spin_lock(&gact->tcf_lock);
> 
> 
> Shouldn't you call rcu_read_lock before this line now?
> 
> > + p = rcu_dereference_protected(gact->param,
> > + lockdep_is_held(&gact->tcf_lock));
> > [...]
> > static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > @@ -296,20 +296,26 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > struct netlink_ext_ack *extack)
> > {
> > struct tc_action_net *tn = net_generic(net, act_gate_ops.net_id);
> > - enum tk_offsets tk_offset = TK_OFFS_TAI;
> > - bool bind = flags & TCA_ACT_FLAGS_BIND;
> > struct nlattr *tb[TCA_GATE_MAX + 1];
> > struct tcf_chain *goto_ch = NULL;
> > - u64 cycletime = 0, basetime = 0;
> > - struct tcf_gate_params *p;
> > - s32 clockid = CLOCK_TAI;
> > + struct tcf_gate_params *p, *oldp;
> > struct tcf_gate *gact;
> > struct tc_gate *parm;
> > - int ret = 0, err;
> > - u32 gflags = 0;
> > - s32 prio = -1;
> > + struct tcf_gate_params newp = { };
> 
> 
> Abide by reverse xmas tree when adding local variables.
> 
> > [...]
> > + bool clockid_set = false;
> 
> 
> I could be missing something, but I don't believe you need this
> boolean.
> 
> > [...]
> > @@ -323,6 +329,7 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > 
> > if (tb[TCA_GATE_CLOCKID]) {
> > clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> > + clockid_set = true;
> > switch (clockid) {
> 
> 
> Instead of using clockid_set and repeating the switch statament.
> You could put this if-statement after you already have oldp and do the
> following:
> 
> if (tb[TCA_GATE_CLOCKID]) {
> clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> switch (clockid) {
> case CLOCK_REALTIME:
> tk_offset = TK_OFFS_REAL;
> break;
> case CLOCK_MONOTONIC:
> tk_offset = TK_OFFS_MAX;
> break;
> case CLOCK_BOOTTIME:
> tk_offset = TK_OFFS_BOOT;
> break;
> case CLOCK_TAI:
> tk_offset = TK_OFFS_TAI;
> break;
> default:
> NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> return -EINVAL;
> }
> } else if (ret != ACT_P_CREATED) {
> clockid = oldp->tcfg_clockid;
> 
> tk_offset = gact->tk_offset;
> 
> }
> 
> > [...]
> > - if (tb[TCA_GATE_CYCLE_TIME])
> > + if (ret == ACT_P_CREATED)
> > + update_timer = true;
> > [...]
> 
> 
> Here you are assigning update_timer to true when the op is a create...
> 
> > [...]
> > + if (update_timer && ret != ACT_P_CREATED)
> > + hrtimer_cancel(&gact->hitimer);
> 
> 
> .. however in the if-statement where it is used you are only allowing
> updates. This looks weird.
> 
> > [...]
> > +free_p:
> > + release_entry_list(&p->entries);
> > + kfree(p);
> 
> 
> The 2 lines of code above are being repeated below and in
> tcf_gate_params_release. You should put them in a common function.
> 
> > +release_new_entries:
> > + release_entry_list(&newp.entries);
> > +put_chain:
> > if (goto_ch)
> > tcf_chain_put_by_act(goto_ch);
> > release_idr:
> > - /* action is not inserted in any list: it's safe to init hitimer
> > - * without taking tcf_lock.
> > - */
> > - if (ret == ACT_P_CREATED)
> > - gate_setup_timer(gact, gact->param.tcfg_basetime,
> > - gact->tk_offset, gact->param.tcfg_clockid,
> > - true);
> > + if (ret == ACT_P_CREATED) {
> > + p = rcu_dereference_protected(gact->param, 1);
> > + if (p) {
> > + release_entry_list(&p->entries);
> > + kfree(p);
> > + rcu_assign_pointer(gact->param, NULL);
> > + }
> > + }
> > tcf_idr_release(*a, bind);
> 
> 
> Also, the AI review [2] pointed out a real issue.
> It's easy to reproduce by running something like:
> 
> tc action add action gate base-time 200000000000ns \
> sched-entry close 0ns index 10
> 
> I think overall you have the right idea - RCU seems like a good fit here.
> The issue is that this patch is confusing because it seems like you are
> trying to fix the bug and perform cleanups at the same time.
> If that is the case, can you try breaking this patch into two? Do one to
> fix the bug (introducing RCU and etc) and another for the cleanups.
> 
> [1]
> https://www.kernel.org/doc/html/v6.3/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs
> [2]
> https://netdev-ai.bots.linux.dev/ai-review.html?id=cdc17d0d-fd59-41a8-9c8d-1a42699167fd#patch-0
> 
> cheers,
> Victor

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ