Message-ID: <CABk29Nuuq6s1+FBftOPAcMkYU+F1n2nebcP5tDK9dH4_KXA2cw@mail.gmail.com>
Date: Sat, 15 Mar 2025 20:25:53 -0700
From: Josh Don <joshdon@...gle.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>, 
	K Prateek Nayak <kprateek.nayak@....com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Mel Gorman <mgorman@...e.de>, Chengming Zhou <chengming.zhou@...ux.dev>, 
	Chuyi Zhou <zhouchuyi@...edance.com>, Xi Wang <xii@...gle.com>
Subject: Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle

Hi Aaron,

>  static int tg_throttle_down(struct task_group *tg, void *data)
>  {
>         struct rq *rq = data;
>         struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> +       struct task_struct *p;
> +       struct rb_node *node;
> +
> +       cfs_rq->throttle_count++;
> +       if (cfs_rq->throttle_count > 1)
> +               return 0;
>
>         /* group is entering throttled state, stop time */
> -       if (!cfs_rq->throttle_count) {
> -               cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> -               list_del_leaf_cfs_rq(cfs_rq);
> +       cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
> +       list_del_leaf_cfs_rq(cfs_rq);
>
> -               SCHED_WARN_ON(cfs_rq->throttled_clock_self);
> -               if (cfs_rq->nr_queued)
> -                       cfs_rq->throttled_clock_self = rq_clock(rq);
> +       SCHED_WARN_ON(cfs_rq->throttled_clock_self);
> +       if (cfs_rq->nr_queued)
> +               cfs_rq->throttled_clock_self = rq_clock(rq);
> +
> +       WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
> +       /*
> +        * rq_lock is held, current is (obviously) executing this in kernelspace.
> +        *
> +        * All other tasks enqueued on this rq have their saved PC at the
> +        * context switch, so they will go through the kernel before returning
> +        * to userspace. Thus, there are no tasks-in-userspace to handle, just
> +        * install the task_work on all of them.
> +        */
> +       node = rb_first(&cfs_rq->tasks_timeline.rb_root);
> +       while (node) {
> +               struct sched_entity *se = __node_2_se(node);
> +
> +               if (!entity_is_task(se))
> +                       goto next;
> +
> +               p = task_of(se);
> +               task_throttle_setup_work(p);
> +next:
> +               node = rb_next(node);
> +       }
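
If I'm reading the series right, task_throttle_setup_work() essentially
queues a return-to-user task_work on each runnable task, so each task
dequeues itself on its way back to userspace. Very roughly the shape
below (the callback_head field and the work function are guesses on my
part; only init_task_work()/task_work_add()/TWA_RESUME are the actual
task_work API):

        #include <linux/sched.h>
        #include <linux/task_work.h>

        /*
         * Runs in task context on the return-to-user path: the task can
         * dequeue itself and park on the cfs_rq's limbo list there,
         * instead of being dequeued from tg_throttle_down().
         */
        static void throttle_cfs_rq_work(struct callback_head *work)
        {
                /* dequeue current, add it to cfs_rq->throttled_limbo_list */
        }

        static void task_throttle_setup_work(struct task_struct *p)
        {
                /* 'throttle_work' is an assumed callback_head in task_struct */
                init_task_work(&p->throttle_work, throttle_cfs_rq_work);
                task_work_add(p, &p->throttle_work, TWA_RESUME);
        }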

I'd like to strongly push back on this approach. This adds quite a lot
of extra computation to an already expensive path
(throttle/unthrottle). For example, this function is part of the cgroup
walk, so it is already O(cgroups) in the number of cgroups in the
hierarchy being throttled, and the rb-tree walk above now adds a
per-task cost on top of that. It gets even worse when you consider that
we repeat this separately on every cpu the bandwidth-constrained group
is running on. Keep in mind that throttle/unthrottle is done with the
rq lock held and IRQs disabled.

In K Prateek's last RFC, there was discussion of using context
tracking; did you consider that approach any further? We could keep
track of the number of threads within a cgroup hierarchy that are
currently in kernel mode (similar to how h_nr_runnable is tracked), and
thus simplify the throttling code here.
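
To make that concrete, something along the lines of the sketch below;
'h_kernel_running' and both helpers are invented names for illustration
(not existing kernel fields or functions), and the bookkeeping/locking
cost of maintaining this on kernel entry/exit is glossed over:

        /*
         * Keep a per-cfs_rq count of queued tasks currently executing in
         * the kernel, propagated up the hierarchy the same way
         * h_nr_runnable is, so tg_throttle_down() can consult a counter
         * rather than walking every queued task.
         */
        static void cfs_rq_kernel_enter(struct task_struct *p)
        {
                struct sched_entity *se = &p->se;

                for_each_sched_entity(se)
                        cfs_rq_of(se)->h_kernel_running++;
        }

        static void cfs_rq_kernel_exit(struct task_struct *p)
        {
                struct sched_entity *se = &p->se;

                for_each_sched_entity(se)
                        cfs_rq_of(se)->h_kernel_running--;
        }

That would keep the throttle path itself O(1) per cfs_rq, at the cost
of extra work on the kernel entry/exit paths.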

Best,
Josh
