lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFpoUr1zGNf9vTbWjwsfY9E8YBjyE5xJ0SwzLebPiS7b=xz_Zw@mail.gmail.com>
Date:   Thu, 27 May 2021 09:50:42 +0200
From:   Odin Ugedal <odin@...d.al>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Odin Ugedal <odin@...d.al>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        "open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking

Hi,

Did some quick testing with bpftrace, and it looks like it is
"update_tg_cfs_load" that does this;

 kretfunc:attach_entity_load_avg: cfs_rq: 0xffff8a3e6773e400,
se.load_sum: 0, se.load_avg: 0, se->load.weight: 1048576 diff.sum: 0,
diff.load: 0, new_sum: 0, new_load: 0, period_contrib: 821

 // quite some time passes

 kretfunc:update_tg_cfs_load: cfs_rq: 0xffff8a3e6773e400, old load_sum
0, old load_avg: 0, new load_sum: 47876096, new load_avg: 1022
tg_load_avg_contrib: 0, period_contrib: 82, on_list: 0, throttled: 0/0
stack:
         bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
         bpf_get_stackid_raw_tp+115
         bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
         bpf_trampoline_6442466486_1+144
         update_tg_cfs_load+5
         update_load_avg+450
         enqueue_entity+91
         enqueue_task_fair+134
         move_queued_task+172
         affine_move_task+1282
         __set_cpus_allowed_ptr+317
         update_tasks_cpumask+70
         update_cpumasks_hier+448
         update_cpumask+345
         cpuset_write_resmask+796
         cgroup_file_write+162
         kernfs_fop_write_iter+284
         new_sync_write+345
         vfs_write+511
         ksys_write+103
         do_syscall_64+51
         entry_SYSCALL_64_after_hwframe+68

 // There might be more stuff happening here
 // __update_load_avg_cfs_rq runs way too often to be able to
 // trace it without filtering, and didn't look into that.

 kretfunc:update_tg_cfs_load: cfs_rq: 0xffff8a3e6773e400, old load_sum
48117265, old load_avg: 1025, new load_sum: 0, new load_avg: 2
tg_load_avg_contrib: 1022, period_contrib: 223, on_list: 1, throttled:
1/1 stack:
         bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
         bpf_get_stackid_raw_tp+115
         bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
         bpf_trampoline_6442466486_1+144
         update_tg_cfs_load+5
         update_load_avg+450
         update_blocked_averages+723
         newidle_balance+533
         pick_next_task_fair+57
         __schedule+376
         schedule+91
         schedule_hrtimeout_range_clock+164
         do_sys_poll+1043
         __x64_sys_poll+179
         do_syscall_64+51
         entry_SYSCALL_64_after_hwframe+68

This particular example resulted in a 75/25 share of cpu time for the
two groups.

>From this, it also looks like it doesn't matter if new load_avg is 0 or not
(when new load_sum is 0), since tg_load_avg_contrib will not be properly
removed either way (without this patch). There might be some other precision


> I have added a trace in cfs_rq_is_decayed() but I'm not able to
> reproduce a situation where load_sum == 0 but not load_avg  even with
> the script in the cover letter

Hmm, strange. I am able to reproduce on both v5.12 and v5.13. I am running on
a non SMT 4 core CPU (4670k), with CONFIG_HZ=250. Since this is timing
sensitive,
you might need to tweak the timings for either the CFS bw period/quota, or the
time waiting between changing what cpu it runs on. If you have more than 4 cpus,
you can also try to start on CPU 0 and move it through all cores and
then onto CPU n.
Maybe that makes it possible to reproduce.

Since bpftrace doesn't work properly in v5.13, this particular test was
on v1.12 with cherry-pick of 0258bdfaff5bd13c4d2383150b7097aecd6b6d82 and
the other patch in this patchset.

testscript.bt:

  kfunc:attach_entity_load_avg {@...um[tid] =
args->cfs_rq->avg.load_sum; @a_load[tid] =
args->cfs_rq->avg.load_avg;}
  kretfunc:attach_entity_load_avg{printf("%s: cfs_rq: %p, se.load_sum:
%llu, se.load_avg: %llu, se->load.weight: %llu diff.sum: %llu,
diff.load: %llu, new_sum: %llu, new_load: %llu, period_contrib:
%llu\n", probe, args->cfs_rq,args->se->avg.load_sum,
args->se->avg.load_avg,
args->se->load.weight,args->cfs_rq->avg.load_sum - @a_sum[tid],
args->cfs_rq->avg.load_avg - @a_load[tid], args->cfs_rq->avg.load_sum,
args->cfs_rq->avg.load_avg, args->cfs_rq->avg.period_contrib)}

  kfunc:update_tg_cfs_load {@..._load_avg[tid] =
args->cfs_rq->avg.load_avg; @beg_load_sum[tid] =
args->cfs_rq->avg.load_sum}
  kretfunc:update_tg_cfs_load {printf("%s: cfs_rq: %p, old load_sum
%llu, old load_avg: %llu, new load_sum: %llu, new load_avg: %llu
tg_load_avg_contrib: %llu, period_contrib: %llu, on_list: %d,
throttled: %d/%d stack: %s\n",probe, args->cfs_rq, @beg_load_sum[tid],
@beg_load_avg[tid], args->cfs_rq->avg.load_sum,
args->cfs_rq->avg.load_avg, args->cfs_rq->tg_load_avg_contrib,
args->cfs_rq->avg.period_contrib,args->cfs_rq->on_list,
args->cfs_rq->throttled, args->cfs_rq->throttle_count,kstack)}

Thanks
Odin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ