Message-ID: <44b3098e-f98c-4e68-8d13-9d668f92fe36@didichuxing.com>
Date: Sat, 18 Nov 2023 23:29:52 +0800
From: Honglei Wang <wanghonglei@...ichuxing.com>
To: Abel Wu <wuyun.abel@...edance.com>,
Tobias Huschle <huschle@...ux.ibm.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, <virtualization@...ts.linux.dev>,
<netdev@...r.kernel.org>
CC: Peterz <peterz@...radead.org>, <mst@...hat.com>,
<jasowang@...hat.com>
Subject: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add
lag based placement)
On 2023/11/18 15:33, Abel Wu wrote:
> On 11/17/23 2:58 AM, Tobias Huschle wrote:
>> #################### TRACE EXCERPT ####################
>> The sched_place trace event was added to the end of the place_entity
>> function and outputs:
>> sev -> sched_entity vruntime
>> sed -> sched_entity deadline
>> sel -> sched_entity vlag
>> avg -> cfs_rq avg_vruntime
>> min -> cfs_rq min_vruntime
>> cpu -> cpu of cfs_rq
>> nr -> cfs_rq nr_running
>> ---
>> CPU 3/KVM-2950 [014] d.... 576.161432: sched_migrate_task:
>> comm=vhost-2920 pid=2941 prio=120 orig_cpu=15 dest_cpu=14
>> --> migrates task from cpu 15 to 14
>> CPU 3/KVM-2950 [014] d.... 576.161433: sched_place:
>> comm=vhost-2920 pid=2941 sev=4242563284 sed=4245563284 sel=0
>> avg=4242563284 min=4242563284 cpu=14 nr=0
>> --> places vhost 2920 on CPU 14 with vruntime 4242563284
>> CPU 3/KVM-2950 [014] d.... 576.161433: sched_place: comm=
>> pid=0 sev=16329848593 sed=16334604010 sel=0 avg=16329848593
>> min=16329848593 cpu=14 nr=0
>> CPU 3/KVM-2950 [014] d.... 576.161433: sched_place: comm=
>> pid=0 sev=42560661157 sed=42627443765 sel=0 avg=42560661157
>> min=42560661157 cpu=14 nr=0
>> CPU 3/KVM-2950 [014] d.... 576.161434: sched_place: comm=
>> pid=0 sev=53846627372 sed=54125900099 sel=0 avg=53846627372
>> min=53846627372 cpu=14 nr=0
>> CPU 3/KVM-2950 [014] d.... 576.161434: sched_place: comm=
>> pid=0 sev=86640641980 sed=87255041979 sel=0 avg=86640641980
>> min=86640641980 cpu=14 nr=0
>
> As the following 2 lines indicate, vhost-2920 is on_rq and can be
> picked as next, thus its cfs_rq must have at least one entity.
>
> The above 4 lines, however, show nr=0, so the "comm= pid=0" task(s)
> can't be in the same cgroup as vhost-2920.
>
> Say vhost is in cgroupA, and the "comm= pid=0" task with
> sev=86640641980 is in cgroupB ...
>
This looks like hierarchical enqueue at work. The temporary trace could
get the comm and pid of vhost-2920 but failed for the other 4. I think
the reason is that those were sched_entities, not tasks: they seem to
come from the for_each_sched_entity(se) walk when enqueueing
vhost-2920. And the last one, placed with vruntime=86640641980, might
be on the root cgroup's cfs_rq, i.e. on the same level as the kworkers.
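
Roughly what I mean, simplified from the mainline enqueue path in
kernel/sched/fair.c (just a sketch, the real function does more work
per level):

	static void
	enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
	{
		struct sched_entity *se = &p->se;

		/* Walk from the task's own se up through its cgroups'
		 * group entities. */
		for_each_sched_entity(se) {
			if (se->on_rq)
				break;
			/*
			 * enqueue_entity() -> place_entity(), which is
			 * where the sched_place event fires. Only the
			 * first iteration has a task attached; the group
			 * entities above it have no task_struct, hence
			 * the empty comm= and pid=0 lines.
			 */
			enqueue_entity(cfs_rq_of(se), se, flags);
		}
	}
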
So just from this tiny part of the trace log, there won't be a
thousands-of-ms level difference. Actually, it might be only
86642125805 - 86640641980 = 1483825 ns, i.e. roughly 1.5 ms.
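
For reference, here is the placement the sev/sed/sel/avg fields map
to, heavily simplified from place_entity() after commit 86bfbb7ce4f6
(the real code also scales the lag by queue load):

	static void
	place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
	{
		u64 vruntime = avg_vruntime(cfs_rq);	/* "avg" */
		s64 lag = 0;

		if (sched_feat(PLACE_LAG) && cfs_rq->nr_running)
			lag = se->vlag;			/* "sel" */

		se->vruntime = vruntime - lag;		/* "sev" */
		/* "sed": one virtual slice past the placed vruntime. */
		se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
	}

With sel=0 everywhere in the excerpt, each entity lands exactly on its
own queue's avg_vruntime, so vruntimes from different cfs_rq's (e.g.
vhost's 4242563284 vs the kworker's 86642125805) are not directly
comparable.
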
Correct me if there is anything wrong.
Thanks,
Honglei
>> CPU 3/KVM-2950 [014] dN... 576.161434: sched_stat_wait:
>> comm=vhost-2920 pid=2941 delay=9958 [ns]
>> CPU 3/KVM-2950 [014] d.... 576.161435: sched_switch:
>> prev_comm=CPU 3/KVM prev_pid=2950 prev_prio=120 prev_state=S ==>
>> next_comm=vhost-2920 next_pid=2941 next_prio=120
>> vhost-2920-2941 [014] D.... 576.161439: sched_waking:
>> comm=vhost-2286 pid=2309 prio=120 target_cpu=008
>> vhost-2920-2941 [014] d.... 576.161446: sched_waking:
>> comm=kworker/14:0 pid=6525 prio=120 target_cpu=014
>> vhost-2920-2941 [014] d.... 576.161447: sched_place:
>> comm=kworker/14:0 pid=6525 sev=86642125805 sed=86645125805 sel=0
>> avg=86642125805 min=86642125805 cpu=14 nr=1
>> --> places kworker 6525 on cpu 14 with vruntime 86642125805
>> --> which is far larger than vhost vruntime of 4242563284
>
> Here nr=1 means there is another entity in the same cfs_rq as the
> newly woken kworker, but which one? According to the vruntime, I
> would assume the kworker is in cgroupB.