Date:   Tue, 11 Sep 2018 17:24:49 -0700
From:   Nishanth Aravamudan <naravamudan@...italocean.com>
To:     Jan H. Schönherr <jschoenh@...zon.de>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 00/60] Coscheduling for Linux

[ I am not subscribed to LKML, please keep me CC'd on replies ]

On 07.09.2018 [23:39:47 +0200], Jan H. Schönherr wrote:
> This patch series extends CFS with support for coscheduling. The
> implementation is versatile enough to cover many different
> coscheduling use-cases, while at the same time being non-intrusive, so
> that behavior of legacy workloads does not change.

I tried a simple test with several VMs (in my initial test, I have 48
idle 1-cpu 512-mb VMs and 2 idle 2-cpu, 2-gb VMs) using libvirt, none
pinned to any CPUs. When I tried to set all of the top-level libvirt cpu
cgroups to be co-scheduled (/bin/echo 1 >
/sys/fs/cgroup/cpu/machine/<VM-x>.libvirt-qemu/cpu.scheduled), the
machine hung. This is using cosched_max_level=1.

There are several moving parts there, so I tried narrowing it down by
coscheduling only one VM, and things seemed fine:

/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# echo 1 > cpu.scheduled 
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat cpu.scheduled 
1

One thing that is not entirely obvious to me (but might be completely
intentional): since the top-level libvirt cpu cgroups are empty by
default:

/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat tasks 

the result of this should be a no-op, right? [This becomes relevant
below] Specifically, all of the threads of qemu are in sub-cgroups,
which do not indicate they are co-scheduling:

/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat emulator/cpu.scheduled 
0
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat vcpu0/cpu.scheduled 
0

When I then try to coschedule the second VM, the machine hangs.

/sys/fs/cgroup/cpu/machine/<VM-2>.libvirt-qemu# echo 1 > cpu.scheduled 
Timeout, server <HOST> not responding.

On the console, I see the same backtraces I see when I try to set all of
the VMs to be coscheduled:

[  144.494091] watchdog: BUG: soft lockup - CPU#87 stuck for 22s! [CPU 0/KVM:25344]
[  144.507629] Modules linked in: act_police cls_basic ebtable_filter ebtables ip6table_filter iptable_filter nbd ip6table_raw ip6_tables xt_CT iptable_raw ip_tables s
[  144.578858]  xxhash raid10 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses raid6_pq enclosure libcrc32c raid1 scsi
[  144.599227] CPU: 87 PID: 25344 Comm: CPU 0/KVM Tainted: G           O      4.19.0-rc2-amazon-cosched+ #1
[  144.608819] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
[  144.616403] RIP: 0010:smp_call_function_single+0xa7/0xd0
[  144.621818] Code: 01 48 89 d1 48 89 f2 4c 89 c6 e8 64 fe ff ff c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 54 fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24 18 83 e25
[  144.640703] RSP: 0018:ffffb2a4a75abb40 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[  144.648390] RAX: 0000000000000000 RBX: 0000000000000057 RCX: 0000000000000000
[  144.655607] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000202
[  144.662826] RBP: ffffb2a4a75abb60 R08: 0000000000000000 R09: 0000000000000f39
[  144.670073] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a9c03fc8000
[  144.677301] R13: ffff8ab4589dc100 R14: 0000000000000057 R15: 0000000000000000
[  144.684519] FS:  00007f51cd41a700(0000) GS:ffff8ab45fac0000(0000) knlGS:0000000000000000
[  144.692710] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  144.698542] CR2: 000000c4203c0000 CR3: 000000178a97e005 CR4: 00000000007626e0
[  144.705771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  144.712989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  144.720215] PKRU: 55555554
[  144.723016] Call Trace:
[  144.725553]  ? vmx_sched_in+0xc0/0xc0 [kvm_intel]
[  144.730341]  vmx_vcpu_load+0x244/0x310 [kvm_intel]
[  144.735220]  ? __switch_to_asm+0x40/0x70
[  144.739231]  ? __switch_to_asm+0x34/0x70
[  144.743235]  ? __switch_to_asm+0x40/0x70
[  144.747240]  ? __switch_to_asm+0x34/0x70
[  144.751243]  ? __switch_to_asm+0x40/0x70
[  144.755246]  ? __switch_to_asm+0x34/0x70
[  144.759250]  ? __switch_to_asm+0x40/0x70
[  144.763272]  ? __switch_to_asm+0x34/0x70
[  144.767284]  ? __switch_to_asm+0x40/0x70
[  144.771296]  ? __switch_to_asm+0x34/0x70
[  144.775299]  ? __switch_to_asm+0x40/0x70
[  144.779313]  ? __switch_to_asm+0x34/0x70
[  144.783317]  ? __switch_to_asm+0x40/0x70
[  144.787338]  kvm_arch_vcpu_load+0x40/0x270 [kvm]
[  144.792056]  finish_task_switch+0xe2/0x260
[  144.796238]  __schedule+0x316/0x890
[  144.799810]  schedule+0x32/0x80
[  144.803039]  kvm_vcpu_block+0x7a/0x2e0 [kvm]
[  144.807399]  kvm_arch_vcpu_ioctl_run+0x1a7/0x1990 [kvm]
[  144.812705]  ? futex_wake+0x84/0x150
[  144.816368]  kvm_vcpu_ioctl+0x3ab/0x5d0 [kvm]
[  144.820810]  ? wake_up_q+0x70/0x70
[  144.824311]  do_vfs_ioctl+0x92/0x600
[  144.827985]  ? syscall_trace_enter+0x1ac/0x290
[  144.832517]  ksys_ioctl+0x60/0x90
[  144.835913]  ? exit_to_usermode_loop+0xa6/0xc2
[  144.840436]  __x64_sys_ioctl+0x16/0x20
[  144.844267]  do_syscall_64+0x55/0x110
[  144.848012]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  144.853160] RIP: 0033:0x7f51cf82bea7
[  144.856816] Code: 44 00 00 48 8b 05 e1 cf 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff8
[  144.875752] RSP: 002b:00007f51cd419a18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010

I am happy to do any further debugging I can do, or try patches on top
of those posted on the mailing list.
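
In case it helps anyone reproduce this, the one-VM-at-a-time narrowing
above can be sketched roughly as follows. This is only a sketch: it
assumes the cgroup v1 layout shown earlier, and the CGROOT and DELAY
variables and the function name are hypothetical names used for
illustration; only the cpu.scheduled file comes from the series itself.

```shell
#!/bin/sh
# Sketch: enable coscheduling on the top-level libvirt cpu cgroups one VM
# at a time, so the last name printed identifies the write that preceded
# a hang. CGROOT, DELAY and cosched_one_by_one are hypothetical names.
CGROOT=${CGROOT:-/sys/fs/cgroup/cpu/machine}

cosched_one_by_one() {
    for dir in "$CGROOT"/*.libvirt-qemu; do
        # Skip anything without a writable cpu.scheduled file (this also
        # covers the case where the glob matched nothing).
        [ -w "$dir/cpu.scheduled" ] || continue
        echo "enabling: $dir"
        echo 1 > "$dir/cpu.scheduled"
        # Give the watchdog time to report before touching the next VM.
        sleep "${DELAY:-5}"
    done
}
```

Calling cosched_one_by_one as root walks the VMs in glob order; setting
DELAY to something above the soft-lockup threshold (22s in the trace
above) should make it clearer which write triggered the lockup.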

Thanks,
Nish
