Message-ID: <20250108083307.74220-1-changwoo@igalia.com>
Date: Wed,  8 Jan 2025 17:33:07 +0900
From: Changwoo Min <changwoo@...lia.com>
To: tj@...nel.org,
	void@...ifault.com,
	arighi@...dia.com
Cc: kernel-dev@...lia.com,
	linux-kernel@...r.kernel.org,
	Changwoo Min <changwoo@...lia.com>
Subject: [PATCH v2] sched_ext: Replace rq_lock() to raw_spin_rq_lock() in scx_ops_bypass()

scx_ops_bypass() iterates over all possible CPUs to re-enqueue all the
scx tasks. For each CPU, it acquires the rq lock using rq_lock(),
regardless of whether the CPU is offline or is currently running a task
in a higher scheduler class (e.g., deadline). rq_lock() is supposed to
be used only for online CPUs, and using it here may trigger an
unnecessary warning in rq_pin_lock(). Therefore, replace rq_lock() with
raw_spin_rq_lock() in scx_ops_bypass().

This change fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")

Without this change, we observe the following warnings:

===== START =====
[    6.615204] ------------[ cut here ]------------
[    6.615205] rq->balance_callback && rq->balance_callback != &balance_push_callback
[    6.615208] WARNING: CPU: 2 PID: 0 at kernel/sched/sched.h:1730 __schedule+0x1130/0x1c90
[    6.615214] Modules linked in: nf_tables vfat fat intel_rapl_msr amd_atl intel_rapl_common kvm_amd snd_hda_codec_realtek snd_hda_scodec_component kvm snd_hda_codec_generic crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg eeepc_wmi snd_usb_audio snd_intel_sdw_acpi sha512_ssse3 sha1_ssse3 asus_wmi snd_hda_codec aesni_intel snd_usbmidi_lib ee1004 platform_profile gf128mul snd_ump asus_ec_sensors snd_hda_core i8042 crypto_simd snd_rawmidi sparse_keymap snd_hwdep snd_seq_device cryptd serio rapl rfkill snd_pcm wmi_bmof pcspkr k10temp snd_timer i2c_piix4 snd i2c_smbus soundcore ccp mc igc mousedev ptp joydev pps_core leetmouse(OE) mac_hid tcp_bbr pkcs8_key_parser ntsync(OE) i2c_dev crypto_user dm_mod loop nfnetlink lz4 zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables btrfs libcrc32c crc32c_generic raid6_pq xor crc32c_intel nvme sha256_ssse3 nvme_core nvme_auth nvidia_drm(OE) drm_ttm_helper ttm hid_cmedia nvidia_uvm(OE)
[    6.615294]  nvidia_modeset(OE) hid_generic mxm_wmi video wmi usbhid nvidia(OE)
[    6.615302] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G           OE      6.12.6-2-cachyos #1 c963cd2b82aa9cdd05160d5f7838a69b51110706
[    6.615307] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[    6.615308] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 5013 03/18/2024
[    6.615310] Sched_ext: lavd (enabling+all)
[    6.615311] RIP: 0010:__schedule+0x1130/0x1c90
[    6.615314] Code: 90 56 65 94 0f 84 e1 ef ff ff f6 05 4a 78 3d 01 01 0f 85 d4 ef ff ff c6 05 3d 78 3d 01 01 48 c7 c7 8b a3 cd 93 e8 90 24 0e ff <0f> 0b 41 8b 86 38 0c 00 00 e9 b3 ef ff ff e8 bd 8c ff ff 65 ff 0d
[    6.615316] RSP: 0018:ffffb23e4019fe28 EFLAGS: 00010046
[    6.615319] RAX: e9cdb54dc06b0200 RBX: ffffa02e00a93680 RCX: 0000000000000027
[    6.615320] RDX: ffffb23e4019fc90 RSI: 00000000ffffefff RDI: ffffa0350eb21948
[    6.615322] RBP: ffffb23e4019fee0 R08: 0000000000000000 R09: ffffffff9465a840
[    6.615323] R10: 0000000000002ffd R11: 0000000000000004 R12: 0000000000000000
[    6.615325] R13: ffffa02e00a93680 R14: ffffa0350eb366c0 R15: 00000000ffffffff
[    6.615327] FS:  0000000000000000(0000) GS:ffffa0350eb00000(0000) knlGS:0000000000000000
[    6.615329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.615331] CR2: 00007cb2b7100008 CR3: 000000012222a000 CR4: 0000000000f50ef0
[    6.615333] PKRU: 55555554
[    6.615334] Call Trace:
[    6.615336]  <TASK>
[    6.615338]  ? __warn+0xd5/0x1d0
[    6.615341]  ? __schedule+0x1130/0x1c90
[    6.615345]  ? report_bug+0x144/0x1f0
[    6.615348]  ? __schedule+0x1130/0x1c90
[    6.615350]  ? handle_bug+0x6a/0x90
[    6.615353]  ? exc_invalid_op+0x1a/0x50
[    6.615356]  ? asm_exc_invalid_op+0x1a/0x20
[    6.615361]  ? __schedule+0x1130/0x1c90
[    6.615363]  ? __schedule+0x1130/0x1c90
[    6.615366]  ? pv_native_safe_halt+0x13/0x20
[    6.615369]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615372]  ? ct_kernel_enter+0x2e/0x90
[    6.615374]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615376]  ? local_clock_noinstr+0xc/0xc0
[    6.615380]  schedule_idle+0x23/0x40
[    6.615382]  cpu_startup_entry+0x1c2/0x250
[    6.615386]  start_secondary+0x9e/0xa0
[    6.615389]  common_startup_64+0x13e/0x140
[    6.615395]  </TASK>
[    6.615396] ---[ end trace 0000000000000000 ]---
[    6.615398] ------------[ cut here ]------------
[    6.615401] rq->balance_callback && rq->balance_callback != &balance_push_callback
[    6.615403] WARNING: CPU: 6 PID: 2269 at kernel/sched/sched.h:1730 scx_ops_bypass+0x178/0x240
[    6.615408] Modules linked in: nf_tables vfat fat intel_rapl_msr amd_atl intel_rapl_common kvm_amd snd_hda_codec_realtek snd_hda_scodec_component kvm snd_hda_codec_generic crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg eeepc_wmi snd_usb_audio snd_intel_sdw_acpi sha512_ssse3 sha1_ssse3 asus_wmi snd_hda_codec aesni_intel snd_usbmidi_lib ee1004 platform_profile gf128mul snd_ump asus_ec_sensors snd_hda_core i8042 crypto_simd snd_rawmidi sparse_keymap snd_hwdep snd_seq_device cryptd serio rapl rfkill snd_pcm wmi_bmof pcspkr k10temp snd_timer i2c_piix4 snd i2c_smbus soundcore ccp mc igc mousedev ptp joydev pps_core leetmouse(OE) mac_hid tcp_bbr pkcs8_key_parser ntsync(OE) i2c_dev crypto_user dm_mod loop nfnetlink lz4 zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables btrfs libcrc32c crc32c_generic raid6_pq xor crc32c_intel nvme sha256_ssse3 nvme_core nvme_auth nvidia_drm(OE) drm_ttm_helper ttm hid_cmedia nvidia_uvm(OE)
[    6.615482]  nvidia_modeset(OE) hid_generic mxm_wmi video wmi usbhid nvidia(OE)
[    6.615490] CPU: 6 UID: 0 PID: 2269 Comm: scx_lavd Tainted: G        W  OE      6.12.6-2-cachyos #1 c963cd2b82aa9cdd05160d5f7838a69b51110706
[    6.615494] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[    6.615495] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 5013 03/18/2024
[    6.615497] Sched_ext: lavd (enabling+all), task: runnable_at=+0ms
[    6.615498] RIP: 0010:scx_ops_bypass+0x178/0x240
[    6.615501] Code: eb 42 0f 1f 44 00 00 4c 89 ef e8 c3 fd d1 00 49 ff c4 e9 5b ff ff ff c6 05 9d dc 0e 02 01 48 c7 c7 8b a3 cd 93 e8 b8 88 df ff <0f> 0b eb a5 0f 0b 41 8b 85 6c 0a 00 00 eb a9 0f 0b 41 8b 85 6c 0a
[    6.615503] RSP: 0018:ffffb23e619479e8 EFLAGS: 00010046
[    6.615506] RAX: 3730614603e1d700 RBX: 0000000000000000 RCX: 0000000000000027
[    6.615507] RDX: ffffb23e61947850 RSI: 00000000ffffefff RDI: ffffa0350ed21948
[    6.615509] RBP: ffffa0350eb00000 R08: 0000000000000000 R09: ffffffff9465a840
[    6.615511] R10: 0000000000002ffd R11: 0000000000000004 R12: 0000000000000002
[    6.615512] R13: ffffa0350eb366c0 R14: 0000000000000286 R15: ffffb23e619479f0
[    6.615514] FS:  0000703f64c53880(0000) GS:ffffa0350ed00000(0000) knlGS:0000000000000000
[    6.615516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.615517] CR2: 00005b2a1d0d23d0 CR3: 000000011089a000 CR4: 0000000000f50ef0
[    6.615519] PKRU: 55555554
[    6.615520] Call Trace:
[    6.615522]  <TASK>
[    6.615524]  ? __warn+0xd5/0x1d0
[    6.615527]  ? scx_ops_bypass+0x178/0x240
[    6.615530]  ? report_bug+0x144/0x1f0
[    6.615533]  ? scx_ops_bypass+0x178/0x240
[    6.615536]  ? handle_bug+0x6a/0x90
[    6.615538]  ? exc_invalid_op+0x1a/0x50
[    6.615541]  ? asm_exc_invalid_op+0x1a/0x20
[    6.615545]  ? scx_ops_bypass+0x178/0x240
[    6.615548]  ? scx_ops_bypass+0x178/0x240
[    6.615551]  bpf_scx_reg+0xfb5/0x1380
[    6.615559]  bpf_struct_ops_link_create+0x13c/0x190
[    6.615563]  __sys_bpf+0x765/0x6080
[    6.615567]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615570]  ? syscall_exit_to_user_mode+0x38/0xc0
[    6.615573]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615578]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615580]  ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[    6.615583]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615585]  ? syscall_exit_to_user_mode+0x38/0xc0
[    6.615587]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615589]  ? do_syscall_64+0x9b/0x170
[    6.615592]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615594]  ? syscall_exit_to_user_mode+0x38/0xc0
[    6.615596]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615598]  ? do_syscall_64+0x9b/0x170
[    6.615601]  ? __se_sys_close.llvm.4416965578177173658+0x6d/0xa0
[    6.615604]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615606]  ? kmem_cache_free.cold+0x138/0x32a
[    6.615610]  __x64_sys_bpf+0x1c/0x30
[    6.615613]  do_syscall_64+0x8f/0x170
[    6.615615]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615617]  ? do_syscall_64+0x9b/0x170
[    6.615621]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615624]  ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[    6.615626]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615628]  ? syscall_exit_to_user_mode+0x38/0xc0
[    6.615631]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615633]  ? do_syscall_64+0x9b/0x170
[    6.615635]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615637]  ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[    6.615640]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.615643]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    6.615645] RIP: 0033:0x703f64e8315d
[    6.615656] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 6b 0d 00 f7 d8 64 89 01 48
[    6.615658] RSP: 002b:00007ffebaea0a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[    6.615661] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000703f64e8315d
[    6.615662] RDX: 0000000000000040 RSI: 00007ffebaea0a70 RDI: 000000000000001c
[    6.615664] RBP: 00007ffebaea0b40 R08: 000000000000000f R09: 0000000000000000
[    6.615665] R10: 000000000000000f R11: 0000000000000246 R12: 000000000000000f
[    6.615667] R13: 000000000000002c R14: 0000000000000010 R15: 0000703f650d8000
[    6.615671]  </TASK>
[    6.615673] ---[ end trace 0000000000000000 ]---
[    6.615712] sched_ext: BPF scheduler "lavd" enabled
[    6.623157] sched_ext: kworker/1:0[29] has zero slice in pick_task_scx()
=====  END  =====

Signed-off-by: Changwoo Min <changwoo@...lia.com>
---
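For readers unfamiliar with the two lock helpers, here is a minimal
userspace sketch of the difference the commit message relies on. It
assumes that rq_lock() amounts to raw_spin_rq_lock() followed by
rq_pin_lock(), and that rq_pin_lock() performs the check quoted in the
warnings above. Every identifier prefixed with model_ is hypothetical
and exists only for illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rq; every name here is hypothetical. */
struct model_rq {
	bool locked;                     /* models the raw rq spinlock */
	void (*balance_callback)(void);  /* non-NULL: a callback is queued */
};

/* Counts how often the modeled pin check fired. */
int model_warn_count;

/* Stand-ins for a queued callback and for balance_push_callback. */
void model_queued_callback(void) { }
void model_balance_push_callback(void) { }

/* Models raw_spin_rq_lock(): locking only, no extra checks. */
void model_raw_lock(struct model_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

/* Models raw_spin_rq_unlock(). */
void model_raw_unlock(struct model_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Models the rq_pin_lock() check quoted in the warning text. */
void model_pin_lock(struct model_rq *rq)
{
	assert(rq->locked);
	if (rq->balance_callback &&
	    rq->balance_callback != model_balance_push_callback)
		model_warn_count++;
}

/* Models rq_lock(): take the raw lock, then pin it. */
void model_rq_lock(struct model_rq *rq)
{
	model_raw_lock(rq);
	model_pin_lock(rq);
}

/* Walk every modeled rq like the bypass loop; return the warning count. */
int model_bypass(struct model_rq *rqs, size_t nr, bool use_pinning_lock)
{
	model_warn_count = 0;
	for (size_t i = 0; i < nr; i++) {
		if (use_pinning_lock)
			model_rq_lock(&rqs[i]);
		else
			model_raw_lock(&rqs[i]);
		/* The re-enqueue work would happen here. */
		model_raw_unlock(&rqs[i]);
	}
	return model_warn_count;
}
```

In this model, an rq that still carries a queued callback makes the
pinning variant count a warning, while the raw variant takes the same
lock without running the check.
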
 kernel/sched/ext.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8fe64c27004e..cb6eb49d16be 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4803,10 +4803,9 @@ static void scx_ops_bypass(bool bypass)
 	 */
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-		struct rq_flags rf;
 		struct task_struct *p, *n;
 
-		rq_lock(rq, &rf);
+		raw_spin_rq_lock(rq);
 
 		if (bypass) {
 			WARN_ON_ONCE(rq->scx.flags & SCX_RQ_BYPASSING);
@@ -4822,7 +4821,7 @@ static void scx_ops_bypass(bool bypass)
 		 * sees scx_rq_bypassing() before moving tasks to SCX.
 		 */
 		if (!scx_enabled()) {
-			rq_unlock(rq, &rf);
+			raw_spin_rq_unlock(rq);
 			continue;
 		}
 
@@ -4842,10 +4841,11 @@ static void scx_ops_bypass(bool bypass)
 			sched_enq_and_set_task(&ctx);
 		}
 
-		rq_unlock(rq, &rf);
-
 		/* resched to restore ticks and idle state */
-		resched_cpu(cpu);
+		if (cpu_online(cpu) || cpu == smp_processor_id())
+			resched_curr(rq);
+
+		raw_spin_rq_unlock(rq);
 	}
 
 	atomic_dec(&scx_ops_breather_depth);
-- 
2.47.1

