Message-ID: <20250418031550.GA1516180@bytedance>
Date: Fri, 18 Apr 2025 11:15:50 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Florian Bezdeka <florian.bezdeka@...mens.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting
throttle time
Hi Florian,
On Thu, Apr 17, 2025 at 04:06:16PM +0200, Florian Bezdeka wrote:
> Hi Aaron,
>
> On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
> > @@ -5889,27 +5943,21 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> > cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
> > cfs_rq->throttled_clock_pelt;
> >
> > - if (cfs_rq->throttled_clock_self) {
> > - u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
> > -
> > - cfs_rq->throttled_clock_self = 0;
> > -
> > - if (WARN_ON_ONCE((s64)delta < 0))
> > - delta = 0;
> > -
> > - cfs_rq->throttled_clock_self_time += delta;
> > - }
> > + if (cfs_rq->throttled_clock_self)
> > + account_cfs_rq_throttle_self(cfs_rq);
> >
> > /* Re-enqueue the tasks that have been throttled at this level. */
> > list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> > list_del_init(&p->throttle_node);
> > - enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
> > + enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
> > }
> >
> > /* Add cfs_rq with load or one or more already running entities to the list */
> > if (!cfs_rq_is_decayed(cfs_rq))
> > list_add_leaf_cfs_rq(cfs_rq);
> >
> > + WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> > +
> > return 0;
> > }
> >
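(For reference, account_cfs_rq_throttle_self() is essentially the removed
block above moved into a helper; a rough sketch reconstructed from the
deleted lines, not necessarily the exact definition in the patch:

	static void account_cfs_rq_throttle_self(struct cfs_rq *cfs_rq)
	{
		u64 delta = rq_clock(rq_of(cfs_rq)) - cfs_rq->throttled_clock_self;

		cfs_rq->throttled_clock_self = 0;

		if (WARN_ON_ONCE((s64)delta < 0))
			delta = 0;

		cfs_rq->throttled_clock_self_time += delta;
	}
)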
>
> I got this warning while testing in our virtual environment:
Thanks for the report.
>
> Any idea?
>
Most likely the accounting of h_nr_throttled is incorrect somewhere.
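If it reproduces easily, one way to narrow it down would be a debug
check at the site(s) where h_nr_throttled is decremented, so an
underflow is caught where it happens rather than only showing up as a
leftover count at unthrottle time. Something like this (illustrative
only, the helper name is made up):

	static inline void dec_h_nr_throttled(struct cfs_rq *cfs_rq, unsigned int n)
	{
		/* catch an underflow at its source */
		WARN_ON_ONCE(cfs_rq->h_nr_throttled < n);
		cfs_rq->h_nr_throttled -= n;
	}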
> [ 26.639641] ------------[ cut here ]------------
> [ 26.639644] WARNING: CPU: 5 PID: 0 at kernel/sched/fair.c:5967 tg_unthrottle_up+0x1a6/0x3d0
The line number doesn't match the code though; the warning below should
be at line 5959:
WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> [ 26.639653] Modules linked in: veth xt_nat nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc xt_recent rfkill ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt vsock_loopback vmw_vsock_virtio_transport_common ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog vmw_vsock_vmci_transport xt_comment vsock nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables intel_rapl_msr intel_rapl_common nfnetlink binfmt_misc intel_uncore_frequency_common isst_if_mbox_msr isst_if_common skx_edac_common nfit libnvdimm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel snd_pcm crypto_simd cryptd snd_timer rapl snd soundcore vmw_balloon vmwgfx pcspkr drm_ttm_helper ttm drm_client_lib button ac drm_kms_helper sg vmw_vmci evdev joydev serio_raw drm loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc16 mbcache jbd2 squashfs dm_verity dm_bufio reed_solomon dm_mod
> [ 26.639715] sd_mod ata_generic mptspi mptscsih ata_piix mptbase libata scsi_transport_spi psmouse scsi_mod vmxnet3 i2c_piix4 i2c_smbus scsi_common
> [ 26.639726] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Not tainted 6.14.2-CFSfixes #1
6.14.2-CFSfixes seems to be a backported kernel?
Do you also see this warning when using this series on top of the
aforementioned base commit 6432e163ba1b ("sched/isolation: Make use of
more than one housekeeping cpu")? Just want to make sure it's not a
problem caused by the backport.
Thanks,
Aaron
> [ 26.639729] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.24224532.B64.2408191458 08/19/2024
> [ 26.639731] RIP: 0010:tg_unthrottle_up+0x1a6/0x3d0
> [ 26.639735] Code: 00 00 48 39 ca 74 14 48 8b 52 10 49 8b 8e 58 01 00 00 48 39 8a 28 01 00 00 74 24 41 8b 86 68 01 00 00 85 c0 0f 84 8d fe ff ff <0f> 0b e9 86 fe ff ff 49 8b 9e 38 01 00 00 41 8b 86 40 01 00 00 48
> [ 26.639737] RSP: 0000:ffffa5df8029cec8 EFLAGS: 00010002
> [ 26.639739] RAX: 0000000000000001 RBX: ffff981c6fcb6a80 RCX: ffff981943752e40
> [ 26.639741] RDX: 0000000000000005 RSI: ffff981c6fcb6a80 RDI: ffff981943752d00
> [ 26.639742] RBP: ffff9819607dc708 R08: ffff981c6fcb6a80 R09: 0000000000000000
> [ 26.639744] R10: 0000000000000001 R11: ffff981969936a10 R12: ffff9819607dc708
> [ 26.639745] R13: ffff9819607dc9d8 R14: ffff9819607dc800 R15: ffffffffad913fb0
> [ 26.639747] FS: 0000000000000000(0000) GS:ffff981c6fc80000(0000) knlGS:0000000000000000
> [ 26.639749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 26.639750] CR2: 00007ff1292dc44c CR3: 000000015350e006 CR4: 00000000007706f0
> [ 26.639779] PKRU: 55555554
> [ 26.639781] Call Trace:
> [ 26.639783] <IRQ>
> [ 26.639787] ? __pfx_tg_unthrottle_up+0x10/0x10
> [ 26.639790] ? __pfx_tg_nop+0x10/0x10
> [ 26.639793] walk_tg_tree_from+0x58/0xb0
> [ 26.639797] unthrottle_cfs_rq+0xf0/0x360
> [ 26.639800] ? sched_clock_cpu+0xf/0x190
> [ 26.639808] __cfsb_csd_unthrottle+0x11c/0x170
> [ 26.639812] ? __pfx___cfsb_csd_unthrottle+0x10/0x10
> [ 26.639816] __flush_smp_call_function_queue+0x103/0x410
> [ 26.639822] __sysvec_call_function_single+0x1c/0xb0
> [ 26.639826] sysvec_call_function_single+0x6c/0x90
> [ 26.639832] </IRQ>
> [ 26.639833] <TASK>
> [ 26.639834] asm_sysvec_call_function_single+0x1a/0x20
> [ 26.639840] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [ 26.639844] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 45 c1 13 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> [ 26.639846] RSP: 0000:ffffa5df80117ed8 EFLAGS: 00000242
> [ 26.639848] RAX: 0000000000000005 RBX: ffff981940804000 RCX: ffff9819a9df7000
> [ 26.639849] RDX: 0000000000000005 RSI: 0000000000000005 RDI: 000000000005c514
> [ 26.639851] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
> [ 26.639852] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 26.639853] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 26.639858] default_idle+0x9/0x20
> [ 26.639861] default_idle_call+0x30/0x100
> [ 26.639863] do_idle+0x1fd/0x240
> [ 26.639869] cpu_startup_entry+0x29/0x30
> [ 26.639872] start_secondary+0x11e/0x140
> [ 26.639875] common_startup_64+0x13e/0x141
> [ 26.639881] </TASK>
> [ 26.639882] ---[ end trace 0000000000000000 ]---