Message-ID: <618bc3b199f19be916913301edb5ec832131e842.camel@siemens.com>
Date: Wed, 07 May 2025 11:33:42 +0200
From: Florian Bezdeka <florian.bezdeka@...mens.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Ben Segall
 <bsegall@...gle.com>,  K Prateek Nayak <kprateek.nayak@....com>, Peter
 Zijlstra <peterz@...radead.org>, Josh Don <joshdon@...gle.com>,  Ingo
 Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi
 Wang <xii@...gle.com>, 	linux-kernel@...r.kernel.org, Juri Lelli
 <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Mel Gorman	 <mgorman@...e.de>,
 Chengming Zhou <chengming.zhou@...ux.dev>, Chuyi Zhou	
 <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting
 throttle time
On Wed, 2025-05-07 at 17:09 +0800, Aaron Lu wrote:
> Hi Florian,
> 
> On Thu, Apr 17, 2025 at 04:06:16PM +0200, Florian Bezdeka wrote:
> > Hi Aaron,
> > 
> > On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
> > > @@ -5889,27 +5943,21 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> > >  	cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
> > >  		cfs_rq->throttled_clock_pelt;
> > >  
> > > -	if (cfs_rq->throttled_clock_self) {
> > > -		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
> > > -
> > > -		cfs_rq->throttled_clock_self = 0;
> > > -
> > > -		if (WARN_ON_ONCE((s64)delta < 0))
> > > -			delta = 0;
> > > -
> > > -		cfs_rq->throttled_clock_self_time += delta;
> > > -	}
> > > +	if (cfs_rq->throttled_clock_self)
> > > +		account_cfs_rq_throttle_self(cfs_rq);
> > >  
> > >  	/* Re-enqueue the tasks that have been throttled at this level. */
> > >  	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> > >  		list_del_init(&p->throttle_node);
> > > -		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
> > > +		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
> > >  	}
> > >  
> > >  	/* Add cfs_rq with load or one or more already running entities to the list */
> > >  	if (!cfs_rq_is_decayed(cfs_rq))
> > >  		list_add_leaf_cfs_rq(cfs_rq);
> > >  
> > > +	WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> > > +
> > >  	return 0;
> > >  }
> > >  
> > 
> > I got this warning while testing in our virtual environment:
> > 
> > Any idea?
> > 
> 
> I made a stupid mistake here: I thought when a cfs_rq gets unthrottled,
> it should have no tasks in throttled state, hence I added that check in
> tg_unthrottle_up():
>         WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> 
> But h_nr_throttled tracks the hierarchical number of throttled tasks,
> which means that if this cfs_rq has descendant cfs_rqs that are still in
> throttled state, its h_nr_throttled can be > 0 when it gets unthrottled.
> 
> I just made a setup to emulate this scenario and can reproduce this
> warning. I guess in your setup, there are multiple cpu.max settings in a
> cgroup hierarchy.
I will have a look.
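
For illustration, the hierarchical accounting described above can be
modelled in userspace. This is only a toy sketch, not the kernel code:
struct toy_cfs_rq and the toy_* helpers are hypothetical names, and the
model assumes that a throttled task is counted at its own level and at
every ancestor level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cfs_rq; NOT the kernel structure. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;
	bool throttled;      /* this level ran out of its own quota */
	int h_nr_throttled;  /* throttled tasks here and in all descendants */
};

/* A change at one level is propagated to every ancestor level. */
static void toy_account_throttled(struct toy_cfs_rq *cfs_rq, int delta)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		cfs_rq->h_nr_throttled += delta;
}

/*
 * Unthrottle one level: only the tasks parked at this level are
 * released. Returns the hierarchical count left at this level.
 */
static int toy_unthrottle(struct toy_cfs_rq *cfs_rq, int own_limbo_tasks)
{
	cfs_rq->throttled = false;
	toy_account_throttled(cfs_rq, -own_limbo_tasks);
	return cfs_rq->h_nr_throttled;
}

/* Parent and child both have their own quota, and both are throttled. */
static int toy_demo_parent_count_after_unthrottle(void)
{
	struct toy_cfs_rq parent = { .parent = NULL, .throttled = true };
	struct toy_cfs_rq child = { .parent = &parent, .throttled = true };

	toy_account_throttled(&parent, 1); /* one task parked at the parent */
	toy_account_throttled(&child, 2);  /* two tasks parked at the child */
	assert(child.h_nr_throttled == 2);
	assert(parent.h_nr_throttled == 3);

	/* The parent gets quota back first; the child stays throttled. */
	return toy_unthrottle(&parent, 1);
}
```

In this model toy_demo_parent_count_after_unthrottle() returns 2: the
parent is unthrottled while the child's tasks are still counted in its
h_nr_throttled, which is the case the WARN_ON_ONCE() did not expect.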
> 
> It's just that the WARN_ON_ONCE() itself is incorrect; I'll remove it
> in the next version. Thanks for the report!

You're welcome. IOW: I can ignore the warning. Great.

In the meantime I forward-ported the 5.15-based series that you provided
to 6.1 and ran extensive testing in our lab. It looks very promising so
far. Our freeze seems to be solved now.

Thanks for your help! Very much appreciated!

We updated one device in the field today, at a customer site. It will
take another week until I can report success. Let's hope.

The tests based on 6.14 are also looking good.

To sum up: This series fixes (or seems to fix, let's wait for one more
week to be sure) a critical RT issue. Is there a chance that, once this
has made it into mainline, we will see (official) backports? 6.12 or 6.1
would be nice.

I could paste my 6.1 and 6.12 series, if that would help. But as there
will be at least one more iteration, that work will need a refresh as
well.

Best regards,
Florian

> 
> > [   26.639641] ------------[ cut here ]------------
> > [   26.639644] WARNING: CPU: 5 PID: 0 at kernel/sched/fair.c:5967 tg_unthrottle_up+0x1a6/0x3d0
> > [   26.639653] Modules linked in: veth xt_nat nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc xt_recent rfkill ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt vsock_loopback vmw_vsock_virtio_transport_common ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog vmw_vsock_vmci_transport xt_comment vsock nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables intel_rapl_msr intel_rapl_common nfnetlink binfmt_misc intel_uncore_frequency_common isst_if_mbox_msr isst_if_common skx_edac_common nfit libnvdimm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel snd_pcm crypto_simd cryptd snd_timer rapl snd soundcore vmw_balloon vmwgfx pcspkr drm_ttm_helper ttm drm_client_lib button ac drm_kms_helper sg vmw_vmci evdev joydev serio_raw drm loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc16 mbcache jbd2 squashfs dm_verity dm_bufio reed_solomon dm_mod
> > [   26.639715]  sd_mod ata_generic mptspi mptscsih ata_piix mptbase libata scsi_transport_spi psmouse scsi_mod vmxnet3 i2c_piix4 i2c_smbus scsi_common
> > [   26.639726] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Not tainted 6.14.2-CFSfixes #1
> > [   26.639729] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.24224532.B64.2408191458 08/19/2024
> > [   26.639731] RIP: 0010:tg_unthrottle_up+0x1a6/0x3d0
> > [   26.639735] Code: 00 00 48 39 ca 74 14 48 8b 52 10 49 8b 8e 58 01 00 00 48 39 8a 28 01 00 00 74 24 41 8b 86 68 01 00 00 85 c0 0f 84 8d fe ff ff <0f> 0b e9 86 fe ff ff 49 8b 9e 38 01 00 00 41 8b 86 40 01 00 00 48
> > [   26.639737] RSP: 0000:ffffa5df8029cec8 EFLAGS: 00010002
> > [   26.639739] RAX: 0000000000000001 RBX: ffff981c6fcb6a80 RCX: ffff981943752e40
> > [   26.639741] RDX: 0000000000000005 RSI: ffff981c6fcb6a80 RDI: ffff981943752d00
> > [   26.639742] RBP: ffff9819607dc708 R08: ffff981c6fcb6a80 R09: 0000000000000000
> > [   26.639744] R10: 0000000000000001 R11: ffff981969936a10 R12: ffff9819607dc708
> > [   26.639745] R13: ffff9819607dc9d8 R14: ffff9819607dc800 R15: ffffffffad913fb0
> > [   26.639747] FS:  0000000000000000(0000) GS:ffff981c6fc80000(0000) knlGS:0000000000000000
> > [   26.639749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   26.639750] CR2: 00007ff1292dc44c CR3: 000000015350e006 CR4: 00000000007706f0
> > [   26.639779] PKRU: 55555554
> > [   26.639781] Call Trace:
> > [   26.639783]  <IRQ>
> > [   26.639787]  ? __pfx_tg_unthrottle_up+0x10/0x10
> > [   26.639790]  ? __pfx_tg_nop+0x10/0x10
> > [   26.639793]  walk_tg_tree_from+0x58/0xb0
> > [   26.639797]  unthrottle_cfs_rq+0xf0/0x360
> > [   26.639800]  ? sched_clock_cpu+0xf/0x190
> > [   26.639808]  __cfsb_csd_unthrottle+0x11c/0x170
> > [   26.639812]  ? __pfx___cfsb_csd_unthrottle+0x10/0x10
> > [   26.639816]  __flush_smp_call_function_queue+0x103/0x410
> > [   26.639822]  __sysvec_call_function_single+0x1c/0xb0
> > [   26.639826]  sysvec_call_function_single+0x6c/0x90
> > [   26.639832]  </IRQ>
> > [   26.639833]  <TASK>
> > [   26.639834]  asm_sysvec_call_function_single+0x1a/0x20
> > [   26.639840] RIP: 0010:pv_native_safe_halt+0xf/0x20
> > [   26.639844] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 45 c1 13 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> > [   26.639846] RSP: 0000:ffffa5df80117ed8 EFLAGS: 00000242
> > [   26.639848] RAX: 0000000000000005 RBX: ffff981940804000 RCX: ffff9819a9df7000
> > [   26.639849] RDX: 0000000000000005 RSI: 0000000000000005 RDI: 000000000005c514
> > [   26.639851] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
> > [   26.639852] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> > [   26.639853] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [   26.639858]  default_idle+0x9/0x20
> > [   26.639861]  default_idle_call+0x30/0x100
> > [   26.639863]  do_idle+0x1fd/0x240
> > [   26.639869]  cpu_startup_entry+0x29/0x30
> > [   26.639872]  start_secondary+0x11e/0x140
> > [   26.639875]  common_startup_64+0x13e/0x141
> > [   26.639881]  </TASK>
> > [   26.639882] ---[ end trace 0000000000000000 ]---
