Date: Mon, 8 Apr 2024 13:58:33 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Abel Wu <wuyun.abel@...edance.com>
Cc: Chen Yu <yu.c.chen@...el.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Juri Lelli <juri.lelli@...hat.com>, Tim Chen <tim.c.chen@...el.com>,
	Tiwei Bie <tiwei.btw@...group.com>,
	Honglei Wang <wanghonglei@...ichuxing.com>,
	Aaron Lu <aaron.lu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
	linux-kernel@...r.kernel.org,
	kernel test robot <oliver.sang@...el.com>
Subject: Re: [RFC PATCH] sched/eevdf: Return leftmost entity in pick_eevdf()
 if no eligible entity is found

On Thu, Feb 29, 2024 at 05:00:18PM +0800, Abel Wu wrote:

> > According to the log, vruntime is 18435852013561943404, the
> > cfs_rq->min_vruntime is 763383370431, the load is 629 + 2048 = 2677,
> > thus:
> > s64 delta = (s64)(18435852013561943404 - 763383370431) = -10892823530978643
> >      delta * 2677 = 7733399554989275921
> > that is to say, the result of the multiplication overflows the s64, which
> > turns the negative value into a positive one, so the eligibility check fails.
> 
> Indeed.

From the data presented it looks like min_vruntime is wrong and needs an
update. If you can readily reproduce this, dump the vruntime of all
tasks on the runqueue and see if min_vruntime is indeed correct.
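
The arithmetic quoted above can be checked in userspace. What follows is
only a sketch, not kernel code: the helper names are made up for
illustration, int64_t/uint64_t stand in for s64/u64, and
__builtin_mul_overflow() is a GCC/Clang builtin assumed to be available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the (s64)(vruntime - min_vruntime) step from the report. */
static inline int64_t vruntime_delta(uint64_t vruntime, uint64_t min_vruntime)
{
	/* Unsigned subtraction wraps; the cast reinterprets it as signed
	 * (two's complement assumed, as on all Linux targets). */
	return (int64_t)(vruntime - min_vruntime);
}

/* Hypothetical helper: delta * load with 64-bit wrap-around.
 * Done in unsigned arithmetic because signed overflow is undefined in ISO C. */
static inline int64_t wrapping_mul(int64_t delta, uint64_t load)
{
	return (int64_t)((uint64_t)delta * load);
}

/* Hypothetical helper: true if delta * load does not fit in 64 signed bits. */
static inline bool mul_overflows(int64_t delta, int64_t load)
{
	int64_t unused;

	return __builtin_mul_overflow(delta, load, &unused);
}

/* Replays the numbers from the quoted log. */
static inline void check_reported_values(void)
{
	int64_t delta = vruntime_delta(18435852013561943404ULL, 763383370431ULL);

	assert(delta == -10892823530978643LL);                    /* negative, as reported */
	assert(mul_overflows(delta, 629 + 2048));                 /* true product needs > 64 bits */
	assert(wrapping_mul(delta, 2677) == 7733399554989275921LL); /* wrapped: sign flipped */
}
```

Calling check_reported_values() from a small main() passes silently, which
matches the report: the delta itself is fine, it is the multiplication by
the load that wraps into a positive value.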

> > So where is this insane huge vruntime 18435852013561943404 coming from?
> > My guess is that, it is because the initial value of cfs_rq->min_vruntime
> > is set to (unsigned long)(-(1LL << 20)). If the task (watchdog in this case)
> > is seldom scheduled in, its vruntime might not move forward much and
> > might stay at the value set by the previous place_entity().
> 
> So why not just initialize to 0? The (unsigned long)(-(1LL << 20))
> thing is dangerous as it can easily blow up lots of calculations in
> lag, key, avg_vruntime and so on.

The reason is to ensure the wrap-around logic works -- which it must,
because with the weighting thing, the vruntime can wrap quite quickly,
something like one day IIRC (20 bits for precision etc.).

Better to have the wrap around happen quickly after boot and have
everybody suffer, rather than have it be special and hard to reproduce.
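
To illustrate what "wrap-around logic" means here, a minimal userspace
sketch follows. The helper names are hypothetical and the code is not
taken from the scheduler; it only shows the usual idiom of comparing two
u64 counters through their signed difference, starting from the initial
value discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the initial value discussed in the thread,
 * i.e. 2^20 units short of the u64 wrap point. */
static inline uint64_t initial_min_vruntime(void)
{
	return (uint64_t)(-(1LL << 20));
}

/* Hypothetical helper: wrap-safe "a is before b".
 * Correct as long as the two values are less than 2^63 apart
 * (two's complement assumed for the cast). */
static inline bool vruntime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

static inline void check_wrap(void)
{
	uint64_t start = initial_min_vruntime();
	uint64_t later = start + (1ULL << 21);	/* advances past the wrap point */

	assert(later == (1ULL << 20));		/* the counter has wrapped to a small value */
	assert(!(start < later));		/* a plain unsigned compare gets the order wrong */
	assert(vruntime_before(start, later));	/* the signed difference gets it right */
}
```

With this starting point the counter wraps after only 2^20 units of
progress, so any code that compares vruntimes without going through the
signed difference misbehaves shortly after boot instead of much later.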
