lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250422141351.GG14170@noisy.programming.kicks-ass.net>
Date: Tue, 22 Apr 2025 16:13:51 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Omar Sandoval <osandov@...ndov.com>
Cc: Rik van Riel <riel@...riel.com>, Chris Mason <clm@...a.com>,
	Pat Cody <pat@...cody.io>, mingo@...hat.com, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, linux-kernel@...r.kernel.org, patcody@...a.com,
	kernel-team@...a.com, Breno Leitao <leitao@...ian.org>
Subject: Re: [PATCH] sched/fair: Add null pointer check to pick_next_entity()

On Mon, Apr 21, 2025 at 05:06:45PM -0700, Omar Sandoval wrote:

> Hey, Peter,
> 
> We haven't been able to test your latest patch, but I dug through some
> core dumps from crashes with your initial zero_vruntime patch. It looks
> like on just about all of them, the entity vruntimes are way too spread
> out, so we would get overflows regardless of what we picked as
> zero_vruntime.
> 
> As a representative example, we have a cfs_rq with 3 entities with the
> follow vruntimes and (scaled down) weights:
> 
> vruntime           weight
> 39052385155836636  2      (curr)
> 43658311782076206  2
> 42824722322062111  4886
> 
> The difference between the minimum and maximum is 4605926626239570,

Right, that is quite beyond usable. The key question at this point
is how did we get here...

> which is 53 bits. The total load is 4890. Even if you picked
> zero_vruntime to be equidistant from the minimum and maximum, the
> (vruntime - zero_vruntime) * load calculation in entity_eligible() is
> doomed to overflow.
> 
> That range in vruntime seems too absurd to be due to only to running too
> long without preemption. We're only seeing these crashes on internal
> node cgroups (i.e., cgroups whose children are cgroups, not tasks). This
> all leads me to suspect reweight_entity().
> 
> Specifically, this line in reweight_entity():
> 
> 	se->vlag = div_s64(se->vlag * se->load.weight, weight);
> 
> seems like it could create a very large vlag, which could cause
> place_entity() to adjust vruntime by a large value.

Right, I fixed that not too long ago. At the time I convinced myself
clipping there wasn't needed (in fact, it would lead to some other
artifacts iirc). Let me go review that decision :-)

> (place_entity() has
> a similarly suspect adjustment on se->vlag, only update_entity_lag()
> clamps it).

place_entity() is a bit tricky -- but should be well behaved. The
problem is that by adding an entity, you affect the average, because the
average shifts, the lag after placement is different than the lag you
started with.

The scaling there is to ensure the lag after placement reflects the
initial lag. Much like how update_entity_lag() is the lag before
removal, place_entity() ensures the lag is measured after insertion.

> I'll try to reproduce something like this, but do you have any thoughts
> on this theory in the meantime?

It seems reasonable, but as said, let me go look too :-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ