lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7548a890-d32b-7e7d-4f84-4ebf635c3e8a@arm.com>
Date:   Wed, 4 Dec 2019 00:13:50 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     John Stultz <john.stultz@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Quentin Perret <qperret@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Patrick Bellasi <Patrick.Bellasi@....com>,
        Ingo Molnar <mingo@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: Null pointer crash at find_idlest_group on db845c w/ linus/master

On 03/12/2019 23:49, Valentin Schneider wrote:
> On 03/12/2019 23:20, Valentin Schneider wrote:
>> Looking at the code, I think I got it. In find_idlest_group() we do
>> initialize 'idlest_sgs' (just like busiest_stat in LB) but 'idlest' is just
>> NULL. The latter is dereferenced in update_pick_idlest() just for the misfit
>> case, which goes boom. And I reviewed the damn thing... Bleh.
>>
>> Fixup looks easy enough, lemme write one up.
>>
> 
> Wait no, that can't be right. We can only get in there if both 'group' and
> 'idlest' have the same group_type, which can't be true on the first pass.
> So if we go through the misfit stuff, idlest *has* to be set to something.
> Bah.
> 

So I think the thing really is dying on a sched_group->sgc deref (pahole says
sgc is at offset #16), which means we have a NULL sched_group somewhere, but
I don't see how. That can either be 'local' (can't be, first group we visit
and doesn't go through update_pick_idlest()) or 'idlest' (see previous email).

Now, it's bedtime for me, if you get the chance in the meantime can you give
this a shot? I was about to send it out but realized it didn't really make
sense, but you never know...

Also, if it is indeed misfit related, I'm surprised we (Arm folks) haven't
hit it sooner. We've had our scheduler tests running on the LB rework for at
least a month, so we should've hit it.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..e19ab7bff0f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8348,7 +8348,14 @@ static bool update_pick_idlest(struct sched_group *idlest,
 		return false;
 
 	case group_misfit_task:
-		/* Select group with the highest max capacity */
+		/*
+		 * Select group with the highest max capacity. First group we
+		 * visit gets picked as idlest to allow later capacity
+		 * comparisons.
+		 */
+		if (!idlest)
+			return true;
+
 		if (idlest->sgc->max_capacity >= group->sgc->max_capacity)
 			return false;
 		break;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ