linux-kernel - Re: divide error: 0000 [#1] SMP in task_numa_migrate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160622061356.GW30154@twins.programming.kicks-ass.net>
Date:	Wed, 22 Jun 2016 08:13:56 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Campbell Steven <casteven@...il.com>
Cc:	Stefan Priebe - Profihost AG <s.priebe@...fihost.ag>,
	Greg KH <greg@...ah.com>, Vlastimil Babka <vbabka@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	linux-mm@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Rik van Riel <riel@...hat.com>
Subject: Re: divide error: 0000 [#1] SMP in task_numa_migrate -
 handle_mm_fault vanilla 4.4.6

On Wed, Jun 22, 2016 at 01:19:54PM +1200, Campbell Steven wrote:
> >>>>>>> This suggests the CONFIG_FAIR_GROUP_SCHED version of task_h_load:
> >>>>>>>
> >>>>>>>         update_cfs_rq_h_load(cfs_rq);
> >>>>>>>         return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
> >>>>>>>                         cfs_rq_load_avg(cfs_rq) + 1);
> >>>>>>>


---
commit 8974189222159154c55f24ddad33e3613960521a
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Thu Jun 16 10:50:40 2016 +0200

    sched/fair: Fix cfs_rq avg tracking underflow
    
    As per commit:
    
      b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")
    
    > the code generated from update_cfs_rq_load_avg():
    >
    > 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
    > 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
    > 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
    > 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
    > 		removed_load = 1;
    > 	}
    >
    > turns into:
    >
    > ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
    > ffffffff8108706b:       48 85 c0                test   %rax,%rax
    > ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
    > ffffffff81087070:       4c 89 f8                mov    %r15,%rax
    > ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
    > ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
    > ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
    > ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
    > ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
    > ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
    >
    > Which you'll note ends up with sa->load_avg -= r in memory at
    > ffffffff8108707a.
    
    So I _should_ have looked at other unserialized users of ->load_avg,
    but alas. Luckily nikbor reported a similar /0 from task_h_load() which
    instantly triggered recollection of this here problem.
    
    Aside from the intermediate value hitting memory and causing problems,
    there's another problem: the underflow detection relies on the signed
    bit. This reduces the effective width of the variables, IOW its
    effectively the same as having these variables be of signed type.
    
    This patch changes to a different means of unsigned underflow
    detection to not rely on the signed bit. This allows the variables to
    use the 'full' unsigned range. And it does so with explicit LOAD -
    STORE to ensure any intermediate value will never be visible in
    memory, allowing these unserialized loads.
    
    Note: GCC generates crap code for this, might warrant a look later.
    
    Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
           maybe we should do clamping on add too.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Cc: Andrey Ryabinin <aryabinin@...tuozzo.com>
    Cc: Chris Wilson <chris@...is-wilson.co.uk>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Mike Galbraith <efault@....de>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: Yuyang Du <yuyang.du@...el.com>
    Cc: bsegall@...gle.com
    Cc: kernel@...p.com
    Cc: morten.rasmussen@....com
    Cc: pjt@...gle.com
    Cc: steve.muckle@...aro.org
    Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
    Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a2348deab7a3..2ae68f0e3bf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2904,6 +2904,23 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 	}
 }
 
+/*
+ * Unsigned subtract and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define sub_positive(_ptr, _val) do {				\
+	typeof(_ptr) ptr = (_ptr);				\
+	typeof(*ptr) val = (_val);				\
+	typeof(*ptr) res, var = READ_ONCE(*ptr);		\
+	res = var - val;					\
+	if (res > var)						\
+		res = 0;					\
+	WRITE_ONCE(*ptr, res);					\
+} while (0)
+
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
 static inline int
 update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
@@ -2913,15 +2930,15 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
-		sa->load_avg = max_t(long, sa->load_avg - r, 0);
-		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->load_avg, r);
+		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed_load = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
-		sa->util_avg = max_t(long, sa->util_avg - r, 0);
-		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+		sub_positive(&sa->util_avg, r);
+		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 		removed_util = 1;
 	}
 
@@ -2994,10 +3011,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
 			  cfs_rq->curr == se, NULL);
 
-	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32,  cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
+	sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
+	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
 
 	cfs_rq_util_change(cfs_rq);
 }