Message-ID: <543F945F.4020303@jp.fujitsu.com>
Date:	Thu, 16 Oct 2014 18:48:15 +0900
From:	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
To:	<mingo@...hat.com>, <peterz@...radead.org>
CC:	<kernellwp@...il.com>, <riel@...hat.com>, <tkhai@...dex.ru>,
	<linux-kernel@...r.kernel.org>
Subject: [PATCH v2] sched/fair: Avoid divide error in update_task_scan_period()

While offlining a node by hot-removing memory, the following divide error
occurs:

  divide error: 0000 [#1] SMP
  [...]
  Call Trace:
   [...] handle_mm_fault
   [...] ? try_to_wake_up
   [...] ? wake_up_state
   [...] __do_page_fault
   [...] ? do_futex
   [...] ? put_prev_entity
   [...] ? __switch_to
   [...] do_page_fault
   [...] page_fault
  [...]
  RIP  [<ffffffff810a7081>] task_numa_fault
   RSP <ffff88084eb2bcb0>

The issue occurs as follows:
  1. When a page fault occurs and the page is allocated from node 1,
     the node 1 entry of task_struct->numa_faults_buffer_memory[] is
     incremented and p->numa_faults_locality[] is incremented as well:

     o numa_faults_buffer_memory[]       o numa_faults_locality[]
              NR_NUMA_HINT_FAULT_TYPES
             |      0     |     1     |
     ----------------------------------  ----------------------
      node 0 |      0     |     0     |   remote |      0     |
 node 1 |      0     |     1     |   local  |      1     |
     ----------------------------------  ----------------------

  2. Node 1 is offlined by hot-removing memory.

  3. When the next page fault occurs, fault_types[] is calculated in
     task_numa_placement() from p->numa_faults_buffer_memory[] of all
     online nodes. But node 1 was offlined in step 2, so fault_types[]
     is calculated only from p->numa_faults_buffer_memory[] of node 0,
     and both entries of fault_types[] end up as 0.

  4. These values (both 0) of fault_types[] are passed to
     update_task_scan_period().

  5. Since numa_faults_locality[1] is set to 1, the following division
     is performed:

        static void update_task_scan_period(struct task_struct *p,
                                unsigned long shared, unsigned long private)
        {
        ...
                ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
        }

  6. But both private and shared are 0, so the divide error occurs
     here. (A minimal standalone sketch of this computation follows
     below.)
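
For reference, here is a minimal userspace sketch of the failing
computation; it is not the kernel code itself. DIV_ROUND_UP() is
reproduced as defined in include/linux/kernel.h, and NUMA_PERIOD_SLOTS
is assumed to be 10 as in kernel/sched/fair.c. With
private == shared == 0 the unguarded macro divides by zero (SIGFPE in
userspace, the #DE divide error shown in the trace above); the guard
mirrors the one added by this patch:

  #include <stdio.h>

  #define NUMA_PERIOD_SLOTS	10
  #define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

  int main(void)
  {
  	unsigned long private = 0, shared = 0;	/* state reached in step 4 */
  	unsigned long total_faults = private + shared;
  	unsigned long ratio;

  	/* The fix: use 1 as the denominator when no faults were recorded. */
  	if (total_faults == 0)
  		total_faults = 1;

  	/* Without the guard this would be DIV_ROUND_UP(0, 0) and crash. */
  	ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, total_faults);
  	printf("ratio = %lu\n", ratio);

  	return 0;
  }

Removing the guard and rerunning reproduces the crash (SIGFPE); with
the guard, ratio is simply 0.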

This divide error is a rare case because the trigger is node offlining.
With this patch, when both private and shared are 0, the denominator is
set to 1 to avoid the divide error.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
CC: Wanpeng Li <kernellwp@...il.com>
CC: Rik van Riel <riel@...hat.com>
CC: Peter Zijlstra <peterz@...radead.org>
---
 kernel/sched/fair.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86..580fc74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1466,6 +1466,7 @@ static void update_task_scan_period(struct task_struct *p,

 	unsigned long remote = p->numa_faults_locality[0];
 	unsigned long local = p->numa_faults_locality[1];
+	unsigned long total_faults = shared + private;

 	/*
 	 * If there were no record hinting faults then either the task is
@@ -1496,6 +1497,14 @@ static void update_task_scan_period(struct task_struct *p,
 			slot = 1;
 		diff = slot * period_slot;
 	} else {
+		/*
+		 * This is a rare case: total_faults can become 0 after a
+		 * node is offlined. In that case, set total_faults to 1
+		 * to avoid a divide error.
+		 */
+		if (unlikely(total_faults == 0))
+			total_faults = 1;
+
 		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;

 		/*
@@ -1506,7 +1515,7 @@ static void update_task_scan_period(struct task_struct *p,
 		 * scanning faster if shared accesses dominate as it may
 		 * simply bounce migrations uselessly
 		 */
-		ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
+		ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (total_faults));
 		diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
 	}

-- 
1.8.3.1


