lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZdNOb7fOLIoY5sgW@chenyu5-mobl2>
Date: Mon, 19 Feb 2024 20:49:51 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Abel Wu <wuyun.abel@...edance.com>
CC: kernel test robot <oliver.sang@...el.com>, <oe-lkp@...ts.linux.dev>,
	<lkp@...el.com>, <linux-kernel@...r.kernel.org>, Peter Zijlstra
	<peterz@...radead.org>, <aubrey.li@...ux.intel.com>, Tiwei Bie
	<tiwei.btw@...group.com>, Honglei Wang <wanghonglei@...ichuxing.com>, "Aaron
 Lu" <aaron.lu@...el.com>
Subject: Re: [linus:master] [sched/eevdf] 2227a957e1:
 BUG:kernel_NULL_pointer_dereference,address

On 2024-01-30 at 18:13:32 +0800, Abel Wu wrote:
> On 1/30/24 3:24 PM, kernel test robot Wrote:
> > 
> > 
> > Hello,
> > 
> > (besides a previous performance report),
> > kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> > 
> > commit: 2227a957e1d5b1941be4e4207879ec74f4bb37f8 ("sched/eevdf: Sort the rbtree by virtual deadline")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > [test failed on linus/master 3a5879d495b226d0404098e3564462d5f1daa33b]
> > [test failed on linux-next/master 01af33cc9894b4489fb68fa35c40e9fe85df63dc]
> > 
> > in testcase: trinity
> > version: trinity-i386-abe9de86-1_20230429
> > with following parameters:
> > 
> > 	runtime: 300s
> > 	group: group-03
> > 	nr_groups: 5
> > 
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> > 
> > 
> > compiler: clang-17
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > 
> > we found this issue happens in very random way (23 out of 999 runs).
> > but keeps clean on parent.
> 
> Thanks for reporting, I will try to reproduce the issue. Does the 'parent'
> mean the same code branch without this commit?
> 
> > 
> > 84db47ca7146d7bd 2227a957e1d5b1941be4e420787
> > ---------------- ---------------------------
> >         fail:runs  %reproduction    fail:runs
> >             |             |             |
> >             :999          2%          23:999   dmesg.BUG:kernel_NULL_pointer_dereference,address
> >             :999          2%          23:999   dmesg.Kernel_panic-not_syncing:Fatal_exception
> >             :999          2%          23:999   dmesg.Oops:#[##]
> > 
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202401301012.2ed95df0-oliver.sang@intel.com
> > 
> > 
> > sorry for below parse failure which caused no real line numbers.
> > we will follow further. the orgial dmesg could be fetch from below link.
> > 
> > 
> > [  512.079810][ T8305] BUG: kernel NULL pointer dereference, address: 0000002c
> > [  512.080897][ T8305] #PF: supervisor read access in kernel mode
> > [  512.081636][ T8305] #PF: error_code(0x0000) - not-present page
> > [  512.082337][ T8305] *pde = 00000000
> > [  512.082829][ T8305] Oops: 0000 [#1] PREEMPT SMP
> > [  512.083407][ T8305] CPU: 1 PID: 8305 Comm: watchdog Tainted: G        W        N 6.7.0-rc1-00006-g2227a957e1d5 #1 819e6d1a8b887f5f97adb4aed77d98b15504c836
> > [  512.084986][ T8305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 512.086203][ T8305] EIP: set_next_entity (fair.c:?)
> 
> There was actually a NULL-test in pick_eevdf() before this commit,
> but I removed it by intent as I found it impossible to be NULL after
> examining 'all' the cases.
> 
> Also cc Tiwei who once proposed to add this check back.
> https://lore.kernel.org/all/20231208112100.18141-1-tiwei.btw@antgroup.com/
> 
> Thanks,
> 	Abel
>

While looking at pick_eevdf(), I have a thought.
Currently the sched entity is sorted by their deadline. During task
pickup, the pick_eevdf() scans for an candidate sched entity with the
smallest deadline. Meanwhile this candidate sched entity must also be
eligible.

The scan is O(lgn) on average, and O(1) at best case. How about making the
average scan even faster by sorting the sched entity not only by deadline,
but also the eligibility? The idea is that, the eligible sched entity with
smaller deadline is sorted at the front the tree. Otherwise, if the entity
is not eligible, even if it has a smaller deadline, it should be sorted
at the end of the tree.

After the change, pick_eevdf() get the leftmost sched entity at O(1) on
average. Besides, it is guaranteed to return non-NULL sched entity in
pick_eevdf(), which prevents suspicious NULL pointer exception in pick_eevdf().

For example, suppose there are two sched entities to be queued, se_a and se_b.
Consider their eligibility and deadline, there are 6 combination:

1. se_a is eligible, se_b is eligible, se_a.deadline < se_b.deadline
2. se_a is eligible, se_b is eligible, se_a.deadline >= se_b.deadline
3. se_a is eligible, se_b is not eligible
4. se_a is not eligible, se_b is eligible
5. se_a is not eligible, se_b is not eligible, se_a.deadline < se_b.deadline
6. se_a is not eligible, se_b is not eligible, se_a.deadline >= se_b.deadline

In scenario 1, 3, 5, sched_entity se_a should be sorted before se_b,
so pick_eevdf() would pick se_a first.

When enqueuing a new sched entity, it is regarded as eligible if its
vlag is positive. In theory later in pick_eevdf(), the eligibility
of this sched entity should be re-checked via entity_eligible(). But
consider if the sched entity is eliglble when enqueued, it is very
likely the same sched entity remains eligible when pick_eevdf(), because
the V keeps moving forward but the vruntime of this sched entity remain
unchanged - the vlag could get larger.

Something like this untested:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..831043cc1432 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -551,11 +551,19 @@ static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
 static inline bool entity_before(const struct sched_entity *a,
 				 const struct sched_entity *b)
 {
-	/*
-	 * Tiebreak on vruntime seems unnecessary since it can
-	 * hardly happen.
-	 */
-	return (s64)(a->deadline - b->deadline) < 0;
+	bool eli_a, eli_b;
+
+	eli_a = (a->vlag >= 0) ? true : false;
+	eli_b = (b->vlag >= 0) ? true : false;
+
+	if ((eli_a && eli_b) || (!eli_a && !eli_b))
+		/*
+		 * Tiebreak on vruntime seems unnecessary since it can
+		 * hardly happen.
+		 */
+		return (s64)(a->deadline - b->deadline) < 0;
+
+	return eli_a ? 1 : 0;
 }
 
 static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -877,10 +885,8 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  */
 static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
-	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *se = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
-	struct sched_entity *best = NULL;
 
 	/*
 	 * We can safely skip eligibility check if there is only one entity
@@ -899,45 +905,8 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
 		return curr;
 
-	/* Pick the leftmost entity if it's eligible */
-	if (se && entity_eligible(cfs_rq, se)) {
-		best = se;
-		goto found;
-	}
-
-	/* Heap search for the EEVD entity */
-	while (node) {
-		struct rb_node *left = node->rb_left;
-
-		/*
-		 * Eligible entities in left subtree are always better
-		 * choices, since they have earlier deadlines.
-		 */
-		if (left && vruntime_eligible(cfs_rq,
-					__node_2_se(left)->min_vruntime)) {
-			node = left;
-			continue;
-		}
-
-		se = __node_2_se(node);
-
-		/*
-		 * The left subtree either is empty or has no eligible
-		 * entity, so check the current node since it is the one
-		 * with earliest deadline that might be eligible.
-		 */
-		if (entity_eligible(cfs_rq, se)) {
-			best = se;
-			break;
-		}
-
-		node = node->rb_right;
-	}
-found:
-	if (!best || (curr && entity_before(curr, best)))
-		best = curr;
-
-	return best;
+	/* Pick the leftmost entity */
+	return se;
 }
 
 #ifdef CONFIG_SCHED_DEBUG
-- 
2.25.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ