[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200605141607.GB4455@paulmck-ThinkPad-P72>
Date: Fri, 5 Jun 2020 07:16:07 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
tglx@...utronix.de, linux-kernel@...r.kernel.org
Subject: Re: BUG: kernel NULL pointer dereference from check_preempt_wakeup()
On Fri, Jun 05, 2020 at 03:14:51PM +0200, Peter Zijlstra wrote:
No KCSAN. GCC 8.2.1. No cgroups unless the kernel creates some.
No userspace other than a C-language binary named "init" that
sleeps in an infinite loop.
.config attached.
> On Thu, Jun 04, 2020 at 03:54:45PM -0700, Paul E. McKenney wrote:
>
> > BUG: kernel NULL pointer dereference, address: 0000000000000150
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 9 PID: 196 Comm: rcu_torture_rea Not tainted 5.7.0+ #3923
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> > RIP: 0010:check_preempt_wakeup+0xb1/0x180
> > Code: 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 75 05 48 85 ff 75 29 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 0f 95 c2 48 85 ff 0f 94
>
> That is:
>
> All code
> ========
> 0: 83 ea 01 sub $0x1,%edx
> 3: 48 8b 9b 48 01 00 00 mov 0x148(%rbx),%rbx
> a: 39 d0 cmp %edx,%eax
> c: 75 f2 jne 0x0
> e: 48 39 bb 50 01 00 00 cmp %rdi,0x150(%rbx)
> 15: 75 05 jne 0x1c
> 17: 48 85 ff test %rdi,%rdi
> 1a: 75 29 jne 0x45
> 1c: 48 8b ad 48 01 00 00 mov 0x148(%rbp),%rbp
> 23: 48 8b 9b 48 01 00 00 mov 0x148(%rbx),%rbx
> 2a:* 48 8b bd 50 01 00 00 mov 0x150(%rbp),%rdi <-- trapping instruction
> 31: 48 39 bb 50 01 00 00 cmp %rdi,0x150(%rbx)
> 38: 0f 95 c2 setne %dl
> 3b: 48 85 ff test %rdi,%rdi
> 3e: 0f .byte 0xf
> 3f: 94 xchg %eax,%esp
>
> > RSP: 0018:ffffaccdc02ecd38 EFLAGS: 00010006
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffafa0bc20
> > RDX: 0000000000000000 RSI: ffff946b5df50000 RDI: ffff946b5f469340
> > RBP: 0000000000000000 R08: ffff946b5df80d00 R09: 0000000000000001
>
> And you have RBP == NULL and RBX == NULL
>
> Now, my compiler generates very similar code for this function and tells
> me this is:
>
> check_preempt_wakeup()
> find_matching_se()
> is_same_group()
> if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
>
> and pahole gives us (for struct sched_entity):
>
> struct sched_entity * parent; /* 0x148 0x8 */
> struct cfs_rq * cfs_rq; /* 0x150 0x8 */
>
> apparently both your @se and @pse are NULL.
>
> Which shouldn't be possible..
>
> I also don't see a relation to my recent changes here:
>
> > Call Trace:
> > <IRQ>
> > check_preempt_curr+0x5d/0x90
> > ttwu_do_wakeup.isra.93+0xf/0x100
> > sched_ttwu_pending+0x66/0x90
> > smp_call_function_single_interrupt+0x2d/0xf0
> > call_function_single_interrupt+0xf/0x20
>
> Since I would expect that to blow up much earlier, like @p == NULL or
> something along those lines.
>
> Could you perhaps try something like the below ? It would splat when we
> run ouf of hierarchy to ascend, which is the only semi sane scenario for
> ending up where you are.
Let me interrupt my bisection and give your patch a try. This will
take some time because although this reproduces, it does so slowly.
> Are you actively using cgroups or just whatever systemd decides to gift
> you? Can you perhaps dump /proc/sched_debug while your test is running?
I have no userspace, so no /proc. Is there somewhere I can plant a
printk() to get you the information you need?
Thanx, Paul
> ---
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 35f4cc024dcfc..7aace0a5921e9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -416,8 +416,6 @@ static inline struct sched_entity *parent_entity(struct sched_entity *se)
> static void
> find_matching_se(struct sched_entity **se, struct sched_entity **pse)
> {
> - int se_depth, pse_depth;
> -
> /*
> * preemption test can be made between sibling entities who are in the
> * same cfs_rq i.e who have a common parent. Walk up the hierarchy of
> @@ -425,23 +423,22 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
> * parent.
> */
>
> - /* First walk up until both entities are at same depth */
> - se_depth = (*se)->depth;
> - pse_depth = (*pse)->depth;
> -
> - while (se_depth > pse_depth) {
> - se_depth--;
> - *se = parent_entity(*se);
> - }
> -
> - while (pse_depth > se_depth) {
> - pse_depth--;
> - *pse = parent_entity(*pse);
> - }
> -
> while (!is_same_group(*se, *pse)) {
> - *se = parent_entity(*se);
> - *pse = parent_entity(*pse);
> + int se_depth = (*se)->depth;
> + int pse_depth = (*pse)->depth;
> +
> + if (se_depth <= pse_depth) {
> + struct sched_entity *parent = parent_entity(*pse);
> + if (WARN_ON_ONCE(!parent))
> + return;
> + *pse = parent;
> + }
> + if (se_depth >= pse_depth) {
> + struct sched_entity *parent = parent_entity(*se);
> + if (WARN_ON_ONCE(!parent))
> + return;
> + *se = parent_entity(*se);
> + }
> }
> }
>
View attachment ".config" of type "text/plain" (127172 bytes)
Powered by blists - more mailing lists