lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200605141607.GB4455@paulmck-ThinkPad-P72>
Date:   Fri, 5 Jun 2020 07:16:07 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        tglx@...utronix.de, linux-kernel@...r.kernel.org
Subject: Re: BUG: kernel NULL pointer dereference from check_preempt_wakeup()

On Fri, Jun 05, 2020 at 03:14:51PM +0200, Peter Zijlstra wrote:

No KCSAN.  GCC 8.2.1.  No cgroups unless the kernel creates some.
No userspace other than a C-language binary named "init" that
sleeps in an infinite loop.

.config attached.

> On Thu, Jun 04, 2020 at 03:54:45PM -0700, Paul E. McKenney wrote:
> 
> > BUG: kernel NULL pointer dereference, address: 0000000000000150
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0 
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 9 PID: 196 Comm: rcu_torture_rea Not tainted 5.7.0+ #3923
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> > RIP: 0010:check_preempt_wakeup+0xb1/0x180
> > Code: 83 ea 01 48 8b 9b 48 01 00 00 39 d0 75 f2 48 39 bb 50 01 00 00 75 05 48 85 ff 75 29 48 8b ad 48 01 00 00 48 8b 9b 48 01 00 00 <48> 8b bd 50 01 00 00 48 39 bb 50 01 00 00 0f 95 c2 48 85 ff 0f 94
> 
> That is:
> 
> All code
> ========
>    0:	83 ea 01             	sub    $0x1,%edx
>    3:	48 8b 9b 48 01 00 00 	mov    0x148(%rbx),%rbx
>    a:	39 d0                	cmp    %edx,%eax
>    c:	75 f2                	jne    0x0
>    e:	48 39 bb 50 01 00 00 	cmp    %rdi,0x150(%rbx)
>   15:	75 05                	jne    0x1c
>   17:	48 85 ff             	test   %rdi,%rdi
>   1a:	75 29                	jne    0x45
>   1c:	48 8b ad 48 01 00 00 	mov    0x148(%rbp),%rbp
>   23:	48 8b 9b 48 01 00 00 	mov    0x148(%rbx),%rbx
>   2a:*	48 8b bd 50 01 00 00 	mov    0x150(%rbp),%rdi		<-- trapping instruction
>   31:	48 39 bb 50 01 00 00 	cmp    %rdi,0x150(%rbx)
>   38:	0f 95 c2             	setne  %dl
>   3b:	48 85 ff             	test   %rdi,%rdi
>   3e:	0f                   	.byte 0xf
>   3f:	94                   	xchg   %eax,%esp
> 
> > RSP: 0018:ffffaccdc02ecd38 EFLAGS: 00010006
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffafa0bc20
> > RDX: 0000000000000000 RSI: ffff946b5df50000 RDI: ffff946b5f469340
> > RBP: 0000000000000000 R08: ffff946b5df80d00 R09: 0000000000000001
> 
> And you have RBP == NULL and RBX == NULL
> 
> Now, my compiler generates very similar code for this function and tells
> me this is:
> 
>   check_preempt_wakeup()
>     find_matching_se()
>       is_same_group()
>         if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*
> 
> and pahole gives us (for struct sched_entity):
> 
>   struct sched_entity *      parent;                                               /* 0x148   0x8 */
>   struct cfs_rq *            cfs_rq;                                               /* 0x150   0x8 */
> 
> apparently both your @se and @pse are NULL.
> 
> Which shouldn't be possible..
> 
> I also don't see a relation to my recent changes here:
> 
> > Call Trace:
> >  <IRQ>
> >  check_preempt_curr+0x5d/0x90
> >  ttwu_do_wakeup.isra.93+0xf/0x100
> >  sched_ttwu_pending+0x66/0x90
> >  smp_call_function_single_interrupt+0x2d/0xf0
> >  call_function_single_interrupt+0xf/0x20
> 
> Since I would expect that to blow up much earlier, like @p == NULL or
> something along those lines.
> 
> Could you perhaps try something like the below ? It would splat when we
> run ouf of hierarchy to ascend, which is the only semi sane scenario for
> ending up where you are.

Let me interrupt my bisection and give your patch a try.  This will
take some time because although this reproduces, it does so slowly.

> Are you actively using cgroups or just whatever systemd decides to gift
> you? Can you perhaps dump /proc/sched_debug while your test is running?

I have no userspace, so no /proc.  Is there somewhere I can plant a
printk() to get you the information you need?

							Thanx, Paul

> ---
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 35f4cc024dcfc..7aace0a5921e9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -416,8 +416,6 @@ static inline struct sched_entity *parent_entity(struct sched_entity *se)
>  static void
>  find_matching_se(struct sched_entity **se, struct sched_entity **pse)
>  {
> -	int se_depth, pse_depth;
> -
>  	/*
>  	 * preemption test can be made between sibling entities who are in the
>  	 * same cfs_rq i.e who have a common parent. Walk up the hierarchy of
> @@ -425,23 +423,22 @@ find_matching_se(struct sched_entity **se, struct sched_entity **pse)
>  	 * parent.
>  	 */
>  
> -	/* First walk up until both entities are at same depth */
> -	se_depth = (*se)->depth;
> -	pse_depth = (*pse)->depth;
> -
> -	while (se_depth > pse_depth) {
> -		se_depth--;
> -		*se = parent_entity(*se);
> -	}
> -
> -	while (pse_depth > se_depth) {
> -		pse_depth--;
> -		*pse = parent_entity(*pse);
> -	}
> -
>  	while (!is_same_group(*se, *pse)) {
> -		*se = parent_entity(*se);
> -		*pse = parent_entity(*pse);
> +		int se_depth = (*se)->depth;
> +		int pse_depth = (*pse)->depth;
> +
> +		if (se_depth <= pse_depth) {
> +			struct sched_entity *parent = parent_entity(*pse);
> +			if (WARN_ON_ONCE(!parent))
> +				return;
> +			*pse = parent;
> +		}
> +		if (se_depth >= pse_depth) {
> +			struct sched_entity *parent = parent_entity(*se);
> +			if (WARN_ON_ONCE(!parent))
> +				return;
> +			*se = parent_entity(*se);
> +		}
>  	}
>  }
>  

View attachment ".config" of type "text/plain" (127172 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ