Message-ID: <CANpmjNNXOhRvdFDLAeTSyjLSZSb4qQWVbRgPcvxV_=zKUXrBqw@mail.gmail.com>
Date: Wed, 23 Oct 2024 15:18:53 +0200
From: Marco Elver <elver@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: paulmck@...nel.org, Alexander Potapenko <glider@...gle.com>, 
	syzbot <syzbot+0ec1e96c2cdf5c0e512a@...kaller.appspotmail.com>, 
	audit@...r.kernel.org, eparis@...hat.com, linux-kernel@...r.kernel.org, 
	paul@...l-moore.com, syzkaller-bugs@...glegroups.com, 
	kent.overstreet@...ux.dev
Subject: Re: [syzbot] [kernel?] KCSAN: assert: race in dequeue_entities

On Wed, 23 Oct 2024 at 11:36, Peter Zijlstra <peterz@...radead.org> wrote:
> On Wed, Oct 23, 2024 at 11:03:11AM +0200, Marco Elver wrote:
> > On Wed, 23 Oct 2024 at 10:54, Marco Elver <elver@...gle.com> wrote:
> > > On Tue, Oct 22, 2024 at 09:57PM +0200, Marco Elver wrote:
> > > > On Tue, 22 Oct 2024 at 21:12, Peter Zijlstra <peterz@...radead.org> wrote:
> > > [...]
> > > > > So KCSAN is trying to tell me these two paths run concurrently on the
> > > > > same 'p' ?!? That would be a horrible bug -- both these call chains
> > > > > should be holding rq->__lock (for task_rq(p)).
> > > >
> > > > Yes correct.
> > > >
> > > > And just to confirm this is no false positive, the way KCSAN works
> > > > _requires_ the race to actually happen before it reports anything;
> > > > this can also be seen in Alexander's report with just 1 stack trace
> > > > where it saw the value transition from 0 to 1 (TASK_ON_RQ_QUEUED) but
> > > > didn't know who did the write because kernel/sched was uninstrumented.
> > >
> > > Got another version of the splat with CONFIG_KCSAN_VERBOSE=y. Lockdep seems to
> > > think that both threads here are holding rq->__lock.
> >
> > Gotta read more carefully, one instance is ffffa2e57dc2f398 another is
> > ffffa2e57dd2f398. If I read it right, then they're not actually the
> > same lock.
>
> Yeah, as explained in the diagram below, the moment the ->on_rq = 0
> store goes through, we no longer own the task. And since
> ASSERT_EXCLUSIVE_WRITER is after that, we go splat.
>
> The below patch changes this order, switches to using
> smp_store_release(), and makes sure not to reference the task after it.
>
> I've boot tested it, but not much else.
>
> Could you please give this a go (on top of -rc3)?
>
> This also explains the SCHED_WARN_ON() Kent saw, that is subject to the
> same race.
>
> ---
>  kernel/sched/fair.c  | 21 ++++++++++++++-------
>  kernel/sched/sched.h | 34 ++++++++++++++++++++++++++++++++--
>  2 files changed, 46 insertions(+), 9 deletions(-)
[...]

Tested-by: Marco Elver <elver@...gle.com>

Previously syzkaller would give us that report within ~1h of fuzzing.
I have been fuzzing with your patch applied for 3h now, and this report
has not resurfaced.
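
For readers following along without the (elided) diff: the ordering rule
described in the quoted text can be sketched in userspace C11 roughly as
below. This is only an illustration under stated assumptions, not the
kernel patch. struct toy_task, toy_dequeue() and toy_try_claim() are
hypothetical names, and C11 release/acquire atomics stand in for
smp_store_release() and its acquire-side counterpart.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task; not the kernel's struct task_struct. */
struct toy_task {
	atomic_int on_rq;	/* 1 while the current owner may touch the task */
	int nr_dequeues;	/* plain field, only valid to touch while owning */
};

/*
 * Owner side: finish every access to the task first (including the
 * ownership assertion), then publish on_rq = 0 with release semantics
 * as the very last access.
 */
static void toy_dequeue(struct toy_task *p)
{
	/* Assumption: this check plays the role of the exclusive-writer assert. */
	assert(atomic_load_explicit(&p->on_rq, memory_order_relaxed) == 1);
	p->nr_dequeues++;	/* must be ordered before the handoff */
	atomic_store_explicit(&p->on_rq, 0, memory_order_release);
	/* p must not be referenced past this point. */
}

/*
 * Other side: may take over the task only after observing on_rq == 0.
 * The acquire on success pairs with the release store in toy_dequeue().
 */
static bool toy_try_claim(struct toy_task *p)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&p->on_rq, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}
```

With the assertion placed after the release store instead, another
thread could already have claimed the task by the time it runs, which
matches the kind of splat discussed in this thread.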
