lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201118133333.GA1506553@elver.google.com>
Date:   Wed, 18 Nov 2020 14:33:33 +0100
From:   Marco Elver <elver@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Will Deacon <will@...nel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Davidlohr Bueso <dave@...olabs.net>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        paulmck@...nel.org
Subject: Re: [PATCH] sched: Fix data-race in wakeup

On Tue, Nov 17, 2020 at 10:29AM +0100, Peter Zijlstra wrote:
[...]
> > Now the million dollar question is why KCSAN hasn't run into this. Hrmph.
> 
> kernel/sched/Makefile:KCSAN_SANITIZE := n
> 
> might have something to do with that, I suppose.

For the record, I tried to reproduce this data race. I found a
read/write race on this bitfield, but not yet that write/write race
(perhaps I wasn't running the right workload).

	| read to 0xffff8d4e2ce39aac of 1 bytes by task 5269 on cpu 3:
	|  __sched_setscheduler+0x4a9/0x1070 kernel/sched/core.c:5297
	|  sched_setattr kernel/sched/core.c:5512 [inline]
	|  ...
	|
	| write to 0xffff8d4e2ce39aac of 1 bytes by task 5268 on cpu 1:
	|  __schedule+0x296/0xab0 kernel/sched/core.c:4462 		prev->sched_contributes_to_load =
	|  schedule+0xd1/0x130 kernel/sched/core.c:4601
	|  ...
	|
	| Full report: https://paste.debian.net/hidden/07a50732/

Getting to the above race also required some effort as 1) I kept hitting
other unrelated data races in the scheduler and had to silence those
first to be able to make progress, and 2) only enable KCSAN for
scheduler code to just ignore all other data races. Then I let syzkaller
run for a few minutes.

Also note our default KCSAN config is suboptimal. For serious debugging,
I'd recommend the same config that rcutorture uses with the --kcsan
flag, specifically:

	CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n,
	CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n

to get the full picture.

However, as a first step, it'd be nice to eventually remove the
KCSAN_SANITIZE := n from kernel/sched/Makefile when things are less
noisy (so that syzbot and default builds can start finding more serious
issues, too).

Thanks,
-- Marco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ