linux-kernel - Re: [PATCH] Convert struct pid count to refcount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190329173209.GA23683@redhat.com>
Date:   Fri, 29 Mar 2019 18:32:09 +0100
From:   Oleg Nesterov <oleg@...hat.com>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     Jann Horn <jannh@...gle.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Kees Cook <keescook@...omium.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Android Kernel Team <kernel-team@...roid.com>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...e.com>,
        "Reshetova, Elena" <elena.reshetova@...el.com>,
        Alan Stern <stern@...land.harvard.edu>
Subject: Re: [PATCH] Convert struct pid count to refcount_t

On 03/28, Paul E. McKenney wrote:
>
> On Thu, Mar 28, 2019 at 05:26:42PM +0100, Oleg Nesterov wrote:
> >
> > Since you added Paul let me add more confusion to this thread ;)
>
> Woo-hoo!!!  More confusion!  Bring it on!!!  ;-)

OK, thanks, you certainly managed to confused me much more than I expected!

> > There were some concerns about the lack of barriers in put_pid(), but I can't
> > find that old discussion and I forgot the result of that discussion...
> >
> > Paul, could you confirm that this code
> >
> > 	CPU_0		CPU_1
> >
> > 	X = 1;		if (READ_ONCE(Y))
> > 	mb();			X = 2;
> > 	Y = 1;		BUG_ON(X != 2);
> >
> >
> > is correct? I think it is, control dependency pairs with mb(), right?
>
> The BUG_ON() is supposed to happen at the end of time, correct?

Yes,

> As written, there is (in the strict sense) a data race between the load
> of X in the BUG_ON() and CPU_0's store to X.

Well, this pseudo code is simply wrong, I meant that "X = 1" on CPU 0
must not happen after "X = 2" on CPU 1 or we have a problem.


> But the more I talk to compiler writers, the
> less comfortable I become with data races in general.  :-/
>
> So I would also feel better if the "Y = 1" was WRITE_ONCE().

If we forget about potential compiler bugs, then it is not clear to me how
WRITE_ONCE() can help in this case. mb() implies the compiler barrier, and
we do not really need the "once" semantics?

But as for put_pid(), it actually does atomic_dec_and_test(&Y), so this is
probably not relevant.

> On the other hand, this is a great opportunity to try out Alan Stern's
> prototype plain-accesses patch to the Linux Kernel Memory Model (LKMM)!
>
> https://lkml.kernel.org/r/Pine.LNX.4.44L0.1903191459270.1593-200000@iolanthe.rowland.org

Heh. Will do, but only after I buy more brains.

> Here is what I believe is the litmus test that your are interested in:
>
> ------------------------------------------------------------------------
> C OlegNesterov-put_pid
>
> {}
>
> P0(int *x, int *y)
> {
> 	*x = 1;
> 	smp_mb();
> 	*y = 1;
> }
>
> P1(int *x, int *y)
> {
> 	int r1;
>
> 	r1 = READ_ONCE(*y);
> 	if (r1)
> 		*x = 2;
> }
>
> exists (1:r1=1 /\ ~x=2)

I am not familiar with litmus, and I do not really understand what (and why)
it reports.

> Running this through herd with Alan's patch detects the data race
> and says that the undesired outcome is allowed:

OK, so it says that "*x = 2" can happen before "*x = 1" even if P1() observes
*y == 1.

Still can't understand how this can happen... Nevermind ;)

> Using WRITE_ONCE() for both P0()'s store to y and P1()'s store to x
> gets rid of both the "Flag data-race" and the undesired outcome:

...

> P1(int *x, int *y)
> {
> 	int r1;
>
> 	r1 = READ_ONCE(*y);
> 	if (r1)
> 		WRITE_ONCE(*x, 2);
> }

And this is what Documentation/memory-barriers.txt says in the "CONTROL
DEPENDENCIES" section:

		q = READ_ONCE(a);
		if (q) {
			WRITE_ONCE(b, 1);
		}

	Control dependencies pair normally with other types of barriers.
	That said, please note that neither READ_ONCE() nor WRITE_ONCE()
	are optional!

but again, I fail to really understand why WRITE_ONCE() is not optional
in this particular case.

Thanks!

Oleg.