linux-kernel - Re: [PATCH v1 2/2] Add selftests for pidfd polling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190426203114.GA185553@google.com>
Date:   Fri, 26 Apr 2019 16:31:14 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Daniel Colascione <dancol@...gle.com>
Cc:     Christian Brauner <christian@...uner.io>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Ingo Molnar <mingo@...nel.org>, Jann Horn <jannh@...gle.com>,
        Jann Horn <jann@...jh.net>,
        Jonathan Kowalski <bl0pbl33p@...il.com>,
        Android Kernel Team <kernel-team@...roid.com>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        Andy Lutomirski <luto@...capital.net>,
        Michal Hocko <mhocko@...e.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Serge Hallyn <serge@...lyn.com>, Shuah Khan <shuah@...nel.org>,
        Sandeep Patil <sspatil@...gle.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tim Murray <timmurray@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Tycho Andersen <tycho@...ho.ws>
Subject: Re: [PATCH v1 2/2] Add selftests for pidfd polling

On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel@...lfernandes.org> wrote:
> > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@...uner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
> 
> You've been running them in an ideal environment.

One would hope for a reliable test environment.

> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
> 
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
> 
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())

No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.

> 
> You can replace each of these clauses with something like this:
> 
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
>   sleep(SHORT_DELAY)
> check(some_condition())
> 
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.

Yes, but try implementing some_condition()  :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.

> Note that this code is fine:
> 
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
> 
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that

As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.

I don't think its worth bikeshedding about hypothetical issues.

thanks,

 - Joel