[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190426203114.GA185553@google.com>
Date: Fri, 26 Apr 2019 16:31:14 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Daniel Colascione <dancol@...gle.com>
Cc: Christian Brauner <christian@...uner.io>,
linux-kernel <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Arnd Bergmann <arnd@...db.de>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Ingo Molnar <mingo@...nel.org>, Jann Horn <jannh@...gle.com>,
Jann Horn <jann@...jh.net>,
Jonathan Kowalski <bl0pbl33p@...il.com>,
Android Kernel Team <kernel-team@...roid.com>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>,
Andy Lutomirski <luto@...capital.net>,
Michal Hocko <mhocko@...e.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Serge Hallyn <serge@...lyn.com>, Shuah Khan <shuah@...nel.org>,
Sandeep Patil <sspatil@...gle.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Suren Baghdasaryan <surenb@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tim Murray <timmurray@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tycho Andersen <tycho@...ho.ws>
Subject: Re: [PATCH v1 2/2] Add selftests for pidfd polling
On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel@...lfernandes.org> wrote:
> > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@...uner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
>
> You've been running them in an ideal environment.
One would hope for a reliable test environment.
> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
>
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
>
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())
No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.
>
> You can replace each of these clauses with something like this:
>
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
> sleep(SHORT_DELAY)
> check(some_condition())
>
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.
Yes, but try implementing some_condition() :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.
> Note that this code is fine:
>
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
>
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that
As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.
I don't think its worth bikeshedding about hypothetical issues.
thanks,
- Joel
Powered by blists - more mailing lists