Message-ID: <2093716774.45636590.1443610455084.JavaMail.zimbra@redhat.com>
Date:	Wed, 30 Sep 2015 06:54:15 -0400 (EDT)
From:	Ulrich Obergfell <uobergfe@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, dzickus@...hat.com,
	atomlin@...hat.com
Subject: Re: [PATCH 0/5] improve handling of errors returned by
 kthread_park()


Andrew,

> ... what inspired this patchset?
> Are you experiencing kthread_park() failures in practice?

I did not experience kthread_park() failures in practice.  Looking at
watchdog_park_threads() as introduced by commit
81a4beef91ba4a9e8ad6054ca9933dff7e25ff28, I realized that there is a
theoretical corner case which would not be handled well. Let's assume
that kthread_park() returns an error in the following flow of
execution (the user changes watchdog_thresh).

  proc_watchdog_thresh
    set_sample_period()
    //
    // The watchdog_thresh and sample_period variables are now set to
    // the new value.
    //
    proc_watchdog_update
      watchdog_enable_all_cpus
        update_watchdog_all_cpus
          watchdog_park_threads

Let's say the system has eight CPUs and that kthread_park() failed to
park watchdog/4. In this example watchdog/0 .. watchdog/3 are already
parked and watchdog/5 .. watchdog/7 are not parked yet (we don't know
exactly what happened to watchdog/4). watchdog_park_threads() unparks
the threads if kthread_park() of one thread fails.

  for_each_watchdog_cpu(cpu) {
          ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
          if (ret)
                  break;
  }
  if (ret) {
          for_each_watchdog_cpu(cpu)
                  kthread_unpark(per_cpu(softlockup_watchdog, cpu));
  }

watchdog/0 .. watchdog/3 will pick up the new watchdog_thresh value
when they are unparked (please see the watchdog_enable() function),
whereas watchdog/5 .. watchdog/7 will continue to use the old value
for the hard lockup detector and begin using the new value for the
soft lockup detector (kthread_unpark() sees watchdog/5 .. watchdog/7
in the unparked state, so it skips these threads). The inconsistency
which results from using different watchdog_thresh values can cause
unexpected behaviour of the lockup detectors (e.g. false positives).
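
The scenario above can be replayed with a small user-space model. This
is only a sketch, not kernel code: struct model_thread, model_park(),
model_unpark() and simulate_failed_update() are hypothetical names, and
the model assumes that the thread whose park attempt failed simply
stays unparked (in reality we don't know its state). It counts the
threads whose hard lockup detector is still armed with the old
threshold after the rollback.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical model of one watchdog thread (not a kernel structure). */
struct model_thread {
        bool parked;
        int hard_thresh;  /* threshold the hard lockup detector was armed with */
};

/* Global threshold, visible to every thread's soft lockup detector. */
static int watchdog_thresh;

/* Stand-in for kthread_park(): fails for exactly one chosen CPU. */
static int model_park(struct model_thread *t, int cpu, int fail_cpu)
{
        if (cpu == fail_cpu)
                return -1;      /* assumption: the thread stays unparked */
        t->parked = true;
        return 0;
}

/* Stand-in for kthread_unpark(): skips threads that are not parked. */
static void model_unpark(struct model_thread *t)
{
        if (!t->parked)
                return;
        t->parked = false;
        /* Models watchdog_enable() re-arming with the current value. */
        t->hard_thresh = watchdog_thresh;
}

/* Returns how many threads still use the old hard lockup threshold. */
static int simulate_failed_update(struct model_thread *t, int old_thresh,
                                  int new_thresh, int fail_cpu)
{
        int cpu, ret = 0, stale = 0;

        assert(fail_cpu >= 0 && fail_cpu < NR_CPUS);

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                t[cpu].parked = false;
                t[cpu].hard_thresh = old_thresh;
        }

        /* The new value is stored before any thread is parked. */
        watchdog_thresh = new_thresh;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                ret = model_park(&t[cpu], cpu, fail_cpu);
                if (ret)
                        break;
        }
        if (ret) {
                /* Rollback: only the already parked threads are affected. */
                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        model_unpark(&t[cpu]);
        }

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (t[cpu].hard_thresh != watchdog_thresh)
                        stale++;
        return stale;
}
```

Calling simulate_failed_update(t, 10, 20, 4) on an array of eight
model threads leaves four threads (4 .. 7) with the old value in this
model, while threads 0 .. 3 have picked up the new one.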

The new error handling that is introduced by this patch set aims to
handle the above corner case in a better way (this was my original
motivation to come up with a patch set). However, I also think that
_if_ kthread_park() were ever changed in the future so that it
could return errors under various (other) conditions, the patch set
should prepare the watchdog code for this possibility.

Since I did not experience kthread_park() failures in practice, I
used some instrumentation to fake error returns from kthread_park()
in order to test the patches.
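
To illustrate the general idea of faking error returns, here is a
user-space sketch of a wrapper that fails on a chosen call. It is not
the instrumentation that was actually used for testing: struct
fake_fail, park_or_fake_error() and stub_park() are hypothetical
names, and -ENOSYS is just picked as an example error value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical fault-injection state: fail the call with this index. */
struct fake_fail {
        int fail_at;    /* 0-based index of the call that should fail */
        int calls;      /* number of calls seen so far */
};

/* Stand-in for a real park operation; marks the "thread" as parked. */
static int stub_park(void *thread)
{
        *(int *)thread = 1;
        return 0;
}

/*
 * Call real_park() unless this is the call selected for failure,
 * in which case return an error without touching the thread.
 */
static int park_or_fake_error(struct fake_fail *f,
                              int (*real_park)(void *), void *thread)
{
        assert(real_park);

        if (f->calls++ == f->fail_at)
                return -ENOSYS; /* faked error, chosen for illustration */
        return real_park(thread);
}
```

With fail_at set to 2, the first two calls go through to the real park
operation and the third one returns the faked error, which is enough
to drive the error path of a loop like the one in
watchdog_park_threads().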


Regards,

Uli


----- Original Message -----
From: "Andrew Morton" <akpm@...ux-foundation.org>
To: "Ulrich Obergfell" <uobergfe@...hat.com>
Cc: linux-kernel@...r.kernel.org, dzickus@...hat.com, atomlin@...hat.com
Sent: Wednesday, September 30, 2015 1:30:36 AM
Subject: Re: [PATCH 0/5] improve handling of errors returned by kthread_park()

On Mon, 28 Sep 2015 22:44:07 +0200 Ulrich Obergfell <uobergfe@...hat.com> wrote:

> The original watchdog_park_threads() function that was introduced by
> commit 81a4beef91ba4a9e8ad6054ca9933dff7e25ff28 takes a very simple
> approach to handle errors returned by kthread_park(): It attempts to
> roll back all watchdog threads to the unparked state. However, this
> may be undesired behaviour from the perspective of the caller which
> may want to handle errors as appropriate in its specific context.
> Currently, there are two possible call chains:
> 
> - watchdog suspend/resume interface
> 
>     lockup_detector_suspend
>       watchdog_park_threads
> 
> - write to parameters in /proc/sys/kernel
> 
>     proc_watchdog_update
>       watchdog_enable_all_cpus
>         update_watchdog_all_cpus
>           watchdog_park_threads
> 
> Instead of 'blindly' attempting to unpark the watchdog threads if a 
> kthread_park() call fails, the new approach is to disable the lockup
> detectors in the above call chains. Failure becomes visible to the
> user as follows:
> 
> - error messages from lockup_detector_suspend()
>                    or watchdog_enable_all_cpus()
> 
> - the state that can be read from /proc/sys/kernel/watchdog_enabled
> 
> - the 'write' system call in the latter call chain returns an error
> 

hm, you made me look at kthread parking.  Why does it exist?  What is a
"parked" thread anyway, and how does it differ from, say, a sleeping
one?  The 2a1d446019f9a5983ec5a335b changelog is pretty useless and the
patch added no useful documentation, sigh.

Anyway...  what inspired this patchset?  Are you experiencing
kthread_park() failures in practice?  If so, what is causing them?  And
what is the user-visible effect of these failures?  This is all pretty
important context for such a patchset.

