Message-ID: <20170907092616.thsuyqklit4463wj@hirez.programming.kicks-ass.net>
Date:   Thu, 7 Sep 2017 11:26:16 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mike Galbraith <efault@....de>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Tejun Heo <tj@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/cpuset/pm: Fix cpuset vs suspend-resume

On Thu, Sep 07, 2017 at 11:13:38AM +0200, Peter Zijlstra wrote:
> Subject: sched/cpuset/pm: Fix cpuset vs suspend-resume
> 
> Cpusets vs suspend-resume is _completely_ broken. And it got noticed
> because it now resulted in non-cpuset usage breaking too.
> 
> On suspend cpuset_cpu_inactive() doesn't call into
> cpuset_update_active_cpus() because it doesn't want to move tasks about,
> there is no need, all tasks are frozen and won't run again until after
> we've resumed everything.
> 
> But this means that when we finally do call into
> cpuset_update_active_cpus() after resuming the last frozen cpu in
> cpuset_cpu_active(), the top_cpuset will not differ from the
> cpu_active_mask and thus it will not in fact do _anything_.
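A minimal userspace model of this failure, under stated assumptions: CPU masks are plain bitmasks, every name below (model_update_active_cpus() and friends) is a hypothetical stand-in for the kernel functions named above, and the model assumes that while CPUs go down with tasks frozen, the cpuset-defined domains are replaced by a single default domain. Because top_cpuset is never told about the suspend, the mask comparison on the last resume sees no difference and nothing is restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code: one bit per CPU. */
typedef uint32_t mask_t;

struct model {
    mask_t active;        /* stands in for cpu_active_mask */
    mask_t top_cpuset;    /* CPUs the top cpuset last recorded */
    bool custom_domains;  /* true while cpuset-defined domains are in place */
};

/* Acts only when the active mask differs from what top_cpuset recorded. */
static void model_update_active_cpus(struct model *m)
{
    if (m->top_cpuset == m->active)
        return;                   /* no difference: do nothing at all */
    m->top_cpuset = m->active;
    m->custom_domains = true;     /* rebuild domains from the cpuset config */
}

/* Suspend path: tasks are frozen, so the cpuset update is skipped and
 * (by assumption) the domains fall back to a single default domain. */
static void model_suspend_cpu(struct model *m, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    m->active &= ~((mask_t)1 << cpu);
    m->custom_domains = false;
}

/* Resume path: only the last CPU to come back calls the update. */
static void model_resume_cpu(struct model *m, int cpu, bool last)
{
    assert(cpu >= 0 && cpu < 32);
    m->active |= (mask_t)1 << cpu;
    if (last)
        model_update_active_cpus(m);
}

/* Suspend and resume CPUs 1..ncpus-1 (CPU 0 stays up) and report
 * whether the cpuset-defined domains came back. */
static bool model_suspend_resume(int ncpus)
{
    mask_t all = (ncpus >= 32) ? ~(mask_t)0 : (((mask_t)1 << ncpus) - 1);
    struct model m = { .active = all, .top_cpuset = all, .custom_domains = true };

    for (int cpu = 1; cpu < ncpus; cpu++)
        model_suspend_cpu(&m, cpu);
    for (int cpu = 1; cpu < ncpus; cpu++)
        model_resume_cpu(&m, cpu, cpu == ncpus - 1);
    return m.custom_domains;
}
```

In this model, model_suspend_resume(4) returns false: after resume the active mask equals top_cpuset again, so the update returns early and the cpuset-defined domains stay lost.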
> 
> So the cpuset configuration will not be restored. This was largely
> hidden because we would unconditionally create identity domains and
> mobile users would not in fact use cpusets much. And servers that do use
> cpusets tend not to suspend-resume much.
> 
> An additional problem is that we'd not in fact wait for the cpuset work to
> finish before resuming the tasks, allowing spurious migrations outside
> of the specified domains.
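The ordering problem can be sketched with a deterministic model of deferred work. This is a hypothetical sketch, not kernel code: the queued rebuild runs only when flushed, which stands in for the worst-case interleaving where the worker has not yet run when tasks are thawed, and flush_rebuild() plays the role of waiting for the cpuset hotplug work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of deferred (asynchronous) work, not kernel code. */
struct resume_model {
    bool rebuild_pending;   /* work queued but not yet executed */
    bool domains_rebuilt;   /* cpuset domains restored */
    bool thawed_too_early;  /* tasks ran before the domains were restored */
};

static void queue_rebuild(struct resume_model *r)
{
    r->rebuild_pending = true;
}

/* Runs the queued rebuild, if any (cf. waiting for the hotplug work). */
static void flush_rebuild(struct resume_model *r)
{
    if (r->rebuild_pending) {
        r->rebuild_pending = false;
        r->domains_rebuilt = true;
    }
}

static void thaw_tasks(struct resume_model *r)
{
    /* A task that runs now may migrate outside its cpuset's domain. */
    if (!r->domains_rebuilt)
        r->thawed_too_early = true;
}

/* Returns true if tasks were thawed before the domains were restored. */
static bool resume(bool wait_before_thaw)
{
    struct resume_model r = { false, false, false };

    queue_rebuild(&r);        /* last CPU online: the rebuild is deferred */
    if (wait_before_thaw)
        flush_rebuild(&r);    /* the ordering fix: wait before thawing */
    thaw_tasks(&r);
    flush_rebuild(&r);        /* the worker eventually runs anyway */
    assert(!r.rebuild_pending);
    return r.thawed_too_early;
}
```

Here resume(false) reports that tasks were thawed while the rebuild was still pending; resume(true), which flushes first, does not.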
> 
> Fix the rebuild by introducing cpuset_force_rebuild() and fix the
> ordering with cpuset_wait_for_hotplug().
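The force-rebuild half of the fix can be modelled as a one-shot flag that overrides the "no difference" check. The struct, the model_* function names and the consume-on-use semantics are assumptions of this sketch, not the actual cpuset code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the fix, not kernel code. */
typedef uint32_t mask_t;

struct cpuset_model {
    mask_t top_cpuset;   /* CPUs the top cpuset last recorded */
    bool force_rebuild;  /* one-shot request for an unconditional rebuild */
    int rebuilds;        /* number of domain rebuilds performed */
};

/* Stands in for cpuset_force_rebuild(). */
static void model_force_rebuild(struct cpuset_model *c)
{
    c->force_rebuild = true;
}

static void model_update_active_cpus(struct cpuset_model *c, mask_t active)
{
    if (c->top_cpuset == active && !c->force_rebuild)
        return;                  /* nothing changed and nothing forced */
    c->force_rebuild = false;    /* the flag is consumed by one rebuild */
    c->top_cpuset = active;
    c->rebuilds++;
}

/* Last CPU back online after resume: the masks already match. */
static int model_last_cpu_resumed(bool with_fix)
{
    const mask_t all = 0xF;      /* four CPUs, all active again */
    struct cpuset_model c = { .top_cpuset = all, .force_rebuild = false, .rebuilds = 0 };

    if (with_fix)
        model_force_rebuild(&c);
    model_update_active_cpus(&c, all);
    model_update_active_cpus(&c, all);  /* a later unchanged update is a no-op */
    assert(!c.force_rebuild);
    return c.rebuilds;
}
```

Without the flag the resume path performs no rebuild at all; with it, exactly one rebuild happens and later unchanged updates keep their cheap early return.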
> 
> Cc: tj@...nel.org
> Cc: rjw@...ysocki.net
> Cc: efault@....de
> Reported-by: Andy Lutomirski <luto@...nel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>

TJ, I _think_ it was commit:

  deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling")

That wrecked things, but there have been so many changes in this area it is
really hard to tell. Note how before that commit it would
unconditionally rebuild the domains, and you 'optimized' that ;-)

That commit also introduced the work to do the async rebuild and failed
to do that flush on resume.

In any case, I think we should put a fixes tag on this commit such that
it gets picked up into stable kernels. Not sure anybody will try to
backport it into 4-year-old kernels, but who knows.
