Message-Id: <1557279164.6speg3hhsy.astroid@bobo.none>
Date:   Wed, 08 May 2019 11:38:18 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     fweisbec@...il.com, hpa@...or.com, linux-kernel@...r.kernel.org,
        linux-tip-commits@...r.kernel.org, mingo@...nel.org,
        peterz@...radead.org, rafael.j.wysocki@...el.com,
        tglx@...utronix.de, torvalds@...ux-foundation.org
Subject: Re: [tip:sched/core] sched/isolation: Require a present CPU in
 housekeeping mask

Frederic Weisbecker's on May 8, 2019 10:35 am:
> On Tue, May 07, 2019 at 09:50:24AM +1000, Nicholas Piggin wrote:
>> Frederic Weisbecker's on May 7, 2019 1:16 am:
>> > On Sat, May 04, 2019 at 04:59:12PM +1000, Nicholas Piggin wrote:
>> >> Frederic Weisbecker's on May 4, 2019 10:27 am:
>> >> > On Fri, May 03, 2019 at 10:47:37AM -0700, tip-bot for Nicholas Piggin wrote:
>> >> >> Commit-ID:  9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Gitweb:     https://git.kernel.org/tip/9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Author:     Nicholas Piggin <npiggin@...il.com>
>> >> >> AuthorDate: Thu, 11 Apr 2019 13:34:47 +1000
>> >> >> Committer:  Ingo Molnar <mingo@...nel.org>
>> >> >> CommitDate: Fri, 3 May 2019 19:42:58 +0200
>> >> >> 
>> >> >> sched/isolation: Require a present CPU in housekeeping mask
>> >> >> 
>> >> >> During housekeeping mask setup, currently a possible CPU is required.
>> >> >> That does not guarantee the CPU would be available at boot time, so
>> >> >> check to ensure that at least one present CPU is in the mask.
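The commit's change can be shown with a toy user-space model. This is a sketch under stated assumptions, not the kernel code. Each mask is a hypothetical `uint64_t` bitmap (bit n = CPU n, at most 64 CPUs), and the helper names are invented for illustration. The kernel itself works on `struct cpumask` with helpers such as `cpumask_intersects()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): bit n set means CPU n is in the mask.
 * This is not the kernel's struct cpumask API. */
typedef uint64_t cpumask64;

/* Old rule: the housekeeping mask must contain at least one *possible* CPU. */
static bool hk_has_possible_cpu(cpumask64 hk, cpumask64 possible)
{
    return (hk & possible) != 0;
}

/* Rule described in the commit: at least one *present* CPU is required,
 * because a possible-but-absent CPU can never run housekeeping work. */
static bool hk_has_present_cpu(cpumask64 hk, cpumask64 present)
{
    return (hk & present) != 0;
}
```

Suppose CPUs 0-7 are possible but only CPUs 0-3 are present. A housekeeping mask of CPUs 4-5 (0x30) then passes the old possible-CPU test but is rejected by the present-CPU test.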
>> >> > 
>> >> > I have a doubt about the requirements and semantics of cpu_present_mask.
> >> >> > IIUC a present CPU means that it is physically plugged in (from the ACPI
> >> >> > perspective) but might not be logically plugged in (set in cpu_online_mask).
>> >> 
> >> >> Right, a subset of cpu_possible_mask and a superset of cpu_online_mask.
> >> >> It means that CPU can be brought online at any time.
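The masks nest as online ⊆ present ⊆ possible, and this relationship can be expressed as two subset checks. The sketch below is again a toy model that uses hypothetical `uint64_t` bitmaps and invented helper names. On `struct cpumask`, the kernel provides `cpumask_subset()` for the same kind of test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/* True if every CPU in sub is also in super. */
static bool mask_subset(cpumask64 sub, cpumask64 super)
{
    return (sub & ~super) == 0;
}

/* Expected nesting of the CPU masks: online ⊆ present ⊆ possible. */
static bool cpu_masks_consistent(cpumask64 possible, cpumask64 present,
                                 cpumask64 online)
{
    return mask_subset(online, present) && mask_subset(present, possible);
}

/* CPUs that could be brought online right now: present but not yet online. */
static cpumask64 cpus_onlineable(cpumask64 present, cpumask64 online)
{
    return present & ~online;
}
```

Take CPUs 0-7 possible, 0-3 present and 0-1 online. The masks are consistent, and CPUs 2-3 (0x0C) are present but offline. This is exactly the state the following question is about.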
>> >> 
> >> >> > But do we have the guarantee that a present CPU _will_ be online at least once
> >> >> > right after boot? After all, kernel parameters such as "maxcpus=" can prevent
> >> >> > some CPUs from being turned on. I guess there are even more creative ways to
> >> >> > achieve that.
>> >> > 
>> >> > In any case we really require the housekeeper to be forced online. Perhaps
>> >> > I missed that enforcement somewhere in the patchset?
>> >> 
> >> >> No, I think you're right: the system may be able to boot without any
> >> >> housekeeping CPU online. Maybe we can just cpu_up() a CPU in the
> >> >> housekeeping mask with a warning that it has overridden their SMP
> >> >> command line option. I'll take a look at it.
>> > 
>> > But then what if cpu_up() fails? In this case I can think of only two
>> > answers:
>> > 
>> > * Force the boot CPU as the housekeeper.
>> > * Rollback the whole thing: nohz and all isolation.
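The two fallbacks listed above could be modelled as a policy choice that is applied once boot has finished. This is a sketch only. `hk_resolve()`, the `hk_fallback` enum and the `boot_cpu` parameter are hypothetical, and the masks are toy `uint64_t` bitmaps. The rollback case also simplifies things: it only restores the housekeeping mask, and it does not undo nohz or the rest of isolation.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/* Hypothetical policy names for the two options discussed. */
enum hk_fallback { HK_FORCE_BOOT_CPU, HK_ROLLBACK_ISOLATION };

/* Return the housekeeping mask to use after boot.
 * boot_cpu is the index of the CPU that booted the system (hypothetical). */
static cpumask64 hk_resolve(cpumask64 hk, cpumask64 online, cpumask64 possible,
                            unsigned int boot_cpu, enum hk_fallback policy)
{
    assert(boot_cpu < 64);

    if (hk & online)
        return hk;      /* a housekeeper is already running: keep the mask */

    switch (policy) {
    case HK_FORCE_BOOT_CPU:
        /* The boot CPU is online by definition, so it can always take over. */
        return hk | ((cpumask64)1 << boot_cpu);
    case HK_ROLLBACK_ISOLATION:
    default:
        /* Give up on isolation: every possible CPU does housekeeping again. */
        return possible;
    }
}
```

Forcing the boot CPU keeps isolation on the remaining CPUs, but it puts housekeeping on a CPU the admin did not choose. Rolling back respects the admin's choice of isolated CPUs only in the sense that it abandons isolation entirely.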
>> 
>> If cpu_up fails despite being in the present map and we explicitly
>> selected it as the housekeeper? I think it would be okay to print
>> a message telling admin to correct the config, and panic.
>> 
>> We make a best effort to get the system booted and limping along, but if
>> you misconfigure it, crashing is not unreasonable. There's lots of
>> command line option misconfiguration that will cause the same thing.
>> 
>> The primary problem with my patch that needs to be addressed is that
>> the error is not explicitly caught and printed if the housekeeper
>> does not come up, so the system might die in non-obvious ways.
> 
> I usually reserve panic() and BUG_ON() as a last resort for when data integrity
> is directly threatened. But indeed I guess that's all we have for now.

Right, panicking because the CPU specified for housekeeping was
excluded from coming up at boot with maxcpus= or whatever is not
such a big deal, I think. It just needs a clear error message.

> If we take that path, I'd rather not call that cpu_up() and simply panic if
> the given CPU happens not to be online after SMP bootup.

Sure, that's fine by me too.
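The approach agreed here is to skip the extra cpu_up(), check after SMP bringup, and panic with a clear message if no housekeeping CPU is online. A toy user-space model of it might look like the following. The helper names are hypothetical, `abort()` stands in for the kernel's `panic()`, and the masks are `uint64_t` bitmaps. This thread does not say where such a check would hook into the kernel's boot sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model (assumption): bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/* True if at least one housekeeping CPU came online during SMP bringup. */
static bool hk_online_after_smp(cpumask64 hk, cpumask64 online)
{
    return (hk & online) != 0;
}

/* Meant to run once every CPU that will be brought up at boot is up.
 * Prints a clear reason and stops, standing in for the kernel's panic(). */
static void hk_check_after_smp(cpumask64 hk, cpumask64 online)
{
    if (hk_online_after_smp(hk, online))
        return;
    fprintf(stderr,
            "housekeeping: no housekeeping CPU online after SMP bringup "
            "(hk=%#llx online=%#llx); check nohz_full=/isolcpus=/maxcpus=\n",
            (unsigned long long)hk, (unsigned long long)online);
    abort();
}
```

Consider CPUs 0-3 present, with maxcpus=2 bringing only CPUs 0-1 online (0x03) and housekeeping assigned to CPUs 2-3 (0x0C). The present-CPU check at setup passes, but `hk_check_after_smp(0x0C, 0x03)` reports the misconfiguration and aborts.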

Thanks,
Nick
