lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFyFT9n5wUNifEHY6+t2bEcD91kL7+dqXuBMSMJvLeDDJQ@mail.gmail.com>
Date:	Wed, 23 Jul 2014 11:35:06 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Dietmar Eggemann <dietmar.eggemann@....com>,
	Michel Dänzer <michel@...nzer.net>,
	Ingo Molnar <mingo@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Random panic in load_balance() with 3.16-rc

On Wed, Jul 23, 2014 at 11:25 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Wed, Jul 23, 2014 at 10:26:21AM -0700, Linus Torvalds wrote:
>>
>> The whole - and really *only* - point of __get_cpu_var is to get the
>> address of a a cpu variable. If you want to read the *value* of the
>> variable, you should use "this_cpu_read()", which can use things like
>> special instructions or segments to read the percpu area.
>
> I think this code predates all the this_cpu* magic. But yes, agreed.

It turns out - now that I've stared at the code for much too long for
my own sanity - that this code actually depends very subtly on
"__get_cpu_var()".

And not in good ways.

So what happens is that the games that "cpumask_var_t" plays in order
to make code work correctly with both on-stack and off-stack
configurations are really toxic to good per-cpu use.

For the off-stack case, a cpumask_var_t is a pointer to the real
allocation, and using "__get_cpu_var()" is very suboptimal, because it
gets that pointer by following the percpu offset explicitly. So we
load the percpu offset, then we load the offset to "load_balance_mask"
within that, and then we load the pointer off that. We'd be much
better off using "this_cpu_read()", which can just use the percpu area
directly to read the pointer.

HOWEVER.

For the direct case, a "cpumask_var_t" is an array, exactly so that
accessing it will just return the address of it, so that you can get
the "struct cpumask *" directly.  And there the whole dance with
adding the percpu offset is actually the right thing, because what you
want is the address to the percpu area. And you cannot use
"this_cpu_read()", because that wants to read the _value_, which is
not what we want at all.

Ugh. So it's not just the initialization that is subtle, the use of
those per-cpu "cpumask_var_t" is sadly suboptimal too.

But the code does appear to be correct. It just is messy, avoids the
proper abstractions, and generates suboptimal code for the off-stack
case.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ