lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 11 Jul 2014 14:15:49 -0700
From:	Havard Skinnemoen <hskinnemoen@...gle.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Tony Luck <tony.luck@...il.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Ewout van Bekkum <ewout@...gle.com>
Subject: Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and
 CMCI reports.

On Fri, Jul 11, 2014 at 12:52 PM, Borislav Petkov <bp@...en8.de> wrote:
> On Fri, Jul 11, 2014 at 12:06:40PM -0700, Tony Luck wrote:
>> > +               if (atomic_add_unless(&mce_banks[i].poll_reader, 1, 1)) {
>> > +                       m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
>>
>> Same as yesterday. You may skip reading a bank because someone else
>> is reading the same bank number, even though you don't share that bank
>> with them.
>
> Not if those banks are in a percpu variable. And this is what
> machine_check_poll gets. The ->poll_reader thing is then per-cpu too.
>
> For shared banks it should work also as expected since we want there
> only one reader to see the MCE signature.

If poll_reader is per-cpu, what will prevent multiple CPUs from
reading the same error from a shared bank?

>> If we are willing to be rather flexible amount when polling happens,
>> and not allow very fast poll rates. Then we could do something like
>> have the lowest numbered online cpu be the only one that sets a
>> timer. When it goes off, it scans its own banks, and then uses an
>> async cross-processor call to poke the next highest numbered
>> online cpu to have it scan banks and poke the next guy.
>>
>> That way we know that two cpus can't be polling at the same time,
>> because we convoy them all one at a time.
>
> See above - those banks are percpu. And besides, mce_timer_fn already
> has the WARN_ON which otherwise be firing left and right.

mce_banks isn't currently percpu.

> It seems, Havard's issue is only with shared banks. I think they only
> cause the repeated error records.

But the problem with shared banks is that multiple CPUs read them, and
we can't tell if the errors are duplicate unless we make sure that
only one CPU gets to read and clear it at a time. Any cpu-local
synchronization mechanism isn't going to work.

If you're worried about disabling interrupts, it's possible we don't
really need to make the spinlocks irqsafe. I'm not sure if we had any
reason for that other than "just to be safe".

Or we could keep the (irqsafe) spinlocks but move the clearing much
earlier. There may have been a reason why the current code clears the
bank status last though -- perhaps we also need to read out all the
state while we hold the lock, before we clear the status bit.

Havard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ