Message-ID: <87h96md8rm.fsf@vitty.brq.redhat.com>
Date:   Fri, 02 Dec 2016 09:39:57 +0100
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     x86@...nel.org, devel@...uxdriverproject.org,
        linux-kernel@...r.kernel.org,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v3] x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic

Thomas Gleixner <tglx@...utronix.de> writes:

> Vitaly,
>
> On Thu, 1 Dec 2016, Vitaly Kuznetsov wrote:
>
>> There is a feature in Hyper-V (Debug-VM --InjectNonMaskableInterrupt) which
>> injects NMI to the guest. Prior to WS2016 the NMI is injected to all CPUs
>> of the guest and WS2016 injects it to CPU0 only. When unknown_nmi_panic is
>> enabled and we'd like to do kdump we need to perform some minimal cleanup
>> so the kdump kernel will be able to initialize VMBus devices, this cleanup
>> includes sending CHANNELMSG_UNLOAD to the host waiting for
>> CHANNELMSG_UNLOAD_RESPONSE to arrive. WS2012R2 always sends the response
>> to the CPU which was used to send CHANNELMSG_REQUESTOFFERS on VMBus module
>> load and not to the CPU which is sending CHANNELMSG_UNLOAD. As we can't do
>> any cross-CPU work reliably on crash we have vmbus_wait_for_unload()
>> function which tries to read CHANNELMSG_UNLOAD_RESPONSE on all CPUs message
>> pages and this sometimes works. It was discovered that in case the host
>> wants to send more than one message to a secondary CPU (not the CPU running
>> vmbus_wait_for_unload()) we're unable to get it as after reading the first
>> message we're supposed to do EOMing by doing wrmsrl(HV_X64_MSR_EOM, 0) but
>> this is per-CPU. I have a feeling that this was working some time ago when
>> I implemented vmbus_wait_for_unload(), the host was re-trying to deliver a
>> message even without wrmsrl() but apparently this doesn't work any more.
>> Unfortunately there is not that much we can do when all CPUs get NMI as
>> all but the first one are getting blocked with interrupts disabled. What we
>> can do is limit processing unknown interrupts to the first CPU which gets
>> it in case we're about to crash.
>
> This is completely unreadable and I really tried hard to make sense of it.
>
> Please structure it in a way that people who are not familiar with the
> inner workings of hyperv can at least understand the problem you are trying
> to solve and the concept of the solution w/o needing to figure out what all
> the acronyms and constants actually mean.
>
> Also visual structuring in paragraphs helps readability a lot.
>

Got it, I'll do my best to make it readable.

> AFAICT this tries to deal with different problems of different hypervisor
> versions, but even that is unclear as you talk about version WS2016,
> versions prior to WS2016 and then about WS2012R2 in particular.
>
> Another issue I have with this is:
>
> 	".... I have a feeling that this was working ...."
>
> Changes like this are not about feelings. We want to have changes based on
> facts.
>

The thing is that Hyper-V is proprietary software which gets updates,
and I don't remember which particular updates were installed when I was
implementing vmbus_wait_for_unload(). As far as I remember, it always
worked on WS2012R2. Now I observe different behavior ...

-- 
  Vitaly