Date:   Thu, 19 Mar 2020 00:33:18 +0000
From:   Michael Kelley <mikelley@...rosoft.com>
To:     vkuznets <vkuznets@...hat.com>,
        "ltykernel@...il.com" <ltykernel@...il.com>
CC:     Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <liuwe@...rosoft.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>, "hpa@...or.com" <hpa@...or.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH 0/4] x86/Hyper-V: Unload vmbus channel in hv panic callback

From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Wednesday, March 18, 2020 8:58 AM
> 
> ltykernel@...il.com writes:
> 
> > From: Tianyu Lan <Tianyu.Lan@...rosoft.com>
> >
> > Customer reported Hyper-V VM still responded to network traffic
> > with ack packets after kernel panic with kernel parameter "panic=0".
> > This is because the vmbus driver interrupt handler still works
> > on the panic cpu after kernel panic. The panic cpu falls into an
> > infinite loop in panic() with interrupts enabled at that point,
> > so the vmbus driver can still handle network traffic.
> >
> > This makes the remote service think the panicked system is still
> > alive when it gets ack packets. Fix it by unloading the vmbus
> > channel in the hv panic callback.
> >
> > vmbus_initiate_unload() may be called twice during the panic process
> > (e.g., by hyperv_panic_event() and hv_crash_handler()). So check
> > and set the connection state in vmbus_initiate_unload() to resolve
> > the re-entry issue.

Let me suggest a revised version of the commit message:

When kdump is not configured, a Hyper-V VM might still respond to
network traffic after a kernel panic when kernel parameter panic=0.
The panic CPU goes into an infinite loop with interrupts enabled,
and the VMbus driver interrupt handler still works because the
VMbus connection is unloaded only in the kdump path.  The network
responses make the other end of the connection think the VM is
still functional even though it has panicked, which could affect any
failover actions that should be taken.

Fix this by unloading the VMbus connection during the panic process.
vmbus_initiate_unload() could then be called twice (e.g., by
hyperv_panic_event() and hv_crash_handler()), so set the connection
state to DISCONNECTED in vmbus_initiate_unload() and check it on
entry to ensure the unload is done only once.

> >
> > Signed-off-by: Tianyu Lan <Tianyu.Lan@...rosoft.com>
> > ---
> >  drivers/hv/channel_mgmt.c |  5 +++++
> >  drivers/hv/vmbus_drv.c    | 17 +++++++++--------
> >  2 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 0370364169c4..893493f2b420 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -839,6 +839,9 @@ void vmbus_initiate_unload(bool crash)
> >  {
> >  	struct vmbus_channel_message_header hdr;
> >
> > +	if (vmbus_connection.conn_state == DISCONNECTED)
> > +		return;
> > +
> 
> To make this less racy, can we do something like
> 
> 	if (xchg(&vmbus_connection.conn_state, DISCONNECTED) == DISCONNECTED)
> 		return;
> 
> ?

I was trying to decide if there can actually be a race.  The panic() and die()
functions both ensure that only a single CPU can execute in those paths at any
one time, though maybe panic() and die() could be running concurrently.
And vmbus_initiate_unload() can also be called in the hibernation path in
vmbus_bus_suspend(), so there could be a race.  Doing the xchg() makes
sense.

> 
> >  	/* Pre-Win2012R2 hosts don't support reconnect */
> >  	if (vmbus_proto_version < VERSION_WIN8_1)
> >  		return;
> > @@ -857,6 +860,8 @@ void vmbus_initiate_unload(bool crash)
> >  		wait_for_completion(&vmbus_connection.unload_event);
> >  	else
> >  		vmbus_wait_for_unload();
> > +
> > +	vmbus_connection.conn_state = DISCONNECTED;
> >  }
> >
> >  static void check_ready_for_resume_event(void)
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > index 029378c27421..b56b9fb9bd90 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -53,9 +53,12 @@ static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
> >  {
> >  	struct pt_regs *regs;
> >
> > -	regs = current_pt_regs();
> > +	vmbus_initiate_unload(true);
> >
> > -	hyperv_report_panic(regs, val);
> > +	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
> 
> With Michael's efforts to make the code in drivers/hv arch-agnostic, I think
> we need a better, arch-neutral way.

Vitaly, could you elaborate on which part is not arch-neutral?  I don't see
a problem.  ms_hyperv and its misc_features field exist in both the x86
and ARM64 code branches.  It turns out the particular bit for
GUEST_CRASH_MSR_AVAILABLE differs between the two architectures, but
since the constant is defined per architecture, the compiler will do
the right thing.

> 
> > +		regs = current_pt_regs();
> > +		hyperv_report_panic(regs, val);
> > +	}
> >  	return NOTIFY_DONE;
> >  }
> >
> > @@ -1391,10 +1394,12 @@ static int vmbus_bus_init(void)
> >  		}
> >
> >  		register_die_notifier(&hyperv_die_block);
> > -		atomic_notifier_chain_register(&panic_notifier_list,
> > -					       &hyperv_panic_block);
> >  	}
> >
> > +	/* Vmbus channel is unloaded in panic callback when panic happens.*/
> > +	atomic_notifier_chain_register(&panic_notifier_list,
> > +			       &hyperv_panic_block);
> > +
> >  	vmbus_request_offers();
> >
> >  	return 0;
> > @@ -2204,8 +2209,6 @@ static int vmbus_bus_suspend(struct device *dev)
> >
> >  	vmbus_initiate_unload(false);
> >
> > -	vmbus_connection.conn_state = DISCONNECTED;
> > -
> >  	/* Reset the event for the next resume. */
> >  	reinit_completion(&vmbus_connection.ready_for_resume_event);
> >
> > @@ -2289,7 +2292,6 @@ static void hv_kexec_handler(void)
> >  {
> >  	hv_stimer_global_cleanup();
> >  	vmbus_initiate_unload(false);
> > -	vmbus_connection.conn_state = DISCONNECTED;
> >  	/* Make sure conn_state is set as hv_synic_cleanup checks for it */
> >  	mb();
> >  	cpuhp_remove_state(hyperv_cpuhp_online);
> > @@ -2306,7 +2308,6 @@ static void hv_crash_handler(struct pt_regs *regs)
> >  	 * doing the cleanup for current CPU only. This should be sufficient
> >  	 * for kdump.
> >  	 */
> > -	vmbus_connection.conn_state = DISCONNECTED;
> >  	cpu = smp_processor_id();
> >  	hv_stimer_cleanup(cpu);
> >  	hv_synic_disable_regs(cpu);
> 
> --
> Vitaly
