linux-kernel - Re: [PATCH v5 13/15] KVM: s390: add function process_gib_alert

Message-Id: <7e4a5077-00f0-3a0f-e21a-5bbc2fa14b70@linux.ibm.com>
Date:   Tue, 8 Jan 2019 16:21:17 +0100
From:   Michael Mueller <mimu@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     KVM Mailing List <kvm@...r.kernel.org>,
        Linux-S390 Mailing List <linux-s390@...r.kernel.org>,
        linux-kernel@...r.kernel.org,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>,
        Pierre Morel <pmorel@...ux.ibm.com>
Subject: Re: [PATCH v5 13/15] KVM: s390: add function process_gib_alert_list()



On 08.01.19 13:59, Halil Pasic wrote:
> On Wed, 19 Dec 2018 20:17:54 +0100
> Michael Mueller <mimu@...ux.ibm.com> wrote:
> 
>> This function processes the Gib Alert List (GAL). It is required
>> to run when either a gib alert interruption has been received or
>> a gisa that is in the alert list is cleared or dropped.
>>
>> The GAL is built up by millicode when the respective ISC bit is
>> set in the Interruption Alert Mask (IAM) and an interruption of
>> that class is observed.
>>
>> Signed-off-by: Michael Mueller <mimu@...ux.ibm.com>
>> ---
>>   arch/s390/kvm/interrupt.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 140 insertions(+)
>>
>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>> index 48a93f5e5333..03e7ba4f215a 100644
>> --- a/arch/s390/kvm/interrupt.c
>> +++ b/arch/s390/kvm/interrupt.c
>> @@ -2941,6 +2941,146 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len)
>>   	return n;
>>   }
>>   
>> +static int __try_airqs_kick(struct kvm *kvm, u8 ipm)
>> +{
>> +	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
>> +	struct kvm_vcpu *vcpu = NULL, *kick_vcpu[MAX_ISC + 1];
>> +	int online_vcpus = atomic_read(&kvm->online_vcpus);
>> +	u8 ioint_mask, isc_mask, kick_mask = 0x00;
>> +	int vcpu_id, kicked = 0;
>> +
>> +	/* Loop over vcpus in WAIT state. */
>> +	for (vcpu_id = find_first_bit(fi->idle_mask, online_vcpus);
>> +	     /* Until all pending ISCs have a vcpu open for airqs. */
>> +	     (~kick_mask & ipm) && vcpu_id < online_vcpus;
>> +	     vcpu_id = find_next_bit(fi->idle_mask, online_vcpus, vcpu_id)) {
>> +		vcpu = kvm_get_vcpu(kvm, vcpu_id);
>> +		if (psw_ioint_disabled(vcpu))
>> +			continue;
>> +		ioint_mask = (u8)(vcpu->arch.sie_block->gcr[6] >> 24);
>> +		for (isc_mask = 0x80; isc_mask; isc_mask >>= 1) {
>> +			/* ISC pending in IPM ? */
>> +			if (!(ipm & isc_mask))
>> +				continue;
>> +			/* vcpu for this ISC already found ? */
>> +			if (kick_mask & isc_mask)
>> +				continue;
>> +			/* vcpu open for airq of this ISC ? */
>> +			if (!(ioint_mask & isc_mask))
>> +				continue;
>> +			/* use this vcpu (for all ISCs in ioint_mask) */
>> +			kick_mask |= ioint_mask;
>> +			kick_vcpu[kicked++] = vcpu;
> 
> Assuming that the vcpu can/will take all ISCs it's currently open for
> does not seem right. We kind of rely on this assumption here, or?

My latest version of this routine already follows a different strategy.
It looks for a horizontal distribution of pending ISCs among idle vcpus.
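
For illustration only, here is a user-space sketch of what such a
horizontal distribution could look like. It is not the code of the next
patch version. The struct idle_vcpu type and the function
pick_vcpus_per_isc() are hypothetical names invented for this sketch, and
the greedy first-fit choice is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ISC 7

/* Hypothetical stand-in for per-vcpu state; not a kernel type. */
struct idle_vcpu {
	int id;			/* vcpu index */
	int io_enabled;		/* nonzero if the PSW allows I/O interrupts */
	uint8_t ioint_mask;	/* open ISCs, bit 0x80 = ISC 0 */
};

/*
 * For each ISC pending in ipm, select one not yet selected idle vcpu
 * that is open for this ISC. Selected vcpu ids go to kick[], the ISCs
 * that found a vcpu are reported in *covered.
 * Returns the number of selected vcpus.
 */
static int pick_vcpus_per_isc(const struct idle_vcpu *v, size_t n,
			      uint8_t ipm, int kick[MAX_ISC + 1],
			      uint8_t *covered)
{
	uint64_t used = 0;	/* bit i set: v[i] already selected */
	int kicked = 0;

	assert(n <= 64);	/* limit of the 'used' bitmap in this sketch */
	*covered = 0;
	for (uint8_t isc_mask = 0x80; isc_mask; isc_mask >>= 1) {
		if (!(ipm & isc_mask))
			continue;	/* ISC not pending */
		for (size_t i = 0; i < n; i++) {
			if ((used >> i) & 1)
				continue;	/* already kicked for another ISC */
			if (!v[i].io_enabled || !(v[i].ioint_mask & isc_mask))
				continue;	/* not open for this ISC */
			used |= (uint64_t)1 << i;
			*covered |= isc_mask;
			kick[kicked++] = v[i].id;
			break;
		}
	}
	return kicked;
}
```

In contrast to the quoted loop, a vcpu selected here accounts for exactly
one ISC, so a second pending ISC needs a second idle vcpu. First-fit is
not an optimal matching: a vcpu taken for an earlier ISC may have been the
only candidate for a later one.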

> 
>> +		}
>> +	}
>> +
>> +	if (vcpu && ~kick_mask & ipm)
>> +		VM_EVENT(kvm, 4, "gib alert undeliverable isc mask 0x%02x",
>> +			 ~kick_mask & ipm);
>> +
>> +	for (vcpu_id = 0; vcpu_id < kicked; vcpu_id++)
>> +		kvm_s390_vcpu_wakeup(kick_vcpu[vcpu_id]);
>> +
>> +	return (online_vcpus != 0) ? kicked : -ENODEV;
>> +}
>> +
>> +static void __floating_airqs_kick(struct kvm *kvm)
>> +{
>> +	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
>> +	int online_vcpus, kicked;
>> +	u8 ipm_t0, ipm;
>> +
>> +	/* Get IPM and return if clean, IAM has been restored. */
>> +	ipm = get_ipm(kvm->arch.gisa, IRQ_FLAG_IAM);
>> +	if (!ipm)
>> +		return;
>> +retry:
>> +	ipm_t0 = ipm;
>> +
>> +	/* Try to kick some vcpus in WAIT state. */
>> +	kicked = __try_airqs_kick(kvm, ipm);
>> +	if (kicked < 0)
>> +		return;
>> +
>> +	/* Get IPM and return if clean, IAM has been restored. */
>> +	ipm = get_ipm(kvm->arch.gisa, IRQ_FLAG_IAM);
>> +	if (!ipm)
>> +		return;
>> +
>> +	/* Start over, if new ISC bits are pending in IPM. */
>> +	if ((ipm_t0 ^ ipm) & ~ipm_t0)
>> +		goto retry;
>> +
> 
> <MARK A>
> 
>> +	/*
>> +	 * Return as we just kicked at least one vcpu in WAIT state
>> +	 * open for airqs. The IAM will be restored latest when one
>> +	 * of them goes into WAIT or STOP state.
>> +	 */
>> +	if (kicked > 0)
>> +		return;
> 
> </MARK A>
> 
>> +
>> +	/*
>> +	 * No vcpu was kicked either because no vcpu was in WAIT state
>> +	 * or none of the vcpus in WAIT state are open for airqs.
>> +	 * Return immediately if no vcpus are in WAIT state.
>> +	 * There are vcpus in RUN state. They will process the airqs
>> +	 * if not closed for airqs as well. In that case the system will
>> +	 * delay airqs until a vcpu decides to take airqs again.
>> +	 */
>> +	online_vcpus = atomic_read(&kvm->online_vcpus);
>> +	if (!bitmap_weight(fi->idle_mask, online_vcpus))
>> +		return;
>> +
>> +	/*
>> +	 * None of the vcpus in WAIT state take airqs and we might
>> +	 * have no running vcpus as at least one vcpu is in WAIT state
>> +	 * and IPM is dirty.
>> +	 */
>> +	set_iam(kvm->arch.gisa, kvm->arch.iam);
>> +}
>> +
>> +#define NULL_GISA_ADDR 0x00000000UL
>> +#define NONE_GISA_ADDR 0x00000001UL
>> +#define GISA_ADDR_MASK 0xfffff000UL
>> +
>> +static void __maybe_unused process_gib_alert_list(void)
>> +{
>> +	u32 final, next_alert, origin = 0UL;
>> +	struct kvm_s390_gisa *gisa;
>> +	struct kvm *kvm;
>> +
>> +	do {
>> +		/*
>> +		 * If the NONE_GISA_ADDR is still stored in the alert list
>> +		 * origin, we will leave the outer loop. No further GISA has
>> +		 * been added to the alert list by millicode while processing
>> +		 * the current alert list.
>> +		 */
>> +		final = (origin & NONE_GISA_ADDR);
>> +		/*
>> +		 * Cut off the alert list and store the NONE_GISA_ADDR in the
>> +		 * alert list origin to avoid further GAL interruptions.
>> +		 * A new alert list can be built up by millicode in parallel
>> +		 * for guests not in the yet cut-off alert list. When in the
>> +		 * final loop, store the NULL_GISA_ADDR instead. This will re-
>> +		 * enable GAL interruptions on the host again.
>> +		 */
>> +		origin = xchg(&gib->alert_list_origin,
>> +			      (!final) ? NONE_GISA_ADDR : NULL_GISA_ADDR);
>> +		/* Loop through the just cut-off alert list. */
>> +		while (origin & GISA_ADDR_MASK) {
>> +			gisa = (struct kvm_s390_gisa *)(u64)origin;
>> +			next_alert = gisa->next_alert;
>> +			/* Unlink the GISA from the alert list. */
>> +			gisa->next_alert = origin;
>> +			kvm = container_of(gisa, struct sie_page2, gisa)->kvm;
>> +			/* Kick suitable vcpus */
>> +			__floating_airqs_kick(kvm);
> 
> We may finish handling the alerted gisa with iam not set e.g.
> via some vcpus kicked but ipm still dirty and some vcpus still in wait,
> or?

My above-mentioned change to the routine identifying the vcpus to kick
will select one vcpu for each pending ISC if possible (this depends on
the number of idle vcpus, their respective interruption masks, and the
pending ISCs).

In principle, that does not exclude the scenario where only one vcpu is
kicked while multiple ISCs are pending (ipm still dirty), although I
have never observed this with a Linux guest.

What I was trying to avoid, for resource utilization reasons, was a kind
of busy loop that runs in addition to the kicked vcpus and monitors the
IPM state.

> 
> From the comments it seems we speculate on being in a safe state, as
> these are supposed to return to wait or stop soon-ish, and we will set
> iam then (See <MARK A>). I don't quite understand.


Yes, the next vcpu going idle shall either restore the IAM or process the
top pending ISC if its iomask (GCR) allows. vcpus are not allowed to go
into disabled wait (I/O interrupts disabled by PSW). If all vcpus mask a
specific ISC (for some time), the guest does not want to be interrupted
for that ISC, but it will be as soon as a running vcpu opens the mask
again.
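
The rule for a vcpu going idle can be written down as a small decision
function. This is only a sketch of the intent described here, not kernel
code. The names on_vcpu_idle(), TAKE_ISC and RESTORE_IAM are made up, and
treating the lowest-numbered ISC as the "top" one is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum idle_action { TAKE_ISC, RESTORE_IAM };

/*
 * Decide what a vcpu entering (enabled) wait does.
 * ipm:        pending ISCs, bit 0x80 = ISC 0
 * ioint_mask: ISCs the vcpu is open for, same bit layout
 * On TAKE_ISC, *isc is the lowest-numbered deliverable ISC,
 * otherwise *isc is set to -1.
 */
static enum idle_action on_vcpu_idle(uint8_t ipm, uint8_t ioint_mask, int *isc)
{
	uint8_t deliverable = ipm & ioint_mask;

	assert(isc != NULL);
	for (int i = 0; i <= 7; i++) {
		if (deliverable & (0x80 >> i)) {
			*isc = i;
			return TAKE_ISC;
		}
	}
	/* Nothing deliverable to this vcpu: re-arm alerting instead. */
	*isc = -1;
	return RESTORE_IAM;
}
```

With ipm 0x60 (ISC 1 and 2 pending) and a vcpu open only for ISC 2, the
function yields TAKE_ISC with isc 2. With the same ipm and a vcpu open
only for ISC 0, it yields RESTORE_IAM.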

> 
> According to my current understanding we might end up losing initiative
> in this scenario. Or am I wrong?

I currently have no proof that you are wrong, but I have not observed
the situation yet.

> 
> Regards,
> Halil
> 
>> +			origin = next_alert;
>> +		}
>> +	} while (!final);
>> +}
>> +
>>   static void nullify_gisa(struct kvm_s390_gisa *gisa)
>>   {
>>   	memset(gisa, 0, sizeof(struct kvm_s390_gisa));
> 
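
The cut-off of the alert list in process_gib_alert_list() quoted above can
be modelled in user space with C11 atomics. The sketch below is a
simplified model under assumptions: struct node and drain_alert_list() are
hypothetical names, pointers are stored as uintptr_t instead of 32-bit
addresses, and no concurrent producer (the millicode) is simulated:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NULL_ADDR ((uintptr_t)0)	/* list empty, alerts enabled */
#define NONE_ADDR ((uintptr_t)1)	/* list empty, alerts suppressed */
#define ADDR_MASK (~(uintptr_t)1)	/* strips the sentinel bit */

/* Hypothetical stand-in for a GISA; not the kernel structure. */
struct node {
	uintptr_t next_alert;	/* points to the node itself while unlinked */
	int handled;		/* how often this node was processed */
};

/* Drain the list behind *alo and return the number of processed nodes. */
static int drain_alert_list(_Atomic uintptr_t *alo)
{
	uintptr_t origin = NULL_ADDR, final, next;
	int handled = 0;

	do {
		/* Final pass once the NONE sentinel came back unchanged. */
		final = origin & NONE_ADDR;
		/* Cut off the list; store NONE, or NULL on the final pass. */
		origin = atomic_exchange(alo, final ? NULL_ADDR : NONE_ADDR);
		while (origin & ADDR_MASK) {
			struct node *n = (struct node *)origin;

			/* A node on the list must not be self-linked. */
			assert(n->next_alert != origin);
			next = n->next_alert;
			n->next_alert = origin;	/* unlink: point to itself */
			n->handled++;
			handled++;
			origin = next;
		}
	} while (!final);
	return handled;
}
```

For a list of two nodes terminated by NULL_ADDR, the loop runs three
times: the first pass drains both nodes, the second reads back NONE_ADDR,
and the third stores NULL_ADDR and ends the loop.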
