Date:   Fri, 15 Dec 2017 19:02:24 +0000
From:   Jean-Philippe Brucker <jean-philippe.brucker@....com>
To:     Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joerg Roedel <joro@...tes.org>,
        David Woodhouse <dwmw2@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Rafael Wysocki <rafael.j.wysocki@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>
Cc:     Lan Tianyu <tianyu.lan@...el.com>,
        "Liu, Yi L" <yi.l.liu@...ux.intel.com>
Subject: Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

A quick update on invalidations before I leave for holidays, since we're
struggling to define useful semantics. I worked on the virtio-iommu
prototype for vSVA, so I tried to pin down what I think is needed for vSVA
invalidation in the host. I don't know whether the VT-d and AMD emulations
can translate all of this from guest commands.

Scope selects which entries are invalidated, and flags cherry-pick which
caches to invalidate. For example, a guest might remove GBs of sparse
mappings and decide that it would be quicker to invalidate the whole
context instead of one mapping at a time. It would then set only flags =
(TLB | DEV_TLB) with scope = PASID. If the guest clears one entry in the
PASID table, it would send scope = PASID and flags = (LEAF | CONFIG | TLB |
DEV_TLB). On an ARM system the guest can invalidate TLBs with CPU
instructions, but can't invalidate ATCs. So it would send an invalidate
with flags = (LEAF | TLB) and scope = VA.
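
To make the first two cases concrete, here is a minimal userspace sketch of
how the requests could be filled in. The enum and flag values are copied
from the definitions further down in this mail so that the snippet compiles
on its own; the structure is reduced to its common fields with stdint types
instead of u32/u64, and the two helper names are hypothetical, not part of
the proposal.

```c
#include <stdint.h>

/* Copied from the proposal in this mail. */
enum iommu_sva_inval_scope {
	IOMMU_INVALIDATE_DOMAIN	= 1,
	IOMMU_INVALIDATE_PASID,
	IOMMU_INVALIDATE_VA,
};

#define IOMMU_INVALIDATE_LEAF		(1 << 0)
#define IOMMU_INVALIDATE_CONFIG		(1 << 1)
#define IOMMU_INVALIDATE_TLB		(1 << 2)
#define IOMMU_INVALIDATE_DEV_TLB	(1 << 3)

/* Common fields only; the arch-specific union is left out of this sketch. */
struct iommu_sva_invalidate {
	enum iommu_sva_inval_scope	scope;
	uint32_t			flags;
	uint32_t			pasid;
	uint64_t			iova;
	uint64_t			size;
};

/*
 * Hypothetical helper: the guest removed lots of sparse mappings and
 * prefers to drop the whole address space from IOTLBs and ATCs.
 */
static struct iommu_sva_invalidate inval_whole_context(uint32_t pasid)
{
	struct iommu_sva_invalidate inval = {
		.scope	= IOMMU_INVALIDATE_PASID,
		.flags	= IOMMU_INVALIDATE_TLB | IOMMU_INVALIDATE_DEV_TLB,
		.pasid	= pasid,
	};

	return inval;
}

/*
 * Hypothetical helper: the guest cleared one PASID table entry, so the
 * cached entry and everything derived from it must go.
 */
static struct iommu_sva_invalidate inval_pasid_entry(uint32_t pasid)
{
	struct iommu_sva_invalidate inval = {
		.scope	= IOMMU_INVALIDATE_PASID,
		.flags	= IOMMU_INVALIDATE_LEAF | IOMMU_INVALIDATE_CONFIG |
			  IOMMU_INVALIDATE_TLB | IOMMU_INVALIDATE_DEV_TLB,
		.pasid	= pasid,
	};

	return inval;
}
```

In both cases iova and size stay zero, since scope = PASID covers the full
address space.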

enum iommu_sva_inval_scope {
	IOMMU_INVALIDATE_DOMAIN	= 1,
	IOMMU_INVALIDATE_PASID,
	IOMMU_INVALIDATE_VA,
};

/* Only invalidate leaf entry. Applies to PASID table if scope == PASID or
 * page tables if scope == VA. */
#define IOMMU_INVALIDATE_LEAF		(1 << 0)
/* Invalidate cached PASID table configuration */
#define IOMMU_INVALIDATE_CONFIG		(1 << 1)
/* Invalidate IOTLBs */
#define IOMMU_INVALIDATE_TLB		(1 << 2)
/* Invalidate ATCs */
#define IOMMU_INVALIDATE_DEV_TLB	(1 << 3)
/* + Need a global flag? */

struct iommu_sva_invalidate {
	enum iommu_sva_inval_scope	scope;
	u32				flags;
	u32				pasid;
	u64				iova;
	u64				size;
	/* Arch-specific, format is determined at bind time */
	union {
		struct {
			u16		asid;
			u8		granule;
		} arm;
	};
};

ARM needs two more fields. A 16-bit @asid (Address Space ID) targets TLB
entries and may be different from the PASID (up to the guest to decide),
which targets ATC and config entries.

@granule is the TLB granule that we're invalidating. For instance, if the
guest just unmapped a few 2M huge pages, it sets @granule to 21 bits, so
we issue fewer invalidation commands, since we only need to evict huge TLB
entries. I'm not sure about other architectures, but I'd be surprised if
this wasn't more common. Should we move it to the common part?
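
To put rough numbers on the @granule argument, here is a small sketch. It
assumes the driver issues one by-VA TLB invalidation command per
granule-sized block, which is a simplification made for illustration
(hardware with range commands would behave differently).
inval_cmd_count() is a hypothetical helper, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of by-VA invalidation commands needed to cover @size bytes when
 * each command evicts one block of (1 << granule) bytes. Rounds up so a
 * partial trailing block still gets a command.
 */
static uint64_t inval_cmd_count(uint64_t size, uint8_t granule)
{
	uint64_t step;

	assert(granule < 64);
	step = (uint64_t)1 << granule;

	return size / step + (size % step ? 1 : 0);
}
```

Under that model, an 8M range unmapped as 2M huge pages costs 4 commands
with @granule = 21, against 2048 with a 4K granule (@granule = 12).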


int iommu_sva_invalidate(struct iommu_domain *domain,
			 struct iommu_sva_invalidate *inval);

And so the host driver implementation is roughly:
--------------------------------------------------------------------------
bool leaf	= flags & IOMMU_INVALIDATE_LEAF;
bool config	= flags & IOMMU_INVALIDATE_CONFIG;
bool tlb	= flags & IOMMU_INVALIDATE_TLB;
bool atc	= flags & IOMMU_INVALIDATE_DEV_TLB;

if (config) {
	switch (scope) {
	case IOMMU_INVALIDATE_PASID:
		inval_cached_pasid_entry(domain, pasid, leaf);
		break;
	case IOMMU_INVALIDATE_DOMAIN:
		inval_all_cached_pasid_entries(domain);
		break;
	default:
		return -EINVAL;
	}

	/* Wait for caches to be clean, then invalidate TLBs */
	sync_commands();
}

if (tlb) {
	switch (scope) {
	case IOMMU_INVALIDATE_VA:
		inval_tlb_entries(domain, asid, iova, size, granule,
				  leaf);
		break;
	case IOMMU_INVALIDATE_PASID:
		inval_all_tlb_entries_for_asid(domain, asid);
		break;
	case IOMMU_INVALIDATE_DOMAIN:
		inval_all_tlb_entries(domain);
		break;
	default:
		return -EINVAL;
	}

	/* Wait for TLBs to be clean, then invalidate ATCs. */
	sync_commands();
}

if (atc) {
	/* ATC invalidations are sent to all devices in the domain */
	switch (scope) {
	case IOMMU_INVALIDATE_VA:
		inval_atc_entries(domain, pasid, iova, size);
		break;
	case IOMMU_INVALIDATE_PASID:
		/* Covers the full address space */
		inval_all_atc_entries_for_pasid(domain, pasid);
		break;
	case IOMMU_INVALIDATE_DOMAIN:
		/* Set Global Invalidate */
		inval_all_atc_entries(domain);
		break;
	default:
		return -EINVAL;
	}

	sync_commands();
}

/* Then return to guest. */
--------------------------------------------------------------------------

I think this covers what we need and allows userspace or the guest to
gather multiple invalidations into a single request/ioctl.
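
One consequence of the pseudocode above is that not every scope/flags
combination is accepted: CONFIG only makes sense with scope = PASID or
DOMAIN, while TLB and DEV_TLB accept all three scopes. A sketch of that
check as a standalone function follows. The enum and flag values are copied
from the proposal; inval_check() is a hypothetical name, and doing the
check up front instead of inside each switch is my assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the proposal in this mail. */
enum iommu_sva_inval_scope {
	IOMMU_INVALIDATE_DOMAIN	= 1,
	IOMMU_INVALIDATE_PASID,
	IOMMU_INVALIDATE_VA,
};

#define IOMMU_INVALIDATE_LEAF		(1 << 0)
#define IOMMU_INVALIDATE_CONFIG		(1 << 1)
#define IOMMU_INVALIDATE_TLB		(1 << 2)
#define IOMMU_INVALIDATE_DEV_TLB	(1 << 3)

/*
 * Hypothetical up-front validation mirroring the default: cases of the
 * three switch statements. Returns 0 if the request would be accepted,
 * -EINVAL otherwise.
 */
static int inval_check(enum iommu_sva_inval_scope scope, uint32_t flags)
{
	bool config = flags & IOMMU_INVALIDATE_CONFIG;
	bool tlb_or_atc = flags & (IOMMU_INVALIDATE_TLB |
				   IOMMU_INVALIDATE_DEV_TLB);
	bool known_scope = scope == IOMMU_INVALIDATE_DOMAIN ||
			   scope == IOMMU_INVALIDATE_PASID ||
			   scope == IOMMU_INVALIDATE_VA;

	/* Cached PASID table entries cannot be selected by VA. */
	if (config && scope != IOMMU_INVALIDATE_PASID &&
	    scope != IOMMU_INVALIDATE_DOMAIN)
		return -EINVAL;

	/* IOTLB and ATC invalidations accept any of the three scopes. */
	if (tlb_or_atc && !known_scope)
		return -EINVAL;

	return 0;
}
```

Validating before issuing any command avoids a request that invalidates
config entries and then fails on a later step.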

I don't think per-device ATC invalidation is needed, but I might be wrong.
According to ATS it is implicit when the guest resets the device (FLR) or
disables the ATS capability. Are there use-cases other than reset? I still
need to see how QEMU handles a device being detached from a domain (e.g.
its device table entry set to invalid). Kvmtool has one VFIO container per
device, so it can simply unmap-all to clear caches and TLBs when this
happens.

Hope this helps,
Jean
