lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <b1e53f39-5e0b-a09d-2954-cdc9e8592b67@arm.com> Date: Mon, 22 May 2023 17:26:03 +0100 From: Robin Murphy <robin.murphy@....com> To: Niklas Schnelle <schnelle@...ux.ibm.com>, Joerg Roedel <joro@...tes.org>, Matthew Rosato <mjrosato@...ux.ibm.com>, Will Deacon <will@...nel.org>, Wenjia Zhang <wenjia@...ux.ibm.com>, Jason Gunthorpe <jgg@...pe.ca> Cc: Gerd Bayer <gbayer@...ux.ibm.com>, Julian Ruess <julianr@...ux.ibm.com>, Pierre Morel <pmorel@...ux.ibm.com>, Alexandra Winter <wintera@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>, Hector Martin <marcan@...can.st>, Sven Peter <sven@...npeter.dev>, Alyssa Rosenzweig <alyssa@...enzweig.io>, David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>, Andy Gross <agross@...nel.org>, Bjorn Andersson <andersson@...nel.org>, Konrad Dybcio <konrad.dybcio@...aro.org>, Yong Wu <yong.wu@...iatek.com>, Matthias Brugger <matthias.bgg@...il.com>, AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, Gerald Schaefer <gerald.schaefer@...ux.ibm.com>, Orson Zhai <orsonzhai@...il.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, Chunyan Zhang <zhang.lyra@...il.com>, Chen-Yu Tsai <wens@...e.org>, Jernej Skrabec <jernej.skrabec@...il.com>, Samuel Holland <samuel@...lland.org>, Thierry Reding <thierry.reding@...il.com>, Krishna Reddy <vdumpa@...dia.com>, Jonathan Hunter <jonathanh@...dia.com>, Jonathan Corbet <corbet@....net>, linux-s390@...r.kernel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, iommu@...ts.linux.dev, asahi@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org, linux-arm-msm@...r.kernel.org, linux-mediatek@...ts.infradead.org, linux-sunxi@...ts.linux.dev, linux-tegra@...r.kernel.org, linux-doc@...r.kernel.org Subject: Re: [PATCH v9 5/6] iommu/dma: Allow a single FQ in addition to per-CPU FQs On 2023-05-15 10:15, Niklas Schnelle wrote: > In some virtualized environments, including s390 paged memory guests, > IOTLB flushes are used to update IOMMU shadow tables. Due to this, they > are much more expensive than in typical bare metal environments or > non-paged s390 guests. In addition they may parallelize more poorly in > virtualized environments. This changes the trade off for flushing IOVAs > such that minimizing the number of IOTLB flushes trumps any benefit of > cheaper queuing operations or increased paralellism. > > In this scenario per-CPU flush queues pose several problems. Firstly > per-CPU memory is often quite limited prohibiting larger queues. > Secondly collecting IOVAs per-CPU but flushing via a global timeout > reduces the number of IOVAs flushed for each timeout especially on s390 > where PCI interrupts may not be bound to a specific CPU. > > Let's introduce a single flush queue mode that reuses the same queue > logic but only allocates a single global queue. This mode can be > selected as a flag bit in a new dma_iommu_options struct which can be > modified from its defaults by IOMMU drivers implementing a new > ops.tune_dma_iommu() callback. As a first user the s390 IOMMU driver > selects the single queue mode if IOTLB flushes are needed on map which > indicates shadow table use. With the unchanged small FQ size and > timeouts this setting is worse than per-CPU queues but a follow up patch > will make the FQ size and timeout variable. Together this allows the > common IOVA flushing code to more closely resemble the global flush > behavior used on s390's previous internal DMA API implementation. > > Link: https://lore.kernel.org/linux-iommu/3e402947-61f9-b7e8-1414-fde006257b6f@arm.com/ > Reviewed-by: Matthew Rosato <mjrosato@...ux.ibm.com> #s390 > Signed-off-by: Niklas Schnelle <schnelle@...ux.ibm.com> > --- > drivers/iommu/dma-iommu.c | 163 ++++++++++++++++++++++++++++++++++----------- > drivers/iommu/dma-iommu.h | 4 +- > drivers/iommu/iommu.c | 18 +++-- > drivers/iommu/s390-iommu.c | 10 +++ > include/linux/iommu.h | 21 ++++++ > 5 files changed, 169 insertions(+), 47 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 7a9f0b0bddbd..be4cab6b4fe4 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -49,8 +49,11 @@ struct iommu_dma_cookie { > /* Full allocator for IOMMU_DMA_IOVA_COOKIE */ > struct { > struct iova_domain iovad; > - > - struct iova_fq __percpu *fq; /* Flush queue */ > + /* Flush queue */ > + union { > + struct iova_fq *single_fq; > + struct iova_fq __percpu *percpu_fq; > + }; > /* Number of TLB flushes that have been started */ > atomic64_t fq_flush_start_cnt; > /* Number of TLB flushes that have been finished */ > @@ -67,6 +70,8 @@ struct iommu_dma_cookie { > > /* Domain for flush queue callback; NULL if flush queue not in use */ > struct iommu_domain *fq_domain; > + /* Options for dma-iommu use */ > + struct dma_iommu_options options; > struct mutex mutex; > }; > > @@ -152,25 +157,44 @@ static void fq_flush_iotlb(struct iommu_dma_cookie *cookie) > atomic64_inc(&cookie->fq_flush_finish_cnt); > } > > -static void fq_flush_timeout(struct timer_list *t) > +static void fq_flush_percpu(struct iommu_dma_cookie *cookie) > { > - struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer); > int cpu; > > - atomic_set(&cookie->fq_timer_on, 0); > - fq_flush_iotlb(cookie); > - > for_each_possible_cpu(cpu) { > unsigned long flags; > struct iova_fq *fq; > > - fq = per_cpu_ptr(cookie->fq, cpu); > + fq = per_cpu_ptr(cookie->percpu_fq, cpu); > spin_lock_irqsave(&fq->lock, flags); > fq_ring_free(cookie, fq); > spin_unlock_irqrestore(&fq->lock, flags); > } > } > > +static void fq_flush_single(struct iommu_dma_cookie *cookie) > +{ > + struct iova_fq *fq = cookie->single_fq; > + unsigned long flags; > + > + spin_lock_irqsave(&fq->lock, flags); > + fq_ring_free(cookie, fq); > + spin_unlock_irqrestore(&fq->lock, flags) Nit: this should clearly just be a self-locked version of fq_ring_free() that takes fq as an argument, then both the new case and the existing loop body become trivial one-line calls. > +} > + > +static void fq_flush_timeout(struct timer_list *t) > +{ > + struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer); > + > + atomic_set(&cookie->fq_timer_on, 0); > + fq_flush_iotlb(cookie); > + > + if (cookie->options.flags & IOMMU_DMA_OPTS_SINGLE_QUEUE) > + fq_flush_single(cookie); > + else > + fq_flush_percpu(cookie); > +} > + > static void queue_iova(struct iommu_dma_cookie *cookie, > unsigned long pfn, unsigned long pages, > struct list_head *freelist) > @@ -188,7 +212,11 @@ static void queue_iova(struct iommu_dma_cookie *cookie, > */ > smp_mb(); > > - fq = raw_cpu_ptr(cookie->fq); > + if (cookie->options.flags & IOMMU_DMA_OPTS_SINGLE_QUEUE) > + fq = cookie->single_fq; > + else > + fq = raw_cpu_ptr(cookie->percpu_fq); > + > spin_lock_irqsave(&fq->lock, flags); > > /* > @@ -219,58 +247,114 @@ static void queue_iova(struct iommu_dma_cookie *cookie, > jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT)); > } > > -static void iommu_dma_free_fq(struct iommu_dma_cookie *cookie) > +static void iommu_dma_free_fq_single(struct iova_fq *fq) > +{ > + int idx; > + > + if (!fq) > + return; > + fq_ring_for_each(idx, fq) > + put_pages_list(&fq->entries[idx].freelist); > + vfree(fq); > +} > + > +static void iommu_dma_free_fq_percpu(struct iova_fq __percpu *percpu_fq) > { > int cpu, idx; > > - if (!cookie->fq) > - return; > - > - del_timer_sync(&cookie->fq_timer); > /* The IOVAs will be torn down separately, so just free our queued pages */ > for_each_possible_cpu(cpu) { > - struct iova_fq *fq = per_cpu_ptr(cookie->fq, cpu); > + struct iova_fq *fq = per_cpu_ptr(percpu_fq, cpu); > > fq_ring_for_each(idx, fq) > put_pages_list(&fq->entries[idx].freelist); > } > > - free_percpu(cookie->fq); > + free_percpu(percpu_fq); > +} > + > +static void iommu_dma_free_fq(struct iommu_dma_cookie *cookie) > +{ > + if (!cookie->fq_domain) > + return; > + > + del_timer_sync(&cookie->fq_timer); > + if (cookie->options.flags & IOMMU_DMA_OPTS_SINGLE_QUEUE) > + iommu_dma_free_fq_single(cookie->single_fq); > + else > + iommu_dma_free_fq_percpu(cookie->percpu_fq); > +} > + > + > +static void iommu_dma_init_one_fq(struct iova_fq *fq) > +{ > + int i; > + > + fq->head = 0; > + fq->tail = 0; > + > + spin_lock_init(&fq->lock); > + > + for (i = 0; i < IOVA_FQ_SIZE; i++) > + INIT_LIST_HEAD(&fq->entries[i].freelist); > +} > + > +static int iommu_dma_init_fq_single(struct iommu_dma_cookie *cookie) > +{ > + struct iova_fq *queue; > + > + queue = vzalloc(sizeof(*queue)); > + if (!queue) > + return -ENOMEM; > + iommu_dma_init_one_fq(queue); > + cookie->single_fq = queue; > + > + return 0; > +} > + > +static int iommu_dma_init_fq_percpu(struct iommu_dma_cookie *cookie) > +{ > + struct iova_fq __percpu *queue; > + int cpu; > + > + queue = alloc_percpu(struct iova_fq); > + if (!queue) > + return -ENOMEM; > + > + for_each_possible_cpu(cpu) > + iommu_dma_init_one_fq(per_cpu_ptr(queue, cpu)); > + cookie->percpu_fq = queue; > + return 0; > } > > /* sysfs updates are serialised by the mutex of the group owning @domain */ > -int iommu_dma_init_fq(struct iommu_domain *domain) > +int iommu_dma_init_fq(struct device *dev, struct iommu_domain *domain) > { > struct iommu_dma_cookie *cookie = domain->iova_cookie; > - struct iova_fq __percpu *queue; > - int i, cpu; > + const struct iommu_ops *ops = dev_iommu_ops(dev); > + int rc; > > if (cookie->fq_domain) > return 0; > > + if (ops->tune_dma_iommu) > + ops->tune_dma_iommu(dev, &cookie->options); > + > atomic64_set(&cookie->fq_flush_start_cnt, 0); > atomic64_set(&cookie->fq_flush_finish_cnt, 0); > > - queue = alloc_percpu(struct iova_fq); > - if (!queue) { > + if (cookie->options.flags & IOMMU_DMA_OPTS_SINGLE_QUEUE) > + rc = iommu_dma_init_fq_single(cookie); > + else > + rc = iommu_dma_init_fq_percpu(cookie); > + > + if (rc) { > pr_warn("iova flush queue initialization failed\n"); > - return -ENOMEM; > + /* fall back to strict mode */ > + domain->type = IOMMU_DOMAIN_DMA; Why move this? It doesn't logically belong to FQ initialisation itself. > + return rc; > } > > - for_each_possible_cpu(cpu) { > - struct iova_fq *fq = per_cpu_ptr(queue, cpu); > - > - fq->head = 0; > - fq->tail = 0; > - > - spin_lock_init(&fq->lock); > - > - for (i = 0; i < IOVA_FQ_SIZE; i++) > - INIT_LIST_HEAD(&fq->entries[i].freelist); > - } > - > - cookie->fq = queue; > - > timer_setup(&cookie->fq_timer, fq_flush_timeout, 0); > atomic_set(&cookie->fq_timer_on, 0); > /* > @@ -297,6 +381,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type) > if (cookie) { > INIT_LIST_HEAD(&cookie->msi_page_list); > cookie->type = type; > + cookie->options.flags = IOMMU_DMA_OPTS_PER_CPU_QUEUE; > } > return cookie; > } > @@ -585,9 +670,9 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, > if (ret) > goto done_unlock; > > - /* If the FQ fails we can simply fall back to strict mode */ > - if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain)) > - domain->type = IOMMU_DOMAIN_DMA; > + /* If the FQ fails we fall back to strict mode */ > + if (domain->type == IOMMU_DOMAIN_DMA_FQ) > + iommu_dma_init_fq(dev, domain); > > ret = iova_reserve_iommu_regions(dev, domain); > > diff --git a/drivers/iommu/dma-iommu.h b/drivers/iommu/dma-iommu.h > index 942790009292..4f727ab56d3c 100644 > --- a/drivers/iommu/dma-iommu.h > +++ b/drivers/iommu/dma-iommu.h > @@ -12,7 +12,7 @@ > int iommu_get_dma_cookie(struct iommu_domain *domain); > void iommu_put_dma_cookie(struct iommu_domain *domain); > > -int iommu_dma_init_fq(struct iommu_domain *domain); > +int iommu_dma_init_fq(struct device *dev, struct iommu_domain *domain); > > void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); > > @@ -20,7 +20,7 @@ extern bool iommu_dma_forcedac; > > #else /* CONFIG_IOMMU_DMA */ > > -static inline int iommu_dma_init_fq(struct iommu_domain *domain) > +static inline int iommu_dma_init_fq(struct device *dev, struct iommu_domain *domain) > { > return -EINVAL; > } > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 51f816367205..e2334ca480dd 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2967,10 +2967,19 @@ static ssize_t iommu_group_store_type(struct iommu_group *group, > return -EINVAL; > > mutex_lock(&group->mutex); > + /* Ensure that device exists. */ > + if (list_empty(&group->devices)) { > + mutex_unlock(&group->mutex); > + return -EPERM; > + } > + > + grp_dev = list_first_entry(&group->devices, struct group_device, list); > + dev = grp_dev->dev; > + > /* We can bring up a flush queue without tearing down the domain. */ > if (req_type == IOMMU_DOMAIN_DMA_FQ && > group->default_domain->type == IOMMU_DOMAIN_DMA) { > - ret = iommu_dma_init_fq(group->default_domain); > + ret = iommu_dma_init_fq(dev, group->default_domain); > if (!ret) > group->default_domain->type = IOMMU_DOMAIN_DMA_FQ; > mutex_unlock(&group->mutex); > @@ -2978,15 +2987,12 @@ static ssize_t iommu_group_store_type(struct iommu_group *group, > return ret ?: count; > } > > - /* Otherwise, ensure that device exists and no driver is bound. */ > - if (list_empty(&group->devices) || group->owner_cnt) { > + /* Otherwise, ensure that no driver is bound. */ > + if (group->owner_cnt) { > mutex_unlock(&group->mutex); > return -EPERM; > } > > - grp_dev = list_first_entry(&group->devices, struct group_device, list); > - dev = grp_dev->dev; > - > ret = iommu_change_dev_def_domain(group, dev, req_type); > > /* > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c > index 161b0be5aba6..65dd469ad524 100644 > --- a/drivers/iommu/s390-iommu.c > +++ b/drivers/iommu/s390-iommu.c > @@ -451,6 +451,15 @@ static void s390_iommu_get_resv_regions(struct device *dev, > } > } > > +static void s390_iommu_tune_dma_iommu(struct device *dev, > + struct dma_iommu_options *options) > +{ > + struct zpci_dev *zdev = to_zpci_dev(dev); > + > + if (zdev->tlb_refresh) > + options->flags |= IOMMU_DMA_OPTS_SINGLE_QUEUE; > +} > + > static struct iommu_device *s390_iommu_probe_device(struct device *dev) > { > struct zpci_dev *zdev; > @@ -793,6 +802,7 @@ static const struct iommu_ops s390_iommu_ops = { > .device_group = generic_device_group, > .pgsize_bitmap = SZ_4K, > .get_resv_regions = s390_iommu_get_resv_regions, > + .tune_dma_iommu = s390_iommu_tune_dma_iommu, > .default_domain_ops = &(const struct iommu_domain_ops) { > .attach_dev = s390_iommu_attach_device, > .map_pages = s390_iommu_map_pages, > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 58891eddc2c4..3649a17256a5 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -219,6 +219,21 @@ struct iommu_iotlb_gather { > bool queued; > }; > > +/** > + * struct dma_iommu_options - Options for dma-iommu > + * > + * @flags: Flag bits for enabling/disabling dma-iommu settings > + * > + * This structure is intended to provide IOMMU drivers a way to influence the > + * behavior of the dma-iommu DMA API implementation. This allows optimizing for > + * example for a virtualized environment with slow IOTLB flushes. > + */ > +struct dma_iommu_options { > +#define IOMMU_DMA_OPTS_PER_CPU_QUEUE (0L << 0) > +#define IOMMU_DMA_OPTS_SINGLE_QUEUE (1L << 0) > + u64 flags; > +}; I think for now this can just use a bit in dev_iommu to indicate that the device will prefer a global flush queue; s390 can set that in .probe_device, then iommu_dma_init_domain() can propagate it to an equivalent flag in the cookie (possibly even a new cookie type?) that iommu_dma_init_fq() can then consume. Then just make the s390 parameters from patch #6 the standard parameters for a global queue. Thanks, Robin. > + > /** > * struct iommu_ops - iommu ops and capabilities > * @capable: check capability > @@ -242,6 +257,9 @@ struct iommu_iotlb_gather { > * - IOMMU_DOMAIN_IDENTITY: must use an identity domain > * - IOMMU_DOMAIN_DMA: must use a dma domain > * - 0: use the default setting > + * @tune_dma_iommu: Allows the IOMMU driver to modify the default > + * options of the dma-iommu layer for a specific > + * device. > * @default_domain_ops: the default ops for domains > * @remove_dev_pasid: Remove any translation configurations of a specific > * pasid, so that any DMA transactions with this pasid > @@ -278,6 +296,9 @@ struct iommu_ops { > int (*def_domain_type)(struct device *dev); > void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid); > > + void (*tune_dma_iommu)(struct device *dev, > + struct dma_iommu_options *options); > + > const struct iommu_domain_ops *default_domain_ops; > unsigned long pgsize_bitmap; > struct module *owner; >
Powered by blists - more mailing lists