[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220719130114.2eecbba1.alex.williamson@redhat.com>
Date: Tue, 19 Jul 2022 13:01:14 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Yishai Hadas <yishaih@...dia.com>
Cc: <jgg@...dia.com>, <saeedm@...dia.com>, <kvm@...r.kernel.org>,
<netdev@...r.kernel.org>, <kuba@...nel.org>,
<kevin.tian@...el.com>, <joao.m.martins@...cle.com>,
<leonro@...dia.com>, <maorg@...dia.com>, <cohuck@...hat.com>
Subject: Re: [PATCH V2 vfio 05/11] vfio: Add an IOVA bitmap support
On Thu, 14 Jul 2022 11:12:45 +0300
Yishai Hadas <yishaih@...dia.com> wrote:
> From: Joao Martins <joao.m.martins@...cle.com>
>
> The new facility adds a bunch of wrappers that abstract how an IOVA
> range is represented in a bitmap that is granulated by a given
> page_size. So it translates all the lifting of dealing with user
> pointers into its corresponding kernel addresses backing said user
> memory into doing finally the bitmap ops to change various bits.
>
> The formula for the bitmap is:
>
> data[(iova / page_size) / 64] & (1ULL << (iova % 64))
>
> Where 64 is the number of bits in a unsigned long (depending on arch)
>
> An example usage of these helpers for a given @iova, @page_size, @length
> and __user @data:
>
> iova_bitmap_init(&iter.dirty, iova, __ffs(page_size));
> ret = iova_bitmap_iter_init(&iter, iova, length, data);
Why are these separate functions given this use case?
> if (ret)
> return -ENOMEM;
>
> for (; !iova_bitmap_iter_done(&iter);
> iova_bitmap_iter_advance(&iter)) {
> ret = iova_bitmap_iter_get(&iter);
> if (ret)
> break;
> if (dirty)
> iova_bitmap_set(iova_bitmap_iova(&iter),
> iova_bitmap_iova_length(&iter),
> &iter.dirty);
>
> iova_bitmap_iter_put(&iter);
>
> if (ret)
> break;
This break is unreachable.
> }
>
> iova_bitmap_iter_free(&iter);
>
> The facility is intended to be used for user bitmaps representing
> dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci).
>
> Signed-off-by: Joao Martins <joao.m.martins@...cle.com>
> Signed-off-by: Yishai Hadas <yishaih@...dia.com>
> ---
> drivers/vfio/Makefile | 6 +-
> drivers/vfio/iova_bitmap.c | 164 ++++++++++++++++++++++++++++++++++++
> include/linux/iova_bitmap.h | 46 ++++++++++
> 3 files changed, 214 insertions(+), 2 deletions(-)
> create mode 100644 drivers/vfio/iova_bitmap.c
> create mode 100644 include/linux/iova_bitmap.h
>
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 1a32357592e3..1d6cad32d366 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,9 +1,11 @@
> # SPDX-License-Identifier: GPL-2.0
> vfio_virqfd-y := virqfd.o
>
> -vfio-y += vfio_main.o
> -
> obj-$(CONFIG_VFIO) += vfio.o
> +
> +vfio-y := vfio_main.o \
> + iova_bitmap.o \
> +
> obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
> obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/vfio/iova_bitmap.c
> new file mode 100644
> index 000000000000..9ad1533a6aec
> --- /dev/null
> +++ b/drivers/vfio/iova_bitmap.c
> @@ -0,0 +1,164 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, Oracle and/or its affiliates.
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/iova_bitmap.h>
> +
> +static unsigned long iova_bitmap_iova_to_index(struct iova_bitmap_iter *iter,
> + unsigned long iova_length)
If we have an iova-to-index function, why do we pass it a length? That
seems to be conflating the use cases where the caller is trying to
determine the last index for a range with the actual implementation of
this helper.
> +{
> + unsigned long pgsize = 1 << iter->dirty.pgshift;
> +
> + return DIV_ROUND_UP(iova_length, BITS_PER_TYPE(*iter->data) * pgsize);
ROUND_UP here doesn't make sense to me and is not symmetric with the
below index-to-iova function. For example an iova of 0x1000 give me an
index of 1, but index of 1 gives me an iova of 0x40000. Does this code
work??
> +}
> +
> +static unsigned long iova_bitmap_index_to_iova(struct iova_bitmap_iter *iter,
> + unsigned long index)
> +{
> + unsigned long pgshift = iter->dirty.pgshift;
> +
> + return (index * sizeof(*iter->data) * BITS_PER_BYTE) << pgshift;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Isn't that BITS_PER_TYPE(*iter->data), just as in the previous function?
> +}
> +
> +static unsigned long iova_bitmap_iter_left(struct iova_bitmap_iter *iter)
I think this is trying to find "remaining" whereas "left" can be
confused with a direction.
> +{
> + unsigned long left = iter->count - iter->offset;
> +
> + left = min_t(unsigned long, left,
> + (iter->dirty.npages << PAGE_SHIFT) / sizeof(*iter->data));
Ugh, dirty.npages is zero'd on bitmap init, allocated on get and left
with stale data on put. This really needs some documentation/theory of
operation.
> +
> + return left;
> +}
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
This is all true and familiar, but what's it doing here? The type1
code this comes from uses this to justify some #defines that are used
to sanitize input. I see no such enforcement in this code. The only
comment in this whole patch and it seems irrelevant.
> +int iova_bitmap_iter_init(struct iova_bitmap_iter *iter,
> + unsigned long iova, unsigned long length,
> + u64 __user *data)
> +{
> + struct iova_bitmap *dirty = &iter->dirty;
> +
> + iter->data = data;
> + iter->offset = 0;
> + iter->count = iova_bitmap_iova_to_index(iter, length);
If this works, it's because the DIV_ROUND_UP above accounted for what
should have been and index-to-count fixup here, ie. add one.
> + iter->iova = iova;
> + iter->length = length;
> + dirty->pages = (struct page **)__get_free_page(GFP_KERNEL);
> +
> + return !dirty->pages ? -ENOMEM : 0;
> +}
> +
> +void iova_bitmap_iter_free(struct iova_bitmap_iter *iter)
> +{
> + struct iova_bitmap *dirty = &iter->dirty;
> +
> + if (dirty->pages) {
> + free_page((unsigned long)dirty->pages);
> + dirty->pages = NULL;
> + }
> +}
> +
> +bool iova_bitmap_iter_done(struct iova_bitmap_iter *iter)
> +{
> + return iter->offset >= iter->count;
> +}
> +
> +unsigned long iova_bitmap_length(struct iova_bitmap_iter *iter)
> +{
> + unsigned long max_iova = iter->dirty.iova + iter->length;
> + unsigned long left = iova_bitmap_iter_left(iter);
> + unsigned long iova = iova_bitmap_iova(iter);
> +
> + left = iova_bitmap_index_to_iova(iter, left);
@left is first used for number of indexes and then for an iova range :(
> + if (iova + left > max_iova)
> + left -= ((iova + left) - max_iova);
> +
> + return left;
> +}
IIUC, this is returning the iova free space in the bitmap, not the
length of the bitmap??
> +
> +unsigned long iova_bitmap_iova(struct iova_bitmap_iter *iter)
> +{
> + unsigned long skip = iter->offset;
> +
> + return iter->iova + iova_bitmap_index_to_iova(iter, skip);
> +}
It would help if this were defined above it's usage above.
> +
> +void iova_bitmap_iter_advance(struct iova_bitmap_iter *iter)
> +{
> + unsigned long length = iova_bitmap_length(iter);
> +
> + iter->offset += iova_bitmap_iova_to_index(iter, length);
Again, fudging an index count based on bogus index value.
> +}
> +
> +void iova_bitmap_iter_put(struct iova_bitmap_iter *iter)
> +{
> + struct iova_bitmap *dirty = &iter->dirty;
> +
> + if (dirty->npages)
> + unpin_user_pages(dirty->pages, dirty->npages);
dirty->npages = 0;?
> +}
> +
> +int iova_bitmap_iter_get(struct iova_bitmap_iter *iter)
> +{
> + struct iova_bitmap *dirty = &iter->dirty;
> + unsigned long npages;
> + u64 __user *addr;
> + long ret;
> +
> + npages = DIV_ROUND_UP((iter->count - iter->offset) *
> + sizeof(*iter->data), PAGE_SIZE);
> + npages = min(npages, PAGE_SIZE / sizeof(struct page *));
> + addr = iter->data + iter->offset;
> + ret = pin_user_pages_fast((unsigned long)addr, npages,
> + FOLL_WRITE, dirty->pages);
> + if (ret <= 0)
> + return ret;
> +
> + dirty->npages = (unsigned long)ret;
> + dirty->iova = iova_bitmap_iova(iter);
> + dirty->start_offset = offset_in_page(addr);
> + return 0;
> +}
> +
> +void iova_bitmap_init(struct iova_bitmap *bitmap,
> + unsigned long base, unsigned long pgshift)
> +{
> + memset(bitmap, 0, sizeof(*bitmap));
> + bitmap->iova = base;
> + bitmap->pgshift = pgshift;
> +}
> +
> +unsigned int iova_bitmap_set(struct iova_bitmap *dirty,
> + unsigned long iova,
> + unsigned long length)
> +{
> + unsigned long nbits, offset, start_offset, idx, size, *kaddr;
> +
> + nbits = max(1UL, length >> dirty->pgshift);
> + offset = (iova - dirty->iova) >> dirty->pgshift;
> + idx = offset / (PAGE_SIZE * BITS_PER_BYTE);
> + offset = offset % (PAGE_SIZE * BITS_PER_BYTE);
> + start_offset = dirty->start_offset;
> +
> + while (nbits > 0) {
> + kaddr = kmap_local_page(dirty->pages[idx]) + start_offset;
> + size = min(PAGE_SIZE * BITS_PER_BYTE - offset, nbits);
> + bitmap_set(kaddr, offset, size);
> + kunmap_local(kaddr - start_offset);
> + start_offset = offset = 0;
> + nbits -= size;
> + idx++;
> + }
> +
> + return nbits;
> +}
> +EXPORT_SYMBOL_GPL(iova_bitmap_set);
> +
> diff --git a/include/linux/iova_bitmap.h b/include/linux/iova_bitmap.h
> new file mode 100644
> index 000000000000..c474c351634a
> --- /dev/null
> +++ b/include/linux/iova_bitmap.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2022, Oracle and/or its affiliates.
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef _IOVA_BITMAP_H_
> +#define _IOVA_BITMAP_H_
> +
> +#include <linux/highmem.h>
> +#include <linux/mm.h>
> +#include <linux/uio.h>
> +
> +struct iova_bitmap {
> + unsigned long iova;
> + unsigned long pgshift;
> + unsigned long start_offset;
> + unsigned long npages;
> + struct page **pages;
> +};
> +
> +struct iova_bitmap_iter {
> + struct iova_bitmap dirty;
> + u64 __user *data;
> + size_t offset;
> + size_t count;
> + unsigned long iova;
> + unsigned long length;
> +};
> +
> +int iova_bitmap_iter_init(struct iova_bitmap_iter *iter, unsigned long iova,
> + unsigned long length, u64 __user *data);
> +void iova_bitmap_iter_free(struct iova_bitmap_iter *iter);
> +bool iova_bitmap_iter_done(struct iova_bitmap_iter *iter);
> +unsigned long iova_bitmap_length(struct iova_bitmap_iter *iter);
> +unsigned long iova_bitmap_iova(struct iova_bitmap_iter *iter);
> +void iova_bitmap_iter_advance(struct iova_bitmap_iter *iter);
> +int iova_bitmap_iter_get(struct iova_bitmap_iter *iter);
> +void iova_bitmap_iter_put(struct iova_bitmap_iter *iter);
> +void iova_bitmap_init(struct iova_bitmap *bitmap,
> + unsigned long base, unsigned long pgshift);
> +unsigned int iova_bitmap_set(struct iova_bitmap *dirty,
> + unsigned long iova,
> + unsigned long length);
> +
> +#endif
No relevant comments, no theory of operation. I found this really
difficult to review and the page handling is still not clear to me.
I'm not willing to take on maintainership of this code under
drivers/vfio/ as is. Thanks,
Alex
Powered by blists - more mailing lists