Message-ID: <51231231.1040908@ozlabs.ru>
Date: Tue, 19 Feb 2013 16:48:33 +1100
From: Alexey Kardashevskiy <aik@...abs.ru>
To: Alex Williamson <alex.williamson@...hat.com>
CC: Joerg Roedel <joerg.roedel@....com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
linux-kernel@...r.kernel.org,
David Gibson <david@...son.dropbear.id.au>
Subject: Re: [PATCH] iommu: making IOMMU sysfs nodes API public
On 19/02/13 16:24, Alex Williamson wrote:
> On Mon, 2013-02-18 at 17:15 +1100, Alexey Kardashevskiy wrote:
>> On 13/02/13 04:15, Alex Williamson wrote:
>>> On Wed, 2013-02-13 at 01:42 +1100, Alexey Kardashevskiy wrote:
>>>> On 12/02/13 16:07, Alex Williamson wrote:
>>>>> On Tue, 2013-02-12 at 15:06 +1100, Alexey Kardashevskiy wrote:
>>>>>> Having this patch in a tree, adding new nodes in sysfs
>>>>>> for IOMMU groups is going to be easier.
>>>>>>
>>>>>> The first candidate for this change is a "dma-window-size"
>>>>>> property which tells a size of a DMA window of the specific
>>>>>> IOMMU group which can be used later for locked pages accounting.
>>>>>
>>>>> I'm still churning on this one; I'm nervous this would basically create
>>>>> a /proc free-for-all under /sys/kernel/iommu_group/$GROUP/ where any
>>>>> iommu driver can add random attributes. That can get ugly for
>>>>> userspace.
>>>>
>>>> Isn't that exactly what sysfs is for (unlike /proc)? :)
>>>
>>> Um, I hope it's a little more thought out than /proc.
>>>
>>>>> On the other hand, for the application of userspace knowing how much
>>>>> memory to lock for vfio use of a group, it's an appealing location to
>>>>> get that information. Something like libvirt would already be poking
>>>>> around here to figure out which devices to bind. Page limits need to be
>>>>> setup prior to use through vfio, so sysfs is more convenient than
>>>>> through vfio ioctls.
>>>>
>>>> True. DMA window properties do not change after boot, so sysfs is the
>>>> right place to expose them.
>>>>
>>>>> But then is dma-window-size just a vfio requirement leaking over into
>>>>> iommu groups? Can we allow iommu driver based attributes without giving
>>>>> up control of the namespace? Thanks,
>>>>
>>>> Who are you asking these questions? :)
>>>
>>> Anyone, including you. Rather than dropping misc files in sysfs to
>>> describe things about the group, I think the better solution in your
>>> case might be a link from the group to an existing sysfs directory
>>> describing the PE. I believe your PE is rooted in a PCI bridge, so that
>>> presumably already has a representation in sysfs. Can the aperture size
>>> be determined from something in sysfs for that bridge already? I'm just
>>> not ready to create a grab bag of sysfs entries for a group yet.
>>> Thanks,
>>
>>
>> At the moment there is no information about the DMA window in either
>> sysfs or /proc/device-tree. And adding a sysfs entry per PE (a powerpc
>> partitionable endpoint, which is often a PHB but not always) just for
>> VFIO is quite heavy.
>
> How do you learn the window size and PE extents in the host kernel?
When the ppc64 code does the PCI scan, it creates an iommu_table struct per
PE. When a new PE (i.e. a new iommu_table) is discovered, I call
iommu_group_alloc() and iommu_group_set_iommudata() to link the iommu_group
with the iommu_table. These iommu_table structs carry the DMA window
properties.
>> We could add a ppc64 subfolder under /sys/kernel/iommu/xxx/ and put the
>> "dma-window" property there. And replace it with a symlink when and if we
>> add something for PE later. Would that work?
>
> To be clear, you're suggesting /sys/kernel/iommu_groups/$GROUP/xxx/,
> right? A subfolder really only limits the scope of the mess, so it's
> not much improvement.
You suggested a symlink to some ppc64 or PCI tree in sysfs; that is not
much different.
> What does the interface look like to make those
> subfolders?
int iommu_group_create_platform_file(struct iommu_group *group,
struct iommu_group_attribute *attr)
and that's it.
> The problem we're trying to solve is this call flow:
>
> containerfd = open("/dev/vfio/vfio");
> ioctl(containerfd, VFIO_GET_API_VERSION);
> ioctl(containerfd, VFIO_CHECK_EXTENSION, ...);
> groupfd = open("/dev/vfio/$GROUP");
> ioctl(groupfd, VFIO_GROUP_GET_STATUS);
> ioctl(groupfd, VFIO_GROUP_SET_CONTAINER, &containerfd);
>
> You wanted to lock all the memory for the DMA window here, before we can
> call VFIO_IOMMU_GET_INFO, but does it need to happen there? We still
> have a MAP_DMA hook. We could do it all on the first mapping. It also
> has a flags field that could augment the behavior to trigger page
> locking. Adding the window size to sysfs seems more readily convenient,
> but is it so hard for userspace to open the files and call a couple
> ioctls to get far enough to call IOMMU_GET_INFO? I'm unconvinced the
> clutter in sysfs is more than just a quick fix. Thanks,
I thought the problem was that we want to let the user set the correct
rlimit before running QEMU, no? We do not change the rlimit from QEMU.
>
> Alex
>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
>>>>>> ---
>>>>>> drivers/iommu/iommu.c | 19 ++-----------------
>>>>>> include/linux/iommu.h | 20 ++++++++++++++++++++
>>>>>> 2 files changed, 22 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>>> index b0afd3d..58cc298 100644
>>>>>> --- a/drivers/iommu/iommu.c
>>>>>> +++ b/drivers/iommu/iommu.c
>>>>>> @@ -52,22 +52,6 @@ struct iommu_device {
>>>>>> char *name;
>>>>>> };
>>>>>>
>>>>>> -struct iommu_group_attribute {
>>>>>> - struct attribute attr;
>>>>>> - ssize_t (*show)(struct iommu_group *group, char *buf);
>>>>>> - ssize_t (*store)(struct iommu_group *group,
>>>>>> - const char *buf, size_t count);
>>>>>> -};
>>>>>> -
>>>>>> -#define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \
>>>>>> -struct iommu_group_attribute iommu_group_attr_##_name = \
>>>>>> - __ATTR(_name, _mode, _show, _store)
>>>>>> -
>>>>>> -#define to_iommu_group_attr(_attr) \
>>>>>> - container_of(_attr, struct iommu_group_attribute, attr)
>>>>>> -#define to_iommu_group(_kobj) \
>>>>>> - container_of(_kobj, struct iommu_group, kobj)
>>>>>> -
>>>>>> static ssize_t iommu_group_attr_show(struct kobject *kobj,
>>>>>> struct attribute *__attr, char *buf)
>>>>>> {
>>>>>> @@ -98,11 +82,12 @@ static const struct sysfs_ops iommu_group_sysfs_ops = {
>>>>>> .store = iommu_group_attr_store,
>>>>>> };
>>>>>>
>>>>>> -static int iommu_group_create_file(struct iommu_group *group,
>>>>>> +int iommu_group_create_file(struct iommu_group *group,
>>>>>> struct iommu_group_attribute *attr)
>>>>>> {
>>>>>> return sysfs_create_file(&group->kobj, &attr->attr);
>>>>>> }
>>>>>> +EXPORT_SYMBOL_GPL(iommu_group_create_file);
>>>>>>
>>>>>> static void iommu_group_remove_file(struct iommu_group *group,
>>>>>> struct iommu_group_attribute *attr)
>>>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>>>>> index f3b99e1..6d24ba7 100644
>>>>>> --- a/include/linux/iommu.h
>>>>>> +++ b/include/linux/iommu.h
>>>>>> @@ -21,6 +21,7 @@
>>>>>>
>>>>>> #include <linux/errno.h>
>>>>>> #include <linux/types.h>
>>>>>> +#include <linux/stat.h>
>>>>>>
>>>>>> #define IOMMU_READ (1)
>>>>>> #define IOMMU_WRITE (2)
>>>>>> @@ -197,6 +198,25 @@ static inline int report_iommu_fault(struct iommu_domain *domain,
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +struct iommu_group_attribute {
>>>>>> + struct attribute attr;
>>>>>> + ssize_t (*show)(struct iommu_group *group, char *buf);
>>>>>> + ssize_t (*store)(struct iommu_group *group,
>>>>>> + const char *buf, size_t count);
>>>>>> +};
>>>>>> +
>>>>>> +#define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \
>>>>>> +struct iommu_group_attribute iommu_group_attr_##_name = \
>>>>>> + __ATTR(_name, _mode, _show, _store)
>>>>>> +
>>>>>> +#define to_iommu_group_attr(_attr) \
>>>>>> + container_of(_attr, struct iommu_group_attribute, attr)
>>>>>> +#define to_iommu_group(_kobj) \
>>>>>> + container_of(_kobj, struct iommu_group, kobj)
>>>>>> +
>>>>>> +extern int iommu_group_create_file(struct iommu_group *group,
>>>>>> + struct iommu_group_attribute *attr);
>>>>>> +
>>>>>> #else /* CONFIG_IOMMU_API */
>>>>>>
>>>>>> struct iommu_ops {};
>>
>>
>
>
>
--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/