[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <97e63191-e2ba-34f6-ca6c-99b9e9841587@epam.com>
Date: Thu, 13 Jul 2023 14:40:08 +0000
From: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@...m.com>
To: Juergen Gross <jgross@...e.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Stefano Stabellini <sstabellini@...nel.org>
CC: Vincent Guittot <vincent.guittot@...aro.org>,
Alex Bennée <alex.bennee@...aro.org>,
"stratos-dev@...lists.linaro.org" <stratos-dev@...lists.linaro.org>,
Erik Schilling <erik.schilling@...aro.org>,
Manos Pitsidianakis <manos.pitsidianakis@...aro.org>,
Mathieu Poirier <mathieu.poirier@...aro.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] xen: privcmd: Add support for irqfd
On 13.07.23 10:44, Juergen Gross wrote:
Hello all.
> On 12.07.23 10:48, Viresh Kumar wrote:
>> Xen provides support for injecting interrupts to the guests via the
>> HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based
>> device backend implementations, in an inefficient manner currently.
>>
>> Generally, the Virtio backends are implemented to work with the Eventfd
>> based mechanism. In order to make such backends work with Xen, another
>> software layer needs to poll the Eventfds and raise an interrupt to the
>> guest using the Xen based mechanism. This results in an extra context
>> switch.
>>
>> This is not a new problem in Linux though. It is present with other
>> hypervisors like KVM, etc. as well. The generic solution implemented in
>> the kernel for them is to provide an IOCTL call to pass the interrupt
>> details and eventfd, which lets the kernel take care of polling the
>> eventfd and raising of the interrupt, instead of handling this in user
>> space (which involves an extra context switch).
>>
>> This patch adds support to inject a specific interrupt to guest using
>> the eventfd mechanism, by preventing the extra context switch.
>>
>> Inspired by existing implementations for KVM, etc..
Viresh, great work!
Do you perhaps have corresponding users-space (virtio backend) example
adopted for that feature (I would like to take a look at it if possible)?
>>
>> Signed-off-by: Viresh Kumar <viresh.kumar@...aro.org>
>> ---
>> drivers/xen/privcmd.c | 285 ++++++++++++++++++++++++++++++++++++-
>> include/uapi/xen/privcmd.h | 14 ++
>> 2 files changed, 297 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index e2f580e30a86..e8096b09c113 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -9,11 +9,16 @@
>> #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
>> +#include <linux/eventfd.h>
>> +#include <linux/file.h>
>> #include <linux/kernel.h>
>> #include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/poll.h>
>> #include <linux/sched.h>
>> #include <linux/slab.h>
>> #include <linux/string.h>
>> +#include <linux/workqueue.h>
>> #include <linux/errno.h>
>> #include <linux/mm.h>
>> #include <linux/mman.h>
>> @@ -833,6 +838,266 @@ static long privcmd_ioctl_mmap_resource(struct
>> file *file,
>> return rc;
>> }
>> +/* Irqfd support */
>> +static struct workqueue_struct *irqfd_cleanup_wq;
>> +static DEFINE_MUTEX(irqfds_lock);
>> +static LIST_HEAD(irqfds_list);
>> +
>> +struct privcmd_kernel_irqfd {
>> + domid_t dom;
>> + u8 level;
>> + u32 irq;
>> + struct eventfd_ctx *eventfd;
>> + struct work_struct shutdown;
>> + wait_queue_entry_t wait;
>> + struct list_head list;
>> + poll_table pt;
>> +};
>> +
>> +/* From xen/include/public/hvm/dm_op.h */
>> +#define XEN_DMOP_set_irq_level 19
>> +
>> +struct xen_dm_op_set_irq_level {
>> + u32 irq;
>> + /* IN - Level: 0 -> deasserted, 1 -> asserted */
>> + u8 level;
>> + u8 pad[3];
>> +};
>> +
>> +struct xen_dm_op {
>> + u32 op;
>> + u32 pad;
>> + union {
>> + /*
>> + * There are more structures here, we won't be using them, so
>> + * can skip adding them here.
>> + */
>> + struct xen_dm_op_set_irq_level set_irq_level;
>> + } u;
>> +};
>
> Instead of copying definitions over from Xen into privcmd.c, please just
> update
> the related linux header include/xen/interface/dm_op.h from the Xen public
> header.
>
>> +
>> +static void irqfd_deactivate(struct privcmd_kernel_irqfd *kirqfd)
>> +{
>> + lockdep_assert_held(&irqfds_lock);
>> +
>> + list_del_init(&kirqfd->list);
>> + queue_work(irqfd_cleanup_wq, &kirqfd->shutdown);
>> +}
>> +
>> +static void irqfd_shutdown(struct work_struct *work)
>> +{
>> + struct privcmd_kernel_irqfd *kirqfd =
>> + container_of(work, struct privcmd_kernel_irqfd, shutdown);
>> + u64 cnt;
>> +
>> + eventfd_ctx_remove_wait_queue(kirqfd->eventfd, &kirqfd->wait, &cnt);
>> + eventfd_ctx_put(kirqfd->eventfd);
>> + kfree(kirqfd);
>> +}
>> +
>> +static void irqfd_inject(struct privcmd_kernel_irqfd *kirqfd)
>> +{
>> + struct xen_dm_op dm_op = {
>> + .op = XEN_DMOP_set_irq_level,
>> + .u.set_irq_level.irq = kirqfd->irq,
>> + .u.set_irq_level.level = kirqfd->level,
>> + };
>> + struct xen_dm_op_buf xbufs = {
>> + .size = sizeof(dm_op),
>> + };
>> + u64 cnt;
>> +
>> + eventfd_ctx_do_read(kirqfd->eventfd, &cnt);
>> + set_xen_guest_handle(xbufs.h, &dm_op);
>> +
>> + xen_preemptible_hcall_begin();
>> + HYPERVISOR_dm_op(kirqfd->dom, 1, &xbufs);
>
> Please add some error handling, e.g. by issuing a message in case this
> hypercall
> was failing. Adding a bool "error" to struct privcmd_kernel_irqfd in
> order to
> avoid multiple error messages for the same device might be a good idea.
In addition to provided comments, I would like to mention that this
particular dm_op has Arm implementation only in vanilla hypervisor.
So this feature cannot be immediately reused on x86 because of
XEN_DMOP_set_irq_level at least. As I understand, the x86's variant is
XEN_DMOP_set_isa_irq_level (there is also XEN_DMOP_set_pci_intx_level
for legacy PCI interrupts).
Please note, I am not asking to wire on x86, but maybe it is worth
considering to put this new feature under something like Kconfig's
XEN_PRIVCMD_IRQFD which depends on Arm for now? Or maybe to put dm_op
specific part of irqfd_inject() under CONFIG_ARCH_XXX here or introduce
per-ARCH helpers to form a dm_op (Arm's variant will use
XEN_DMOP_set_irq_level like in current commit). If not, then at least a
sentence in the description mentioning that this works on Arm only needs
to be added, I think.
Also, I am wondering whether we should foresee the IOCTL_PRIVCMD_IRQFD
interface to be suitable for all existing irq related dm_ops (not only
XEN_DMOP_set_irq_level) right now or this can be extended/updated later
on (when there is a real need)?
[snip]
Powered by blists - more mailing lists