Message-ID: <20110801081237.GC5439@redhat.com>
Date: Mon, 1 Aug 2011 11:12:37 +0300
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Liu Yuan <namei.unix@...il.com>
Cc: Rusty Russell <rusty@...tcorp.com.au>, Avi Kivity <avi@...hat.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] vhost-blk: An in-kernel accelerator for virtio-blk
On Mon, Aug 01, 2011 at 02:25:36PM +0800, Liu Yuan wrote:
> On 07/28/2011 11:22 PM, Michael S. Tsirkin wrote:
> >
> >It would be nicer to reuse the worker infrastructure
> >from vhost.c. In particular this one ignores cgroups that
> >the owner belongs to if any.
> >Does this one do anything that vhost.c doesn't?
> >
>
> The main idea behind using a separate thread to handle completion is to
> decouple request handling
> from request completion signalling. This might allow better
> scalability in an I/O-intensive scenario,
The code seems to hold the vq mutex though, isn't that right?
If so, it can't execute in parallel so it's a bit
hard to see how this would help scalability.
> since I noted that the virtio driver allows submitting as many as 128
> requests in one go.
>
> Hmm, I have tried turning the signalling thread into a function that is
> called as a vhost worker's work item.
> I didn't see a regression on my laptop with iodepth equal to 1, 2, or 3.
> But request handling and completion signalling may race at a high
> request submission rate. Anyway, I'll adapt it to work as a vhost
> worker function in v2.
>
> >
> >>+	switch (ioctl) {
> >>+	case VHOST_NET_SET_BACKEND:
> >>+		if (copy_from_user(&backend, (void __user *)arg, sizeof backend))
> >>+			break;
> >>+		ret = blk_set_backend(blk, &backend);
> >>+		break;
> >Please create your own ioctl for this one.
> >
>
> How about changing VHOST_NET_SET_BACKEND into VHOST_SET_BACKEND?
I had a feeling other devices might want some other structure
(not an fd) as a backend. Maybe that was wrong ...
If so, pls do that, and #define VHOST_NET_SET_BACKEND VHOST_SET_BACKEND
for compatibility.
> >>
> >> static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> >> struct iocb *iocb, bool compat)
> >>diff --git a/fs/eventfd.c b/fs/eventfd.c
> >>index d9a5917..6343bc9 100644
> >>--- a/fs/eventfd.c
> >>+++ b/fs/eventfd.c
> >>@@ -406,6 +406,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)
> >>
> >> return file;
> >> }
> >>+EXPORT_SYMBOL_GPL(eventfd_file_create);
> >You can avoid the need for this export if you pass
> >the eventfd in from userspace.
> >
>
> Since the eventfd used by the completion code is internal, and hiding it
> from hw/vhost_blk.c would simplify
> the configuration, I think this export is necessary and gets rid of
> unnecessary FD management
> in vhost-blk.c.
Well this is a new kernel interface duplicating the functionality of the
old one. You'll have a hard time selling this idea upstream, I suspect.
And I doubt it simplifies the code significantly.
Further, you have a single vq for block, but net has two and
we do want the flexibility of using a single eventfd for both,
I think.
> >>
> >> SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
> >> {
> >>diff --git a/include/linux/aio.h b/include/linux/aio.h
> >>index 7a8db41..d63bc04 100644
> >>--- a/include/linux/aio.h
> >>+++ b/include/linux/aio.h
> >>@@ -214,6 +214,37 @@ struct mm_struct;
> >> extern void exit_aio(struct mm_struct *mm);
> >> extern long do_io_submit(aio_context_t ctx_id, long nr,
> >> struct iocb __user *__user *iocbpp, bool compat);
> >>+extern void __put_ioctx(struct kioctx *ctx);
> >>+extern struct kioctx *ioctx_alloc(unsigned nr_events);
> >>+extern struct kiocb *aio_get_req(struct kioctx *ctx);
> >>+extern ssize_t aio_run_iocb(struct kiocb *iocb);
> >>+extern int __aio_run_iocbs(struct kioctx *ctx);
> >>+extern int read_events(struct kioctx *ctx,
> >>+ long min_nr, long nr,
> >>+ struct io_event __user *event,
> >>+ struct timespec __user *timeout);
> >>+extern void io_destroy(struct kioctx *ioctx);
> >>+extern ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat);
> >>+extern void __put_ioctx(struct kioctx *ctx);
> >>+
> >>+static inline void get_ioctx(struct kioctx *kioctx)
> >>+{
> >>+	BUG_ON(atomic_read(&kioctx->users) <= 0);
> >>+ atomic_inc(&kioctx->users);
> >>+}
> >>+
> >>+static inline int try_get_ioctx(struct kioctx *kioctx)
> >>+{
> >>+ return atomic_inc_not_zero(&kioctx->users);
> >>+}
> >>+
> >>+static inline void put_ioctx(struct kioctx *kioctx)
> >>+{
> >>+	BUG_ON(atomic_read(&kioctx->users) <= 0);
> >>+ if (unlikely(atomic_dec_and_test(&kioctx->users)))
> >>+ __put_ioctx(kioctx);
> >>+}
> >>+
> >> #else
> >> static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
> >> static inline int aio_put_req(struct kiocb *iocb) { return 0; }
> >>--
> >>1.7.5.1
>
> Other comments will be addressed in V2. Thanks
>
> Yuan