Message-ID: <4E366A1A.2090004@gmail.com>
Date:	Mon, 01 Aug 2011 16:55:54 +0800
From:	Liu Yuan <namei.unix@...il.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
CC:	Rusty Russell <rusty@...tcorp.com.au>, Avi Kivity <avi@...hat.com>,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] vhost-blk: An in-kernel accelerator for virtio-blk

On 08/01/2011 04:12 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 01, 2011 at 02:25:36PM +0800, Liu Yuan wrote:
>> On 07/28/2011 11:22 PM, Michael S. Tsirkin wrote:
>>> It would be nicer to reuse the worker infrastructure
>>> from vhost.c. In particular this one ignores cgroups that
>>> the owner belongs to if any.
>>> Does this one do anything that vhost.c doesn't?
>>>
>> The main idea behind using a separate thread to handle completion is to
>> decouple request handling from request completion signalling. This might
>> allow better scalability in an IO-intensive scenario,
> The code seems to hold the vq mutex though, isn't that right?
> If so, it can't execute in parallel, so it's a bit
> hard to see how this would help scalability.
>

Nope, the V1 completion thread doesn't hold any mutex, so it can run in
parallel with the vhost worker.
Anyway, I'll adapt the completion code to the vhost worker, since it deals
with other stuff like cgroups.
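
Roughly, something like this sketch, assuming the vhost_work_init()/
vhost_work_queue() API in drivers/vhost/vhost.c is usable from the new
file; struct vhost_blk and the handle_completion() name are placeholders:

struct vhost_blk {
	struct vhost_dev dev;
	struct vhost_virtqueue vq;
	struct vhost_work completion_work;	/* runs on the vhost worker */
};

/* Runs in the per-owner vhost worker thread, which already respects the
 * owner's cgroups. */
static void handle_completion(struct vhost_work *work)
{
	struct vhost_blk *blk =
		container_of(work, struct vhost_blk, completion_work);
	/* ... reap finished aio requests and signal the guest ... */
}

static void vhost_blk_setup_completion(struct vhost_blk *blk)
{
	/* Bind the handler once at setup time. */
	vhost_work_init(&blk->completion_work, handle_completion);
}

static void vhost_blk_kick_completion(struct vhost_blk *blk)
{
	/* From the aio completion path, just queue onto the worker. */
	vhost_work_queue(&blk->dev, &blk->completion_work);
}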

> I had a feeling other devices might want some other structure
> (not an fd) as a backend. Maybe that was wrong ...
> If so, pls do that, and #define VHOST_NET_SET_BACKEND VHOST_SET_BACKEND
> for compatibility.
>

Okay.
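
I.e., something along these lines (sketch only; 0x30 is the existing
VHOST_NET_SET_BACKEND ioctl number in include/linux/vhost.h):

/* Generic backend ioctl, shared by net and blk. */
#define VHOST_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
/* Keep the old name as an alias so existing userspace keeps working. */
#define VHOST_NET_SET_BACKEND VHOST_SET_BACKEND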

>>>>   static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>>>>   			 struct iocb *iocb, bool compat)
>>>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>>>> index d9a5917..6343bc9 100644
>>>> --- a/fs/eventfd.c
>>>> +++ b/fs/eventfd.c
>>>> @@ -406,6 +406,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)
>>>>
>>>>   	return file;
>>>>   }
>>>> +EXPORT_SYMBOL_GPL(eventfd_file_create);
>>> You can avoid the need for this export if you pass
>>> the eventfd in from userspace.
>>>
>> Since the eventfd used by the completion code is internal, hiding it from
>> hw/vhost_blk.c would simplify the configuration. So I think this export is
>> necessary; it gets rid of unnecessary FD management in vhost-blk.c.
> Well this is a new kernel interface duplicating the functionality of the
> old one.  You'll have a hard time selling this idea upstream, I suspect.
> And I doubt it simplifies the code significantly.
> Further, you have a single vq for block, but net has two and
> we do want the flexibility of using a single eventfd for both,
> I think.
>
Point taken.
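
For the record, the userspace-side alternative is the pattern vhost
already uses for the call/kick eventfds; a minimal sketch, with vhost_fd
assumed to be an open vhost device fd:

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int setup_call_eventfd(int vhost_fd)
{
	/* Userspace creates the completion eventfd itself ... */
	int call_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	struct vhost_vring_file file = { .index = 0, .fd = call_fd };

	/* ... and hands it to the kernel for vq 0, so no in-kernel
	 * eventfd_file_create() export is needed. */
	return ioctl(vhost_fd, VHOST_SET_VRING_CALL, &file);
}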

>>>>   SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
>>>>   {
>>>> diff --git a/include/linux/aio.h b/include/linux/aio.h
>>>> index 7a8db41..d63bc04 100644
>>>> --- a/include/linux/aio.h
>>>> +++ b/include/linux/aio.h
>>>> @@ -214,6 +214,37 @@ struct mm_struct;
>>>>   extern void exit_aio(struct mm_struct *mm);
>>>>   extern long do_io_submit(aio_context_t ctx_id, long nr,
>>>>   			 struct iocb __user *__user *iocbpp, bool compat);
>>>> +extern void __put_ioctx(struct kioctx *ctx);
>>>> +extern struct kioctx *ioctx_alloc(unsigned nr_events);
>>>> +extern struct kiocb *aio_get_req(struct kioctx *ctx);
>>>> +extern ssize_t aio_run_iocb(struct kiocb *iocb);
>>>> +extern int __aio_run_iocbs(struct kioctx *ctx);
>>>> +extern int read_events(struct kioctx *ctx,
>>>> +                        long min_nr, long nr,
>>>> +                        struct io_event __user *event,
>>>> +                        struct timespec __user *timeout);
>>>> +extern void io_destroy(struct kioctx *ioctx);
>>>> +extern ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat);
>>>> +extern void __put_ioctx(struct kioctx *ctx);
>>>> +
>>>> +static inline void get_ioctx(struct kioctx *kioctx)
>>>> +{
>>>> +        BUG_ON(atomic_read(&kioctx->users) <= 0);
>>>> +        atomic_inc(&kioctx->users);
>>>> +}
>>>> +
>>>> +static inline int try_get_ioctx(struct kioctx *kioctx)
>>>> +{
>>>> +        return atomic_inc_not_zero(&kioctx->users);
>>>> +}
>>>> +
>>>> +static inline void put_ioctx(struct kioctx *kioctx)
>>>> +{
>>>> +        BUG_ON(atomic_read(&kioctx->users) <= 0);
>>>> +        if (unlikely(atomic_dec_and_test(&kioctx->users)))
>>>> +                __put_ioctx(kioctx);
>>>> +}
>>>> +
>>>>   #else
>>>>   static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
>>>>   static inline int aio_put_req(struct kiocb *iocb) { return 0; }
>>>> -- 
>>>> 1.7.5.1
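
For illustration only (not part of the patch): how a hypothetical
kernel-side caller would pair the refcount helpers quoted above:

static int use_ctx(struct kioctx *ctx)
{
	if (!try_get_ioctx(ctx))	/* fails once the ctx is going away */
		return -ESHUTDOWN;
	/* ... safe to use ctx: run iocbs, reap events via read_events() ... */
	put_ioctx(ctx);			/* may free ctx via __put_ioctx() */
	return 0;
}
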
>> Other comments will be addressed in V2. Thanks
>>
>> Yuan
