Date:	Wed, 15 Sep 2010 09:50:22 +0800
From:	"Xin, Xiaohui" <xiaohui.xin@...el.com>
To:	Shirley Ma <mashirle@...ibm.com>, Avi Kivity <avi@...hat.com>
CC:	David Miller <davem@...emloft.net>,
	"arnd@...db.de" <arnd@...db.de>, "mst@...hat.com" <mst@...hat.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host
 kernel

>From: Shirley Ma [mailto:mashirle@...ibm.com]
>Sent: Tuesday, September 14, 2010 11:05 PM
>To: Avi Kivity
>Cc: David Miller; arnd@...db.de; mst@...hat.com; Xin, Xiaohui; netdev@...r.kernel.org;
>kvm@...r.kernel.org; linux-kernel@...r.kernel.org
>Subject: Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel
>
>On Tue, 2010-09-14 at 11:12 +0200, Avi Kivity wrote:
>> >> +            base = (unsigned long)from->iov_base + offset1;
>> >> +            size = ((base&  ~PAGE_MASK) + len + ~PAGE_MASK)>>
>> PAGE_SHIFT;
>> >> +            num_pages = get_user_pages_fast(base, size,
>> 0,&page[i]);
>> >> +            if ((num_pages != size) ||
>> >> +                (num_pages>  MAX_SKB_FRAGS -
>> skb_shinfo(skb)->nr_frags))
>> >> +                    /* put_page is in skb free */
>> >> +                    return -EFAULT;
>> > What keeps the user from writing to these pages in it's address
>> space
>> > after the write call returns?
>> >
>> > A write() return of success means:
>> >
>> >       "I wrote what you gave to me"
>> >
>> > not
>> >
>> >       "I wrote what you gave to me, oh and BTW don't touch these
>> >           pages for a while."
>> >
>> > In fact "a while" isn't even defined in any way, as there is no way
>> > for the write() invoker to know when the networking card is done
>> with
>> > those pages.
>>
>> That's what io_submit() is for.  Then io_getevents() tells you what
>> "a
>> while" actually was.
>
>This macvtap zero copy uses iov buffers from vhost ring, which is
>allocated from guest kernel. In host kernel, vhost calls macvtap
>sendmsg. macvtap sendmsg calls get_user_pages_fast to pin these buffers'
>pages for zero copy.
>
>The patch is relying on how vhost handle these buffers. I need to look
>at vhost code (qemu) first for addressing the questions here.
>
>Thanks
>Shirley

I think what David raised is something we already considered for the mp device.
We cannot know the exact time at which the DMA operation has finished reading the
tx buffer. The latest safe point is when the tx buffer is freed, so we notify vhost
about the write only when the tx buffer is freed. That point may be too late for
good performance, though.
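
To make the idea concrete, here is a rough user-space sketch. It is not code from the mp device or from this patch, and every name in it (tx_buf, tx_buf_new, tx_buf_get, tx_buf_put, mark_completed) is made up for illustration. It only models "signal completion when the last reference to the tx buffer is dropped", and it assumes a single thread, so the counter is a plain int rather than an atomic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a tx buffer whose owner is notified on free. */
struct tx_buf {
    int refs;                   /* current holders, e.g. the stack and the NIC */
    void (*done)(void *ctx);    /* completion callback, run when refs hits 0 */
    void *ctx;                  /* opaque argument passed to done() */
};

/* Allocate a buffer that starts with one reference held by the caller. */
static struct tx_buf *tx_buf_new(void (*done)(void *ctx), void *ctx)
{
    struct tx_buf *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->refs = 1;
    b->done = done;
    b->ctx = ctx;
    return b;
}

/* Take an extra reference, e.g. while the device still reads the pages. */
static void tx_buf_get(struct tx_buf *b)
{
    b->refs++;
}

/* Drop a reference; the last drop notifies the owner, then frees. */
static void tx_buf_put(struct tx_buf *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0) {
        b->done(b->ctx);
        free(b);
    }
}

/* Example callback: count completions in the int that ctx points to. */
static void mark_completed(void *ctx)
{
    *(int *)ctx += 1;
}
```

With a buffer created by tx_buf_new(mark_completed, &count) and one extra tx_buf_get(), the first tx_buf_put() leaves count unchanged and only the second one increments it. That is the cost mentioned above: the owner hears about completion only at free time, even if the device finished with the pages earlier.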

Thanks
Xiaohui 

