Message-ID: <F2E9EB7348B8264F86B6AB8151CE2D792B8CAA04F9@shsmsx502.ccr.corp.intel.com>
Date: Fri, 17 Sep 2010 11:16:16 +0800
From: "Xin, Xiaohui" <xiaohui.xin@...el.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...e.hu" <mingo@...e.hu>,
"davem@...emloft.net" <davem@...emloft.net>,
"herbert@...dor.hengli.com.au" <herbert@...dor.hengli.com.au>,
"jdike@...ux.intel.com" <jdike@...ux.intel.com>
Subject: RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>From: Michael S. Tsirkin [mailto:mst@...hat.com]
>Sent: Wednesday, September 15, 2010 7:28 PM
>To: Xin, Xiaohui
>Cc: netdev@...r.kernel.org; kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
>mingo@...e.hu; davem@...emloft.net; herbert@...dor.hengli.com.au;
>jdike@...ux.intel.com
>Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>
>On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
>> >From: Michael S. Tsirkin [mailto:mst@...hat.com]
>> >Sent: Sunday, September 12, 2010 9:37 PM
>> >To: Xin, Xiaohui
>> >Cc: netdev@...r.kernel.org; kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
>> >mingo@...e.hu; davem@...emloft.net; herbert@...dor.hengli.com.au;
>> >jdike@...ux.intel.com
>> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>> >
>> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
>> >> >>Playing with rlimit on data path, transparently to the application in this way
>> >> >>looks strange to me, I suspect this has unexpected security implications.
>> >> >>Further, applications may have other uses for locked memory
>> >> >>besides mpassthru - you should not just take it because it's there.
>> >> >>
>> >> >>Can we have an ioctl that lets userspace configure how much
>> >> >>memory to lock? This ioctl will decrement the rlimit and store
>> >> >>the data in the device structure so we can do accounting
>> >> >>internally. Put it back on close or on another ioctl.
>> >> >Yes, we can decrement the rlimit once, in the ioctl, to keep it off
>> >> >the data path.
>> >> >
>> >> >>Need to be careful for when this operation gets called
>> >> >>again with 0 or another small value while we have locked memory -
>> >> >>maybe just fail with EBUSY? or wait until it gets unlocked?
>> >> >>Maybe 0 can be special-cased to deactivate zero-copy?
>> >> >>
>> >>
>> >> How about we don't add a new ioctl, but just check the rlimit
>> >> in the MPASSTHRU_BINDDEV ioctl? If we find that the mp device
>> >> would exceed the rlimit, we fail the bind ioctl, and zero copy
>> >> is then not available.
>> >
>> >Yes, and not just check, but decrement as well.
>> >I think we should give userspace control over
>> >how much memory we can lock and subtract from the rlimit.
>> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
>> >Then increment the rlimit back on unbind and on close?
>> >
>> >This opens up an interesting condition: process 1
>> >calls bind, process 2 calls unbind or close.
>> >This will increment rlimit for process 2.
>> >Not sure how to fix this properly.
>> >
>> I can't see how either. Can we do any synchronized operations on the rlimit?
>> I rather doubt it.
>>
>> >--
>> >MST
>
>Here's what infiniband does: simply pass the amount of memory userspace
>wants you to lock on an ioctl, and verify that either you have
>CAP_IPC_LOCK or this number does not exceed the current rlimit. (must
>be on ioctl, not on open, as we likely want the fd passed around between
>processes), but do not decrement rlimit. Use this on following
>operations. Be careful if this can be changed while operations are in
>progress.
>
>This does mean that the effective amount of memory that userspace can
>lock is doubled, but at least it is not unlimited, and we sidestep all
>other issues such as userspace running out of lockable memory simply by
>virtue of using the driver.
>
What I have done in the mp device is almost the same. The differences are that
I do not check the capability, and that I account with my own counter,
ctor->pages, instead of mm->locked_vm.
So the plan for now is: 1) add the capability check, 2) use mm->locked_vm, and
3) add an ioctl that lets userspace configure how much memory can be locked.
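
To make the three items concrete, here is a rough user-space model of the
check, not the actual patch. All names (mp_task_model, mp_dev_model,
mp_set_lock_budget, mp_charge_pages, mp_uncharge_pages) are hypothetical and
only stand in for mm->locked_vm, the RLIMIT_MEMLOCK limit in pages, and
capable(CAP_IPC_LOCK). Adding locked_vm to the request before comparing with
the limit, and returning -EBUSY when the budget is shrunk below what is in use
(one of the options raised earlier in the thread), are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the caller's state the kernel would consult. */
struct mp_task_model {
	unsigned long locked_vm;   /* pages already locked (models mm->locked_vm) */
	unsigned long lock_limit;  /* RLIMIT_MEMLOCK expressed in pages */
	bool has_cap_ipc_lock;     /* models capable(CAP_IPC_LOCK) */
};

/* Hypothetical per-device accounting kept in the device structure. */
struct mp_dev_model {
	unsigned long max_pages;   /* budget configured through the ioctl */
	unsigned long used_pages;  /* pages currently pinned for zero copy */
};

/*
 * Models the proposed ioctl: userspace asks to be allowed to lock npages.
 * The rlimit is only checked here, never decremented.
 */
int mp_set_lock_budget(struct mp_dev_model *dev,
		       const struct mp_task_model *task,
		       unsigned long npages)
{
	/* Assumption: refuse to shrink below what is already pinned. */
	if (dev->used_pages > npages)
		return -EBUSY;

	if (!task->has_cap_ipc_lock) {
		/* Overflow-safe form of: locked_vm + npages > lock_limit */
		if (npages > task->lock_limit ||
		    task->locked_vm > task->lock_limit - npages)
			return -ENOMEM;
	}

	dev->max_pages = npages;
	return 0;
}

/* Data path: charge pages against the device budget only. */
int mp_charge_pages(struct mp_dev_model *dev, unsigned long n)
{
	/* used_pages never exceeds max_pages, so the subtraction is safe. */
	if (n > dev->max_pages - dev->used_pages)
		return -ENOMEM;
	dev->used_pages += n;
	return 0;
}

/* Release pages once the buffers are no longer in flight. */
void mp_uncharge_pages(struct mp_dev_model *dev, unsigned long n)
{
	dev->used_pages -= (n > dev->used_pages) ? dev->used_pages : n;
}
```

The point of the split is that the rlimit and capability are looked at only in
the ioctl, while the data path touches nothing but the device's own counters.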
>--
>MST