[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0f18063c-c76b-4728-5145-810f069988ea@redhat.com>
Date: Thu, 14 Jun 2018 11:01:59 -0400
From: Nitesh Narayan Lal <nilal@...hat.com>
To: Wei Wang <wei.w.wang@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Michael S. Tsirkin" <mst@...hat.com>
Cc: KVM list <kvm@...r.kernel.org>,
virtualization <virtualization@...ts.linux-foundation.org>,
Network Development <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Bjorn Andersson <bjorn.andersson@...aro.org>
Subject: Re: [PULL] vhost: cleanups and fixes
Hi Wei,
On 06/12/2018 07:05 AM, Wei Wang wrote:
> On 06/12/2018 09:59 AM, Linus Torvalds wrote:
>> On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin <mst@...hat.com>
>> wrote:
>>> Maybe it will help to have GFP_NONE which will make any allocation
>>> fail if attempted. Linus, would this address your comment?
>> It would definitely have helped me initially overlook that call chain.
>>
>> But then when I started looking at the whole dma_map_page() thing, it
>> just raised my hackles again.
>>
>> I would seriously suggest having a much simpler version for the "no
>> allocation, no dma mapping" case, so that it's *obvious* that that
>> never happens.
>>
>> So instead of having virtio_balloon_send_free_pages() call a really
>> generic complex chain of functions that in _some_ cases can do memory
>> allocation, why isn't there a short-circuited "vitruque_add_datum()"
>> that is guaranteed to never do anything like that?
>>
>> Honestly, I look at "add_one_sg()" and it really doesn't make me
>> happy. It looks hacky as hell. If I read the code right, you're really
>> trying to just queue up a simple tuple of <pfn,len>, except you encode
>> it as a page pointer in order to play games with the SG logic, and
>> then you hmap that to the ring, except in this case it's all a fake
>> ring that just adds the cpu-physical address instead.
>>
>> And to figuer that out, it's like five layers of indirection through
>> different helper functions that *can* do more generic things but in
>> this case don't.
>>
>> And you do all of this from a core VM callback function with some
>> _really_ core VM locks held.
>>
>> That makes no sense to me.
>>
>> How about this:
>>
>> - get rid of all that code
>>
>> - make the core VM callback save the "these are the free memory
>> regions" in a fixed and limited array. One that DOES JUST THAT. No
>> crazy "SG IO dma-mapping function crap". Just a plain array of a fixed
>> size, pre-allocated for that virtio instance.
>>
>> - make it obvious that what you do in that sequence is ten
>> instructions and no allocations ("Look ma, I wrote a value to an array
>> and incremented the array idex, and I'M DONE")
>>
>> - then in that workqueue entry that you start *anyway*, you empty the
>> array and do all the crazy virtio stuff.
>>
>> In fact, while at it, just simplify the VM interface too. Instead of
>> traversing a random number of buddy lists, just trraverse *one* - the
>> top-level one. Are you seriously ever going to shrink or mark
>> read-only anythin *but* something big enough to be in the maximum
>> order?
>>
>> MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really*
>> want the balloon code to work on smaller things, particularly since
>> the whole interface is fundamentally racy and opportunistic to begin
>> with?
>
> OK, I will implement a new version based on the suggestions. Thanks.
I have been working on a similar series [1] that is more generic, which
solves the problem of giving unused memory back to the host and could be
used to solve the migration problem as well. Can you take a look and see
if you can use my series in some way?
[1] https://www.spinics.net/lists/kvm/msg170113.html
>
> Best,
> Wei
>
--
Regards
Nitesh
Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)
Powered by blists - more mailing lists