Message-ID: <f748e40a-04ff-9907-f25d-dfc8d4e5e7b7@nvidia.com>
Date: Wed, 22 Feb 2017 15:58:46 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Balbir Singh <bsingharora@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
CC: Jérôme Glisse <jglisse@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
David Nellans <dnellans@...dia.com>,
Aneesh Kumar KV <aneesh.kumar@...ux.vnet.ibm.com>,
Reza Arbab <arbab@...ux.vnet.ibm.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
<haren@...ux.vnet.ibm.com>, Evgeny Baskakov <ebaskakov@...dia.com>
Subject: Re: [HMM v17 00/14] HMM (Heterogeneous Memory Management) v17
On 02/22/2017 12:27 AM, Balbir Singh wrote:
> On Wed, Feb 22, 2017 at 7:16 PM, Andrew Morton
> <akpm@...ux-foundation.org> wrote:
>> On Wed, 22 Feb 2017 18:19:15 +1100 Balbir Singh <bsingharora@...il.com> wrote:
>>>
>>> Andrew, do we expect to get this in 4.11/4.12? Just curious.
>>>
>>
>> I'll be taking a serious look after -rc1.
>>
>> The lack of reviewed-by, acked-by and tested-by is a concern. It's
>> rather odd for a patchset in the 17th revision! What's up with that?
>>
>> Have you reviewed or tested the patches?
>
> I reviewed v14/15 of the patches. Aneesh reviewed some versions as
> well. I know a few people who tested a small subset of the patches,
> I'll get them to report back as well. I think John Hubbard has been
> testing iterations too. CC'ing other interested people.
>
> Balbir
>
Yes, Evgeny Baskakov and I have been testing each of the posted versions. We are using both
migration and mirroring, and have a small set of multi-threaded and multi-device tests. I've been
procrastinating about writing up a summary of the test results, partly because the patchset is still
changing (bug fixes, new features, API changes), so we keep resetting our testing.
We (ahem, actually Evgeny has done most of the work) have been debugging and proposing fixes
directly to Jerome, and that email traffic has not been CC'd to this list, so things have
looked a little quieter than they really were.
Anyway, a very rudimentary testing report:
1. What we are testing: Our latest testing (in the last few weeks) has been against Jerome's repo, here:
git://people.freedesktop.org/~glisse/linux (branch: hmm-next)
which has moved ahead of his hmm-v17 branch. hmm-next adds a few bug fixes and a new feature
(populating CPU pages on a GPU fault). Here are the differences in summary:
$ git diff --stat hmm-v17 hmm-next
drivers/char/Kconfig | 10 +
drivers/char/Makefile | 1 +
drivers/char/hmm_dmirror.c | 1168 +++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/migrate.h | 8 +-
include/uapi/linux/hmm_dmirror.h | 54 +++
mm/hmm.c | 6 +-
mm/migrate.c | 174 ++++++--
7 files changed, 1388 insertions(+), 33 deletions(-)
2. API: The driver-kernel API is looking OK, although of course the documentation can
be improved. As Jerome already explained, there are missing pieces of functionality[1] that will be
added later, and this may change the API, but for now it is OK. With this initial API, we can handle
both "device" and CPU page faults, and migrate pages around.
3. More testing plans: TODO: there are a lot of programs that can easily be modified to use malloc
instead of a special device-centric allocator. This is on our list.
4. Stability: still a little shaky, as we have some pretty recent bug fixes to try out.
5. Performance: I'll send out a separate note on that at some point. There was a performance bug that
Jerome fixed only recently, and I want to see how things look with that fix applied. No real
surprises so far, though.
6. Code reviews: the large size of the patchset, plus the need for a complicated driver to
exercise it, makes it less likely that other people will review this patch series. It's a bit
chicken-and-eggy, too, because our UVM driver can't be checked in and shipped until the kernel API
stabilizes. heh.
-----
[1] For example, due to the lack of file-backed memory support, some userspace program variables that
are file-backed (initialized globals, etc.) have to be mapped (from the device) instead of migrated
to the device, on a device fault.
thanks,
john h