Message-ID: <20180924234843.GA23726@yury-thinkpad>
Date: Tue, 25 Sep 2018 02:48:43 +0300
From: Yury Norov <ynorov@...iumnetworks.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Dan Williams <dan.j.williams@...el.com>,
Huang Ying <ying.huang@...el.com>,
"Michael S . Tsirkin" <mst@...hat.com>,
Michel Lespinasse <walken@...gle.com>,
Souptick Joarder <jrdr.linux@...il.com>,
Willy Tarreau <w@....eu>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH] mm: fix COW faults after mlock()
On Tue, Sep 25, 2018 at 12:22:47AM +0300, Kirill A. Shutemov wrote:
> External Email
>
> On Mon, Sep 24, 2018 at 04:08:52PM +0300, Yury Norov wrote:
> > After mlock() on newly mmap()ed shared memory I observe page faults.
> >
> > The problem is that populate_vma_page_range() doesn't set the FOLL_WRITE
> > flag for writable shared memory in the mlock() path, justifying it like this:
> > /*
> > * We want to touch writable mappings with a write fault in order
> > * to break COW, except for shared mappings because these don't COW
> > * and we would not want to dirty them for nothing.
> > */
> >
> > But they are actually COWed. The most straightforward way to avoid it
> > is to set FOLL_WRITE flag for shared mappings as well as for private ones.
>
> Huh? How do shared mappings get CoWed?
>
> In this context CoW means to create a private copy of the page for the
> process. It only makes sense for private mappings as all pages in shared
> mappings do not belong to the process.
>
> Shared mappings will still get faults, but a bit later -- after the page
> is written back to disk, the page gets cleaned and write-protected to catch
> the next write access.
>
> A notable exception is tmpfs/shmem. These pages do not go through the normal
> writeback process. But the code path is used for other filesystems as
> well.
>
> Therefore, NAK. You only create unneeded write back traffic.

Hi Kirill,

(My first reaction was exactly like yours, but) on my real system
(Cavium OcteonTX2) and in my qemu simulation I can reproduce the same
behavior: memory that has just been mlock()ed still causes faults.
Those faults happen because the page is mapped into the process
read-only, while the underlying VMA is read-write. So the faults are
resolved simply by granting write access to the page.

Maybe I use the term COW wrongly here, but this is how faultin_page()
works, and it sets the FOLL_COW bit before returning (which is ignored
at the upper level).

I realize that a proper fix may be more complex, and if so I'll
gladly take it and drop this patch from my tree, but this is all
I have so far to address the problem.

The user code below is a reproducer.

Thanks,
Yury

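#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int i, ret, len = getpagesize() * 1000;
	char tmpfile[] = "/tmp/my_tmp-XXXXXX";
	char *ptr;
	int fd = mkstemp(tmpfile);

	if (fd < 0) {
		printf("Failed to mkstemp: %d\n", errno);
		return 1;
	}

	/* Size the backing file so that every mapped page is valid. */
	ret = ftruncate(fd, len);
	if (ret) {
		printf("Failed to ftruncate: %d\n", errno);
		goto out;
	}

	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED) {
		printf("Failed to mmap memory: %d\n", errno);
		ret = -1;
		goto out;
	}

	/* mlock() is expected to populate the whole range. */
	ret = mlock(ptr, len);
	if (ret) {
		printf("Failed to mlock: %d\n", errno);
		goto out;
	}

	printf("Touch...\n");
	for (i = 0; i < len; i++)
		ptr[i] = (char) i; /* Faults here. */
	printf("\t... done\n");
out:
	close(fd);
	unlink(tmpfile);
	return ret ? 1 : 0;
}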
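
One way to check from user space whether the touch loop really faults is
to compare the process's minor-fault counter before and after writing to
the locked range. The following is only a minimal sketch: the helper
names minor_faults() and faults_while_touching() are hypothetical and
not part of the original reproducer, and it assumes getrusage()'s
ru_minflt field is a good enough proxy for the write-protect faults
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: minor faults taken so far by this process, -1 on error. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru))
		return -1;
	return ru.ru_minflt;
}

/*
 * Hypothetical helper: write one byte to each page of [ptr, ptr + len)
 * and return how many minor faults were taken while doing so, or -1 if
 * the counter could not be read.
 */
static long faults_while_touching(volatile char *ptr, size_t len, size_t page)
{
	long before, after;
	size_t off;

	assert(page > 0); /* a zero stride would never terminate */

	before = minor_faults();
	for (off = 0; off < len; off += page)
		ptr[off] = (char) off;
	after = minor_faults();

	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

In the reproducer, calling faults_while_touching(ptr, len, getpagesize())
right after a successful mlock() would be expected to return a value
close to zero if the range were fully populated with writable mappings,
and a clearly larger value if the pages are still mapped read-only as
reported above. The counter is process-wide, so unrelated faults (stack
growth, for example) can add a little noise.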