Message-ID: <20140708153558.GB24698@nhori>
Date: Tue, 8 Jul 2014 11:35:58 -0400
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Konstantin Khlebnikov <koct9i@...il.com>,
Wu Fengguang <fengguang.wu@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Borislav Petkov <bp@...en8.de>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Johannes Weiner <hannes@...xchg.org>,
Rusty Russell <rusty@...tcorp.com.au>,
David Miller <davem@...emloft.net>,
Andres Freund <andres@...quadrant.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Christoph Hellwig <hch@...radead.org>,
Dave Chinner <david@...morbit.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Linux API <linux-api@...r.kernel.org>,
Naoya Horiguchi <nao.horiguchi@...il.com>,
Kees Cook <kees@...flux.net>
Subject: Re: [PATCH v3 1/3] mm: introduce fincore()
On Mon, Jul 07, 2014 at 03:44:22PM -0700, Dave Hansen wrote:
> On 07/07/2014 02:48 PM, Naoya Horiguchi wrote:
> > On Mon, Jul 07, 2014 at 01:43:31PM -0700, Dave Hansen wrote:
> >> The whole FINCORE_PGOFF vs. FINCORE_BMAP issue is something that will
> >> come up in practice. We just don't have the interfaces for an end user
> >> to pick which one they want to use.
> >>
> >>>> Is it really right to say this is going to be 8 bytes? Would we want it
> >>>> to share types with something else, like be an loff_t?
> >>>
> >>> Could you elaborate on that?
> >>
> >> We specify file offsets in other system calls, like the lseek family. I
> >> was just thinking that this type should match up with those calls since
> >> they are expressing the same data type with the same ranges and limitations.
> >
> > The 2nd parameter is already loff_t, so don't we already do this?
>
> I mean the fields in the buffer, like:
>
> > +Any of the following flags are to be set to add an 8 byte field in each entry.
> > +You can set any of these flags at the same time, although you can't set
> > +FINCORE_BMAP combined with these 8 byte field flags.
Thanks. And OK, we can make it depend on arch or config
(although the current version only supports x86_64).
>
> >>>> This would essentially tell userspace where in the kernel's address
> >>>> space some user-controlled data will be.
> >>>
> >>> OK, so this and FINCORE_PAGEFLAGS will be limited to privileged users.
> >
> > Sorry, this statement of mine might be a bit short-sighted, and I'd like
> > to revoke it.
> > I think that some page flags and/or NUMA info would be useful outside
> > the debugging environment, and safe to expose to userspace. So limiting
> > unprivileged users to the bitmap mode is too strict.
>
> The PFN is not the same as NUMA information, and the PFN is insufficient
> to describe the NUMA node on all systems that Linux supports.
Agree.
> Trying to get NUMA information back out is a good goal, but doing it
> with PFNs is a bad idea since they have so many consequences.
Yes, so a separate field for NUMA node is helpful. PFN is purely for
debugging.
> I'm also bummed that exporting NUMA information was a design goal of these
> patches, but it wasn't mentioned in any of the patch descriptions.
OK, I'll add it with some documentation in the next post.
> >> Then I'd just question their usefulness outside of a debugging
> >> environment, especially when you can get at them in other (more
> >> roundabout) ways in a debugging environment.
> >>
> >> This is really looking to me like two system calls. The bitmap-based
> >> one, and another more extensible one. I don't think there's any harm in
> >> having two system calls, especially when they're trying to glue together
> >> two disparate interfaces.
> >
> > I think that splitting the syscall into two, one for privileged users
> > and one for unprivileged users, might be fine (rather than a bitmap-based
> > one and an extensible one).
>
> The problem as I see it is shoehorning two interfaces into the same
> syscall. If there are privileged and unprivileged operations that use
> the same _interfaces_, I think they should share a syscall.
Hmm, if we think that the bitmap mode and the extensible mode use different
interfaces, shouldn't we also consider the different modes within the
extensible one (whose per-page entry is variable in length) to be
different interfaces?
It seems to me that this is just a matter of how we lay out the user
buffer, rather than of different interfaces.
Thanks,
Naoya Horiguchi