Message-ID: <20161220232122.62c8196e@roar.ozlabs.ibm.com>
Date: Tue, 20 Dec 2016 23:21:22 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Bob Peterson <rpeterso@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
swhiteho@...hat.com, luto@...nel.org, agruenba@...hat.com,
peterz@...radead.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH] make global bitlock waitqueues per-node
On Tue, 20 Dec 2016 12:58:25 +0000
Mel Gorman <mgorman@...hsingularity.net> wrote:
> On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> > On Mon, 19 Dec 2016 16:20:05 -0800
> > Dave Hansen <dave.hansen@...ux.intel.com> wrote:
> >
> > > On 12/19/2016 03:07 PM, Linus Torvalds wrote:
> > > > +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > > +{
> > > > +	const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > > +
> > > > +	return __bit_waitqueue(word, bit, nid);
> > > >
> > > > No can do. Part of the problem with the old code was that it did that
> > > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> > >
> > > Ahhh, got it.
> > >
> > > So, what did you have in mind? Just redirect bit_waitqueue() to the
> > > "first_online_node" waitqueues?
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > 	return __bit_waitqueue(word, bit, first_online_node);
> > > }
> > >
> > > We could do some fancy stuff like only do virt_to_page() for things in
> > > the linear map, but I'm not sure we'll see much of a gain for it. None
> > > of the other waitqueue users look as pathological as the 'struct page'
> > > ones. Maybe:
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > 	int nid;
> > >
> > > 	if ((unsigned long)word >= VMALLOC_START) /* all addrs not in linear map */
> > > 		nid = first_online_node;
> > > 	else
> > > 		nid = page_to_nid(virt_to_page(word));
> > > 	return __bit_waitqueue(word, bit, nid);
> > > }
> >
> > I think he meant just make the page_waitqueue do the per-node thing
> > and leave bit_waitqueue as the global bit.
> >
>
> I'm pressed for time but at a glance, that might require a separate
> structure of wait_queues for page waitqueue. Most users of bit_waitqueue
> are not operating with pages. The first user is based on a word inside
> a block_device for example. All non-page users could assume node-0.
Yes, it would require something or other like that. It would be trivial to
keep things balanced (if not local) over nodes by taking a simple hash of the
virtual address to spread them over the nodes. Or just keep using this
separate global table for the bit_waitqueue...
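
Something like this is what I have in mind, as an untested sketch on top of
Dave's per-node table (__bit_waitqueue() is his helper; the hash_ptr()
fallback and the assumption that node IDs are dense are mine):

/*
 * Pages get the exact node; everything else just gets spread over
 * the per-node tables by hashing the address, since virt_to_page()
 * is not safe for vmalloc/vmapped-stack addresses.  Ignores sparse
 * node numbering for simplicity.
 */
wait_queue_head_t *page_waitqueue(struct page *page)
{
	return __bit_waitqueue(page, 0, page_to_nid(page));
}

wait_queue_head_t *bit_waitqueue(void *word, int bit)
{
	int nid = hash_ptr(word, 8) % nr_online_nodes;

	return __bit_waitqueue(word, bit, nid);
}
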
But before Linus grumps at me again, let's try to do the waitqueue
avoidance bit first before we worry about that :)
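
(For reference, the avoidance idea is nothing more than a waiters flag on the
page, so the unlock fast path never goes near the hash table unless somebody
is actually sleeping. Very roughly, and not the prototype itself -- the flag
name and the exact barriers are hand-waved here:)

void unlock_page(struct page *page)
{
	page = compound_head(page);
	clear_bit_unlock(PG_locked, &page->flags);

	/*
	 * Full barrier between clearing PG_locked and testing the
	 * hypothetical PG_waiters flag, pairing with the waiter
	 * setting the flag before it re-checks PG_locked.  Getting
	 * this wrong is the kind of lost wakeup Mel mentions below.
	 */
	smp_mb__after_atomic();
	if (test_bit(PG_waiters, &page->flags))
		wake_up_page(page, PG_locked);
}
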
> It
> shrinks the available hash table space but as before, maybe collisions
> are not common enough to actually matter. That would be worth checking
> out. Alternatively, careful auditing to pick a node when it's known it's
> safe to call virt_to_page may work but it would be fragile.
>
> Unfortunately I won't be able to review or test any patches until January
> 3rd after I'm back online properly. Right now, I have intermittent internet
> access at best. During the next 4 days, I know I definitely will not have
> any internet access.
>
> The last time around, there were three patch sets to avoid the overhead for
> pages in particular. One was dropped (mine, based on Nick's old work) as
> it was too complicated. Peter had some patches but after enough hammering
> it failed due to a missed wakeup that I didn't pin down before having to
> travel to a conference.
>
> I hadn't tested Nick's prototype, although it looked fine; others had
> reviewed it before I looked and I was waiting for another version to
> appear. If one appears, I'll take a closer look and bash it across a few
> machines to see if it has any lost wakeup problems.
>
Sure, I'll respin it this week.
Thanks,
Nick