linux-kernel - Re: [PATCH 13/40] autonuma: CPU follow memory algorithm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPQyPG62yZrZpAz1un6UABjALbUHNT+VTS3A2MhHKMJW1Ykq8Q@mail.gmail.com>
Date:	Sat, 30 Jun 2012 05:02:25 +0800
From:	Nai Xia <nai.xia@...il.com>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>,
	Alex Shi <alex.shi@...el.com>,
	Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Don Morris <don.morris@...com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [PATCH 13/40] autonuma: CPU follow memory algorithm

On Sat, Jun 30, 2012 at 12:30 AM, Andrea Arcangeli <aarcange@...hat.com> wrote:
> Hi Nai,
>
> On Fri, Jun 29, 2012 at 10:11:35PM +0800, Nai Xia wrote:
>> If one process do very intensive visit of a small set of pages in this
>> node, but occasional visit of a large set of pages in another node.
>> Will this algorithm do a very bad judgment? I guess the answer would
>> be: it's possible and this judgment depends on the racing pattern
>> between the process and your knuma_scand.
>
> Depending if the knuma_scand/scan_pass_sleep_millisecs is more or less
> occasional than the visit of a large set of pages it may behave
> differently correct.
>
> Note that every algorithm will have a limit on how smart it can be.
>
> Just to make a random example: if you lookup some pagecache a million
> times and some other pagecache a dozen times, their "aging"
> information in the pagecache will end up identical. Yet we know one
> set of pages is clearly higher priority than the other. We've only so
> many levels of lrus and so many referenced/active bitflags per
> page. Once you get at the top, then all is equal.
>
> Does this mean the "active" list working set detection is useless just
> because we can't differentiate a million of lookups on a few pages, vs
> a dozen of lookups on lots of pages?
>
> Last but not the least, in the very example you mention it's not even
> clear that the process should be scheduled in the CPU where there is
> the small set of pages accessed frequently, or the CPU where there's
> the large set of pages accessed occasionally. If the small sets of
> pages fits in the 8MBytes of the L2 cache, then it's better to put the
> process in the other CPU where the large set of pages can't fit in the
> L2 cache. Lots of hardware details should be evaluated, to really know
> what's the right thing in such case even if it was you having to
> decide.
>
> But the real reason why the above isn't an issue and why we don't need
> to solve that problem perfectly: there's not just a CPU follow memory
> algorithm in AutoNUMA. There's also the memory follow CPU
> algorithm. AutoNUMA will do its best to change the layout of your
> example to one that has only one clear solution: the occasional lookup
> of the large set of pages, will make those eventually go in the node
> together with the small set of pages (or the other way around), and
> this is how it's solved.
>
> In any case, whatever wrong decision it will take, it will at least be
> a better decision than the numa/sched where there's absolutely zero
> information about what pages the process is accessing. And best of all
> with AutoNUMA you also know which pages the _thread_ is accessing so
> it will also be able to take optimal decisions if there are more
> threads than CPUs in a node (as long as not all thread accesses are
> shared).
>
> Hope this explains things better.
> Andrea

Hi Andrea,

Sorry for being so negative, but this problem seems so clear to me.
I might have pointed all these out, if you CC me since the first version,
I am not always on the list watching posts....

Sincerely,

Nai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/