Date:	Tue, 28 Sep 2010 07:35:13 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Robert Mueller <robm@...tmail.fm>
cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mel@....ul.ie>, linux-kernel@...r.kernel.org,
	Bron Gondwana <brong@...tmail.fm>,
	linux-mm <linux-mm@...ck.org>
Subject: Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web
 servers

On Tue, 28 Sep 2010, Robert Mueller wrote:

> How would the ACPI information actually be changed?

Fix the BIOS SLIT distance tables.

> I ran numactl -H to get the hardware information, and that seems to
> include distances. As mentioned previously, this is a very standard
> Intel server motherboard.
>
> http://www.intel.com/Products/Server/Motherboards/S5520UR/S5520UR-specifications.htm
>
> Intel 5520 chipset with Intel I/O Controller Hub ICH10R
>
> $ numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6 8 10 12 14
> node 0 size: 24517 MB
> node 0 free: 1523 MB
> node 1 cpus: 1 3 5 7 9 11 13 15
> node 1 size: 24576 MB
> node 1 free: 39 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10

21 is larger than REMOTE_DISTANCE on x86 and triggers zone_reclaim.

A value of 19 would keep it off.
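
For illustration, a minimal C sketch (nothing special, just the standard
/proc interface) that prints the current setting; writing 0 to the same
file turns zone reclaim off at runtime:

/* Sketch: read the current zone_reclaim_mode.
 * Writing "0" to /proc/sys/vm/zone_reclaim_mode disables zone reclaim.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/zone_reclaim_mode", "r");
        int mode;

        if (!f) {
                perror("fopen");
                return 1;
        }
        if (fscanf(f, "%d", &mode) == 1)
                printf("zone_reclaim_mode = %d\n", mode);
        fclose(f);
        return 0;
}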


> Since I'm not sure what the "distance" values mean, I have no idea if
> those values are large or not.

Distance values represent the additional latency incurred when accessing
remote memory relative to local memory (which is normalized to 10).
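
If you want to query those distances programmatically, a small sketch
using the libnuma C API (link with -lnuma) looks like this; on your box
it should print 10 for the local pairs and 21 for the remote ones:

/* Sketch: dump the SLIT distance matrix via libnuma. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
        int i, j, nodes;

        if (numa_available() < 0)
                return 1;

        nodes = numa_max_node() + 1;
        for (i = 0; i < nodes; i++)
                for (j = 0; j < nodes; j++)
                        printf("distance(%d,%d) = %d\n", i, j,
                               numa_distance(i, j));
        return 0;
}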

> > 4. Fix the application to be conscious of the effect of memory
> >    allocations on a NUMA systems. Use the numa memory allocations API
> >    to allocate anonymous memory locally for optimal access and set
> >    interleave for the file backed pages.
>
> The problem we saw was purely with file caching. The application wasn't
> actually allocating much memory itself, but it was reading lots of files
> from disk (via mmap'ed memory mostly), and as most people would, we
> expected that data would be cached in memory to reduce future reads from
> disk. That was not happening.

Obviously, and you have stated that numerous times. The problem is that
using remote memory reduces read performance, so the OS (with
zone_reclaim=1) defaults to using local memory and favors reclaiming
local memory over allocating from the remote node. This is fine if you
have multiple applications running on both nodes, because then each
application gets memory local to it and therefore runs faster. That
does not work with a single app that only allocates from one node.

Control over a process's memory allocations across the various NUMA
nodes is possible via the numactl tool or the libnuma C APIs.

F.e.

numactl --interleave ... command

will address that issue for a specific command that needs to go
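
If you would rather do that from inside the application than wrap it
with numactl, a rough sketch along the lines of suggestion 4 above,
using the libnuma C API (link with -lnuma): interleave the task's
default policy so pages faulted in on its behalf are spread across the
nodes, while keeping its own anonymous buffers local.

/* Sketch only: interleave the default policy, allocate own data locally. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
        void *buf;

        if (numa_available() < 0) {
                fprintf(stderr, "no NUMA support\n");
                return 1;
        }

        /* Default policy: interleave new pages across all nodes. */
        numa_set_interleave_mask(numa_all_nodes_ptr);

        /* Keep the app's own working set on the local node. */
        buf = numa_alloc_local(1 << 20);
        if (!buf)
                return 1;

        /* ... mmap() and read the mail files here ... */

        numa_free(buf, 1 << 20);
        return 0;
}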
