Message-id: <4A07E646.7090100@rz.rwth-aachen.de>
Date: Mon, 11 May 2009 10:48:06 +0200
From: Dieter an Mey <anmey@...rwth-aachen.de>
To: Stefan Lankes <lankes@...s.rwth-aachen.de>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/4]: affinity-on-next-touch
Hello,
I support Stefan's work from the parallel programmer's perspective and
would be happy to provide further input if needed.
Best regards,
Dieter
Stefan Lankes wrote:
> Hello,
>
> I wrote a patch to support the adaptive data distribution strategy
> "affinity-on-next-touch" for NUMA architectures. The patch is in an early
> state and I am interested in your comments.
>
> The basic idea of "affinity-on-next-touch" is this: Via some runtime
> mechanism, a user-level process activates "affinity-on-next-touch" for a
> certain region of its virtual memory space. Afterwards, each page in this
> region is migrated to the node that next accesses it.
> Noordergraaf and van der Pas [1] have proposed extending the OpenMP standard
> to support this strategy. Since version 9, the "affinity-on-next-touch"
> mechanism has been available in Solaris and can be triggered via the madvise
> system call. Löf and Holmgren [2] and Terboven et al. [3] have described
> their encouraging experiences with this implementation.
>
> Linux does not yet support "affinity-on-next-touch". Terboven et al. [3]
> have presented a user-level implementation of this strategy for Linux. To
> realize "affinity-on-next-touch" in user space, they protect a specific
> memory area from read and write accesses and install a signal handler to
> catch access violations. If a thread accesses a page in the protected memory
> area, the signal handler migrates the page to the node which handled the
> access violation. Afterwards, the signal handler clears the page protection
> and the interrupted thread is resumed.
>
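For readers who have not seen the user-level scheme, here is a minimal sketch of the protect-and-catch mechanism described above. It is not the code from [3]: all names (nt_arm, nt_handler, nt_demo) are hypothetical, the region is assumed to have been read-write before it was armed, and the migration step itself is left out and only marked by a comment. Calling mprotect from a signal handler works on Linux but is not guaranteed to be async-signal-safe by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the one region under next-touch control. */
static char *nt_base;
static size_t nt_len;
static size_t nt_pagesize;
static volatile sig_atomic_t nt_faults;   /* first touches handled so far */

/* SIGSEGV handler: runs on the first access to a still-protected page. */
static void nt_handler(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t lo = (uintptr_t)nt_base;
    (void)sig;
    (void)ctx;

    if (addr < lo || addr >= lo + nt_len) {
        /* A genuine fault outside the region: fall back to the default
         * action so the retried access terminates the process. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    /* A real implementation would migrate the page to the faulting
     * thread's node at this point; that step is left out here. */
    mprotect((void *)(addr & ~(uintptr_t)(nt_pagesize - 1)),
             nt_pagesize, PROT_READ | PROT_WRITE);
    nt_faults++;
}

/* Arm next-touch for [base, base + len). Returns 0 on success, -1 on error. */
static int nt_arm(void *base, size_t len)
{
    struct sigaction sa;

    nt_pagesize = (size_t)sysconf(_SC_PAGESIZE);
    assert((uintptr_t)base % nt_pagesize == 0);   /* must be page-aligned */
    nt_base = base;
    nt_len = len;
    nt_faults = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = nt_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(base, len, PROT_NONE);
}

/* Demo: arm four pages, touch two of them, return the number of faults. */
static int nt_demo(void)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, 4 * ps, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    volatile char *buf = mem;

    if (mem == MAP_FAILED || nt_arm(mem, 4 * ps) != 0)
        return -1;
    buf[0] = 1;         /* first touch of page 0: one fault */
    buf[1] = 2;         /* same page, already unprotected: no fault */
    buf[2 * ps] = 3;    /* first touch of page 2: one fault */
    return (int)nt_faults;
}
```

Every first touch costs a signal delivery plus an mprotect call, on top of whatever the migration itself costs, which gives a feeling for where the overhead discussed next comes from.
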
> Unfortunately, the overhead of this solution is very high. For instance, to
> distribute 512 MByte via "affinity-on-next-touch" the user-level solution
> needs 2518ms on our dual-socket, quad-core Opteron 2376 system with the
> current kernel (2.6.30-rc4). I evaluated this overhead with the following
> OpenMP code:
>
>     madvise(array, sizeof(int) * SIZE, MADV_ACCESS_LWP);
>     start = omp_get_wtime();
>     #pragma omp parallel for
>     for (j = 0; j < SIZE; j += pagesize/sizeof(int))
>         array[j]++;
>     end = omp_get_wtime();
>     printf("time: %lf ms\n", (end - start) * 1000.0);
>
> The benchmark uses 8 threads and each thread is bound to one core.
>
> I divide my patch into the following 4 parts:
>
> [Patch 1/4]: Extend the madvise system call with a new advice value,
> MADV_ACCESS_LWP (the same as used in Solaris). The specified memory area
> then uses "affinity-on-next-touch". In this case, madvise_access_lwp
> protects the memory area from read and write access. To avoid unnecessary
> list operations, the patch changes the permissions only in the page table
> entries and does not update the list of VMAs. In addition, madvise sets
> the new "untouched" bit in the "page" record.
>
> [Patch 2/4]: The pte fault handler detects, via the new "untouched" bit in
> the "page" record, that the page which the thread tried to access uses
> "affinity-on-next-touch". Afterwards, the kernel reads the original
> permissions from vm_area_struct, restores them in the page tables and
> migrates the page to the current node. To accelerate page migration, the
> patch avoids unnecessary calls to migrate_prep().
>
> [Patch 3/4]: If the "untouched" bit is set, mprotect is not permitted to
> change the permissions in the page table entry. When
> "affinity-on-next-touch" is in use, the pte fault handler sets the access
> permissions instead.
>
> [Patch 4/4]: This part of the patch adds some counters to detect migration
> errors and publishes these counters via /proc/vmstat. Besides this, the
> Kconfig file is extended with the parameter CONFIG_AFFINITY_ON_NEXT_TOUCH.
>
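For test applications that want to check the counters from Patch 4/4, a small reader for /proc/vmstat could look like the sketch below. The cover letter does not name the new counters, so the counter name is left as a caller-supplied parameter; the helper names (vmstat_parse_line, vmstat_read) are hypothetical, and the only assumption about the file is its usual "name value" line format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "name value" line; return 1 and store the value if the
 * line's name matches `name` exactly, otherwise return 0. */
static int vmstat_parse_line(const char *line, const char *name,
                             unsigned long long *value)
{
    char key[64];
    unsigned long long v;

    assert(line != NULL && name != NULL && value != NULL);
    if (sscanf(line, "%63s %llu", key, &v) != 2)
        return 0;
    if (strcmp(key, name) != 0)
        return 0;
    *value = v;
    return 1;
}

/* Look up a single counter in /proc/vmstat.
 * Returns 0 on success, -1 if the file or the counter is missing. */
static int vmstat_read(const char *name, unsigned long long *value)
{
    char line[128];
    int found = 0;
    FILE *f = fopen("/proc/vmstat", "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = vmstat_parse_line(line, name, value);
    fclose(f);
    return found ? 0 : -1;
}
```

Reading the counter before and after the benchmark loop and taking the difference would show how many migrations failed during the run.
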
> With this patch, the kernel reduces the overhead of page distribution via
> "affinity-on-next-touch" from 2518ms to 366ms compared to the user-level
> approach. Currently, I'm evaluating the performance of the patch with some
> other benchmarks and test applications (stream benchmark, Jacobi solver, PDE
> solver,...).
>
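To put the two timings into per-page terms, the following sketch does the arithmetic. The 4 KiB page size is my assumption (the mail does not state it), 512 MByte is read as 512 * 2^20 bytes, and the helper names are hypothetical. Under these assumptions the region has 131072 pages, which works out to roughly 19 us per page for the user-level solution and just under 3 us per page with the patch, a factor of about 6.9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Average cost per page in microseconds when distributing `bytes`
 * bytes takes `total_ms` milliseconds. */
static double per_page_us(double total_ms, size_t bytes, size_t page_size)
{
    assert(page_size > 0 && bytes >= page_size);
    return total_ms * 1000.0 / (double)(bytes / page_size);
}

/* Print the per-page cost for both timings quoted in the mail. */
static void report_per_page_cost(void)
{
    const size_t bytes = (size_t)512 << 20;   /* 512 MByte */
    const size_t page_size = 4096;            /* assumed, not from the mail */
    double user_us = per_page_us(2518.0, bytes, page_size);
    double kernel_us = per_page_us(366.0, bytes, page_size);

    printf("user level: %.2f us/page\n", user_us);
    printf("kernel:     %.2f us/page\n", kernel_us);
    printf("speedup:    %.2fx\n", user_us / kernel_us);
}
```
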
> I am very interested in your comments!
>
> Stefan
>
> [1] Noordergraaf, L., van der Pas, R.: Performance Experiences on Sun's
> WildFire Prototype. In: Proceedings of the 1999 ACM/IEEE conference on
> Supercomputing, Portland, Oregon, USA (November 1999)
>
> [2] Löf, H., Holmgren, S.: affinity-on-next-touch: Increasing the
> Performance of an Industrial PDE Solver on a cc-NUMA System. In: Proceedings
> of the 19th Annual International Conference on Supercomputing, Cambridge,
> Massachusetts, USA (June 2005) 387–392
>
> [3] Terboven, C., an Mey, D., Schmidl, D., Jin, H., Reichstein, T.: Data
> and Thread Affinity in OpenMP Programs. In: Proceedings of the 2008 Workshop
> on Memory Access on future Processors: A solved problem?, ACM International
> Conference on Computing Frontiers, Ischia, Italy (May 2008) 377–384
>
>
>
--
Dipl.-Math. Dieter an Mey, HPC Team Lead
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Phone: + 49 241 80 24377 - Fax/UMS: + 49 241 80 624377
mailto:anmey@...rwth-aachen.de http://www.rz.rwth-aachen.de