Message-ID: <20240108214346.5fd93127@namcao>
Date: Mon, 8 Jan 2024 21:43:46 +0100
From: Nam Cao <namcao@...utronix.de>
To: Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
<palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org, Jisheng
Zhang <jszhang@...nel.org>
Subject: Re: [PATCH v2] riscv: Add support for BATCHED_UNMAP_TLB_FLUSH
On Mon, 8 Jan 2024 20:36:40 +0100 Alexandre Ghiti <alexghiti@...osinc.com> wrote:
> Allow deferring TLB flushes when unmapping pages, which reduces the
> number of IPIs and the number of sfence.vma instructions.
>
> The microbenchmark used in commit 43b3dfdd0455 ("arm64: support
> batched/deferred tlb shootdown during page reclamation/migration"),
> made multithreaded to force the use of IPIs, shows a good performance
> improvement on all platforms:
>
> * Unmatched: ~34%
> * TH1520 : ~78%
> * Qemu : ~81%
>
> In addition, perf on qemu reports a significant decrease in the time
> spent dealing with IPIs:
>
> Before: 68.17% main [kernel.kallsyms] [k] __sbi_rfence_v02_call
> After : 8.64% main [kernel.kallsyms] [k] __sbi_rfence_v02_call
>
> * Benchmark:
>
> int stick_this_thread_to_core(int core_id) {
>     int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
>     if (core_id < 0 || core_id >= num_cores)
>         return EINVAL;
>
>     cpu_set_t cpuset;
>     CPU_ZERO(&cpuset);
>     CPU_SET(core_id, &cpuset);
>
>     pthread_t current_thread = pthread_self();
>     return pthread_setaffinity_np(current_thread,
>                                   sizeof(cpu_set_t), &cpuset);
> }
>
> static void *fn_thread (void *p_data)
> {
>     int ret;
>     pthread_t thread;
>
>     stick_this_thread_to_core((int)p_data);
>
>     while (1) {
>         sleep(1);
>     }
>
>     return NULL;
> }
>
> int main()
> {
>     volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
>                                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
>     pthread_t threads[4];
>     int ret;
>
>     for (int i = 0; i < 4; ++i) {
>         ret = pthread_create(&threads[i], NULL, fn_thread, (void *)i);
>         if (ret)
>         {
>             printf("%s", strerror (ret));
>         }
>     }
>
>     memset(p, 0x88, SIZE);
>
>     for (int k = 0; k < 10000; k++) {
>         /* swap in */
>         for (int i = 0; i < SIZE; i += 4096) {
>             (void)p[i];
>         }
>
>         /* swap out */
>         madvise(p, SIZE, MADV_PAGEOUT);
>     }
>
>     for (int i = 0; i < 4; i++)
>     {
>         pthread_cancel(threads[i]);
>     }
>
>     for (int i = 0; i < 4; i++)
>     {
>         pthread_join(threads[i], NULL);
>     }
>
>     return 0;
> }
>
> Signed-off-by: Alexandre Ghiti <alexghiti@...osinc.com>
> Reviewed-by: Jisheng Zhang <jszhang@...nel.org>
> Tested-by: Jisheng Zhang <jszhang@...nel.org> # Tested on TH1520
Before:
real 0m36.674s
user 0m0.173s
sys 0m36.493s
After:
real 0m18.016s
user 0m0.125s
sys 0m17.885s
Tested-by: Nam Cao <namcao@...utronix.de>
Best regards,
Nam