Message-ID: <ZqkX3mYBPuUf0Gi5@pc636>
Date: Tue, 30 Jul 2024 18:42:06 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Huang Adrian <adrianhuang0701@...il.com>
Cc: Uladzislau Rezki <urezki@...il.com>, ahuang12@...ovo.com,
akpm@...ux-foundation.org, andreyknvl@...il.com, bhe@...hat.com,
dvyukov@...gle.com, glider@...gle.com, hch@...radead.org,
kasan-dev@...glegroups.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, ryabinin.a.a@...il.com, sunjw10@...ovo.com,
vincenzo.frascino@....com
Subject: Re: [PATCH 1/1] mm/vmalloc: Combine all TLB flush operations of
KASAN shadow virtual address into one operation
On Wed, Jul 31, 2024 at 12:27:27AM +0800, Huang Adrian wrote:
> On Tue, Jul 30, 2024 at 7:38 PM Uladzislau Rezki <urezki@...il.com> wrote:
> >
> > > On Mon, Jul 29, 2024 at 7:29 PM Uladzislau Rezki <urezki@...il.com> wrote:
> > > > It would be really good if Adrian could run the "compiling workload" on
> > > > his big system and post the statistics here.
> > > >
> > > > For example:
> > > > a) v6.11-rc1 + KASAN.
> > > > b) v6.11-rc1 + KASAN + patch.
> > >
> > > Sure, please see the statistics below.
> > >
> > > Test Result (based on 6.11-rc1)
> > > ===============================
> > >
> > > 1. Profile purge_vmap_node()
> > >
> > > A. Command: trace-cmd record -p function_graph -l purge_vmap_node make -j $(nproc)
> > >
> > > B. Average execution time of purge_vmap_node():
> > >
> > > no patch (us) patched (us) saved
> > > ------------- ------------ -----
> > > 147885.02 3692.51 97%
> > >
> > > C. Total execution time of purge_vmap_node():
> > >
> > > no patch (us) patched (us) saved
> > > ------------- ------------ -----
> > > 194173036 5114138 97%
> > >
> > > [ftrace log] Without patch: https://gist.github.com/AdrianHuang/a5bec861f67434e1024bbf43cea85959
> > > [ftrace log] With patch: https://gist.github.com/AdrianHuang/a200215955ee377288377425dbaa04e3
> > >
> > > 2. Use `time` utility to measure execution time
> > >
> > > A. Command: make clean && time make -j $(nproc)
> > >
> > > B. The following result is the average kernel execution time over five
> > >    measurements (the 'sys' field of the `time` output):
> > >
> > > no patch (seconds) patched (seconds) saved
> > > ------------------ ---------------- -----
> > > 36932.904 31403.478 15%
> > >
> > > [`time` log] Without patch: https://gist.github.com/AdrianHuang/987b20fd0bd2bb616b3524aa6ee43112
> > > [`time` log] With patch: https://gist.github.com/AdrianHuang/da2ea4e6aa0b4dcc207b4e40b202f694
> > >
> > I meant different statistics. As noted here https://lore.kernel.org/linux-mm/ZogS_04dP5LlRlXN@pc636/T/#m5d57f11d9f69aef5313f4efbe25415b3bae4c818
> > I came to the conclusion that the place and lock below:
> >
> > <snip>
> > static void exit_notify(struct task_struct *tsk, int group_dead)
> > {
> > bool autoreap;
> > struct task_struct *p, *n;
> > LIST_HEAD(dead);
> >
> > write_lock_irq(&tasklist_lock);
> > ...
> > <snip>
> >
> > keep IRQs disabled, which means purge_vmap_node() still makes progress,
> > but it can be slow:
> >
> > CPU_1:
> > disables IRQs
> > trying to grab the tasklist_lock
> >
> > CPU_2:
> > Sends an IPI to CPU_1
> > waits until the specified callback is executed on CPU_1
> >
> > Since CPU_1 has IRQs disabled, the IPI is not served and the callback
> > does not complete until CPU_1 re-enables IRQs.
> >
> > Could you please post lock statistics for the kernel-compiling use case?
> > KASAN + patch is enough, IMO. This is just to double-check whether the
> > tasklist_lock is a problem or not.
>
> Sorry for the misunderstanding.
>
> The two experiments are shown below. You said KASAN + patch is enough,
> but here is the other one as well, in case you need it. ;-)
>
> a) v6.11-rc1 + KASAN
>
> The result is different from yours, so I ran two tests (making sure the
> soft-lockup warning was triggered).
>
> Test #1: waittime-max = 5.4ms
> <snip>
> ...
>                  con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg
> ...
> tasklist_lock-W:      118762       120090          0.44       5443.22     24807413.37        206.57
> tasklist_lock-R:      108262       108300          0.41       5381.34     23613372.10        218.04
>
>                  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
> tasklist_lock-W:      429757       569051          2.27       3222.00     69914505.87        122.86
> tasklist_lock-R:      489132       541541          0.20       5543.40     10095470.68         18.64
> ---------------
> tasklist_lock    44594  [<0000000099d3ea35>] exit_notify+0x82/0x900
> tasklist_lock    32041  [<0000000058f753d8>] release_task+0x104/0x3f0
> tasklist_lock    99240  [<000000008524ff80>] __do_wait+0xd8/0x710
> tasklist_lock    43435  [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
> ---------------
> tasklist_lock    98334  [<0000000099d3ea35>] exit_notify+0x82/0x900
> tasklist_lock    82649  [<0000000058f753d8>] release_task+0x104/0x3f0
> tasklist_lock        2  [<00000000da5a7972>] mm_update_next_owner+0xc0/0x430
> tasklist_lock    26708  [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
> ...
> <snip>
>
> Test #2: waittime-max = 5.7ms
> <snip>
> ...
>                  con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg
> ...
> tasklist_lock-W:      121742       123167          0.43       5713.02     25252257.61        205.02
> tasklist_lock-R:      111479       111523          0.39       5050.50     24557264.88        220.20
>
>                  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
> tasklist_lock-W:      432111       569762          2.25       3083.08     70711022.74        124.11
> tasklist_lock-R:      491404       542221          0.20       5611.81     10007782.09         18.46
> ---------------
> tasklist_lock   102317  [<000000008524ff80>] __do_wait+0xd8/0x710
> tasklist_lock    44606  [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
> tasklist_lock    45584  [<0000000099d3ea35>] exit_notify+0x82/0x900
> tasklist_lock    32969  [<0000000058f753d8>] release_task+0x104/0x3f0
> ---------------
> tasklist_lock   100498  [<0000000099d3ea35>] exit_notify+0x82/0x900
> tasklist_lock    27401  [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
> tasklist_lock    85473  [<0000000058f753d8>] release_task+0x104/0x3f0
> tasklist_lock      650  [<000000004d0b9f6b>] tty_open_proc_set_tty+0x23/0x210
> ...
> <snip>
>
>
> b) v6.11-rc1 + KASAN + patch: waittime-max = 5.7ms
> <snip>
> ...
>                  con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg
> ...
> tasklist_lock-W:      108876       110087          0.33       5688.64     18622460.43        169.16
> tasklist_lock-R:       99864        99909          0.43       5868.69     17849478.20        178.66
>
>                  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
> tasklist_lock-W:      426740       568715          1.94       2930.76     62560515.48        110.00
> tasklist_lock-R:      487654       541328          0.20       5709.98      9207504.90         17.01
> ---------------
> tasklist_lock    91655  [<00000000a622e532>] __do_wait+0xd8/0x710
> tasklist_lock    41100  [<00000000ccf53925>] exit_notify+0x82/0x900
> tasklist_lock     8254  [<00000000093ccded>] tty_open_proc_set_tty+0x23/0x210
> tasklist_lock    39542  [<00000000a0e6bf4d>] copy_process+0x2a46/0x50f0
> ---------------
> tasklist_lock    90525  [<00000000ccf53925>] exit_notify+0x82/0x900
> tasklist_lock    76934  [<00000000cb7ca00c>] release_task+0x104/0x3f0
> tasklist_lock    23723  [<00000000a0e6bf4d>] copy_process+0x2a46/0x50f0
> tasklist_lock    18223  [<00000000a622e532>] __do_wait+0xd8/0x710
> ...
> <snip>
>
Thank you for posting this! So the tasklist_lock is not the problem.
I assume you have the full output of lock_stat. Could you please
paste it for v6.11-rc1 + KASAN?
Thank you!
--
Uladzislau Rezki