Message-ID: <20220121134640.ghdq3wbwa5jcfplz@yadro.com>
Date:   Fri, 21 Jan 2022 16:46:40 +0300
From:   Alexander Fomichev <fomichev.ru@...il.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
        dmaengine@...r.kernel.org, linux@...ro.com,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC] Scheduler: DMA Engine regression because of sched/fair
 changes

On Fri, Jan 21, 2022 at 06:12:17PM +0800, Hillf Danton wrote:
> On Wed, 19 Jan 2022 15:55:13 +0300 Alexander Fomichev wrote:
> >On Tue, Jan 18, 2022 at 10:04:48AM +0800, Hillf Danton wrote:
> >> On Mon, 17 Jan 2022 20:44:19 +0300 Alexander Fomichev wrote:
> >> > On Mon, Jan 17, 2022 at 10:27:01AM +0000, Mel Gorman wrote:
> >> > 
> >> > -----< v5.15.8-vanilla >-----
> >> > [17057.866760] dmatest: Added 1 threads using dma0chan0
> >> > [17060.133880] dmatest: Started 1 threads using dma0chan0
> >> > [17060.154343] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 49338.85 iops 3157686 KB/s (0)
> >> > [17063.737887] dmatest: Added 1 threads using dma0chan0
> >> > [17065.113838] dmatest: Started 1 threads using dma0chan0
> >> > [17065.137659] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 42183.41 iops 2699738 KB/s (0)
> >> > [17100.339989] dmatest: Added 1 threads using dma0chan0
> >> > [17102.190764] dmatest: Started 1 threads using dma0chan0
> >> > [17102.214285] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 42844.89 iops 2742073 KB/s (0)
> >> > -----< end >-----
> >> > 
> >
> >Just to remind, used dmatest parameters:
> >
> >/sys/module/dmatest/parameters/iterations:1000
> >/sys/module/dmatest/parameters/alignment:-1
> >/sys/module/dmatest/parameters/verbose:N
> >/sys/module/dmatest/parameters/norandom:Y
> >/sys/module/dmatest/parameters/max_channels:0
> >/sys/module/dmatest/parameters/dmatest:0
> >/sys/module/dmatest/parameters/polled:N
> >/sys/module/dmatest/parameters/threads_per_chan:1
> >/sys/module/dmatest/parameters/noverify:Y
> >/sys/module/dmatest/parameters/test_buf_size:1048576
> >/sys/module/dmatest/parameters/transfer_size:65536
> >/sys/module/dmatest/parameters/run:N
> >/sys/module/dmatest/parameters/wait:Y
> >/sys/module/dmatest/parameters/timeout:2000
> >/sys/module/dmatest/parameters/xor_sources:3
> >/sys/module/dmatest/parameters/pq_sources:3
> 
> 
> See if tuning it back down by 10 degrees can close the gap in iops, on the
> assumption that the prev CPU can be ignored in the case of a cold cache.
> 
> Also want to see the diff in the output of "cat /proc/interrupts" before
> and after dmatest, wondering whether the dma irq is bound to one CPU core
> or dancing across several.
> 
> Hillf
> 
> +++ x/kernel/sched/fair.c
> @@ -5888,20 +5888,10 @@ static int wake_wide(struct task_struct
>  static int
>  wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  {
> -	/*
> -	 * If this_cpu is idle, it implies the wakeup is from interrupt
> -	 * context. Only allow the move if cache is shared. Otherwise an
> -	 * interrupt intensive workload could force all tasks onto one
> -	 * node depending on the IO topology or IRQ affinity settings.
> -	 *
> -	 * If the prev_cpu is idle and cache affine then avoid a migration.
> -	 * There is no guarantee that the cache hot data from an interrupt
> -	 * is more important than cache hot data on the prev_cpu and from
> -	 * a cpufreq perspective, it's better to have higher utilisation
> -	 * on one CPU.
> -	 */
> -	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> -		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> +	/* select this cpu because of cold cache */
> +	if (cpus_share_cache(this_cpu, prev_cpu))
> +		if (available_idle_cpu(this_cpu))
> +			return this_cpu;
>  
>  	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>  		return this_cpu;
> --

Hi Hillf,

Thanks for the information.
With the recent patch (which I call patch2), the results are as follows:

-----< 5.15.8-Hillf-Danton-patch2+ noverify=Y >-----
[  646.568455] dmatest: Added 1 threads using dma0chan0
[  661.127077] dmatest: Started 1 threads using dma0chan0
[  661.147156] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 50251.25 iops 3216080 KB/s (0)
[  675.132323] dmatest: Added 1 threads using dma0chan0
[  676.205829] dmatest: Started 1 threads using dma0chan0
[  676.225991] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 50022.50 iops 3201440 KB/s (0)
[  703.100813] dmatest: Added 1 threads using dma0chan0
[  704.933579] dmatest: Started 1 threads using dma0chan0
[  704.953733] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 49950.04 iops 3196803 KB/s (0)
-----< end >-----

I have also re-run the test with the 'noverify=N' option (i.e. with CPU-side
data verification enabled), just for illustration.

-----< 5.15.8-Hillf-Danton-patch2+ noverify=N >-----
[ 1614.739687] dmatest: Added 1 threads using dma0chan0
[ 1620.346536] dmatest: Started 1 threads using dma0chan0
[ 1623.254880] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23544.92 iops 1506875 KB/s (0)
[ 1634.974200] dmatest: Added 1 threads using dma0chan0
[ 1635.981532] dmatest: Started 1 threads using dma0chan0
[ 1638.892182] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23703.98 iops 1517055 KB/s (0)
[ 1652.878143] dmatest: Added 1 threads using dma0chan0
[ 1655.235130] dmatest: Started 1 threads using dma0chan0
[ 1658.143206] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23526.64 iops 1505705 KB/s (0)
-----< end >-----
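
For reference, the runs above were driven through the dmatest sysfs interface
roughly along the lines below (a sketch only, using the parameter values
quoted earlier; the exact sequence on my side may have differed slightly, see
the in-tree dmatest documentation for details):

  modprobe dmatest                                           # if built as a module
  echo 1000      > /sys/module/dmatest/parameters/iterations
  echo 65536     > /sys/module/dmatest/parameters/transfer_size
  echo Y         > /sys/module/dmatest/parameters/noverify
  echo Y         > /sys/module/dmatest/parameters/norandom
  echo 1         > /sys/module/dmatest/parameters/threads_per_chan
  echo dma0chan0 > /sys/module/dmatest/parameters/channel    # "Added 1 threads ..."
  echo 1         > /sys/module/dmatest/parameters/run        # "Started 1 threads ..."
  dmesg | tail                                               # summary with iops / KB/s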

/proc/interrupts changes before/after the test:

-----< interrupts.diff >-----
- 184:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0       6000          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI 103813120-edge      0000:c6:00.2
+ 184:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0       9000          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI 103813120-edge      0000:c6:00.2
-----< end >-----

It looks like the MSI handler is called on the same CPU all the time.
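
For what it's worth, the placement can be cross-checked directly via the
per-IRQ affinity files (a sketch; 184 is the line from the diff above, and
effective_affinity_list is only present on kernels that expose it):

  grep -m1 '0000:c6:00.2' /proc/interrupts       # confirm the IRQ number (184 here)
  cat /proc/irq/184/smp_affinity_list            # CPUs the MSI is allowed on
  cat /proc/irq/184/effective_affinity_list      # CPU actually servicing it, if available

A single CPU in the effective mask would match the single-column increment in
the diff above.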

-- 
Regards,
  Alexander
