Message-ID: <2627025ab96a315af0e76e5983c803578623c826.camel@linux.intel.com>
Date:   Thu, 17 Feb 2022 10:00:23 -0800
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Yicong Yang <yangyicong@...wei.com>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        Barry Song <21cnbao@...il.com>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>
Cc:     yangyicong@...ilicon.com,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        "ego@...ux.vnet.ibm.com" <ego@...ux.vnet.ibm.com>,
        Linuxarm <linuxarm@...wei.com>,
        Guodong Xu <guodong.xu@...aro.org>,
        Chen Yu <yu.c.chen@...el.com>
Subject: Re: [PATCH v2 2/2] sched/fair: Scan cluster before scanning LLC in
 wake-up path

On Wed, 2022-02-16 at 18:00 +0800, Yicong Yang wrote:
> On 2022/2/16 17:19, Song Bao Hua (Barry Song) wrote:
> > 
> > tbench running on numa 0&1:
> >                             5.17-rc1          rc1 + chenyu          rc1+chenyu+cls     rc1+chenyu+cls-pingpong  rc1+cls
> > Hmean     1        320.01 (   0.00%)      318.03 *  -0.62%*      357.15 *  11.61%*      375.43 *  17.32%*      378.44 *  18.26%*
> > Hmean     2        643.85 (   0.00%)      637.74 *  -0.95%*      714.36 *  10.95%*      745.82 *  15.84%*      752.52 *  16.88%*
> > Hmean     4       1287.36 (   0.00%)     1285.20 *  -0.17%*     1431.35 *  11.18%*     1481.71 *  15.10%*     1505.62 *  16.95%*
> > Hmean     8       2564.60 (   0.00%)     2551.02 *  -0.53%*     2812.74 *   9.68%*     2921.51 *  13.92%*     2955.29 *  15.23%*
> > Hmean     16      5195.69 (   0.00%)     5163.39 *  -0.62%*     5583.28 *   7.46%*     5726.08 *  10.21%*     5814.74 *  11.91%*
> > Hmean     32      9769.16 (   0.00%)     9815.63 *   0.48%*    10518.35 *   7.67%*    10852.89 *  11.09%*    10872.63 *  11.30%*
> > Hmean     64     15952.50 (   0.00%)    15780.41 *  -1.08%*    10608.36 * -33.50%*    17503.42 *   9.72%*    17281.98 *   8.33%*
> > Hmean     128    13113.77 (   0.00%)    12000.12 *  -8.49%*    13095.50 *  -0.14%*    13991.90 *   6.70%*    13895.20 *   5.96%*
> > Hmean     256    10997.59 (   0.00%)    12229.20 *  11.20%*    11902.60 *   8.23%*    12214.29 *  11.06%*    11244.69 *   2.25%*
> > Hmean     512    14623.60 (   0.00%)    15863.25 *   8.48%*    14103.38 *  -3.56%*    16422.56 *  12.30%*    15526.25 *   6.17%*
> > 
> 
> Yes, I think it'll also benefit the cluster's condition.
> 
> But 128 threads seems like a weird point where Chen's patch on 5.17-rc1 (without this series) causes degradation,
> whereas in Chen's tbench test it doesn't cause that much degradation when threads == 2 * cpu number [*]:
> 

From the data, it seems that Chen Yu's patch benefits the overloaded condition (as expected) while
cluster scheduling benefits most at the low end (also expected).  It is nice that
by combining these two approaches we can get the most benefit.
 
Chen Yu's patch has a hard transition that stops the search for an idle CPU at about 85% utilization.
So we may be hitting that knee, and we may benefit from not stopping the search completely
but instead reducing the number of CPUs searched, as Peter pointed out.
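
To illustrate the difference, here is a rough standalone sketch (not the actual
kernel code; the LLC size, the helper names, and the 85% figure are illustrative
assumptions): instead of a hard cutoff that stops the idle-CPU search once
utilization crosses the threshold, the scan window could shrink gradually with
utilization.

/*
 * Standalone sketch comparing a hard cutoff with a scan window that
 * scales down as LLC utilization grows.  All constants and names here
 * are assumptions for illustration only.
 */
#include <stdio.h>

#define LLC_CPUS        64      /* CPUs in the LLC domain (assumed)       */
#define SIS_UTIL_CUTOFF 85      /* hard-cutoff threshold, in percent      */

/* Hard transition: scan everything below the cutoff, nothing above it. */
static int nr_scan_hard(int util_pct)
{
	return util_pct < SIS_UTIL_CUTOFF ? LLC_CPUS : 0;
}

/*
 * Gradual transition: shrink the scan window as utilization grows,
 * but keep scanning at least a couple of CPUs even when busy.
 */
static int nr_scan_scaled(int util_pct)
{
	int nr = LLC_CPUS * (100 - util_pct) / 100;

	return nr < 2 ? 2 : nr;
}

int main(void)
{
	for (int util = 0; util <= 100; util += 10)
		printf("util %3d%%: hard=%2d scaled=%2d\n",
		       util, nr_scan_hard(util), nr_scan_scaled(util));
	return 0;
}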

Tim 

> case            	load    	baseline(std%)	compare%( std%)
> loopback        	thread-224	 1.00 (  0.17)	 +2.30 (  0.10)
> 
> [*] https://lore.kernel.org/lkml/20220207034013.599214-1-yu.c.chen@intel.com/
> 
> > tbench running on numa 0 only:
> >                             5.17-rc1          rc1 + chenyu          rc1+chenyu+cls     rc1+chenyu+cls-pingpong   rc1+cls
> > Hmean     1        324.73 (   0.00%)      330.96 *   1.92%*      358.97 *  10.54%*      376.05 *  15.80%*      378.01 *  16.41%*
> > Hmean     2        645.36 (   0.00%)      643.13 *  -0.35%*      710.78 *  10.14%*      744.34 *  15.34%*      754.63 *  16.93%*
> > Hmean     4       1302.09 (   0.00%)     1297.11 *  -0.38%*     1425.22 *   9.46%*     1484.92 *  14.04%*     1507.54 *  15.78%*
> > Hmean     8       2612.03 (   0.00%)     2623.60 *   0.44%*     2843.15 *   8.85%*     2937.81 *  12.47%*     2982.57 *  14.19%*
> > Hmean     16      5307.12 (   0.00%)     5304.14 *  -0.06%*     5610.46 *   5.72%*     5763.24 *   8.59%*     5886.66 *  10.92%*
> > Hmean     32      9354.22 (   0.00%)     9738.21 *   4.11%*     9360.21 *   0.06%*     9699.05 *   3.69%*     9908.13 *   5.92%*
> > Hmean     64      7240.35 (   0.00%)     7210.75 *  -0.41%*     6992.70 *  -3.42%*     7321.52 *   1.12%*     7278.78 *   0.53%*
> > Hmean     128     6186.40 (   0.00%)     6314.89 *   2.08%*     6166.44 *  -0.32%*     6279.85 *   1.51%*     6187.85 (   0.02%)
> > Hmean     256     9231.40 (   0.00%)     9469.26 *   2.58%*     9134.42 *  -1.05%*     9322.88 *   0.99%*     9448.61 *   2.35%*
> > Hmean     512     8907.13 (   0.00%)     9130.46 *   2.51%*     9023.87 *   1.31%*     9276.19 *   4.14%*     9397.22 *   5.50%*
> > 
> > > like rc1+cls, in some
> > > cases (256, 512 threads on numa0&1), it is even much better.
> > > 
> > > Thanks
> > > Barry
