Message-ID: <20171009124921.wtbzvqagges44brq@yury-thinkpad>
Date:   Mon, 9 Oct 2017 15:49:21 +0300
From:   Yury Norov <ynorov@...iumnetworks.com>
To:     Will Deacon <will.deacon@....com>
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Jeremy.Linton@....com, peterz@...radead.org, mingo@...hat.com,
        longman@...hat.com, boqun.feng@...il.com,
        paulmck@...ux.vnet.ibm.com
Subject: Re: [PATCH v2 0/5] Switch arm64 over to qrwlock

On Mon, Oct 09, 2017 at 10:59:36AM +0100, Will Deacon wrote:
> Hi Yury,
> 
> On Mon, Oct 09, 2017 at 12:30:52AM +0300, Yury Norov wrote:
> > On Fri, Oct 06, 2017 at 02:34:37PM +0100, Will Deacon wrote:
> > > This is version two of the patches I posted yesterday:
> > > 
> > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> > > 
> > > I'd normally leave it longer before posting again, but Peter had a good
> > > suggestion to rework the layout of the lock word, so I wanted to post a
> > > version that follows that approach.
> > > 
> > > I've updated my branch if you're after the full patch stack:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> > > 
> > > As before, all comments (particularly related to testing and performance)
> > > welcome!
> > > 
> > I tested your patches with locktorture and found a measurable performance
> > regression. I also respun Jan Glauber's patch [1], and I also tried
> > Jan's patch together with patch 5 from this series. The numbers differ a
> > lot from my previous measurements, but since then I have changed
> > workstations and now use qemu with support for parallel threads.
> > 
> >                         Spinlock        Read-RW lock    Write-RW lock
> > Vanilla:                129804626       12340895        14716138
> > This series:            113718002       10982159        13068934
> > Jan patch:              117977108       11363462        13615449
> > Jan patch + #5:         121483176       11696728        13618967
> > 
> > The bottom line of discussion [1] was that queued locks are more
> > effective when the SoC has many CPUs, and 4 is not many. My measurements
> > were made on a 4-CPU machine, and they seem to confirm that. Would it
> > make sense to make queued locks the default for many-CPU machines only?
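For reference, the relative change against vanilla in the table above can be worked out with a short script. This is only a sketch: the numbers are copied from the table, while the names `results`, `percent_change` and `changes` are made up for illustration and are not part of locktorture.

```python
# Throughput totals copied from the table above, in column order:
# (Spinlock, Read-RW lock, Write-RW lock).
results = {
    "Vanilla":        (129804626, 12340895, 14716138),
    "This series":    (113718002, 10982159, 13068934),
    "Jan patch":      (117977108, 11363462, 13615449),
    "Jan patch + #5": (121483176, 11696728, 13618967),
}
columns = ("Spinlock", "Read-RW lock", "Write-RW lock")


def percent_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


baseline = results["Vanilla"]

# Per-configuration change against vanilla, one entry per table column.
changes = {
    name: tuple(percent_change(v, b) for v, b in zip(row, baseline))
    for name, row in results.items()
    if name != "Vanilla"
}

for name, row in changes.items():
    cells = ", ".join(f"{col}: {pct:+.1f}%" for col, pct in zip(columns, row))
    print(f"{name}: {cells}")
```

A negative value means fewer lock operations than vanilla in the same run.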
> 
> Just to confirm, you're running this under qemu on an x86 host, using full
> AArch64 system emulation? If so, I really don't think we should base the
> merits of qrwlocks on arm64 around this type of configuration. Given that
> you work for a silicon vendor, could you try running on real arm64 hardware
> instead, please?

I don't have access to the hardware at the moment. I'll run the test when
I get it.

> My measurements on 6-core and 8-core systems look a lot
> better with qrwlock than what we currently have in mainline, and they
> also fix a real starvation issue reported by Jeremy [1].
> 
> I'd also add that lock fairness comes at a cost, so I'd expect a small drop
> in total throughput for some workloads. I encourage you to try passing
> different arguments to locktorture to see this in action. For example, on
> an 8-core machine:
> 
> # insmod ./locktorture.ko nwriters_stress=2 nreaders_stress=8 torture_type="rw_lock_irq" stat_interval=2
> 
> -rc3:
> 
>   Writes:  Total: 6612  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 1265230  Max/Min: 0/0   Fail: 0
>   Writes:  Total: 6709  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 1916418  Max/Min: 0/0   Fail: 0
>   Writes:  Total: 6725  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 5103727  Max/Min: 0/0   Fail: 0
> 
> notice how the writers are really struggling here (you only have to tweak a
> bit more and you get RCU stalls, lose interrupts etc).
> 
> With the qrwlock:
> 
>   Writes:  Total: 47962  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 277903  Max/Min: 0/0   Fail: 0
>   Writes:  Total: 100151  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 525781  Max/Min: 0/0   Fail: 0
>   Writes:  Total: 155284  Max/Min: 0/0   Fail: 0
>   Reads :  Total: 767703  Max/Min: 0/0   Fail: 0
> 
> which is an awful lot better for maximum latency and fairness, despite the
> much lower reader count.
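To put a number on the fairness difference, the writer share of all lock acquisitions can be computed from the figures quoted above. This is a rough sketch: it uses only the last sample of each run, the values are copied from the output above, and the names `samples`, `writer_share` and `final_share` are hypothetical.

```python
# (writes, reads) totals copied from the locktorture output quoted above,
# one pair per stat_interval sample.
samples = {
    "-rc3":    [(6612, 1265230), (6709, 1916418), (6725, 5103727)],
    "qrwlock": [(47962, 277903), (100151, 525781), (155284, 767703)],
}


def writer_share(writes, reads):
    """Fraction of all acquisitions that went to writers."""
    return writes / (writes + reads)


# Use the last sample of each run as the most complete one.
final_share = {name: writer_share(*rows[-1]) for name, rows in samples.items()}

for name, share in final_share.items():
    print(f"{name}: writers got {share:.2%} of all acquisitions")
```

The run used 2 writers against 8 readers, so the share shows how much closer qrwlock brings the writers to a proportional split.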
> 
> > There were 2 preparatory patches in the series:
> > [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock
> > and
> > [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
> > 
> > The first patch is no longer needed because Babu Moger submitted a similar
> > patch that is already in mainline: 9ab6055f95903 ("kernel/locking: Fix
> > compile error with qrwlock.c"). Could you revisit the second patch?
> 
> Sorry, not sure what you're asking me to do here.

It removes the unneeded #include <linux/atomic.h> from
include/asm-generic/qspinlock_types.h. Could you or someone else take
it upstream?
 
> Will
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534299.html