linux-kernel - Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

Message-ID: <20160816004423.GH16044@dastard>
Date:	Tue, 16 Aug 2016 10:44:23 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Mel Gorman <mgorman@...hsingularity.net>,
	Johannes Weiner <hannes@...xchg.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Bob Peterson <rpeterso@...hat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	"Huang, Ying" <ying.huang@...el.com>,
	Christoph Hellwig <hch@....de>,
	Wu Fengguang <fengguang.wu@...el.com>, LKP <lkp@...org>,
	Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

On Mon, Aug 15, 2016 at 04:48:36PM -0700, Linus Torvalds wrote:
> On Mon, Aug 15, 2016 at 4:20 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > None of this code is all that new, which is annoying. This must have
> > gone on forever,
> 
> ... ooh.
> 
> Wait, I take that back.
> 
> We actually have some very recent changes that I didn't even think
> about that went into this very merge window.
....
> Mel? The issue is that Dave Chinner is seeing some nasty spinlock
> contention on "mapping->tree_lock":
> 
> >   31.18%  [kernel]  [k] __pv_queued_spin_lock_slowpath
> 
> and one of the main paths is this:
> 
> >    - 30.29% kswapd
> >       - 30.23% shrink_node
> >          - 30.07% shrink_node_memcg.isra.75
> >             - 30.15% shrink_inactive_list
> >                - 29.49% shrink_page_list
> >                   - 22.79% __remove_mapping
> >                      - 22.27% _raw_spin_lock_irqsave
> >                           __pv_queued_spin_lock_slowpath
> 
> so there's something ridiculously bad going on with a fairly simple benchmark.
> 
> Dave's benchmark is literally just a "write a new 48GB file in
> single-page chunks on a 4-node machine". Nothing odd - not rewriting
> files, not seeking around, no nothing.
> 
> You can probably recreate it with a silly
> 
>   dd bs=4096 count=$((12*1024*1024)) if=/dev/zero of=bigfile
> 
> although Dave actually had something rather fancier, I think.

16p, 16GB RAM, fake_numa=4. Overwrite a 47GB file on a 48GB
filesystem:

# mkfs.xfs -f -d size=48g /dev/vdc
# mount /dev/vdc /mnt/scratch
# xfs_io -f -c "pwrite 0 47g" /mnt/scratch/fooey

Wait for memory to fill and reclaim to kick in, then look at the
profile. If you run it a second time, reclaim kicks in straight
away.
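
For anyone scripting this, a rough sketch of the steps above follows. It
only prints the commands unless RUN=1 is set, since mkfs.xfs destroys
whatever is on the device. The device and mount point are the ones used
above; the run() helper and the DEV, MNT and RUN variables are made up
for illustration. It also checks the size of the dd variant quoted
earlier.

```shell
#!/bin/sh
# Sketch only. Prints the reproduction commands unless RUN=1 is set.
# DEV, MNT, RUN and run() are illustrative names, not part of any tool.
DEV=${DEV:-/dev/vdc}
MNT=${MNT:-/mnt/scratch}

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        # Dry run: echo drops the quoting around multi-word arguments.
        echo "+ $*"
    fi
}

# Size of the quoted dd variant: 4096-byte blocks times 12*1024*1024 blocks.
dd_bytes=$((4096 * 12 * 1024 * 1024))
dd_gib=$((dd_bytes / 1024 / 1024 / 1024))
echo "dd variant writes ${dd_gib} GiB"

# The three steps from the message: 48GB filesystem, 47GB sequential write.
run mkfs.xfs -f -d size=48g "$DEV"
run mount "$DEV" "$MNT"
run xfs_io -f -c "pwrite 0 47g" "$MNT/fooey"
```

Run it as root with RUN=1 against a scratch device, then profile once
memory has filled and reclaim has started.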

It's not the new code in 4.8 - it reproduces on 4.7 just fine, and
probably will reproduce all the way back to when the memcg-aware
writeback code was added....

-Dave.
-- 
Dave Chinner
david@...morbit.com