Message-ID: <CAGudoHFSUoLXjEh8bvULXe2bysiW8S6yTcpgzCAgkuPuJxD6_Q@mail.gmail.com>
Date: Mon, 2 Dec 2024 11:08:47 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Bharata B Rao <bharata@....com>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, nikunj@....com, 
	willy@...radead.org, vbabka@...e.cz, david@...hat.com, 
	akpm@...ux-foundation.org, yuzhao@...gle.com, axboe@...nel.dk, 
	viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz, joshdon@...gle.com, 
	clm@...a.com
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path

On Mon, Dec 2, 2024 at 10:37 AM Bharata B Rao <bharata@....com> wrote:
>
> On 28-Nov-24 10:01 AM, Mateusz Guzik wrote:
>
> > Willy mentioned the folio wait queue hash table could be grown, you
> > can find it in mm/filemap.c:
> >    #define PAGE_WAIT_TABLE_BITS 8
> >    #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
> >    static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
> >
> >    static wait_queue_head_t *folio_waitqueue(struct folio *folio)
> >    {
> >            return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
> >    }
> >
> > Can you collect off cpu time? offcputime-bpfcc -K > /tmp/out
>
> Flamegraph for "perf record --off-cpu -F 99 -a -g --all-kernel
> --kernel-callchains -- sleep 120" is attached.
>
> Off-cpu samples were collected for 120s at around the 45th minute of the
> FIO benchmark, which runs for 1hr in total. This run was with a kernel that
> had your inode_lock fix but no changes to PAGE_WAIT_TABLE_BITS.
>
> Hopefully this captures a representative sample of the scalability
> issue with the folio lock.
>

I'm not familiar with the --off-cpu option; fwiw, it does not look like
any of that time got graphed. The tool that I know to work is
offcputime-bpfcc.

Regardless, per your own graph, over half of the *on* cpu time is spent
spinning on the folio hash table locks.

If bumping the table size does not resolve the problem, the contention
has most likely shifted to something else again. So what we need is
profiling data from that state.
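
For a rough feel of what growing the table buys, here is a userspace
sketch (not kernel code) that hashes a run of made-up folio addresses the
way hash_ptr() does on 64-bit and reports the fullest bucket for a few
table sizes. The base address, the 64-byte stride and the helper names
(hash_ptr64, max_bucket_load, report) are assumptions for illustration
only; real folio addresses and lock traffic will look different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same multiplier the kernel uses for hash_64() on 64-bit builds. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Userspace stand-in for hash_ptr() on a 64-bit kernel. */
static unsigned int hash_ptr64(uint64_t ptr, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (unsigned int)((ptr * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Hash n hypothetical folio addresses (base, base + stride, ...) into a
 * table of 1 << bits buckets and return the load of the fullest bucket.
 * Returns 0 if the table cannot be allocated.
 */
static size_t max_bucket_load(uint64_t base, uint64_t stride, size_t n,
			      unsigned int bits)
{
	size_t *load = calloc((size_t)1 << bits, sizeof(*load));
	size_t i, max = 0;

	if (!load)
		return 0;
	for (i = 0; i < n; i++) {
		size_t l = ++load[hash_ptr64(base + i * stride, bits)];

		if (l > max)
			max = l;
	}
	free(load);
	return max;
}

/* Print the worst-case bucket load for a few table sizes. */
static void report(size_t nr_folios)
{
	/* Assumed values: a vmemmap-like base and a 64-byte stride. */
	const uint64_t base = 0xffffea0000000000ull, stride = 64;
	unsigned int bits;

	for (bits = 8; bits <= 16; bits += 4)
		printf("bits=%2u buckets=%6zu max load=%zu\n", bits,
		       (size_t)1 << bits,
		       max_bucket_load(base, stride, nr_folios, bits));
}
```

Calling report() with the number of folios you expect to be in flight
shows how many of them end up sharing one wait queue head at each size.
It only models bucket sharing, not the actual spinning, so it is no
substitute for profiling the patched kernel.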

-- 
Mateusz Guzik <mjguzik gmail.com>
