Message-ID: <3b1d4265b384424688711a9259f98dec44c77848.camel@fifi.org>
Date: Mon, 25 Nov 2024 17:48:48 -0800
From: Philippe Troin <phil@...i.org>
To: Anders Blomdell <anders.blomdell@...il.com>, Jan Kara <jack@...e.cz>, 
 "Matthew Wilcox (Oracle)" <willy@...radead.org>, Andrew Morton
 <akpm@...ux-foundation.org>,  linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc: Jan Kara <jack@...e.cz>
Subject: Re: Regression in NFS probably due to very large amounts of
 readahead

On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
> When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
> we got terrible performance (lots of "nfs: server x.x.x.x not
> responding"). What triggered this problem was virtual machines with
> NFS-mounted qcow2 disks that often triggered large readaheads,
> generating long streaks of disk I/O of 150-600 MB/s (4 ordinary
> HDDs) that filled up the buffer/cache area of the machine.
> 
> A git bisect gave the following suspect:
> 
> git bisect start

8< snip >8

> # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
> readahead: properly shorten readahead when falling back to
> do_page_cache_ra()

Thank you for taking the time to bisect; this issue has been bugging
me as well, but it's non-deterministic and hence hard to bisect.
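
For anyone who wants to repeat the exercise, the bisect recipe is the
usual one; roughly the following (the good/bad tags below are
placeholders, not Anders' actual endpoints):

    git bisect start
    git bisect bad v6.11      # first kernel that showed the hangs
    git bisect good v6.10     # last kernel known to behave
    # build, boot and test each candidate the tool checks out,
    # then mark it:
    git bisect good           # or: git bisect bad
    # repeat until git names the first bad commit, then:
    git bisect reset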

I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
two slightly different setups:

(1) On machines mounting NFSv3 shares. The symptom here is an
"nfs: server XXX not responding, still trying" message that never
recovers (while the server remains pingable and other NFSv3 volumes
from the hanging server can still be mounted).

(2) On VMs running on qemu-kvm, I see very long stalls (sometimes up
to several minutes) on random I/O. These stalls eventually recover.

I've built a 6.11.10 kernel with commit
7c877586da3178974a8a94577b6045a48377ff25 reverted, and I'm back to
normal (no more NFS hangs, no more VM stalls).
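
In case it helps anyone else, the revert itself is trivial; something
along these lines on top of the v6.11.10 tag (a rough sketch, adjust
the build steps to your usual recipe):

    git checkout v6.11.10
    git revert 7c877586da3178974a8a94577b6045a48377ff25
    make olddefconfig
    make -j"$(nproc)"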

Phil.
