linux-kernel - Re: Regression in NFS probably due to very large amounts of readahead

Message-ID: <4bb8bfe1-5de6-4b5d-af90-ab24848c772b@gmail.com>
Date: Tue, 26 Nov 2024 09:01:35 +0100
From: Anders Blomdell <anders.blomdell@...il.com>
To: Philippe Troin <phil@...i.org>, Jan Kara <jack@...e.cz>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 Andrew Morton <akpm@...ux-foundation.org>, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: Regression in NFS probably due to very large amounts of readahead



On 2024-11-26 02:48, Philippe Troin wrote:
> On Sat, 2024-11-23 at 23:32 +0100, Anders Blomdell wrote:
>> When we (re)started one of our servers with 6.11.3-200.fc40.x86_64,
>> we got terrible performance (lots of "nfs: server x.x.x.x not
>> responding").
>> What triggered this problem was virtual machines with NFS-mounted
>> qcow2 disks that often triggered large readaheads, which generate
>> long streaks of disk I/O at 150-600 MB/s (4 ordinary HDDs) that
>> filled up the buffer/cache area of the machine.
>>
>> A git bisect gave the following suspect:
>>
>> git bisect start
> 
> 8< snip >8
> 
>> # first bad commit: [7c877586da3178974a8a94577b6045a48377ff25]
>> readahead: properly shorten readahead when falling back to
>> do_page_cache_ra()
> 
> Thank you for taking the time to bisect, this issue has been bugging
> me, but it's been non-deterministic, and hence hard to bisect.
> 
> I'm seeing the same problem on 6.11.10 (and earlier 6.11.x kernels) in
> slightly different setups:
> 
> (1) On machines mounting NFSv3 shared drives. The symptom here is a
> "nfs server XXX not responding, still trying" that never recovers
> (while the server remains pingable and other NFSv3 volumes from the
> hanging server can be mounted).
> 
> (2) On VMs running over qemu-kvm, I see very long stalls (can be up to
> several minutes) on random I/O. These stalls eventually recover.
> 
> I've built a 6.11.10 kernel with
> 7c877586da3178974a8a94577b6045a48377ff25 reverted and I'm back to
> normal (no more NFS hangs, no more VM stalls).
> 
> Phil.

Some printk debugging seems to indicate that the problem
is that the quantity 'ra->size - (index - start)' goes
negative, and is then converted to a very large unsigned
'nr_to_read' when calling 'do_page_cache_ra'. Where the true
bug lies still eludes me, though.
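
To make the wraparound concrete, here is a minimal userspace sketch of
the arithmetic, not the kernel code itself. The function names are
hypothetical, and the types (unsigned int for the readahead size,
unsigned long for the page indices) are assumptions chosen to mirror
what I believe the kernel uses. The clamped variant only illustrates
one possible guard; it is not a claim about where the real bug is.

```c
/*
 * Illustration only: hypothetical helpers, not taken from mm/readahead.c.
 * Assumed types: ra_size is unsigned int, index/start are unsigned long.
 */

/* Computes ra_size - (index - start) with no check. Because every
 * operand is unsigned, a mathematically negative result wraps around
 * to a huge positive value. */
unsigned long remaining_unchecked(unsigned int ra_size,
                                  unsigned long index,
                                  unsigned long start)
{
    return ra_size - (index - start);
}

/* Same computation, but returns 0 once (index - start) has reached or
 * passed ra_size, so the result can never wrap. */
unsigned long remaining_clamped(unsigned int ra_size,
                                unsigned long index,
                                unsigned long start)
{
    unsigned long consumed = index - start;

    if (consumed >= ra_size)
        return 0;
    return ra_size - consumed;
}
```

With ra_size = 32, start = 100 and index = 140, the unchecked version
returns (unsigned long)-8, i.e. a request for nearly the whole address
range worth of pages, while the clamped version returns 0.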

/Anders