netdev - Re: [PATCH RFC 00/30] Overhaul NFSD filecache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <11579827-F7FE-49FA-9D32-D98C138796B1@oracle.com>
Date:   Fri, 24 Jun 2022 15:30:11 +0000
From:   Chuck Lever III <chuck.lever@...cle.com>
To:     Wang Yugui <wangyugui@...-tech.com>
CC:     Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "david@...morbit.com" <david@...morbit.com>,
        "tgraf@...g.ch" <tgraf@...g.ch>, Jeff Layton <jlayton@...hat.com>
Subject: Re: [PATCH RFC 00/30] Overhaul NFSD filecache



> On Jun 23, 2022, at 1:51 PM, Wang Yugui <wangyugui@...-tech.com> wrote:
> 
> Hi,
> 
>>> On Jun 23, 2022, at 5:02 AM, Wang Yugui <wangyugui@...-tech.com> wrote:
>>> 
>>> Hi,
>>> 
>>>>> On Jun 22, 2022, at 3:04 PM, Chuck Lever III <chuck.lever@...cle.com> wrote:
>>>>>> On Jun 22, 2022, at 2:36 PM, Wang Yugui <wangyugui@...-tech.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> fstests generic/531 triggered a panic on kernel 5.19.0-rc3 with this
>>>>>> patchset.
>>>>> 
>>>>> As I mention in the cover letter, I haven't tried running generic/531
>>>>> yet -- no claim at all that this is finished work and that #386 has
>>>>> been fixed at this point. I'm merely interested in comments on the
>>>>> general approach.
>>>>> 
>>>>> 
>>>>>> [ 405.478056] BUG: kernel NULL pointer dereference, address: 0000000000000049
>>>>> 
>>>>> The "RIP: " tells the location of the crash. Notice that the call
>>>>> trace here does not include that information. From your attachment:
>>>>> 
>>>>> [ 405.518022] RIP: 0010:nfsd_do_file_acquire+0x4e1/0xb80 [nfsd]
>>>>> 
>>>>> To match that to a line of source code:
>>>>> 
>>>>> [cel@...et ~]$ cd src/linux/linux/
>>>>> [cel@...et linux]$ scripts/faddr2line ../obj/manet/fs/nfsd/filecache.o nfsd_do_file_acquire+0x4e1
>>>>> nfsd_do_file_acquire+0x4e1/0xfc0:
>>>>> rht_bucket_insert at /home/cel/src/linux/linux/include/linux/rhashtable.h:303
>>>>> (inlined by) __rhashtable_insert_fast at /home/cel/src/linux/linux/include/linux/rhashtable.h:718
>>>>> (inlined by) rhashtable_lookup_get_insert_key at /home/cel/src/linux/linux/include/linux/rhashtable.h:982
>>>>> (inlined by) nfsd_file_insert at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1031
>>>>> (inlined by) nfsd_do_file_acquire at /home/cel/src/linux/linux/fs/nfsd/filecache.c:1089
>>>>> [cel@...et linux]$
>>>>> 
>>>>> This is an example, I'm sure my compiled objects don't match yours.
>>>>> 
>>>>> And, now that I've added observability, you should be able to do:
>>>>> 
>>>>> # watch cat /proc/fs/nfsd/filecache
>>>>> 
>>>>> to see how many items are in the hash and LRU list while the test
>>>>> is running.
>>>>> 
>>>>> 
>>>>>> [ 405.608016] Call Trace:
>>>>>> [ 405.608016] <TASK>
>>>>>> [ 405.613020] nfs4_get_vfs_file+0x325/0x410 [nfsd]
>>>>>> [ 405.618018] nfsd4_process_open2+0x4ba/0x16d0 [nfsd]
>>>>>> [ 405.623016] ? inode_get_bytes+0x38/0x40
>>>>>> [ 405.623016] ? nfsd_permission+0x97/0xf0 [nfsd]
>>>>>> [ 405.628022] ? fh_verify+0x1cc/0x6f0 [nfsd]
>>>>>> [ 405.633025] nfsd4_open+0x640/0xb30 [nfsd]
>>>>>> [ 405.638025] nfsd4_proc_compound+0x3bd/0x710 [nfsd]
>>>>>> [ 405.643017] nfsd_dispatch+0x143/0x270 [nfsd]
>>>>>> [ 405.648019] svc_process_common+0x3bf/0x5b0 [sunrpc]
>>>> 
>>>> I was able to trigger something that looks very much like this crash.
>>>> If you remove this line from fs/nfsd/filecache.c:
>>>> 
>>>> 	.max_size		= 131072, /* buckets */
>>>> 
>>>> things get a lot more stable for generic/531.
>>>> 
>>>> I'm looking into the issue now.
>>> 
>>> Yes. When '.max_size = 131072' is removed, fstests generic/531 passed.
>> 
>> Great! Are you comfortable with this general approach for bug #386?
> 
> It seems a good result for #386.
> 
> fstests generic/531(file-max: 1M) performance result:
> base(5.19.0-rc3, 12 bits hash, serialized nfsd_file_gc): 222s
> this patchset(.min_size=4096): 59s
> so, a good improvement for #386.
> 
> It seems a good(acceptable) result for #387 too.
> the period of 'text busy(exec directly from the back-end of nfs-server)'
> is about 4s.

I was surprised to learn that NFSv4 doesn't close those files outright,
that they remain in the filecache for a bit. I think we can do better
there, but I haven't looked closely at that yet.

I see that files are left open on the server by crashed NFSv4 clients too.
That will result in the server becoming unusable after significant
uptime, which I've seen on occasion (because I crash my clients a lot)
but never looked into until now. 


--
Chuck Lever