linux-kernel - Re: [PATCH v10 01/10] fs: Allow fine-grained control of folio sizes


Message-ID: <bf9ffac5-11ff-4a1a-b31a-b9940558fe2c@arm.com>
Date: Wed, 17 Jul 2024 16:26:15 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
Cc: Matthew Wilcox <willy@...radead.org>, david@...morbit.com,
 chandan.babu@...cle.com, djwong@...nel.org, brauner@...nel.org,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
 yang@...amperecomputing.com, linux-mm@...ck.org, john.g.garry@...cle.com,
 linux-fsdevel@...r.kernel.org, hare@...e.de, p.raghav@...sung.com,
 mcgrof@...nel.org, gost.dev@...sung.com, cl@...amperecomputing.com,
 linux-xfs@...r.kernel.org, hch@....de, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH v10 01/10] fs: Allow fine-grained control of folio sizes

On 17/07/2024 16:12, Pankaj Raghav (Samsung) wrote:
>>>>
>>>> This is really too much.  It's something that will never happen.  Just
>>>> delete the message.
>>>>
>>>>> +	if (max > MAX_PAGECACHE_ORDER) {
>>>>> +		VM_WARN_ONCE(1,
>>>>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
>>>>> +		max = MAX_PAGECACHE_ORDER;
>>>>
>>>> Absolutely not.  If the filesystem declares it can support a block size
>>>> of 4TB, then good for it.  We just silently clamp it.
>>>
>>> Hmm, but you raised the point about clamping in the previous patches[1]
>>> after Ryan pointed out that we should not silently clamp the order.
>>>
>>> ```
>>>> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
>>>> whatever values are passed in are a hard requirement? So wouldn't want them to
>>>> be silently reduced. (Especially given the recent change to reduce the size of
>>>> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
>>>
>>> Hm, yes.  We should probably make this return an errno.  Including
>>> returning an errno for !IS_ENABLED() and min > 0.
>>> ```
>>>
>>> It was not clear from the conversation in the previous patches that we
>>> decided to just clamp the order (like it was done before).
>>>
>>> So let's just stick with how it was done before where we clamp the
>>> values if min and max > MAX_PAGECACHE_ORDER?
>>>
>>> [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
>>
>> The way I see it, there are 2 approaches we could take:
>>
>> 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
>> mapping_set_folio_order_range() that says min must be lte max, max must be lte
>> mapping_max_folio_size_supported(). Then emit VM_WARN() in
>> mapping_set_folio_order_range() if the constraints are violated, and clamp to
>> make it safe (from page cache's perspective). The VM_WARN()s can just be inline
> 
> Inlining with the `if` is not possible since:
> 91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")

Ahh, my bad. Could we use WARN_ON() instead?
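
Unlike the VM_WARN*() macros after that commit, the kernel's WARN_ON() evaluates to the truth value of its condition, so it can sit directly inside an `if`. Here is a rough userspace sketch of approach 1 written that way. Everything prefixed `sketch_` or `SKETCH_` is a stand-in invented for illustration (including the value 8 and the exact clamping order); it is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for MAX_PAGECACHE_ORDER; the real value depends on the config. */
#define SKETCH_MAX_PAGECACHE_ORDER 8u

/*
 * Stand-in for WARN_ON(): reports when cond is true and returns cond,
 * which is what lets it be used inline in an if statement.
 */
static bool sketch_warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Hypothetical model of the per-mapping min/max folio orders. */
struct sketch_mapping {
	unsigned int min_order;
	unsigned int max_order;
};

/* Approach 1: warn when a constraint is violated, then clamp to stay safe. */
static void sketch_set_folio_order_range(struct sketch_mapping *m,
					 unsigned int min, unsigned int max)
{
	if (sketch_warn_on(min > SKETCH_MAX_PAGECACHE_ORDER, "min order too large"))
		min = SKETCH_MAX_PAGECACHE_ORDER;
	if (sketch_warn_on(max > SKETCH_MAX_PAGECACHE_ORDER, "max order too large"))
		max = SKETCH_MAX_PAGECACHE_ORDER;
	if (sketch_warn_on(max < min, "max order below min order"))
		max = min;

	/* Postcondition: min <= max <= SKETCH_MAX_PAGECACHE_ORDER. */
	assert(min <= max && max <= SKETCH_MAX_PAGECACHE_ORDER);
	m->min_order = min;
	m->max_order = max;
}
```

The warnings stay short and the clamping keeps the page cache safe even if the FS gets it wrong.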

> 
>> in the if statements to keep them clean. The FS is responsible for checking
>> mapping_max_folio_size_supported() and ensuring min and max meet requirements.
> 
> This is sort of what is done here, but IIUC from willy's reply to the
> patch, he prefers silent clamping over having warnings. I think that is
> because we check the constraints at mount time, so it should be safe to
> call this, I guess?

I don't want to put words in his mouth, but I thought he was complaining about
the verbosity of the warnings, not their presence.

> 
>>
>> 2. Return an error from mapping_set_folio_order_range() (and the other functions
>> that set min/max). No need for warning. No state changed if error is returned.
>> FS can emit warning on error if it wants.
> 
> I think Chinner was not happy with this approach because this is done
> per inode, so we would basically shut down the filesystem at the
> first inode allocation instead of refusing the mount, even though we
> already know MAX_PAGECACHE_ORDER at mount time.

Ahh that makes sense. Understood.
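
For the record, the contrast between the two can be sketched in userspace C as below. All `sketch_`/`SKETCH_` names and the constants are hypothetical stand-ins (4K pages, max order 8), not kernel code; `sketch_max_folio_size_supported()` only plays the role that mapping_max_folio_size_supported() would.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_MAX_PAGECACHE_ORDER 8u

/* Hypothetical model of the per-mapping min/max folio orders. */
struct sketch_mapping {
	unsigned int min_order;
	unsigned int max_order;
};

/* Largest folio size, in bytes, that the page cache model supports. */
static size_t sketch_max_folio_size_supported(void)
{
	return (size_t)1 << (SKETCH_PAGE_SHIFT + SKETCH_MAX_PAGECACHE_ORDER);
}

/*
 * Approach 2: reject a bad range and leave the mapping untouched.
 * This runs per inode, so a failure only shows up at inode setup.
 */
static int sketch_set_order_range_or_fail(struct sketch_mapping *m,
					  unsigned int min, unsigned int max)
{
	if (min > max || max > SKETCH_MAX_PAGECACHE_ORDER)
		return -EINVAL;

	m->min_order = min;
	m->max_order = max;
	return 0;
}

/*
 * Mount-time alternative: the limit is already known here, so an
 * unsupported block size can be refused before any inode exists.
 */
static int sketch_check_block_size_at_mount(size_t block_size)
{
	if (block_size > sketch_max_folio_size_supported())
		return -EINVAL;
	return 0;
}
```

With the check done once at mount, the per-inode setter should never see an out-of-range value in practice.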

> 
> --
> Pankaj