Message-ID: <20191223204551.GA272672@chrisdown.name>
Date: Mon, 23 Dec 2019 20:45:51 +0000
From: Chris Down <chris@...isdown.name>
To: Matthew Wilcox <willy@...radead.org>
Cc: Amir Goldstein <amir73il@...il.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Al Viro <viro@...iv.linux.org.uk>,
Jeff Layton <jlayton@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>, kernel-team@...com,
Hugh Dickins <hughd@...gle.com>,
Miklos Szeredi <miklos@...redi.hu>,
"zhengbin (A)" <zhengbin13@...wei.com>,
Roman Gushchin <guro@...com>
Subject: Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when
ino_t is 64 bit

Matthew Wilcox writes:
>On Fri, Dec 20, 2019 at 07:35:38PM +0200, Amir Goldstein wrote:
>> On Fri, Dec 20, 2019 at 6:46 PM Matthew Wilcox <willy@...radead.org> wrote:
>> >
>> > On Fri, Dec 20, 2019 at 03:41:11PM +0200, Amir Goldstein wrote:
>> > > Suggestion:
>> > > 1. Extend the kmem_cache API to let the ctor() know if it is
>> > > initializing an object
>> > > for the first time (new page) or recycling an object.
>> >
>> > Uh, what? The ctor is _only_ called when new pages are allocated.
>> > Part of the contract with the slab user is that objects are returned to
>> > the slab in an initialised state.
>>
>> Right. I mixed up the ctor() with alloc_inode().
>> So is there anything stopping us from reusing an existing non-zero
>> value of i_ino in shmem_get_inode()? for recycling shmem ino
>> numbers?
>
>I think that would be an excellent solution to the problem! At least,
>I can't think of any problems with it.

Thanks for the suggestions and feedback, Amir and Matthew :-)

The slab i_ino recycling approach works somewhat, but is unfortunately
neutered quite a lot by the fact that slab recycling is per-memcg. That is,
replacing get_next_ino() with recycle_or_get_next_ino(old_ino)[0] in shmfs and
a few other trivial callsites only leads to about 10% slab reuse, which
doesn't really stem the bleeding of 32-bit inums on an affected workload:

    # tail -5000 /sys/kernel/debug/tracing/trace | grep -o 'recycle_or_get_next_ino:.*' | sort | uniq -c
       4454 recycle_or_get_next_ino: not recycled
        546 recycle_or_get_next_ino: recycled

Roman (who I've just added to cc) tells me that we currently only have
per-memcg slab reuse rather than global reuse when using CONFIG_MEMCG. This
contributes fairly significantly here, since there are multiple tasks across
multiple cgroups all contributing to the get_next_ino() thrash.

I think this is a good start, but we need something of a different magnitude
in order to actually solve this problem with the current slab infrastructure.
How about something like the following?

1. Add get_next_ino_full, which uses the full width of ino_t (rough sketch
   after this list)
2. Use get_next_ino_full in tmpfs (et al.)
3. Add a mount option to tmpfs (et al.), say `32bit-inums`, which people can
   pass if they want the 32-bit inode numbers back. This would still allow
   people who want to make this tradeoff to use xino.
4. (If you like) Also add a CONFIG option to disable this at compile time.
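
For (1), a very rough sketch of what I have in mind, largely mirroring the
existing get_next_ino() in fs/inode.c. The names (last_ino_full,
shared_last_ino_full) and the atomic64 batching are illustrative only, not a
final implementation:

    #define LAST_INO_BATCH 1024

    static DEFINE_PER_CPU(ino_t, last_ino_full);

    ino_t get_next_ino_full(void)
    {
            ino_t *p = &get_cpu_var(last_ino_full);
            ino_t res = *p;

    #ifdef CONFIG_SMP
            /* Refill this CPU's batch from the shared counter when exhausted. */
            if (unlikely((res & (LAST_INO_BATCH - 1)) == 0)) {
                    static atomic64_t shared_last_ino_full;
                    u64 next = atomic64_add_return(LAST_INO_BATCH,
                                                   &shared_last_ino_full);

                    res = next - LAST_INO_BATCH;
            }
    #endif

            res++;
            /* As with get_next_ino(), never hand out 0. */
            if (unlikely(!res))
                    res = 1;
            *p = res;
            put_cpu_var(last_ino_full);
            return res;
    }
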
I'd appreciate your thoughts on that approach or others you have ideas about.
Thanks! :-)
0:

unsigned int recycle_or_get_next_ino(ino_t old_ino)
{
        /*
         * get_next_ino returns unsigned int. If this fires then i_ino must
         * be >32 bits and have been changed later, so the caller shouldn't
         * be recycling inode numbers.
         */
        WARN_ONCE((u64)old_ino >> (sizeof(unsigned int) * 8),
                  "Recyclable i_ino uses more bits than unsigned int: %llu",
                  (u64)old_ino);

        if (old_ino) {
                /* Sample ~1% of calls to keep the trace buffer manageable. */
                if (prandom_u32() % 100 == 0)
                        trace_printk("recycled\n");
                return old_ino;
        } else {
                if (prandom_u32() % 100 == 0)
                        trace_printk("not recycled\n");
                return get_next_ino();
        }
}
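
A call site for this looks something like the below. This is just an
illustrative sketch: it assumes the previous owner's i_ino is still present
in the slab object by the time we get here, which is exactly the part that
the per-memcg reuse behaviour above makes unreliable:

        /* In shmem_get_inode(), replacing the plain get_next_ino() call */
        inode->i_ino = recycle_or_get_next_ino(inode->i_ino);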