[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210923033454.GA3145@kvm.asia-northeast3-a.c.our-ratio-313919.internal>
Date: Thu, 23 Sep 2021 03:34:54 +0000
From: Hyeonggon Yoo <42.hyeyoo@...il.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
Matthew Wilcox <willy@...radead.org>,
John Garry <john.garry@...wei.com>,
linux-block@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache
On Wed, Sep 22, 2021 at 06:58:00AM -0600, Jens Axboe wrote:
> On 9/22/21 2:19 AM, Hyeonggon Yoo wrote:
> > On Tue, Sep 21, 2021 at 09:37:40AM -0600, Jens Axboe wrote:
> >>> @@ -424,6 +431,57 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
> >>> }
> >>> EXPORT_SYMBOL(kmem_cache_create);
> >>>
> >>> +/**
> >>> + * kmem_cache_alloc_cached - try to allocate from cache without lock
> >>> + * @s: slab cache
> >>> + * @flags: SLAB flags
> >>> + *
> >>> + * Try to allocate from cache without lock. If fails, fill the lockless cache
> >>> + * using bulk alloc API
> >>> + *
> >>> + * Be sure that there's no race condition.
> >>> + * Must create slab cache with SLAB_LOCKLESS_CACHE flag to use this function.
> >>> + *
> >>> + * Return: a pointer to free object on allocation success, NULL on failure.
> >>> + */
> >>> +void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags)
> >>> +{
> >>> + struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache);
> >>> +
> >>> + BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE));
> >>> +
> >>> + if (cache->size) /* fastpath without lock */
> >>> + return cache->queue[--cache->size];
> >>> +
> >>> + /* slowpath */
> >>> + cache->size = kmem_cache_alloc_bulk(s, gfpflags,
> >>> + KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
> >>> + if (cache->size)
> >>> + return cache->queue[--cache->size];
> >>> + else
> >>> + return NULL;
> >>> +}
> >>> +EXPORT_SYMBOL(kmem_cache_alloc_cached);
> >
> > Hello Jens, I'm so happy that you gave comment.
> >
> >> What I implemented for IOPOLL doesn't need to care about interrupts,
> >> hence preemption disable is enough. But we do need that, at least.
> >
> > To be honest, that was my mistake. I was mistakenly using percpu API.
> > it's a shame :> Thank you for pointing that.
> >
> > Fixed it in v3 (work in progress now)
Hello Jens, thank you so much for reviewing this. Your thoughtful review helps
a lot And I feel we're going into right direction.
Anyway, let's start discussion again.
>
> Another thing to fix from there, just make it:
>
> if (cache->size)
> return cached item
> return NULL;
>
> No need for an if/else with the return.
That looks better. I'll consider that in next version.
> >> How does this work for preempt? You seem to assume that the function is
> >> invoked with preempt disabled, but then it could only be used with
> >> GFP_ATOMIC.
> >
> > I wrote it just same prototype with kmem_cache_alloc, and the gfpflags
> > parameter is unnecessary as you said. Okay, let's remove it in v3.
>
> Please also run some actual comparitative benchmarks on this, with a
> real workload. You also need an internal user of this, a stand-alone
> feature isn't really useful. It needs someone using it, and the
> benchmarks would be based on that (or those) use cases.
>
Yeah, That's important point too. as my project affects performance,
I should measure the impact. And as you said, to measure impact, I should
find use cases. For test, I made NAPI (network layer IO Polling) to use
this. it may help measuring.
So, Okay. I'll show its impact on performance, as work progresses.
> Another consideration - is it going to be OK to ignore slab pruning for
> this? Probably. But it needs to be thought about.
>
I think I should consider slab pruning too. That's one of benefits of
using slab allocator. I'll add this in TODOs.
> In terms of API, not sure why these are separate functions. Creating the
> cache with SLAB_FOO should be enough, and then kmem_cache_alloc() tries
> the cache first. If nothing there, fallback to the regular path.
> Something like this would only be used for cases that have a high
> alloc+free rate, would seem like a shame to need two function calls for
> the case where you just want a piece of memory and the cache is empty.
>
It would be wonderful if we can do that. And I'm working on this.
I'm testing this in my private repository and looking at what happens.
it seems there's some issues to solve.
> >> There are basically two types of use cases for this:
> >>
> >> 1) Freeing can happen from interrupts
> >> 2) Freeing cannot happen from interrupts
> >>
> >
> > I considered only case 2) when writing code. Well, To support 1),
> > I think there are two ways:
> >
> > a) internally call kmem_cache_free when in_interrupt() is true
> > b) caller must disable interrupt when freeing
> >
> > I think a) is okay, how do you think?
>
> If the API doesn't support freeing from interrupts, then I'd make that
> the rule. Caller should know better if that can happen, and then just
> use kmem_cache_free() if in a problematic context. That avoids polluting
> the fast path with that check. I'd still make it a WARN_ON_ONCE() as
> described and it can get removed later, hopefully.
Hmm.. As you mentioned above, to provide unified API, we should check
its context in kmem_cache_alloc anyway. But yes, it is better not to allow
calling from interrupt kmem_cache_{alloc,free}_cached.
> But note it's not just the freeing side that would be problematic. If
> you do support from-irq freeing, then the alloc side would need
> local_irq_save/restore() instead of just basic preempt protection. That
> would make it more expensive all around
I think so too. One reason of this work is to reduce cost of disabling
interrupts. That's costly.
> > note that b) can be problematic with kmem_cache_free_bulk
> > as it says interrupts must be enabled.
>
> Not sure that's actually true, apart from being bitrot.
I don't get what you mean.
Can you tell me in detail?
(Yeah, maybe I'm misunderstanding kmem_cache_{alloc,free}_bulk.)
Thanks,
Hyeonggon Yoo
> --
> Jens Axboe
>
Powered by blists - more mailing lists