linux-kernel - Re: [PATCH] dma-buf: system_heap: avoid reclaim for order 4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCoAKtHmxFomdGfTfXy8ZvFMfMRj4jZ+b8wMMD+5AmAB0g@mail.gmail.com>
Date:   Wed, 25 Jan 2023 12:32:36 -0800
From:   John Stultz <jstultz@...gle.com>
To:     jaewon31.kim@...sung.com
Cc:     "T.J. Mercier" <tjmercier@...gle.com>,
        "sumit.semwal@...aro.org" <sumit.semwal@...aro.org>,
        "daniel.vetter@...ll.ch" <daniel.vetter@...ll.ch>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jaewon31.kim@...il.com" <jaewon31.kim@...il.com>
Subject: Re: [PATCH] dma-buf: system_heap: avoid reclaim for order 4

On Wed, Jan 25, 2023 at 2:20 AM Jaewon Kim <jaewon31.kim@...sung.com> wrote:
> > > On Tue, Jan 17, 2023 at 10:54 PM John Stultz <jstultz@...gle.com> wrote:
> > > >
> > > > On Tue, Jan 17, 2023 at 12:31 AM Jaewon Kim <jaewon31.kim@...sung.com> wrote:
> > > > > > Using order 4 pages would be helpful for many IOMMUs, but it could spend
> > > > > > quite much time in page allocation perspective.
> > > > > >
> > > > > > The order 4 allocation with __GFP_RECLAIM may spend much time in
> > > > > > reclaim and compation logic. __GFP_NORETRY also may affect. These cause
> > > > > > unpredictable delay.
> > > > > >
> > > > > > To get reasonable allocation speed from dma-buf system heap, use
> > > > > > HIGH_ORDER_GFP for order 4 to avoid reclaim.
> > > >
> > > > Thanks for sharing this!
> > > > The case where the allocation gets stuck behind reclaim under pressure
> > > > does sound undesirable, but I'd be a bit hesitant to tweak numbers
> > > > that have been used for a long while (going back to ion) without a bit
> > > > more data.
> > > >
> > > > It might be good to also better understand the tradeoff of potential
> > > > on-going impact to performance from using low order pages when the
> > > > buffer is used.  Do you have any details like or tests that you could
> > > > share to help ensure this won't impact other users?
> > > >
> > > > TJ: Do you have any additional thoughts on this?
> > > >
> > > I don't have any data on how often we hit reclaim for mid order
> > > allocations. That would be interesting to know. However the 70th
> > > percentile of system-wide buffer sizes while running the camera on my
> > > phone is still only 1 page, so it looks like this change would affect
> > > a subset of use-cases.
> > >
> > > Wouldn't this change make it less likely to get an order 4 allocation
> > > (under memory pressure)? The commit message makes me think the goal of
> > > the change is to get more of them.
> >
> > Hello John Stultz
> >
> > I've been waiting for your next reply.

Sorry, I was thinking you were gathering data on the tradeoffs. Sorry
for my confusion.

> > With my commit, we may gather less number of order 4 pages and fill the
> > requested size with more number of order 0 pages. I think, howerver, stable
> > allocation speed is quite important so that corresponding user space
> > context can move on within a specific time.
> >
> > Not only compaction but reclaim also, I think, would be invoked more if the
> > __GFP_RECLAIM is added on order 4. I expect the reclaim could be decreased
> > if we move to order 0.
> >
>
> Additionally I'd like to say the old legacy ion system heap also used the
> __GFP_RECLAIM only for order 8, not for order 4.
>
> drivers/staging/android/ion/ion_system_heap.c
>
> static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN |
>                                     __GFP_NORETRY) & ~__GFP_RECLAIM;
> static gfp_t low_order_gfp_flags  = GFP_HIGHUSER | __GFP_ZERO;
> static const unsigned int orders[] = {8, 4, 0};
>
> static int ion_system_heap_create_pools(struct ion_page_pool **pools)
> {
>        int i;
>
>        for (i = 0; i < NUM_ORDERS; i++) {
>                struct ion_page_pool *pool;
>                gfp_t gfp_flags = low_order_gfp_flags;
>
>                if (orders[i] > 4)
>                        gfp_flags = high_order_gfp_flags;


This seems a bit backwards from your statement. It's only removing
__GFP_RECLAIM on order 8 (high_order_gfp_flags).

So apologies again, but how is that different from the existing code?

#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
#define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
#define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                                | __GFP_NORETRY) & ~__GFP_RECLAIM) \
                                | __GFP_COMP)
static gfp_t order_flags[] = {HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};

Where the main reason we introduced the mid-order flags is to avoid
the warnings on order 4 allocation failures when we'll fall back to
order 0

The only substantial difference I see between the old ion code and
what we have now is the GFP_COMP addition, which is a bit hazy in my
memory. I unfortunately don't have a record of why it was added (don't
have access to my old mail box), so I suspect it was something brought
up in private review.  Dropping that from the low order flags probably
makes sense as TJ pointed out, but this isn't what your patch is
changing.

Your patch is changing that for mid-order allocations we'll use the
high order flags, so we'll not retry and not reclaim, so there will be
more failing and falling back to single page allocations.
This makes sense to make allocation time faster and more deterministic
(I like it!), but potentially has the tradeoff of losing the
performance benefit of using mid order page sizes.

I suspect your change is a net win overall, as the cumulative effect
of using larger pages probably won't benefit more than the large
indeterministic allocation time, particularly under pressure.

But because your change is different from what the old ion code did, I
want to be a little cautious. So it would be nice to see some
evaluation of not just the benefits the patch provides you but also of
what negative impact it might have.  And so far you haven't provided
any details there.

A quick example might be for the use case where mid-order allocations
are causing you trouble, you could see how the performance changes if
you force all mid-order allocations to be single page allocations (so
orders[] = {8, 0, 0};) and compare it with the current code when
there's no memory pressure (right after reboot when pages haven't been
fragmented) so the mid-order allocations will succeed.  That will let
us know the potential downside if we have brief / transient pressure
at allocation time that forces small pages.

Does that make sense?

thanks
-john