Message-ID: <CAA+hA=QEtxeCZX7K+sW0KUZbErjr9NFMN6ZaidaXCL+1m6=F+w@mail.gmail.com>
Date: Thu, 17 Mar 2022 11:49:16 +0800
From: Dong Aisheng <dongas86@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Dong Aisheng <aisheng.dong@....com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
shawnguo@...nel.org, linux-imx@....com, m.szyprowski@...sung.com,
lecopzer.chen@...iatek.com, david@...hat.com, vbabka@...e.cz,
stable@...r.kernel.org, shijie.qin@....com
Subject: Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
On Thu, Mar 17, 2022 at 5:09 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Wed, 16 Mar 2022 11:41:37 +0800 Dong Aisheng <dongas86@...il.com> wrote:
>
> > On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
> > >
> > > On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@....com> wrote:
> > >
> > > > --- a/mm/cma.c
> > > > +++ b/mm/cma.c
> > > >
> > > > ...
> > > >
> > > > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > > > offset);
> > > > if (bitmap_no >= bitmap_maxno) {
> > > > spin_unlock_irq(&cma->lock);
> > > > + pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> > > > + /*
> > > > + * rescan as others may finish the memory migration
> > > > + * and quit if no available CMA memory found finally
> > > > + */
> > > > + if (start) {
> > > > + schedule();
> > > > + start = 0;
> > > > + continue;
> > > > + }
> > > > break;
> > >
> > > The schedule() is problematic. For a start, we'd normally use
> > > cond_resched() here, so we avoid calling the more expensive schedule()
> > > if we know it won't perform any action.
> > >
> > > But cond_resched() is problematic if this thread has realtime
> > > scheduling policy and the process we're waiting on does not. One way
> > > to address that is to use an unconditional msleep(1), but that's still
> > > just a hack.
> > >
> >
> > I think we can simply drop schedule() here during the second round of retry
> > as the estimated delay may not be really needed.
>
> That will simply cause a tight loop, so I'm obviously not understanding
> the proposal.
>
IIUC the original code is already a tight loop, isn't it?
You can also see my observation of thousands of retries in patch 2.
The logic in this patch just retries the original loop in case there was
a false-positive error return.

Or do you mean an infinite loop? The loop will break out when it meets a
non-EBUSY error in alloc_contig_range().
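
To make the loop logic concrete, here is a minimal userspace sketch of it.
This is not the kernel code: scan_with_rescan(), try_range_fn, struct
fake_alloc and fake_try_range() are hypothetical names made up for
illustration, a "slot" stands in for a candidate bitmap range, and the
callback stands in for alloc_contig_range(). The schedule() call from the
patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for alloc_contig_range(): 0, -EBUSY, or another error. */
typedef int (*try_range_fn)(size_t slot, void *ctx);

/*
 * Fake allocator for experiments (an assumption, not kernel code): reports
 * -EBUSY for the first busy_calls attempts, then final_ret on later ones.
 */
struct fake_alloc {
	int busy_calls;
	int final_ret;
	int calls;
};

int fake_try_range(size_t slot, void *ctx)
{
	struct fake_alloc *fa = ctx;

	(void)slot;
	fa->calls++;
	if (fa->busy_calls > 0) {
		fa->busy_calls--;
		return -EBUSY;
	}
	return fa->final_ret;
}

/*
 * Try slots 0..nslots-1 in order. -EBUSY moves on to the next slot, any
 * other error ends the loop. When the scan runs off the end after having
 * advanced, it restarts from slot 0, since others may have finished their
 * migration in the meantime. Returns the slot index or a negative errno.
 */
long scan_with_rescan(size_t nslots, try_range_fn try_range, void *ctx)
{
	size_t start = 0;
	int ret = -ENOMEM;

	assert(try_range != NULL);

	for (;;) {
		if (start >= nslots) {
			if (start) {
				start = 0;	/* rescan from the beginning */
				continue;
			}
			break;		/* nothing to scan at all */
		}
		ret = try_range(start, ctx);
		if (ret == 0)
			return (long)start;
		if (ret != -EBUSY)
			break;		/* non-EBUSY error: give up */
		start++;
	}
	return ret;
}
```

With nslots = 4 and a fake allocator that is busy for its first 6 calls, the
sketch scans slots 0 to 3, rescans from 0, and succeeds at slot 2. Note that
in this sketch a callback that returns -EBUSY forever would keep rescanning;
only success or a non-EBUSY error ends the loop.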

BTW, the tight loop situation could be improved a lot by my patch 2.
And after Zi Yan's patchset [1] is merged, the situation could be
further improved by retrying in pageblock steps.

[1] [v7,0/5] Use pageblock_order for cma and alloc_contig_range
alignment. - Patchwork (kernel.org)

So generally I think this still seems better than simply reverting.
Please correct me if I have still missed something.

Regards
Aisheng