Message-ID: <CAC=cRTN-JyZKyFkRgC0BrBjnu4mMTJ_hXBYszJ9HLXaLqeMfgQ@mail.gmail.com>
Date:   Wed, 18 Nov 2020 10:44:13 +0800
From:   huang ying <huang.ying.caritas@...il.com>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Hillf Danton <hdanton@...a.com>,
        Michal Hocko <mhocko@...e.com>,
        Matthew Wilcox <willy@...radead.org>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Huang Ying <ying.huang@...el.com>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] mm/vmalloc: rework the drain logic

On Tue, Nov 17, 2020 at 9:04 PM Uladzislau Rezki <urezki@...il.com> wrote:
>
> On Tue, Nov 17, 2020 at 10:37:34AM +0800, huang ying wrote:
> > On Tue, Nov 17, 2020 at 6:00 AM Uladzislau Rezki (Sony)
> > <urezki@...il.com> wrote:
> > >
> > > The current "lazy drain" model suffers from at least two issues.
> > >
> > > The first is related to the unsorted list of vmap areas: in order
> > > to identify the [min:max] range of areas to be drained, a full
> > > list scan is required, which is time consuming if the list is
> > > too long.
> > >
> > > The second one, as a next step, is about merging all fragments
> > > with free space, which is also time consuming because it has to
> > > iterate over the entire list holding the outstanding lazy areas.
> > >
> > > See below the "preemptirqsoff" tracer output that illustrates the
> > > high latency. It is ~24676us. Our workloads like audio and video
> > > are affected by such long latency:
> >
> > This seems like a real problem.  But I found there's a long-latency
> > avoidance mechanism in the loop in __purge_vmap_area_lazy(), as
> > follows:
> >
> >         if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
> >             cond_resched_lock(&free_vmap_area_lock);
> >
> I have added that "resched threshold" because in my tests I could
> simply hit out-of-memory, due to the fact that the drain work is not
> fast enough to process such a long outstanding list of vmap areas.

OK.  Now I think I understand the problem.  For free area purging,
there are multiple "producers" but only one "consumer", and there is
no mechanism to slow down the "producers" if the "consumer" cannot
catch up.  Your patch tries to resolve the problem by accelerating
the "consumer".  That isn't perfect, but since there should be quite
some opportunities to merge the free areas, it should just work.
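
To make that concrete, below is a small standalone userspace model of
the imbalance (plain C with pthreads; lazy_nr, producer() and
consumer() are made-up names that only mirror the roles of
vmap_lazy_nr, vfree() and the drain work, this is not kernel code).
The single consumer drains one item per iteration, so without the
throttle loop in producer() the peak backlog typically grows toward
the total produced; with it, the peak stays near HIGH_WATERMARK:

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_PRODUCERS   4
#define PER_PRODUCER   250000L
#define HIGH_WATERMARK 32768L   /* plays the role of lazy_max_pages() */

static atomic_long lazy_nr;     /* plays the role of vmap_lazy_nr */
static atomic_int  done;

static void *producer(void *arg)
{
        (void)arg;
        for (long i = 0; i < PER_PRODUCER; i++) {
                atomic_fetch_add(&lazy_nr, 1);
                /* Producer-side throttle: back off while far ahead. */
                while (atomic_load(&lazy_nr) > HIGH_WATERMARK)
                        sched_yield();
        }
        return NULL;
}

static void *consumer(void *arg)
{
        long peak = 0;

        (void)arg;
        while (!atomic_load(&done) || atomic_load(&lazy_nr) > 0) {
                long nr = atomic_load(&lazy_nr);

                if (nr > peak)
                        peak = nr;
                if (nr > 0)
                        atomic_fetch_sub(&lazy_nr, 1);
                else
                        sched_yield();
        }
        printf("peak backlog: %ld\n", peak);
        return NULL;
}

int main(void)
{
        pthread_t p[NR_PRODUCERS], c;

        pthread_create(&c, NULL, consumer, NULL);
        for (int i = 0; i < NR_PRODUCERS; i++)
                pthread_create(&p[i], NULL, producer, NULL);
        for (int i = 0; i < NR_PRODUCERS; i++)
                pthread_join(p[i], NULL);
        atomic_store(&done, 1);
        pthread_join(c, NULL);
        return 0;
}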

And I found that the long latency avoidance logic in
__purge_vmap_area_lazy() itself appears problematic:

         if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
             cond_resched_lock(&free_vmap_area_lock);

Shouldn't it be something like the following?

         if (i >= BATCH &&
               atomic_long_read(&vmap_lazy_nr) < resched_threshold) {
             cond_resched_lock(&free_vmap_area_lock);
             i = 0;
         } else
             i++;

This will accelerate the purging via batching and slow down vmalloc()
by holding free_vmap_area_lock for longer stretches.  If it makes
sense, can we try this?
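
To see the effect, here is a tiny standalone model of the batching
(plain C, not kernel code; BATCH, maybe_resched() and purge_list()
are made-up names).  It only counts how many resched offers a purge
of N areas would make once they happen at most once per BATCH areas:

#include <stdio.h>

#define BATCH 64

static unsigned long resched_offers;

/* Stands in for cond_resched_lock(); we just count the offers. */
static void maybe_resched(void)
{
        resched_offers++;
}

static void purge_list(unsigned long nr_areas)
{
        unsigned long i = 0;

        for (unsigned long n = 0; n < nr_areas; n++) {
                /* ... free one lazily-freed area here ... */

                if (i >= BATCH) {
                        maybe_resched();        /* once per BATCH areas */
                        i = 0;
                } else
                        i++;
        }
}

int main(void)
{
        purge_list(100000);
        printf("resched offers for 100000 areas: %lu (unbatched: 100000)\n",
               resched_offers);
        return 0;
}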

And, can we reduce lazy_max_pages() to control the length of the
purging list?  It could be > 8K if the vmalloc/vfree size is small.
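
For reference, lazy_max_pages() in mm/vmalloc.c around this time
looks roughly like the following (quoted from memory, please
double-check against the actual tree):

/* Roughly as in mm/vmalloc.c circa v5.10; the threshold scales with
 * log2 of the online CPU count.
 */
static unsigned long lazy_max_pages(void)
{
        unsigned int log;

        log = fls(num_online_cpus());

        return log * (32UL * 1024 * 1024 / PAGE_SIZE);
}

With 4KB pages that is at least 8192 pages (32MB), so a purge list
built from single-page vmalloc/vfree cycles can indeed exceed 8K
entries.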

Best Regards,
Huang, Ying
