lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD7_sbEwxk-avBMONmihHOeKAnWoeANLQ7cR6LBO2YfzJ5Q8kw@mail.gmail.com>
Date:   Tue, 2 Jul 2019 20:18:03 +0800
From:   oddtux <lpf.vector@...il.com>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     Michal Hocko <mhocko@...nel.org>, akpm@...ux-foundation.org,
        peterz@...radead.org, rpenyaev@...e.de, guro@...com,
        aryabinin@...tuozzo.com, rppt@...ux.ibm.com, mingo@...nel.org,
        rick.p.edgecombe@...el.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/5] mm/vmalloc.c: improve readability and rewrite vmap_area

Uladzislau Rezki <urezki@...il.com> 于2019年7月1日周一 下午6:11写道:
>
> On Mon, Jul 01, 2019 at 11:20:37AM +0200, Michal Hocko wrote:
> > On Sun 30-06-19 15:56:45, Pengfei Li wrote:
> > > Hi,
> > >
> > > This series of patches is to reduce the size of struct vmap_area.
> > >
> > > Since the members of struct vmap_area are not being used at the same time,
> > > it is possible to reduce its size by placing several members that are not
> > > used at the same time in a union.
> > >
> > > The first 4 patches did some preparatory work for this and improved
> > > readability.
> > >
> > > The fifth patch is the main patch, it did the work of rewriting vmap_area.
> > >
> > > More details can be obtained from the commit message.
> >
> > None of the commit messages talk about the motivation. Why do we want to
> > add quite some code to achieve this? How much do we save? This all
> > should be a part of the cover letter.
> >
> > > Thanks,
> > >
> > > Pengfei
> > >
> > > Pengfei Li (5):
> > >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > >   mm/vmalloc.c: Introduce a wrapper function of
> > >     insert_vmap_area_augment()
> > >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> > >
> > >  include/linux/vmalloc.h |  28 +++++---
> > >  mm/vmalloc.c            | 144 +++++++++++++++++++++++++++-------------
> > >  2 files changed, 117 insertions(+), 55 deletions(-)
> > >
> > > --
> > > 2.21.0
>
> > > Pengfei Li (5):
> > >   mm/vmalloc.c: Introduce a wrapper function of insert_vmap_area()
> > >   mm/vmalloc.c: Introduce a wrapper function of
> > >     insert_vmap_area_augment()
> > >   mm/vmalloc.c: Rename function __find_vmap_area() for readability
> > >   mm/vmalloc.c: Modify function merge_or_add_vmap_area() for readability
> > >   mm/vmalloc.c: Rewrite struct vmap_area to reduce its size
> Fitting vmap_area to 1 cacheline boundary makes sense to me. I was thinking about
> that and i have patches in my pipeline to send out but implementation is different.
>
> I had a look at all 5 patches. What you are doing is reasonable to me, i mean when
> it comes to the idea of reducing the size to L1 cache line.
>

Thank you for your review.

> I have a concern about implementation and all logic around when we can use va_start
> and when it is something else. It is not optimal at least to me, from performance point
> of view and complexity. All hot paths and tree traversal are affected by that.
>
> For example running the vmalloc test driver against this series shows the following
> delta:
>
> <5.2.0-rc6+>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 969370 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 989619 usec
> Summary: long_busy_list_alloc_test loops: 1000000 avg: 12895813 usec
> <5.2.0-rc6+>
>
> <this series>
> Summary: fix_size_alloc_test passed: loops: 1000000 avg: 1098372 usec
> Summary: full_fit_alloc_test passed: loops: 1000000 avg: 1167260 usec
> Summary: long_busy_list_alloc_test passed: loops: 1000000 avg: 12934286 usec
> <this series>
>
> For example, the degrade in second test is ~15%.
>
> --
> Vlad Rezki

Hi, Vlad

I think the reason for the performance degradation is that the value
of va_start is obtained by va->vm->addr.

And since the vmap area in the BUSY tree is always page-aligned,
there is no reason for _va_vmlid to override va_start, just let
the va->flags use the bits that lower than PAGE_OFFSET.

I will use this implementation in the next version and show almost
no performance penalty in my local tests.

I will send the next version soon.

Thank you for taking your time for the review.

Best regards,

Pengfei

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ