linux-kernel - Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <CAHVum0cynwp5Phx=v2LV33Hsa8viq0jpVLh0Q_ZtpUZVy6Lm9w@mail.gmail.com>
Date:   Mon, 28 Mar 2022 12:13:18 -0700
From:   Vipin Sharma <vipinsh@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     David Matlack <dmatlack@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely
 populated rmaps

Thank you, David and Paolo, for checking this patch carefully. In
hindsight, I should have explicitly mentioned that the patch adds
"noinline" in my patch email.

On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
>
> On 3/26/22 01:31, Vipin Sharma wrote:
> >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> >>> +static noinline void
> >>
> >> What is the reason to add noinline?
> >
> > My understanding is that since this method is called from
> > __always_inline methods, noinline will avoid gcc inlining the
> > slot_rmap_walk_next in those functions and generate smaller code.
> >
>
> Iterators are written in such a way that it's way more beneficial to
> inline them.  After inlining, compilers replace the aggregates (in this
> case, struct slot_rmap_walk_iterator) with one variable per field and
> that in turn enables a lot of optimizations, so the iterators should
> actually be always_inline if anything.
>
> For the same reason I'd guess the effect on the generated code should be
> small (next time please include the output of "size mmu.o"), but should
> still be there.  I'll do a quick check of the generated code and apply
> the patch.

Yeah, I should have added the "size mmu.o" output. Here is what I have found:

size arch/x86/kvm/mmu/mmu.o

Without noinline:
   text    data     bss     dec     hex filename
  89938   15793      72  105803   19d4b arch/x86/kvm/mmu/mmu.o

With noinline:
   text    data     bss     dec     hex filename
  90058   15793      72  105923   19dc3 arch/x86/kvm/mmu/mmu.o

With noinline, increase in text size = 120 bytes

Out of curiosity, I also checked the file size with the "ls -l" command.
File size:
        Without noinline: 1394272 bytes
        With noinline: 1381216 bytes

With noinline, decrease in size = 13056 bytes

I also disassembled mmu.o via "objdump -d" and found the following.
Total lines in the generated assembly:
        Without noinline: 23438
        With noinline: 23393

With noinline, decrease in assembly code = 45 lines

I can see in the assembly that the "with noinline" object file contains
multiple "call" instructions, which is expected, and that it has fewer
lines of code than "without noinline". I am not sure why the size
command shows an increase in the text segment for "with noinline", or
what to infer from all of this data.
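
To make the inlining trade-off discussed above easier to experiment
with outside the kernel, here is a minimal user-space sketch. It is not
the KVM code: struct walk_iter and the walk_*, sum_* and
check_walkers_agree names are hypothetical stand-ins, and the table of
unsigned longs only imitates a sparsely populated rmap in which zero
means "empty". The two "next" helpers are identical except for the
inlining attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct slot_rmap_walk_iterator. */
struct walk_iter {
	const unsigned long *cur;
	const unsigned long *end;
};

/* Position the iterator on the first non-empty entry, if any. */
static inline void walk_begin(struct walk_iter *it,
			      const unsigned long *tbl, size_t n)
{
	it->cur = tbl;
	it->end = tbl + n;
	while (it->cur < it->end && !*it->cur)
		it->cur++;
}

/* Advance to the next non-empty entry; always inlined into the caller. */
static inline __attribute__((always_inline)) void
walk_next_inline(struct walk_iter *it)
{
	do {
		it->cur++;
	} while (it->cur < it->end && !*it->cur);
}

/* Same logic, but the compiler must emit a real call. */
static __attribute__((noinline)) void
walk_next_noinline(struct walk_iter *it)
{
	do {
		it->cur++;
	} while (it->cur < it->end && !*it->cur);
}

unsigned long sum_inline(const unsigned long *tbl, size_t n)
{
	struct walk_iter it;
	unsigned long sum = 0;

	for (walk_begin(&it, tbl, n); it.cur < it.end; walk_next_inline(&it))
		sum += *it.cur;
	return sum;
}

unsigned long sum_noinline(const unsigned long *tbl, size_t n)
{
	struct walk_iter it;
	unsigned long sum = 0;

	for (walk_begin(&it, tbl, n); it.cur < it.end; walk_next_noinline(&it))
		sum += *it.cur;
	return sum;
}

/* Both variants must visit exactly the same entries. */
void check_walkers_agree(const unsigned long *tbl, size_t n)
{
	assert(sum_inline(tbl, n) == sum_noinline(tbl, n));
}
```

Compiling this with optimization enabled and comparing the assembly of
sum_inline() and sum_noinline() is one way to see whether the iterator
fields stay in registers in the inlined variant and whether the
noinline variant has to keep the struct in memory across the call. The
result depends on the compiler and flags, so it should be treated as an
experiment rather than as a prediction for mmu.o.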

Thanks
Vipin