netdev - Re: [PATCHv2 net-next 04/12] sctp: implement make_datafrag for sctp_stream


Message-ID: <CADvbK_ckOgH9vzXVzckEtWAkJaYn9wuhJtZ+qpzep2T6C8Wung@mail.gmail.com>
Date:   Sat, 9 Dec 2017 00:17:41 +0800
From:   Xin Long <lucien.xin@...il.com>
To:     Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Cc:     Neil Horman <nhorman@...driver.com>,
        David Laight <David.Laight@...lab.com>,
        network dev <netdev@...r.kernel.org>,
        "linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>
Subject: Re: [PATCHv2 net-next 04/12] sctp: implement make_datafrag for sctp_stream_interleave

On Sat, Dec 9, 2017 at 12:00 AM, Marcelo Ricardo Leitner
<marcelo.leitner@...il.com> wrote:
> On Fri, Dec 08, 2017 at 10:37:34AM -0500, Neil Horman wrote:
>> On Fri, Dec 08, 2017 at 12:56:30PM -0200, Marcelo Ricardo Leitner wrote:
>> > On Fri, Dec 08, 2017 at 02:06:04PM +0000, David Laight wrote:
>> > > From: Xin Long
>> > > > Sent: 08 December 2017 13:04
>> > > ...
>> > > > @@ -264,8 +264,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>> > > >                                 frag |= SCTP_DATA_SACK_IMM;
>> > > >                 }
>> > > >
>> > > > -               chunk = sctp_make_datafrag_empty(asoc, sinfo, len, frag,
>> > > > -                                                0, GFP_KERNEL);
>> > > > +               chunk = asoc->stream.si->make_datafrag(asoc, sinfo, len, frag,
>> > > > +                                                      GFP_KERNEL);
>> > >
>> > > I know that none of the sctp code is very optimised, but that indirect
>> > > call is going to be horrid.
>> >
>> > Yeah.. but there is no way to avoid the double dereference
>> > considering we only have the asoc pointer in there and we have to
>> > reach the contents of the data chunk operations struct, and the .si
>> > part is the same as 'stream' part as it's a constant offset.
>> >
>> > Due to the for() in there, we could add a variable to store
>> > asoc->stream.si outside the for and then we can do only a single deref
>> > inside it. Xin, can you please try and see if the generated code is
>> > different?
>> >
>> > Other suggestions?
>> >
>> Is it worth replacing the si struct with an index/enum value, and indexing an
>> array of method pointer structs?  That would save you at least one dereference.
>
> Hmmm, maybe, yes. It would be like
> sctp_stream_interleave[asoc->stream.si].make_datafrag(...)
>
> Then same goes for pf->af, probably.
>
>>
>> Alternatively you could perform the dereference in two steps (i.e. declare an si
>> pointer on the stack and set it equal to asoc->stream.si, then deref
>> si->make_datafrag at call time).  That will at least give the compiler an
>> opportunity to preload the first pointer.
>
> Yep, that was my 2nd paragraph above :-) but it only works for cases
> such as this one.
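
For reference, the index-based variant suggested above could look roughly
like the following. This is a minimal standalone sketch, not kernel code:
struct assoc, struct si_ops, enum si_mode, si_table and the DEMO_* header
sizes are all hypothetical stand-ins, and the callbacks only return a byte
count instead of building a chunk. It puts the pointer form (with the
pointer hoisted out of the loop) next to the enum-indexed table form.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative header sizes only; placeholders for this sketch. */
#define DEMO_DATA_HDR  16
#define DEMO_IDATA_HDR 20

struct assoc;

/* Hypothetical ops struct standing in for the per-mode method table. */
struct si_ops {
	size_t (*make_datafrag)(const struct assoc *asoc, size_t len);
};

enum si_mode { SI_MODE_DATA = 0, SI_MODE_IDATA = 1, SI_MODE_MAX };

struct assoc {
	const struct si_ops *si;  /* pointer form, as in the patch */
	enum si_mode si_mode;     /* index form, as suggested in review */
};

static size_t make_data_frag(const struct assoc *asoc, size_t len)
{
	(void)asoc;
	return len + DEMO_DATA_HDR;
}

static size_t make_idata_frag(const struct assoc *asoc, size_t len)
{
	(void)asoc;
	return len + DEMO_IDATA_HDR;
}

/* One constant table; the mode is an index into it. */
static const struct si_ops si_table[SI_MODE_MAX] = {
	[SI_MODE_DATA]  = { .make_datafrag = make_data_frag },
	[SI_MODE_IDATA] = { .make_datafrag = make_idata_frag },
};

/* Pointer form: read asoc->si once, then one load per indirect call. */
size_t total_via_pointer(const struct assoc *asoc, size_t len, int nfrags)
{
	const struct si_ops *si = asoc->si;
	size_t total = 0;

	for (int i = 0; i < nfrags; i++)
		total += si->make_datafrag(asoc, len);
	return total;
}

/* Index form: the table address is a link-time constant, so only the
 * index has to be read from the association. */
size_t total_via_index(const struct assoc *asoc, size_t len, int nfrags)
{
	size_t total = 0;

	assert((unsigned int)asoc->si_mode < (unsigned int)SI_MODE_MAX);
	for (int i = 0; i < nfrags; i++)
		total += si_table[asoc->si_mode].make_datafrag(asoc, len);
	return total;
}
```

Both functions make the same indirect call; they differ only in how the
address of the ops struct is obtained. Whether the index form really saves
a load depends on the code the compiler generates, so it would need the
same kind of disassembly check as below.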

Now:
  for(N) {
    ...
    chunk = asoc->stream.si->make_datafrag(asoc, sinfo, len, frag,
     0x000000000000fb58 <+360>: mov    0x848(%r13),%rax  <---- [a]
     0x000000000000fb5f <+367>: movzbl %cl,%ecx
     0x000000000000fb62 <+370>: mov    $0x14000c0,%r8d
     0x000000000000fb68 <+376>: mov    %r12d,%edx
     0x000000000000fb6b <+379>: mov    (%rsp),%rsi
     0x000000000000fb6f <+383>: mov    %r13,%rdi  <=(X)
     0x000000000000fb72 <+386>: callq  *0x8(%rax)  <---- [b]
     0x000000000000fb78 <+392>: mov    %rax,%r15
   }

   ret = N * ([a] + [b])


After using a variable:
  struct sctp_stream_interleave *si;
  ...
  si = asoc->stream.si;
     0x000000000000fb44 <+340>: mov    0x848(%r14),%rax
     0x000000000000fb4e <+350>: mov    %rax,0x20(%rsp) <----- [1]

  for(N) {
    ...
    chunk = si->make_datafrag(asoc, sinfo, len, frag, GFP_KERNEL);
     0x000000000000fb69 <+377>: mov    0x20(%rsp),%rax <----- [2]
     0x000000000000fb6e <+382>: movzbl %cl,%ecx
     0x000000000000fb71 <+385>: mov    $0x14000c0,%r8d
     0x000000000000fb77 <+391>: mov    %r12d,%edx
     0x000000000000fb7a <+394>: mov    (%rsp),%rsi
     0x000000000000fb7e <+398>: mov    0x28(%rsp),%rdi <=(Y)
     0x000000000000fb83 <+403>: callq  *0x8(%rax) <----- [3]
     0x000000000000fb89 <+409>: mov    %rax,%r14
   }

   ret = [1] + N * ([2] + [3])


Another small difference:
  as you can see, compared to (X), (Y) loads the first argument (asoc)
  from the stack at 0x28(%rsp) inside the loop, instead of copying it
  from %r13.

So that's what I can see from the related generated code.
If the load from 0x848(%r13) is no more expensive for the CPU than the
load from 0x20(%rsp) or 0x28(%rsp), I think keeping
asoc->stream.si->make_datafrag() as it is would be even better. No?