Message-ID: <c0b145ea557087ebbabe1fcca3b2bf69bedbaff6.camel@gmail.com>
Date:   Mon, 08 Feb 2021 14:03:47 -0300
From:   Leonardo Bras <leobras.c@...il.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Wei Yongjun <weiyongjun1@...wei.com>,
        Qais Yousef <qais.yousef@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] kernel/smp: Split call_single_queue into 3 queues

Hello Sebastian, 
Thanks for the feedback!

On Thu, 2021-01-28 at 11:33 +0100, Sebastian Andrzej Siewior wrote:
> On 2021-01-28 03:55:06 [-0300], Leonardo Bras wrote:
> > Currently, during flush_smp_call_function_queue():
> > - All items are traversed once, for inverting.
> > - The SYNC items are traversed twice.
> > - The ASYNC & IRQ_WORK items are traversed three times.
> > - The TTWU items are traversed four times.
> > 
> > Also, a lot of extra work is done to keep track and remove the items
> > already processed in each step.
> > 
> > By using three queues, it's possible to avoid all this work, and
> > all items in the list are traversed only twice: once for inverting,
> > and once for processing.
> > 
> > In exchange, this requires 2 extra llist_del_all() calls at the
> > beginning of flush_smp_call_function_queue(), and some extra logic
> > to decide the correct queue to add the desired csd to.
> > 
> > This is not supposed to cause any change in the order the items are
> > processed, but it will change the order of printing (cpu offlining)
> > to the order the items will be processed.
> > 
> > (The above traversal count ignores the cpu-offlining case, in
> > which all items would be traversed again, in both cases.)
> 
> Numbers would be good.
> 

Sure, I will try to get some time to compare performance.
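For reference, the three-queue flush described in the patch text above can be sketched in plain user-space C. This is only an illustrative, single-threaded stand-in: the plain-pointer lists mimic the semantics of the kernel's lock-free llist_add()/llist_del_all()/llist_reverse_order(), and all other names (csd_enqueue, flush_all_queues, the queue layout) are hypothetical, not taken from the actual patch:

```c
#include <stddef.h>

enum csd_type { CSD_SYNC, CSD_ASYNC, CSD_TTWU, CSD_NR_TYPES };

struct csd {
	struct csd *next;
	enum csd_type type;
	void (*func)(struct csd *);
};

/* One head per flavour -- a plain pointer stands in for llist_head. */
static struct csd *queues[CSD_NR_TYPES];

/* Mimics llist_add(): push at head, report whether the list was empty. */
static int csd_enqueue(struct csd *csd)
{
	struct csd **head = &queues[csd->type];
	int was_empty = (*head == NULL);

	csd->next = *head;
	*head = csd;
	return was_empty;
}

/* Mimics llist_reverse_order(): entries arrive LIFO, restore FIFO. */
static struct csd *csd_reverse(struct csd *head)
{
	struct csd *rev = NULL;

	while (head) {
		struct csd *next = head->next;

		head->next = rev;
		rev = head;
		head = next;
	}
	return rev;
}

/*
 * Two traversals per queue: one to reverse, one to run the callbacks.
 * With a single mixed queue, each flavour needs extra passes to pick
 * its own entries out from among the others.
 */
static void flush_all_queues(void)
{
	for (int t = 0; t < CSD_NR_TYPES; t++) {
		struct csd *head = queues[t];	/* llist_del_all() */

		queues[t] = NULL;
		for (struct csd *csd = csd_reverse(head); csd;) {
			struct csd *next = csd->next; /* func may reuse csd */

			csd->func(csd);
			csd = next;
		}
	}
}
```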


>  Having three queues increases the memory footprint
> from one pointer to three, but we still remain in one cache line.
> One difference your patch makes is this hunk:
> 
> > +	if (smp_add_to_queue(cpu, node))
> >  		send_call_function_single_ipi(cpu);
> 
> Previously only the first addition resulted in sending an IPI. With this
> change you could send two IPIs, one for adding to two independent queues.

Yes, you are correct. 
I need to change this to look into all of the queues, which should just
add a few compares, given all the llist_heads are in the same cache line.
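A minimal sketch of that check, again with a single-threaded plain-pointer stand-in for the llist primitives (add_and_check_ipi and the queue layout are hypothetical names for illustration, not from the patch, and the real code would additionally have to cope with concurrent lock-free updates):

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_QUEUES 3

struct item {
	struct item *next;
};

/* Hypothetical per-CPU heads; plain pointers stand in for llist_head. */
static struct item *queue_heads[NR_QUEUES];

/*
 * Returns true iff every queue was empty before this addition, i.e. the
 * remote CPU has no pending work and an IPI is needed.  Checking only
 * the queue being added to could send a second, redundant IPI when
 * another queue already holds pending entries.
 */
static bool add_and_check_ipi(struct item *it, int q)
{
	bool all_empty = true;

	/* All heads share a cache line, so these compares are cheap. */
	for (int i = 0; i < NR_QUEUES; i++)
		if (queue_heads[i] != NULL)
			all_empty = false;

	it->next = queue_heads[q];
	queue_heads[q] = it;
	return all_empty;
}
```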

> 
> A quick smoke test ended up
>           <idle>-0       [005] d..h1..   146.255996: flush_smp_call_function_queue: A1 S2 I0 T0 X3
> 
> with the patch at the bottom of the mail. This shows that in my
> smoke test at least, the number of items in the individual list is low.

Yes, but depending on the workload these lists may get longer.

My patch also needs some other changes, so I will send a v2 with those
+ the proposed changes.

> Sebastian

Best regards,
Leonardo Bras
