netdev - Re: Page allocator order-0 optimizations merged

Message-ID: <20170327122816.dvnfxkyqxasfiknj@techsingularity.net>
Date:   Mon, 27 Mar 2017 13:28:16 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Pankaj Gupta <pagupta@...hat.com>,
        Tariq Toukan <ttoukan.linux@...il.com>,
        Tariq Toukan <tariqt@...lanox.com>, netdev@...r.kernel.org,
        akpm@...ux-foundation.org, linux-mm <linux-mm@...ck.org>,
        Saeed Mahameed <saeedm@...lanox.com>
Subject: Re: Page allocator order-0 optimizations merged

On Mon, Mar 27, 2017 at 10:55:14AM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 27 Mar 2017 03:32:47 -0400 (EDT)
> Pankaj Gupta <pagupta@...hat.com> wrote:
> 
> > Hello,
> > 
> > It looks like a race with softirq and normal process context.
> > 
> > Do we really want allocations from softirq context to be served
> > from the per-cpu list?
> 
> Yes, softirqs need fast page allocs. The softirq use-case is refilling
> the DMA RX rings, which is time critical, especially for NIC drivers.
> For this reason most drivers implement different page recycling tricks.
> 
> > Or we could add a check in 'free_hot_cold_page' for softirqs: if we
> > are on the path returning from a hard interrupt, don't allocate from
> > the per-cpu list.
> 
> A possible solution would be to use local_bh_{disable,enable} instead
> of the {preempt_disable,enable} calls.  But that is slower: using the
> numbers from [1] (19 vs 11 cycles), the expected saving over
> local_irq_{save,restore} is 38-19=19 cycles.
> 
> The problematic part of using local_bh_enable is that this adds a
> softirq/bottom-halves rescheduling point (as it checks for pending
> BHs).  Thus, this might affect real workloads.
> 
> 
> I'm unsure what the best option is.  I'm leaning towards partly
> reverting [1] and going back to the slower local_irq_save +
> local_irq_restore as before.
> 
> Afterwards we can add a bulk page alloc+free call that can amortize
> this 38-cycle cost (of local_irq_{save,restore}).  Or add a function
> call that MUST only be called from contexts with IRQs enabled, which
> allows using the unconditional local_irq_{disable,enable}, as it only
> costs 7 cycles.
> 

It's possible to have separate lists for hard/soft IRQ that are protected,
although great care is needed to drain them properly. I have a partial prototype
lying around marked as "interesting if we ever need it" but it needs more
work. It's sufficiently complex that I couldn't rush it as a fix with the
time I currently have available. For 4.11, it's safer to revert and try
again later bearing in mind that softirqs are in the critical allocation
path for some drivers.

I'll prepare a patch.

-- 
Mel Gorman
SUSE Labs