linux-kernel - Re: [PATCH 2/2] free_pcppages_bulk: prefetch buddy while not holding lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <525a20be-dea9-ed54-ca8e-8c4bc5e8a04f@intel.com>
Date:   Wed, 24 Jan 2018 11:23:49 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Aaron Lu <aaron.lu@...el.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Huang Ying <ying.huang@...el.com>,
        Kemi Wang <kemi.wang@...el.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH 2/2] free_pcppages_bulk: prefetch buddy while not holding
 lock

On 01/24/2018 10:19 AM, Mel Gorman wrote:
>> IOW, I don't think this has the same downsides normally associated with
>> prefetch() since the data is always used.
> That doesn't side-step the calculations are done twice in the
> free_pcppages_bulk path and there is no guarantee that one prefetch
> in the list of pages being freed will not evict a previous prefetch
> due to collisions.

Fair enough.  The description here could probably use some touchups to
explicitly spell out the downsides.

I do agree with you that there is no guarantee that this will be
resident in the cache before use.  In fact, it might be entertaining to
see if we can show the extra conflicts in the L1 given from this change
given a large enough PCP batch size.

But, this isn't just about the L1.  If the results of the prefetch()
stay in *ANY* cache, then the memory bandwidth impact of this change is
still zero.  You'll have a lot harder time arguing that we're likely to
see L2/L3 evictions in this path for our typical PCP batch sizes.

Do you want to see some analysis for less-frequent PCP frees?  We could
pretty easily instrument the latency doing normal-ish things to see if
we can measure a benefit from this rather than a tight-loop micro.