linux-kernel - Re: [PATCH] mm, slub: Use prefetchw instead of prefetch

Message-ID: <904b6e72-cc2e-2e4d-5601-dacab734bf15@suse.cz>
Date:   Mon, 11 Oct 2021 09:21:01 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     David Rientjes <rientjes@...gle.com>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm, slub: Use prefetchw instead of prefetch

On 10/11/21 00:49, David Rientjes wrote:
> On Fri, 8 Oct 2021, Hyeonggon Yoo wrote:
> 
>> It's certain that an object will be not only read, but also
>> written after allocation.
>> 
> 
> Why is it certain?  I think perhaps what you meant to say is that if we 
> are doing any prefetching here, then access will benefit from prefetchw 
> instead of prefetch.  But it's not "certain" that allocated memory will be 
> accessed at all.

I think the primary reason there's a prefetch is freelist traversal. The
cacheline we prefetch will be read during the next allocation, so if we
expect there to be one soon, prefetch might help. That the freepointer is
part of the object itself, and thus the cache line will probably also be
accessed after the allocation, is secondary. Yeah, this might help some
workloads but perhaps hurt others - these things might look obvious in theory
but be rather unpredictable in practice. At least some hackbench results
would help...
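
To make the freelist-traversal point concrete, here is a minimal userspace
sketch. It is not the mm/slub.c code: struct toy_cache, toy_cache_init() and
toy_alloc() are hypothetical names, the free pointer is assumed to sit at
offset 0 of each object, and GCC/Clang's __builtin_prefetch() stands in for
the kernel's prefetchw(). The rw argument of 1 only hints that the line is
wanted for writing; whether it helps is exactly what would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a slab: fixed-size objects, each storing the
 * pointer to the next free object inside itself at c->offset. */
#define OBJ_SIZE 64
#define NR_OBJS  8

struct toy_cache {
	void *freelist;		/* first free object, or NULL when empty */
	size_t offset;		/* position of the free pointer in an object */
	unsigned char slab[NR_OBJS * OBJ_SIZE];
};

/* Read the free pointer stored inside a free object. */
static void *get_freepointer(const struct toy_cache *c, void *object)
{
	void *next;

	memcpy(&next, (unsigned char *)object + c->offset, sizeof(next));
	return next;
}

/* Store the free pointer inside a free object. */
static void set_freepointer(const struct toy_cache *c, void *object, void *next)
{
	memcpy((unsigned char *)object + c->offset, &next, sizeof(next));
}

/* Same name as before; the write-intent hint is an internal detail. */
static void prefetch_freepointer(const struct toy_cache *c, void *object)
{
	__builtin_prefetch((unsigned char *)object + c->offset, 1);
}

/* Thread all objects onto the freelist, lowest address first. */
static void toy_cache_init(struct toy_cache *c)
{
	c->offset = 0;		/* assumption: free pointer at object start */
	c->freelist = NULL;
	assert(c->offset + sizeof(void *) <= OBJ_SIZE);

	for (size_t i = NR_OBJS; i-- > 0; ) {
		void *obj = c->slab + i * OBJ_SIZE;

		set_freepointer(c, obj, c->freelist);
		c->freelist = obj;
	}
}

/* Pop one object. The read of 'object' here is the access that the
 * prefetch issued by the previous call was meant to warm up. */
static void *toy_alloc(struct toy_cache *c)
{
	void *object = c->freelist;
	void *next_object;

	if (!object)
		return NULL;

	next_object = get_freepointer(c, object);
	c->freelist = next_object;

	/* The next allocation will read next_object's free pointer. */
	if (next_object)
		prefetch_freepointer(c, next_object);

	return object;
}
```

Calling toy_cache_init() once and then toy_alloc() repeatedly hands out the
objects in address order and returns NULL once all NR_OBJS are gone. The
prefetch never changes which object is returned, only (possibly) how fast the
next call runs.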

>> Use prefetchw instead of prefetchw. On supported architecture
> 
> If we're using prefetchw instead of prefetchw, I think the diff would be 
> 0 lines changed :)
> 
>> like x86, it helps to invalidate cache line when the object exists
>> in other processors' cache.
>> 
>> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@...il.com>
>> ---
>>  mm/slub.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 3d2025f7163b..2aca7523165e 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -352,9 +352,9 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
>>  	return freelist_dereference(s, object + s->offset);
>>  }
>>  
>> -static void prefetch_freepointer(const struct kmem_cache *s, void *object)
>> +static void prefetchw_freepointer(const struct kmem_cache *s, void *object)

I wouldn't rename the function itself, unless we have both variants for
different situations (we don't). That it uses prefetchw() is an internal
detail at this point.

>>  {
>> -	prefetch(object + s->offset);
>> +	prefetchw(object + s->offset);
>>  }
>>  
>>  static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
>> @@ -3195,10 +3195,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>>  			note_cmpxchg_failure("slab_alloc", s, tid);
>>  			goto redo;
>>  		}
>> -		prefetch_freepointer(s, next_object);
>> +		prefetchw_freepointer(s, next_object);
>>  		stat(s, ALLOC_FASTPATH);
>>  	}
>> -
>>  	maybe_wipe_obj_freeptr(s, object);
>>  	init = slab_want_init_on_alloc(gfpflags, s);
>>  
>> -- 
>> 2.27.0
>> 
>> 
>