linux-kernel - Re: [patch] epoll use a single inode ...


Message-Id: <8638AE87-6005-4021-98EB-43E55694B0FB@mac.com>
Date:	Thu, 8 Mar 2007 20:34:50 -0500
From:	Kyle Moffett <mrmacman_g4@....com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	"Michael K. Edwards" <medwards.linux@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Anton Blanchard <anton@...ba.org>,
	Davide Libenzi <davidel@...ilserver.org>,
	Avi Kivity <avi@...o.co.il>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [patch] epoll use a single inode ...

On Mar 08, 2007, at 02:24:04, Eric Dumazet wrote:
> Kyle Moffett wrote:
>> Prefetching is also fairly critical on a Power4 or G5 PowerPC  
>> system as they have a long memory latency; an L2-cache miss can  
>> cost 200+ cycles.  On such systems the "dcbt" prefetch instruction  
>> brings in a single 128-byte cacheline and has no serializing  
>> effects whatsoever, making it ideal for use in a linked-list- 
>> traversal inner loop.
>
> OK, 200 cycles...
>
> But what is the cost of the conditional branch you added in
> prefetch(x)?
>
> if (!x) return;
>
> (correctly predicted or not, but does PowerPC have a BTB?)

Well, the PowerPC "dcbt" instruction implements prefetch() correctly:
it never raises exceptions, has no side effects, takes only enough CPU
to decode the address, and is ignored if it would have to do anything
other than load the cacheline from RAM.  Prefetch streams halt when
they reach a page boundary (no trapping to the MMU), and if the TLB
entry isn't present they abort asynchronously.  So the conditional
branch should not be needed there.  This is a suitable prefetch
implementation for G5:

   #define prefetch(x) __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x))
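
For illustration, here is a minimal sketch of the kind of linked-list
inner loop discussed above.  The "struct node" type and the
"list_sum" function are hypothetical names invented for this example,
and the non-PowerPC fallback to __builtin_prefetch is my assumption so
that the sketch builds elsewhere; it is not taken from any kernel
header.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
	struct node *next;
	long value;
};

/* Use dcbt on PowerPC; elsewhere fall back to the GCC built-in (assumption). */
#if defined(__powerpc__) || defined(__powerpc64__)
#define prefetch(x) __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x))
#else
#define prefetch(x) __builtin_prefetch(x)
#endif

/* Sum a singly linked list, hinting the next node before using this one. */
long list_sum(const struct node *head)
{
	long sum = 0;
	const struct node *p;

	for (p = head; p != NULL; p = p->next) {
		/*
		 * No "if (!x) return;" here: p->next is NULL on the last
		 * node, and the hint is expected to be dropped rather than
		 * fault.
		 */
		prefetch(p->next);
		sum += p->value;
	}
	return sum;
}
```

The only branch in the loop is the loop condition itself; the prefetch
is issued unconditionally.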

On GCC4+ the "__builtin_prefetch" built-in function is probably
mildly better tuned for most architectures:

   #define PREFETCH_RD 0
   #define PREFETCH_WR 1
   #define PREFETCH_LOCALITY_NONE 0
   #define PREFETCH_LOCALITY_LOW  1
   #define PREFETCH_LOCALITY_MED  2
   #define PREFETCH_LOCALITY_HIGH 3
   #define prefetch(x, args...) __builtin_prefetch(x ,##args)
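
A rough sketch of how the wrapper and its constants might be used
with the optional arguments.  The "add_arrays" function and the
PREFETCH_AHEAD distance of 16 elements are hypothetical and untuned;
the main loop stops early so that every prefetched address stays
inside the arrays.  Note that "args..." is the GNU named-variadic
macro syntax, so this assumes GCC or a compatible compiler.

```c
#include <stddef.h>

#define PREFETCH_RD 0
#define PREFETCH_WR 1
#define PREFETCH_LOCALITY_NONE 0
#define PREFETCH_LOCALITY_LOW  1
#define PREFETCH_LOCALITY_MED  2
#define PREFETCH_LOCALITY_HIGH 3
#define prefetch(x, args...) __builtin_prefetch(x ,##args)

/* Hypothetical prefetch distance in elements; would need tuning. */
#define PREFETCH_AHEAD 16

/* a[i] += b[i] for all i, prefetching PREFETCH_AHEAD elements ahead. */
void add_arrays(long *a, const long *b, size_t n)
{
	size_t i = 0;

	/* Main loop: i + PREFETCH_AHEAD is always a valid index here. */
	for (; i + PREFETCH_AHEAD < n; i++) {
		prefetch(&a[i + PREFETCH_AHEAD], PREFETCH_WR, PREFETCH_LOCALITY_LOW);
		prefetch(&b[i + PREFETCH_AHEAD], PREFETCH_RD, PREFETCH_LOCALITY_LOW);
		a[i] += b[i];
	}

	/* Tail: nothing further ahead to prefetch. */
	for (; i < n; i++)
		a[i] += b[i];
}
```

Calling prefetch(ptr) with no extra arguments also works with this
macro and takes the built-in's defaults (read, locality three).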

The pertinent GCC docs:
> This function is used to minimize cache-miss latency by moving data  
> into a cache before it is accessed. You can insert calls to  
> __builtin_prefetch into code for which you know addresses of data  
> in memory that is likely to be accessed soon. If the target  
> supports them, data prefetch instructions will be generated. If the  
> prefetch is done early enough before the access then the data will  
> be in the cache by the time it is accessed.
>
> The value of addr is the address of the memory to prefetch. There  
> are two optional arguments, rw and locality. The value of rw is a  
> compile-time constant one or zero; one means that the prefetch is  
> preparing for a write to the memory address and zero, the default,  
> means that the prefetch is preparing for a read. The value locality  
> must be a compile-time constant integer between zero and three. A  
> value of zero means that the data has no temporal locality, so it  
> need not be left in the cache after the access. A value of three  
> means that the data has a high degree of temporal locality and  
> should be left in all levels of cache possible. Values of one and  
> two mean, respectively, a low or moderate degree of temporal  
> locality. The default is three.
>
> 	for (i = 0; i < n; i++) {
> 		a[i] = a[i] + b[i];
> 		__builtin_prefetch (&a[i+j], 1, 1);
> 		__builtin_prefetch (&b[i+j], 0, 1);
> 		/* ... */
> 	}
>
> Data prefetch does not generate faults if addr is invalid, but the  
> address expression itself must be valid. For example, a prefetch of  
> p->next will not fault if p->next is not a valid address, but  
> evaluation will fault if p is not a valid address.
>
> If the target does not support data prefetch, the address  
> expression is evaluated if it includes side effects but no other  
> code is generated and GCC does not issue a warning.

Cheers,
Kyle Moffett
