Message-ID: <20080111071936.GA16175@elte.hu>
Date: Fri, 11 Jan 2008 08:19:36 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Andi Kleen <ak@...e.de>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Venki Pallipadi <venkatesh.pallipadi@...el.com>,
suresh.b.siddha@...el.com, Arjan van de Ven <arjan@...radead.org>,
Dave Jones <davej@...hat.com>
Subject: Re: CPA patchset
* Andi Kleen <ak@...e.de> wrote:
> > > > but that's not too smart: why don't they use WB plus clflush
> > > > instead?
> > >
> > > Because they need to access it WC for performance.
> >
> > I think you have it fundamentally backwards: the best for
> > performance is WB + clflush. What would WC offer for performance that
> > clflush cannot do?
>
> Cached requires the cache line to be read first before you can write
> it.
nonsense, and you should know it. It is perfectly possible to construct
fully written cachelines without reading the cacheline first. MOVDQ is
SSE1, so it is available on basically every CPU today - and it is
16-byte aligned and can generate full cacheline writes, _without_
filling in the cacheline first. Bulk ops (string ops, etc.) will do full
cacheline writes too, without filling in the cacheline. Especially with
high performance 3D ops we do _NOT_ need any funky reads from anywhere,
because 3D software can stream a lot of writes out: we construct a full
frame or a portion of a frame, or upload vertices, shader scripts,
textures, etc.
( also, _even_ when there is a cache fill pending for a partially
written cacheline, that fill might go on in parallel, and it does not
necessarily hold up the CPU unless the CPU has an actual data dependency
on it. )
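To make the "WB + clflush" pattern above concrete, here is a minimal userspace sketch, not code from this thread or from any driver. It overwrites whole cache lines of an ordinary write-back buffer with aligned 16-byte SSE2 stores and then flushes each line with CLFLUSH. The names write_full_line() and fill_and_flush() and the 64-byte CACHELINE constant are assumptions made for illustration. Whether the CPU actually skips the cacheline fill for such stores depends on the microarchitecture; the sketch does not demonstrate that either way.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_store_si128, _mm_clflush, _mm_mfence */
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; common on x86 but not architecturally guaranteed. */
#define CACHELINE 64

/* Hypothetical helper: overwrite one whole cache line with four aligned
 * 16-byte stores, so no byte of the line keeps its old contents. */
static void write_full_line(void *line, __m128i v)
{
    __m128i *p = (__m128i *)line;

    _mm_store_si128(p + 0, v);
    _mm_store_si128(p + 1, v);
    _mm_store_si128(p + 2, v);
    _mm_store_si128(p + 3, v);
}

/* Hypothetical helper: fill a cacheline-aligned WB buffer line by line,
 * then push every line out of the cache with CLFLUSH. */
static void fill_and_flush(void *buf, size_t len, uint8_t byte)
{
    char *p = (char *)buf;
    __m128i v = _mm_set1_epi8((char)byte);
    size_t off;

    /* The sketch only handles whole, aligned cache lines. */
    assert((uintptr_t)buf % CACHELINE == 0);
    assert(len % CACHELINE == 0);

    for (off = 0; off < len; off += CACHELINE)
        write_full_line(p + off, v);

    for (off = 0; off < len; off += CACHELINE)
        _mm_clflush(p + off);

    /* Fence so that later memory operations are ordered after the flushes. */
    _mm_mfence();
}
```

A caller would hand fill_and_flush() a 64-byte-aligned buffer whose length is a multiple of 64. The data stays readable through the cache afterwards; the flush only writes the lines back and evicts them.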
but that's totally beside the point anyway. Whether the accesses are WC
or WB, if a 3D app or a driver does high-freq change_page_attr() calls,
it will _lose_ the performance game:
> > also, it's irrelevant to change_page_attr() call frequency. Just map
> > in everything from the card and use it. In graphics, if you remap
> > anything on the fly and it's not a slowpath you've lost the
> > performance game even before you began it.
>
> The typical case would be lots of user space DRI clients supplying
> their own buffers on the fly. There's not really a fixed pool in this
> case, but it all varies dynamically. In some scenarios that could
> happen quite often.
in what scenarios? Please give me in-tree examples of such high-freq
change_page_attr() cases, where the driver authors would like to call it
with high frequency but are unable to do it and see performance problems
due to the WBINVD.
Ingo