Message-ID: <4BF94CAD.5010504@redhat.com>
Date: Sun, 23 May 2010 18:41:33 +0300
From: Avi Kivity <avi@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
CC: Rusty Russell <rusty@...tcorp.com.au>,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
qemu-devel@...gnu.org
Subject: Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
On 05/23/2010 06:31 PM, Michael S. Tsirkin wrote:
> On Thu, May 20, 2010 at 02:38:16PM +0930, Rusty Russell wrote:
>
>> On Thu, 20 May 2010 02:31:50 pm Rusty Russell wrote:
>>
>>> On Wed, 19 May 2010 05:36:42 pm Avi Kivity wrote:
>>>
>>>>> Note that this is an exclusive->shared->exclusive bounce only, too.
>>>>>
>>>>>
>>>> A bounce is a bounce.
>>>>
>>> I tried to measure this to show that you were wrong, but I was only able
>>> to show that you're right. How annoying. Test code below.
>>>
>> This time for sure!
>>
>
> What do you see?
> On my laptop:
> [mst@...k testring]$ ./rusty1 share 0 1
> CPU 1: share cacheline: 2820410 usec
> CPU 0: share cacheline: 2823441 usec
> [mst@...k testring]$ ./rusty1 unshare 0 1
> CPU 0: unshare cacheline: 2783014 usec
> CPU 1: unshare cacheline: 2782951 usec
> [mst@...k testring]$ ./rusty1 lockshare 0 1
> CPU 1: lockshare cacheline: 1888495 usec
> CPU 0: lockshare cacheline: 1888544 usec
> [mst@...k testring]$ ./rusty1 lockunshare 0 1
> CPU 0: lockunshare cacheline: 1889854 usec
> CPU 1: lockunshare cacheline: 1889804 usec
>
Ugh, can the timing be normalized per operation? This is unreadable.
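For example, if each of those runs did 10 million bounces (the iteration
count isn't quoted here, so that's a guess), the 2820410 usec "share"
result would come out to roughly 282 nsec per bounce, which is a number
one can actually compare across runs and machines.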
> So the locked version seems to be faster than the unlocked one,
> and share/unshare doesn't seem to matter?
>
Maybe the processor is using the LOCK operation as a hint to
reserve the cacheline for a bit.
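To make the comparison concrete, the two access patterns look roughly
like this (a minimal sketch, not Rusty's actual test code; the iteration
count and the concurrent-hammering structure are my own simplifications):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define NITER 10000000UL	/* made-up iteration count */

/* one cacheline, shared by both threads */
static struct {
	volatile unsigned long val;
} line __attribute__((aligned(64)));

static int use_lock;

static void *worker(void *arg)
{
	int cpu = (int)(long)arg;
	cpu_set_t set;
	unsigned long i;

	/* pin this thread to the requested cpu */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	/* both threads hammer the same line concurrently; the plain
	 * increment is racy, but bouncing the line is the whole point */
	for (i = 0; i < NITER; i++) {
		if (use_lock)
			__sync_fetch_and_add(&line.val, 1);	/* lock xadd */
		else
			line.val++;				/* plain rmw */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t t1, t2;
	struct timeval start, end;
	unsigned long usec;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <cpu1> <cpu2> [lock]\n", argv[0]);
		return 1;
	}
	use_lock = (argc > 3);

	gettimeofday(&start, NULL);
	pthread_create(&t1, NULL, worker, (void *)(long)atoi(argv[1]));
	pthread_create(&t2, NULL, worker, (void *)(long)atoi(argv[2]));
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	gettimeofday(&end, NULL);

	usec = (end.tv_sec - start.tv_sec) * 1000000UL +
	       (end.tv_usec - start.tv_usec);
	printf("%s: %lu usec total, %.1f nsec per op (per thread)\n",
	       use_lock ? "locked" : "plain", usec, usec * 1000.0 / NITER);
	return 0;
}

Compile with gcc -O2 -pthread; running it as "./bounce 0 1" vs
"./bounce 0 1 lock" on the same CPU pairs should reproduce the
locked-vs-plain gap above, if the effect is real.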
> same on a workstation:
> [root@...19 ~]# ./rusty1 unshare 0 1
> CPU 0: unshare cacheline: 6037002 usec
> CPU 1: unshare cacheline: 6036977 usec
> [root@...19 ~]# ./rusty1 lockunshare 0 1
> CPU 1: lockunshare cacheline: 5734362 usec
> CPU 0: lockunshare cacheline: 5734389 usec
> [root@...19 ~]# ./rusty1 lockshare 0 1
> CPU 1: lockshare cacheline: 5733537 usec
> CPU 0: lockshare cacheline: 5733564 usec
>
> using another pair of CPUs gives more drastic
> results:
>
> [root@...19 ~]# ./rusty1 lockshare 0 2
> CPU 2: lockshare cacheline: 4226990 usec
> CPU 0: lockshare cacheline: 4227038 usec
> [root@...19 ~]# ./rusty1 lockunshare 0 2
> CPU 0: lockunshare cacheline: 4226707 usec
> CPU 2: lockunshare cacheline: 4226662 usec
> [root@...19 ~]# ./rusty1 unshare 0 2
> CPU 0: unshare cacheline: 14815048 usec
> CPU 2: unshare cacheline: 14815006 usec
>
>
That's expected. Hyperthread siblings will be fastest (shared L1), cores
sharing an L2/L3 will be slower, and cross-socket will suck.
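An easy way to check which pair is which (a small sketch reading the
standard sysfs topology files; the number of CPUs to scan is hardcoded):

#include <stdio.h>

static void show_cpu(int cpu)
{
	const char *files[] = { "thread_siblings_list", "physical_package_id" };
	char path[128], buf[64];
	FILE *f;
	int i;

	for (i = 0; i < 2; i++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/%s",
			 cpu, files[i]);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(buf, sizeof(buf), f))
			printf("cpu%d %s: %s", cpu, files[i], buf);
		fclose(f);
	}
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < 4; cpu++)
		show_cpu(cpu);
	return 0;
}

CPUs listed as thread siblings share an L1; CPUs with the same package
id are on the same socket.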
--
error compiling committee.c: too many arguments to function