netdev - Re: [PATCH] ipv4: move rt_genid to different cache line

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+mtBx_HKz3BNF=_FRTS-a-JZj9ss5TWaJU9d=oFewZbCu1_NQ@mail.gmail.com>
Date:	Fri, 3 Jan 2014 10:01:47 -0800
From:	Tom Herbert <therbert@...gle.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH] ipv4: move rt_genid to different cache line

On Thu, Jan 2, 2014 at 4:38 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Thu, 2014-01-02 at 12:00 -0800, Tom Herbert wrote:
>> Running a simple netperf TCP_RR test with 200 clients shows that
>> ipv4_dst_check is high on the list of functions in 'perf top'. The
>> pertinent action in this function is in the call to rt_is_expired
>> which checks the route genid (rt->rt_gentid) against the global value.
>> rt_genid is in the same cacheline as dst->__refcnt which is causing
>> false sharing.
>>
>> This fix moves rt_genid into the first cacheline of the dst structure.
>> The dst structue is explicitly packed for cacheline optimization, so to
>> make room for the genid, I moved xfrm to cacheline with __refcnt (under
>> the assumption it is less likely to be in the critical path).
>
> This looks like a difficult to check assumption to me ....
>
> rcu_head surely could be moved without performance impact.
>
Yes, I had thought about that. It's two pointers so this will generate
a extra hole, probably not worth moving it for 64 bit pointer case I
suppose. I can try a patch for that.

> Again, optimizing network stack for 200 TCP_RR special case is very
> questionable. The dst must be refcounted because of special prequeue
> mode, where a thread is blocked in a recvmsg() call.
>
If you can suggest an alternate test we should be running that
reflects the "real world", I can run that.

> Any modern application handling lot of sockets uses poll()/epoll()
> anyway and prequeue is disabled : No refcount is taken, as the dst is
> released (no refcount touch because of RCU) before the skb is queued
> into socket receive queue.
>
Notwithstanding that we can't feasibly audit a million applications to
verify that they follow the "correct" design, it is still common to
dispatch individual sockets to worker threads which will do blocking
reads for long lived requests.

It seems to me you have more of an issue with prequeue itself. Maybe
it is worth considering removing it to eliminate the disparate
behavior between read and poll+read, but then removing a long lived
optimization has is own pitfalls.

>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html