Message-ID: <20180322170842.GG28468@bombadil.infradead.org>
Date: Thu, 22 Mar 2018 10:08:42 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Matthew Wilcox <mawilcox@...rosoft.com>,
Netdev <netdev@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool
On Thu, Mar 22, 2018 at 09:39:40AM -0700, Alexander Duyck wrote:
> So I was just thinking about this and it would probably make more
> sense to look at addressing this after you take care of your
> conversion from size/offset to a mask. One thing with the mask is that
> it should never reach 64K since that is the largest page size if I
> recall. With that being the case we could look at dropping mask to a
> u16 value and then add a u16 flags field where you could store things
> like this. Then you could avoid having to do the masking and math you
> are having to do below.
With the bit being in the top bit, it's actually no maths at all in the
caller; it only looks like it in C. Here's what GCC ends up doing:
e66: e8 00 00 00 00 callq e6b <__netdev_alloc_skb+0x7b>
e67: R_X86_64_PC32 page_frag_alloc-0x4
e6b: 44 8b 3d 00 00 00 00 mov 0x0(%rip),%r15d
...
e8c: 45 85 ff test %r15d,%r15d
e8f: 79 04 jns e95 <__netdev_alloc_skb+0xa5>
e91: 80 48 78 08 orb $0x8,0x78(%rax)
e95: 80 48 76 20 orb $0x20,0x76(%rax)
i.e. it's testing the top bit by looking at the sign bit. If I move it to
the second-top bit (1 << 30), it does this instead:
e66: e8 00 00 00 00 callq e6b <__netdev_alloc_skb+0x7b>
e67: R_X86_64_PC32 page_frag_alloc-0x4
e6b: 44 8b 2d 00 00 00 00 mov 0x0(%rip),%r13d
...
e75: 41 81 e5 00 00 00 40 and $0x40000000,%r13d
...
e93: 45 85 ed test %r13d,%r13d
e96: 74 04 je e9c <__netdev_alloc_skb+0xac>
e98: 80 48 78 08 orb $0x8,0x78(%rax)
e9c: 80 48 76 20 orb $0x20,0x76(%rax)
Changing mask to an unsigned short and adding a bool pfmemalloc to the
struct, I get:
e66: e8 00 00 00 00 callq e6b <__netdev_alloc_skb+0x7b>
e67: R_X86_64_PC32 page_frag_alloc-0x4
e6b: 44 0f b6 3d 00 00 00 movzbl 0x0(%rip),%r15d
e72: 00
...
e8d: 45 84 ff test %r15b,%r15b
e90: 74 04 je e96 <__netdev_alloc_skb+0xa6>
e92: 80 48 78 08 orb $0x8,0x78(%rax)
e96: 80 48 76 20 orb $0x20,0x76(%rax)
That's actually one byte less efficient, because movzbl is one byte longer
than the plain mov.