Message-ID: <096eec91-89a5-6068-acd3-c457d3ebd4d1@redhat.com>
Date: Mon, 18 Jul 2022 12:42:43 +0200
From: David Hildenbrand <david@...hat.com>
To: Vaibhav Jain <vaibhav@...ux.ibm.com>, linux-kernel@...r.kernel.org
Cc: "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Muchun Song <songmuchun@...edance.com>
Subject: Re: [PATCH] resource: re-factor page_is_ram()
[sorry for the late reply]
>
> The improvement to the gcc (v12.1.1) generated code (x86_64) for
> page_is_ram is quite evident.
>
> With the patch:
> 0x0000000000000920 <+0>: call 0x925 <page_is_ram+5>
> 0x0000000000000925 <+5>: shl $0xc,%rdi
> 0x0000000000000929 <+9>: xor %r8d,%r8d
> 0x000000000000092c <+12>: xor %ecx,%ecx
> 0x000000000000092e <+14>: mov $0x81000200,%edx
> 0x0000000000000933 <+19>: lea 0x1(%rdi),%rsi
> 0x0000000000000937 <+23>: call 0x7e0 <find_next_iomem_res>
> 0x000000000000093c <+28>: test %eax,%eax
> 0x000000000000093e <+30>: sete %al
> 0x0000000000000941 <+33>: movzbl %al,%eax
> 0x0000000000000944 <+36>: ret
> 0x0000000000000945 <+37>: int3
>
> Without the patch:
> 0x0000000000001000 <+0>: call 0x1005 <page_is_ram+5>
> 0x0000000000001005 <+5>: shl $0xc,%rdi
> 0x0000000000001009 <+9>: lea 0xfff(%rdi),%rsi
> 0x0000000000001010 <+16>: cmp %rsi,%rdi
> 0x0000000000001013 <+19>: jae 0x1064 <page_is_ram+100>
> 0x0000000000001015 <+21>: sub $0x40,%rsp
> 0x0000000000001019 <+25>: xor %ecx,%ecx
> 0x000000000000101b <+27>: mov $0x81000200,%edx
> 0x0000000000001020 <+32>: mov %rsp,%r8
> 0x0000000000001023 <+35>: call 0x7e0 <find_next_iomem_res>
> 0x0000000000001028 <+40>: test %eax,%eax
> 0x000000000000102a <+42>: jne 0x105a <page_is_ram+90>
> 0x000000000000102c <+44>: mov (%rsp),%rax
> 0x0000000000001030 <+48>: mov $0x1,%ecx
> 0x0000000000001035 <+53>: lea 0xfff(%rax),%rdx
> 0x000000000000103c <+60>: mov 0x8(%rsp),%rax
> 0x0000000000001041 <+65>: shr $0xc,%rdx
> 0x0000000000001045 <+69>: add $0x1,%rax
> 0x0000000000001049 <+73>: shr $0xc,%rax
> 0x000000000000104d <+77>: cmp %rax,%rdx
> 0x0000000000001050 <+80>: jae 0x105a <page_is_ram+90>
> 0x0000000000001052 <+82>: mov %ecx,%eax
> 0x0000000000001054 <+84>: add $0x40,%rsp
> 0x0000000000001058 <+88>: ret
> 0x0000000000001059 <+89>: int3
> 0x000000000000105a <+90>: xor %ecx,%ecx
> 0x000000000000105c <+92>: add $0x40,%rsp
> 0x0000000000001060 <+96>: mov %ecx,%eax
> 0x0000000000001062 <+98>: ret
> 0x0000000000001063 <+99>: int3
> 0x0000000000001064 <+100>: xor %eax,%eax
> 0x0000000000001066 <+102>: ret
> 0x0000000000001067 <+103>: int3
>
> Looking at the disassembly above, gcc has inlined both walk_system_ram_range()
> and __is_ram() into page_is_ram(). This ends up with page_is_ram() calling
> find_next_iomem_res() directly anyway, with a bunch of book-keeping
> afterwards that can be avoided.
We usually don't care about such micro-optimizations unless you can
showcase actual performance numbers. Otherwise we'd have constant,
unnecessary code-churn all over the place.
Most probably, all that list walking dominates the runtime either way.
Feel free to prove me wrong ;)
>>
>> If it doesn't make the code easier to read (at least for me), why do we
>> care?
> IMHO, calling find_next_iomem_res() from page_is_ram() instead of
> calling walk_system_ram_range() makes it easier to trace the path of
> page_is_ram(). Also, the dummy callback makes the code flow seem
> strange initially.
>
I'm not convinced, but I don't care enough to object. I'll add more
review feedback to the patch.
--
Thanks,
David / dhildenb