lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e80cd2e6-6f7c-4337-a170-152926863290@arm.com>
Date:   Wed, 9 Aug 2023 17:08:32 +0100
From:   Ryan Roberts <ryan.roberts@....com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        David Hildenbrand <david@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v4 2/5] mm: LARGE_ANON_FOLIO for improved performance

[...]

>>>> Let me reiterate [1]:
>>>>   My impression is we only agreed on one thing: at the current stage, we
>>>>   should respect things we absolutely have to. We didn't agree on what
>>>>   "never" means ("never 2MB" or "never >4KB"), and we didn't touch on
>>>>   how "always" should behave at all.
>>>>
>>>> And [2]:
>>>>   (Thanks to David, now I agree that) we have to interpret MADV_NOHUGEPAGE
>>>>   as nothing >4KB.
>>>>
>>>> My final take [3]:
>>>>   I agree these points require more discussion. But I don't think we
>>>>   need to conclude them now, unless they cause correctness issues like
>>>>   ignoring MADV_NOHUGEPAGE would.
>>>
>>> Thanks, I've read all of these comments previously, and appreciate the time you
>>> have put into the feedback. I'm not sure I fully agree with your point that we
>>> don't need to conclude on a policy now; I certainly don't think we need the
>>> whole thing in place on day 1, but I do think that whatever we put in should
>>> strive to be a strict subset of where we think we are going. For example, if we
>>> put something in with one policy (i.e. "never" only means "never 2MB") then find
>>> a problem and have to change that to be more conservative, are we risking perf
>>> regressions for any LAF users that started using it on day 1?
>>
>> It's not that I don't want to -- I just don't think we have enough
>> information before we have a wider deployment [1] and gain a better
>> understanding of real-world scenarios.
>>
>> Of course we could force a conclusion, a mostly opinion-based one. But
>> it would still involve prolonged discussions and delay this series, or
>> rush into decisions we might regret later.
>>
>> [1] Our fleets (servers, laptops and phones) support large-scale
>> experiments and I plan to run them on both client and server devices.

This all sounds great and I'm looking forward to seeing results! But I guess I
had been assuming that this sort of testing would be preferable to do before we
merge; that allows us to get confidence in the approach and reduces the changes
of having to change it later. I guess you have policies that prevent you from
testing this series at the scale you want until it is merged?

I'm not convinced this testing will help us answer the "what does never mean?"
question; if nothing breaks in your testing, it doesn't mean there aren't
systems out there that would break - it's hard to prove a negative. I think its
mostly embedded systems that use thp=never to reduce memory footprint to the
absolute minimum?


>>
>>>> But I should have been clear about the parameters to
>>>> hugepage_vma_check(): enforce_sysfs=false.
>>>
>>> So hugepage_vma_check(..., smaps=false, in_pf=true, enforce_sysfs=false) would
>>> give us:
>>>
>>>                 | prctl/fw  | sysfs     | sysfs     | sysfs
>>>                 | disable   | never     | madvise   | always
>>> ----------------|-----------|-----------|-----------|-----------
>>> no hint         | S         | LAF>S     | LAF>S     | THP>LAF>S
>>> MADV_HUGEPAGE   | S         | LAF>S     | THP>LAF>S | THP>LAF>S
>>> MADV_NOHUGEPAGE | S         | S         | S         | S
>>>
>>> Where "prctl/fw disable" trumps the sysfs setting.
>>>
>>> I can certainly see the benefit of this approach; it gives us a way to enable
>>> LAF while disabling THP (thp=never). It doesn't give us a way to enable THP
>>> without enabling LAF though (unless you recompile with LAF disabled). Does
>>> anyone see a problem with this?
>>
>> I do myself :)
>>
>> This is just something temporary to get this series landed. We are
>> hiding behind a Kconfig, not making any ABI changes, and not exposing
>> this policy to userspace (i.e., not updating Documentation/, man
>> pages, etc.)
>>
>> Meanwhile, we can keep discussing all the open questions in parallel.

You're right - don't want to slow down the testing, so I'm going to post a v5
tomorrow with the policy in the table above. We're still waiting for the
prerequisites to land before we can kick off testing in anger though.

> 
> And the stat ABI changes should be discussed before or at the same
> time. If we came up with a policy but there was *zero* observability
> of how well that policy works...

Yep agreed. I have a series at [1] which I hoped would kickstart that discussion.

[1] https://lore.kernel.org/linux-mm/20230613160950.3554675-1-ryan.roberts@arm.com/

Thanks,
Ryan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ