Message-ID: <CALvZod6k-pwbVRFis0QyGeQbAdmBHx2V0suD_7r-0OTfdxJhGA@mail.gmail.com>
Date: Mon, 15 Jan 2018 09:04:58 -0800
From: Shakeel Butt <shakeelb@...gle.com>
To: Andrey Ryabinin <aryabinin@...tuozzo.com>
Cc: Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Cgroups <cgroups@...r.kernel.org>, Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes
On Mon, Jan 15, 2018 at 4:29 AM, Andrey Ryabinin
<aryabinin@...tuozzo.com> wrote:
>
>
> On 01/13/2018 01:57 AM, Shakeel Butt wrote:
>> On Fri, Jan 12, 2018 at 4:24 AM, Michal Hocko <mhocko@...nel.org> wrote:
>>> On Fri 12-01-18 00:59:38, Andrey Ryabinin wrote:
>>>> On 01/11/2018 07:29 PM, Michal Hocko wrote:
>>> [...]
>>>>> I do not think so. Consider that this reclaim races with other
>>>>> reclaimers. Now you are reclaiming a large chunk so you might end up
>>>>> reclaiming more than necessary. SWAP_CLUSTER_MAX would keep the over
>>>>> reclaim negligible.
>>>>>
>>>>
>>>> I did consider this. And I think I already explained that sort of race in a previous email.
>>>> Whether "Task B" is really a task in the cgroup or actually a bunch of reclaimers
>>>> doesn't matter. That doesn't change anything.
>>>
>>> I would _really_ prefer two patches here. The first one removing the
>>> hard coded reclaim count. That thing is just dubious at best. If you
>>> _really_ think that the higher reclaim target is meaningful then make
>>> it a separate patch. I am not convinced but I will not nack it either.
>>> But it will make our life much easier if my over reclaim concern is
>>> right and we need to revert it. Conceptually those two changes are
>>> independent anyway.
>>>
>>
>> Personally I feel that the cgroup-v2 semantics are much cleaner for
>> setting the limit. There is no race with the allocators in the memcg,
>> though the oom-killer can be triggered. For cgroup-v1, the user does not
>> expect the OOM killer, and EBUSY is expected on unsuccessful reclaim. How
>> about we do something similar here and make sure the oom killer cannot be
>> triggered for the given memcg?
>>
>> // pseudo code
>> disable_oom(memcg)
>> old = xchg(&memcg->memory.limit, requested_limit)
>>
>> reclaim memory until usage gets below new limit or retries are exhausted
>>
>> if (unsuccessful) {
>>         reset_limit(memcg, old)
>>         ret = EBUSY
>> } else
>>         ret = 0
>> enable_oom(memcg)
>>
>> This way there is no race with the allocators and the oom killer will not
>> be triggered. The processes in the memcg may suffer, but that should be
>> within the user's expectations. One disclaimer though: disabling
>> oom for a memcg needs more thought.
>
> That might be worse. If the limit is too low, all allocations (except __GFP_NOFAIL of course) will start
> failing, and the kernel is not always careful enough in its -ENOMEM handling.
> Also, it's not much different from oom-killing everything; the end result is almost the same -
> nothing will work in that cgroup.
>
By disabling memcg oom, I meant treating all allocations from that
memcg as __GFP_NOFAIL for as long as oom is disabled. I will see if I
can convert this into actual code; a rough sketch is below.
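
Something along these lines (completely untested sketch;
memcg_block_oom()/memcg_unblock_oom() do not exist yet and are the part
that "needs more thought" -- while that flag is set, charges from the
memcg would behave as if they were __GFP_NOFAIL instead of invoking the
memcg oom killer):

#include <linux/memcontrol.h>
#include <linux/page_counter.h>
#include <linux/sched/signal.h>
#include <linux/swap.h>

static int mem_cgroup_resize_limit_no_oom(struct mem_cgroup *memcg,
					  unsigned long new_limit)
{
	int retries = MEM_CGROUP_RECLAIM_RETRIES;
	unsigned long old_limit;
	int ret = 0;

	/* Hypothetical: charges fall back to __GFP_NOFAIL behaviour. */
	memcg_block_oom(memcg);

	/* Install the new limit up front, as in the pseudo code above. */
	old_limit = xchg(&memcg->memory.limit, new_limit);

	/* Reclaim until usage drops below the new limit or we give up. */
	while (page_counter_read(&memcg->memory) > new_limit) {
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}
		if (!retries--) {
			ret = -EBUSY;
			break;
		}
		try_to_free_mem_cgroup_pages(memcg, SWAP_CLUSTER_MAX,
					     GFP_KERNEL, true);
	}

	/* Could not get below the requested limit: roll back. */
	if (ret)
		xchg(&memcg->memory.limit, old_limit);

	/* Hypothetical: re-enable the memcg oom killer. */
	memcg_unblock_oom(memcg);

	return ret;
}

In real code the limit update would of course have to go through
memcg_limit_mutex / page_counter_limit() rather than a bare xchg(), and
the interaction of the oom-block flag with remote charges still needs to
be sorted out.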
>
>> Shakeel
>>