linux-kernel - Re: [PATCH v2] vmpressure: implement strict mode

Message-ID: <CAOK=xROD2AKbgw4V65ddqWFODtn4B1-uYG-NF==oANqVFmZZtg@mail.gmail.com>
Date:	Mon, 1 Jul 2013 17:22:36 +0900
From:	Hyunhee Kim <hyunhee.kim@...sung.com>
To:	Anton Vorontsov <anton@...msg.org>
Cc:	Luiz Capitulino <lcapitulino@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, mhocko@...e.cz, kmpark@...radead.org
Subject: Re: [PATCH v2] vmpressure: implement strict mode

2013/6/29 Anton Vorontsov <anton@...msg.org>:
> On Fri, Jun 28, 2013 at 03:44:02PM -0400, Luiz Capitulino wrote:
>> > Why can't you use poll() and demultiplex the events? Check if there is an
>> > event in the crit fd, and if there is, then just ignore all the rest.
>>
>> This may be a valid workaround for current kernels, but application
>> behavior will be different among kernels with a different number of
>> events.
>
> This is not a workaround, this is how poll works, and this is kinda
> expected... But not that I had this plan in mind when I was designing the
> current scheme... :)
>
>> Say, we add events on top of critical. Then crit fd will now be
>> notified for cases where it didn't use to on older kernels.
>
> I'm not sure I am following here... but thinking about it more, I guess
> the extra read() will be needed anyway (to reset the counter).
>
>> > > However, it *is* possible to make non-strict work on strict if we make
>> > > strict default _and_ make reads on memory.pressure_level return
>> > > available events. Just do this on app initialization:
>> > >
>> > > for each event in memory.pressure_level; do
>> > >   /* register eventfd to be notified on "event" */
>> > > done
>> >
>> > This scheme registers "all" events.
>>
>> Yes, because I thought that's the use-case that matters for activity
>> manager :)
>
> Some activity managers use only low levels (Android), some might use only
> medium levels (simple load-balancing).

When a platform like Android uses only the "low" level, is the
following the process you intended when designing vmpressure?

1. the activity manager receives "low" level events
2. it reads and checks the current memory state (e.g. available
memory) using vmstat
3. if the available memory is not under the threshold (defined e.g. by
the activity manager), the activity manager does nothing
4. if the available memory is under the threshold, the activity manager
handles it by e.g. reclaiming memory or killing processes
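
As a rough illustration, steps 2-4 above could be sketched as follows.
This is only a sketch under stated assumptions: the helper names, the
sample /proc/meminfo-style text, and the idea of counting MemFree +
Buffers + Cached as "available" are all hypothetical choices, not
anything defined by vmpressure.

```python
def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a {field: kB} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def estimate_available_kb(info):
    # Assumption: free pages plus buffers and page cache are treated
    # as "available". A real activity manager may define this differently.
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def on_low_event(meminfo_text, threshold_kb):
    """Steps 2-4: check memory after a "low" event and pick an action."""
    available = estimate_available_kb(parse_meminfo_kb(meminfo_text))
    if available >= threshold_kb:
        return "ignore"           # step 3: not under the threshold
    return "reclaim_or_kill"      # step 4: under the threshold


# Made-up sample values, for illustration only.
SAMPLE = """MemTotal: 1024000 kB
MemFree: 20000 kB
Buffers: 5000 kB
Cached: 75000 kB
"""
```

With the sample above, the estimate is 100000 kB, so a 50000 kB
threshold leads to "ignore" and a 200000 kB threshold leads to
"reclaim_or_kill".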

When I first saw vmpressure, I thought that I should register all
events ("low", "medium", and "critical") and use a different handler
for each event. However, without something like strict mode, I would
see too many events. So now I think that it is better to use only one
level and run each handler after checking available memory, as you
mentioned.
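
For comparison, the poll()-and-demultiplex approach quoted at the top
(listen on several levels, but act only on the strongest one that
fired) could look roughly like the sketch below. The names SEVERITY,
strongest_level and dispatch are hypothetical, and the set of ready
levels is assumed to come from whatever poll() loop the application
already has.

```python
# Levels ordered from weakest to strongest.
SEVERITY = ("low", "medium", "critical")


def strongest_level(ready_levels):
    """Return the most severe level among those that fired, or None."""
    ready = set(ready_levels)
    for level in reversed(SEVERITY):
        if level in ready:
            return level
    return None


def dispatch(ready_levels, handlers):
    """Run only the handler for the strongest ready level.

    Note: in a real program every ready eventfd would still need a
    read() to reset its counter, even if its handler is skipped.
    """
    level = strongest_level(ready_levels)
    if level is None:
        return None
    return handlers[level]()
```

This keeps one handler per level while avoiding running all of them
when a stronger event also wakes up the weaker listeners.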

But,

1. Isn't it overhead to read the event and check the memory state every
time we receive an event?
    - Sometimes, even when there is a lot of available memory, a low
level event can occur if most of that memory is reclaimable rather than
free pages.
    - Don't most platforms use available memory to judge their
current memory state? Is there any reason vmpressure uses the reclaim
rate? IMO, the activity manager wouldn't have to check available memory
if it could receive a signal based on available memory.

2. If we use only "medium" to avoid the overhead incurred when using
the "low" level, isn't it possible to miss events when there is
little available memory but the reclaim ratio is high?
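
To make questions 1 and 2 concrete, here is a small sketch of how the
level is derived from the reclaim ratio, as far as I recall it from
mm/vmpressure.c. The formula and the thresholds (60 for medium, 95 for
critical) are written from memory and should be treated as assumptions
to check against the actual source; the function names are made up.

```python
# Assumed thresholds, in percent (to be verified against the kernel).
LEVEL_MED = 60
LEVEL_CRITICAL = 95


def pressure_percent(scanned, reclaimed):
    """Pressure in percent from pages scanned and pages reclaimed."""
    if scanned == 0:
        return 0  # nothing was scanned, so there is no pressure to report
    scale = scanned + reclaimed
    # Integer arithmetic, mirroring the scaled computation as recalled.
    pressure = scale - (reclaimed * scale // scanned)
    return pressure * 100 // scale


def pressure_level(scanned, reclaimed):
    """Map the pressure percentage onto low / medium / critical."""
    p = pressure_percent(scanned, reclaimed)
    if p >= LEVEL_CRITICAL:
        return "critical"
    if p >= LEVEL_MED:
        return "medium"
    return "low"
```

Note that the amount of free or available memory does not appear in
this calculation at all. For example, pressure_level(512, 256) stays
at "low" because half of the scanned pages were reclaimed, no matter
how little memory is actually free, which is the case question 2 is
asking about.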

IMHO, we cannot consider and cover all the use cases, but considering
some use cases and giving some guidance on how to use vmpressure would
help many platforms adopt it for their low memory manager.

Thanks,
Hyunhee Kim.

>
> Being able to register only "all" does not make sense to me.
>
>> > Here is more complicated case:
>> >
>> > Old kernels, pressure_level reads:
>> >
>> >   low, med, crit
>> >
>> > The app just wants to listen for med level.
>> >
>> > New kernels, pressure_level reads:
>> >
>> >   low, FOO, med, BAR, crit
>> >
>> > How would application decide which of FOO and BAR are ex-med levels?
>>
>> What did you mean by ex-med?
>
> The scale is continuous and non-overlapping. If you add some other level,
> you are effectively "shrinking" other levels, so the ex-med in the list above
> might correspond to "FOO, med" or "med, BAR" or "FOO, med, BAR", and that
> is exactly the problem.
>
>> Let's not over-design. I agree that allowing the API to be extended
>> is a good thing, but we shouldn't give complex meaning to events,
>> otherwise we're better with a numeric scale instead.
>
> I am not over-designing at all. Again, you did not provide any solution for
> the case if we are going to add a new level. Thing is, I don't know if we
> are going to add more levels, but the three-levels scheme is not something
> scientifically proven, it is just an arbitrary thing we made up. We may
> end up with four, or five... or not.
>
> Thanks,
>
> Anton