Message-ID: <20120608121334.GA20772@lizard>
Date:	Fri, 8 Jun 2012 05:13:34 -0700
From:	Anton Vorontsov <anton.vorontsov@...aro.org>
To:	leonid.moiseichuk@...ia.com
Cc:	kosaki.motohiro@...il.com, penberg@...nel.org,
	b.zolnierkie@...sung.com, john.stultz@...aro.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	linaro-kernel@...ts.linaro.org, patches@...aro.org,
	kernel-team@...roid.com
Subject: Re: [PATCH 2/5] vmevent: Convert from deferred timer to deferred work

On Fri, Jun 08, 2012 at 11:03:29AM +0000, leonid.moiseichuk@...ia.com wrote:
> > -----Original Message-----
> > From: ext Anton Vorontsov [mailto:cbouatmailru@...il.com]
> > Sent: 08 June, 2012 13:35
> ...
> > > Context switches, parsing, activity in userspace, even though the
> > > memory situation has not changed.
> > 
> > Sure, there is some additional overhead. I'm just saying that it is not drastic. It
> > would be like 100 sprintfs + 100 sscanfs + 2 context switches? Well, it is
> > unfortunate... but come on, today's phones are running X11 and Java. :-)
> 
> Vmstat generation is not so trivial, and meminfo has even higher overhead.
> I just checked the generation time on an idling device with an open/read test:
> - vmstat:  min 30, avg 94, max 2746 usec
> - meminfo: min 30, avg 65, max 15961 usec
> 
> For comparison, /proc/version under the same conditions: min 30, avg 41, max 1505 usec

Hm. I would have expected the average for meminfo to be much worse
than vmstat's (meminfo grabs some locks).
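
FWIW, the kind of open/read timing test above boils down to something
like this (a rough userland sketch, not necessarily the exact test that
produced those numbers):

/* Rough timing of a single open+read of a /proc file (sketch only). */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

static long read_usecs(const char *path)
{
	char buf[8192];
	struct timespec a, b;
	int fd;

	clock_gettime(CLOCK_MONOTONIC, &a);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	while (read(fd, buf, sizeof(buf)) > 0)
		;
	close(fd);
	clock_gettime(CLOCK_MONOTONIC, &b);

	return (b.tv_sec - a.tv_sec) * 1000000L +
	       (b.tv_nsec - a.tv_nsec) / 1000;
}

int main(void)
{
	printf("vmstat:  %ld us\n", read_usecs("/proc/vmstat"));
	printf("meminfo: %ld us\n", read_usecs("/proc/meminfo"));
	return 0;
}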

OK, if we consider a 100 ms sampling interval, then this would be
roughly 0.1% overhead (~94 usec average per read out of every
100,000 usec)? Not great, but still better than memcg:

http://lkml.org/lkml/2011/12/21/487 

:-)

Personally, I'm all for saving that 0.1%, and I'm all for vmevent.
But, for example, it's still broken on SMP, since keeping vm_stat
updated is costly, and I see no way to fix this.

So, I guess the right approach would be to find ways to not depend on
frequent vm_stat updates (and thus reads).

For the userland solution, deferred timers (with infrequent reads of
/proc/vmstat) plus "userland vm pressure notifications" look
promising.
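
The "deferred timer + infrequent reads" half could be as simple as the
sketch below (the 1 s interval, the 500 ms slack, and the use of
poll() as the sleeping primitive are all just illustrative):

/*
 * Sketch of the userland sampling loop.  PR_SET_TIMERSLACK only makes
 * the timeout sloppy so the kernel can batch our wakeup with others;
 * it does not change what we read.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <poll.h>
#include <sys/prctl.h>

int main(void)
{
	char buf[8192];

	/* Let the kernel defer our wakeups by up to ~500 ms. */
	prctl(PR_SET_TIMERSLACK, 500000000UL);

	for (;;) {
		/* "Deferred timer": a sloppy 1 s sleep via poll(). */
		poll(NULL, 0, 1000);

		int fd = open("/proc/vmstat", O_RDONLY);
		if (fd >= 0) {
			while (read(fd, buf, sizeof(buf)) > 0)
				;	/* parse the counters we care about */
			close(fd);
		}
	}
}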

The in-kernel solution looks much the same: a deferred timer that
reads vm_stat occasionally (the no-pressure case) plus in-kernel
shrinker notifications for fast reaction under pressure.
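
Roughly like this on the kernel side (a sketch against a ~3.4-era
workqueue API, not the actual vmevent code; the sampling period and
the notification step are placeholders):

/*
 * Sketch only: a deferrable delayed work item that samples vm_stat
 * now and then without forcing wakeups of an idle CPU.
 */
#include <linux/workqueue.h>
#include <linux/vmstat.h>
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/kernel.h>
#include <linux/init.h>

static struct delayed_work sample_work;

static void sample_fn(struct work_struct *work)
{
	unsigned long free = global_page_state(NR_FREE_PAGES);

	/* ... compare against thresholds, notify userland if needed ... */
	pr_debug("free pages: %lu\n", free);

	/* Deferrable timer: won't wake an idle CPU just for this. */
	schedule_delayed_work(&sample_work, round_jiffies_relative(HZ));
}

static int __init sampler_init(void)
{
	INIT_DELAYED_WORK_DEFERRABLE(&sample_work, sample_fn);
	schedule_delayed_work(&sample_work, HZ);
	return 0;
}
late_initcall(sampler_init);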

> > > In kernel space you can use a sliding timer (increasing interval) + shrinker.
> > 
> > Well, with Minchan's idea we can get shrinker notifications into
> > userland, so the sliding timer thing would still be possible.
> 
> Only as post-shrinker actions. Under memory stress, or close to it,
> shrinkers are called very often; I have seen up to 50 times per
> second.

Well, yes. But in userland you would just poll/select on the shrinker
notification fd, so you won't get more than you can (or want to) process.
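
I.e. the consumer would look something like this (a sketch; how the
notification fd is obtained is made up here, since that interface
doesn't exist yet):

/*
 * Sketch of the consumer side.  "/dev/vmpressure" is a placeholder;
 * the point is only that poll() paces us: while we are busy handling
 * one notification, further shrinker activity just keeps the fd
 * readable, it does not queue up 50 events per second for us.
 */
#include <fcntl.h>
#include <unistd.h>
#include <poll.h>

int main(void)
{
	int fd = open("/dev/vmpressure", O_RDONLY);	/* placeholder */
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[64];

	for (;;) {
		poll(&pfd, 1, -1);
		read(fd, buf, sizeof(buf));	/* drain/ack the event */

		/* react: drop caches, kill something, etc. */
	}
}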

-- 
Anton Vorontsov
Email: cbouatmailru@...il.com
