lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20170920230910.GA18540@cmpxchg.org>
Date:   Wed, 20 Sep 2017 19:11:46 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     John Stoffel <john@...ffel.org>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, clm@...com,
        jack@...e.cz
Subject: Re: [PATCH 0/6] More graceful flusher thread memory reclaim wakeup

[ Fixed up CC list. John, you're sending email with
  From: John Stoffel <john@...d.stoffel.home> ]

On Wed, Sep 20, 2017 at 01:32:25PM -0600, Jens Axboe wrote:
> On 09/20/2017 01:29 PM, John Stoffel wrote:
> > On Tue, Sep 19, 2017 at 01:53:01PM -0600, Jens Axboe wrote:
> >> We've had some issues with writeback in presence of memory reclaim
> >> at Facebook, and this patch set attempts to fix it up. The real
> >> functional change is the last patch in the series, the first 5 are
> >> prep and cleanup patches.
> >>
> >> The basic idea is that we have callers that call
> >> wakeup_flusher_threads() with nr_pages == 0. This means 'writeback
> >> everything'. For memory reclaim situations, we can end up queuing
> >> a TON of these kinds of writeback units. This can cause softlockups
> >> and further memory issues, since we allocate huge amounts of
> >> struct wb_writeback_work to handle this writeback. Handle this
> >> situation more gracefully.
> > 
> > This looks nice, but do you have any numbers to show how this improves
> > things?  I read the patches, but I'm not strong enough to comment on
> > them at all.  But I am interested in how this improves writeback under
> > pressure, if at all.
> 
> Writeback should be about the same, it's mostly about preventing
> softlockups and excessive memory usage, under conditions where we are
> actively trying to reclaim/clean memory. It was bad enough to cause
> softlockups for writeback work processing, while the pending writeback
> work units grew to insane lengths.

In numbers, we have seen situations where we had 600 million writeback
work items queued up from reclaim under pressure. That's 35G worth of
work descriptors, and the machine was struggling to remain responsive
due to a lack of memory.

Once writeback against all outstanding dirty pages has been requested,
there really isn't a need to queue even a second work item; the job is
already being performed. We can queue the next one when it completes.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ