linux-kernel - Write-back from inside FS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <46FCC686.3050009@yandex.ru>
Date:	Fri, 28 Sep 2007 12:16:54 +0300
From:	Artem Bityutskiy <dedekind@...dex.ru>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Write-back from inside FS - need suggestions

Hi,

we are writing anew flash FS (UBIFS) and need some advise/suggestion.
Brief FS info and the code are available at
http://www.linux-mtd.infradead.org/doc/ubifs.html.

At any point of time we may have a plenty of cached stuff which have to
be written back later to the flash media: dirty pages an dirty inodes.
This is what we call "liability" - current set of dirty pages and
inodes UBIFS must be able to write back on demand.

The problem is that we cannot do accurate flash space accounting due
to several reasons:
1. Wastage - some smal random amount of flash space at ends or
  eraseblocks cannot be used.
2. Compression - we do not know how well will the pages be compressed,
  so we do not know how much flash space will they consume.

So, if our current liability is X, we do not know exactly how much
flash space (Y) it will take. All we can do is to introduce some
pessimistic, worst-case function Y = F(X). This pessimistic function
assumes that pages won't be compressible, and it assumes worst-case
wastage. In real life it is hardly going to happen, but possible.
The functiion is really bad and may lead to huge over-estimations
like 40%.

So, if we are, say, in ->prepare_write(), we have to decide whether
there is enough flash space to write-back this page later. We do not
want to fail with -ENOSPC when,say, pdflush writes the page back. So
we use our pessimistic function F(X) to decide whether we have enough
space or not. If there is a plenty of flash space, the F(X) says "yes",
and just we proceed. The question is what do we do if F(X) says "no"?

If we just return -ENOSPC, the flash space utilization becomes too
poor, just because F() is really rough. We do have space in most
real-life cases. All we have to do in this case is to lessen our
liability. IOW, we have to flush few dirty inodes/pages, then we'd
be able to proceed.

So my question is: how can we flush _few_ oldest dirty pages/inodes
while we are inside UBIFS (e.g., in ->prepare_write(), ->mkdir(),
->link(), etc)?

I failed to find VFS calls which would do this. Stuff like
sync_sb_inodes() is not exactly what we need. Should we implement
a similar function? Since we have to call it from inside UBIFS, which
means we are holding i_mutex and the inode is locked, the function
has to be smart enough not to wait on this inode, but wait on other
inodes if needed.

A solution like kicking pdflush to do the job and wait on a waitqueue
would probably also work, but I'd prefer to do this from the context
of current task.

Should we have our own list of inodes and call write_inode_now() for
dirty ones? But I'd prefer to let VFS pick oldest victims.

So I'm asking for ideas which would work and be acceptable by the
community later.

Thanks! 

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/