lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <nywd7rix6qapogaqg5kxny3k272ptpzi4qv5q2dqts42bqbxrw@te5enoxvdhti>
Date: Thu, 29 Feb 2024 16:16:00 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Brian Foster <bfoster@...hat.com>
Cc: linux-bcachefs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	djwong@...nel.org
Subject: Re: [PATCH 04/21] bcachefs: Disk space accounting rewrite

On Thu, Feb 29, 2024 at 01:44:27PM -0500, Brian Foster wrote:
> On Wed, Feb 28, 2024 at 11:10:12PM -0500, Kent Overstreet wrote:
> > I think it ended up not needing to be moved, and I just forgot to drop
> > it - originally I disallowed accounting entries that referenced
> > nonexistent devices, but that wasn't workable so now it's only nonzero
> > accounting keys that aren't allowed to reference nonexistent devices.
> > 
> > I'll see if I can delete it.
> > 
> 
> Do you mean to delete the change that moves the call, or the flush call
> entirely?

Delte the change, I think there's further cleanup (& probably bugs to
fix) possible with that flush call but I'm not going to get into it
right now.

> > +/*
> > + * Notes on disk accounting:
> > + *
> > + * We have two parallel sets of counters to be concerned with, and both must be
> > + * kept in sync.
> > + *
> > + *  - Persistent/on disk accounting, stored in the accounting btree and updated
> > + *    via btree write buffer updates that treat new accounting keys as deltas to
> > + *    apply to existing values. But reading from a write buffer btree is
> > + *    expensive, so we also have
> > + *
> 
> I find the wording a little odd here, and I also think it would be
> helpful to explain how/from where the deltas originate. For example,
> something along the lines of:
> 
> "Persistent/on disk accounting, stored in the accounting btree and
> updated via btree write buffer updates. Accounting updates are
> represented as deltas that originate from <somewhere? trans triggers?>.
> Accounting keys represent these deltas through commit into the write
> buffer. The accounting/delta keys in the write buffer are then
> accumulated into the appropriate accounting btree key at write buffer
> flush time."

yeah, that's worth including.

There's an interesting point that you're touching on; btree write buffer
are always dependent state changes from some other (non write buffer)
btree; we never look at a write buffer btree and generate an update
there - we can't, reading from a write buffer btree doesn't get you
anything consistent or up to date.

So in normal operation it really only makes sense to do write buffer
updates from a transactional trigger - that's the only way to use them
and have them be consistent with the resst of the filesystem.

And since triggers work by comparing old and new, they naturally
generate updates that are deltas.

> > + *  - In memory accounting, where accounting is stored as an array of percpu
> > + *    counters, indexed by an eytzinger array of disk acounting keys/bpos (which
> > + *    are the same thing, excepting byte swabbing on big endian).
> > + *
> 
> Not really sure about the keys vs. bpos thing, kind of related to my
> comments on the earlier patch. It might be more clear to just elide the
> implementation details here, i.e.:
> 
> "In memory accounting, where accounting is stored as an array of percpu
> counters that are cheap to read, but not persistent. Updates to in
> memory accounting are propagated from the transaction commit path."
> 
> ... but NBD, and feel free to reword, drop and/or correct any of that
> text.

It's there because bch2_accounting_mem_read() takes a bpos when it
should be a disk_accounting_key. I'll fix that if I can...

> > + *    Cheap to read, but non persistent.
> > + *
> > + * To do a disk accounting update:
> > + * - initialize a disk_accounting_key, to specify which counter is being update
> > + * - initialize counter deltas, as an array of 1-3 s64s
> > + * - call bch2_disk_accounting_mod()
> > + *
> > + * This queues up the accounting update to be done at transaction commit time.
> > + * Underneath, it's a normal btree write buffer update.
> > + *
> > + * The transaction commit path is responsible for propagating updates to the in
> > + * memory counters, with bch2_accounting_mem_mod().
> > + *
> > + * The commit path also assigns every disk accounting update a unique version
> > + * number, based on the journal sequence number and offset within that journal
> > + * buffer; this is used by journal replay to determine which updates have been
> > + * done.
> > + *
> > + * The transaction commit path also ensures that replicas entry accounting
> > + * updates are properly marked in the superblock (so that we know whether we can
> > + * mount without data being unavailable); it will update the superblock if
> > + * bch2_accounting_mem_mod() tells it to.
> 
> I'm not really sure what this last paragraph is telling me, but granted
> I've not got that far into the code yet either.

yeah that's for a whole different subsystem that happens to be slaved to
the accounting - replicas.c, which also used to help out quite a bit
with the accounting but now it's pretty much just for managing the
superblock replicas section.

The superblock replicas section is just a list of entries, where each
entry is a list of devices - "there is replicated data present on this
set of devices". We also have full counters of how much data is present
replicated across each set of devices, so the superblock section is just
a truncated version of the accounting - "data exists on these devices",
instead of saying how much.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ