Message-ID: <4882517F.6080407@tuffmail.co.uk>
Date:	Sat, 19 Jul 2008 21:41:35 +0100
From:	Alan Jenkins <alan-jenkins@...fmail.co.uk>
To:	Alexey Starikovskiy <astarikovskiy@...e.de>
CC:	Henrique de Moraes Holschuh <hmh@....eng.br>,
	linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race
 condition)

Alexey Starikovskiy wrote:
> Alan,

Thanks for your robust response.  Sorry for not discussing this more in
the first place.  I wanted to write the code that fixes my system first,
and post it as soon as possible.  I should have flagged the post as
"not an immediate request for submission".

> I don't think this is a very good idea -- now every interrupt will add
> a new entry to the workqueue, and given the rate of this interrupt, it
> could become quite long.
>

I confess I gave up too quickly on the synchronisation issues and
resorted to brute force.  Patch #1 prevents a narrow race:

- acpi_ec_write_cmd() (QUERY)
- EC writes 0 (no event) to input buffer
- new event arrives, triggering a GPE interrupt which is ignored because
QUERY_PENDING is set.
- QUERY_PENDING is cleared (but too late).

Now we have ignored an event; its handling is delayed until the next GPE
interrupt.
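
To make the window concrete, here's the interleaving as I read the
current code (a sketch - identifiers are from my reading of
drivers/acpi/ec.c and may not be exact):

/*  query path                          GPE path
 *
 *  write ACPI_EC_COMMAND_QUERY
 *  read back 0 ("no event")
 *                                      new event arrives
 *                                      GPE handler runs:
 *                                      test_and_set_bit(QUERY_PENDING)
 *                                        already set -> no query queued
 *  clear_bit(EC_FLAGS_QUERY_PENDING)
 *    (too late - the event is stranded until the next GPE)
 */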

The original reason for patch #1 was to stop patch #2 driving me
insane.  If you apply the second patch on its own, you break
EC_FLAGS_QUERY_PENDING.  It gets cleared as soon as the _first_ query
command is submitted.  It works, but it means the flag no longer has a
clearly defined meaning.  I tried to fix that by only clearing it once
the loop has finished - and then realised I was widening a pre-existing
race condition.

As you say, I took the easy way out and pushed it all into the
workqueue.  So the workqueue ends up as a buffer of GPE occurrences.  I
did look at reducing the memory usage while avoiding race conditions,
but I couldn't find a reasonable solution.  Looking at it again now,
I have a better one:

...
    /* moved here from acpi_ec_transaction_unlocked() */
    clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);

    /* drain every buffered event, not just the first one */
    while (acpi_ec_query(ec, &value) == 0)
        acpi_ec_gpe_run_handler(ec, value);
}

The PENDING flag then reflects whether or not there's an item on the
workqueue.  Does that make sense?
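
For reference, the interrupt side would look something like this (again
a sketch - acpi_os_execute() is how the current code queues the query
work, but treat the details as illustrative):

static u32 acpi_ec_gpe_handler(void *data)
{
    struct acpi_ec *ec = data;

    if (acpi_ec_read_status(ec) & ACPI_EC_FLAG_SCI) {
        /* Set only when no work item is outstanding.  The work
         * function clears the bit before it starts querying, so an
         * event arriving mid-drain queues a fresh run. */
        if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
            acpi_os_execute(OSL_EC_BURST_HANDLER,
                            acpi_ec_gpe_query, ec);
    }
    return ACPI_INTERRUPT_HANDLED;
}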

> Your second patch is redundant if you add a queue entry for each
> interrupt, and as it does not require any investment in memory, I like
> it more.
>
It's not quite redundant, even with the first patch applied.  We still
have GPE polling mode - patch #1 doesn't affect that.  In polling mode,
it's essential to query all the pending events - otherwise, if they
arrive more often than once per polling interval, you will inevitably
drop events.
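
Roughly what I have in mind for the polling path (the helper name is
made up; acpi_ec_read_status() and ACPI_EC_FLAG_SCI are the existing
ones):

/* hypothetical: called from the poll timer instead of the GPE
 * handler; drains every buffered event rather than just one */
static void acpi_ec_poll_queries(struct acpi_ec *ec)
{
    u8 value;

    if (!(acpi_ec_read_status(ec) & ACPI_EC_FLAG_SCI))
        return;    /* nothing pending this interval */
    while (acpi_ec_query(ec, &value) == 0)
        acpi_ec_gpe_run_handler(ec, value);
}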

Patch #2 is also required to fix my buggy hardware.  My laptop's EC
buffers multiple events, but clears SCI_EVT after every query.  This
caused problems in polling mode: with multiple events arriving between
polling intervals, only one gets queried - and after a while the buffer
overflows and the EC breaks down completely.
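
The drain loop copes with that because it keys off the query result
rather than SCI_EVT.  Roughly this termination condition (a sketch, not
the exact function - I've abbreviated the transaction call):

static int acpi_ec_query(struct acpi_ec *ec, u8 *value)
{
    int ret = acpi_ec_transaction(ec, ACPI_EC_COMMAND_QUERY,
                                  NULL, 0, value, 1);
    if (ret)
        return ret;
    if (!*value)
        return -ENODATA;    /* EC reports no pending event */
    return 0;
}
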
> Also, it is very cool to rip out things you don't care for, but that
> essentially reverts a fix done for 9998, so at least you should ask
> those people whether you broke their setups.
>
I assume you're referring to patch #3, the one I said "would benefit
from wider testing".  Thanks for identifying the bugzilla entry - I had
difficulty separating the different GPE-related entries.  I'm optimistic
that we can fix all these crazy buffering ECs without having to disable
GPE interrupts.

The reason I like my proposed fix is that it makes the code simple
enough that I can understand it, without making any assumptions about
how many interrupts arrive per GPE.  The EC can be broken in lots of
ways and we will still cope, so long as:

1. We receive interrupts when one or more GPEs are pending.
2. We don't get a constant interrupt storm.

I don't think I missed anything.  Is there anything else I should check
before trying to get it tested?

Thanks
Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
