full-disclosure - Alphanumeric Shellcode Encoding and Detection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Date: Mon, 4 Aug 2008 13:02:26 +0300
From: "Avraham Moshe Schneider" <Avraham.Schneider@...ddin.com>
To: <full-disclosure@...ts.grok.org.uk>
Subject: Alphanumeric Shellcode Encoding and Detection

Hello Full-Disclosure,

I'd like to share with you some of the work I've done researching alphanumeric shellcode detection.
Although alphanumeric shellcode encoding isn't new, I believe this post will present you with some value.

There are two main reasons why someone might use an alphanumeric shellcode encoder -
  1. To encode bytes not allowed by the vulnerable application
  2. To evade detection by an Intrusion Detection system.
Even though the first case is arguably the more common occurrence, and even though there might be more room for improvement there, I have focused my work on the second.

You will find attached an alphanumeric shellcode encoder that makes the decoder routine detection more difficult.
It has some cool new features, but again, the purpose of this post is not that much about the new decoder, but rather on the subject of using alphanumeric shellcodes to evade detection, and investigating generic detection possibilities.

I'm aware of several methods for detecting shellcode executing attacks:
 - Detect attacks by understanding the vulnerable protocol fields/lengths and checking the supplied values This is straight forward but has little (or nothing) to do with alphanumeric shellcode encoding.
The IDS is updated with new signatures for the new threats. It would be hard for an attacker to evade detection if the IDS has properly added a signature against the vulnerability itself, and the IDS is not vulnerable to evasion attacks at a lower protocol layer. The downside is, you can only detect what you know, so the IDS has to be constantly updated with signatures for the new threats.

 - Detect attacks by searching for known shellcode patterns This is also straight forward. The IDS is updated with signatures for all known shellcode patterns.
The advantage of doing this is that due to the skills required and the time consuming process involved, the attacker is less likely to implement his own custom shellcode. This allows the IDS to detect unknown attacks without updating it with new signatures.
Of course it does not remove the IDS vendor from the responsibility of adding new signatures to detect new threats, but it adds another line of defense.
I listed above a couple of cases for which attackers may need to encode their shellcode using an alphanumeric encoder. Evading this detection is one of them, and to counter that, we have yet another line of defense ā€“

 - Detect attacks by searching for known encoded shellcode's decoder routine patterns Since the encoded shellcode must include a decoder routine that will be used to decode the encoded data (and then execute it), the IDS is updated with signatures for the known decoder routines. This adds an important value to the previous detection method ā€“ But it still leaves a hole - encoders that generate difficult to detect decoder routines (like the tool I have attached).
Emulation, or profiling may be used to detect these generically, but since the IDS does not know where to start scanning, and since the IDS cannot perform these tests on all bytes of all streams (costly in terms of CPU), it may want to -

 - Detect attacks by searching for known shellcode's encoded data This requires updating the IDS with smart signatures that can decode alphanumeric streams and compare decoded data against known shellcode patterns. It requires updating the IDS with new decoding algorithms.
Assuming the use of alphanumeric encoders/decoders and their limitation of using a limited instruction set, there is a difficulty of coming up with new encoding schemes and this detection method really adds a great value to the IDS.

The previous shellcode encoder Matt Conover, Soren Macbeth and I wrote back in 2004 used a fixed key throughout the shellcode.
It may seem that detecting the encoded data would be easy with a fixed key and a known shellcode pattern, but since you have 62 different keys to choose from (assuming alphanumeric char-set composes 0-9A-Za-z), you would need to check each known pattern against 62 different keys.

In that sense, the previous work would be of greater use for an attacker to evade this detection method, but on the other hand, the previous mentioned detection method would detect the decoder routine, as it is not that stealthy.

The presented decoder is IMO harder to detect yet detection of the encoded data is much simpler.
The attacker would be faced with this tradeoff, and an IDS that implements all of the above techniques, would provide IMO good defense against known and unknown threats involving shellcode execution.

There might be an encoding/decoding technique that does not exhibit this tradeoff, for example, appending a random fixed number of bytes between encoded data bytes that the decoder could skip.

I haven't yet tried implementing this due to time constraints, but I believe it is possible.

I have attached the new encoder/decoder code, as well as a function that may be used to scan alphanumeric strings.

This function should not be used in an IDS product for the following reasons:
1. It uses a naive search algorithm.
2. It scans a buffer and not a stream.
3. It rescans the entire buffer with each decoder.

IMO, implementing a multi-pattern type Boyer-Moore algorithm, such as the one presented by Sun Wu and Udi Manber, is a very good IPS solution for the following reasons:
1. The number of skip bytes is double the pattern length.
2. Since we are looking for a known part of a known shellcode, we know the shellcode length - hence - 3. On a non-alphanumeric byte we can skip double the length of the shellcode minus the position of the pattern within the shellcode.
(basically we can add the knowledge of the original shellcode length to the skip table)

I have implemented a single pattern Boyer-Moore algorithm implementation that handles a file stream but as it does not satisfy the requirements listed above, and in order to reduce the size of the code I have not included it here. If you are interested, contact me privately.

Also, please feel free to e-mail me with any questions or comments.

I'd like to thank Matt Conover for reviewing my work and giving me good feedback.

Regards,
Avri
**********************************************************************************************

The contents of this email and any attachments are confidential.
It is intended for the named recipient(s) only.
If you have received this email in error please notify the system manager or  the 
sender immediately and do not disclose the contents to anyone or make copies.
** eSafe scanned this email for viruses, vandals and malicious content **

**********************************************************************************************

View attachment "alnum_decoder_encoder.c" of type "text/plain" (101570 bytes)

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/