So, what are bad passwords, who put them there, and how can I remove them?
(this is gonna be a long one, bear with me)
The topic has been brought up recently, not just by
@WreckTangle, and I thought I would share my thoughts and how I've been dealing with these passwords over the years. This is a complex topic, and it has not been covered in the depth I think it deserves. I've posted on Hashkiller about it previously, and it has been a point of disagreement even in cracking contests.
In my view, there are four classes of "bad" passwords. There is a fifth category that I am specifically not addressing here, which is "bad parsing". Troy Hunt's lists are particular examples of this.
1. Passwords which are unlikely to be re-used, because they are random or machine-generated. It may have been an achievement to crack a 30 character password
+0^AtOUrjubwD/Hu$ojmwuUMk'NoK@
, but the likelihood of it being re-used is quite small. Does it belong in a general purpose list? Well, if you are cracking a list for a contest, and it's one of the critical high-value entries, then you better have it in your list. But otherwise, no. It does not belong in a generic list. A subset of these are "known" values, such as
BAF3A9C3DBFA8454937DB77F2B8852B1
.
2. Passwords which contain salts and/or peppers. A good example of this is
the river runs through itunbroidery
,
xCg532%@%gdvf^5DGaa6&*rFTfg^FD4$OIFThrR_gh(ugf*/es@live[redacted]
, or
somepasswordBSF75663
. Each of these used a static salt, prepended to the actual password. Some have even been more complex, embedding separators, and the like.
3. Passwords which contain usernames, or userids. A good sample of these are
1351-saralucia
,
Admin1234
, and
pqonieneo4N36tbfzC
. These, too, can be quite complex, involving padding, embedded NUL characters, and so on.
4. Passwords which, while hashing to the correct value, are not in fact the correct password for the original hash. These break into two sub-categories: hash algorithm collisions, and incorrect hash types. In the realm of password collisions, some hashing algorithms (well, in fact, all hashing algorithms, but some are more likely to have this problem) generate the same hash for multiple inputs. Incorrect hash types are a broader category, but include hash solutions like
5f4dcc3b5aa765d61d8327deb882cf99G%Y
,
55ae17202f23e50f30883ee4bb581001
and
a60259fe312f320dfcf2c003041b9218$yHlU/^\.R3',Dvt7,%VY~(P~s$S.{
.
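These class-4 examples are easy to demonstrate with standard tools. A minimal sketch: the 32 hex characters inside the first example above are the well-known MD5 of the literal string "password", and a class-4 "solution" is what you get when that *output* (plus stray characters) is hashed again.

```shell
# MD5 of the literal string "password" - the well-known value that appears
# inside the first class-4 example above:
printf '%s' password | md5sum
# -> 5f4dcc3b5aa765d61d8327deb882cf99  -

# Hashing that *output* again (plus trailing junk) produces a hash that a
# cracker can "solve" - but the result is not a password:
printf '%s' '5f4dcc3b5aa765d61d8327deb882cf99G%Y' | md5sum
```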
Let's talk about each of these, why they are there, how they make their way into lists, and how to get rid of them (if you think it is appropriate).
Random passwords
Random passwords, or machine-generated passwords, can be very effective. Properly random passwords are very difficult to crack, and if not re-used, are a good security measure (though typically unworkable for humans, since they inevitably end up on sticky notes on the side of monitors). A user recently asked me "how do I get rid of the random passwords in my lists?" (and was more specific, indicating that he knew there were fixed-length random strings). But, regrettably, there is no automated way that I'm aware of to analyze a password and declare it random. "sotesifaa" may look random, but it's actually the first letters of the first paragraph of page 124 of a book on my shelf ("The 8087 Primer", if anyone is interested :-). Many of the lists do have lots of 5-character random passwords in them, as well as 10-character ones. It turns out that many of these were added as the result of a class 4 (see above) cracking attempt. Should they be included? That is a deeper issue.
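There is no reliable automated test for randomness, but a rough heuristic can at least surface *candidates* for manual review. A sketch, assuming the wordlist is a file named infile (a stand-in name), flagging fixed-length strings that mix upper case, lower case, and symbols:

```shell
# Heuristic only - "random-looking" is not "random" (remember "sotesifaa").
# Flag fixed-length (here, 30-char) mixed-class strings for manual review.
printf '%s\n' 'monkey123' 'sotesifaa' \
  "+0^AtOUrjubwD/Hu\$ojmwuUMk'NoK@" > infile   # tiny sample wordlist
grep -E '^.{30}$' infile |      # fixed length
  grep '[A-Z]' |                # has upper case...
  grep '[a-z]' |                # ...and lower case...
  grep '[^A-Za-z0-9]' \
  > suspects                    # ...and at least one symbol
cat suspects                    # review by hand - never delete blindly
```

Only the 30-character machine-generated example survives the filter; the heuristic says nothing about shorter or letters-only random strings, which is exactly why the output needs human eyes.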
Many hash lists found in the wild contain *lots* of seeded, or random, hashes. Sometimes these are used to identify where a list leaked from. Sometimes they are added to bulk up a list, to give it the appearance of being larger than it is, or to hide its origin. Sometimes fake users are added *by the site owners* themselves, to artificially inflate the user count. When these are cracked (often by long-term brute force attacks), the random passwords get lumped into the "found" pile, and end up on password lists.
A "special case" of these are the "known value" hashes.
BAF3A9C3DBFA8454937DB77F2B8852B1
for example, is the root key hash for a version of the Nokia phones - known to many that were doing phone hacking back in the day. There are many other "known value" passwords in use that appear, at first glance, to be something else.
Should they be there? Yes, they should remain on your general-purpose lists. But they are not useful for working on complex or expensive hashes, which is why you need to generate *your own* "top 10,000 passwords" list - the passwords which have been seen *most frequently* on the types of lists you have to work with. Your top 10k list is not going to be the same as mine. If you do pentesting for "Joe's Bakery and Shell Reloading", the passwords used are going to be different than those at a "Dragon Farm" website.
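Building that personal top-10k list is mostly plumbing. A sketch, assuming your solved passwords live in a file named founds.txt (a stand-in name), one per line with duplicates preserved - the duplicate counts are the whole point:

```shell
# Rank passwords you have actually solved by how often you have seen them.
printf '%s\n' 123456 password 123456 qwerty 123456 password > founds.txt
sort founds.txt | uniq -c |    # count occurrences of each password
  sort -rn |                   # most frequent first
  sed -E 's/^ *[0-9]+ //' |    # drop the counts (safe for spaces in plains)
  head -10000 > top10k.txt
cat top10k.txt
# -> 123456
#    password
#    qwerty
```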
Extra salt
Passwords which contain embedded salts and/or peppers should be stripped. These are actually a subset of class 4 - using the wrong hash, and getting the right value. Because a given application may be adding a static salt internally, that salt may not be available in the hash lists, or (in some cases) not even known. It is through pure luck that some of these salts and peppers have been found (but, far more often, by careful examination of the source code for a given application). Now, because the salts/peppers are not present in the hash lists, some people think that the only "correct" solution is to include the salt/pepper in the stored password (because, in many cases, the algorithm cannot be solved with Hashcat otherwise, since Hashcat does not support that particular hash-with-salt-and-pepper method). This is wrong: whether you have to post-process the results or use a different program (like mdxfind), only the actual password should be stored.
The problem arises when you are called on to crack an unknown list. If that list contains salted and/or peppered variants such as the examples given, then you won't find them. This increases the time-to-ID for new hashes, unless you also pre-process your lists with known founds. As usual, this is a trade-off, and one you will have to weigh for yourself.
How to remove them? This is a two-phase problem, involving both your "solved" passwords, and also the original file. I think the easiest way to do this is to work through your solved passwords first, then clean up the original list that contained them. When I'm doing this, I take all of the "suspect" solutions, and place them in a working directory. I then spend time (sometimes minutes, often hours, occasionally weeks) working through them, and solving them with the correct algorithms.
I then take all of the "bad" password solutions, and find the common elements. For example, let's look at "somepassword" as a salt:
Code:
# collect every solution that carries the static salt
grep ^somepassword infile >work
# (edit work by hand to confirm it contains only the passwords to get rid of)
# strip the salt, and append the bare passwords back to the list
sed 's/somepassword//' work >>infile
# remove the salted forms from the list
rling -bs infile infile.new work
# keep the salted forms around for the junk list
cat work >>junk
mv infile.new infile
"Junk" passwords - a term I borrowed from the hashes.org "junk" password lists - can then be merged into your junk lists. These are *still* useful for the future, but they are not part of the "everyday" passwords that I will use.
Users and padding
When usernames are included (or usernames and padding), this is handled much in the same way as salts. What differs is how you re-solve the hashes (again, copy them to a work directory, work through them to identify the proper algorithm, then apply it to the original hash lists). Once you have identified a username-included solution, you may be able to find many more passwords in the original list that have the same username embedded. When the usernames approach being unique per user, that's another good reason not to keep them in the original list. "Admin" is an example of a username which *could* appear as part of a password, but if you see GenMahem47-monkey, you might reasonably assume that the username is GenMahem47 and the password is monkey. Again, this is one of the cases where judgement, and the use of a "junk" password list, may be very useful.
Password removal techniques follow the general case of the salted passwords, shown above, but significant work should go into vetting the removal lists. Pay attention to patterns that reveal themselves (are the usernames really user numbers? Are they sequential? Are you seeing the general case of [0-9][0-9][0-9]-password, for example, throughout your password lists?). This can help you in your analysis.
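Pattern checks like these are quick to script. A sketch for the [0-9][0-9][0-9]-password case, again assuming the list is a file named infile (a stand-in name):

```shell
# How widespread is a numeric-userid prefix like 123-password?
printf '%s\n' '135-saralucia' '204-monkey' 'password1' > infile   # sample
grep -cE '^[0-9]{3}-' infile    # count the matches first
# -> 2
# Strip the prefix only after you have vetted the matches by hand:
grep -E '^[0-9]{3}-' infile | sed -E 's/^[0-9]{3}-//' > stripped
cat stripped
# -> saralucia
#    monkey
```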
Padding is another issue. Sometimes you will see "Joeblogs password", or NUL-padded passwords. These may show that there is a fixed-maximum-length username field, followed by the password. Pay attention to details - they matter.
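Padding artifacts are also easy to probe for. A sketch, again with infile as a stand-in name, counting runs of spaces (a hint of a fixed-width username field) and embedded NUL bytes:

```shell
# Probe a wordlist for padding artifacts.
printf 'Joeblogs      password\nnormalpass\n' > infile   # sample list
grep -cE '  +' infile            # lines with runs of spaces (padded field?)
# -> 1
tr -dc '\0' < infile | wc -c     # NUL bytes embedded anywhere in the file
# -> 0
```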
Invalid hashes
This was one of the core reasons that I made mdxfind. Hashes within hashes. When I started to see "passwords" of the form
5f4dcc3b5aa765d61d8327deb882cf99G%j
, I knew I had to do something. But they didn't stop there. I saw
b78b3c0674aed5a05ecf4931d20d1c3d
, and
f881b9b4af89da8a203598b410e1d846
and more.
Today, I regularly look through my MD5x01, SHA1x01, and many other "solved" hash lists for strings of hex digits. Virtually all of them are the results of incorrect hashes. If you see in your "solved" list a hash like
1ed735adc33427448d0c7264e479be40:5f4dcc3b5aa765d61d8327deb882cf99G%j
, you should know that, while this is a "valid" solution (meaning that if you take the MD5 of the part on the right of the :, you will get the result on the left), it is in no way a "password" solution.
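Surfacing these is mechanical, even if resolving them isn't. A sketch, assuming a hash:plain founds file named solved.txt (a stand-in name), that flags any "plaintext" beginning with 32 hex characters:

```shell
# Flag "solved" entries whose plaintext starts with 32 hex characters -
# almost always a hash-of-a-hash, not a real password.
printf '%s\n' \
  '1ed735adc33427448d0c7264e479be40:5f4dcc3b5aa765d61d8327deb882cf99G%j' \
  'aabbccddeeff00112233445566778899:monkey123' > solved.txt    # sample
grep -E ':[0-9a-fA-F]{32}' solved.txt > suspect.txt
wc -l < suspect.txt    # only the hash-of-a-hash line is flagged
```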
These require work. There is no shortcut.
Much of my time is spent cleaning up lists like this. In general, I will look for *any* solution with a length over 16 characters as being suspect (since, according to the histograms of my "true solved" passwords, lengths over 16 characters represent less than 1% of the solutions). I get *very* interested in lengths >= 32 characters. So, how do I go about it?
Code:
awk 'length > 64 {print;}' *.MD5x01 | grep -v HEX >work/level1.txt   # over-long "solutions" are suspect; skip $HEX[] entries
cd work
cut -c 34-65 level1.txt >level2.txt    # the embedded 32-character hash (the plain starts at column 34)
cut -c 66- level1.txt >level3.txt      # whatever trails it - usually a salt
mdxfind -f level2.txt /dictall/* | mdsplit level2.txt
mdxfind -f level3.txt /dictall/* | mdsplit level3.txt
cut -c 34- level2.MD5x01 | mdxfind -f level1.txt -h ^md5salt$ -s level3.txt stdin | mdsplit level1.txt   # retry the outer hashes with the recovered inner plains, salted with the trailers
...
Then I repeat, looking for other hashed and salted variants. Then, instead of only looking at 32 characters in the "level2" hashes, I look at 40, and 64 characters. Then I try different combinations. Rinse and repeat, as required.
Once I find sets of invalid hashes in the password files, I again extract them to the "junk" list, and remove them from the standard working dictionaries that I use. The work on this is ongoing, and is not a short-term task.
How to prevent this?
Know your passwords. Look at the solutions. The *most important* thing you can do is not to just blindly grab the "latest great password list", but to understand how people think, how passwords are generated, and what kinds of things those passwords are protecting. Build your *own* lists from passwords you have solved, then add rules as
@tychotithonus suggests. Start with the most common passwords - this is particularly important on salted or high-iteration hash algorithms. Use brute force as a last resort, not as a crutch.
I get more done with 1M passwords and rules than many people with billions of "Real Passwords Solutions - only $19.95".
But *most* important - don't give up. Try new things. Share your knowledge.