Creating massive Wordlist - Any advice?

r00x

Member
Joined
Nov 19, 2020
Messages
6
Reaction score
4
Credits
96
I decided to download a lot of wordlists and combine them into one.

Before I start, I'd like to ask for your advice.

My steps would be:

1. Combine lists into one file (writing the output one directory up, so the glob doesn't pick it up on a re-run)
Code:
cat * > ../out_file.txt

2. Remove tab characters (octal \11; this part first, because otherwise grep counts line lengths wrong)
Code:
LC_ALL=C tr -d "\11" < in_file.txt > out_file.txt

3. Keep only wanted characters (newline/CR, printable ASCII, and the Latin-1 codes for § ß Ä ä Ö ö Ü ü, plus \200); everything else gets removed
Code:
LC_ALL=C tr -cd "\12\15\40-\176\200\247\337\304\344\326\366\334\374" < in_file.txt > out_file.txt

4. Remove words under 8 and over 30 chars; this also removes blank lines (anyone with a password over 30 chars deserves to be safe :) )
Code:
LC_ALL=C grep -x '.\{8,30\}' in_file.txt > out_file.txt

5. Remove duplicates
Code:
LC_ALL=C sort --parallel=10 -S 90% -T "/path with enough space/" -u in_file.txt > out_file.txt
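
Steps 2-5 could probably also be chained into one pipeline, so the big intermediate files only hit the disk once. Rough, untested sketch:
Code:
LC_ALL=C tr -d "\11" < in_file.txt \
  | LC_ALL=C tr -cd "\12\15\40-\176\200\247\337\304\344\326\366\334\374" \
  | LC_ALL=C grep -x '.\{8,30\}' \
  | LC_ALL=C sort --parallel=10 -S 90% -T "/path with enough space/" -u > out_file.txt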

Am I missing something?

Wordlists I like to use:

  • crackstation
  • wpa-sec.stanev
  • weakpass
  • dicass
  • Pwdb_Public
  • cyclone_hk
  • THP_password
  • skullsec
  • kaonashi

I know masks and rules are better; this is just a last resort.
 

Cyclone

Active member
Trusted
Contributor
Retired Moderator
Joined
Dec 30, 2019
Messages
3,175
Reaction score
900
Credits
5,898
Ask 10 people how to compile a wordlist and you'll get a dozen different answers. :)

I recommend experimenting with this until you refine what method works best for you and your available resources.

My 2 cents:
~I've cracked a lot of plaintexts with whitespace in them.
~8-30 char length is ok for WPA2, but there's a lot of < 8 char plaintext out there depending on what password policies were enforced (especially older DBs).
~I prefer hashcat-utils' 'len' over grep for scrubbing by char length (quick example below this list). @Waffle also has a tool 'splitlen' for splitting wordlists by length.
~Some password policies allow special chars, so go easy on what you scrub.
~Things like removing whitespace, char length 8-30, scrubbing, etc, are all debatable and can vary depending on the intended application.
~I sort my wordlists by probability, not alphabetically. This makes a big difference in cracking efficiency, especially on slow algos like bcrypt which could run for days on a multi-gigabyte wordlist. Cyclone_hk is sorted by probability, so try testing it against other wordlists that are sorted alphabetically and see which one cracks faster.
~To compare which method works best, try creating several wordlists using differing methods (ex: basewords.txt, char8-30.txt, scrubbed.txt, wordlist_all.txt, sorted_probability.txt, sorted_alphabetically.txt, etc) and then test them against your hash list to see which method works best. Then test your wordlists with different rules to see which wordlist+rule combo works best for your hash list.
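
Quick 'len' example off the top of my head (double-check the binary name/args against the hashcat-utils readme):
Code:
# keep only lines that are 8 to 30 characters long
./len.bin 8 30 < wordlist.txt > wordlist_8-30.txt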

I recommend a very useful tool 'rling' by @Waffle for sorting, deduping, counting, etc. Using rling will greatly speed up your wordlist processing.
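
Basic usage is roughly this (if I remember right, rling keeps the input order unless you tell it to sort, so a probability-sorted list stays sorted; check its readme for the exact options):
Code:
# dedupe a large wordlist
rling wordlist.txt wordlist_deduped.txt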

Here's a list of some of my fav tools:
 

r00x

Member
Joined
Nov 19, 2020
Messages
6
Reaction score
4
Credits
96
Thanks for your reply.

Cyclone said:
Ask 10 people how to compile a wordlist and you'll get a dozen different answers. :)
That's what I've found. :) So true.

I use other specific unsorted wordlists with rules depending on the case, and I know sorting by probability is more efficient.
But I haven't seen any tool that can handle that amount of data and remove duplicates without problems.

I already found rling but haven't tested it yet. Maybe I'll give it a try.

I only remove non-printable characters. My main targets are German passwords. And as you said, it depends on the intended application.
 

Mockedarche

Active member
Contributor
Joined
Dec 30, 2019
Messages
799
Reaction score
47
Credits
267
The main thing I'd suggest is having different versions. Having a master wordlist is great, but man does it suck how long it takes to run. I'd also argue a smaller wordlist with a better rule list is better, so be very picky about which wordlists you add. I would personally find as many databases/leaks as I could and run a competition, so to speak (with a predetermined crack % required to be allowed in).
 

pasnger57

Active member
Contributor
Joined
Dec 30, 2019
Messages
3,531
Reaction score
1,127
Credits
7,291
I would remove purely numeric lines too, as those can be brute forced up to 11 or 12 digits in reasonable rig time (working together for 12),
and 8-10 digits can be checked on just about any rig.
 

r00x

Member
Joined
Nov 19, 2020
Messages
6
Reaction score
4
Credits
96
I would remove purely numeric lines too, as those can be brute forced up to 11 or 12 digits in reasonable rig time (working together for 12),
and 8-10 digits can be checked on just about any rig.
I thought about it, but I don't know how to do it without messing things up. The idea is great. Does someone have a solution?
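My first guess would be an inverted grep match, something like this (untested, so it might need tweaking):
Code:
# drop lines that consist only of digits
LC_ALL=C grep -v -x '[0-9]\{1,\}' in_file.txt > out_file.txt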

The main thing I'd suggest is having different versions. Having a master wordlist is great, but man does it suck how long it takes to run. I'd also argue a smaller wordlist with a better rule list is better, so be very picky about which wordlists you add. I would personally find as many databases/leaks as I could and run a competition, so to speak (with a predetermined crack % required to be allowed in).
As I said, this list should only be the last resort when everything else has failed. I've already had good results with small wordlists and rules.
 

Akasha

Active member
Joined
Aug 22, 2020
Messages
432
Reaction score
270
Credits
3,737
Why don't you put all wordlists into one folder and drag the folder into hashcat? Same effect, am I right?
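Something like this, since hashcat accepts a directory in place of a wordlist file (hash file and mode here are just placeholders):
Code:
hashcat -a 0 -m 22000 hashes.hc22000 /path/to/wordlist_folder/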
 

BTC_12345

Active member
Contributor
Joined
Dec 30, 2019
Messages
1,500
Reaction score
2,503
Credits
5,206
Hi there, can you tell me please if you know how to customise a wordlist in ULM to have words from 8 chars to, let's say, 16 or more? I can't see this option. Thanks, Kev
Go to 'Downsize' (at the top) then select 'Save items within the range' (on right side)

When you process the list it will ask for lower and upper bound of items to save
 

kevtheskin

Active member
Contributor
Joined
Dec 30, 2019
Messages
2,893
Reaction score
701
Credits
2,387
Go to 'Downsize' (at the top) then select 'Save items within the range' (on right side)

When you process the list it will ask for lower and upper bound of items to save
Hello again. Worked a treat. Is there a way, other than using Blandy's tool, to remove duplicate lines? Cheers, Kev
 

carnivore1

Active member
Contributor
Joined
Dec 30, 2019
Messages
2,899
Reaction score
5,591
Credits
12,705
Hello again. Worked a treat. Is there a way, other than using Blandy's tool, to remove duplicate lines? Cheers, Kev
You know, when you are building a list vertically, there is a box you check at the bottom to sort and remove dupes...
 