So, it looks like the code is pretty much done now. It's a lot faster than rli/rli2, and you can grab it from
https://github.com/Cynosureprime/rling. There are binaries for both Windows and Linux (x86-64), plus source code. There are also additional utilities that really help when processing wordlists on a massive scale.
To save everyone some time - if you don't create wordlists, but simply consume them, you probably won't need these tools. If you process any amount of data, generate your own wordlists, or do more extensive work with large lists - have a look. I'll give some usage examples, so you'll have an idea of what usage is intended, and how best to bend these tools to your will.
In basic use, you can use rling (ARE-ling - yes, it's supposed to be RLI next gen, but ARE-ling sounds better) to simply de-duplicate your lists, while retaining their existing order.
rling infile outfile
will read all of infile into memory, de-duplicate it, and write the output to outfile. Because it reads into memory, it's very fast, with typical speeds of millions of lines per second. On one of my test systems, it takes about 25 seconds to read, de-duplicate, and write a 10 gigabyte, 1,000,000,000 line file. In general, though, you are going to need about 3x the file size in RAM. Because of this, rling has several modes.
rling -b infile outfile
rling -f infile outfile
rling -2 infile outfile /dev/null
By default, rling uses a hash table to speed lookups. This takes extra memory, so the -b option forgoes the hash for a binary search, using about 2/3rds of the memory of the default. That can still be considerable if you have large files, so the -f option creates a disk database, allowing you to handle unlimited size files (well, limited to your disk space) and not use much RAM at all. -2 uses even less RAM, but the input files _must_ be in sorted order.
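If you want a quick rule of thumb for picking a mode, here is a rough sketch built only from the numbers above (about 3x the file size for the default, about 2/3rds of that for -b, and -f when neither fits). pick_mode is a made-up helper, not part of rling, and the thresholds are my reading of those estimates, so treat it as a starting point. It leaves -2 out, since that needs sorted input.

```shell
# Hypothetical helper: suggest an rling mode from file size and usable RAM.
# Thresholds are assumptions taken from the rough estimates in this post.
pick_mode() {
    size=$1     # input file size in bytes
    avail=$2    # RAM you are willing to use, in bytes
    if [ "$avail" -ge $((size * 3)) ]; then
        echo "default"      # hash table, roughly 3x the file size
    elif [ "$avail" -ge $((size * 2)) ]; then
        echo "-b"           # binary search, roughly 2/3rds of the default
    else
        echo "-f"           # disk database, little RAM needed
    fi
}

# Example use (the /proc/meminfo line is Linux only):
#   size=$(wc -c < infile)
#   avail=$(( $(awk '/^MemAvailable:/ {print $2}' /proc/meminfo) * 1024 ))
#   pick_mode "$size" "$avail"
```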
Most people will be able to get by with just the defaults. But rling can do a lot more than just "dedupe wordlists". Let's say you get a new wordlist, and want to run it against some hashes. But you don't want to try words you've already tried.
rling new-words.txt outfile /dict/old/* /some/other/files*
rling will rip through (at millions of lines per second) your old wordlists, comparing them against the new-words.txt file. Any words that exist in your existing files in /dict/old/* and /some/other/files* will be removed from the new-words.txt list, and the result written to "outfile". Handy.
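If it helps to see what that command means, here is a small awk sketch of the same idea: remember every line from the old lists, then print each line of the new list that was not remembered, once, in its original order. This only illustrates the semantics; it is not how rling is implemented and will be far slower on big lists. remove_seen is a made-up name, and it assumes the new list is the last argument.

```shell
# Hypothetical helper: remove_seen old1 [old2 ...] new
# Prints lines of the LAST file that appear in none of the earlier files,
# de-duplicated, keeping the order in which they first appear.
remove_seen() {
    awk 'FILENAME != ARGV[ARGC-1] { old[$0] = 1; next }  # remember old words
         !($0 in old) && !seen[$0]++                      # print new words once
    ' "$@"
}

# Example use:
#   remove_seen /dict/old/* new-words.txt > outfile
```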
rling can also be used in a pipeline, which is great if you have a bunch of wordlists, and want to put them together in specific ways. For example, getting ready for a contest?
zcat /archive/names/firstname.[a-n]* | rling stdin stdout | gzip -9 >names.a-n.gz
grep -h '^[a-fA-F]' /archive/names/lastname* | rling stdin stdout | gzip -9 >lastname.a-f.gz
This will read your existing files (or grep a bunch of files, in the second case) de-dupe them, and pipe them to gzip to be compressed.
Let's say you have a list of candidate words you want to extract from a bunch of files. rling can help with that too:
cat /archive/names/* | rling -c stdin namesfound.txt /tmp/list-of-names
Here we look through all the files in the /archive/names/ directory, and show the "common" (-c) lines between them and the "/tmp/list-of-names" file. Kinda like the inverse of removing them.
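For the curious, the "common lines" idea can be sketched in awk like this: load the list of wanted names, then print each input line that is in that list. Again, this only shows the idea and is not rling's actual code. common_lines is a made-up name, and printing each match only once is my assumption about the result.

```shell
# Hypothetical helper: common_lines wanted-file < input
# Prints each stdin line that also appears in wanted-file, once, in input order.
common_lines() {
    awk 'NR == FNR { want[$0] = 1; next }    # first file: the wanted words
         ($0 in want) && !seen[$0]++         # stdin: keep matches, once each
    ' "$1" -
}

# Example use:
#   cat /archive/names/* | common_lines /tmp/list-of-names > namesfound.txt
```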
Want more? How about this:
getpass /myfounds/*.MD5x01 | rling -q w stdin myMD5.pass
This will extract passwords from your solved hashes, pipe them to rling, and use the -q option to sort the words, arrange them _by usage frequency_, and write them to myMD5.pass.
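If you have never built a frequency-ordered list, the classic way to get one with standard tools looks roughly like this. It is only a sketch of the idea of ordering by usage frequency; I'm not claiming it matches the exact output of -q, and by_frequency is a made-up name.

```shell
# Hypothetical helper: by_frequency < words
# Prints each distinct line once, most frequent first.
# Runs in a subshell so the LC_ALL setting does not leak to the caller.
by_frequency() (
    export LC_ALL=C                      # byte-wise comparison, predictable order
    sort |                               # group identical lines together
        uniq -c |                        # prefix each distinct line with its count
        sort -k1,1nr |                   # highest count first
        sed 's/^ *[0-9][0-9]* //'        # strip the count, keep the line
)

# Example use:
#   by_frequency < all-passwords.txt > by-freq.txt
```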
You can do this with your hashcat .pot files too - getpass is reasonably smart, and can extract just the passwords from these files as well.
getpass *.pot | rling -q a stdin allpot.pass
In this case it will display not only the solved passwords, arranged by frequency, but also counts, word length, and a histogram of the selected passwords.
There's so much more. Check out rehex (which will fix your existing passwords if they are un $HEX[]ed), and all of the other options.
You can even use it to sort your wordlists (a lot faster than /bin/sort :-)
rling -bs old-hashes.pass sorted-hashes.pass
There's lots more to try. Enjoy!