Announcing MassiveSort: an alternative merge / sort / de-dupe tool

Waffle · Aug 7, 2015

HashHunter said:
Mdxfind report:

Took 220 seconds to read hashes
Searching through 6,019,165 unique hashes from 1.txt
There were 52,075 duplicate hashes on input
Maximum hash chain depth is 142443
Minimum hash length is 32 characters

52075 ?????

i just removed duplicates using your tool
still 52075 duplicates ?

not sure bug is in mdxfind or in your tool :(

MDXfind looks only at the "hash" portion of the line. If you have junk (like salts, usernames, etc) after the ascii-hex, other programs won't see that as a duplicate line (since there is ancillary information on the line), but MDXfind will see that as a duplicate line - since they are the identical ascii-hash value.

Same for UPPER-CASE vs lower-case ascii-hex values - mdxfind treats them as the same value, other programs may not.
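
To see how many lines in a list collapse once only the hash portion is compared, ignoring case, something like the following can be used. It is only a sketch and not MDXfind's actual logic: it assumes every line starts with a 32-character hex hash, and `extract_hashes` and `count_hash_dupes` are made-up helper names.

```shell
# Print the leading 32 hex digits of each line, folded to lower case.
# Lines that do not start with 32 hex digits are skipped (assumption).
extract_hashes() {
    sed -nE 's/^([0-9A-Fa-f]{32}).*$/\1/p' "$1" | tr 'A-F' 'a-f'
}

# Number of lines that repeat an earlier hash once salts and case are ignored.
count_hash_dupes() {
    total=$(extract_hashes "$1" | wc -l)
    unique=$(extract_hashes "$1" | LC_ALL=C sort -u | wc -l)
    echo $((total - unique))
}
```

Calling `count_hash_dupes 1.txt` gives a figure that can be compared against the duplicate count another tool reports.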

HashHunter · Aug 7, 2015

faredge said:
Probably mine!

Though you should try the --whitespace Trim option first. MassiveSort doesn't remove spaces at the end of a line unless you tell it to. Most other tools do. (Most people won't notice one way or the other).

Oh, and about that backup....

before using your tool i used :

grep -E -o -i '[0-9a-fA-F]{32}$' hash.txt > 32_hex.txt

Waffle said:
MDXfind looks only at the "hash" portion of the line. If you have junk (like salts, usernames, etc) after the ascii-hex, other programs won't see that as a duplicate line (since there is ancillary information on the line), but MDXfind will see that as a duplicate line - since they are the identical ascii-hash value.

Same for UPPER-CASE vs lower-case ascii-hex values - mdxfind treats them as the same value, other programs may not.

yup it is possible
thx :)
i will try
cat 32_hex.txt | tr 'A-Z' 'a-z' > new_32_hex.txt
& will report

& did u mean that other programs like hashcat, john the ripper etc. are not able to crack even a simple hash if i convert the hash to uppercase ??
:\:

faredge · Aug 9, 2015

@HashHunter

MassiveSort is case sensitive: UPPER and lower case hex values are definitely different for it. That may explain some of the issues. So converting to lower case (as you just did) is the way to go if you're trying to de-duplicate hash lists.

Same for salts, usernames, etc. MassiveSort is comparing the raw binary values of each line. It is (quite deliberately) very dumb.
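
A quick way to see what a raw byte comparison does to a hash list is to try it with coreutils. In this sketch `LC_ALL=C sort -u` stands in for a byte-wise de-dupe (it is not MassiveSort itself), and `normalize_hex` is a hypothetical helper that assumes each line holds nothing but hex digits.

```shell
# Byte-wise de-dupe: lines differing only in case or trailing spaces stay distinct.
dedupe_bytes() {
    LC_ALL=C sort -u
}

# Fold hex digits to lower case and strip trailing whitespace before comparing.
normalize_hex() {
    tr 'A-F' 'a-f' | sed 's/[[:space:]]*$//'
}

# Three "different" lines by raw bytes...
printf 'ABCD\nabcd\nabcd \n' | dedupe_bytes | wc -l
# ...but a single value once normalized.
printf 'ABCD\nabcd\nabcd \n' | normalize_hex | dedupe_bytes
```

The first pipeline counts three lines and the second prints just `abcd`, which is why lower-casing (and trimming) before de-duplicating matters.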

@Waffle

I ran MDXfind over one of the 40-hex hashes.org left files, and it reported 4 duplicates. Which is strange, because my own sanity checks reported no duplicates.

HashHunter · Aug 10, 2015

input file 1

1
123
456
852
sfd
fh2gfhf2g
fh2g5h1fg

input file 2

46g56f4
fhkmfg151
1
123
456
852

Command
MassiveSort.exe merge -o test.txt -i "t/"

output:

1
123
456
46g56f4
6
852
d
fh2g5h1fg
fh2gfhf2g
fhkmfg151
g
sfd

expected output

1
123
456
852
sfd
fh2gfhf2g
fh2g5h1fg

46g56f4

fhkmfg151

please explain .. :)

HashHunter · Aug 10, 2015

MassiveSort.exe merge -o test.txt -i "t/" --save-duplicates

saved duplicates are

1
123
456
6
852

y this digit 6 coming again and again :/
please explain :)

faredge · Aug 10, 2015

HashHunter said:
y this digit 6 coming again and again :/
please explain :)

Sorry, a bug in the line processing code. Should be fixed in version 0.1.5.

Oh and thanks for the test case! Really easy to fix these kinds of things with that sort of detail.
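
The test case from the report can also be replayed against coreutils as a cross-check. This is a sketch under assumptions: the file names `1.txt` and `2.txt` are invented (the report only says "input file 1" and "input file 2"), and `sort`/`uniq` are used as a byte-wise reference, not as a description of how MassiveSort works internally.

```shell
# Recreate the two input files from the report in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 123 456 852 sfd fh2gfhf2g fh2g5h1fg > "$workdir/1.txt"
printf '%s\n' 46g56f4 fhkmfg151 1 123 456 852 > "$workdir/2.txt"

# Byte-wise merge with duplicates dropped.
reference_merge() {
    cat "$@" | LC_ALL=C sort -u
}

# Lines that occur more than once across the inputs.
reference_duplicates() {
    cat "$@" | LC_ALL=C sort | LC_ALL=C uniq -d
}

reference_merge "$workdir/1.txt" "$workdir/2.txt"
reference_duplicates "$workdir/1.txt" "$workdir/2.txt"
```

With these inputs the merge prints nine lines (1, 123, 456, 46g56f4, 852, fh2g5h1fg, fh2gfhf2g, fhkmfg151, sfd) and the duplicates are 1, 123, 456 and 852. The stray `6`, `d` and `g` lines do not appear in either list.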

Bitbucket

bitbucket.org

HashHunter · Aug 10, 2015

faredge said:
Sorry, a bug in the line processing code. Should be fixed in version 0.1.5.

Oh and thanks for the test case! Really easy to fix these kinds of things with that sort of detail.

no problem bro
:)

and bro can you add an option for converting uppercase to lowercase ??

faredge · Apr 3, 2025

So... after almost 10 years, I have an update

Release release-0.2.0 · ligos/MassiveSort

0.2.0 Update to use .NET 8.0. Tested on Windows and Debian platforms. Other Linux distributions supported by dotnet should also work. Increase --max-sort-size to support sorting over 2GB of dat...

github.com

Highlights:

Uses .NET 8
Better support for very large files (eg: a few hundred GBs)
Uses up to your physical RAM size to sort - up from 1GB

I'll probably upload a new release in ~2035, so don't wait too hard.

blandyuk · Apr 3, 2025

Great work. I should update mine at some point tbh. Good shout with .NET 8 as I've been porting some other stuff over, even to .NET 9 also.

Hashpup2222 · Apr 4, 2025

Hash-IT said:
If you are accepting feature requests, could we have a sort by length option ? Like short to long ?

Code:

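# One pass per length, 1 to 10. -i names the input file, because a piped
# stdin would be used up entirely by the first pass.
for i in $(seq 1 10); do
    pw-inspector -i wordlist1-20.lst -m "$i" -M "$i"
done > wordlist1-10.lst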
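
If the goal is just one stream ordered from short to long, a decorate / sort / undecorate pipeline is another option. This is a sketch using standard awk, sort and cut, not a MassiveSort feature; `sort_by_length` is a made-up name, and length is counted in bytes because of `LC_ALL=C`.

```shell
# Prefix each line with its byte length, sort numerically on that prefix
# (ties broken byte-wise on the line itself), then strip the prefix again.
sort_by_length() {
    LC_ALL=C awk '{ print length($0) "\t" $0 }' \
        | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2 \
        | cut -f2-
}

printf 'ccc\na\nbb\nab\n' | sort_by_length
```

The sample prints `a`, `ab`, `bb`, `ccc`. Unlike the per-length loop, this reads the input once, though `sort` still has to hold or spill the whole list.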

faredge · Apr 14, 2025

I can always accept feature requests. No guarantees about when they might ship (and given my multi-year release cadence, "soon" is not likely)

Filter output · Issue #1 · ligos/MassiveSort

Ability to filter the output by various rules. Probably regex give best expressiveness. Need to confirm performance isn't hurt (too much).

github.com
