Announcing MassiveSort: an alternative merge / sort / de-dupe tool

Waffle

Active member
Contributor
Feedback: 0 / 0 / 0
Joined
Dec 30, 2019
Messages
1,708
Reaction score
71
Credits
816
HashHunter said:
Mdxfind report:

Took 220 seconds to read hashes
Searching through 6,019,165 unique hashes from 1.txt
There were 52,075 duplicate hashes on input
Maximum hash chain depth is 142443
Minimum hash length is 32 characters


52075 ?????

I just removed duplicates using your tool.
Still 52075 duplicates?

Not sure if the bug is in mdxfind or in your tool :(


MDXfind looks only at the "hash" portion of the line. If you have junk (like salts, usernames, etc) after the ascii-hex, other programs won't see that as a duplicate line (since there is ancillary information on the line), but MDXfind will see that as a duplicate line - since they are the identical ascii-hash value.

Same for UPPER-CASE vs lower-case ascii-hex values - mdxfind treats them as the same value, other programs may not.
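
To make the difference concrete, here is a minimal sketch in shell. It only illustrates the behaviour described above and is not MDXfind's actual parsing: the helper name count_hash_dupes is made up, and it assumes the hash is the leading run of hex digits on each line.

```shell
# Hypothetical helper: count how many lines repeat an earlier hash when only
# the leading hex run is compared, ignoring case and anything after the hash.
count_hash_dupes() {
    awk '{
        # Assumption: the hash is the leading run of hex digits on the line.
        if (match($0, /^[0-9A-Fa-f]+/)) {
            key = tolower(substr($0, RSTART, RLENGTH))
            # Every occurrence after the first counts as one duplicate.
            if (seen[key]++) dupes++
        }
    } END { print dupes + 0 }' "$1"
}

# Usage: count_hash_dupes hash.txt
```

A whole-line tool treats `5F4D...CF99`, `5f4d...cf99:salt1` and `5f4d...cf99:salt2` as three different lines, while this hash-only count reports two duplicates for them.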
 

HashHunter

Active member
Feedback: 0 / 0 / 0
Joined
Dec 30, 2019
Messages
370
Reaction score
0
Credits
0
faredge said:
Probably mine!

Though you should try the --whitespace Trim option first. MassiveSort doesn't remove spaces at the end of a line unless you tell it to. Most other tools do. (Most people won't notice one way or the other).

Oh, and about that backup....

Before using your tool, I used:

grep -E -o -i '[0-9a-fA-F]{32}$' hash.txt > 32_hex.txt


Waffle said:
MDXfind looks only at the "hash" portion of the line. If you have junk (like salts, usernames, etc) after the ascii-hex, other programs won't see that as a duplicate line (since there is ancillary information on the line), but MDXfind will see that as a duplicate line - since they are the identical ascii-hash value.

Same for UPPER-CASE vs lower-case ascii-hex values - mdxfind treats them as the same value, other programs may not.


Yup, that is possible.
Thanks :)
I will try
tr 'A-Z' 'a-z' < 32_hex.txt > new_32_hex.txt
and will report back.

And did you mean that other programs like hashcat, John the Ripper, etc. are not able to crack even a simple hash if I convert the hash to uppercase??
:\:
 

faredge

Active member
Feedback: 0 / 0 / 0
Joined
Dec 30, 2019
Messages
168
Reaction score
6
Credits
14
@HashHunter

MassiveSort is case sensitive: UPPER and lower case hex values are definitely different for it. That may explain some of the issues. So converting to lower case (as you just did) is the way to go if you're trying to de-duplicate hash lists.

Same for salts, usernames, etc. MassiveSort is comparing the raw binary values of each line. It is (quite deliberately) very dumb.
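
A rough sketch of what byte-wise comparison means in practice, using standard tools rather than MassiveSort itself. `LC_ALL=C sort -u` compares raw bytes, so case differences and trailing whitespace keep lines apart until they are normalized. The helper names are hypothetical, and normalize_hashes assumes every line holds only a hex hash (it would also lower-case A-F inside salts or usernames).

```shell
# Hypothetical pre-processing step before a byte-wise de-dupe.
normalize_hashes() {
    # Strip trailing spaces, tabs and carriage returns, then fold the hex
    # letters A-F to lower case. Assumption: each line is just a hex hash.
    sed -e 's/[[:space:]]*$//' "$1" | tr 'A-F' 'a-f'
}

# Byte-wise sort and de-dupe of stdin: the C locale disables any
# language-specific collation, so lines are compared as raw bytes.
dedupe_bytes() {
    LC_ALL=C sort -u
}

# Usage: normalize_hashes 32_hex.txt | dedupe_bytes > deduped.txt
```

Without the normalization step, `ABCDEF01 ` (trailing space), `abcdef01` and `abcdef01` with a Windows line ending all survive as separate lines.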

@Waffle

I ran MDXfind over one of the 40-hex hashes.org left files, and it reported 4 duplicates, which is strange, because my own sanity checks reported no duplicates.
 

HashHunter

input file 1


1
123
456
852
sfd
fh2gfhf2g
fh2g5h1fg



input file 2


46g56f4
fhkmfg151
1
123
456
852



Command
MassiveSort.exe merge -o test.txt -i "t/"

output:



1
123
456
46g56f4
6
852
d
fh2g5h1fg
fh2gfhf2g
fhkmfg151
g
sfd


expected output



1
123
456
852
sfd
fh2gfhf2g
fh2g5h1fg

46g56f4

fhkmfg151

please explain .. :)
 

HashHunter

MassiveSort.exe merge -o test.txt -i "t/" --save-duplicates

saved duplicates are


1
123
456
6
852


Why does this digit 6 keep coming up again and again? :/
please explain :)
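
For a test case this small, a reference result can be produced with standard tools and compared against the tool's output. A minimal sketch, assuming the input files sit in one directory and each ends with a newline; the function names are made up for illustration and none of this is MassiveSort code:

```shell
# Reference check with standard tools (not MassiveSort itself).
# Assumption: every file in the directory ends with a newline, otherwise
# cat would glue the last line of one file to the first line of the next.

# Sorted, de-duplicated union of every file in directory $1 (byte-wise order).
reference_merge() {
    cat "$1"/* | LC_ALL=C sort -u
}

# Lines that occur more than once across the files in directory $1.
reference_dupes() {
    cat "$1"/* | LC_ALL=C sort | uniq -d
}

# Usage: reference_merge t > expected.txt; reference_dupes t > expected_dupes.txt
```

For the two input files listed earlier, reference_dupes prints 1, 123, 456 and 852, so the extra 6 in the saved duplicates does not come from the input.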
 

faredge

HashHunter said:
Why does this digit 6 keep coming up again and again? :/
please explain :)

Sorry, a bug in the line processing code. Should be fixed in version 0.1.5.

Oh and thanks for the test case! Really easy to fix these kinds of things with that sort of detail.

 

HashHunter

faredge said:
HashHunter said:
Why does this digit 6 keep coming up again and again? :/
please explain :)

Sorry, a bug in the line processing code. Should be fixed in version 0.1.5.

Oh and thanks for the test case! Really easy to fix these kinds of things with that sort of detail.

No problem, bro :)

And can you add an option for converting uppercase to lowercase?
 

faredge

So... after almost 10 years, I have an update 🤣


Highlights:
  • Uses .NET 8
  • Better support for very large files (e.g. a few hundred GB)
  • Uses up to your physical RAM size to sort - up from 1GB
I'll probably upload a new release in ~2035, so don't wait too hard.
 

blandyuk

Active member
Trusted
Contributor
VIP Member
Feedback: 0 / 0 / 0
Joined
Jul 6, 2011
Messages
18,600
Reaction score
428
Credits
11,317
Great work. I should update mine at some point tbh. Good shout with .NET 8, as I've been porting some other stuff over, some of it even to .NET 9.
 

Hashpup2222

Active member
Feedback: 0 / 0 / 0
Joined
Feb 24, 2022
Messages
118
Reaction score
15
Credits
909
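If you are accepting feature requests, could we have a sort-by-length option? Like short to long?
Code:
# Emit words of length 1, then 2, ... up to 10 (shortest first).
# Each pass re-reads the wordlist with -i, because stdin can only be consumed once.
for i in $(seq 1 10); do
    pw-inspector -i wordlist1-20.lst -m "$i" -M "$i"
done > wordlist1-10.lst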
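
The same short-to-long ordering can be sketched in a single pass with standard tools. This is only an illustration of the request, not a MassiveSort feature: the function name sort_by_length is made up, and it assumes a sort that supports the stable flag `-s` (GNU and BSD sort do). Length is whatever awk's length() reports, which is bytes in the C locale and characters in a UTF-8 locale.

```shell
# Hypothetical helper: order stdin shortest to longest, keeping the original
# order among lines of equal length.
sort_by_length() {
    # 1. Prefix each line with its length and a tab.
    # 2. Sort numerically on that first field only; -s keeps the sort stable.
    # 3. Drop the length prefix again.
    awk '{ print length($0) "\t" $0 }' | sort -s -n -k1,1 | cut -f2-
}

# Usage: sort_by_length < wordlist1-20.lst > wordlist-by-length.lst
```

Unlike the loop, this keeps words of every length rather than stopping at 10, so add a length filter if a cut-off is wanted.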
 

faredge

I can always accept feature requests. No guarantees about when they might ship (and given my multi-year release cadence, "soon" is not likely).

 