dupa – duplicate analyzer

I’ve finally managed to polish and publish my pet project for cleaning duplicates out of my home directory. It’s here: https://github.com/dopiera/dupa. It has actually proven useful to me multiple times, so I thought I’d share it more broadly.

I bet I am not the only person who has repeatedly downloaded photos from a camera or phone without removing them from the device, and so ended up re-downloading all of them every time I only wanted the newest ones. I also bet I’m not the only person in the world who has copied data between computers and ended up with two mostly similar data sets. This tool helped me get out of that situation.

It works by computing hashes of files and then using some heuristics to find similar directories, or directories which mostly contain duplicates of files scattered elsewhere (think of a big dump of photos, most of which already exist in other directories sorted by your trips).
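The core hashing idea can be sketched as follows. This is a minimal Python illustration of the first step only (dupa itself is a separate implementation; the function names here are hypothetical, and the directory-similarity heuristics are not shown):

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1 << 16):
    """Return the SHA-1 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group all files under root by content hash.

    Returns only the groups with more than one path, i.e. the duplicates.
    """
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[hash_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Once files are grouped by hash like this, directory-level heuristics can compare how many of a directory’s files appear elsewhere.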

The code is available on GitHub: https://github.com/dopiera/dupa. Help yourselves. It actually has a man page, so you can read there about how to use it and how it works.
