Hi folks!

Why did I write modsplit? Sometimes it is really annoying that you cannot recover old data without errors or failures. Certainly there are checksums, recovery codes and all that stuff. But what if a backup medium gets lost or destroyed? The solution should be a distributable backup with (nearly) arbitrary redundancy and without too much overhead!

I gave this idea some thought and remembered an old theorem, well known for centuries, which can be applied to accomplish this goal: the Chinese Remainder Theorem states that a number can be identified by its remainders, provided the moduli have no common divisors and their product is greater than the given number. This theorem is sometimes used for proving other theorems in complexity theory. You can even use it to speed up multiplications.

So, instead of searching the web for a solution to my problem, I felt urged to write such an easy thing myself. The program takes less than 600 lines of code.

To compile modsplit under Linux, simply do

  gcc -o modsplit -O2 -Wall modsplit.c

The usage is simple:

  modsplit --split

will create six destination files out of your source. If you place them on 6 different media, possibly using network devices, you will be able to restore your source even if two of the destinations are not available!

  modsplit --restore

will do the job... You can even use pipes:

  tar cvf - | modsplit --split - ..
  modsplit --restore - | tar tvf -

Warning: If using stdin, we have no chance to get the file size on opening. On restore, the resulting file may be up to 3 ZERO-bytes longer than expected.

Some ideas...

It gets really interesting if you think about using the internet. Consider that you and five of your friends have decided to back up your hard drives. Each of you needs less than 65 GB of disk space to save 40 GB. And even if two of the systems crash, everybody can restore his/her 40 GB.
Writing a network device driver using this technique, one could even create a redundant and fast virtual RAID system. Think of a very large distributed archive of data which is mostly accessed for reading, and many people using this archive over a network to share their data. Even if they all have a low upload rate and a high download rate (such as DSL), they can access their data at nearly full download rate as long as they are not accessing it concurrently.

Cheers,
Thorsten
reinecke@thorstenreinecke.de