Damaged files resulting from a storage failure can present a very annoying problem:
the data is no longer laid out in order. In other words, the blocks are a bit like a deck of cards after shuffling.
To make things worse, the blocks do not correspond to frames. A block can contain several frames, or a frame can be split over several blocks. Blocks have a fixed size (generally the block size of the underlying filesystem).
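The mismatch between blocks and frames can be illustrated with a minimal Python sketch. The function name and the sizes used here (4096-byte blocks, 144000-byte and 1000-byte frames) are assumptions chosen for illustration, not values taken from a particular file.

```python
def frames_touching_block(block_index: int, block_size: int, frame_size: int) -> list[int]:
    """Return the indices of the frames that overlap one fixed-size block.

    Assumes an undamaged file in which frames of constant size are stored
    back to back, starting at byte 0.
    """
    first_byte = block_index * block_size
    last_byte = first_byte + block_size - 1
    return list(range(first_byte // frame_size, last_byte // frame_size + 1))


# A large frame is split over many blocks; block 35 straddles frames 0 and 1.
print(frames_touching_block(35, 4096, 144000))

# With small frames, a single block contains several frames.
print(frames_touching_block(0, 4096, 1000))
```

Because block boundaries almost never coincide with frame boundaries, a block taken in isolation usually starts and ends in the middle of a frame.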
De-shuffling is the technique that finds the right order of the blocks.
De-shuffling is just the first step of the repair. In most cases it is followed by a reindexing or a correction of the container structure. De-shuffling is usually expensive, because it is an investigative repair.
How it works:
De-shuffling is very hard to program and depends on the audio and video formats present in the file.
Some video formats have a regular, predictable structure: DV, DVCPro50, DVCPro HD.
In this case, we can detect where the discontinuities occur and determine, for each block, the position inside a frame at which the block starts and the position at which it ends.
Once you have a soup of blocks, you try to match the end position of one block with the start position of another, using heuristic algorithms.
Surprisingly, it gives pretty good results.
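The matching step might look like the following Python sketch. It assumes that the in-frame start and end positions of every block have already been detected, that each block is described by a hypothetical `Block` record, and that the first block of the file is known. The chaining here is purely greedy: when several blocks could follow, it simply takes the first candidate, whereas a real repair would need additional heuristics to choose between them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    start_pos: int  # offset inside a frame at which this block begins
    end_pos: int    # offset inside a frame at which the following block must begin


def chain_blocks(blocks: list[Block], first: Block) -> list[Block]:
    """Greedily order blocks so that each one starts where the previous one ended."""
    by_start = defaultdict(list)
    for block in blocks:
        if block is not first:
            by_start[block.start_pos].append(block)

    order = [first]
    current = first
    # Stop as soon as no unused block starts at the required position.
    while by_start[current.end_pos]:
        current = by_start[current.end_pos].pop(0)
        order.append(current)
    return order


# Toy data (assumed): 10-byte frames, 4-byte blocks, five blocks out of order.
a = Block("a", 0, 4)
b = Block("b", 4, 8)
c = Block("c", 8, 2)
d = Block("d", 2, 6)
e = Block("e", 6, 0)
shuffled = [c, a, e, b, d]

print([blk.name for blk in chain_blocks(shuffled, first=a)])
```

In this toy case every start position is unique, so the chain is unambiguous. In a real file the same positions recur periodically, which is why the matching has to be heuristic.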
Variants:
Brute force
If the block size and layout are known (the layout usually follows a regular pattern), then the only problem is to put the blocks in order.
For video formats that don't follow a predictable structure, the puzzle can still be solved by brute force. With JPEG media, it has been possible to write an algorithm that assembles several blocks in all possible combinations and determines which combination contains a valid frame. Slowly, the program manages to progress through the blocks, rebuilding the frames one by one.
Brute force de-shuffling is extremely difficult.
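The brute-force idea can be sketched in Python as follows. This is not the JPEG algorithm itself: to keep the example self-contained, it uses a made-up frame format (a payload followed by its 4-byte CRC32) as a stand-in for "this decodes as a valid frame", and it assumes each frame occupies a whole number of blocks. The names `is_valid_frame`, `rebuild_frame` and `rebuild_all` are hypothetical.

```python
import zlib
from itertools import permutations


def is_valid_frame(candidate: bytes) -> bool:
    """Toy validity check (assumed format): payload followed by its big-endian CRC32."""
    if len(candidate) < 5:
        return False
    payload, stored_crc = candidate[:-4], candidate[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored_crc


def rebuild_frame(pool, max_blocks, is_valid=is_valid_frame):
    """Try every ordered selection of up to max_blocks blocks; return the first valid one."""
    for count in range(1, max_blocks + 1):
        for combo in permutations(range(len(pool)), count):
            if is_valid(b"".join(pool[i] for i in combo)):
                return list(combo)
    return None


def rebuild_all(pool, max_blocks):
    """Rebuild frames one by one, removing the blocks each frame consumes."""
    remaining = list(pool)
    frames = []
    while remaining:
        combo = rebuild_frame(remaining, max_blocks)
        if combo is None:
            break  # no valid frame can be formed from what is left
        frames.append(b"".join(remaining[i] for i in combo))
        for i in sorted(combo, reverse=True):
            del remaining[i]
    return frames, remaining


def make_frame(payload: bytes) -> bytes:
    return payload + zlib.crc32(payload).to_bytes(4, "big")


# Toy data (assumed): two frames cut into 4-byte blocks, then shuffled.
frame1 = make_frame(b"VIDEO-01")  # 12 bytes, 3 blocks
frame2 = make_frame(b"AUD2")      # 8 bytes, 2 blocks
blocks = [frame1[0:4], frame1[4:8], frame1[8:12], frame2[0:4], frame2[4:8]]
shuffled = [blocks[2], blocks[3], blocks[0], blocks[4], blocks[1]]

frames, leftover = rebuild_all(shuffled, max_blocks=3)
```

The number of ordered selections grows factorially with the number of blocks tried per frame, which is one reason this approach progresses slowly through a real file.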
Mixed content
Repair after a storage failure becomes almost impossible when the recovered files contain blocks that are not only in the wrong order, but also belong to two or more movies.
For example, half of the blocks of a file belong to a DVCPro HD clip, and the other half to an Intermediate clip. The blocks are interleaved.
Needless to say, the puzzle acquires another dimension here: to be successful, one has to combine blocks from several files.
As of today, such a repair has never been carried out, due to the excessive costs involved.