Unlike video, audio is usually not re-indexed, but scraped and aggregated.
We speak of audio "frames" by extension of the video frame concept. In fact, most audio codecs have no such thing as a "frame": audio is often represented as a continuous flow of data. Here, "frame" means a block of audio data that corresponds to the duration of one video frame.
Most of the time, audio frames are interleaved between video frames. Since the beginning of a video frame is usually easy to detect, it is usually easy to figure out where an audio frame ends: exactly where the next video frame begins. Determining exactly where a video frame ends and an audio frame begins, however, is far from easy.
Case #1: Fixed-length
The easiest case is when audio blocks have a fixed length, or at least a predictable one.
Even so, movies, and professionally recorded files in particular, can have 2, 4 or up to 8 audio channels, which makes the scraping process a bit more complex.
What if you don't have fixed-length, easily identifiable audio data blocks?
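As an illustration of the fixed-length case, here is a minimal sketch. It assumes the start offsets of the video frames are already known, and that each audio block sits immediately before the next video frame. The function name and its parameters (`video_starts`, `audio_len`) are hypothetical and not taken from any particular tool.

```python
def scrape_fixed_length(data: bytes, video_starts: list[int], audio_len: int) -> bytes:
    """Collect fixed-length audio blocks that end where the next video frame begins."""
    chunks = []
    for prev_start, next_start in zip(video_starts, video_starts[1:]):
        begin = next_start - audio_len
        if begin < prev_start:
            # The gap is too small to hold an audio block (assumption: skip it).
            continue
        chunks.append(data[begin:next_start])
    return b"".join(chunks)
```

With several audio channels, the same idea would be applied once per channel, each with its own block length and position.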
Several things can happen, from easiest to most difficult.
Case #2: Fields of dummy data
Recording devices commonly use more space in the file than needed: blocks of data are separated by placeholders full of zeros or a similar no-information pattern. The reason is probably that audio and video follow completely different processing paths and are put together asynchronously while preserving the decoding order; to make this possible, some extra space has to be allocated because the final size is not predictable. Those fields of zeros are sometimes a reliable marker of the beginning of an audio block.
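A minimal sketch of how such placeholders could be located, assuming the filler is made of zero bytes and that a run of at least `min_run` zeros is long enough to count as padding. Both the function name and the threshold are hypothetical and would need tuning per device.

```python
import re

def find_padding_runs(data: bytes, min_run: int = 16) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every run of at least min_run zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    return [(m.start(), m.end()) for m in pattern.finditer(data)]

def candidate_audio_starts(data: bytes, min_run: int = 16) -> list[int]:
    """Assumption: an audio block may begin right after each padding run."""
    return [end for _, end in find_padding_runs(data, min_run)]
```

The offsets returned are only candidates: zeros can also occur inside legitimate data, so each one still has to be confirmed by other means.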
Case #3: Overlapping
Some recording devices also use overlapping: the end of an audio block is repeated at the beginning of the next one. If the overlap is big enough, it makes audio scraping very reliable.
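A minimal sketch of overlap detection, assuming the end offset of the previous audio block and the overlap size in bytes are known. The names `prev_end`, `overlap` and `search_limit` are hypothetical.

```python
def find_next_block(data: bytes, prev_end: int, overlap: int,
                    search_limit: int = 1 << 20) -> int:
    """Return the offset where the tail of the previous block reappears, or -1."""
    if overlap <= 0 or prev_end < overlap:
        return -1
    tail = data[prev_end - overlap:prev_end]
    # Search only after the previous block, within a bounded window.
    return data.find(tail, prev_end, prev_end + search_limit)
```

The longer the overlap, the less likely it is that the tail matches unrelated data by chance, which is why a big overlap makes scraping reliable.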
Case #4: Density analysis
Audio data can, in general, be detected by statistical methods. For example, raw audio (also called PCM audio) is the transcription of a physical signal and, as such, tends to vary continuously. Its average value is also quite predictable.
Density analysis cannot be 100% accurate, because statistical methods only say something about large amounts of data: the exact point where an audio frame starts cannot be determined. This uncertainty can lead to several defects in the repaired audio.
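The two properties mentioned for PCM (continuous variation, predictable average) can be turned into a simple test. The sketch below assumes 16-bit signed little-endian mono samples; the thresholds `max_mean` and `max_step` are arbitrary illustrative values, not figures from any real tool.

```python
import struct

def pcm_stats(window: bytes) -> tuple[float, float]:
    """Return (mean sample value, mean absolute step between neighbouring samples)."""
    n = len(window) // 2
    if n < 2:
        raise ValueError("need at least two 16-bit samples")
    samples = struct.unpack("<%dh" % n, window[:2 * n])
    mean = sum(samples) / n
    step = sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (n - 1)
    return mean, step

def looks_like_pcm(window: bytes, max_mean: float = 2000.0,
                   max_step: float = 4000.0) -> bool:
    """Heuristic: PCM tends to have a mean near zero and small steps between samples."""
    mean, step = pcm_stats(window)
    return abs(mean) <= max_mean and step <= max_step
```

Because the statistics are computed over a whole window, the test can say that a region is probably audio, but not at which byte the audio begins.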
Case #5: Probabilistic parsing & decoding
AAC is one of the most difficult audio codecs as far as repair is concerned. The previous methods don't work. To determine whether some data is AAC, you have to try to decode it as if it were AAC. If no inconsistency is found and the result looks "audible", we consider it to be AAC. This approach is complex, is not 100% reliable, and requires a large amount of work (by the Repair Technician, and then by the computer).
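The overall shape of this approach can be sketched as a loop over candidate offsets. This is only a skeleton: `decode` and `looks_audible` are hypothetical callables standing in for a real AAC decoder and a real plausibility check, neither of which is implemented here.

```python
from typing import Callable, Optional

# Hypothetical decoder type: returns decoded samples, or None on any inconsistency.
Decoder = Callable[[bytes], Optional[list[int]]]

def scan_by_trial_decoding(data: bytes, candidates: list[int], decode: Decoder,
                           looks_audible: Callable[[list[int]], bool],
                           probe_len: int = 1024) -> list[int]:
    """Keep the candidate offsets whose data decodes cleanly and sounds plausible."""
    accepted = []
    for offset in candidates:
        samples = decode(data[offset:offset + probe_len])
        if samples is not None and looks_audible(samples):
            accepted.append(offset)
    return accepted
```

Every candidate costs a full decoding attempt, which is where the computer's share of the work comes from.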
Errors
As explained above, scraping is not always perfect. Errors can cause defects and audio/video sync problems in repaired files. In some cases, you need to hide defects and correct audio sync after a repair.
Aggregating technique
The most common way to create a valid audio file from scraped audio data is to take a valid audio file with exactly the same settings as a base, then insert the audio data into it. Finally, the key operation consists of correcting several parameters encoded in the file, like number of samples, total length of data, ...
The file can then be opened with QuickTime Player.
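The aggregating step can be illustrated with raw PCM and a WAV container. WAV is chosen here only because Python's standard `wave` module can write it; the text above does not say which container is used in practice. The function name and the file paths are hypothetical.

```python
import wave

def aggregate_into_wav(reference_path: str, scraped_pcm: bytes, out_path: str) -> int:
    """Write scraped PCM into a new WAV that copies the settings of a valid reference.

    Returns the number of sample frames written.
    """
    with wave.open(reference_path, "rb") as ref:
        nchannels = ref.getnchannels()
        sampwidth = ref.getsampwidth()
        framerate = ref.getframerate()
    frame_size = nchannels * sampwidth
    # Drop a trailing partial frame so the data length stays consistent.
    usable = len(scraped_pcm) - len(scraped_pcm) % frame_size
    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        # writeframes() patches the frame count and data length in the header.
        out.writeframes(scraped_pcm[:usable])
    return usable // frame_size
```

In this sketch the header corrections (number of samples, total length of data) are done by the `wave` module when the file is closed; with other containers they would have to be patched explicitly.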