Audio signals: Comparison

Comparing two things means measuring how similar they are with respect to some characteristic. Both things must be on the same footing and follow the same base rules, and audio comparison is no different. We generate fingerprints from the audio files and compare the files based on those fingerprints.

Fingerprints can be generated from audio files using several algorithms, such as Echoprint or Chromaprint. For the rest of this implementation, we will use Chromaprint. Comparing two audio files takes four steps, listed below:

  • Input Source and Target audio files

For our example, we will write a Python script. The script starts by taking the paths of the source and target files as input.
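A minimal sketch of this input step, assuming the two paths are passed on the command line. The function name `parse_args` and the `--source` / `--target` option names are illustrative choices, not taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Read the source and target audio file paths from the command line.

    The option names are hypothetical; any names would work.
    """
    parser = argparse.ArgumentParser(description="Compare two audio files.")
    parser.add_argument("-s", "--source", required=True,
                        help="path to the source audio file")
    parser.add_argument("-t", "--target", required=True,
                        help="path to the target audio file")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

Calling `parse_args(["--source", "a.mp3", "--target", "b.mp3"])` returns an object whose `source` and `target` attributes hold the two paths.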

  • Generate fingerprints for source and target files

To generate fingerprints with the Chromaprint algorithm, we use a command-line tool named fpcalc. Note that FFmpeg is required to build fpcalc.

This step produces two lists, fingerprint_source and fingerprint_target. Each list holds the 32-bit fingerprint values produced by fpcalc.
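One way this step could look, as a sketch: it assumes fpcalc is installed and on the PATH, and that it is run with its `-raw` and `-length` options so the fingerprint is printed as comma-separated integers on a `FINGERPRINT=` line. The helper names and the default of 120 seconds are assumptions made for illustration.

```python
import subprocess


def parse_fpcalc_output(output):
    """Extract the integer fingerprints from the text printed by `fpcalc -raw`."""
    prefix = "FINGERPRINT="
    for line in output.splitlines():
        if line.startswith(prefix):
            return [int(value) for value in line[len(prefix):].split(",") if value]
    raise ValueError("no FINGERPRINT line found in fpcalc output")


def calculate_fingerprints(path, length=120):
    """Run fpcalc on one audio file and return its raw fingerprints.

    `length` is the number of seconds of audio to analyse (assumed default).
    """
    result = subprocess.run(
        ["fpcalc", "-raw", "-length", str(length), path],
        capture_output=True, text=True, check=True,
    )
    return parse_fpcalc_output(result.stdout)
```

With these helpers, `fingerprint_source = calculate_fingerprints(source_path)` and `fingerprint_target = calculate_fingerprints(target_path)` would give the two lists used in the following steps.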

  • Calculate similarity score

To compare the source and target files, we do not compare every fingerprint against every other fingerprint. We compare only corresponding fingerprints, that is, those at the same position in both lists. The comparison is based on the number of matching bits across the given batch of fingerprints. When Chromaprint generates fingerprints, errors sometimes creep in and flip some of the bits. Errors of at most one flipped bit account for 98% of the cases. So if two fingerprints differ by at most one bit, it is safe to assume that they are similar.

The result is a similarity score for any two lists of fingerprints, based on the bit difference between corresponding fingerprints.
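A rough sketch of such a score, assuming it is defined as the fraction of matching bits over all corresponding pairs. The names `bit_difference` and `similarity` are illustrative, and truncating to the shorter list is an assumption rather than something the text specifies.

```python
def bit_difference(x, y):
    """Count the bits that differ between two 32-bit fingerprints."""
    # Mask to 32 bits so signed (negative) values are handled as well.
    return bin((x ^ y) & 0xFFFFFFFF).count("1")


def similarity(fingerprint_source, fingerprint_target):
    """Return the fraction of matching bits between corresponding fingerprints.

    zip() stops at the shorter list, so only overlapping positions are compared.
    """
    pairs = list(zip(fingerprint_source, fingerprint_target))
    if not pairs:
        raise ValueError("cannot compare empty fingerprint lists")
    matching_bits = sum(32 - bit_difference(x, y) for x, y in pairs)
    return matching_bits / (32 * len(pairs))
```

Under this definition a score of 1.0 means every bit matches, and a single flipped bit in one pair lowers the score only slightly, which is what makes the comparison tolerant of the small errors described above.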

  • Check for offset in audio files

So far, we can tell whether the fingerprints of the given source and target files are similar. This does not cover cases where the source or target file is shifted at the start or end. In these cases, the files may be similar but offset from one another.

To cover these cases, all we have to do is loop over the fingerprints. We introduce a variable, step, representing the current offset from the beginning of the source file, then repeat the comparison and calculate the similarity score between the lists at each offset. This process yields an array of similarity scores, or confidences, one for each offset.

The final step finds the maximum similarity, or confidence, and the offset at which it occurs. Together, the confidence and the offset tell us whether the files are similar and how far the target is shifted from the start of the source file.
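The loop could be sketched as follows. This version only tries non-negative offsets into the source list, and the names `scores_by_offset`, `max_offset` and `min_overlap` are assumptions for illustration. A small bit-matching helper is repeated here so the block runs on its own.

```python
def similarity(a, b):
    """Fraction of matching bits between corresponding 32-bit fingerprints."""
    pairs = list(zip(a, b))
    matching = sum(32 - bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
    return matching / (32 * len(pairs))


def scores_by_offset(source, target, max_offset, step=1, min_overlap=1):
    """Return {offset: score}, skipping `offset` items at the start of source.

    Offsets that leave fewer than `min_overlap` pairs to compare are ignored,
    since a score over very few fingerprints is not meaningful.
    """
    scores = {}
    for offset in range(0, max_offset + 1, step):
        shifted = source[offset:]
        if min(len(shifted), len(target)) < min_overlap:
            continue
        scores[offset] = similarity(shifted, target)
    return scores
```

For example, if the target matches the source starting at its third fingerprint, the entry for offset 2 will be the highest score in the returned dictionary.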
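Picking the best offset can be sketched in a few lines, assuming the scores are held in a dictionary that maps each offset to its similarity score. The name `best_offset` and the dictionary layout are illustrative choices.

```python
def best_offset(scores):
    """Return (offset, confidence) for the highest similarity score.

    `scores` maps each tried offset to its similarity score.
    """
    if not scores:
        raise ValueError("no scores to choose from")
    offset = max(scores, key=scores.get)
    return offset, scores[offset]
```

The offset returned here is counted in fingerprints, not in seconds.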

This is how the script looks in the end.

If you have any suggestions or thoughts, let me know:
Insta + Twitter + LinkedIn + Medium + Facebook | @shivama205
