Comparison is a measure of how similar two things are with respect to some shared characteristics. Both things need to be on the same footing and follow the same rules, and audio comparison is no different. We generate fingerprints from the audio files and compare the files based on those fingerprints.

Fingerprints can be generated from audio files using several algorithms, such as Echoprint and Chromaprint. For the rest of this implementation, we will use Chromaprint. Comparing two audio files takes four steps, described below:

  • Input Source and Target audio files

For our example, we will write a Python script. The following snippet initializes our source and target files from command-line arguments.

import argparse


def initialize():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--source-file", help="source file")
    parser.add_argument("-o", "--target-file", help="target file")
    args = parser.parse_args()
    SOURCE_FILE = args.source_file if args.source_file else None
    TARGET_FILE = args.target_file if args.target_file else None
    # both paths are needed before any comparison can run
    if not SOURCE_FILE or not TARGET_FILE:
        raise Exception("Source or Target files not specified.")
    return SOURCE_FILE, TARGET_FILE


if __name__ == "__main__":
    SOURCE_FILE, TARGET_FILE = initialize()

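As a variation on the snippet above, argparse can enforce the two arguments itself. The sketch below is only one possible approach: it assumes a hypothetical helper name, parse_files, and marks both options as required so that a missing path produces a usage error instead of a hand-written exception.

```python
import argparse


def parse_files(argv=None):
    """Return (source, target); argparse exits with a usage error if one is missing."""
    parser = argparse.ArgumentParser(description="Compare two audio files")
    parser.add_argument("-i", "--source-file", required=True, help="source file")
    parser.add_argument("-o", "--target-file", required=True, help="target file")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv[1:]"
    return args.source_file, args.target_file


# Passing an explicit list makes the parser easy to try without a shell.
source, target = parse_files(["-i", "a.mp3", "-o", "b.mp3"])
print(source, target)
```

Accepting an optional argv list is a small design choice that keeps the function testable; when called with no arguments it behaves like a normal command-line parser.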
  • Generate fingerprints for source and target files

To generate fingerprints with the Chromaprint algorithm, we use a command-line tool named fpcalc. It ships with Chromaprint, and building it from source requires the FFmpeg libraries.

# correlation.py
import subprocess

# seconds to sample audio file for
sample_time = 5000


# calculate fingerprint
def calculate_fingerprints(filename):
    # arguments are passed as a list so file names with spaces work
    fpcalc_out = subprocess.check_output(
        ['fpcalc', '-raw', '-length', str(sample_time), filename]
    ).decode('utf-8').strip()
    marker = 'FINGERPRINT='
    fingerprint_index = fpcalc_out.find(marker)
    if fingerprint_index < 0:
        raise Exception('fpcalc returned no fingerprint for %s' % filename)
    # convert fingerprint to list of integers
    values = fpcalc_out[fingerprint_index + len(marker):].split(',')
    fingerprints = list(map(int, values))

    return fingerprints


def correlate(source, target):
    fingerprint_source = calculate_fingerprints(source)
    fingerprint_target = calculate_fingerprints(target)

This produces two lists, fingerprint_source and fingerprint_target. Each list holds the 32-bit integer fingerprints that fpcalc generated for the corresponding file.
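To make the parsing step concrete without needing fpcalc installed, here is a small sketch that works on a hard-coded string. It assumes the raw output has a DURATION= line followed by a FINGERPRINT= line; the sample values and the helper name parse_raw_fingerprint are made up for illustration, and real output is far longer.

```python
def parse_raw_fingerprint(fpcalc_out):
    """Extract the comma-separated integers from the FINGERPRINT= line."""
    prefix = "FINGERPRINT="
    for line in fpcalc_out.splitlines():
        if line.startswith(prefix):
            return [int(v) for v in line[len(prefix):].split(",") if v]
    raise ValueError("no FINGERPRINT= line found")


# Made-up sample in the assumed KEY=VALUE layout; not real fpcalc output.
sample_output = "DURATION=12\nFINGERPRINT=3208039441,3207912193,3224754945\n"

fingerprints = parse_raw_fingerprint(sample_output)
print(fingerprints)
# Check that every value fits in 32 bits, as the comparison step expects.
print(all(0 <= f < 2 ** 32 for f in fingerprints))
```

Scanning line by line, rather than slicing from the first match to the end of the output, keeps the parser from choking if other lines ever follow the fingerprint.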

  • Calculate similarity score

To compare the source and target files, we do not compare every fingerprint against every other. Instead, we compare corresponding fingerprints in the two lists, that is, the values at the same index. The comparison counts how many bits match across the given batch of fingerprints. When Chromaprint generates fingerprints, small errors sometimes creep in and flip some of the bits. Errors of up to one flipped bit account for 98% of cases, so if two fingerprints differ by at most one bit, it is safe to assume they are similar.

# returns correlation between lists
def correlation(listx, listy):
    if len(listx) == 0 or len(listy) == 0:
        # Error checking in main program should prevent us from ever being
        # able to get here.
        raise Exception('Empty lists cannot be correlated.')
    if len(listx) > len(listy):
        listx = listx[:len(listy)]
    elif len(listx) < len(listy):
        listy = listy[:len(listx)]

    covariance = 0
    for i in range(len(listx)):
        covariance += 32 - bin(listx[i] ^ listy[i]).count("1")
    covariance = covariance / float(len(listx))

    return covariance/32

This returns a similarity score between 0 and 1 for any two lists of fingerprints, based on the bit differences between corresponding fingerprints.
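A tiny worked example may help here. The sketch below uses hypothetical helper names (bit_errors, similarity) and hand-picked values rather than real fingerprints: XOR marks the bits that differ, and counting the 1s gives the number of bit errors per pair.

```python
def bit_errors(a, b):
    """Number of differing bits between two 32-bit fingerprint values."""
    # Masking keeps the count correct even if a value arrives as a signed int.
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def similarity(listx, listy):
    """Fraction of matching bits over the overlapping part of both lists."""
    pairs = list(zip(listx, listy))  # zip stops at the shorter list
    if not pairs:
        raise ValueError("Empty lists cannot be correlated.")
    matching = sum(32 - bit_errors(a, b) for a, b in pairs)
    return matching / (32.0 * len(pairs))


source = [0b1010, 0b1111, 0b0001]
target = [0b1010, 0b1110, 0b0000, 0b1111]  # extra trailing item is ignored

print([bit_errors(a, b) for a, b in zip(source, target)])  # [0, 1, 1]
print(similarity(source, target))  # (32 + 31 + 31) / 96
```

Two of the three pairs differ by a single bit, so the score stays close to 1, which matches the idea that one flipped bit should not count against a match.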

  • Check for offset in audio files

This tells us whether the fingerprints of the source and target files are similar. It does not cover cases where the content is shifted at the start or end of one file: the source and target may be similar, yet offset from each other.

import numpy

# number of points to scan cross correlation over
span = 150
# step size (in points) of cross correlation
step = 1
# minimum number of points that must overlap in cross correlation
# exception is raised if this cannot be met
min_overlap = 20


# return cross correlation, with listy offset from listx
def cross_correlation(listx, listy, offset):
    if offset > 0:
        listx = listx[offset:]
        listy = listy[:len(listx)]
    elif offset < 0:
        offset = -offset
        listy = listy[offset:]
        listx = listx[:len(listy)]
    if min(len(listx), len(listy)) < min_overlap:
        # Error checking in main program should prevent us from ever being
        # able to get here.
        raise Exception('Overlap too small: %i' % min(len(listx), len(listy)))
    return correlation(listx, listy)


# cross correlate listx and listy with offsets from -span to span
def compare(listx, listy, span, step):
    if span > min(len(listx), len(listy)):
        # Error checking in main program should prevent us from ever being
        # able to get here.
        raise Exception('span >= sample size: %i >= %i\n'
                        % (span, min(len(listx), len(listy)))
                        + 'Reduce span, reduce crop or increase sample_time.')
    corr_xy = []
    for offset in numpy.arange(-span, span + 1, step):
        corr_xy.append(cross_correlation(listx, listy, offset))
    return corr_xy
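The slicing in cross_correlation is easier to follow on a toy input. This sketch isolates just the alignment step in a hypothetical helper, align, and uses letters instead of fingerprints so that the shift is visible.

```python
def align(listx, listy, offset):
    """Trim both lists so that listx[offset + k] lines up with listy[k]."""
    if offset > 0:
        listx = listx[offset:]
    elif offset < 0:
        listy = listy[-offset:]
    overlap = min(len(listx), len(listy))
    return listx[:overlap], listy[:overlap]


x = list("abcdef")
y = list("cdefgh")  # same content as x, starting two items later

print(align(x, y, 2))   # x loses its first two items, then both are cut to the overlap
print(align(x, y, -2))  # y loses its first two items instead
print(align(x, y, 0))   # no shift
```

Only the first call pairs identical items, which is the situation the offset scan is looking for.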

To handle such shifts, all we have to do is loop over the fingerprints. We introduce a variable, offset, that represents the current shift from the beginning of the source file, advance it by step across the range from -span to span, and repeat the comparison to get a similarity score at each position. This process ends with an array of similarity scores, or confidences, one for each offset.

# report match when cross correlation has a peak exceeding threshold
threshold = 0.5


# return index of maximum value in list
def max_index(listx):
    max_index = 0
    max_value = listx[0]
    for i, value in enumerate(listx):
        if value > max_value:
            max_value = value
            max_index = i
    return max_index


def get_max_corr(corr, source, target):
    max_corr_index = max_index(corr)
    max_corr_offset = -span + max_corr_index * step
    print("max_corr_index = %i max_corr_offset = %i"
          % (max_corr_index, max_corr_offset))
    # report matches
    if corr[max_corr_index] > threshold:
        print('%s and %s match with correlation of %.4f at offset %i'
              % (source, target, corr[max_corr_index], max_corr_offset))

This finds the maximum similarity, or confidence, and the offset at which it occurs. Together, the confidence and offset tell us whether the files are similar and how far the target is shifted from the start of the source file.
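To see the peak search at work without any audio files, the sketch below builds synthetic data: a list of random 32-bit values stands in for a source fingerprint, and the target is the same list with a few items dropped from the front. The names score and best_offset, the seed, and the small span are all assumptions chosen for the demo, not part of the script above.

```python
import random


def score(listx, listy):
    """Fraction of matching bits across paired 32-bit values."""
    n = min(len(listx), len(listy))
    same = sum(32 - bin(a ^ b).count("1") for a, b in zip(listx, listy))
    return same / (32.0 * n)


def best_offset(listx, listy, span):
    """Scan offsets in [-span, span]; return (offset, score) at the peak."""
    results = []
    for offset in range(-span, span + 1):
        if offset >= 0:
            a, b = listx[offset:], listy
        else:
            a, b = listx, listy[-offset:]
        results.append((score(a, b), offset))
    peak_score, peak_offset = max(results)
    return peak_offset, peak_score


random.seed(0)  # fixed seed so the demo is repeatable
source = [random.getrandbits(32) for _ in range(200)]
target = source[7:]  # target starts 7 items into the source

offset, confidence = best_offset(source, target, span=20)
print(offset, confidence)  # 7 1.0
```

At every other offset the paired values are unrelated random bits, so their score sits near 0.5. Real fingerprints need not behave like random data, but this suggests a peak should stand clearly above the threshold before it is treated as a match.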

Put together, these snippets make up the script.

Any suggestions or thoughts, let me know:
Insta + Twitter + LinkedIn + Medium + Facebook | @shivama205