Create a spectrogram (frequency magnitudes for each short time window)
Find peaks: mechanically, run a max pooling filter over the spectrogram, then keep the positions where original == filtered (each such cell is the maximum of its neighborhood)
Optionally discard some peaks: keep only the top k, or only those above a threshold
Create pairs to hash. Not all pairs: pick a few anchors and pair each with peaks in its “target zone.” There are various schemes for deciding which pairs to generate.
Hash each pair. Each pair is $(f_a, f_b, \Delta T)$: the two peak frequencies and the time difference between them. Store the hash along with other info, especially the absolute timestamp and the song ID.
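At query time, look up by hashes. For each hit, compare the absolute timestamp within the sample to the one within the original; the difference is an offset. If many hits share a consistent offset, the match is good. The score is the number of matches with the most frequent offset.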
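
The steps above can be sketched end to end in a few dozen lines. This is a minimal illustration using only NumPy, not a reference implementation: the function names, the window/hop sizes, the 9x9 pooling neighborhood, and the `fan_out` / `max_dt` target-zone rule are all assumptions chosen for the example, and the "songs" are just seeded noise so the demo is self-contained.

```python
from collections import Counter, defaultdict

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spectrogram(signal, win=256, hop=128):
    """Magnitude STFT, shape (n_freq_bins, n_frames). Sizes are assumptions."""
    frames = sliding_window_view(signal, win)[::hop]
    return np.abs(np.fft.rfft(frames * np.hanning(win), axis=1)).T


def find_peaks(spec, size=9, top_k=None):
    """Max-pool with an odd size x size window, keep cells where original == pooled."""
    pad = size // 2
    padded = np.pad(spec, pad, mode="constant", constant_values=-np.inf)
    pooled = sliding_window_view(padded, (size, size)).max(axis=(2, 3))
    f, t = np.nonzero((spec == pooled) & (spec > 0))  # skip silent plateaus
    # Optional pruning: strongest peaks first, keep top_k (None keeps all).
    keep = np.argsort(spec[f, t])[::-1][:top_k]
    return sorted(zip(t[keep].tolist(), f[keep].tolist()))  # (time, freq) pairs


def make_hashes(peaks, fan_out=5, max_dt=50):
    """Pair each anchor with up to fan_out later peaks (a simple target zone)."""
    hashes = []
    for i, (ta, fa) in enumerate(peaks):
        paired = 0
        for tb, fb in peaks[i + 1:]:
            dt = tb - ta
            if dt > max_dt:
                break
            if dt < 1:  # same frame as the anchor: not in the target zone
                continue
            hashes.append(((fa, fb, dt), ta))  # key plus anchor's absolute time
            paired += 1
            if paired == fan_out:
                break
    return hashes


def build_index(songs):
    """Map each (f_a, f_b, dT) key to the (song_id, timestamp) entries that produced it."""
    index = defaultdict(list)
    for song_id, signal in songs.items():
        for key, t in make_hashes(find_peaks(spectrogram(signal))):
            index[key].append((song_id, t))
    return index


def match(index, sample):
    """Vote on (song_id, offset); the score is the count for the top offset."""
    votes = Counter()
    for key, t_query in make_hashes(find_peaks(spectrogram(sample))):
        for song_id, t_song in index.get(key, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), score = votes.most_common(1)[0]
    return song_id, offset, score


# Demo with synthetic "songs" (seeded noise, purely illustrative).
rng = np.random.default_rng(0)
songs = {"song_a": rng.standard_normal(32000),
         "song_b": rng.standard_normal(32000)}
index = build_index(songs)
start = 100 * 128  # excerpt begins 100 hops into song_b
result = match(index, songs["song_b"][start:start + 12000])
print(result)  # (song_id, offset in frames, score)
```

Here the tuple $(f_a, f_b, \Delta T)$ is used directly as the dictionary key, so Python's built-in hashing does the hashing; a real system would more likely pack it into a fixed-width integer. The excerpt is clean and starts on a frame boundary, which is the easy case; a noisy or misaligned recording would match fewer hashes but should still pile up votes at one offset.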