Files
lidify/docs/implementation-summaries/vibe-matching-overhaul
2025-12-25 18:58:06 -06:00
..
2025-12-25 18:58:06 -06:00

Vibe Matching Algorithm Overhaul Plan

Overview

This document outlines the plan to overhaul the vibe matching algorithm to use cosine similarity on a comprehensive feature vector that includes all 9 ML mood predictions, audio features, and genre/tag matching.

Current State (Before Overhaul)

What We Have

  • ML Mood Predictions (9 total):

    • moodHappy, moodSad, moodRelaxed, moodAggressive (existing)
    • moodParty, moodAcoustic, moodElectronic (newly added)
    • danceabilityMl, aggressivenessMl (existing)
  • Audio Features:

    • bpm, key, keyScale (major/minor)
    • energy, danceability, valence, arousal
    • instrumentalness, acousticness, speechiness
  • Metadata:

    • lastfmTags (JSON array of tag objects with name/count)
    • essentiaGenres (JSON array of genre strings)
    • trackGenres relation (linked genre records)

Previous Algorithm (Weighted Manhattan Distance)

// Old approach - arbitrary weights, limited features
const weights = {
  energy: 1.5,
  danceability: 1.2,
  valence: 1.0,
  arousal: 1.0,
  instrumentalness: 0.8,
  bpm: 0.5,
};

let score = 0;
for (const [feature, weight] of Object.entries(weights)) {
  const diff = Math.abs(sourceTrack[feature] - candidateTrack[feature]);
  score += diff * weight;
}
// Lower score = more similar (inverted logic)

Problems with old approach:

  1. Only used 6 features, ignored all ML mood predictions
  2. Arbitrary weights with no scientific basis
  3. Manhattan distance less effective for high-dimensional feature spaces
  4. No genre/tag matching
  5. Score inversion was confusing

New Algorithm (Cosine Similarity)

Phase 1: Database Schema Update

Add new mood fields to Prisma schema:

model Track {
  // ... existing fields ...
  
  // ML Mood Predictions (0.0-1.0)
  moodHappy       Float?
  moodSad         Float?
  moodRelaxed     Float?
  moodAggressive  Float?
  moodParty       Float?      // NEW
  moodAcoustic    Float?      // NEW
  moodElectronic  Float?      // NEW
  
  // ... rest of schema ...
}

Migration command:

cd backend
npx prisma db push --skip-generate

Phase 2: Audio Analyzer Update

Update services/audio-analyzer/analyzer.py to extract and save all 7 mood predictions:

# MusiCNN mood classifiers
mood_models = {
    'moodHappy': 'mood_happy-musicnn-msd-2',
    'moodSad': 'mood_sad-musicnn-msd-2',
    'moodRelaxed': 'mood_relaxed-musicnn-msd-2',
    'moodAggressive': 'mood_aggressive-musicnn-msd-2',
    'moodParty': 'mood_party-musicnn-msd-2',
    'moodAcoustic': 'mood_acoustic-musicnn-msd-2',
    'moodElectronic': 'mood_electronic-musicnn-msd-2',
}

# Save all to database
UPDATE "Track" SET
  "moodHappy" = %s,
  "moodSad" = %s,
  "moodRelaxed" = %s,
  "moodAggressive" = %s,
  "moodParty" = %s,
  "moodAcoustic" = %s,
  "moodElectronic" = %s,
  ...

Phase 3: Feature Vector Construction

Build a normalized feature vector for each track:

interface TrackFeatures {
  // ML Moods (0-1)
  moodHappy: number | null;
  moodSad: number | null;
  moodRelaxed: number | null;
  moodAggressive: number | null;
  moodParty: number | null;
  moodAcoustic: number | null;
  moodElectronic: number | null;
  
  // Audio Features
  energy: number | null;
  arousal: number | null;
  danceability: number | null;
  danceabilityMl: number | null;
  instrumentalness: number | null;
  bpm: number | null;
  keyScale: string | null;
  
  // Metadata
  lastfmTags: any;
  essentiaGenres: any;
}

function buildFeatureVector(track: TrackFeatures): number[] {
  return [
    // 7 ML Mood predictions (indices 0-6)
    track.moodHappy ?? 0.5,
    track.moodSad ?? 0.5,
    track.moodRelaxed ?? 0.5,
    track.moodAggressive ?? 0.5,
    track.moodParty ?? 0.5,
    track.moodAcoustic ?? 0.5,
    track.moodElectronic ?? 0.5,
    
    // Core audio features (indices 7-10)
    track.energy ?? 0.5,
    track.arousal ?? 0.5,
    track.danceabilityMl ?? track.danceability ?? 0.5,
    track.instrumentalness ?? 0.5,
    
    // Normalized BPM (index 11)
    // Maps 60-180 BPM to 0-1 range
    Math.max(0, Math.min(1, ((track.bpm ?? 120) - 60) / 120)),
    
    // Key mode (index 12)
    // Major = 1, Minor = 0
    track.keyScale === 'major' ? 1 : 0,
  ];
}

Feature Vector Dimensions: 13

Phase 4: Cosine Similarity Calculation

function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }
  
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  
  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}

Properties:

  • Returns value between -1 and 1 (for our 0-1 normalized vectors, always 0 to 1)
  • 1.0 = identical vectors (perfect match)
  • 0.0 = orthogonal vectors (no similarity)
  • Higher = better (intuitive, no inversion needed)

Phase 5: Tag/Genre Bonus

Add bonus points for matching tags and genres:

function calculateTagBonus(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  let bonus = 0;
  
  // Extract tags
  const sourceTags = new Set<string>();
  const candidateTags = new Set<string>();
  
  // Parse lastfmTags
  if (Array.isArray(sourceTrack.lastfmTags)) {
    sourceTrack.lastfmTags.forEach((t: any) => {
      if (t?.name) sourceTags.add(t.name.toLowerCase());
    });
  }
  if (Array.isArray(candidateTrack.lastfmTags)) {
    candidateTrack.lastfmTags.forEach((t: any) => {
      if (t?.name) candidateTags.add(t.name.toLowerCase());
    });
  }
  
  // Parse essentiaGenres
  if (Array.isArray(sourceTrack.essentiaGenres)) {
    sourceTrack.essentiaGenres.forEach((g: string) => {
      sourceTags.add(g.toLowerCase());
    });
  }
  if (Array.isArray(candidateTrack.essentiaGenres)) {
    candidateTrack.essentiaGenres.forEach((g: string) => {
      candidateTags.add(g.toLowerCase());
    });
  }
  
  // Count overlapping tags
  let overlap = 0;
  for (const tag of sourceTags) {
    if (candidateTags.has(tag)) overlap++;
  }
  
  // Bonus: up to 0.1 (10%) for tag overlap
  // Normalized by the smaller set size to handle varying tag counts
  const minSize = Math.min(sourceTags.size, candidateTags.size);
  if (minSize > 0) {
    bonus = (overlap / minSize) * 0.1;
  }
  
  return bonus;
}

Phase 6: Final Score Calculation

function calculateVibeScore(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  // Build feature vectors
  const sourceVector = buildFeatureVector(sourceTrack);
  const candidateVector = buildFeatureVector(candidateTrack);
  
  // Calculate cosine similarity (0-1)
  const cosineSim = cosineSimilarity(sourceVector, candidateVector);
  
  // Add tag bonus (0-0.1)
  const tagBonus = calculateTagBonus(sourceTrack, candidateTrack);
  
  // Final score: cosine similarity + tag bonus
  // Capped at 1.0
  const finalScore = Math.min(1.0, cosineSim + tagBonus);
  
  return finalScore;
}

Phase 7: Integration into Radio Endpoint

Update backend/src/routes/library.ts:

// In the vibe radio section
const sourceTrack = await prisma.track.findUnique({
  where: { id: trackId },
  select: {
    moodHappy: true,
    moodSad: true,
    moodRelaxed: true,
    moodAggressive: true,
    moodParty: true,
    moodAcoustic: true,
    moodElectronic: true,
    energy: true,
    arousal: true,
    danceability: true,
    danceabilityMl: true,
    instrumentalness: true,
    bpm: true,
    keyScale: true,
    lastfmTags: true,
    essentiaGenres: true,
  },
});

// Get candidates
const candidates = await prisma.track.findMany({
  where: {
    id: { not: trackId },
    analysisStatus: 'enhanced', // Only use analyzed tracks
  },
  select: { /* same fields */ },
  take: 500, // Get more candidates for better matching
});

// Score all candidates
const scored = candidates.map(candidate => ({
  ...candidate,
  vibeScore: calculateVibeScore(sourceTrack, candidate),
}));

// Sort by score (highest first)
scored.sort((a, b) => b.vibeScore - a.vibeScore);

// Take top N for the queue
const vibeQueue = scored.slice(0, limit);

// DO NOT SHUFFLE - preserve the sorted order!

Implementation Checklist

  • Phase 1: Add moodParty, moodAcoustic, moodElectronic to Prisma schema
  • Phase 2: Update audio analyzer to extract all 7 moods
  • Phase 3: Implement buildFeatureVector() function
  • Phase 4: Implement cosineSimilarity() function
  • Phase 5: Implement calculateTagBonus() function (called computeTagBonus)
  • Phase 6: Implement calculateVibeScore() combining all components
  • Phase 7: Integrate into /library/radio endpoint
  • Phase 8: Update frontend to display match percentage (optional enhancement)
  • Phase 9: Re-analyze tracks to populate new mood fields

Re-Analysis Script

To populate the new mood fields for existing tracks:

-- Reset analysis status for enhanced tracks to re-run analysis
UPDATE "Track"
SET "analysisStatus" = 'pending'
WHERE "analysisStatus" = 'enhanced';

Or use the existing script:

docker exec lidify_db psql -U lidifydb -d lidify -f /path/to/reset-analysis-for-new-moods.sql

Expected Improvements

  1. Better Similarity Matching: Cosine similarity is mathematically proven to work well for high-dimensional feature vectors
  2. Full ML Utilization: All 9 mood predictions now contribute to matching
  3. Genre Awareness: Tag/genre overlap provides meaningful boost
  4. Intuitive Scores: Higher score = better match (no inversion)
  5. Normalized Features: All features scaled to 0-1 for fair comparison

Testing Strategy

  1. Pick a track with known characteristics (e.g., happy upbeat pop song)
  2. Generate vibe queue
  3. Verify top matches share similar mood profiles
  4. Check that match percentages in UI reflect actual similarity
  5. Test with various genres to ensure cross-genre matching works appropriately

Files Modified

  • backend/prisma/schema.prisma - New mood fields
  • backend/src/routes/library.ts - New scoring algorithm
  • services/audio-analyzer/analyzer.py - Extract all 7 moods
  • frontend/components/player/VibeOverlay.tsx - Display all moods
  • frontend/lib/audio-state-context.tsx - Extended AudioFeatures interface

Notes

  • Gaia: Essentia has a companion library called Gaia for large-scale similarity search using KD-trees. This is overkill for our scale (< 100k tracks) but could be considered for future scaling.
  • MusiCNN Limitations: The model was trained on MSD (Million Song Dataset) which is pop/rock heavy. For classical/ambient music, predictions may be less reliable. We've added normalization to handle this.
  • Shuffle Interaction: Vibe mode automatically disables shuffle to preserve the sorted order.