Mirror/lidify

Fork 0

Files

History

Kevin O'Neill 021aec7a63 Initial release v1.0.0

2025-12-25 18:58:06 -06:00

README.md

Initial release v1.0.0

2025-12-25 18:58:06 -06:00

README.md

Vibe Matching Algorithm Overhaul Plan

Overview

This document outlines the plan to overhaul the vibe matching algorithm to use cosine similarity on a comprehensive feature vector that includes all 9 ML mood predictions, audio features, and genre/tag matching.

Current State (Before Overhaul)

What We Have

ML Mood Predictions (9 total):
- moodHappy, moodSad, moodRelaxed, moodAggressive (existing)
- moodParty, moodAcoustic, moodElectronic (newly added)
- danceabilityMl, aggressivenessMl (existing)
Audio Features:
- bpm, key, keyScale (major/minor)
- energy, danceability, valence, arousal
- instrumentalness, acousticness, speechiness
Metadata:
- lastfmTags (JSON array of tag objects with name/count)
- essentiaGenres (JSON array of genre strings)
- trackGenres relation (linked genre records)

Previous Algorithm (Weighted Manhattan Distance)

// Old approach - arbitrary weights, limited features
const weights = {
  energy: 1.5,
  danceability: 1.2,
  valence: 1.0,
  arousal: 1.0,
  instrumentalness: 0.8,
  bpm: 0.5,
};

let score = 0;
for (const [feature, weight] of Object.entries(weights)) {
  const diff = Math.abs(sourceTrack[feature] - candidateTrack[feature]);
  score += diff * weight;
}
// Lower score = more similar (inverted logic)

Problems with old approach:

Only used 6 features, ignored all ML mood predictions
Arbitrary weights with no scientific basis
Manhattan distance less effective for high-dimensional feature spaces
No genre/tag matching
Score inversion was confusing

New Algorithm (Cosine Similarity)

Phase 1: Database Schema Update ✅

Add new mood fields to Prisma schema:

model Track {
  // ... existing fields ...
  
  // ML Mood Predictions (0.0-1.0)
  moodHappy       Float?
  moodSad         Float?
  moodRelaxed     Float?
  moodAggressive  Float?
  moodParty       Float?      // NEW
  moodAcoustic    Float?      // NEW
  moodElectronic  Float?      // NEW
  
  // ... rest of schema ...
}

Migration command:

cd backend
npx prisma db push --skip-generate

Phase 2: Audio Analyzer Update ✅

Update services/audio-analyzer/analyzer.py to extract and save all 7 mood predictions:

# MusiCNN mood classifiers
mood_models = {
    'moodHappy': 'mood_happy-musicnn-msd-2',
    'moodSad': 'mood_sad-musicnn-msd-2',
    'moodRelaxed': 'mood_relaxed-musicnn-msd-2',
    'moodAggressive': 'mood_aggressive-musicnn-msd-2',
    'moodParty': 'mood_party-musicnn-msd-2',
    'moodAcoustic': 'mood_acoustic-musicnn-msd-2',
    'moodElectronic': 'mood_electronic-musicnn-msd-2',
}

# Save all to database
UPDATE "Track" SET
  "moodHappy" = %s,
  "moodSad" = %s,
  "moodRelaxed" = %s,
  "moodAggressive" = %s,
  "moodParty" = %s,
  "moodAcoustic" = %s,
  "moodElectronic" = %s,
  ...

Phase 3: Feature Vector Construction

Build a normalized feature vector for each track:

interface TrackFeatures {
  // ML Moods (0-1)
  moodHappy: number | null;
  moodSad: number | null;
  moodRelaxed: number | null;
  moodAggressive: number | null;
  moodParty: number | null;
  moodAcoustic: number | null;
  moodElectronic: number | null;
  
  // Audio Features
  energy: number | null;
  arousal: number | null;
  danceability: number | null;
  danceabilityMl: number | null;
  instrumentalness: number | null;
  bpm: number | null;
  keyScale: string | null;
  
  // Metadata
  lastfmTags: any;
  essentiaGenres: any;
}

function buildFeatureVector(track: TrackFeatures): number[] {
  return [
    // 7 ML Mood predictions (indices 0-6)
    track.moodHappy ?? 0.5,
    track.moodSad ?? 0.5,
    track.moodRelaxed ?? 0.5,
    track.moodAggressive ?? 0.5,
    track.moodParty ?? 0.5,
    track.moodAcoustic ?? 0.5,
    track.moodElectronic ?? 0.5,
    
    // Core audio features (indices 7-10)
    track.energy ?? 0.5,
    track.arousal ?? 0.5,
    track.danceabilityMl ?? track.danceability ?? 0.5,
    track.instrumentalness ?? 0.5,
    
    // Normalized BPM (index 11)
    // Maps 60-180 BPM to 0-1 range
    Math.max(0, Math.min(1, ((track.bpm ?? 120) - 60) / 120)),
    
    // Key mode (index 12)
    // Major = 1, Minor = 0
    track.keyScale === 'major' ? 1 : 0,
  ];
}

Feature Vector Dimensions: 13

Phase 4: Cosine Similarity Calculation

function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }
  
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  
  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}

Properties:

Returns value between -1 and 1 (for our 0-1 normalized vectors, always 0 to 1)
1.0 = identical vectors (perfect match)
0.0 = orthogonal vectors (no similarity)
Higher = better (intuitive, no inversion needed)

Phase 5: Tag/Genre Bonus

Add bonus points for matching tags and genres:

function calculateTagBonus(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  let bonus = 0;
  
  // Extract tags
  const sourceTags = new Set<string>();
  const candidateTags = new Set<string>();
  
  // Parse lastfmTags
  if (Array.isArray(sourceTrack.lastfmTags)) {
    sourceTrack.lastfmTags.forEach((t: any) => {
      if (t?.name) sourceTags.add(t.name.toLowerCase());
    });
  }
  if (Array.isArray(candidateTrack.lastfmTags)) {
    candidateTrack.lastfmTags.forEach((t: any) => {
      if (t?.name) candidateTags.add(t.name.toLowerCase());
    });
  }
  
  // Parse essentiaGenres
  if (Array.isArray(sourceTrack.essentiaGenres)) {
    sourceTrack.essentiaGenres.forEach((g: string) => {
      sourceTags.add(g.toLowerCase());
    });
  }
  if (Array.isArray(candidateTrack.essentiaGenres)) {
    candidateTrack.essentiaGenres.forEach((g: string) => {
      candidateTags.add(g.toLowerCase());
    });
  }
  
  // Count overlapping tags
  let overlap = 0;
  for (const tag of sourceTags) {
    if (candidateTags.has(tag)) overlap++;
  }
  
  // Bonus: up to 0.1 (10%) for tag overlap
  // Normalized by the smaller set size to handle varying tag counts
  const minSize = Math.min(sourceTags.size, candidateTags.size);
  if (minSize > 0) {
    bonus = (overlap / minSize) * 0.1;
  }
  
  return bonus;
}

Phase 6: Final Score Calculation

function calculateVibeScore(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  // Build feature vectors
  const sourceVector = buildFeatureVector(sourceTrack);
  const candidateVector = buildFeatureVector(candidateTrack);
  
  // Calculate cosine similarity (0-1)
  const cosineSim = cosineSimilarity(sourceVector, candidateVector);
  
  // Add tag bonus (0-0.1)
  const tagBonus = calculateTagBonus(sourceTrack, candidateTrack);
  
  // Final score: cosine similarity + tag bonus
  // Capped at 1.0
  const finalScore = Math.min(1.0, cosineSim + tagBonus);
  
  return finalScore;
}

Phase 7: Integration into Radio Endpoint

Update backend/src/routes/library.ts:

// In the vibe radio section
const sourceTrack = await prisma.track.findUnique({
  where: { id: trackId },
  select: {
    moodHappy: true,
    moodSad: true,
    moodRelaxed: true,
    moodAggressive: true,
    moodParty: true,
    moodAcoustic: true,
    moodElectronic: true,
    energy: true,
    arousal: true,
    danceability: true,
    danceabilityMl: true,
    instrumentalness: true,
    bpm: true,
    keyScale: true,
    lastfmTags: true,
    essentiaGenres: true,
  },
});

// Get candidates
const candidates = await prisma.track.findMany({
  where: {
    id: { not: trackId },
    analysisStatus: 'enhanced', // Only use analyzed tracks
  },
  select: { /* same fields */ },
  take: 500, // Get more candidates for better matching
});

// Score all candidates
const scored = candidates.map(candidate => ({
  ...candidate,
  vibeScore: calculateVibeScore(sourceTrack, candidate),
}));

// Sort by score (highest first)
scored.sort((a, b) => b.vibeScore - a.vibeScore);

// Take top N for the queue
const vibeQueue = scored.slice(0, limit);

// DO NOT SHUFFLE - preserve the sorted order!

Implementation Checklist

Phase 1: Add moodParty, moodAcoustic, moodElectronic to Prisma schema
Phase 2: Update audio analyzer to extract all 7 moods
Phase 3: Implement buildFeatureVector() function
Phase 4: Implement cosineSimilarity() function
Phase 5: Implement calculateTagBonus() function (called computeTagBonus)
Phase 6: Implement calculateVibeScore() combining all components
Phase 7: Integrate into /library/radio endpoint
Phase 8: Update frontend to display match percentage (optional enhancement)
Phase 9: Re-analyze tracks to populate new mood fields

Re-Analysis Script

To populate the new mood fields for existing tracks:

-- Reset analysis status for enhanced tracks to re-run analysis
UPDATE "Track"
SET "analysisStatus" = 'pending'
WHERE "analysisStatus" = 'enhanced';

Or use the existing script:

docker exec lidify_db psql -U lidifydb -d lidify -f /path/to/reset-analysis-for-new-moods.sql

Expected Improvements

Better Similarity Matching: Cosine similarity is mathematically proven to work well for high-dimensional feature vectors
Full ML Utilization: All 9 mood predictions now contribute to matching
Genre Awareness: Tag/genre overlap provides meaningful boost
Intuitive Scores: Higher score = better match (no inversion)
Normalized Features: All features scaled to 0-1 for fair comparison

Testing Strategy

Pick a track with known characteristics (e.g., happy upbeat pop song)
Generate vibe queue
Verify top matches share similar mood profiles
Check that match percentages in UI reflect actual similarity
Test with various genres to ensure cross-genre matching works appropriately

Files Modified

backend/prisma/schema.prisma - New mood fields
backend/src/routes/library.ts - New scoring algorithm
services/audio-analyzer/analyzer.py - Extract all 7 moods
frontend/components/player/VibeOverlay.tsx - Display all moods
frontend/lib/audio-state-context.tsx - Extended AudioFeatures interface

Notes

Gaia: Essentia has a companion library called Gaia for large-scale similarity search using KD-trees. This is overkill for our scale (< 100k tracks) but could be considered for future scaling.
MusiCNN Limitations: The model was trained on MSD (Million Song Dataset) which is pop/rock heavy. For classical/ambient music, predictions may be less reliable. We've added normalization to handle this.
Shuffle Interaction: Vibe mode automatically disables shuffle to preserve the sorted order.