# Vibe Matching Algorithm Overhaul Plan

## Overview

This document outlines the plan to overhaul the vibe matching algorithm. The new algorithm scores candidates by **cosine similarity** over a 13-dimensional feature vector built from the ML mood predictions and audio features, then adds a small bonus for genre/tag overlap.

## Current State (Before Overhaul)

### What We Have
- **ML Mood Predictions (9 total):**
  - `moodHappy`, `moodSad`, `moodRelaxed`, `moodAggressive` (existing)
  - `moodParty`, `moodAcoustic`, `moodElectronic` (newly added)
  - `danceabilityMl`, `aggressivenessMl` (existing)

- **Audio Features:**
  - `bpm`, `key`, `keyScale` (major/minor)
  - `energy`, `danceability`, `valence`, `arousal`
  - `instrumentalness`, `acousticness`, `speechiness`

- **Metadata:**
  - `lastfmTags` (JSON array of tag objects with name/count)
  - `essentiaGenres` (JSON array of genre strings)
  - `trackGenres` relation (linked genre records)

### Previous Algorithm (Weighted Manhattan Distance)
```typescript
// Old approach - arbitrary weights, limited features
const weights = {
  energy: 1.5,
  danceability: 1.2,
  valence: 1.0,
  arousal: 1.0,
  instrumentalness: 0.8,
  bpm: 0.5,
};

let score = 0;
for (const [feature, weight] of Object.entries(weights)) {
  const diff = Math.abs(sourceTrack[feature] - candidateTrack[feature]);
  score += diff * weight;
}
// Lower score = more similar (inverted logic)
```

**Problems with old approach:**
1. Only used 6 features and ignored all ML mood predictions
2. Arbitrary weights with no empirical basis
3. Manhattan distance sums absolute differences, so it is sensitive to feature scale (as written, raw `bpm` differences can dwarf the 0-1 features)
4. No genre/tag matching
5. Lower-is-better scoring was confusing
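The old scoring loop can be restated as a standalone function to see how the features interact. This is a minimal sketch, not the original code: `OldFeatures`, `oldWeights`, and `weightedManhattan` are illustrative names, the sample tracks are made up, and it assumes (the snippet above does not say) that `bpm` is stored in raw beats per minute while the other features are in the 0-1 range. Under those assumptions, a 20 BPM gap between otherwise identical tracks produces a larger distance than flipping all five 0-1 features to the opposite extreme.

```typescript
// Illustrative only: names and sample values are assumptions, not the original implementation.
type OldFeatures = {
  energy: number;
  danceability: number;
  valence: number;
  arousal: number;
  instrumentalness: number;
  bpm: number; // assumed to be raw beats per minute
};

// Same weights as the old approach.
const oldWeights: OldFeatures = {
  energy: 1.5,
  danceability: 1.2,
  valence: 1.0,
  arousal: 1.0,
  instrumentalness: 0.8,
  bpm: 0.5,
};

// Weighted Manhattan distance: lower means more similar.
function weightedManhattan(a: OldFeatures, b: OldFeatures): number {
  let score = 0;
  for (const key of Object.keys(oldWeights) as (keyof OldFeatures)[]) {
    score += Math.abs(a[key] - b[key]) * oldWeights[key];
  }
  return score;
}

const base: OldFeatures = {
  energy: 0, danceability: 0, valence: 0, arousal: 0, instrumentalness: 0, bpm: 120,
};
// Same tempo, but every 0-1 feature at the opposite extreme.
const oppositeMood: OldFeatures = {
  energy: 1, danceability: 1, valence: 1, arousal: 1, instrumentalness: 1, bpm: 120,
};
// Identical 0-1 features, 20 BPM faster.
const fasterTwin: OldFeatures = { ...base, bpm: 140 };

const moodDistance = weightedManhattan(base, oppositeMood); // sum of the five 0-1 weights, about 5.5
const tempoDistance = weightedManhattan(base, fasterTwin); // 0.5 * 20 = 10
```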

---

## New Algorithm (Cosine Similarity)

### Phase 1: Database Schema Update ✅
Add new mood fields to Prisma schema:

```prisma
model Track {
  // ... existing fields ...

  // ML Mood Predictions (0.0-1.0)
  moodHappy      Float?
  moodSad        Float?
  moodRelaxed    Float?
  moodAggressive Float?
  moodParty      Float?  // NEW
  moodAcoustic   Float?  // NEW
  moodElectronic Float?  // NEW

  // ... rest of schema ...
}
```

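**Migration command:**
```bash
cd backend
npx prisma db push --skip-generate
```

Note that `prisma db push` syncs the database to the schema without creating a migration file, and `--skip-generate` skips regenerating Prisma Client, so run `npx prisma generate` before using the new fields from TypeScript.
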
### Phase 2: Audio Analyzer Update ✅
Update `services/audio-analyzer/analyzer.py` to extract and save all 7 mood predictions:

```python
# MusiCNN mood classifiers: Track column name -> model name
mood_models = {
    'moodHappy': 'mood_happy-musicnn-msd-2',
    'moodSad': 'mood_sad-musicnn-msd-2',
    'moodRelaxed': 'mood_relaxed-musicnn-msd-2',
    'moodAggressive': 'mood_aggressive-musicnn-msd-2',
    'moodParty': 'mood_party-musicnn-msd-2',
    'moodAcoustic': 'mood_acoustic-musicnn-msd-2',
    'moodElectronic': 'mood_electronic-musicnn-msd-2',
}

# Save all to database. The SQL is kept in a string so the snippet is valid
# Python; the remaining columns and the WHERE clause are elided ("...").
update_sql = """
UPDATE "Track" SET
    "moodHappy" = %s,
    "moodSad" = %s,
    "moodRelaxed" = %s,
    "moodAggressive" = %s,
    "moodParty" = %s,
    "moodAcoustic" = %s,
    "moodElectronic" = %s,
    ...
"""
```

### Phase 3: Feature Vector Construction
Build a normalized feature vector for each track:

```typescript
interface TrackFeatures {
  // ML Moods (0-1)
  moodHappy: number | null;
  moodSad: number | null;
  moodRelaxed: number | null;
  moodAggressive: number | null;
  moodParty: number | null;
  moodAcoustic: number | null;
  moodElectronic: number | null;

  // Audio Features
  energy: number | null;
  arousal: number | null;
  danceability: number | null;
  danceabilityMl: number | null;
  instrumentalness: number | null;
  bpm: number | null;
  keyScale: string | null;

  // Metadata
  lastfmTags: any;
  essentiaGenres: any;
}

function buildFeatureVector(track: TrackFeatures): number[] {
  return [
    // 7 ML Mood predictions (indices 0-6); missing values fall back to a neutral 0.5
    track.moodHappy ?? 0.5,
    track.moodSad ?? 0.5,
    track.moodRelaxed ?? 0.5,
    track.moodAggressive ?? 0.5,
    track.moodParty ?? 0.5,
    track.moodAcoustic ?? 0.5,
    track.moodElectronic ?? 0.5,

    // Core audio features (indices 7-10)
    track.energy ?? 0.5,
    track.arousal ?? 0.5,
    // Prefer the ML danceability estimate, then the basic one
    track.danceabilityMl ?? track.danceability ?? 0.5,
    track.instrumentalness ?? 0.5,

    // Normalized BPM (index 11)
    // Maps 60-180 BPM to 0-1 range, clamped; missing BPM defaults to 120 (0.5)
    Math.max(0, Math.min(1, ((track.bpm ?? 120) - 60) / 120)),

    // Key mode (index 12)
    // Major = 1; anything else (minor or unknown) = 0
    track.keyScale === 'major' ? 1 : 0,
  ];
}
```

**Feature Vector Dimensions: 13**
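The BPM entry is the only feature that is rescaled inside the vector, so it may help to look at it in isolation. The sketch below pulls the same expression into a hypothetical helper, `normalizeBpm` (not a function in the codebase), and runs it on a few made-up tempos.

```typescript
// Hypothetical helper: same expression as the BPM entry in the feature vector.
function normalizeBpm(bpm: number | null): number {
  // A missing BPM falls back to 120, which lands at the midpoint (0.5).
  const value = bpm ?? 120;
  // Linear map of 60-180 BPM onto 0-1, clamped outside that range.
  return Math.max(0, Math.min(1, (value - 60) / 120));
}

const bpmSamples = [null, 40, 60, 90, 120, 180, 200].map((bpm) => normalizeBpm(bpm));
// -> [0.5, 0, 0, 0.25, 0.5, 1, 1]
```

Tempos below 60 or above 180 BPM all collapse to 0 or 1, so a 180 BPM track and a 200 BPM track look identical on this feature.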

### Phase 4: Cosine Similarity Calculation

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('cosineSimilarity: vectors must have the same length');
  }

  let dotProduct = 0;
  let magnitudeA = 0; // squared magnitude until the final sqrt
  let magnitudeB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }

  // A zero vector has no direction; treat it as "no similarity"
  if (magnitudeA === 0 || magnitudeB === 0) return 0;

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
```

**Properties:**
- Returns a value between -1 and 1; our features are all non-negative (0-1), so scores always fall between 0 and 1
- 1.0 = vectors pointing in the same direction (identical vectors are a perfect match; magnitude is ignored)
- 0.0 = orthogonal vectors (no similarity)
- Higher = better (intuitive, no inversion needed)
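A short sketch can confirm the properties listed above on toy vectors. The helper restates the same formula under the hypothetical name `cosine` so the block runs on its own, and the sample vectors are made up for illustration.

```typescript
// Standalone restatement of the cosine formula (hypothetical name `cosine`).
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// Identical profile: about 1.
const sameVector = cosine([0.9, 0.1, 0.5], [0.9, 0.1, 0.5]);
// Same direction, different magnitude: also about 1.
const scaledVector = cosine([0.2, 0.2, 0.2], [0.8, 0.8, 0.8]);
// No shared non-zero feature: exactly 0.
const orthogonal = cosine([1, 0, 0], [0, 1, 0]);
// All-neutral vector against a single strong feature: 1 / sqrt(3), about 0.577.
const neutralVsPeak = cosine([0.5, 0.5, 0.5], [1, 0, 0]);
```

The second case is worth keeping in mind: a track that is uniformly low on every feature and one that is uniformly high score as a perfect match, because only the direction of the vector matters.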

### Phase 5: Tag/Genre Bonus

Add a bonus of up to 0.1 for matching Last.fm tags and Essentia genres:

```typescript
// Collect lowercased tag and genre names from a track into one set
function collectTags(track: TrackFeatures): Set<string> {
  const tags = new Set<string>();

  // Parse lastfmTags (array of { name, count } objects)
  if (Array.isArray(track.lastfmTags)) {
    track.lastfmTags.forEach((t: any) => {
      if (typeof t?.name === 'string' && t.name) tags.add(t.name.toLowerCase());
    });
  }

  // Parse essentiaGenres (array of genre strings)
  if (Array.isArray(track.essentiaGenres)) {
    track.essentiaGenres.forEach((g: any) => {
      if (typeof g === 'string') tags.add(g.toLowerCase());
    });
  }

  return tags;
}

function calculateTagBonus(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  const sourceTags = collectTags(sourceTrack);
  const candidateTags = collectTags(candidateTrack);

  // Count overlapping tags
  let overlap = 0;
  for (const tag of sourceTags) {
    if (candidateTags.has(tag)) overlap++;
  }

  // Bonus: up to 0.1 (10%) for tag overlap
  // Normalized by the smaller set size to handle varying tag counts
  const minSize = Math.min(sourceTags.size, candidateTags.size);
  if (minSize === 0) return 0;

  return (overlap / minSize) * 0.1;
}
```
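To see what normalizing by the smaller set does in practice, here is a minimal standalone sketch of the same overlap calculation. It takes plain string arrays instead of `TrackFeatures`; the name `overlapBonus`, the `maxBonus` parameter, and the sample tags are assumptions made for illustration.

```typescript
// Hypothetical standalone version of the overlap calculation, taking plain string arrays.
function overlapBonus(sourceTags: string[], candidateTags: string[], maxBonus = 0.1): number {
  const source = new Set(sourceTags.map((t) => t.toLowerCase()));
  const candidate = new Set(candidateTags.map((t) => t.toLowerCase()));

  // Count tags present in both sets.
  let overlap = 0;
  source.forEach((tag) => {
    if (candidate.has(tag)) overlap++;
  });

  // Divide by the smaller set so a short tag list is not penalized.
  const minSize = Math.min(source.size, candidate.size);
  return minSize === 0 ? 0 : (overlap / minSize) * maxBonus;
}

// 1 shared tag, smaller set has 2 tags: half of the maximum bonus.
const partialBonus = overlapBonus(['Rock', 'indie', '90s'], ['rock', 'pop']);
// 1 shared tag, smaller set has 1 tag: the full bonus.
const subsetBonus = overlapBonus(['rock', 'indie', '90s', 'britpop'], ['rock']);
// A track without tags never earns a bonus.
const untaggedBonus = overlapBonus(['rock'], []);
```

The `subsetBonus` case shows a consequence of dividing by the smaller set: a candidate with a single matching tag earns the full 0.1, even though the source has three other tags it does not share.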

### Phase 6: Final Score Calculation

```typescript
function calculateVibeScore(
  sourceTrack: TrackFeatures,
  candidateTrack: TrackFeatures
): number {
  // Build feature vectors
  const sourceVector = buildFeatureVector(sourceTrack);
  const candidateVector = buildFeatureVector(candidateTrack);

  // Calculate cosine similarity (0-1)
  const cosineSim = cosineSimilarity(sourceVector, candidateVector);

  // Add tag bonus (0-0.1)
  const tagBonus = calculateTagBonus(sourceTrack, candidateTrack);

  // Final score: cosine similarity + tag bonus
  // Capped at 1.0
  const finalScore = Math.min(1.0, cosineSim + tagBonus);

  return finalScore;
}
```
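Because the bonus is added before the cap, strong matches can saturate at 1.0. The sketch below isolates that last step under the hypothetical name `combineScore`; the input values are made up and only illustrate the arithmetic.

```typescript
// Hypothetical helper mirroring the final step: add the bonus, then cap at 1.0.
function combineScore(cosineSim: number, tagBonus: number): number {
  return Math.min(1.0, cosineSim + tagBonus);
}

const typicalScore = combineScore(0.8, 0.05); // about 0.85, below the cap
const saturatedA = combineScore(0.97, 0.1); // 1.07 before the cap, so 1
const saturatedB = combineScore(0.93, 0.1); // 1.03 before the cap, so 1
```

Both saturated candidates end up tied at 1.0 even though their cosine similarities differ, so for candidates with a full tag bonus and a cosine similarity of about 0.9 or higher, the cap hides the difference between them.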

### Phase 7: Integration into Radio Endpoint

Update `backend/src/routes/library.ts`:

```typescript
// In the vibe radio section

// Feature fields needed for scoring, shared by both queries
const vibeSelect = {
  moodHappy: true,
  moodSad: true,
  moodRelaxed: true,
  moodAggressive: true,
  moodParty: true,
  moodAcoustic: true,
  moodElectronic: true,
  energy: true,
  arousal: true,
  danceability: true,
  danceabilityMl: true,
  instrumentalness: true,
  bpm: true,
  keyScale: true,
  lastfmTags: true,
  essentiaGenres: true,
} as const;

const sourceTrack = await prisma.track.findUnique({
  where: { id: trackId },
  select: vibeSelect,
});

// findUnique returns null for an unknown id; bail out before scoring
// (or return a 404 from the route instead of throwing)
if (!sourceTrack) {
  throw new Error(`Track ${trackId} not found`);
}

// Get candidates
const candidates = await prisma.track.findMany({
  where: {
    id: { not: trackId },
    analysisStatus: 'enhanced', // Only use analyzed tracks
  },
  // Same feature fields (plus whatever the response needs, e.g. the track id)
  select: vibeSelect,
  // Get more candidates for better matching; with no orderBy, which 500 rows
  // come back is up to the database
  take: 500,
});

// Score all candidates
const scored = candidates.map(candidate => ({
  ...candidate,
  vibeScore: calculateVibeScore(sourceTrack, candidate),
}));

// Sort by score (highest first)
scored.sort((a, b) => b.vibeScore - a.vibeScore);

// Take top N for the queue
const vibeQueue = scored.slice(0, limit);

// DO NOT SHUFFLE - preserve the sorted order!
```
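The score, sort, and slice steps can be isolated into a small generic helper, which makes the ordering easy to check without a database. This is only a sketch: `rankTopN` and the demo data are hypothetical and not part of `library.ts`, and the precomputed `similarity` field stands in for a call to `calculateVibeScore`.

```typescript
// Hypothetical generic helper: score every item, sort best-first, keep the top `limit`.
function rankTopN<T extends object>(
  items: T[],
  score: (item: T) => number,
  limit: number
): Array<T & { vibeScore: number }> {
  return items
    .map((item) => ({ ...item, vibeScore: score(item) }))
    .sort((a, b) => b.vibeScore - a.vibeScore) // highest score first
    .slice(0, limit); // no shuffle: the sorted order is the queue order
}

// Made-up candidates with a precomputed similarity standing in for the real score.
const demoCandidates = [
  { id: 'a', similarity: 0.72 },
  { id: 'b', similarity: 0.95 },
  { id: 'c', similarity: 0.88 },
];

const demoQueue = rankTopN(demoCandidates, (c) => c.similarity, 2);
const demoOrder = demoQueue.map((c) => c.id); // ['b', 'c']
```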

---

## Implementation Checklist

- [x] **Phase 1:** Add `moodParty`, `moodAcoustic`, `moodElectronic` to Prisma schema
- [x] **Phase 2:** Update audio analyzer to extract all 7 moods
- [x] **Phase 3:** Implement `buildFeatureVector()` function
- [x] **Phase 4:** Implement `cosineSimilarity()` function
- [x] **Phase 5:** Implement `calculateTagBonus()` function (named `computeTagBonus` in the implementation)
- [x] **Phase 6:** Implement `calculateVibeScore()` combining all components
- [x] **Phase 7:** Integrate into `/library/radio` endpoint
- [ ] **Phase 8:** Update frontend to display match percentage (optional enhancement)
- [ ] **Phase 9:** Re-analyze tracks to populate new mood fields

---

## Re-Analysis Script

To populate the new mood fields for existing tracks:

```sql
-- Reset analysis status for enhanced tracks to re-run analysis
UPDATE "Track"
SET "analysisStatus" = 'pending'
WHERE "analysisStatus" = 'enhanced';
```

Until a track has been re-analyzed it no longer has status `enhanced`, so the Phase 7 candidate query will skip it.

Or use the existing script (the `-f` path is resolved inside the `lidify_db` container):
```bash
docker exec lidify_db psql -U lidifydb -d lidify -f /path/to/reset-analysis-for-new-moods.sql
```

---

## Expected Improvements

1. **Better Similarity Matching:** Cosine similarity compares the overall shape of the 13-dimensional feature vector instead of summing six hand-weighted differences
2. **Full ML Utilization:** All 7 mood predictions, plus `danceabilityMl`, now contribute to matching (`aggressivenessMl` is not part of the Phase 3 vector)
3. **Genre Awareness:** Tag/genre overlap adds up to 0.1 to the score
4. **Intuitive Scores:** Higher score = better match (no inversion)
5. **Normalized Features:** All vector features are scaled to 0-1 for fair comparison

---

## Testing Strategy

1. Pick a track with known characteristics (e.g., a happy, upbeat pop song)
2. Generate a vibe queue
3. Verify that the top matches share similar mood profiles
4. Check that match percentages in the UI (Phase 8) reflect actual similarity
5. Test with various genres to ensure cross-genre matching works appropriately
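Part of this checklist can be automated. The sketch below is a hypothetical sanity check (`validateVibeQueue` is not an existing function) that a returned queue is sorted best-first and that every score stays within 0-1. It assumes each queue entry carries a numeric `vibeScore`, as in the Phase 7 snippet, and returns a list of problems that is empty when the queue looks valid.

```typescript
// Hypothetical sanity check for a vibe queue; an empty result means no problems were found.
function validateVibeQueue(queue: { vibeScore: number }[]): string[] {
  const problems: string[] = [];
  queue.forEach((entry, i) => {
    // Scores are expected to stay within 0-1 (this also catches NaN).
    if (!(entry.vibeScore >= 0 && entry.vibeScore <= 1)) {
      problems.push(`entry ${i}: score ${entry.vibeScore} is outside 0-1`);
    }
    // Each entry must not outrank the one before it.
    const previous = queue[i - 1];
    if (previous !== undefined && entry.vibeScore > previous.vibeScore) {
      problems.push(`entry ${i}: score is higher than the entry before it`);
    }
  });
  return problems;
}

// Made-up queues: one correctly sorted (ties allowed), one out of order.
const sortedQueueProblems = validateVibeQueue([
  { vibeScore: 0.98 },
  { vibeScore: 0.91 },
  { vibeScore: 0.91 },
]);
const shuffledQueueProblems = validateVibeQueue([{ vibeScore: 0.91 }, { vibeScore: 0.98 }]);
```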

---

## Files Modified

- `backend/prisma/schema.prisma` - New mood fields
- `backend/src/routes/library.ts` - New scoring algorithm
- `services/audio-analyzer/analyzer.py` - Extract all 7 moods
- `frontend/components/player/VibeOverlay.tsx` - Display all moods
- `frontend/lib/audio-state-context.tsx` - Extended AudioFeatures interface

---

## Notes

- **Gaia:** Essentia has a companion library called Gaia for large-scale similarity search using KD-trees. This is overkill for our scale (< 100k tracks) but could be considered if the library grows.
- **MusiCNN Limitations:** The models were trained on MSD (Million Song Dataset), which is pop/rock heavy. For classical/ambient music, predictions may be less reliable. We've added normalization to handle this.
- **Shuffle Interaction:** Vibe mode automatically disables shuffle to preserve the sorted order.