# Lidify Vibe System Documentation
This document provides comprehensive documentation of the Vibe System - how Lidify analyzes tracks, collects audio metrics, and compares them for vibe matching. Use this as a reference for building frontend interfaces.
---
## Table of Contents
1. [Overview](#overview)
2. [Metrics Collected](#metrics-collected)
3. [Data Structures](#data-structures)
4. [Vibe Matching Algorithm](#vibe-matching-algorithm)
5. [API Endpoints](#api-endpoints)
6. [Frontend Integration Guide](#frontend-integration-guide)
7. [Existing Components Reference](#existing-components-reference)
---
## Overview
The Vibe System uses a combination of **audio signal analysis** and **ML-based mood prediction** to understand the "feel" of a track. It operates in two modes:
| Mode | Description | Accuracy |
|------|-------------|----------|
| **Standard** | Heuristic-based analysis using audio signal features (BPM, key, energy) | Good |
| **Enhanced** | ML-based analysis using MusiCNN neural network for mood prediction | Best |
The system enables:
- Finding tracks with similar vibes to a source track
- Generating mood-based playlists
- Visualizing track characteristics in real-time
---
## Metrics Collected
### Core Audio Features (Always Available)
These are extracted directly from audio signal analysis at 44.1 kHz:
| Metric | Type | Range | Description |
|--------|------|-------|-------------|
| `bpm` | Float | 60-200 | Tempo in beats per minute |
| `beatsCount` | Int | 0+ | Total number of beats detected |
| `key` | String | "C", "F#", etc. | Musical key |
| `keyScale` | String | "major" \| "minor" | Major or minor tonality |
| `keyStrength` | Float | 0-1 | Confidence of key detection |
| `energy` | Float | 0-1 | RMS-based intensity level |
| `loudness` | Float | dB | Average loudness |
| `dynamicRange` | Float | dB | Difference between quietest and loudest |
| `danceability` | Float | 0-1 | Rhythm regularity and groove potential |
### ML Mood Predictions (Enhanced Mode)
Seven core mood dimensions predicted by the MusiCNN model:
| Metric | Type | Range | Description | Icon Suggestion |
|--------|------|-------|-------------|-----------------|
| `moodHappy` | Float | 0-1 | Happiness/cheerfulness probability | Smile |
| `moodSad` | Float | 0-1 | Sadness/melancholy probability | Frown |
| `moodRelaxed` | Float | 0-1 | Calm/peaceful probability | Coffee |
| `moodAggressive` | Float | 0-1 | Intensity/aggression probability | Flame |
| `moodParty` | Float | 0-1 | Upbeat/party probability | PartyPopper |
| `moodAcoustic` | Float | 0-1 | Acoustic instrumentation probability | Guitar |
| `moodElectronic` | Float | 0-1 | Electronic/synthetic probability | Radio |
### Derived Features (Computed)
These are calculated from the ML predictions:
#### Valence (Emotional Positivity)
```typescript
// Formula:
valence = (
  moodHappy * 0.5 +      // Happy mood (50% weight)
  moodParty * 0.3 +      // Party mood (30% weight)
  (1 - moodSad) * 0.2    // Inverse of sadness (20% weight)
);
```
| Value | Interpretation |
|-------|----------------|
| 0.0 - 0.3 | Melancholic, sad |
| 0.3 - 0.6 | Neutral, balanced |
| 0.6 - 1.0 | Happy, positive |
#### Arousal (Energy/Excitement Level)
```typescript
// Formula:
arousal = (
  moodAggressive * 0.35 +    // Aggressive mood (35% weight)
  moodParty * 0.25 +         // Party mood (25% weight)
  moodElectronic * 0.2 +     // Electronic sound (20% weight)
  (1 - moodRelaxed) * 0.1 +  // Inverse of relaxation (10% weight)
  (1 - moodAcoustic) * 0.1   // Inverse of acoustic (10% weight)
);
```
| Value | Interpretation |
|-------|----------------|
| 0.0 - 0.3 | Calm, peaceful |
| 0.3 - 0.6 | Moderate energy |
| 0.6 - 1.0 | High energy, intense |
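As a reference implementation, here is a minimal sketch of both formulas as standalone helpers. The names `MoodPredictions`, `computeValence`, and `computeArousal` are illustrative; the backend's actual helpers are not shown in this document:

```typescript
// Hypothetical helpers implementing the formulas above; names are
// illustrative and may differ from the backend's actual functions.
interface MoodPredictions {
  moodHappy: number;
  moodSad: number;
  moodRelaxed: number;
  moodAggressive: number;
  moodParty: number;
  moodAcoustic: number;
  moodElectronic: number;
}

// Valence: weighted blend of happy, party, and inverted sadness
const computeValence = (m: MoodPredictions): number =>
  m.moodHappy * 0.5 + m.moodParty * 0.3 + (1 - m.moodSad) * 0.2;

// Arousal: weighted blend of aggressive, party, and electronic
// predictions, plus inverted relaxed/acoustic contributions
const computeArousal = (m: MoodPredictions): number =>
  m.moodAggressive * 0.35 +
  m.moodParty * 0.25 +
  m.moodElectronic * 0.2 +
  (1 - m.moodRelaxed) * 0.1 +
  (1 - m.moodAcoustic) * 0.1;
```

Since every mood prediction lies in [0, 1] and the weights sum to 1, both results stay in [0, 1].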
### Additional Features
| Metric | Type | Range | Description |
|--------|------|-------|-------------|
| `instrumentalness` | Float | 0-1 | Likelihood the track is instrumental (0 = vocal, 1 = instrumental) |
| `acousticness` | Float | 0-1 | Acoustic vs. processed sound |
| `speechiness` | Float | 0-1 | Spoken word detection |
| `danceabilityMl` | Float | 0-1 | ML-based danceability (more accurate) |
### Metadata & Tags
| Field | Type | Description |
|-------|------|-------------|
| `moodTags` | String[] | Derived mood labels (e.g., ["chill", "happy"]) |
| `essentiaGenres` | String[] | ML-predicted genres (e.g., ["rock", "electronic"]) |
| `lastfmTags` | String[] | User-generated tags from Last.fm |
| `analysisStatus` | String | "pending" \| "processing" \| "completed" \| "failed" |
| `analysisMode` | String | "standard" \| "enhanced" |
| `analyzedAt` | DateTime | When analysis was performed |
---
## Data Structures
### TypeScript Interface
```typescript
interface AudioFeatures {
  // Core audio features
  bpm?: number | null;
  beatsCount?: number | null;
  key?: string | null;
  keyScale?: string | null;
  keyStrength?: number | null;
  energy?: number | null;
  loudness?: number | null;
  dynamicRange?: number | null;
  danceability?: number | null;

  // Derived features
  valence?: number | null;
  arousal?: number | null;

  // Additional features
  instrumentalness?: number | null;
  acousticness?: number | null;
  speechiness?: number | null;
  danceabilityMl?: number | null;

  // ML mood predictions (Enhanced mode)
  moodHappy?: number | null;
  moodSad?: number | null;
  moodRelaxed?: number | null;
  moodAggressive?: number | null;
  moodParty?: number | null;
  moodAcoustic?: number | null;
  moodElectronic?: number | null;

  // Metadata
  analysisStatus?: string | null;
  analysisMode?: string | null;
  analyzedAt?: string | null;

  // Tags
  moodTags?: string[];
  essentiaGenres?: string[];
  lastfmTags?: string[];
}
```
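When deciding whether to render ML-specific UI, a small guard over this interface can help. This is an illustrative sketch, not an existing helper:

```typescript
// Illustrative helper: true when enhanced (ML) analysis has completed,
// so the ML mood fields can be displayed with confidence.
function hasEnhancedAnalysis(features: AudioFeatures): boolean {
  return (
    features.analysisStatus === "completed" &&
    features.analysisMode === "enhanced"
  );
}
```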
### Feature Display Configuration
Recommended configuration for displaying features in UI:
```typescript
const FEATURE_CONFIG = [
  {
    key: "energy",
    label: "Energy",
    icon: "Zap", // lucide-react icon
    min: 0,
    max: 1,
    lowLabel: "Calm",
    highLabel: "Intense",
  },
  {
    key: "valence",
    label: "Mood",
    icon: "Heart",
    min: 0,
    max: 1,
    lowLabel: "Melancholic",
    highLabel: "Happy",
  },
  {
    key: "danceability",
    label: "Groove",
    icon: "Footprints",
    min: 0,
    max: 1,
    lowLabel: "Freeform",
    highLabel: "Danceable",
  },
  {
    key: "bpm",
    label: "Tempo",
    icon: "Gauge",
    min: 60,
    max: 180,
    lowLabel: "Slow",
    highLabel: "Fast",
    unit: "BPM",
  },
  {
    key: "arousal",
    label: "Arousal",
    icon: "AudioWaveform",
    min: 0,
    max: 1,
    lowLabel: "Peaceful",
    highLabel: "Energetic",
  },
];

const ML_MOOD_CONFIG = [
  { key: "moodHappy", label: "Happy", icon: "Smile", color: "yellow-400" },
  { key: "moodSad", label: "Sad", icon: "Frown", color: "blue-400" },
  { key: "moodRelaxed", label: "Relaxed", icon: "Coffee", color: "green-400" },
  { key: "moodAggressive", label: "Aggressive", icon: "Flame", color: "red-400" },
  { key: "moodParty", label: "Party", icon: "PartyPopper", color: "pink-400" },
  { key: "moodAcoustic", label: "Acoustic", icon: "Guitar", color: "amber-400" },
  { key: "moodElectronic", label: "Electronic", icon: "Radio", color: "purple-400" },
];
```
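As a usage sketch, these configs can drive a generic display renderer. The helper below is hypothetical and assumes the `normalizeValue` function defined in the Frontend Integration Guide further down:

```typescript
// Sketch: turn a track's features into display rows using FEATURE_CONFIG.
// Assumes the normalizeValue helper from the Frontend Integration Guide.
const toDisplayRows = (track: AudioFeatures) =>
  FEATURE_CONFIG.map((cfg) => {
    const raw = track[cfg.key as keyof AudioFeatures] as
      | number
      | null
      | undefined;
    const unit = "unit" in cfg ? ` ${cfg.unit}` : "";
    return {
      label: cfg.label,
      percent: Math.round(normalizeValue(raw, cfg.min, cfg.max) * 100),
      text: raw != null ? `${raw.toFixed(2)}${unit}` : "N/A",
    };
  });
```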
---
## Vibe Matching Algorithm
### Feature Vector Construction
The system builds a **13-dimensional feature vector** for each track:
```typescript
const buildFeatureVector = (track: AudioFeatures) => [
  // ML mood predictions (7 features) - 1.3x weight for semantic importance
  getMoodValue(track.moodHappy, 0.5) * 1.3,
  getMoodValue(track.moodSad, 0.5) * 1.3,
  getMoodValue(track.moodRelaxed, 0.5) * 1.3,
  getMoodValue(track.moodAggressive, 0.5) * 1.3,
  getMoodValue(track.moodParty, 0.5) * 1.3,
  getMoodValue(track.moodAcoustic, 0.5) * 1.3,
  getMoodValue(track.moodElectronic, 0.5) * 1.3,
  // Audio features (4 features)
  track.energy ?? 0.5,
  calculateEnhancedArousal(track),
  track.danceabilityMl ?? track.danceability ?? 0.5,
  track.instrumentalness ?? 0.5,
  // BPM (1 feature, octave-aware normalization)
  1 - octaveAwareBPMDistance(track.bpm ?? 120, 120),
  // Valence (1 feature)
  calculateEnhancedValence(track),
];

// Helper: get a mood value with a neutral fallback
const getMoodValue = (value: number | null | undefined, fallback: number) =>
  value ?? fallback;
```
### Cosine Similarity Calculation
Tracks are compared using cosine similarity:
```typescript
const cosineSimilarity = (vectorA: number[], vectorB: number[]): number => {
  let dotProduct = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < vectorA.length; i++) {
    dotProduct += vectorA[i] * vectorB[i];
    magA += vectorA[i] * vectorA[i];
    magB += vectorB[i] * vectorB[i];
  }
  // Guard against zero-magnitude vectors to avoid division by zero
  if (magA === 0 || magB === 0) return 0;
  return dotProduct / (Math.sqrt(magA) * Math.sqrt(magB));
};
```
### Tag/Genre Bonus
Additional boost for shared tags:
```typescript
const computeTagBonus = (
  sourceTags: string[],
  sourceGenres: string[],
  trackTags: string[],
  trackGenres: string[]
): number => {
  const sourceSet = new Set(
    [...sourceTags, ...sourceGenres].map(t => t.toLowerCase())
  );
  const trackSet = new Set(
    [...trackTags, ...trackGenres].map(t => t.toLowerCase())
  );
  const overlap = [...sourceSet].filter(tag => trackSet.has(tag)).length;
  return Math.min(0.05, overlap * 0.01); // Max 5% bonus
};
```
### Final Score
```typescript
const finalScore = cosineSimilarity(sourceVector, targetVector) * 0.95 + tagBonus;
```
### Matching Thresholds
| Mode | Minimum Similarity |
|------|-------------------|
| Enhanced | 40% |
| Standard | 50% |
The Enhanced mode threshold is lower because ML predictions provide more nuanced differentiation between tracks.
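A sketch tying the pieces together under the definitions above; the `Candidate` shape, the `rankMatches` name, and the wiring are illustrative, not the backend's actual code:

```typescript
// Sketch of the end-to-end vibe-match scoring loop.
// The Candidate shape and function name are illustrative.
interface Candidate {
  features: AudioFeatures;
  tags: string[];
  genres: string[];
}

function rankMatches(source: Candidate, candidates: Candidate[]) {
  const sourceVector = buildFeatureVector(source.features);
  // Minimum similarity: 40% for enhanced analysis, 50% for standard
  const threshold =
    source.features.analysisMode === "enhanced" ? 0.4 : 0.5;

  return candidates
    .map((c) => {
      const similarity = cosineSimilarity(
        sourceVector,
        buildFeatureVector(c.features)
      );
      const bonus = computeTagBonus(
        source.tags,
        source.genres,
        c.tags,
        c.genres
      );
      // Final score: 95% cosine similarity + up to 5% tag bonus
      return { candidate: c, score: similarity * 0.95 + bonus };
    })
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score);
}
```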
### Octave-Aware BPM Matching
Treats harmonically related tempos as similar (60 BPM ≈ 120 BPM ≈ 240 BPM):
```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
  // Fold any tempo into the ~77-154 BPM octave band
  const normalizeToOctave = (bpm: number): number => {
    let b = bpm;
    while (b < 77) b *= 2;
    while (b > 154) b /= 2;
    return b;
  };
  const norm1 = normalizeToOctave(bpm1);
  const norm2 = normalizeToOctave(bpm2);
  const logDistance = Math.abs(Math.log2(norm1) - Math.log2(norm2));
  return Math.min(logDistance, 1);
};
```
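For example, `octaveAwareBPMDistance(60, 120)` returns 0: 60 BPM is doubled into the 77-154 band, landing exactly on 120, so half-time and double-time tracks match perfectly. By contrast, `octaveAwareBPMDistance(100, 120)` returns about 0.26 (log2(120/100) ≈ 0.263).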
---
## API Endpoints
### Get Track Audio Features
```
GET /api/tracks/:id/features
```
Response:
```json
{
  "bpm": 128.5,
  "energy": 0.78,
  "valence": 0.65,
  "arousal": 0.72,
  "danceability": 0.85,
  "key": "C",
  "keyScale": "major",
  "moodHappy": 0.72,
  "moodSad": 0.15,
  "moodRelaxed": 0.28,
  "moodAggressive": 0.45,
  "moodParty": 0.68,
  "moodAcoustic": 0.12,
  "moodElectronic": 0.78,
  "analysisMode": "enhanced",
  "analysisStatus": "completed"
}
```
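A minimal typed wrapper for this endpoint might look like the following sketch; the relative base URL and error handling are assumptions:

```typescript
// Sketch: fetch a track's audio features.
// Base URL and error handling are assumptions, not documented behavior.
async function fetchTrackFeatures(trackId: string): Promise<AudioFeatures> {
  const res = await fetch(`/api/tracks/${trackId}/features`);
  if (!res.ok) throw new Error(`Feature fetch failed: ${res.status}`);
  return (await res.json()) as AudioFeatures;
}
```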
### Find Similar Tracks (Vibe Match)
```
GET /api/library/vibe-match?trackId=:id&limit=20
```
Response:
```json
{
  "source": { /* track with features */ },
  "matches": [
    {
      "track": { /* track data */ },
      "similarity": 0.87,
      "features": { /* audio features */ }
    }
  ]
}
```
### Generate Mood Mix
```
POST /api/mixes/mood
```
Request:
```json
{
  "valence": { "min": 0.6, "max": 1.0 },
  "energy": { "min": 0.5, "max": 0.8 },
  "danceability": { "min": 0.7, "max": 1.0 },
  "bpm": { "min": 100, "max": 140 },
  "limit": 15
}
```
### Get Mood Presets
```
GET /api/mixes/mood-presets
```
Response:
```json
[
  {
    "id": "chill",
    "name": "Chill Vibes",
    "color": "from-blue-600 to-purple-600",
    "params": {
      "valence": { "min": 0.3, "max": 0.7 },
      "energy": { "min": 0.1, "max": 0.4 }
    }
  }
]
```
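The two endpoints compose naturally: fetch the presets, then POST a preset's `params` to generate a mix. In this sketch the `MoodPreset` type mirrors the response above, and the POST response is left untyped because its shape is not documented here:

```typescript
// Sketch: generate a mix from the first mood preset.
// MoodPreset mirrors the documented preset response; the POST response
// shape is not documented above, so it stays untyped.
interface MoodPreset {
  id: string;
  name: string;
  color: string;
  params: Record<string, { min: number; max: number }>;
}

async function generateMixFromPreset(): Promise<unknown> {
  const presets: MoodPreset[] = await (
    await fetch("/api/mixes/mood-presets")
  ).json();
  const res = await fetch("/api/mixes/mood", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...presets[0].params, limit: 15 }),
  });
  return res.json();
}
```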
---
## Frontend Integration Guide
### Displaying Feature Values
Normalize values for consistent display:
```typescript
function normalizeValue(
  value: number | null | undefined,
  min: number,
  max: number
): number {
  if (value === null || value === undefined) return 0;
  return Math.max(0, Math.min(1, (value - min) / (max - min)));
}

// Usage
const normalizedBpm = normalizeValue(track.bpm, 60, 180);
const normalizedEnergy = normalizeValue(track.energy, 0, 1);
```
### Calculating Match Scores
```typescript
function calculateFeatureMatch(
  sourceVal: number | null,
  currentVal: number | null,
  min: number,
  max: number
): { diff: number; match: number } {
  const sourceNorm = normalizeValue(sourceVal, min, max);
  const currentNorm = normalizeValue(currentVal, min, max);
  const diff = Math.abs(sourceNorm - currentNorm);
  const match = Math.round((1 - diff) * 100);
  return { diff, match };
}
```
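To roll per-feature matches into one headline number, a simple display heuristic is to average them across `FEATURE_CONFIG`. This is an illustrative sketch, not the backend's cosine-based score:

```typescript
// Sketch: average per-feature match percentages across FEATURE_CONFIG.
// A display heuristic only; the backend uses cosine similarity.
function overallMatch(source: AudioFeatures, current: AudioFeatures): number {
  const matches = FEATURE_CONFIG.map(({ key, min, max }) =>
    calculateFeatureMatch(
      source[key as keyof AudioFeatures] as number | null,
      current[key as keyof AudioFeatures] as number | null,
      min,
      max
    ).match
  );
  return Math.round(matches.reduce((a, b) => a + b, 0) / matches.length);
}
```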
### Match Score Color Coding
```typescript
function getMatchColor(matchPercent: number): string {
  if (matchPercent >= 80) return "text-green-400"; // Excellent
  if (matchPercent >= 60) return "text-yellow-400"; // Good
  return "text-red-400"; // Different
}

function getMatchDescription(matchPercent: number): string {
  if (matchPercent >= 80) return "Excellent match - very similar vibe";
  if (matchPercent >= 60) return "Good match - similar energy";
  return "Different vibe - exploring variety";
}
```
### Visualization Recommendations
#### 1. Radar Chart (Spider Graph)
Best for comparing multiple features at once. Shows source track (dashed line) vs current track (solid fill).
#### 2. Progress Bars
Best for individual feature comparison with source marker overlay.
#### 3. Mood Grid
4x2 or 4x4 grid of ML mood indicators with percentage matches.
#### 4. Valence-Arousal Quadrant
2D scatter plot with:
- X-axis: Valence (sad → happy)
- Y-axis: Arousal (calm → energetic)
Quadrants (see the classifier sketch after this list):
- Top-right: Happy + Energetic (Party)
- Top-left: Sad + Energetic (Angry/Tense)
- Bottom-right: Happy + Calm (Peaceful)
- Bottom-left: Sad + Calm (Melancholic)
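A small classifier for placing a track into one of these quadrants, splitting at the neutral value of 0.5 (the function name and `Quadrant` type are illustrative):

```typescript
// Sketch: map (valence, arousal) to a quadrant label, splitting at 0.5.
type Quadrant = "Party" | "Angry/Tense" | "Peaceful" | "Melancholic";

function classifyQuadrant(valence: number, arousal: number): Quadrant {
  if (arousal >= 0.5) return valence >= 0.5 ? "Party" : "Angry/Tense";
  return valence >= 0.5 ? "Peaceful" : "Melancholic";
}
```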
---
## Existing Components Reference
### VibeOverlay
Location: `frontend/components/player/VibeOverlay.tsx`
Full-featured overlay showing:
- Overall match percentage
- Feature-by-feature comparison bars
- ML mood grid (enhanced mode)
- Source vs current legend
### VibeGraph
Location: `frontend/components/player/VibeGraph.tsx`
Compact radar chart for:
- 4-feature comparison (Energy, Mood, Dance, BPM)
- Match score badge
- Inline display in player
### MoodMixer
Location: `frontend/components/MoodMixer.tsx`
Modal for:
- Quick mood presets
- Custom range sliders
- Generating mood-based playlists
---
## Special Considerations
### Out-of-Distribution (OOD) Detection
The MusiCNN model was trained primarily on pop/rock music, so predictions for other genres (classical, ambient, jazz) may be unreliable. The backend detects and normalizes these cases using the criteria below:
**Detection criteria:**
- All mood values > 0.7 with low variance
- All mood values clustered around 0.5
**UI Recommendation:** Show a subtle indicator when `analysisMode` is "standard" or when predictions seem unreliable.
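A client-side approximation of these criteria might look like the sketch below; the exact variance and clustering thresholds are assumptions, since the backend's check is not shown here:

```typescript
// Sketch: flag possibly out-of-distribution mood predictions.
// Thresholds are assumptions; the backend's actual check is not shown here.
function looksOutOfDistribution(moods: number[]): boolean {
  const mean = moods.reduce((a, b) => a + b, 0) / moods.length;
  const variance =
    moods.reduce((a, b) => a + (b - mean) ** 2, 0) / moods.length;
  const allHigh = moods.every((m) => m > 0.7);     // uniformly high
  const allNeutral = moods.every((m) => Math.abs(m - 0.5) < 0.1); // clustered at 0.5
  return (allHigh && variance < 0.01) || allNeutral;
}
```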
### Handling Missing Data
Always provide fallback values:
```typescript
const safeFeatures = {
  energy: track.energy ?? 0.5,
  valence: track.valence ?? 0.5,
  bpm: track.bpm ?? 120,
  // ... etc
};
```
### Analysis Status States
| Status | UI Treatment |
|--------|--------------|
| `pending` | Show "Analyzing..." with spinner |
| `processing` | Show progress indicator |
| `completed` | Show full vibe data |
| `failed` | Show fallback/retry option |
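As a sketch, this mapping can live in a simple lookup table (the labels are illustrative):

```typescript
// Sketch: map analysisStatus to a UI treatment; labels are illustrative.
const STATUS_UI: Record<string, { label: string; showRetry: boolean }> = {
  pending: { label: "Analyzing...", showRetry: false },
  processing: { label: "Processing...", showRetry: false },
  completed: { label: "", showRetry: false },
  failed: { label: "Analysis failed", showRetry: true },
};
```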
---
## Quick Reference: Value Ranges
| Metric | Min | Max | Neutral |
|--------|-----|-----|---------|
| All mood* | 0 | 1 | 0.5 |
| energy | 0 | 1 | 0.5 |
| valence | 0 | 1 | 0.5 |
| arousal | 0 | 1 | 0.5 |
| danceability | 0 | 1 | 0.5 |
| bpm | 60 | 200 | 120 |
| keyStrength | 0 | 1 | - |
---
## File Locations
| Component | Path |
|-----------|------|
| Audio Analyzer (Python) | `services/audio-analyzer/analyzer.py` |
| Vibe Matching Logic | `backend/src/routes/library.ts` |
| Database Schema | `backend/prisma/schema.prisma` |
| Frontend Vibe Overlay | `frontend/components/player/VibeOverlay.tsx` |
| Frontend Vibe Graph | `frontend/components/player/VibeGraph.tsx` |
| Mood Mixer | `frontend/components/MoodMixer.tsx` |
| Audio State Context | `frontend/lib/audio-state-context.tsx` |
---
## Research Background
The Vibe System's valence and arousal calculations are informed by music psychology research:
### Valence (Emotional Positivity)
**Key Finding:** Mode/tonality is the strongest predictor of perceived valence in music.
- **Lee et al. (ICASSP 2020)** - Demonstrated that musical mode (major vs. minor) has the highest correlation with listener-reported valence
- Major keys contribute positively (+0.3 in our formula), minor keys negatively (-0.2)
- This aligns with centuries of music theory and empirical psychology research
### Arousal (Energy/Excitement)
**Key Finding:** The "electronic" mood prediction from ML models is unreliable for arousal calculation.
- **Grekow (2018)** - Found that direct energy and tempo features outperform genre-based predictions for arousal
- Our implementation replaces the "electronic" mood with explicit energy and BPM contributions
- This provides more consistent arousal predictions across diverse genres
### Feature Weights
The specific weights in our formulas (e.g., 0.5 for happy mood in valence, 0.35 for aggressive mood in arousal) were tuned through:
1. Initial values from published research
2. Empirical testing on a diverse music library
3. User feedback on vibe matching accuracy
### References
- Lee, J., et al. (2020). "Music Emotion Recognition Using Valence-Arousal Regression." ICASSP 2020.
- Grekow, J. (2018). "Music Emotion Maps in Arousal-Valence Space." IFIP International Conference on Computer Information Systems and Industrial Management.