# Lidify Vibe System Documentation
This document provides comprehensive documentation of the Vibe System - how Lidify analyzes tracks, collects audio metrics, and compares them for vibe matching. Use this as a reference for building frontend interfaces.
## Table of Contents
- Overview
- Metrics Collected
- Data Structures
- Vibe Matching Algorithm
- API Endpoints
- Frontend Integration Guide
- Existing Components Reference
## Overview
The Vibe System uses a combination of audio signal analysis and ML-based mood prediction to understand the "feel" of a track. It operates in two modes:
| Mode | Description | Accuracy |
|---|---|---|
| Standard | Heuristic-based analysis using audio signal features (BPM, key, energy) | Good |
| Enhanced | ML-based analysis using MusiCNN neural network for mood prediction | Best |
The system enables:
- Finding tracks with similar vibes to a source track
- Generating mood-based playlists
- Visualizing track characteristics in real-time
## Metrics Collected
### Core Audio Features (Always Available)
These are extracted directly from audio signal analysis at 44.1 kHz:
| Metric | Type | Range | Description |
|---|---|---|---|
| `bpm` | Float | 60-200 | Tempo in beats per minute |
| `beatsCount` | Int | 0+ | Total number of beats detected |
| `key` | String | "C", "F#", etc. | Musical key |
| `keyScale` | String | `"major"` \| `"minor"` | Major or minor tonality |
| `keyStrength` | Float | 0-1 | Confidence of key detection |
| `energy` | Float | 0-1 | RMS-based intensity level |
| `loudness` | Float | dB | Average loudness |
| `dynamicRange` | Float | dB | Difference between quietest and loudest sections |
| `danceability` | Float | 0-1 | Rhythm regularity and groove potential |
### ML Mood Predictions (Enhanced Mode)
Seven core mood dimensions predicted by the MusiCNN model:
| Metric | Type | Range | Description | Icon Suggestion |
|---|---|---|---|---|
| `moodHappy` | Float | 0-1 | Happiness/cheerfulness probability | Smile |
| `moodSad` | Float | 0-1 | Sadness/melancholy probability | Frown |
| `moodRelaxed` | Float | 0-1 | Calm/peaceful probability | Coffee |
| `moodAggressive` | Float | 0-1 | Intensity/aggression probability | Flame |
| `moodParty` | Float | 0-1 | Upbeat/party probability | PartyPopper |
| `moodAcoustic` | Float | 0-1 | Acoustic instrumentation probability | Guitar |
| `moodElectronic` | Float | 0-1 | Electronic/synthetic probability | Radio |
### Derived Features (Computed)
These are calculated from the ML predictions:
#### Valence (Emotional Positivity)
```typescript
// Formula:
valence = (
  moodHappy * 0.5 +     // Happy mood (50% weight)
  moodParty * 0.3 +     // Party mood (30% weight)
  (1 - moodSad) * 0.2   // Inverse of sadness (20% weight)
)
```
| Value | Interpretation |
|---|---|
| 0.0 - 0.3 | Melancholic, sad |
| 0.3 - 0.6 | Neutral, balanced |
| 0.6 - 1.0 | Happy, positive |
#### Arousal (Energy/Excitement Level)
```typescript
// Formula:
arousal = (
  moodAggressive * 0.35 +    // Aggressive mood (35% weight)
  moodParty * 0.25 +         // Party mood (25% weight)
  moodElectronic * 0.2 +     // Electronic sound (20% weight)
  (1 - moodRelaxed) * 0.1 +  // Inverse of relaxation (10% weight)
  (1 - moodAcoustic) * 0.1   // Inverse of acoustic (10% weight)
)
```
| Value | Interpretation |
|---|---|
| 0.0 - 0.3 | Calm, peaceful |
| 0.3 - 0.6 | Moderate energy |
| 0.6 - 1.0 | High energy, intense |
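The two formulas above can be written as plain functions. The sketch below is illustrative (the `MoodPredictions` shape and `computeValence`/`computeArousal` names are not from the codebase); missing predictions fall back to a neutral 0.5, matching the fallback convention used elsewhere in this document:

```typescript
interface MoodPredictions {
  moodHappy?: number | null;
  moodSad?: number | null;
  moodRelaxed?: number | null;
  moodAggressive?: number | null;
  moodParty?: number | null;
  moodAcoustic?: number | null;
  moodElectronic?: number | null;
}

// Neutral fallback when a prediction is missing
const mood = (v: number | null | undefined): number => v ?? 0.5;

// Valence: happy (50%) + party (30%) + inverse sadness (20%)
function computeValence(m: MoodPredictions): number {
  return (
    mood(m.moodHappy) * 0.5 +
    mood(m.moodParty) * 0.3 +
    (1 - mood(m.moodSad)) * 0.2
  );
}

// Arousal: aggressive (35%) + party (25%) + electronic (20%)
//          + inverse relaxed (10%) + inverse acoustic (10%)
function computeArousal(m: MoodPredictions): number {
  return (
    mood(m.moodAggressive) * 0.35 +
    mood(m.moodParty) * 0.25 +
    mood(m.moodElectronic) * 0.2 +
    (1 - mood(m.moodRelaxed)) * 0.1 +
    (1 - mood(m.moodAcoustic)) * 0.1
  );
}
```

Note that a track with all-neutral (0.5) moods scores 0.5 on both axes, which is why 0.5 is the documented neutral point in the Quick Reference table.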
### Additional Features
| Metric | Type | Range | Description |
|---|---|---|---|
| `instrumentalness` | Float | 0-1 | Voice presence (0 = vocal, 1 = instrumental) |
| `acousticness` | Float | 0-1 | Acoustic vs. processed sound |
| `speechiness` | Float | 0-1 | Spoken word detection |
| `danceabilityMl` | Float | 0-1 | ML-based danceability (more accurate) |
### Metadata & Tags
| Field | Type | Description |
|---|---|---|
| `moodTags` | String[] | Derived mood labels (e.g., `["chill", "happy"]`) |
| `essentiaGenres` | String[] | ML-predicted genres (e.g., `["rock", "electronic"]`) |
| `lastfmTags` | String[] | User-generated tags from Last.fm |
| `analysisStatus` | String | `"pending"` \| `"processing"` \| `"completed"` \| `"failed"` |
| `analysisMode` | String | `"standard"` \| `"enhanced"` |
| `analyzedAt` | DateTime | When analysis was performed |
## Data Structures
### TypeScript Interface
```typescript
interface AudioFeatures {
  // Core audio features
  bpm?: number | null;
  beatsCount?: number | null;
  key?: string | null;
  keyScale?: string | null;
  keyStrength?: number | null;
  energy?: number | null;
  loudness?: number | null;
  dynamicRange?: number | null;
  danceability?: number | null;

  // Derived features
  valence?: number | null;
  arousal?: number | null;

  // Additional features
  instrumentalness?: number | null;
  acousticness?: number | null;
  speechiness?: number | null;
  danceabilityMl?: number | null;

  // ML mood predictions (Enhanced mode)
  moodHappy?: number | null;
  moodSad?: number | null;
  moodRelaxed?: number | null;
  moodAggressive?: number | null;
  moodParty?: number | null;
  moodAcoustic?: number | null;
  moodElectronic?: number | null;

  // Metadata
  analysisStatus?: string | null;
  analysisMode?: string | null;
  analyzedAt?: string | null;

  // Tags
  moodTags?: string[];
  essentiaGenres?: string[];
  lastfmTags?: string[];
}
```
### Feature Display Configuration
Recommended configuration for displaying features in the UI:
```typescript
const FEATURE_CONFIG = [
  {
    key: "energy",
    label: "Energy",
    icon: "Zap", // lucide-react icon
    min: 0,
    max: 1,
    lowLabel: "Calm",
    highLabel: "Intense",
  },
  {
    key: "valence",
    label: "Mood",
    icon: "Heart",
    min: 0,
    max: 1,
    lowLabel: "Melancholic",
    highLabel: "Happy",
  },
  {
    key: "danceability",
    label: "Groove",
    icon: "Footprints",
    min: 0,
    max: 1,
    lowLabel: "Freeform",
    highLabel: "Danceable",
  },
  {
    key: "bpm",
    label: "Tempo",
    icon: "Gauge",
    min: 60,
    max: 180,
    lowLabel: "Slow",
    highLabel: "Fast",
    unit: "BPM",
  },
  {
    key: "arousal",
    label: "Arousal",
    icon: "AudioWaveform",
    min: 0,
    max: 1,
    lowLabel: "Peaceful",
    highLabel: "Energetic",
  },
];

const ML_MOOD_CONFIG = [
  { key: "moodHappy", label: "Happy", icon: "Smile", color: "yellow-400" },
  { key: "moodSad", label: "Sad", icon: "Frown", color: "blue-400" },
  { key: "moodRelaxed", label: "Relaxed", icon: "Coffee", color: "green-400" },
  { key: "moodAggressive", label: "Aggressive", icon: "Flame", color: "red-400" },
  { key: "moodParty", label: "Party", icon: "PartyPopper", color: "pink-400" },
  { key: "moodAcoustic", label: "Acoustic", icon: "Guitar", color: "amber-400" },
  { key: "moodElectronic", label: "Electronic", icon: "Radio", color: "purple-400" },
];
```
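One way a UI can consume `ML_MOOD_CONFIG` is to flatten a track's predictions into display rows sorted by strength. The helper below is a sketch, not part of the existing codebase; the `toMoodRows` name and row shape are illustrative:

```typescript
interface MoodRow {
  label: string;
  icon: string;
  color: string;
  percent: number;
}

// Turn raw mood predictions into sorted display rows.
// Missing predictions render as a neutral 50%.
function toMoodRows(
  features: Record<string, number | null | undefined>,
  config: { key: string; label: string; icon: string; color: string }[]
): MoodRow[] {
  return config
    .map((c) => ({
      label: c.label,
      icon: c.icon,
      color: c.color,
      percent: Math.round((features[c.key] ?? 0.5) * 100),
    }))
    .sort((a, b) => b.percent - a.percent); // strongest mood first
}
```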
## Vibe Matching Algorithm
### Feature Vector Construction
The system builds a 13-dimensional feature vector for each track:
```typescript
// Helper: get a mood value with a fallback
const getMoodValue = (value: number | null | undefined, fallback: number) =>
  value ?? fallback;

// calculateEnhancedValence and calculateEnhancedArousal implement the
// valence/arousal formulas above; octaveAwareBPMDistance is defined below.
const buildFeatureVector = (track: AudioFeatures) => [
  // ML mood predictions (7 features) - 1.3x weight for semantic importance
  getMoodValue(track.moodHappy, 0.5) * 1.3,
  getMoodValue(track.moodSad, 0.5) * 1.3,
  getMoodValue(track.moodRelaxed, 0.5) * 1.3,
  getMoodValue(track.moodAggressive, 0.5) * 1.3,
  getMoodValue(track.moodParty, 0.5) * 1.3,
  getMoodValue(track.moodAcoustic, 0.5) * 1.3,
  getMoodValue(track.moodElectronic, 0.5) * 1.3,
  // Audio features (4 features)
  track.energy ?? 0.5,
  calculateEnhancedArousal(track),
  track.danceabilityMl ?? track.danceability ?? 0.5,
  track.instrumentalness ?? 0.5,
  // BPM (octave-aware normalization)
  1 - octaveAwareBPMDistance(track.bpm ?? 120, 120),
  // Valence
  calculateEnhancedValence(track),
];
```
### Cosine Similarity Calculation
Tracks are compared using cosine similarity:
```typescript
const cosineSimilarity = (vectorA: number[], vectorB: number[]): number => {
  let dotProduct = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < vectorA.length; i++) {
    dotProduct += vectorA[i] * vectorB[i];
    magA += vectorA[i] * vectorA[i];
    magB += vectorB[i] * vectorB[i];
  }
  // Guard against zero-magnitude vectors to avoid division by zero
  if (magA === 0 || magB === 0) return 0;
  return dotProduct / (Math.sqrt(magA) * Math.sqrt(magB));
};
```
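As a quick sanity check of the behavior: vectors pointing in the same direction score 1 regardless of magnitude, and orthogonal vectors score 0. The function is repeated here only so the snippet runs standalone:

```typescript
const cosineSimilarity = (vectorA: number[], vectorB: number[]): number => {
  let dotProduct = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < vectorA.length; i++) {
    dotProduct += vectorA[i] * vectorB[i];
    magA += vectorA[i] * vectorA[i];
    magB += vectorB[i] * vectorB[i];
  }
  return dotProduct / (Math.sqrt(magA) * Math.sqrt(magB));
};

// Same direction, different magnitude → 1 (scale-invariant)
cosineSimilarity([1, 0], [2, 0]); // 1
// Orthogonal vectors → 0
cosineSimilarity([1, 0], [0, 1]); // 0
```

Scale invariance is why the 1.3x mood weighting above changes how much the mood dimensions dominate the angle, not the overall magnitude of the score.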
### Tag/Genre Bonus
Additional boost for shared tags:
```typescript
const computeTagBonus = (
  sourceTags: string[],
  sourceGenres: string[],
  trackTags: string[],
  trackGenres: string[]
): number => {
  const sourceSet = new Set(
    [...sourceTags, ...sourceGenres].map(t => t.toLowerCase())
  );
  const trackSet = new Set(
    [...trackTags, ...trackGenres].map(t => t.toLowerCase())
  );
  const overlap = [...sourceSet].filter(tag => trackSet.has(tag)).length;
  return Math.min(0.05, overlap * 0.01); // Max 5% bonus
};
```
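For example, one shared tag (case-insensitive) yields a 1% boost, and six or more shared tags hit the 5% cap. The function is repeated below so the example runs standalone:

```typescript
const computeTagBonus = (
  sourceTags: string[],
  sourceGenres: string[],
  trackTags: string[],
  trackGenres: string[]
): number => {
  const sourceSet = new Set(
    [...sourceTags, ...sourceGenres].map((t) => t.toLowerCase())
  );
  const trackSet = new Set(
    [...trackTags, ...trackGenres].map((t) => t.toLowerCase())
  );
  const overlap = [...sourceSet].filter((tag) => trackSet.has(tag)).length;
  return Math.min(0.05, overlap * 0.01); // Max 5% bonus
};

// "Rock" matches "rock" case-insensitively → one shared tag → 0.01
computeTagBonus(["Rock", "chill"], ["indie"], ["rock"], ["pop"]);
// Six shared tags → capped at 0.05
computeTagBonus(["a", "b", "c", "d", "e", "f"], [], ["a", "b", "c", "d", "e", "f"], []);
```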
### Final Score
```typescript
const finalScore = cosineSimilarity(sourceVector, targetVector) * 0.95 + tagBonus;
```
### Matching Thresholds
| Mode | Minimum Similarity |
|---|---|
| Enhanced | 40% |
| Standard | 50% |
The Enhanced threshold is lower because ML predictions provide more nuanced differentiation.
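The per-mode cutoff can be expressed directly from the table above. This is a sketch (the `filterMatches` helper and `MIN_SIMILARITY` map are illustrative names, not from the codebase):

```typescript
type AnalysisMode = "standard" | "enhanced";

// Minimum similarity per mode, per the table above
const MIN_SIMILARITY: Record<AnalysisMode, number> = {
  enhanced: 0.4,
  standard: 0.5,
};

// Keep only matches at or above the mode's threshold
function filterMatches<T extends { similarity: number }>(
  matches: T[],
  mode: AnalysisMode
): T[] {
  return matches.filter((m) => m.similarity >= MIN_SIMILARITY[mode]);
}
```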
### Octave-Aware BPM Matching
Treats harmonically related tempos as similar (60 BPM ≈ 120 BPM ≈ 240 BPM):
```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
  const normalizeToOctave = (bpm: number): number => {
    while (bpm < 77) bpm *= 2;
    while (bpm > 154) bpm /= 2;
    return bpm;
  };
  const norm1 = normalizeToOctave(bpm1);
  const norm2 = normalizeToOctave(bpm2);
  const logDistance = Math.abs(Math.log2(norm1) - Math.log2(norm2));
  return Math.min(logDistance, 1);
};
```
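Because both tempos are folded into the 77-154 BPM octave before comparison, half-time and double-time tempos collapse to a distance of exactly 0, while genuinely different tempos keep a nonzero log-scale distance. Repeating the function so the example runs standalone:

```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
  // Fold any tempo into the 77-154 BPM octave
  const normalizeToOctave = (bpm: number): number => {
    while (bpm < 77) bpm *= 2;
    while (bpm > 154) bpm /= 2;
    return bpm;
  };
  const norm1 = normalizeToOctave(bpm1);
  const norm2 = normalizeToOctave(bpm2);
  const logDistance = Math.abs(Math.log2(norm1) - Math.log2(norm2));
  return Math.min(logDistance, 1);
};

// 60 folds up to 120, 240 folds down to 120 → all three are "the same tempo"
octaveAwareBPMDistance(60, 120); // 0
octaveAwareBPMDistance(120, 240); // 0
// 100 vs 140 stay in-octave and keep a real distance (log2(140/100) ≈ 0.485)
octaveAwareBPMDistance(100, 140);
```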
## API Endpoints
### Get Track Audio Features
```
GET /api/tracks/:id/features
```
Response:
```json
{
  "bpm": 128.5,
  "energy": 0.78,
  "valence": 0.65,
  "arousal": 0.72,
  "danceability": 0.85,
  "key": "C",
  "keyScale": "major",
  "moodHappy": 0.72,
  "moodSad": 0.15,
  "moodRelaxed": 0.28,
  "moodAggressive": 0.45,
  "moodParty": 0.68,
  "moodAcoustic": 0.12,
  "moodElectronic": 0.78,
  "analysisMode": "enhanced",
  "analysisStatus": "completed"
}
```
### Find Similar Tracks (Vibe Match)
```
GET /api/library/vibe-match?trackId=:id&limit=20
```
Response:
```jsonc
{
  "source": { /* track with features */ },
  "matches": [
    {
      "track": { /* track data */ },
      "similarity": 0.87,
      "features": { /* audio features */ }
    }
  ]
}
```
### Generate Mood Mix
```
POST /api/mixes/mood
```
Request:
```json
{
  "valence": { "min": 0.6, "max": 1.0 },
  "energy": { "min": 0.5, "max": 0.8 },
  "danceability": { "min": 0.7, "max": 1.0 },
  "bpm": { "min": 100, "max": 140 },
  "limit": 15
}
```
### Get Mood Presets
```
GET /api/mixes/mood-presets
```
Response:
```json
[
  {
    "id": "chill",
    "name": "Chill Vibes",
    "color": "from-blue-600 to-purple-600",
    "params": {
      "valence": { "min": 0.3, "max": 0.7 },
      "energy": { "min": 0.1, "max": 0.4 }
    }
  }
]
```
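The endpoints above can be wrapped in a small typed client. The paths and query parameters come from this document; the wrapper functions themselves (`trackFeaturesUrl`, `vibeMatchUrl`, `getTrackFeatures`, `generateMoodMix`) are illustrative sketches, and error handling is minimal:

```typescript
const trackFeaturesUrl = (trackId: string): string =>
  `/api/tracks/${encodeURIComponent(trackId)}/features`;

const vibeMatchUrl = (trackId: string, limit = 20): string =>
  `/api/library/vibe-match?trackId=${encodeURIComponent(trackId)}&limit=${limit}`;

// Fetch a track's audio features (see the AudioFeatures interface above).
async function getTrackFeatures(trackId: string): Promise<unknown> {
  const res = await fetch(trackFeaturesUrl(trackId));
  if (!res.ok) throw new Error(`features request failed: ${res.status}`);
  return res.json();
}

// Generate a mood mix from min/max ranges, per the request body above.
async function generateMoodMix(body: {
  valence?: { min: number; max: number };
  energy?: { min: number; max: number };
  danceability?: { min: number; max: number };
  bpm?: { min: number; max: number };
  limit?: number;
}): Promise<unknown> {
  const res = await fetch("/api/mixes/mood", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`mood mix request failed: ${res.status}`);
  return res.json();
}
```

`encodeURIComponent` keeps track IDs with special characters from breaking the path or query string.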
## Frontend Integration Guide
### Displaying Feature Values
Normalize values for consistent display:
```typescript
function normalizeValue(
  value: number | null | undefined,
  min: number,
  max: number
): number {
  if (value === null || value === undefined) return 0;
  return Math.max(0, Math.min(1, (value - min) / (max - min)));
}

// Usage
const normalizedBpm = normalizeValue(track.bpm, 60, 180);
const normalizedEnergy = normalizeValue(track.energy, 0, 1);
```
### Calculating Match Scores
```typescript
function calculateFeatureMatch(
  sourceVal: number | null,
  currentVal: number | null,
  min: number,
  max: number
): { diff: number; match: number } {
  const sourceNorm = normalizeValue(sourceVal, min, max);
  const currentNorm = normalizeValue(currentVal, min, max);
  const diff = Math.abs(sourceNorm - currentNorm);
  const match = Math.round((1 - diff) * 100);
  return { diff, match };
}
```
### Match Score Color Coding
```typescript
function getMatchColor(matchPercent: number): string {
  if (matchPercent >= 80) return "text-green-400"; // Excellent
  if (matchPercent >= 60) return "text-yellow-400"; // Good
  return "text-red-400"; // Different
}

function getMatchDescription(matchPercent: number): string {
  if (matchPercent >= 80) return "Excellent match - very similar vibe";
  if (matchPercent >= 60) return "Good match - similar energy";
  return "Different vibe - exploring variety";
}
```
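Putting the helpers together, here is a worked example: comparing a 128 BPM track against a 120 BPM source on the 60-180 display range gives a normalized difference of about 0.067, i.e. a 93% match, which lands in the green tier. `normalizeValue` is condensed and repeated so the sketch is self-contained:

```typescript
const normalizeValue = (
  value: number | null | undefined,
  min: number,
  max: number
): number =>
  value == null ? 0 : Math.max(0, Math.min(1, (value - min) / (max - min)));

// Percent match between source and current on one feature's display range
function featureMatchPercent(
  sourceVal: number | null,
  currentVal: number | null,
  min: number,
  max: number
): number {
  const diff = Math.abs(
    normalizeValue(sourceVal, min, max) - normalizeValue(currentVal, min, max)
  );
  return Math.round((1 - diff) * 100);
}

function getMatchColor(matchPercent: number): string {
  if (matchPercent >= 80) return "text-green-400"; // Excellent
  if (matchPercent >= 60) return "text-yellow-400"; // Good
  return "text-red-400"; // Different
}

// 128 BPM vs. 120 BPM source on the 60-180 range → 93% → green
featureMatchPercent(128, 120, 60, 180); // 93
```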
### Visualization Recommendations
#### 1. Radar Chart (Spider Graph)
Best for comparing multiple features at once. Shows the source track (dashed line) vs. the current track (solid fill).
#### 2. Progress Bars
Best for individual feature comparison with a source-marker overlay.
#### 3. Mood Grid
A 4x2 or 4x4 grid of ML mood indicators with percentage matches.
#### 4. Valence-Arousal Quadrant
A 2D scatter plot with:
- X-axis: Valence (sad → happy)
- Y-axis: Arousal (calm → energetic)
Quadrants:
- Top-right: Happy + Energetic (Party)
- Top-left: Sad + Energetic (Angry/Tense)
- Bottom-right: Happy + Calm (Peaceful)
- Bottom-left: Sad + Calm (Melancholic)
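The quadrant lookup can be sketched as a single function, splitting at the neutral 0.5 point (the 0.5 split and the label strings are assumptions based on the quadrant list above):

```typescript
// Classify a track into a valence-arousal quadrant.
// Both inputs are 0-1; 0.5 is the documented neutral point.
function vibeQuadrant(valence: number, arousal: number): string {
  if (arousal >= 0.5) {
    return valence >= 0.5 ? "Party" : "Angry/Tense"; // top half
  }
  return valence >= 0.5 ? "Peaceful" : "Melancholic"; // bottom half
}
```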
## Existing Components Reference
### VibeOverlay
Location: `frontend/components/player/VibeOverlay.tsx`
Full-featured overlay showing:
- Overall match percentage
- Feature-by-feature comparison bars
- ML mood grid (enhanced mode)
- Source vs current legend
### VibeGraph
Location: `frontend/components/player/VibeGraph.tsx`
Compact radar chart for:
- 4-feature comparison (Energy, Mood, Dance, BPM)
- Match score badge
- Inline display in player
### MoodMixer
Location: `frontend/components/MoodMixer.tsx`
Modal for:
- Quick mood presets
- Custom range sliders
- Generating mood-based playlists
## Special Considerations
### Out-of-Distribution (OOD) Detection
The MusiCNN model was trained on pop/rock music. For other genres (classical, ambient, jazz), predictions may be unreliable. The backend normalizes these cases.
Detection criteria (either pattern suggests unreliable predictions):
- All mood values > 0.7 with low variance
- All mood values clustered around 0.5
**UI recommendation:** Show a subtle indicator when `analysisMode` is `"standard"` or when predictions seem unreliable.
### Handling Missing Data
Always provide fallback values:
```typescript
const safeFeatures = {
  energy: track.energy ?? 0.5,
  valence: track.valence ?? 0.5,
  bpm: track.bpm ?? 120,
  // ... etc
};
```
### Analysis Status States
| Status | UI Treatment |
|---|---|
| `pending` | Show "Analyzing..." with spinner |
| `processing` | Show progress indicator |
| `completed` | Show full vibe data |
| `failed` | Show fallback/retry option |
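The table above maps naturally onto a lookup object the UI can branch on. This is a sketch (the `STATUS_UI` name and the exact copy strings are assumptions):

```typescript
type AnalysisStatus = "pending" | "processing" | "completed" | "failed";

// UI treatment per analysis status, per the table above
const STATUS_UI: Record<
  AnalysisStatus,
  { label: string; showVibeData: boolean; showRetry: boolean }
> = {
  pending:    { label: "Analyzing...",    showVibeData: false, showRetry: false },
  processing: { label: "Analyzing...",    showVibeData: false, showRetry: false },
  completed:  { label: "",                showVibeData: true,  showRetry: false },
  failed:     { label: "Analysis failed", showVibeData: false, showRetry: true },
};
```

Using `Record<AnalysisStatus, ...>` makes the compiler flag any status that is added to the union but forgotten in the map.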
## Quick Reference: Value Ranges
| Metric | Min | Max | Neutral |
|---|---|---|---|
| All mood* | 0 | 1 | 0.5 |
| energy | 0 | 1 | 0.5 |
| valence | 0 | 1 | 0.5 |
| arousal | 0 | 1 | 0.5 |
| danceability | 0 | 1 | 0.5 |
| bpm | 60 | 200 | 120 |
| keyStrength | 0 | 1 | - |
## File Locations
| Component | Path |
|---|---|
| Audio Analyzer (Python) | services/audio-analyzer/analyzer.py |
| Vibe Matching Logic | backend/src/routes/library.ts |
| Database Schema | backend/prisma/schema.prisma |
| Frontend Vibe Overlay | frontend/components/player/VibeOverlay.tsx |
| Frontend Vibe Graph | frontend/components/player/VibeGraph.tsx |
| Mood Mixer | frontend/components/MoodMixer.tsx |
| Audio State Context | frontend/lib/audio-state-context.tsx |
## Research Background
The Vibe System's valence and arousal calculations are informed by music psychology research:
### Valence (Emotional Positivity)
**Key finding:** Mode/tonality is the strongest predictor of perceived valence in music.
- Lee et al. (ICASSP 2020) - Demonstrated that musical mode (major vs. minor) has the highest correlation with listener-reported valence
- Major keys contribute positively (+0.3 in our formula), minor keys negatively (-0.2)
- This aligns with centuries of music theory and empirical psychology research
### Arousal (Energy/Excitement)
**Key finding:** The "electronic" mood prediction from ML models is unreliable for arousal calculation.
- Grekow (2018) - Found that direct energy and tempo features outperform genre-based predictions for arousal
- Our implementation replaces the "electronic" mood with explicit energy and BPM contributions
- This provides more consistent arousal predictions across diverse genres
### Feature Weights
The specific weights in our formulas (e.g., 0.35 for aggressive mood and 0.25 for party mood in the arousal formula) were tuned through:
- Initial values from published research
- Empirical testing on a diverse music library
- User feedback on vibe matching accuracy
## References
- Lee, J., et al. (2020). "Music Emotion Recognition Using Valence-Arousal Regression." ICASSP 2020.
- Grekow, J. (2018). "Music Emotion Maps in Arousal-Valence Space." IFIP International Conference on Computer Information Systems and Industrial Management.