# Lidify Vibe System Documentation

This document provides comprehensive documentation of the Vibe System - how Lidify analyzes tracks, collects audio metrics, and compares them for vibe matching. Use this as a reference for building frontend interfaces.

---
## Table of Contents

1. [Overview](#overview)
2. [Metrics Collected](#metrics-collected)
3. [Data Structures](#data-structures)
4. [Vibe Matching Algorithm](#vibe-matching-algorithm)
5. [API Endpoints](#api-endpoints)
6. [Frontend Integration Guide](#frontend-integration-guide)
7. [Existing Components Reference](#existing-components-reference)
8. [Special Considerations](#special-considerations)
9. [Quick Reference: Value Ranges](#quick-reference-value-ranges)
10. [File Locations](#file-locations)
11. [Research Background](#research-background)

---
## Overview

The Vibe System uses a combination of **audio signal analysis** and **ML-based mood prediction** to understand the "feel" of a track. It operates in two modes:

| Mode | Description | Accuracy |
|------|-------------|----------|
| **Standard** | Heuristic-based analysis using audio signal features (BPM, key, energy) | Good |
| **Enhanced** | ML-based analysis using the MusiCNN neural network for mood prediction | Best |

The system enables:

- Finding tracks with similar vibes to a source track
- Generating mood-based playlists
- Visualizing track characteristics in real-time

---
## Metrics Collected

### Core Audio Features (Always Available)

These are extracted directly from audio signal analysis at 44.1 kHz:

| Metric | Type | Range | Description |
|--------|------|-------|-------------|
| `bpm` | Float | 60-200 | Tempo in beats per minute |
| `beatsCount` | Int | 0+ | Total number of beats detected |
| `key` | String | "C", "F#", etc. | Musical key |
| `keyScale` | String | "major" \| "minor" | Major or minor tonality |
| `keyStrength` | Float | 0-1 | Confidence of key detection |
| `energy` | Float | 0-1 | RMS-based intensity level |
| `loudness` | Float | dB | Average loudness |
| `dynamicRange` | Float | dB | Difference between quietest and loudest |
| `danceability` | Float | 0-1 | Rhythm regularity and groove potential |
### ML Mood Predictions (Enhanced Mode)

Seven core mood dimensions predicted by the MusiCNN model:

| Metric | Type | Range | Description | Icon Suggestion |
|--------|------|-------|-------------|-----------------|
| `moodHappy` | Float | 0-1 | Happiness/cheerfulness probability | Smile |
| `moodSad` | Float | 0-1 | Sadness/melancholy probability | Frown |
| `moodRelaxed` | Float | 0-1 | Calm/peaceful probability | Coffee |
| `moodAggressive` | Float | 0-1 | Intensity/aggression probability | Flame |
| `moodParty` | Float | 0-1 | Upbeat/party probability | PartyPopper |
| `moodAcoustic` | Float | 0-1 | Acoustic instrumentation probability | Guitar |
| `moodElectronic` | Float | 0-1 | Electronic/synthetic probability | Radio |
### Derived Features (Computed)

These are calculated from the ML predictions:

#### Valence (Emotional Positivity)

```typescript
// Formula:
valence = (
  moodHappy * 0.5 +    // Happy mood (50% weight)
  moodParty * 0.3 +    // Party mood (30% weight)
  (1 - moodSad) * 0.2  // Inverse of sadness (20% weight)
);
```

| Value | Interpretation |
|-------|----------------|
| 0.0 - 0.3 | Melancholic, sad |
| 0.3 - 0.6 | Neutral, balanced |
| 0.6 - 1.0 | Happy, positive |
#### Arousal (Energy/Excitement Level)

```typescript
// Formula:
arousal = (
  moodAggressive * 0.35 +    // Aggressive mood (35% weight)
  moodParty * 0.25 +         // Party mood (25% weight)
  moodElectronic * 0.2 +     // Electronic sound (20% weight)
  (1 - moodRelaxed) * 0.1 +  // Inverse of relaxation (10% weight)
  (1 - moodAcoustic) * 0.1   // Inverse of acoustic (10% weight)
);
```

| Value | Interpretation |
|-------|----------------|
| 0.0 - 0.3 | Calm, peaceful |
| 0.3 - 0.6 | Moderate energy |
| 0.6 - 1.0 | High energy, intense |
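The two derived-feature formulas can be sketched as runnable helpers. This is an illustrative sketch only: the `MoodScores` shape and the sample values below are made up for demonstration, not real model output.

```typescript
interface MoodScores {
  moodHappy: number;
  moodSad: number;
  moodRelaxed: number;
  moodAggressive: number;
  moodParty: number;
  moodAcoustic: number;
  moodElectronic: number;
}

// Valence: weighted blend of positive moods and inverted sadness.
const valenceOf = (m: MoodScores): number =>
  m.moodHappy * 0.5 + m.moodParty * 0.3 + (1 - m.moodSad) * 0.2;

// Arousal: weighted blend of high-energy moods and inverted calm moods.
const arousalOf = (m: MoodScores): number =>
  m.moodAggressive * 0.35 +
  m.moodParty * 0.25 +
  m.moodElectronic * 0.2 +
  (1 - m.moodRelaxed) * 0.1 +
  (1 - m.moodAcoustic) * 0.1;

// Illustrative mood values:
const moods: MoodScores = {
  moodHappy: 0.72, moodSad: 0.15, moodRelaxed: 0.28,
  moodAggressive: 0.45, moodParty: 0.68,
  moodAcoustic: 0.12, moodElectronic: 0.78,
};

console.log(valenceOf(moods).toFixed(2)); // "0.73" -> happy, positive
console.log(arousalOf(moods).toFixed(2)); // "0.64" -> high energy
```

Note that both weight sets sum to 1.0, so the outputs stay in the same 0-1 range as the inputs.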
### Additional Features

| Metric | Type | Range | Description |
|--------|------|-------|-------------|
| `instrumentalness` | Float | 0-1 | Voice presence (0 = vocal, 1 = instrumental) |
| `acousticness` | Float | 0-1 | Acoustic vs. processed sound |
| `speechiness` | Float | 0-1 | Spoken word detection |
| `danceabilityMl` | Float | 0-1 | ML-based danceability (more accurate) |
### Metadata & Tags

| Field | Type | Description |
|-------|------|-------------|
| `moodTags` | String[] | Derived mood labels (e.g., ["chill", "happy"]) |
| `essentiaGenres` | String[] | ML-predicted genres (e.g., ["rock", "electronic"]) |
| `lastfmTags` | String[] | User-generated tags from Last.fm |
| `analysisStatus` | String | "pending" \| "processing" \| "completed" \| "failed" |
| `analysisMode` | String | "standard" \| "enhanced" |
| `analyzedAt` | DateTime | When the analysis was performed |

---
## Data Structures

### TypeScript Interface

```typescript
interface AudioFeatures {
  // Core audio features
  bpm?: number | null;
  beatsCount?: number | null;
  key?: string | null;
  keyScale?: string | null;
  keyStrength?: number | null;
  energy?: number | null;
  loudness?: number | null;
  dynamicRange?: number | null;
  danceability?: number | null;

  // Derived features
  valence?: number | null;
  arousal?: number | null;

  // Additional features
  instrumentalness?: number | null;
  acousticness?: number | null;
  speechiness?: number | null;
  danceabilityMl?: number | null;

  // ML mood predictions (Enhanced mode)
  moodHappy?: number | null;
  moodSad?: number | null;
  moodRelaxed?: number | null;
  moodAggressive?: number | null;
  moodParty?: number | null;
  moodAcoustic?: number | null;
  moodElectronic?: number | null;

  // Metadata
  analysisStatus?: string | null;
  analysisMode?: string | null;
  analyzedAt?: string | null;

  // Tags
  moodTags?: string[];
  essentiaGenres?: string[];
  lastfmTags?: string[];
}
```
### Feature Display Configuration

Recommended configuration for displaying features in the UI:

```typescript
const FEATURE_CONFIG = [
  {
    key: "energy",
    label: "Energy",
    icon: "Zap", // lucide-react icon
    min: 0,
    max: 1,
    lowLabel: "Calm",
    highLabel: "Intense",
  },
  {
    key: "valence",
    label: "Mood",
    icon: "Heart",
    min: 0,
    max: 1,
    lowLabel: "Melancholic",
    highLabel: "Happy",
  },
  {
    key: "danceability",
    label: "Groove",
    icon: "Footprints",
    min: 0,
    max: 1,
    lowLabel: "Freeform",
    highLabel: "Danceable",
  },
  {
    key: "bpm",
    label: "Tempo",
    icon: "Gauge",
    min: 60,
    max: 180,
    lowLabel: "Slow",
    highLabel: "Fast",
    unit: "BPM",
  },
  {
    key: "arousal",
    label: "Arousal",
    icon: "AudioWaveform",
    min: 0,
    max: 1,
    lowLabel: "Peaceful",
    highLabel: "Energetic",
  },
];

const ML_MOOD_CONFIG = [
  { key: "moodHappy", label: "Happy", icon: "Smile", color: "yellow-400" },
  { key: "moodSad", label: "Sad", icon: "Frown", color: "blue-400" },
  { key: "moodRelaxed", label: "Relaxed", icon: "Coffee", color: "green-400" },
  { key: "moodAggressive", label: "Aggressive", icon: "Flame", color: "red-400" },
  { key: "moodParty", label: "Party", icon: "PartyPopper", color: "pink-400" },
  { key: "moodAcoustic", label: "Acoustic", icon: "Guitar", color: "amber-400" },
  { key: "moodElectronic", label: "Electronic", icon: "Radio", color: "purple-400" },
];
```
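A display layer can iterate a config like this to turn raw feature values into bar widths. The sketch below uses a minimal, made-up slice of the config and a local `normalizeValue` helper (mirroring the one shown in the Frontend Integration Guide); it is illustrative, not the actual component code.

```typescript
type Features = { energy?: number | null; bpm?: number | null };

// A minimal slice of the display config (illustrative).
const config: { key: keyof Features; label: string; min: number; max: number }[] = [
  { key: "energy", label: "Energy", min: 0, max: 1 },
  { key: "bpm", label: "Tempo", min: 60, max: 180 },
];

// Clamp a raw value into [0, 1] relative to its configured range.
const normalizeValue = (value: number | null | undefined, min: number, max: number): number =>
  value == null ? 0 : Math.max(0, Math.min(1, (value - min) / (max - min)));

// Map each configured feature to a 0-100 bar width for rendering.
const toBars = (track: Features) =>
  config.map(({ key, label, min, max }) => ({
    label,
    percent: Math.round(normalizeValue(track[key], min, max) * 100),
  }));

console.log(toBars({ energy: 0.78, bpm: 128.5 }));
// [ { label: "Energy", percent: 78 }, { label: "Tempo", percent: 57 } ]
```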
---
## Vibe Matching Algorithm

### Feature Vector Construction

The system builds a **13-dimensional feature vector** for each track:

```typescript
const buildFeatureVector = (track: AudioFeatures) => [
  // ML mood predictions (7 features) - 1.3x weight for semantic importance
  getMoodValue(track.moodHappy, 0.5) * 1.3,
  getMoodValue(track.moodSad, 0.5) * 1.3,
  getMoodValue(track.moodRelaxed, 0.5) * 1.3,
  getMoodValue(track.moodAggressive, 0.5) * 1.3,
  getMoodValue(track.moodParty, 0.5) * 1.3,
  getMoodValue(track.moodAcoustic, 0.5) * 1.3,
  getMoodValue(track.moodElectronic, 0.5) * 1.3,

  // Audio features (4 features)
  track.energy ?? 0.5,
  calculateEnhancedArousal(track),
  track.danceabilityMl ?? track.danceability ?? 0.5,
  track.instrumentalness ?? 0.5,

  // BPM (octave-aware normalization)
  1 - octaveAwareBPMDistance(track.bpm ?? 120, 120),

  // Valence
  calculateEnhancedValence(track),
];

// Helper: get a mood value with a fallback
const getMoodValue = (value: number | null | undefined, fallback: number) =>
  value ?? fallback;
```
### Cosine Similarity Calculation

Tracks are compared using cosine similarity (with a guard against all-zero vectors):

```typescript
const cosineSimilarity = (vectorA: number[], vectorB: number[]): number => {
  let dotProduct = 0;
  let magA = 0;
  let magB = 0;

  for (let i = 0; i < vectorA.length; i++) {
    dotProduct += vectorA[i] * vectorB[i];
    magA += vectorA[i] * vectorA[i];
    magB += vectorB[i] * vectorB[i];
  }

  const denominator = Math.sqrt(magA) * Math.sqrt(magB);
  return denominator === 0 ? 0 : dotProduct / denominator;
};
```
### Tag/Genre Bonus

Additional boost for shared tags:

```typescript
const computeTagBonus = (
  sourceTags: string[],
  sourceGenres: string[],
  trackTags: string[],
  trackGenres: string[]
): number => {
  const sourceSet = new Set(
    [...sourceTags, ...sourceGenres].map(t => t.toLowerCase())
  );
  const trackSet = new Set(
    [...trackTags, ...trackGenres].map(t => t.toLowerCase())
  );

  const overlap = [...sourceSet].filter(tag => trackSet.has(tag)).length;
  return Math.min(0.05, overlap * 0.01); // Max 5% bonus
};
```
### Final Score

```typescript
const finalScore = cosineSimilarity(sourceVector, targetVector) * 0.95 + tagBonus;
```
### Matching Thresholds

| Mode | Minimum Similarity |
|------|-------------------|
| Enhanced | 40% |
| Standard | 50% |

The threshold is lower for Enhanced mode because ML predictions provide more nuanced differentiation.
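Putting the scoring pieces together, a candidate-ranking loop might look like the sketch below. It is a simplified illustration, not the backend implementation: `rankCandidates` is a hypothetical helper, the vectors are toy 3-dimensional examples (real vectors have 13 dimensions), and `cosineSimilarity` is re-declared so the snippet runs standalone.

```typescript
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
};

interface Candidate { id: string; vector: number[]; tagBonus: number }

// Score candidates against a source vector and keep those above the
// Enhanced-mode threshold (0.40), sorted best-first.
const rankCandidates = (source: number[], candidates: Candidate[], minScore = 0.4) =>
  candidates
    .map(c => ({ id: c.id, score: cosineSimilarity(source, c.vector) * 0.95 + c.tagBonus }))
    .filter(c => c.score >= minScore)
    .sort((a, b) => b.score - a.score);

// Toy example:
const ranked = rankCandidates([0.9, 0.1, 0.8], [
  { id: "a", vector: [0.8, 0.2, 0.7], tagBonus: 0.02 },
  { id: "b", vector: [0.1, 0.9, 0.1], tagBonus: 0 },
]);
console.log(ranked.map(r => r.id)); // [ "a" ] ("b" falls below the threshold)
```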
### Octave-Aware BPM Matching

Treats harmonically related tempos as similar (60 BPM ≈ 120 BPM ≈ 240 BPM):

```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
  const normalizeToOctave = (bpm: number): number => {
    while (bpm < 77) bpm *= 2;
    while (bpm > 154) bpm /= 2;
    return bpm;
  };

  const norm1 = normalizeToOctave(bpm1);
  const norm2 = normalizeToOctave(bpm2);

  const logDistance = Math.abs(Math.log2(norm1) - Math.log2(norm2));
  return Math.min(logDistance, 1);
};
```
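For intuition, a few sample distances from the function above (repeated here so the snippet is self-contained):

```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
  const normalizeToOctave = (bpm: number): number => {
    while (bpm < 77) bpm *= 2;   // fold half-time tempos up
    while (bpm > 154) bpm /= 2;  // fold double-time tempos down
    return bpm;
  };
  const logDistance = Math.abs(
    Math.log2(normalizeToOctave(bpm1)) - Math.log2(normalizeToOctave(bpm2))
  );
  return Math.min(logDistance, 1);
};

console.log(octaveAwareBPMDistance(60, 120));  // 0 (same tempo, different octave)
console.log(octaveAwareBPMDistance(90, 180));  // 0
console.log(octaveAwareBPMDistance(100, 120)); // ~0.263, i.e. log2(120/100)
```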
---
## API Endpoints

### Get Track Audio Features

```
GET /api/tracks/:id/features
```

Response:

```json
{
  "bpm": 128.5,
  "energy": 0.78,
  "valence": 0.65,
  "arousal": 0.72,
  "danceability": 0.85,
  "key": "C",
  "keyScale": "major",
  "moodHappy": 0.72,
  "moodSad": 0.15,
  "moodRelaxed": 0.28,
  "moodAggressive": 0.45,
  "moodParty": 0.68,
  "moodAcoustic": 0.12,
  "moodElectronic": 0.78,
  "analysisMode": "enhanced",
  "analysisStatus": "completed"
}
```
### Find Similar Tracks (Vibe Match)

```
GET /api/library/vibe-match?trackId=:id&limit=20
```

Response:

```json
{
  "source": { /* track with features */ },
  "matches": [
    {
      "track": { /* track data */ },
      "similarity": 0.87,
      "features": { /* audio features */ }
    }
  ]
}
```
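A client call against this endpoint might look like the sketch below. The helper names (`vibeMatchUrl`, `fetchVibeMatches`) and the `VibeMatchResponse` type are illustrative assumptions, not existing Lidify code; only the path and response shape come from the endpoint described above.

```typescript
interface VibeMatchResponse {
  source: unknown;
  matches: { track: unknown; similarity: number; features: unknown }[];
}

// Build the query URL for the vibe-match endpoint.
const vibeMatchUrl = (trackId: string, limit = 20): string =>
  `/api/library/vibe-match?${new URLSearchParams({ trackId, limit: String(limit) })}`;

// Fetch matches for a source track (hypothetical wrapper; assumes a
// same-origin backend and Node 18+ / browser fetch).
async function fetchVibeMatches(trackId: string): Promise<VibeMatchResponse> {
  const res = await fetch(vibeMatchUrl(trackId));
  if (!res.ok) throw new Error(`vibe-match request failed: ${res.status}`);
  return res.json() as Promise<VibeMatchResponse>;
}

console.log(vibeMatchUrl("abc123", 10));
// "/api/library/vibe-match?trackId=abc123&limit=10"
```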
### Generate Mood Mix

```
POST /api/mixes/mood
```

Request:

```json
{
  "valence": { "min": 0.6, "max": 1.0 },
  "energy": { "min": 0.5, "max": 0.8 },
  "danceability": { "min": 0.7, "max": 1.0 },
  "bpm": { "min": 100, "max": 140 },
  "limit": 15
}
```
### Get Mood Presets

```
GET /api/mixes/mood-presets
```

Response:

```json
[
  {
    "id": "chill",
    "name": "Chill Vibes",
    "color": "from-blue-600 to-purple-600",
    "params": {
      "valence": { "min": 0.3, "max": 0.7 },
      "energy": { "min": 0.1, "max": 0.4 }
    }
  }
]
```
---
## Frontend Integration Guide

### Displaying Feature Values

Normalize values for consistent display:

```typescript
function normalizeValue(
  value: number | null | undefined,
  min: number,
  max: number
): number {
  if (value === null || value === undefined) return 0;
  return Math.max(0, Math.min(1, (value - min) / (max - min)));
}

// Usage
const normalizedBpm = normalizeValue(track.bpm, 60, 180);
const normalizedEnergy = normalizeValue(track.energy, 0, 1);
```
### Calculating Match Scores

```typescript
function calculateFeatureMatch(
  sourceVal: number | null,
  currentVal: number | null,
  min: number,
  max: number
): { diff: number; match: number } {
  const sourceNorm = normalizeValue(sourceVal, min, max);
  const currentNorm = normalizeValue(currentVal, min, max);
  const diff = Math.abs(sourceNorm - currentNorm);
  const match = Math.round((1 - diff) * 100);

  return { diff, match };
}
```
### Match Score Color Coding

```typescript
function getMatchColor(matchPercent: number): string {
  if (matchPercent >= 80) return "text-green-400"; // Excellent
  if (matchPercent >= 60) return "text-yellow-400"; // Good
  return "text-red-400"; // Different
}

function getMatchDescription(matchPercent: number): string {
  if (matchPercent >= 80) return "Excellent match - very similar vibe";
  if (matchPercent >= 60) return "Good match - similar energy";
  return "Different vibe - exploring variety";
}
```
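To summarize several per-feature match percentages into one overall score, a simple average works. This is a sketch: `overallMatch` is an illustrative helper, not an existing Lidify function.

```typescript
// Average per-feature match percentages (e.g., from calculateFeatureMatch)
// into a single 0-100 score; features with missing data are skipped.
function overallMatch(featureMatches: (number | null)[]): number {
  const present = featureMatches.filter((m): m is number => m !== null);
  if (present.length === 0) return 0;
  return Math.round(present.reduce((sum, m) => sum + m, 0) / present.length);
}

console.log(overallMatch([92, 75, null, 88])); // 85
```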
### Visualization Recommendations

#### 1. Radar Chart (Spider Graph)

Best for comparing multiple features at once. Shows the source track (dashed line) vs. the current track (solid fill).

#### 2. Progress Bars

Best for individual feature comparison with a source marker overlay.

#### 3. Mood Grid

4x2 or 4x4 grid of ML mood indicators with percentage matches.

#### 4. Valence-Arousal Quadrant

2D scatter plot with:

- X-axis: Valence (sad → happy)
- Y-axis: Arousal (calm → energetic)

Quadrants:

- Top-right: Happy + Energetic (Party)
- Top-left: Sad + Energetic (Angry/Tense)
- Bottom-right: Happy + Calm (Peaceful)
- Bottom-left: Sad + Calm (Melancholic)
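The quadrant labels can also be derived directly from the two values, without plotting. A minimal sketch, assuming 0.5 midpoints on both axes (the split points are an assumption, matching the neutral values in the Quick Reference table):

```typescript
// Classify a track into one of the four valence-arousal quadrants,
// splitting each axis at the assumed neutral midpoint of 0.5.
function vibeQuadrant(valence: number, arousal: number): string {
  if (arousal >= 0.5) return valence >= 0.5 ? "Party" : "Angry/Tense";
  return valence >= 0.5 ? "Peaceful" : "Melancholic";
}

console.log(vibeQuadrant(0.65, 0.72)); // "Party"
console.log(vibeQuadrant(0.2, 0.3));   // "Melancholic"
```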
---
## Existing Components Reference

### VibeOverlay

Location: `frontend/components/player/VibeOverlay.tsx`

Full-featured overlay showing:

- Overall match percentage
- Feature-by-feature comparison bars
- ML mood grid (Enhanced mode)
- Source vs. current legend

### VibeGraph

Location: `frontend/components/player/VibeGraph.tsx`

Compact radar chart for:

- 4-feature comparison (Energy, Mood, Dance, BPM)
- Match score badge
- Inline display in the player

### MoodMixer

Location: `frontend/components/MoodMixer.tsx`

Modal for:

- Quick mood presets
- Custom range sliders
- Generating mood-based playlists
---
## Special Considerations

### Out-of-Distribution (OOD) Detection

The MusiCNN model was trained on pop/rock music. For other genres (classical, ambient, jazz), predictions may be unreliable. The backend normalizes these cases.

**Detection criteria:**

- All mood values > 0.7 with low variance
- All mood values clustered around 0.5

**UI Recommendation:** Show a subtle indicator when `analysisMode` is "standard" or when predictions seem unreliable.
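A frontend-side heuristic matching these criteria might look like the sketch below. The 0.7 cut-off comes from the criteria above, but the variance and clustering tolerances are illustrative assumptions, not the backend's actual constants.

```typescript
// Flag mood predictions that look out-of-distribution: either uniformly
// high with little spread, or all hovering near the 0.5 neutral point.
function looksOutOfDistribution(moods: number[]): boolean {
  if (moods.length === 0) return true;
  const mean = moods.reduce((s, m) => s + m, 0) / moods.length;
  const variance = moods.reduce((s, m) => s + (m - mean) ** 2, 0) / moods.length;

  const allHighLowVariance = moods.every(m => m > 0.7) && variance < 0.005; // assumed tolerance
  const clusteredAtNeutral = moods.every(m => Math.abs(m - 0.5) < 0.05);    // assumed tolerance
  return allHighLowVariance || clusteredAtNeutral;
}

console.log(looksOutOfDistribution([0.8, 0.82, 0.79, 0.81])); // true (all high, low variance)
console.log(looksOutOfDistribution([0.72, 0.15, 0.28, 0.45])); // false (well differentiated)
```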
### Handling Missing Data

Always provide fallback values:

```typescript
const safeFeatures = {
  energy: track.energy ?? 0.5,
  valence: track.valence ?? 0.5,
  bpm: track.bpm ?? 120,
  // ... etc
};
```
### Analysis Status States

| Status | UI Treatment |
|--------|--------------|
| `pending` | Show "Analyzing..." with a spinner |
| `processing` | Show a progress indicator |
| `completed` | Show full vibe data |
| `failed` | Show a fallback/retry option |

---
## Quick Reference: Value Ranges

| Metric | Min | Max | Neutral |
|--------|-----|-----|---------|
| All mood* | 0 | 1 | 0.5 |
| energy | 0 | 1 | 0.5 |
| valence | 0 | 1 | 0.5 |
| arousal | 0 | 1 | 0.5 |
| danceability | 0 | 1 | 0.5 |
| bpm | 60 | 200 | 120 |
| keyStrength | 0 | 1 | - |

---
## File Locations

| Component | Path |
|-----------|------|
| Audio Analyzer (Python) | `services/audio-analyzer/analyzer.py` |
| Vibe Matching Logic | `backend/src/routes/library.ts` |
| Database Schema | `backend/prisma/schema.prisma` |
| Frontend Vibe Overlay | `frontend/components/player/VibeOverlay.tsx` |
| Frontend Vibe Graph | `frontend/components/player/VibeGraph.tsx` |
| Mood Mixer | `frontend/components/MoodMixer.tsx` |
| Audio State Context | `frontend/lib/audio-state-context.tsx` |

---
## Research Background

The Vibe System's valence and arousal calculations are informed by music psychology research:

### Valence (Emotional Positivity)

**Key Finding:** Mode/tonality is the strongest predictor of perceived valence in music.

- **Lee et al. (ICASSP 2020)** demonstrated that musical mode (major vs. minor) has the highest correlation with listener-reported valence
- Major keys contribute positively (+0.3 in our formula), minor keys negatively (-0.2)
- This aligns with centuries of music theory and empirical psychology research

### Arousal (Energy/Excitement)

**Key Finding:** The "electronic" mood prediction from ML models is unreliable for arousal calculation.

- **Grekow (2018)** found that direct energy and tempo features outperform genre-based predictions for arousal
- Our implementation replaces the "electronic" mood with explicit energy and BPM contributions
- This provides more consistent arousal predictions across diverse genres

### Feature Weights

The specific weights in our formulas (e.g., 0.35 for happy mood, 0.25 for energy) were tuned through:

1. Initial values from published research
2. Empirical testing on a diverse music library
3. User feedback on vibe-matching accuracy

### References

- Lee, J., et al. (2020). "Music Emotion Recognition Using Valence-Arousal Regression." ICASSP 2020.
- Grekow, J. (2018). "Music Emotion Maps in Arousal-Valence Space." IFIP International Conference on Computer Information Systems and Industrial Management.