lidify/docs/vibe-system.md (2025-12-25)

Lidify Vibe System Documentation

This document provides comprehensive documentation of the Vibe System - how Lidify analyzes tracks, collects audio metrics, and compares them for vibe matching. Use this as a reference for building frontend interfaces.


Table of Contents

  1. Overview
  2. Metrics Collected
  3. Data Structures
  4. Vibe Matching Algorithm
  5. API Endpoints
  6. Frontend Integration Guide
  7. Existing Components Reference

Overview

The Vibe System uses a combination of audio signal analysis and ML-based mood prediction to understand the "feel" of a track. It operates in two modes:

| Mode     | Description                                                             | Accuracy |
| -------- | ----------------------------------------------------------------------- | -------- |
| Standard | Heuristic-based analysis using audio signal features (BPM, key, energy) | Good     |
| Enhanced | ML-based analysis using the MusiCNN neural network for mood prediction  | Best     |

The system enables:

  • Finding tracks with similar vibes to a source track
  • Generating mood-based playlists
  • Visualizing track characteristics in real-time

Metrics Collected

Core Audio Features (Always Available)

These are extracted directly from audio signal analysis at 44.1kHz:

| Metric       | Type   | Range              | Description                             |
| ------------ | ------ | ------------------ | --------------------------------------- |
| bpm          | Float  | 60-200             | Tempo in beats per minute               |
| beatsCount   | Int    | 0+                 | Total number of beats detected          |
| key          | String | "C", "F#", etc.    | Musical key                             |
| keyScale     | String | "major" \| "minor" | Major or minor tonality                 |
| keyStrength  | Float  | 0-1                | Confidence of key detection             |
| energy       | Float  | 0-1                | RMS-based intensity level               |
| loudness     | Float  | dB                 | Average loudness                        |
| dynamicRange | Float  | dB                 | Difference between quietest and loudest |
| danceability | Float  | 0-1                | Rhythm regularity and groove potential  |

ML Mood Predictions (Enhanced Mode)

Seven core mood dimensions predicted by the MusiCNN model:

| Metric         | Type  | Range | Description                        | Icon Suggestion |
| -------------- | ----- | ----- | ---------------------------------- | --------------- |
| moodHappy      | Float | 0-1   | Happiness/cheerfulness probability | Smile           |
| moodSad        | Float | 0-1   | Sadness/melancholy probability     | Frown           |
| moodRelaxed    | Float | 0-1   | Calm/peaceful probability          | Coffee          |
| moodAggressive | Float | 0-1   | Intensity/aggression probability   | Flame           |
| moodParty      | Float | 0-1   | Upbeat/party probability           | PartyPopper     |
| moodAcoustic   | Float | 0-1   | Acoustic instrumentation probability | Guitar        |
| moodElectronic | Float | 0-1   | Electronic/synthetic probability   | Radio           |

Derived Features (Computed)

These are calculated from the ML predictions:

Valence (Emotional Positivity)

```typescript
// Formula:
valence = (
    moodHappy * 0.5 +           // Happy mood (50% weight)
    moodParty * 0.3 +           // Party mood (30% weight)
    (1 - moodSad) * 0.2         // Inverse of sadness (20% weight)
)
```

| Value     | Interpretation    |
| --------- | ----------------- |
| 0.0 - 0.3 | Melancholic, sad  |
| 0.3 - 0.6 | Neutral, balanced |
| 0.6 - 1.0 | Happy, positive   |

Arousal (Energy/Excitement Level)

```typescript
// Formula:
arousal = (
    moodAggressive * 0.35 +     // Aggressive mood (35% weight)
    moodParty * 0.25 +          // Party mood (25% weight)
    moodElectronic * 0.2 +      // Electronic sound (20% weight)
    (1 - moodRelaxed) * 0.1 +   // Inverse of relaxation (10% weight)
    (1 - moodAcoustic) * 0.1    // Inverse of acoustic (10% weight)
)
```

| Value     | Interpretation       |
| --------- | -------------------- |
| 0.0 - 0.3 | Calm, peaceful       |
| 0.3 - 0.6 | Moderate energy      |
| 0.6 - 1.0 | High energy, intense |
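The two weighted formulas above can be exercised directly. A small sketch (the mood values here are illustrative, not taken from a real track):

```typescript
// Derive valence and arousal from the seven ML mood predictions,
// using the exact weights documented above.
interface MoodPredictions {
    moodHappy: number;
    moodSad: number;
    moodRelaxed: number;
    moodAggressive: number;
    moodParty: number;
    moodAcoustic: number;
    moodElectronic: number;
}

const computeValence = (m: MoodPredictions): number =>
    m.moodHappy * 0.5 + m.moodParty * 0.3 + (1 - m.moodSad) * 0.2;

const computeArousal = (m: MoodPredictions): number =>
    m.moodAggressive * 0.35 +
    m.moodParty * 0.25 +
    m.moodElectronic * 0.2 +
    (1 - m.moodRelaxed) * 0.1 +
    (1 - m.moodAcoustic) * 0.1;

// Illustrative predictions for an upbeat electronic track
const moods: MoodPredictions = {
    moodHappy: 0.72,
    moodSad: 0.15,
    moodRelaxed: 0.28,
    moodAggressive: 0.45,
    moodParty: 0.68,
    moodAcoustic: 0.12,
    moodElectronic: 0.78,
};

computeValence(moods); // 0.734, the "Happy, positive" band
computeArousal(moods); // 0.6435, the "High energy, intense" band
```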

Additional Features

| Metric           | Type  | Range | Description                              |
| ---------------- | ----- | ----- | ---------------------------------------- |
| instrumentalness | Float | 0-1   | Voice presence (0=vocal, 1=instrumental) |
| acousticness     | Float | 0-1   | Acoustic vs. processed sound             |
| speechiness      | Float | 0-1   | Spoken word detection                    |
| danceabilityMl   | Float | 0-1   | ML-based danceability (more accurate)    |

Metadata & Tags

| Field          | Type     | Description                                            |
| -------------- | -------- | ------------------------------------------------------ |
| moodTags       | String[] | Derived mood labels (e.g., ["chill", "happy"])         |
| essentiaGenres | String[] | ML-predicted genres (e.g., ["rock", "electronic"])     |
| lastfmTags     | String[] | User-generated tags from Last.fm                       |
| analysisStatus | String   | "pending" \| "processing" \| "completed" \| "failed"   |
| analysisMode   | String   | "standard" \| "enhanced"                               |
| analyzedAt     | DateTime | When analysis was performed                            |

Data Structures

TypeScript Interface

```typescript
interface AudioFeatures {
    // Core audio features
    bpm?: number | null;
    beatsCount?: number | null;
    key?: string | null;
    keyScale?: string | null;
    keyStrength?: number | null;
    energy?: number | null;
    loudness?: number | null;
    dynamicRange?: number | null;
    danceability?: number | null;

    // Derived features
    valence?: number | null;
    arousal?: number | null;

    // Additional features
    instrumentalness?: number | null;
    acousticness?: number | null;
    speechiness?: number | null;
    danceabilityMl?: number | null;

    // ML Mood predictions (Enhanced mode)
    moodHappy?: number | null;
    moodSad?: number | null;
    moodRelaxed?: number | null;
    moodAggressive?: number | null;
    moodParty?: number | null;
    moodAcoustic?: number | null;
    moodElectronic?: number | null;

    // Metadata
    analysisStatus?: string | null;
    analysisMode?: string | null;
    analyzedAt?: string | null;

    // Tags
    moodTags?: string[];
    essentiaGenres?: string[];
    lastfmTags?: string[];
}
```
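When rendering, the frontend needs to decide whether the ML fields are trustworthy before showing them. A suggested type guard (hasEnhancedFeatures is a hypothetical helper, not an existing export):

```typescript
// Trimmed to the fields the guard inspects; in practice this is
// the full AudioFeatures interface above.
interface AudioFeatures {
    analysisStatus?: string | null;
    analysisMode?: string | null;
    moodHappy?: number | null;
}

// True only when Enhanced analysis finished and the ML mood fields
// can be shown without falling back to the 0.5 neutral value.
const hasEnhancedFeatures = (f: AudioFeatures): boolean =>
    f.analysisStatus === "completed" &&
    f.analysisMode === "enhanced" &&
    typeof f.moodHappy === "number";
```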

Feature Display Configuration

Recommended configuration for displaying features in UI:

```typescript
const FEATURE_CONFIG = [
    {
        key: "energy",
        label: "Energy",
        icon: "Zap",           // lucide-react icon
        min: 0,
        max: 1,
        lowLabel: "Calm",
        highLabel: "Intense",
    },
    {
        key: "valence",
        label: "Mood",
        icon: "Heart",
        min: 0,
        max: 1,
        lowLabel: "Melancholic",
        highLabel: "Happy",
    },
    {
        key: "danceability",
        label: "Groove",
        icon: "Footprints",
        min: 0,
        max: 1,
        lowLabel: "Freeform",
        highLabel: "Danceable",
    },
    {
        key: "bpm",
        label: "Tempo",
        icon: "Gauge",
        min: 60,
        max: 180,
        lowLabel: "Slow",
        highLabel: "Fast",
        unit: "BPM",
    },
    {
        key: "arousal",
        label: "Arousal",
        icon: "AudioWaveform",
        min: 0,
        max: 1,
        lowLabel: "Peaceful",
        highLabel: "Energetic",
    },
];

const ML_MOOD_CONFIG = [
    { key: "moodHappy", label: "Happy", icon: "Smile", color: "yellow-400" },
    { key: "moodSad", label: "Sad", icon: "Frown", color: "blue-400" },
    { key: "moodRelaxed", label: "Relaxed", icon: "Coffee", color: "green-400" },
    { key: "moodAggressive", label: "Aggressive", icon: "Flame", color: "red-400" },
    { key: "moodParty", label: "Party", icon: "PartyPopper", color: "pink-400" },
    { key: "moodAcoustic", label: "Acoustic", icon: "Guitar", color: "amber-400" },
    { key: "moodElectronic", label: "Electronic", icon: "Radio", color: "purple-400" },
];
```
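A small formatter can turn raw feature values into display strings driven by these config entries. formatFeatureValue is a suggested helper, and the percent-vs-unit rule is an assumption about how the UI presents 0-1 features:

```typescript
// Renders a feature value for the UI: unit-bearing features (BPM)
// as rounded numbers with their unit, 0-1 features as percentages.
const formatFeatureValue = (
    value: number | null | undefined,
    unit?: string
): string => {
    if (value === null || value === undefined) return "N/A";
    return unit
        ? `${Math.round(value)} ${unit}`
        : `${Math.round(value * 100)}%`;
};

formatFeatureValue(128.5, "BPM"); // "129 BPM" (Math.round rounds .5 up)
formatFeatureValue(0.78);         // "78%"
formatFeatureValue(null);         // "N/A"
```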

Vibe Matching Algorithm

Feature Vector Construction

The system builds a 13-dimensional feature vector for each track:

```typescript
const buildFeatureVector = (track: AudioFeatures) => [
    // ML Mood predictions (7 features) - 1.3x weight for semantic importance
    getMoodValue(track.moodHappy, 0.5) * 1.3,
    getMoodValue(track.moodSad, 0.5) * 1.3,
    getMoodValue(track.moodRelaxed, 0.5) * 1.3,
    getMoodValue(track.moodAggressive, 0.5) * 1.3,
    getMoodValue(track.moodParty, 0.5) * 1.3,
    getMoodValue(track.moodAcoustic, 0.5) * 1.3,
    getMoodValue(track.moodElectronic, 0.5) * 1.3,

    // Audio features (4 features)
    track.energy ?? 0.5,
    calculateEnhancedArousal(track),
    track.danceabilityMl ?? track.danceability ?? 0.5,
    track.instrumentalness ?? 0.5,

    // BPM (octave-aware normalization)
    1 - octaveAwareBPMDistance(track.bpm ?? 120, 120),

    // Valence
    calculateEnhancedValence(track),
];

// Helper: Get mood value with fallback
const getMoodValue = (value: number | null | undefined, fallback: number) =>
    value ?? fallback;
```

Cosine Similarity Calculation

Tracks are compared using cosine similarity:

```typescript
const cosineSimilarity = (vectorA: number[], vectorB: number[]): number => {
    let dotProduct = 0;
    let magA = 0;
    let magB = 0;

    for (let i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        magA += vectorA[i] * vectorA[i];
        magB += vectorB[i] * vectorB[i];
    }

    // Guard against zero-magnitude vectors to avoid division by zero
    const denominator = Math.sqrt(magA) * Math.sqrt(magB);
    return denominator === 0 ? 0 : dotProduct / denominator;
};
```
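A quick sanity check of the comparison (the function is repeated here, including a divide-by-zero guard, so the snippet runs standalone): identical vectors score 1, orthogonal vectors score 0.

```typescript
const cosineSimilarity = (a: number[], b: number[]): number => {
    let dot = 0, magA = 0, magB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    const denom = Math.sqrt(magA) * Math.sqrt(magB);
    return denom === 0 ? 0 : dot / denom;
};

cosineSimilarity([0.9, 0.1, 0.5], [0.9, 0.1, 0.5]); // 1 (identical vibe)
cosineSimilarity([1, 0, 0], [0, 1, 0]);             // 0 (nothing in common)
```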

Tag/Genre Bonus

Additional boost for shared tags:

```typescript
const computeTagBonus = (
    sourceTags: string[],
    sourceGenres: string[],
    trackTags: string[],
    trackGenres: string[]
): number => {
    const sourceSet = new Set(
        [...sourceTags, ...sourceGenres].map(t => t.toLowerCase())
    );
    const trackSet = new Set(
        [...trackTags, ...trackGenres].map(t => t.toLowerCase())
    );

    const overlap = [...sourceSet].filter(tag => trackSet.has(tag)).length;
    return Math.min(0.05, overlap * 0.01);  // Max 5% bonus
};
```
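Two examples of the cap in action (definition repeated so the snippet runs standalone): tag matching is case-insensitive, and the bonus saturates at 0.05 no matter how many tags overlap.

```typescript
const computeTagBonus = (
    sourceTags: string[],
    sourceGenres: string[],
    trackTags: string[],
    trackGenres: string[]
): number => {
    const sourceSet = new Set(
        [...sourceTags, ...sourceGenres].map(t => t.toLowerCase())
    );
    const trackSet = new Set(
        [...trackTags, ...trackGenres].map(t => t.toLowerCase())
    );
    const overlap = [...sourceSet].filter(tag => trackSet.has(tag)).length;
    return Math.min(0.05, overlap * 0.01); // Max 5% bonus
};

computeTagBonus(["Rock", "indie"], [], ["rock", "INDIE"], []); // 0.02 (2 shared)
computeTagBonus(
    ["a", "b", "c", "d", "e", "f", "g"], [],
    ["a", "b", "c", "d", "e", "f", "g"], []
); // 0.05 (capped, not 0.07)
```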

Final Score

```typescript
const finalScore = cosineSimilarity(sourceVector, targetVector) * 0.95 + tagBonus;
```

Matching Thresholds

| Mode     | Minimum Similarity |
| -------- | ------------------ |
| Enhanced | 40%                |
| Standard | 50%                |

The threshold is lower in Enhanced mode because ML predictions provide more nuanced differentiation, so moderately similar scores are still meaningful matches.
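These cut-offs translate directly into a filter step. A suggested helper (minimumSimilarity and passesThreshold are illustrative names, not existing exports):

```typescript
type AnalysisMode = "standard" | "enhanced";

// Minimum similarity a candidate must reach to count as a vibe match.
const minimumSimilarity = (mode: AnalysisMode): number =>
    mode === "enhanced" ? 0.4 : 0.5;

const passesThreshold = (similarity: number, mode: AnalysisMode): boolean =>
    similarity >= minimumSimilarity(mode);

passesThreshold(0.45, "enhanced"); // true
passesThreshold(0.45, "standard"); // false
```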

Octave-Aware BPM Matching

Treats harmonically related tempos as similar (60 BPM ≈ 120 BPM ≈ 240 BPM):

```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
    const normalizeToOctave = (bpm: number): number => {
        while (bpm < 77) bpm *= 2;
        while (bpm > 154) bpm /= 2;
        return bpm;
    };

    const norm1 = normalizeToOctave(bpm1);
    const norm2 = normalizeToOctave(bpm2);

    const logDistance = Math.abs(Math.log2(norm1) - Math.log2(norm2));
    return Math.min(logDistance, 1);
};
```
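Worked examples (definition repeated so the snippet runs standalone): half-time and double-time tempos normalize into the same 77-154 BPM band, so their distance collapses to zero, while genuinely different tempos keep a nonzero log-scale distance.

```typescript
const octaveAwareBPMDistance = (bpm1: number, bpm2: number): number => {
    const normalizeToOctave = (bpm: number): number => {
        while (bpm < 77) bpm *= 2;   // half-time doubles into the band
        while (bpm > 154) bpm /= 2;  // double-time halves into the band
        return bpm;
    };
    const logDistance = Math.abs(
        Math.log2(normalizeToOctave(bpm1)) - Math.log2(normalizeToOctave(bpm2))
    );
    return Math.min(logDistance, 1);
};

octaveAwareBPMDistance(60, 120);  // 0: 60 BPM normalizes to 120
octaveAwareBPMDistance(240, 120); // 0: 240 BPM normalizes to 120
octaveAwareBPMDistance(90, 120);  // log2(4/3), roughly 0.415
```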

API Endpoints

Get Track Audio Features

GET /api/tracks/:id/features

Response:

```json
{
    "bpm": 128.5,
    "energy": 0.78,
    "valence": 0.65,
    "arousal": 0.72,
    "danceability": 0.85,
    "key": "C",
    "keyScale": "major",
    "moodHappy": 0.72,
    "moodSad": 0.15,
    "moodRelaxed": 0.28,
    "moodAggressive": 0.45,
    "moodParty": 0.68,
    "moodAcoustic": 0.12,
    "moodElectronic": 0.78,
    "analysisMode": "enhanced",
    "analysisStatus": "completed"
}
```

Find Similar Tracks (Vibe Match)

GET /api/library/vibe-match?trackId=:id&limit=20

Response:

```jsonc
{
    "source": { /* track with features */ },
    "matches": [
        {
            "track": { /* track data */ },
            "similarity": 0.87,
            "features": { /* audio features */ }
        }
    ]
}
```

Generate Mood Mix

POST /api/mixes/mood

Request:

```json
{
    "valence": { "min": 0.6, "max": 1.0 },
    "energy": { "min": 0.5, "max": 0.8 },
    "danceability": { "min": 0.7, "max": 1.0 },
    "bpm": { "min": 100, "max": 140 },
    "limit": 15
}
```
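A sketch of assembling this request from the frontend. The Range type, the buildMoodMixRequest helper, and the 50-track clamp are assumptions for illustration; only the endpoint path and body shape come from the API above.

```typescript
interface Range { min: number; max: number; }

interface MoodMixRequest {
    valence?: Range;
    energy?: Range;
    danceability?: Range;
    bpm?: Range;
    limit: number;
}

// Assembles the request body shown above, clamping limit to 1-50
// (the clamp range is an assumption, not a documented server rule).
const buildMoodMixRequest = (
    params: Omit<MoodMixRequest, "limit">,
    limit = 15
): MoodMixRequest => ({
    ...params,
    limit: Math.max(1, Math.min(50, limit)),
});

// Usage sketch (error handling omitted):
// await fetch("/api/mixes/mood", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(
//         buildMoodMixRequest({ valence: { min: 0.6, max: 1.0 } })
//     ),
// });
```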

Get Mood Presets

GET /api/mixes/mood-presets

Response:

```json
[
    {
        "id": "chill",
        "name": "Chill Vibes",
        "color": "from-blue-600 to-purple-600",
        "params": {
            "valence": { "min": 0.3, "max": 0.7 },
            "energy": { "min": 0.1, "max": 0.4 }
        }
    }
]
```

Frontend Integration Guide

Displaying Feature Values

Normalize values for consistent display:

```typescript
function normalizeValue(
    value: number | null | undefined,
    min: number,
    max: number
): number {
    if (value === null || value === undefined) return 0;
    return Math.max(0, Math.min(1, (value - min) / (max - min)));
}

// Usage
const normalizedBpm = normalizeValue(track.bpm, 60, 180);
const normalizedEnergy = normalizeValue(track.energy, 0, 1);
```

Calculating Match Scores

```typescript
function calculateFeatureMatch(
    sourceVal: number | null,
    currentVal: number | null,
    min: number,
    max: number
): { diff: number; match: number } {
    const sourceNorm = normalizeValue(sourceVal, min, max);
    const currentNorm = normalizeValue(currentVal, min, max);
    const diff = Math.abs(sourceNorm - currentNorm);
    const match = Math.round((1 - diff) * 100);

    return { diff, match };
}
```
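Worked example (definitions repeated so the snippet runs standalone): comparing a 120 BPM source track to a 132 BPM candidate on the 60-180 display scale.

```typescript
function normalizeValue(
    value: number | null | undefined,
    min: number,
    max: number
): number {
    if (value === null || value === undefined) return 0;
    return Math.max(0, Math.min(1, (value - min) / (max - min)));
}

function calculateFeatureMatch(
    sourceVal: number | null,
    currentVal: number | null,
    min: number,
    max: number
): { diff: number; match: number } {
    const sourceNorm = normalizeValue(sourceVal, min, max);
    const currentNorm = normalizeValue(currentVal, min, max);
    const diff = Math.abs(sourceNorm - currentNorm);
    return { diff, match: Math.round((1 - diff) * 100) };
}

// 120 BPM normalizes to 0.5 and 132 BPM to 0.6 on the 60-180 scale,
// so diff is 0.1 and the match score is 90%.
calculateFeatureMatch(120, 132, 60, 180).match; // 90
```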

Match Score Color Coding

```typescript
function getMatchColor(matchPercent: number): string {
    if (matchPercent >= 80) return "text-green-400";  // Excellent
    if (matchPercent >= 60) return "text-yellow-400"; // Good
    return "text-red-400";                            // Different
}

function getMatchDescription(matchPercent: number): string {
    if (matchPercent >= 80) return "Excellent match - very similar vibe";
    if (matchPercent >= 60) return "Good match - similar energy";
    return "Different vibe - exploring variety";
}
```

Visualization Recommendations

1. Radar Chart (Spider Graph)

Best for comparing multiple features at once. Shows source track (dashed line) vs current track (solid fill).

2. Progress Bars

Best for individual feature comparison with source marker overlay.

3. Mood Grid

4x2 (or 2x4 on narrow layouts) grid of ML mood indicators with percentage matches.

4. Valence-Arousal Quadrant

2D scatter plot with:

  • X-axis: Valence (sad → happy)
  • Y-axis: Arousal (calm → energetic)

Quadrants:

  • Top-right: Happy + Energetic (Party)
  • Top-left: Sad + Energetic (Angry/Tense)
  • Bottom-right: Happy + Calm (Peaceful)
  • Bottom-left: Sad + Calm (Melancholic)

Existing Components Reference

VibeOverlay

Location: frontend/components/player/VibeOverlay.tsx

Full-featured overlay showing:

  • Overall match percentage
  • Feature-by-feature comparison bars
  • ML mood grid (enhanced mode)
  • Source vs current legend

VibeGraph

Location: frontend/components/player/VibeGraph.tsx

Compact radar chart for:

  • 4-feature comparison (Energy, Mood, Dance, BPM)
  • Match score badge
  • Inline display in player

MoodMixer

Location: frontend/components/MoodMixer.tsx

Modal for:

  • Quick mood presets
  • Custom range sliders
  • Generating mood-based playlists

Special Considerations

Out-of-Distribution (OOD) Detection

The MusiCNN model was trained on pop/rock music. For other genres (classical, ambient, jazz), predictions may be unreliable. The backend normalizes these cases:

Detection criteria:

  • All mood values > 0.7 with low variance
  • All mood values clustered around 0.5

UI Recommendation: Show a subtle indicator when analysisMode is "standard" or when predictions seem unreliable.
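The backend's exact check isn't reproduced here; a minimal sketch of the two criteria above, with illustrative thresholds (0.7 for "high", 0.05 for "low variance", a 0.1 band around 0.5), might look like:

```typescript
// Flags mood prediction sets that are likely out-of-distribution.
// Thresholds are illustrative, not the backend's actual values.
const looksOutOfDistribution = (moods: number[]): boolean => {
    const mean = moods.reduce((a, b) => a + b, 0) / moods.length;
    const variance =
        moods.reduce((a, b) => a + (b - mean) ** 2, 0) / moods.length;

    // Criterion 1: everything high with almost no spread
    const allHighLowVariance = moods.every(m => m > 0.7) && variance < 0.05;
    // Criterion 2: everything clustered around the 0.5 "don't know" point
    const allNearMidpoint = moods.every(m => Math.abs(m - 0.5) < 0.1);

    return allHighLowVariance || allNearMidpoint;
};
```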

Handling Missing Data

Always provide fallback values:

```typescript
const safeFeatures = {
    energy: track.energy ?? 0.5,
    valence: track.valence ?? 0.5,
    bpm: track.bpm ?? 120,
    // ... etc
};
```

Analysis Status States

| Status     | UI Treatment                     |
| ---------- | -------------------------------- |
| pending    | Show "Analyzing..." with spinner |
| processing | Show progress indicator          |
| completed  | Show full vibe data              |
| failed     | Show fallback/retry option       |

Quick Reference: Value Ranges

| Metric       | Min | Max | Neutral |
| ------------ | --- | --- | ------- |
| All mood*    | 0   | 1   | 0.5     |
| energy       | 0   | 1   | 0.5     |
| valence      | 0   | 1   | 0.5     |
| arousal      | 0   | 1   | 0.5     |
| danceability | 0   | 1   | 0.5     |
| bpm          | 60  | 200 | 120     |
| keyStrength  | 0   | 1   | n/a     |

File Locations

| Component               | Path                                        |
| ----------------------- | ------------------------------------------- |
| Audio Analyzer (Python) | services/audio-analyzer/analyzer.py         |
| Vibe Matching Logic     | backend/src/routes/library.ts               |
| Database Schema         | backend/prisma/schema.prisma                |
| Frontend Vibe Overlay   | frontend/components/player/VibeOverlay.tsx  |
| Frontend Vibe Graph     | frontend/components/player/VibeGraph.tsx    |
| Mood Mixer              | frontend/components/MoodMixer.tsx           |
| Audio State Context     | frontend/lib/audio-state-context.tsx        |

Research Background

The Vibe System's valence and arousal calculations are informed by music psychology research:

Valence (Emotional Positivity)

Key Finding: Mode/tonality is the strongest predictor of perceived valence in music.

  • Lee et al. (ICASSP 2020) - Demonstrated that musical mode (major vs. minor) has the highest correlation with listener-reported valence
  • Major keys contribute positively (+0.3 in our formula), minor keys negatively (-0.2)
  • This aligns with centuries of music theory and empirical psychology research

Arousal (Energy/Excitement)

Key Finding: The "electronic" mood prediction from ML models is unreliable for arousal calculation.

  • Grekow (2018) - Found that direct energy and tempo features outperform genre-based predictions for arousal
  • Our implementation replaces the "electronic" mood with explicit energy and BPM contributions
  • This provides more consistent arousal predictions across diverse genres

Feature Weights

The specific weights in our formulas (e.g., 0.35 for happy mood, 0.25 for energy) were tuned through:

  1. Initial values from published research
  2. Empirical testing on a diverse music library
  3. User feedback on vibe matching accuracy

References

  • Lee, J., et al. (2020). "Music Emotion Recognition Using Valence-Arousal Regression." ICASSP 2020.
  • Grekow, J. (2018). "Music Emotion Maps in Arousal-Valence Space." IFIP International Conference on Computer Information Systems and Industrial Management.