GSM Spy Zombie Process Fix

Issue Description

When starting GSM Spy, grgsm_scanner and grgsm_livemon processes were becoming zombies (defunct processes):

root  12488  5.1  0.0      0     0 pts/2    Z+   14:29   0:01 [grgsm_scanner] <defunct>

Root Cause

Zombie processes occur when a child process terminates but the parent process doesn't call wait() or waitpid() to collect the exit status. The process remains in the process table as a zombie until the parent reaps it.
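
The lifecycle described above can be observed directly. The following is a minimal sketch, assuming a Linux system where `/proc/<pid>/stat` is readable; `proc_state` and `zombie_demo` are illustrative names, not part of GSM Spy. It spawns a short-lived child, deliberately does not reap it, and reads the state before and after `wait()`.

```python
import subprocess
import sys
import time
from pathlib import Path


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
    except OSError:
        return None  # process entry is gone, or /proc does not exist (non-Linux)
    # Format is "pid (comm) state ..."; comm may contain ')' so split on the last one.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Spawn a short-lived child and observe it before and after reaping."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Deliberately avoid child.poll() and child.wait() here: both reap the child.
    deadline = time.monotonic() + 5
    while proc_state(child.pid) not in ("Z", None) and time.monotonic() < deadline:
        time.sleep(0.05)
    before = proc_state(child.pid)  # "Z" while the exit status is uncollected

    returncode = child.wait()       # collects the exit status
    after = proc_state(child.pid)   # None once the process-table entry is removed
    return before, after, returncode
```

On Linux, `zombie_demo()` is expected to report the state `Z` before the `wait()` call and no entry afterwards, which is the same transition the fixes below rely on.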

In the GSM Spy implementation, there were three issues:

Issue 1: scanner_thread not reaping grgsm_scanner process

  • The scanner_thread function reads from grgsm_scanner stdout
  • When the process terminates (either normally or due to error), the thread exits
  • But it never calls process.wait() to reap the child process
  • Result: zombie grgsm_scanner process

Issue 2: monitor_thread not reaping tshark process

  • The monitor_thread function reads from tshark stdout
  • Same problem as Issue 1
  • Result: zombie tshark process

Issue 3: grgsm_livemon process not tracked at all

  • When starting monitoring, two processes are created:
    1. grgsm_livemon - captures GSM traffic and feeds it to tshark
    2. tshark - filters and parses GSM data
  • Only tshark was being tracked in gsm_spy_monitor_process
  • grgsm_livemon was started but never stored or cleaned up
  • Result: zombie grgsm_livemon process

Solution

Fix 1: Reap processes in scanner_thread

File: /opt/intercept/routes/gsm_spy.py
Function: scanner_thread() (line ~1026)

Changes:

finally:
    # Reap the process to prevent zombie
    try:
        if process.poll() is None:
            # Process still running, terminate it
            process.terminate()
            process.wait(timeout=5)
        else:
            # Process already terminated, just collect exit status
            process.wait()
        logger.info(f"Scanner process terminated with exit code {process.returncode}")
    except Exception as e:
        logger.error(f"Error cleaning up scanner process: {e}")
        try:
            process.kill()
            process.wait()
        except Exception:
            pass
    logger.info("Scanner thread terminated")

How it works:

  1. Check if process is still running with poll()
  2. If running, terminate gracefully with terminate() then wait()
  3. If already terminated, just call wait() to collect exit status
  4. If anything fails, try kill() then wait()
  5. This ensures the child process is always reaped
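
The steps above can be factored into a reusable helper. This is a sketch rather than the code in gsm_spy.py: `reap_process` and its `name` and `timeout` parameters are hypothetical, and it catches `subprocess.TimeoutExpired` specifically instead of a broad `Exception`.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def reap_process(process, name="child", timeout=5):
    """Stop `process` if it is still running and always collect its exit status."""
    try:
        if process.poll() is None:
            # Still running: request a graceful exit, then wait up to `timeout` seconds.
            process.terminate()
            process.wait(timeout=timeout)
        else:
            # Already exited: poll() has cached the status, so wait() returns at once.
            process.wait()
    except subprocess.TimeoutExpired:
        # The child did not exit after terminate(): force-kill it, then reap it.
        logger.warning("%s did not exit within %ss, killing it", name, timeout)
        process.kill()
        process.wait()
    logger.info("%s exited with code %s", name, process.returncode)
    return process.returncode
```

Calling `reap_process(process, "grgsm_scanner")` from a `finally` block would give the same guarantee as the inline version: whichever path is taken, `wait()` runs before the function returns.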

Fix 2: Reap processes in monitor_thread

File: /opt/intercept/routes/gsm_spy.py
Function: monitor_thread() (line ~1089)

Changes: Same cleanup logic as Fix 1, applied to the monitor thread.
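
For reference, the overall shape of a reader thread that reaps its child might look like the sketch below. It is not the actual monitor_thread(): `reader_thread`, `run_and_collect`, and the `grace` parameter are illustrative, and as a deliberate variation it waits briefly before calling terminate() so that a child already exiting after EOF is not signalled unnecessarily.

```python
import subprocess
import sys
import threading


def reader_thread(process, on_line, grace=2):
    """Forward each stdout line to `on_line`, then reap the child."""
    try:
        for line in process.stdout:
            on_line(line.rstrip("\n"))
    finally:
        # Runs on EOF and when on_line() raises.
        try:
            # After EOF the child is usually exiting already; give it a moment.
            process.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            process.terminate()
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()
                process.wait()
        process.stdout.close()


def run_and_collect(argv):
    """Start `argv`, read it on a background thread, return (lines, exit code)."""
    lines = []
    process = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    worker = threading.Thread(target=reader_thread, args=(process, lines.append))
    worker.start()
    worker.join()
    return lines, process.returncode
```

Because the cleanup lives in `finally`, the child is reaped whether the loop ends normally or the callback fails.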

Fix 3: Track and clean up grgsm_livemon process

3a. Add global variable for grgsm_livemon

File: /opt/intercept/app.py (line ~185)

Changes:

# GSM Spy
gsm_spy_process = None
gsm_spy_livemon_process = None  # For grgsm_livemon process
gsm_spy_monitor_process = None  # For tshark monitoring process

3b. Update global declarations

File: /opt/intercept/app.py (line ~677)

Changes:

global gsm_spy_process, gsm_spy_livemon_process, gsm_spy_monitor_process

3c. Clean up grgsm_livemon in reset function

File: /opt/intercept/app.py (line ~755)

Changes:

if gsm_spy_livemon_process:
    try:
        safe_terminate(gsm_spy_livemon_process, 'grgsm_livemon')
        killed.append('grgsm_livemon')
    except Exception:
        pass
gsm_spy_livemon_process = None

3d. Store grgsm_livemon process when starting

File: /opt/intercept/routes/gsm_spy.py

Changes in /monitor endpoint (line ~212):

app_module.gsm_spy_livemon_process = grgsm_proc
app_module.gsm_spy_monitor_process = tshark_proc

Changes in auto_start_monitor() function (line ~997):

app_module.gsm_spy_livemon_process = grgsm_proc
app_module.gsm_spy_monitor_process = tshark_proc

3e. Stop grgsm_livemon when stopping scanner

File: /opt/intercept/routes/gsm_spy.py (line ~254)

Changes:

if app_module.gsm_spy_livemon_process:
    try:
        app_module.gsm_spy_livemon_process.terminate()
        app_module.gsm_spy_livemon_process.wait(timeout=5)
        killed.append('livemon')
    except Exception:
        try:
            app_module.gsm_spy_livemon_process.kill()
            # Reap after the forced kill too, otherwise the killed child stays a zombie
            app_module.gsm_spy_livemon_process.wait()
        except Exception:
            pass
    app_module.gsm_spy_livemon_process = None
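
Fix 3 amounts to keeping a handle for every spawned child and stopping all of them together. A minimal sketch of that idea follows; `ProcessTracker` is a hypothetical helper, not something that exists in INTERCEPT, where the handles are module-level globals in app.py.

```python
import subprocess
import sys


class ProcessTracker:
    """Hypothetical helper: records every spawned child so none is forgotten."""

    def __init__(self):
        self._procs = {}

    def spawn(self, name, argv, **popen_kwargs):
        """Start a child and remember its handle under `name`."""
        proc = subprocess.Popen(argv, **popen_kwargs)
        self._procs[name] = proc
        return proc

    def stop_all(self, timeout=5):
        """Terminate and reap every tracked child; return the names stopped."""
        stopped = []
        for name, proc in list(self._procs.items()):
            if proc.poll() is None:
                proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()  # reap after the forced kill as well
            stopped.append(name)
            del self._procs[name]
        return stopped
```

With this shape, a second helper process such as grgsm_livemon cannot be started without also being registered for cleanup, which is the failure mode Issue 3 describes.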

Files Modified

  1. /opt/intercept/routes/gsm_spy.py

    • scanner_thread() - Added process reaping in finally block
    • monitor_thread() - Added process reaping in finally block
    • /monitor endpoint - Store grgsm_livemon process
    • auto_start_monitor() - Store grgsm_livemon process
    • /stop endpoint - Clean up grgsm_livemon process
  2. /opt/intercept/app.py

    • Added gsm_spy_livemon_process global variable
    • Updated global declarations in reset_decoder_processes()
    • Added cleanup for gsm_spy_livemon_process

Testing

Before Fix

# Start GSM Spy
# Check processes
ps aux | grep '[g]rgsm'  # [g] keeps grep from listing itself

# You would see:
root  12488  0.0  0.0      0     0 pts/2    Z+   14:29   0:00 [grgsm_scanner] <defunct>
root  12489  0.0  0.0      0     0 pts/2    Z+   14:29   0:00 [grgsm_livemon] <defunct>

After Fix

# Start GSM Spy
# Check processes
ps aux | grep '[g]rgsm'  # [g] keeps grep from listing itself

# Active processes (no zombies):
root  12488  1.2  0.5  12345  5678 pts/2    S+   14:29   0:01 grgsm_scanner -d 0 --freq-range...
root  12489  0.8  0.4  10234  4567 pts/2    S+   14:29   0:01 grgsm_livemon -a 123 -d 0

# Stop GSM Spy
# Check processes
ps aux | grep '[g]rgsm'  # [g] keeps grep from listing itself

# No processes (all cleaned up properly)

Verification Commands

  1. Check for zombie processes:
ps aux | grep '[d]efunct'
# Should return nothing after fix (the [d] keeps grep from matching its own command line)
  2. Monitor process lifecycle:
# In one terminal, watch processes
watch -n 1 "ps aux | grep '[g]rgsm'"

# In another terminal, start/stop GSM Spy
# Verify:
# - Processes start properly (S or R state, not Z)
# - Processes disappear when stopped (not left as zombies)
  3. Check process tree:
pstree -p | grep grgsm
# Should show proper parent-child relationships
# No defunct/zombie entries
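
The same check can be scripted, for example in an automated test. The sketch below assumes Linux and scans `/proc` directly; `find_zombies` is an illustrative name and is not part of INTERCEPT. On systems without `/proc` it returns an empty list.

```python
import os


def find_zombies(parent_pid=None):
    """Return sorted [(pid, name)] for zombie processes, optionally under one parent."""
    try:
        entries = os.listdir("/proc")
    except OSError:
        return []  # no /proc on this platform
    zombies = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", errors="replace") as fh:
                stat = fh.read()
        except OSError:
            continue  # process vanished while scanning
        # "pid (comm) state ppid ..."; comm may contain ')' so split on the last one.
        head, _, tail = stat.rpartition(")")
        fields = tail.split()
        if len(fields) < 2 or "(" not in head:
            continue
        state, ppid = fields[0], int(fields[1])
        name = head.split("(", 1)[1]
        if state == "Z" and parent_pid in (None, ppid):
            zombies.append((int(entry), name))
    return sorted(zombies)
```

Calling `find_zombies(os.getpid())` from inside the application would list only its own unreaped children, which should be empty after the fix.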

Process Lifecycle

Normal Operation

  1. Scanner Start:

    • grgsm_scanner spawned → stored in gsm_spy_process
    • scanner_thread reads output
    • Process running normally
  2. Monitor Start (auto or manual):

    • grgsm_livemon spawned → stored in gsm_spy_livemon_process
    • tshark spawned → stored in gsm_spy_monitor_process
    • monitor_thread reads tshark output
    • Both processes running normally
  3. Stop:

    • All three processes terminated gracefully
    • wait() called on each to collect exit status
    • Process handles set to None
    • No zombies remain

Error Handling

  1. Process crashes during operation:

    • Thread's stdout loop exits
    • finally block executes
    • process.wait() collects exit status
    • No zombie created
  2. Process hangs:

    • terminate() called
    • wait(timeout=5) gives 5 seconds to exit
    • If the timeout expires, kill() is called
    • wait() collects exit status
  3. Exception during cleanup:

    • Fallback to kill() + wait()
    • Ensures zombie is always prevented

Best Practices Applied

  1. Always reap child processes: Call wait() or waitpid() after a child process terminates
  2. Use process.poll(): Check if process is still running before terminating
  3. Graceful shutdown: Try terminate() before kill()
  4. Timeout handling: Use wait(timeout=N) to prevent hanging
  5. Error recovery: Multiple fallback levels in try/except blocks
  6. Track all processes: Store handles for all spawned processes, not just the primary one
  7. Cleanup in finally: Ensures cleanup happens even if exceptions occur

This fix prevents:

  • Zombie processes accumulating over time
  • Process table filling up
  • System resource leaks
  • Confusing process listings for users

Implementation Date

2026-02-06

Additional Notes

  • The fix follows the same patterns used in other INTERCEPT decoders
  • Compatible with existing SDR device selection implementation
  • No breaking changes to API or user interface
  • Applies to both manual monitoring and auto-monitoring