GSM Spy Zombie Process Fix
Issue Description
When starting GSM Spy, grgsm_scanner and grgsm_livemon processes were becoming zombies (defunct processes):
root 12488 5.1 0.0 0 0 pts/2 Z+ 14:29 0:01 [grgsm_scanner] <defunct>
Root Cause
Zombie processes occur when a child process terminates but the parent process doesn't call wait() or waitpid() to collect the exit status. The process remains in the process table as a zombie until the parent reaps it.
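The behaviour described above can be observed directly. The following is a minimal sketch, assuming a Linux system with /proc mounted; proc_state and demo_zombie are hypothetical names used only for illustration and are not part of GSM Spy. It spawns a short-lived child, looks at its state before the parent calls wait(), and looks again afterwards. On Linux the first state is expected to be Z (zombie), and the /proc entry is expected to be gone after wait().

```python
import subprocess
import sys
import time
from pathlib import Path


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat, or None if there is no such entry."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text(errors="replace")
    except OSError:
        return None
    # Fields are "pid (comm) state ..."; split after the last ')' because
    # the command name may itself contain spaces or parentheses.
    fields = stat.rpartition(")")[2].split()
    return fields[0] if fields else None


def demo_zombie():
    """Spawn a child that exits at once; report its state before and after wait()."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + 5
    state = proc_state(child.pid)
    # Deliberately avoid child.poll() here: poll() would itself reap the child.
    while state not in ("Z", None) and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(child.pid)
    before = state
    child.wait()  # the parent collects the exit status, removing the zombie
    after = proc_state(child.pid)
    return before, after, child.returncode
```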
In the GSM Spy implementation, there were three issues:
Issue 1: scanner_thread not reaping grgsm_scanner process
- The scanner_thread function reads from grgsm_scanner stdout
- When the process terminates (either normally or due to error), the thread exits
- But it never calls process.wait() to reap the child process
- Result: zombie grgsm_scanner process
Issue 2: monitor_thread not reaping tshark process
- The monitor_thread function reads from tshark stdout
- Same problem as Issue 1
- Result: zombie tshark process
Issue 3: grgsm_livemon process not tracked at all
- When starting monitoring, two processes are created:
  - grgsm_livemon - captures GSM traffic and feeds it to tshark
  - tshark - filters and parses GSM data
- Only tshark was being tracked in gsm_spy_monitor_process
- grgsm_livemon was started but never stored or cleaned up
- Result: zombie grgsm_livemon process
Solution
Fix 1: Reap processes in scanner_thread
File: /opt/intercept/routes/gsm_spy.py
Function: scanner_thread() (line ~1026)
Changes:
    finally:
        # Reap the process to prevent zombie
        try:
            if process.poll() is None:
                # Process still running, terminate it
                process.terminate()
                process.wait(timeout=5)
            else:
                # Process already terminated, just collect exit status
                process.wait()
            logger.info(f"Scanner process terminated with exit code {process.returncode}")
        except Exception as e:
            logger.error(f"Error cleaning up scanner process: {e}")
            try:
                process.kill()
                process.wait()
            except Exception:
                pass
        logger.info("Scanner thread terminated")
How it works:
- Check if process is still running with poll()
- If running, terminate gracefully with terminate() then wait()
- If already terminated, just call wait() to collect exit status
- If anything fails, try kill() then wait()
- This ensures the child process is always reaped
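Because Fix 1 and Fix 2 share the same cleanup logic, the steps above could be factored into one helper. The following is a sketch only: reap_process and its parameters are hypothetical names, not functions that exist in gsm_spy.py. Note that poll() already collects the exit status when the child has exited, so the wait() in the else branch returns immediately.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def reap_process(process, name, timeout=5):
    """Make sure `process` has exited and its exit status has been collected.

    Hypothetical helper mirroring the steps listed above.
    """
    try:
        if process.poll() is None:
            # Still running: ask it to exit, then wait for it to do so
            process.terminate()
            process.wait(timeout=timeout)
        else:
            # Already exited: just collect the exit status
            process.wait()
        logger.info("%s terminated with exit code %s", name, process.returncode)
    except Exception as exc:
        # Includes subprocess.TimeoutExpired from wait(timeout=...)
        logger.error("Error cleaning up %s: %s", name, exc)
        try:
            process.kill()
            process.wait()
        except Exception:
            pass
    return process.returncode


def run_and_reap(code):
    """Usage example: run a stand-in child and reap it in a finally block."""
    proc = subprocess.Popen([sys.executable, "-c", code])
    try:
        pass  # a real thread would read proc.stdout here
    finally:
        return reap_process(proc, "demo child")
```

A thread function would call the helper from its finally block, as run_and_reap does with a stand-in child process.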
Fix 2: Reap processes in monitor_thread
File: /opt/intercept/routes/gsm_spy.py
Function: monitor_thread() (line ~1089)
Changes: Same cleanup logic as Fix 1, applied to the monitor thread.
Fix 3: Track and clean up grgsm_livemon process
3a. Add global variable for grgsm_livemon
File: /opt/intercept/app.py (line ~185)
Changes:
# GSM Spy
gsm_spy_process = None
gsm_spy_livemon_process = None # For grgsm_livemon process
gsm_spy_monitor_process = None # For tshark monitoring process
3b. Update global declarations
File: /opt/intercept/app.py (line ~677)
Changes:
global gsm_spy_process, gsm_spy_livemon_process, gsm_spy_monitor_process
3c. Clean up grgsm_livemon in reset function
File: /opt/intercept/app.py (line ~755)
Changes:
    if gsm_spy_livemon_process:
        try:
            safe_terminate(gsm_spy_livemon_process, 'grgsm_livemon')
            killed.append('grgsm_livemon')
        except Exception:
            pass
    gsm_spy_livemon_process = None
3d. Store grgsm_livemon process when starting
File: /opt/intercept/routes/gsm_spy.py
Changes in /monitor endpoint (line ~212):
app_module.gsm_spy_livemon_process = grgsm_proc
app_module.gsm_spy_monitor_process = tshark_proc
Changes in auto_start_monitor() function (line ~997):
app_module.gsm_spy_livemon_process = grgsm_proc
app_module.gsm_spy_monitor_process = tshark_proc
3e. Stop grgsm_livemon when stopping scanner
File: /opt/intercept/routes/gsm_spy.py (line ~254)
Changes:
    if app_module.gsm_spy_livemon_process:
        try:
            app_module.gsm_spy_livemon_process.terminate()
            app_module.gsm_spy_livemon_process.wait(timeout=5)
            killed.append('livemon')
        except Exception:
            try:
                app_module.gsm_spy_livemon_process.kill()
            except Exception:
                pass
    app_module.gsm_spy_livemon_process = None
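The stop logic for several tracked handles can be sketched in a generic form. Everything here is illustrative: the registry dictionary stands in for the module-level handles in app.py, stop_tracked is a hypothetical function, and the children are sleeping Python processes rather than real grgsm_livemon or tshark instances. Unlike the snippet above, this sketch also calls wait() after kill(), so a child that had to be killed is still reaped.

```python
import subprocess
import sys


def stop_tracked(registry, timeout=5):
    """Terminate and reap every tracked process; return the names that were stopped.

    `registry` maps a name to a subprocess.Popen handle or None (hypothetical layout).
    """
    stopped = []
    for name, proc in list(registry.items()):
        if proc is None:
            continue
        try:
            proc.terminate()
            proc.wait(timeout=timeout)
        except Exception:
            # Escalate, then still wait() so the exit status is collected
            try:
                proc.kill()
                proc.wait()
            except Exception:
                pass
        stopped.append(name)
        registry[name] = None  # drop the handle once the child is gone
    return stopped


def start_demo_children():
    """Spawn two stand-in children and track both handles, not just one."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    return {
        "livemon": subprocess.Popen(sleeper),
        "monitor": subprocess.Popen(sleeper),
    }
```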
Files Modified
- /opt/intercept/routes/gsm_spy.py
  - scanner_thread() - Added process reaping in finally block
  - monitor_thread() - Added process reaping in finally block
  - /monitor endpoint - Store grgsm_livemon process
  - auto_start_monitor() - Store grgsm_livemon process
  - /stop endpoint - Clean up grgsm_livemon process
- /opt/intercept/app.py
  - Added gsm_spy_livemon_process global variable
  - Updated global declarations in reset_decoder_processes()
  - Added cleanup for gsm_spy_livemon_process
Testing
Before Fix
# Start GSM Spy
# Check processes
ps aux | grep grgsm
# You would see:
root 12488 0.0 0.0 0 0 pts/2 Z+ 14:29 0:00 [grgsm_scanner] <defunct>
root 12489 0.0 0.0 0 0 pts/2 Z+ 14:29 0:00 [grgsm_livemon] <defunct>
After Fix
# Start GSM Spy
# Check processes
ps aux | grep grgsm
# Active processes (no zombies):
root 12488 1.2 0.5 12345 5678 pts/2 S+ 14:29 0:01 grgsm_scanner -d 0 --freq-range...
root 12489 0.8 0.4 10234 4567 pts/2 S+ 14:29 0:01 grgsm_livemon -a 123 -d 0
# Stop GSM Spy
# Check processes
ps aux | grep grgsm
# No processes (all cleaned up properly)
Verification Commands
- Check for zombie processes:
# The [d] bracket keeps grep from matching its own command line
ps aux | grep '[d]efunct'
# Should return nothing after fix
- Monitor process lifecycle:
# In one terminal, watch processes
watch -n 1 'ps aux | grep grgsm'
# In another terminal, start/stop GSM Spy
# Verify:
# - Processes start properly (S or R state, not Z)
# - Processes disappear when stopped (not left as zombies)
- Check process tree:
pstree -p | grep grgsm
# Should show proper parent-child relationships
# No defunct/zombie entries
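The zombie check can also be scripted, for example in an automated test. The following is a rough sketch, assuming a Linux system with /proc mounted; find_zombies is a hypothetical helper, not part of INTERCEPT. It scans /proc for processes in state Z and optionally filters by command name.

```python
from pathlib import Path


def find_zombies(name_filter=""):
    """Return (pid, command name) pairs for processes in state Z (Linux /proc only)."""
    zombies = []
    for stat_path in Path("/proc").glob("[0-9]*/stat"):
        try:
            text = stat_path.read_text(errors="replace")
        except OSError:
            continue  # the process vanished while we were scanning
        # Format is "pid (comm) state ..."; comm may contain spaces or ')'
        head, _, tail = text.rpartition(")")
        pid_str, _, comm = head.partition(" (")
        fields = tail.split()
        if not fields or not pid_str.isdigit():
            continue
        if fields[0] == "Z" and name_filter in comm:
            zombies.append((int(pid_str), comm))
    return zombies
```

Calling find_zombies("grgsm") after stopping GSM Spy should return an empty list if the fix is working.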
Process Lifecycle
Normal Operation
- Scanner Start:
  - grgsm_scanner spawned → stored in gsm_spy_process
  - scanner_thread reads output
  - Process running normally
- Monitor Start (auto or manual):
  - grgsm_livemon spawned → stored in gsm_spy_livemon_process
  - tshark spawned → stored in gsm_spy_monitor_process
  - monitor_thread reads tshark output
  - Both processes running normally
- Stop:
  - All three processes terminated gracefully
  - wait() called on each to collect exit status
  - Process handles set to None
  - No zombies remain
Error Handling
- Process crashes during operation:
  - Thread's stdout loop exits
  - finally block executes
  - process.wait() collects exit status
  - No zombie created
- Process hangs:
  - terminate() called
  - wait(timeout=5) gives 5 seconds to exit
  - If timeout, kill() is called
  - wait() collects exit status
- Exception during cleanup:
  - Fallback to kill() + wait()
  - Ensures zombie is always prevented
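The "process hangs" path can be exercised with a child that ignores SIGTERM. This is a sketch under stated assumptions: it targets POSIX systems, STUBBORN_CHILD is a stand-in for a hung decoder, and stop_with_escalation and demo_hang are hypothetical names. The child prints a line once its signal handler is installed, so the parent does not send SIGTERM too early.

```python
import signal
import subprocess
import sys

# Stand-in for a hung decoder: ignores SIGTERM, announces readiness, then sleeps.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_escalation(process, timeout=1):
    """terminate(), wait up to `timeout` seconds, then kill() and wait() again."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()  # SIGKILL cannot be ignored by the child
        process.wait()  # still needed so the killed child is reaped
    return process.returncode


def demo_hang():
    """Start the stubborn child, stop it, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    child.stdout.readline()  # block until the child has installed its handler
    code = stop_with_escalation(child)
    child.stdout.close()
    return code
```

On POSIX, a child ended by a signal gets a negative return code, so demo_hang() is expected to return the negated SIGKILL number.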
Best Practices Applied
- Always reap child processes: Call wait() or waitpid() after child process terminates
- Use process.poll(): Check if process is still running before terminating
- Graceful shutdown: Try terminate() before kill()
- Timeout handling: Use wait(timeout=N) to prevent hanging
- Error recovery: Multiple fallback levels in try/except blocks
- Track all processes: Store handles for all spawned processes, not just the primary one
- Cleanup in finally: Ensures cleanup happens even if exceptions occur
Related Issues
This fix prevents:
- Zombie processes accumulating over time
- Process table filling up
- System resource leaks
- Confusing process listings for users
Implementation Date
2026-02-06
Additional Notes
- The fix follows the same patterns used in other INTERCEPT decoders
- Compatible with existing SDR device selection implementation
- No breaking changes to API or user interface
- Applies to both manual monitoring and auto-monitoring