SWE-1.6 triggers unbounded worker spawning → load >1100, ~42GB RAM, SSH instability, ECONNREFUSED (stable with GPT-5.4 Mini) #322

@oib

Summary

Using SWE-1.6 in Windsurf causes extreme resource usage after sending a prompt. The system becomes unstable and the Windsurf backend enters a reconnect loop with ECONNREFUSED. Switching to GPT-5.4 Mini on the same setup resolves the issue entirely.


Environment

  • OS: Debian (Trixie)
  • CPU: AMD 5950X (16C/32T)
  • Container CPU allocation: 12 threads (6 cores)
  • RAM: 64 GB (48 GB assigned to the container)
  • Runtime: Incus container
  • Filesystem: BTRFS
  • Concurrent workload: CI runner in another container

Steps to Reproduce

  1. Start Windsurf with SWE-1.6 selected
  2. Open a moderately sized project
  3. Send a prompt (e.g., code query / analysis)
  4. Observe process spawning and system metrics
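For step 4, the process count and load can be sampled with a small script. This is a minimal sketch, assuming a Linux host with /proc mounted; the process name comes from the report above, while the function names and the sampling interval are illustrative choices, not part of Windsurf.

```python
import os
import time
from pathlib import Path

# Process name taken from the report; adjust if your binary differs.
TARGET = "language_server_linux_x64"


def count_matching(cmdlines, target=TARGET):
    """Count command lines whose executable path contains `target`."""
    return sum(1 for argv in cmdlines if argv and target in argv[0])


def read_cmdlines(proc_root="/proc"):
    """Yield the argv list of every readable process (Linux /proc only)."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        yield [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def watch(samples=30, interval_s=2.0):
    """Print the matching process count and 1-minute load average periodically."""
    for _ in range(samples):
        procs = count_matching(read_cmdlines())
        load1 = os.getloadavg()[0]
        print(f"{time.strftime('%H:%M:%S')} procs={procs} load1={load1:.1f}")
        time.sleep(interval_s)
```

Calling `watch()` inside the container while sending the prompt should show whether the process count climbs together with the load average.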

Observed Behavior

  • Dozens of processes spawn:
    language_server_linux_x64
    --enable_index_service
    --enable_local_search
    
  • Load average spikes dramatically (observed >1100 on a 32-thread host CPU, with 12 threads allocated to the container)
  • Memory usage grows to ~42 GB (container assigned 48 GB)
  • Windsurf logs:
    Connection to server got closed. Server will restart.
    windsurf client: couldn't create connection to server.
    Error: connect ECONNREFUSED 127.0.0.1:<port>
    Restarting server failed
    
  • Requires manual window reload to recover

Comparison (Same System, Same Project)

Model          Load         RAM usage     Stability
SWE-1.6        100–1100+    up to 42 GB   ❌ unstable
GPT-5.4 Mini   <10          ~7 GB         ✅ stable

Additional Observations

  • Trigger occurs after prompt, not at startup
  • Behavior resembles unbounded parallelism / worker spawning
  • System instability correlates with process explosion
  • Increasing RAM improves stability but does not fix the root cause
  • Limiting container CPU/processes mitigates but does not eliminate the issue
  • Updating the SSH client and server configuration (keepalive settings, listed under SSH Mitigations Applied) resolved connection drops, indicating the earlier drops were caused by system starvation rather than network failure

Expected Behavior

  • Bounded worker pool
  • Graceful degradation under load
  • Stable backend connection
  • Resource usage proportional to workload
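To illustrate what "bounded worker pool" means here, the following is a minimal sketch of the general technique in Python. It is not Windsurf's implementation, which is not visible from this report; `MAX_WORKERS` and `run_bounded` are hypothetical names, and the peak counter exists only to demonstrate that the cap holds.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # hypothetical cap; a real limit might derive from os.cpu_count()


def run_bounded(tasks, max_workers=MAX_WORKERS):
    """Run callables with at most `max_workers` in flight; return (results, peak)."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(task):
        # Track how many tasks run at once so the bound can be verified.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # The executor queues excess tasks instead of starting new workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak
```

With a fixed pool, extra work waits in a queue, so resource usage stays proportional to the cap rather than to the number of pending tasks.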

Actual Behavior

  • Unbounded worker spawning
  • Extreme CPU and memory usage
  • Backend becomes unreachable (ECONNREFUSED)
  • Requires manual recovery

Related Issues

This report identifies a likely root cause: resource explosion triggered by SWE-1.6 after a prompt.


Workarounds

  • Switching to GPT-5.4 Mini (fully stable)
  • Increasing available RAM (partial mitigation)
  • Limiting container CPU / processes (partial mitigation)
  • Updating SSH configuration prevents disconnects but does not address the root cause
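The container-level limits can be expressed with Incus instance options. The sketch below only builds the `incus config set` command lines and does not run them. The instance name `windsurf-dev` and the process cap of 2048 are assumptions for illustration; the CPU and memory values mirror the environment described above.

```python
import shlex


def incus_limit_commands(instance, cpus, memory, max_processes):
    """Build `incus config set` commands that cap an instance's resources."""
    settings = {
        "limits.cpu": str(cpus),                  # number of CPUs exposed
        "limits.memory": memory,                  # e.g. "48GiB"
        "limits.processes": str(max_processes),   # max processes in the container
    }
    return [
        ["incus", "config", "set", instance, f"{key}={value}"]
        for key, value in settings.items()
    ]


# Hypothetical instance name and process cap; adjust to your setup.
for argv in incus_limit_commands("windsurf-dev", cpus=12, memory="48GiB", max_processes=2048):
    print(shlex.join(argv))
```

As noted above, such limits only contain the damage: the process cap keeps the host responsive but does not stop the spawning itself.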

SSH Mitigations Applied

  • Client (~/.ssh/config):
    Host *
        ServerAliveInterval 10
        ServerAliveCountMax 10
        TCPKeepAlive yes
        ConnectTimeout 10
    
  • Server (/etc/ssh/sshd_config):
    ClientAliveInterval 15
    ClientAliveCountMax 10
    TCPKeepAlive yes
    UseDNS no
    MaxStartups 10:30:60
    LoginGraceTime 30
    MaxSessions 4
    
  • Result: SSH disconnects no longer occur under load; prior disconnects were due to scheduler starvation rather than network issues
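As a rough check on why these settings help: each side drops the session after roughly interval × count seconds without a keepalive reply. A short calculation with the values above (the helper name is illustrative):

```python
def keepalive_tolerance(interval_s, count_max):
    """Approximate seconds of unanswered keepalives before the session is dropped."""
    return interval_s * count_max


client = keepalive_tolerance(10, 10)  # ServerAliveInterval x ServerAliveCountMax
server = keepalive_tolerance(15, 10)  # ClientAliveInterval x ClientAliveCountMax
print(f"client gives up after ~{client}s, server after ~{server}s")
```

So a starved system can stall for well over a minute before either side gives up, which fits the observation that sessions now survive the load spikes.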

Notes

This appears to be a scaling/control issue specific to SWE-1.6 runtime behavior. High-core systems amplify the problem, but the lack of concurrency limits likely affects general use as well.
