[Bug] ValueError: The memory capacity is unbalanced. Some GPUs may be occupied by other processes.


Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • [ ] 5. Please use English, otherwise it will be closed.

Describe the Bug

Bug Overview

When we launch the server with tensor parallelism across two nodes, scheduler initialization fails with a ValueError stating that the memory capacity is unbalanced and that some GPUs may be occupied by other processes, so the server never comes up.

Bug Details

During startup, the scheduler hits the following exception:

[2025-03-09 14:40:47 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1816, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 252, in __init__
    self.draft_worker = EAGLEWorker(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/speculative/eagle_worker.py", line 47, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 187, in __init__
    min_per_gpu_memory = self.init_torch_distributed()
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 280, in init_torch_distributed
    raise ValueError(
ValueError: The memory capacity is unbalanced. Some GPUs may be occupied by other processes.
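
As far as we can tell, the check that raises this error runs while torch.distributed is being initialized and compares the free memory reported by each TP rank, so a large gap between ranks is what triggers the ValueError. To see the gap directly, we run a small diagnostic like the sketch below on both nodes (our own script, not part of sglang; it assumes PyTorch can see all local GPUs):

# Diagnostic sketch (not part of sglang): print free/total memory per GPU.
# A GPU whose free memory is noticeably lower than the others is the likely
# reason the balance check fails.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # both values are in bytes
    print(f"GPU {i}: {free / 1024**3:.1f} GiB free / {total / 1024**3:.1f} GiB total")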

Reproduction

To reproduce the bug, follow these steps:

  1. Run the following command on each node (setting ${INDEX} to the node rank):
python3 -m sglang.launch_server --model-path /models/deepseek --tp 16 --dist-init-addr $HEAD_IP:20000 --nnodes 2 --node-rank ${INDEX} --trust-remote-code --context-length 131072 --host 0.0.0.0 --port 8080 --enable-torch-compile --torch-compile-max-bs 16 --speculative-algo NEXTN --speculative-draft /data04/DeepSeek-R1-NextN --speculative-num-steps 2 --speculative-eagle-topk 4 --speculative-num-draft-tokens 4 --disable-radix
  2. Observe the ValueError above in the scheduler logs (a quick check for stray GPU processes is sketched after these steps).
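
Whenever the launch fails this way, we also check whether any other process is holding GPU memory on either node. The snippet below is a small sketch of such a check; it simply shells out to nvidia-smi and assumes nvidia-smi is available on PATH:

# Diagnostic sketch: list compute processes currently holding GPU memory.
# An empty result on both nodes means the imbalance is not caused by a
# stray process; otherwise the listed PIDs should be stopped before launching.
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip() or "no compute processes found")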

Environment

The environment in which the bug is occurring is as follows:

  • Python: 3.10.12
  • CUDA: available
  • GPU: NVIDIA H20
  • Compute Capability: 9.0
  • CUDA_HOME: /usr/local/cuda
  • NVCC: Cuda compilation tools, release 12.4, V12.4.131
  • CUDA Driver Version: 535.161.08
  • PyTorch: 2.5.1+cu124
  • sglang: 0.4.3.post2
  • sgl_kernel: 0.0.3.post6
  • flashinfer: 0.2.1.post2
  • triton: 3.1.0
  • transformers: 4.48.2
  • torchao: 0.8.0
  • numpy: 1.26.4
  • aiohttp: 3.9.3
  • fastapi: 0.115.8
  • hf_transfer: 0.1.9
  • huggingface_hub: 0.28.1
  • interegular: 0.3.3
  • modelscope: 1.22.3
  • orjson: 3.10.15
  • packaging: 23.2
  • psutil: 5.9.4
  • pydantic: 2.10.6
  • multipart: 0.0.20
  • zmq: 25.1.2
  • uvicorn: 0.34.0
  • uvloop: 0.21.0
  • vllm: 0.6.4.post1
  • openai: 1.60.2
  • anthropic: 0.45.2
  • decord: 0.6.0

Additional Information

The most likely cause is that some GPUs are occupied by other processes, which leaves less free memory on those devices than on the rest and makes the per-GPU memory capacity appear unbalanced. As a result, the server cannot start.

Expected Behavior

The server should start normally, with all GPUs across both nodes reporting comparable free memory.

Actual Behavior

Scheduler initialization fails with the ValueError shown above: free memory differs across the GPUs, apparently because some of them are occupied by other processes.

Requested Help

We are still investigating on our side; so far the only mitigation we have found is to make sure no other processes are running on the GPUs before launching the server, so that free memory stays balanced across devices. Any guidance on how to diagnose or resolve the imbalance would be appreciated, and we are happy to provide additional error messages, logs, or environment details as needed.