1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
| #执行如下命令时报错,报错信息的部分如下: root@deepseek1:/vllm-workspace# vllm serve /root/.cache/huggingface/hub/models/deepseek-ai/DeepSeek-R1 \ --tensor-parallel-size 8 \ --pipeline-parallel-size 4 \ --trust-remote-code \ --host 0.0.0.0 \ --port 8000 \ --enforce-eager ... Loading safetensors checkpoint shards: 95% Completed | 155/163 [00:10<00:00, 22.69it/s] Loading safetensors checkpoint shards: 97% Completed | 158/163 [00:11<00:00, 18.17it/s] Loading safetensors checkpoint shards: 99% Completed | 161/163 [00:11<00:00, 18.87it/s] Loading safetensors checkpoint shards: 100% Completed | 163/163 [00:11<00:00, 14.15it/s]
INFO 02-22 06:50:13 model_runner.py:1115] Loading model weights took 18.1361 GB (RayWorkerWrapper pid=5507) INFO 02-22 06:50:18 model_runner.py:1115] Loading model weights took 18.1361 GB (pid=1451, ip=10.119.165.141) INFO 02-22 06:49:57 __init__.py:190] Automatically detected platform cuda. [repeated 31x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.) (RayWorkerWrapper pid=1447, ip=10.119.165.138) INFO 02-22 06:50:01 cuda.py:161] Using Triton MLA backend. [repeated 61x across cluster] (RayWorkerWrapper pid=1451, ip=10.119.165.141) WARNING 02-22 06:49:59 triton_decode_attention.py:44] The following error message 'operation scheduled before its operands' can be ignored. [repeated 30x across cluster] (RayWorkerWrapper pid=5486) INFO 02-22 06:50:01 utils.py:950] Found nccl from library libnccl.so.2 [repeated 61x across cluster] (RayWorkerWrapper pid=5486) INFO 02-22 06:50:01 pynccl.py:69] vLLM is using nccl==2.21.5 [repeated 61x across cluster] (RayWorkerWrapper pid=1447, ip=10.119.165.138) INFO 02-22 06:50:01 custom_all_reduce_utils.py:244] reading GPU P2P access cache from /root/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json [repeated 30x across cluster] (RayWorkerWrapper pid=1444, ip=10.119.165.141) INFO 02-22 06:50:01 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_9dec0a20'), local_subscribe_port=44443, remote_subscribe_port=None) [repeated 2x across cluster] (RayWorkerWrapper pid=1451, ip=10.119.165.141) INFO 02-22 06:50:01 model_runner.py:1110] Starting to load model /root/.cache/huggingface/hub/models/deepseek-ai/DeepSeek-R1... [repeated 30x across cluster] (RayWorkerWrapper pid=1451, ip=10.119.165.141) WARNING 02-22 06:50:01 utils.py:159] The model class DeepseekV3ForCausalLM has not defined `packed_modules_mapping`, this may lead to incorrect mapping of quantized or ignored modules [repeated 30x across cluster] (RayWorkerWrapper pid=1450, ip=10.119.165.141) INFO 02-22 06:50:23 model_runner.py:1115] Loading model weights took 23.1804 GB [repeated 23x across cluster] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] Error executing method 'determine_num_available_blocks'. This might cause deadlock in distributed execution. (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] Traceback (most recent call last): (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 35, in wrapper (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return fn(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/language/core.py", line 993, in to (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return semantic.cast(self, dtype, _builder, fp_downcast_rounding) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/language/semantic.py", line 759, in cast (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89" (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] AssertionError: fp8e4nv data type is not supported on CUDA arch < 89 (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] The above exception was the direct cause of the following exception: (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] Traceback (most recent call last): (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return run_method(target, method, args, kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return func(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return func(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 229, in determine_num_available_blocks (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] self.model_runner.profile_run() (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return func(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1235, in profile_run (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] self._dummy_run(max_num_batched_tokens, max_num_seqs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1346, in _dummy_run (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] self.execute_model(model_input, kv_caches, intermediate_tensors) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return func(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1719, in execute_model (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] hidden_or_intermediate_states = model_executable( (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return self._call_impl(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return forward_call(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 687, in forward (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] hidden_states = self.model(input_ids, positions, kv_caches, (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 172, in __call__ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return self.forward(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 643, in forward (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] hidden_states, residual = layer(positions, hidden_states, (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return self._call_impl(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return forward_call(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 561, in forward (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] hidden_states = self.self_attn( (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return self._call_impl(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return forward_call(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 473, in forward (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ckq = self.q_a_proj(hidden_states)[0] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return self._call_impl(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return forward_call(*args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 248, in forward (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] output = self.quant_method.apply(self, x, bias) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 359, in apply (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return apply_w8a8_block_fp8_linear( (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 70, in apply_w8a8_block_fp8_linear (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] q_input, x_scale = per_token_group_quant_fp8(input_2d, (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 307, in per_token_group_quant_fp8 (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] _per_token_group_quant_fp8[(M, )]( (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 345, in <lambda> (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 662, in run (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] kernel = self.compile( (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 276, in compile (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] module = src.make_ir(options, codegen_fns, context) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 113, in make_ir (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] triton.compiler.errors.CompilationError: at 32:10: (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y_ptr += g_id * group_size (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y_q_ptr += g_id * group_size (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y_s_ptr += g_id (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] cols = tl.arange(0, BLOCK) # N <= BLOCK (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] mask = cols < group_size (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y = tl.load(y_ptr + cols, mask=mask, other=0.0).to(tl.float32) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] # Quant (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] _absmax = tl.maximum(tl.max(tl.abs(y)), eps) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y_s = _absmax / fp8_max (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] y_q = tl.clamp(y / y_s, fp8_min, fp8_max).to(y_q_ptr.dtype.element_ty) (RayWorkerWrapper pid=1444, ip=10.119.165.140) ERROR 02-22 06:50:24 worker_base.py:574] ^ ERROR 02-22 06:50:25 worker_base.py:574] Error executing method 'determine_num_available_blocks'. This might cause deadlock in distributed execution. ...
|