LAST_SEEN | FIRST_SEEN | NAME | SUBOBJECT | TYPE | REASON | MESSAGE |
0001-01-01 00:00:00 | 0001-01-01 00:00:00 | test-trainjob-9vkfq-node-0-0-mm8ds.18a905d50d22bcdc | - | Normal | Scheduled | Successfully assigned test-ns-slrp6/test-trainjob-9vkfq-node-0-0-mm8ds to ip-10-0-141-16.ec2.internal |
2026-04-23 15:25:41 | 2026-04-23 15:25:41 | test-trainjob-9vkfq-node-0-0-mm8ds.18a905d527a189ec | - | Normal | AddedInterface | Add eth0 [10.134.0.30/23] from ovn-kubernetes |
2026-04-23 15:37:40 | 2026-04-23 15:25:41 | test-trainjob-9vkfq-node-0-0-mm8ds.18a905d52945803b | spec.containers{node} | Normal | Pulling | Pulling image "quay.io/opendatahub/odh-training-rocm64-torch29-py312@sha256:8a053c8ee3a4c326b745b2516a291c6b8a6e92defc5406ac2e9590bb742153f6" |
2026-04-23 15:37:24 | 2026-04-23 15:37:24 | test-trainjob-9vkfq-node-0-0-mm8ds.18a90678e605abae | spec.containers{node} | Warning | Failed | Failed to pull image "quay.io/opendatahub/odh-training-rocm64-torch29-py312@sha256:8a053c8ee3a4c326b745b2516a291c6b8a6e92defc5406ac2e9590bb742153f6": unable to pull image or OCI artifact: pull image err: copying system image from manifest list: writing blob: adding layer with blob "sha256:56a6a09b03e81131ca690210efc814701da032525ada07591ffe5d6c5d5a4906"/""/"sha256:0832a7269a80e5ed5e1c5c749a8d30a6b248ce9b970d81f61a4f93a3b72673f9": unpacking failed (error: exit status 1; output: write /opt/rocm-6.4.3/lib/rocblas/library/TensileLibrary_Type_HH_HPA_Fp16Alt_Contraction_l_Ailk_Bjlk_Cijk_Dijk_CU104_gfx90a.co: no space left on device); artifact err: provided artifact is a container image |
2026-04-23 15:37:24 | 2026-04-23 15:37:24 | test-trainjob-9vkfq-node-0-0-mm8ds.18a90678e6063132 | spec.containers{node} | Warning | Failed | Error: ErrImagePull |
2026-04-23 15:37:28 | 2026-04-23 15:37:28 | test-trainjob-9vkfq-node-0-0-mm8ds.18a90679daa4cd23 | spec.containers{node} | Normal | BackOff | Back-off pulling image "quay.io/opendatahub/odh-training-rocm64-torch29-py312@sha256:8a053c8ee3a4c326b745b2516a291c6b8a6e92defc5406ac2e9590bb742153f6" |
2026-04-23 15:37:28 | 2026-04-23 15:37:28 | test-trainjob-9vkfq-node-0-0-mm8ds.18a90679daa54a5a | spec.containers{node} | Warning | Failed | Error: ImagePullBackOff |
2026-04-23 15:46:45 | 2026-04-23 15:46:45 | test-trainjob-9vkfq-node-0-0-mm8ds.18a906fb850b4b7e | spec.containers{node} | Normal | Pulled | Successfully pulled image "quay.io/opendatahub/odh-training-rocm64-torch29-py312@sha256:8a053c8ee3a4c326b745b2516a291c6b8a6e92defc5406ac2e9590bb742153f6" in 9m5.22s (9m5.22s including waiting). Image size: 37114220853 bytes. |
2026-04-23 15:46:45 | 2026-04-23 15:46:45 | test-trainjob-9vkfq-node-0-0-mm8ds.18a906fb8c412ef9 | spec.containers{node} | Normal | Created | Created container: node |
2026-04-23 15:46:45 | 2026-04-23 15:46:45 | test-trainjob-9vkfq-node-0-0-mm8ds.18a906fb8d4ee6b3 | spec.containers{node} | Normal | Started | Started container node |
2026-04-23 15:25:40 | 2026-04-23 15:25:40 | test-trainjob-9vkfq-node-0.18a905d50cb50674 | - | Normal | SuccessfulCreate | Created pod: test-trainjob-9vkfq-node-0-0-mm8ds |
2026-04-23 15:47:09 | 2026-04-23 15:47:09 | test-trainjob-9vkfq-node-0.18a907012c99238b | - | Normal | Completed | Job completed |
2026-04-23 15:47:09 | 2026-04-23 15:47:09 | test-trainjob-9vkfq.18a907012d5a01f9 | - | Normal | AllJobsCompleted | jobset completed successfully |