Running gpt2_pretrain.py on a single machine with multiple GPUs hits the following error #534

@treestreamymw

F20240306 12:52:30.421669 11024 ctrl_client.cpp:54] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 0 lost
*** Check failure stack trace: ***
@ 0x7fa53f8039ca google::LogMessage::Fail()
@ 0x7fa53f803cb2 google::LogMessage::SendToLog()
@ 0x7fa53f803537 google::LogMessage::Flush()
@ 0x7fa53f8060a9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa535118195 _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv
@ 0x7fa53f81840f execute_native_thread_routine
@ 0x7fa6292476db start_thread
@ 0x7fa62882861f clone
