
How to enable support for larger sequence lengths? #2


Description

@QAZWSX0827

Hello,

I encountered an issue while running a script and would appreciate your help.

I need to generate longer token sequences, so I changed the MAX_NUM_TOKENS value in include/flexflow/batch_config.h from its default of 1024 to 16384. However, when I run the script and try to increase the sequence length to 1024, the program crashes with the following error:
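For reference, the edit I made looks roughly like this (a sketch; the exact declaration in include/flexflow/batch_config.h may differ across versions):

```cpp
// include/flexflow/batch_config.h (sketch -- the real declaration may differ)
class BatchConfig {
public:
  // Raised from the default 1024 so longer sequences fit in a batch:
  static int const MAX_NUM_TOKENS = 16384;
  // ... other limits (e.g., max requests per batch) left at their defaults
};
```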

```
{gpu}: CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)
python: /home/wutong/SpecInfer_24_new2/deps/legion/runtime/realm/cuda/cuda_module.cc:343: bool Realm::Cuda::GPUStream::reap_events(Realm::TimeLimit): Assertion `0' failed.
```
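The assertion above fires in Realm's asynchronous CUDA event reaper, so it does not by itself identify which kernel faulted. Forcing synchronous launches, or running under compute-sanitizer, narrows it down (a sketch; `my_script.py` is a placeholder for the actual script):

```sh
# Make kernel launches synchronous so the error surfaces at the offending call
CUDA_LAUNCH_BLOCKING=1 python my_script.py

# Or report the faulting kernel and the illegal address directly
compute-sanitizer --tool memcheck python my_script.py
```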

After debugging, I found that the issue originates in the serve_spec_infer function in src/runtime/request_manager.cc, where the call to get_void_result() causes the crash.

Steps I have tried:
1. Modified MAX_NUM_TOKENS and recompiled the program.
2. Checked GPU memory availability.
3. Reduced the sequence length; shorter sequences run without errors.

Despite these efforts, the error persists.

My questions:
1. Are there additional configurations or files that need to be updated to support longer sequence lengths (e.g., 1024 or 2048)?
2. Could this error be related to GPU memory allocation or CUDA kernel settings? How can I further debug or resolve this issue?
Any guidance would be greatly appreciated. Thank you!
