Description
Hello,
I encountered an issue while running a script and would appreciate your help.
I need to generate longer token sequences, so I changed the MAX_NUM_TOKENS value in include/flexflow/batch_config.h from its default (1024) to 16384. However, when I run the script with the sequence length increased to 1024, the program crashes with the following error:
```
{gpu}: CUDA error reported on GPU 0: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS)
python: /home/wutong/SpecInfer_24_new2/deps/legion/runtime/realm/cuda/cuda_module.cc:343: bool Realm::Cuda::GPUStream::reap_events(Realm::TimeLimit): Assertion `0' failed.
```
After debugging, I found that the issue originates in the serve_spec_infer function in src/runtime/request_manager.cc, where the call to get_void_result() causes the crash.
Steps I have tried:
1. Modified MAX_NUM_TOKENS and recompiled the program.
2. Checked GPU memory availability.
3. Reduced the sequence length: shorter sequences run without errors.
Despite these efforts, the error persists.
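For reference, this is how I have been trying to localize the faulting kernel, using standard CUDA tooling (CUDA_LAUNCH_BLOCKING and compute-sanitizer ship with the CUDA toolkit; the script name below is a placeholder for my actual serve script):

```shell
# Force synchronous kernel launches so the error is reported at the
# offending launch instead of at a later synchronization point.
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"

# Then run under NVIDIA's memory checker to get the exact kernel and
# address of the illegal access (placeholder script name):
# compute-sanitizer --tool memcheck python your_serve_script.py
```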
My questions:
1. Are there additional configuration values or files that need to be updated to support longer sequence lengths (such as 1024 or 2048)?
2. Could this error be related to GPU memory allocation or CUDA kernel settings? How can I further debug or resolve it?
Any guidance would be greatly appreciated. Thank you!