Description
What I observed
- From the synthesis and bitstream-generation logs (and by reproducing the Xilinx examples), I found that Vitis’ default logic maps all four of the IP’s memory I/O ports to the same U250 memory bank, even though the U250 has four distinct memory banks.
- Vitis’ default mapping also starts allocation from bank 1 of the U250 rather than bank 0, as shown in v++_vortex_afu.log:
INFO: [SYSTEM_LINK 82-38] [11:29:02] cfgen started: /opt/xilinx/Vitis/2024.2/bin/cfgen -dpa_mem_offload false -dpa_aie_mem_offload false -dmclkid 0 -r /home/steve/git/mule_ray/vortex/build/hw/syn/xilinx/xrt/Init_9_11_xilinx_u250_gen3x16_xdma_4_1_202210_1_hw/_x/link/sys_link/_sysl/.cdb/xd_ip_db.xml -o /home/steve/git/mule_ray/vortex/build/hw/syn/xilinx/xrt/Init_9_11_xilinx_u250_gen3x16_xdma_4_1_202210_1_hw/_x/link/sys_link/cfgraph/cfgen_cfgraph.xml
INFO: [CFGEN 83-0] Kernel Specs:
INFO: [CFGEN 83-0] kernel: vortex_afu, num: 1 {vortex_afu_1}
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_0 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_1 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_2 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_3 to DDR[1]
INFO: [SYSTEM_LINK 82-37] [11:29:06] cfgen finished successfully
- In contrast, Vortex’s C++ compatibility layer by default allocates memory starting from bank 0, incrementally.
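To make the mismatch above easier to spot in a long link log, the inferred port-to-bank mapping can be pulled out with a short script. This is only a sketch: it assumes the cfgen message format shown in the log excerpt above, and the helper names (`inferred_mappings`, `shared_banks`) are made up for illustration and are not part of Vitis or Vortex.

```python
import re
from collections import defaultdict

# Matches cfgen lines of the form seen in the log above, e.g.
#   INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_0 to DDR[1]
MAPPING_RE = re.compile(r"Inferring mapping for argument (\S+) to (\w+)\[(\d+)\]")

# Excerpt copied from v++_vortex_afu.log in this report.
SAMPLE_LOG = """\
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_0 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_1 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_2 to DDR[1]
INFO: [CFGEN 83-2226] Inferring mapping for argument vortex_afu_1.MEM_3 to DDR[1]
"""


def inferred_mappings(log_text):
    """Return {kernel_argument: (memory_type, bank_index)} found in the log text."""
    return {
        m.group(1): (m.group(2), int(m.group(3)))
        for m in MAPPING_RE.finditer(log_text)
    }


def shared_banks(mappings):
    """Return {bank: [arguments]} for every bank that has more than one port."""
    by_bank = defaultdict(list)
    for arg, bank in mappings.items():
        by_bank[bank].append(arg)
    return {bank: sorted(args) for bank, args in by_bank.items() if len(args) > 1}


mappings = inferred_mappings(SAMPLE_LOG)
for bank, args in shared_banks(mappings).items():
    print(f"{bank[0]}[{bank[1]}] is shared by: {', '.join(args)}")
```

On a real build, the text of v++_vortex_afu.log would be read from disk and passed in place of `SAMPLE_LOG`.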
Why this causes a problem
- Under default settings, the Vortex example tries to allocate memory in U250 bank 0. However, bank 0 is not enabled in the bitstream produced with the default mapping (the driver reports it as unused in the axlf), so the allocation is illegal; the driver rejects the operation and the runtime throws an error.
...
./demo -n64
open device connection
[Debug]: device name: xilinx_u250_gen3x16_xdma_shell_4_1
[Debug]: device bdf: 0000:01:00.1
[Debug]: bank_size = 0x400000000, lg2_bank_size_ = 34
info: device name=xilinx_u250_gen3x16_xdma_shell_4_1, memory_capacity=0x1000000000 bytes, memory_banks=4.
data type: integer
number of points: 1024
buffer size: 4096 bytes
allocate device memory
[Debug]: get_bank_info addr=0x10000
[Debug]: calculated idx=0, offset=0x10000
[Debug]: get_bank_info(addr=0x10000, bank=0, offset=0x10000)
allocating bank0...
terminate called after throwing an instance of 'xrt_core::system_error'
what(): failed to allocate userptr bo: Operation not permitted
Aborted (core dumped)
make: *** [../common.mk:112: run-xrt] Error 134
Messages from dmesg also indicated the same error:
...
[248620.336640] xocl 0000:01:00.1: ffff97b1c248c0c8 check_bo_user_reqs: Bank 0 is marked as unused in axlf
[248620.336644] [drm:xocl_userptr_bo_ioctl [xocl]] *ERROR* object creation failed user_flags 0, size 0x400000000
What I tried
- I overrode the default mapping by explicitly providing a configuration file that declares the memory connectivity. I attempted synthesis with two mapping schemes:
- One-to-one mapping: map the IP’s four memory ports to the U250’s four banks respectively. This build reported timing violations (it did not meet timing).
[connectivity]
sp=vortex_afu_1.MEM_0:DDR[0]
sp=vortex_afu_1.MEM_1:DDR[1]
sp=vortex_afu_1.MEM_2:DDR[2]
sp=vortex_afu_1.MEM_3:DDR[3]
- All-ports-to-bank0: map all IP memory ports to U250 bank 0. In this case, synthesis completed and produced a bitstream.
[connectivity]
sp=vortex_afu_1.MEM_0:DDR[0]
sp=vortex_afu_1.MEM_1:DDR[0]
sp=vortex_afu_1.MEM_2:DDR[0]
sp=vortex_afu_1.MEM_3:DDR[0]
- Note: the all-to-bank0 mapping is a coarse workaround. Because every port now reaches the same physical bank, data the runtime places in different banks may end up overwriting each other. Given the small memory footprint of the examples, I accepted this trade-off for now to get the examples running.
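The two connectivity files above differ only in the bank index assigned to each port, so they can be generated from one small helper when trying further variants (for example, two ports per bank). This is a sketch under assumptions: `connectivity_cfg` is a hypothetical helper, and the `MEM_<i>` port names and the `vortex_afu_1` instance name are taken from the cfgen log in this report.

```python
def connectivity_cfg(kernel_instance, banks):
    """Build a v++ [connectivity] section.

    banks[i] is the DDR bank index that port MEM_<i> of the kernel
    instance should be connected to.
    """
    lines = ["[connectivity]"]
    for port, bank in enumerate(banks):
        if bank < 0:
            raise ValueError(f"invalid bank index {bank} for MEM_{port}")
        lines.append(f"sp={kernel_instance}.MEM_{port}:DDR[{bank}]")
    return "\n".join(lines) + "\n"


# The two schemes tried in this report.
one_to_one = connectivity_cfg("vortex_afu_1", [0, 1, 2, 3])
all_to_bank0 = connectivity_cfg("vortex_afu_1", [0, 0, 0, 0])

print(one_to_one)
print(all_to_bank0)
```

The resulting text would be written to a file and handed to the v++ link step through its `--config` option; how the Vortex build scripts forward that option is not shown here.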
Run-time behavior & current blocker
- I configured the Vortex demo program to use the all-to-one bitstream, but execution still fails to complete: it hangs after starting the kernel and waits for the kernel to finish until it times out.
- In the dumped waveform, every word that Vortex fetches from memory comes back as 0xdec0dee3, which appears to be the DECERR indicator of the Xilinx / AMD SmartConnect. That would mean the requested memory address is not mapped to any of the slave memory devices.
Conclusion & request for help
Short summary: Vitis’ cfgen/link step infers DDR[1] for all four IP memory ports (i.e. it maps everything to U250 bank 1), while Vortex’s C++ layer allocates starting from bank 0. With the default bitstream, bank 0 is not enabled, so the Vortex example’s default allocations fail (the driver rejects the allocation). A one-to-one mapping to four banks fails place-and-route due to timing violations. The all-to-one mapping (all ports on bank 0) produces a bitstream, but the run hangs and every memory fetch returns 0xdec0dee3 (apparently SmartConnect DECERR, i.e. address not mapped).
What I’m asking for: if you’ve seen anything like this, please help with any of the items below:
- How to make a correct one-to-one mapping without timing failures — any suggestions to get MEM_0..MEM_3 to DDR[0..3] while meeting timing?
- DECERR (0xdec0dee3) root causes: the kernel binary is loaded at 0x80000000 by default, so it should fall in the same bank as the demo program’s data buffers and cause no error even with the all-to-bank0 bitstream. What then causes this DECERR?
- Useful logs / artifacts to inspect: if you want to help debug, I can attach or share v++_vortex_afu.log, cfgen_cfgraph.xml, the produced xclbin/axlf, the connectivity cfgs I used, the dmesg output, and the waveform dump. Tell me which files are most useful and I’ll post them.
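The expectation in the DECERR question, that 0x80000000 lands in bank 0, can be checked against the numbers in the debug output (bank_size = 0x400000000, lg2_bank_size_ = 34, memory_banks=4). The sketch below assumes a plain linear, non-interleaved split in which the bank index is the address shifted right by lg2_bank_size_. That assumption matches the one logged data point (addr=0x10000 gives idx=0, offset=0x10000), but it is not taken from the Vortex source, and this `get_bank_info` is a stand-in, not the real function.

```python
LG2_BANK_SIZE = 34               # from the debug log: lg2_bank_size_ = 34
BANK_SIZE = 1 << LG2_BANK_SIZE   # 0x400000000 bytes per bank
NUM_BANKS = 4                    # from the log: memory_banks=4


def get_bank_info(addr):
    """Return (bank_index, offset_in_bank), ASSUMING linear non-interleaved banks."""
    idx = addr >> LG2_BANK_SIZE
    offset = addr & (BANK_SIZE - 1)
    if idx >= NUM_BANKS:
        raise ValueError(f"address {addr:#x} is beyond the last bank")
    return idx, offset


for addr in (0x10000, 0x80000000, BANK_SIZE):
    idx, offset = get_bank_info(addr)
    print(f"addr={addr:#x} -> bank={idx}, offset={offset:#x}")
```

Under this assumption both the first data buffer (0x10000) and the kernel binary base (0x80000000) resolve to bank 0, so a bank mismatch alone would not seem to explain the DECERR; the address the SmartConnect actually sees in the waveform is probably the next thing to compare.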
Thanks in advance — any pointers, example configs, or Xilinx/Vitis options you can share would be really appreciated. If helpful I can post the v++ and cfgen logs and the failing waveform here.