Commit 5229cf4
Fix deadlock in results_queue.join() during training
Add a 10-second timeout to results_queue.join() to prevent indefinite
hangs when lingering results aren't properly consumed. If a timeout
occurs, drain any remaining items from the queue to allow training to
continue.
This fixes an issue where training could deadlock between steps if
results from a previous step remained unprocessed in the queue.

1 parent c0365f0, commit 5229cf4
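The timeout-and-drain behavior described in the commit message could look roughly like the sketch below. It is not the code from this commit: it assumes `results_queue` is a standard-library `queue.Queue` (the real project may use `multiprocessing.JoinableQueue` or something else), and the helper names `join_with_timeout`, `drain`, and `join_or_drain` are hypothetical. Because `Queue.join()` takes no timeout argument, the sketch runs it in a helper thread and waits on that thread instead.

```python
import queue
import threading


def join_with_timeout(q: queue.Queue, timeout: float = 10.0) -> bool:
    """Wait for q.join() for at most `timeout` seconds.

    Returns True if every queued item was marked done in time.
    (Hypothetical helper; Queue.join() itself has no timeout.)
    """
    waiter = threading.Thread(target=q.join, daemon=True)
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()


def drain(q: queue.Queue) -> int:
    """Discard items still in the queue, marking each one done.

    Returns the number of items removed.
    """
    drained = 0
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return drained
        q.task_done()
        drained += 1


def join_or_drain(q: queue.Queue, timeout: float = 10.0) -> int:
    """Join the queue; on timeout, drain leftovers so the caller can continue."""
    if join_with_timeout(q, timeout):
        return 0
    return drain(q)
```

A training loop would call `join_or_drain(results_queue)` between steps. Note that draining only clears items still sitting in the queue; an item that a consumer already fetched but never marked with `task_done()` would still keep a later `join()` waiting.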
1 file changed, +15 -2 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 88-90 | 88-90 | unchanged context |
| 91-92 | | - (2 lines removed) |
| | 91-105 | + (15 lines added) |
| 93-95 | 106-108 | unchanged context |