Regarding the code logic:
- As I understand it, the DEQ model uses a neural network to carry out the entire iterative optimization, and f_solver should return the equilibrium point. Why, then, is there another forward pass during training, and why are the resulting new_z1s and the original z1s used to compute JAC_loss? I'm currently working on a vehicle state estimation task. Does this mean that the previous estimate z1s is fed back into the fixed-point equation to obtain new_z1s, so that the dynamic prediction model is run one more time?
- I see item() and detach() calls in the Anderson and Broyden solvers. I don't quite understand what these two calls do or why they are there. If item() and detach() are used, the gradient is cut off, so how can the network still be trained effectively?
- How should the overall loss function be designed? For a state estimation task, can a weighted sum of the state-estimate MSE and JAC_loss be used? How should the deep equilibrium model be designed to make effective use of its capabilities?
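
To make the first two questions concrete, here is a minimal scalar sketch of how I currently picture it. This is plain Python, not the repository's code: `f`, `solve_fixed_point` and `implicit_grad_w` are hypothetical names, and the tanh layer is only a toy stand-in for the real cell. The solver runs without recording any gradient information (my analogue of detach()), and the gradient is then recovered from a single extra evaluation of `f` at the equilibrium point via the implicit function theorem. Is this the right mental model for the extra forward pass?

```python
import math

def f(z, x, w):
    # Toy scalar layer (hypothetical stand-in for the DEQ cell): z_next = tanh(w*z + x)
    return math.tanh(w * z + x)

def solve_fixed_point(x, w, z0=0.0, tol=1e-12, max_iter=500):
    # Plain fixed-point iteration. Nothing about the iterations is stored,
    # which mirrors running the solver with detach() / without gradient tracking.
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_grad_w(z_star, x, w):
    # One extra evaluation of the layer at the equilibrium gives its local derivatives.
    s = 1.0 - math.tanh(w * z_star + x) ** 2  # tanh'(w*z + x)
    df_dz = s * w
    df_dw = s * z_star
    # Implicit function theorem for z* = f(z*, x, w):
    # dz*/dw = (df/dw) / (1 - df/dz)
    return df_dw / (1.0 - df_dz)

x, w = 0.3, 0.5
z_star = solve_fixed_point(x, w)
grad = implicit_grad_w(z_star, x, w)

# Finite-difference check that differentiates "through" the whole solver
eps = 1e-6
fd = (solve_fixed_point(x, w + eps) - solve_fixed_point(x, w - eps)) / (2 * eps)

print("z* =", z_star)
print("implicit gradient =", grad)
print("finite difference =", fd)
```

In this toy case the implicit gradient agrees with the finite-difference estimate even though no solver iteration was differentiated, which is why I suspect the detach() inside the solver does not by itself prevent training.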