Implemented Dead-Oil model working on device #1313
Conversation
```cpp
// 2. Water phase: use the constant formation volume factor and compressibility provided by the user
```

```cpp
using PT = DeadOilFluid::PhaseType;
```
You may benefit from a `using PhaseType = DeadOilFluid::PhaseType;` as a nested type?
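For context, a minimal sketch of what this suggestion might look like (the enclosing class name is illustrative, not the actual code):

```cpp
// Hypothetical sketch: declare the alias once as a nested type, so every
// function in the class (and its users) can refer to PhaseType directly
// instead of re-aliasing DeadOilFluid::PhaseType at each call site.
class DeadOilFluidUpdate // assumed name, for illustration only
{
public:
  using PhaseType = DeadOilFluid::PhaseType;

  GEOSX_HOST_DEVICE
  void compute() const
  {
    // ... use PhaseType here without a local `using PT = ...;` ...
  }
};
```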
klevzoff left a comment:
Looks great!
Just a few comments
We can do the following trick to allow this model to run on device: in each fluid model class, declare the execution policy it supports:

```cpp
class BlackOilFluid : public MultiFluidPVTPackageWrapper
{
  using exec_policy = parallelHostPolicy; // according to Thomas, OpenMP is fine for PVTPackage
};

class DeadOilFluid : public MultiFluidBase
{
  using exec_policy = parallelDevicePolicy<>;
};
```

and then in the solver:

```cpp
constitutive::constitutiveUpdatePassThru( fluid, [&] ( auto & castedFluid )
{
  using FluidType = TYPEOFREF( castedFluid );
  using ExecPolicy = typename FluidType::exec_policy;
  typename FluidType::KernelWrapper fluidWrapper = castedFluid.createKernelWrapper();
  FluidUpdateKernel::launch< ExecPolicy >( dataGroup.size(),
                                           fluidWrapper,
                                           pres,
                                           dPres,
                                           m_temperature,
                                           compFrac );
} );
```

With this little change, you should be able to uncomment all the device-related changes.
What I've more or less checked is that the side effect which prevents parallelism is clearly (?) identified.
@klevzoff this would be great! I am trying the code you wrote above in the CO2 PR right now, but there is one issue that I don't know how to resolve. After adding the piece of code above, I add … and I uncomment the …

Do you know if there is a workaround? If there is no workaround, it is ok for now: I can just keep working on a separate branch to do the tests on Lassen.
@francoishamon oof, that is unfortunate. Technically, … There's one simple thing to try first, though: hide the call to …
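Given the `#ifndef` mention later in the thread, the idea is presumably something along these lines (a sketch only; the function and member names are assumptions):

```cpp
GEOSX_HOST_DEVICE
void compute( real64 const pressure ) const
{
#ifndef __CUDA_ARCH__
  // Host-only branch: the offending call is hidden from device compilation,
  // so nvcc never attempts to generate device code for it.
  m_pvtPackageFluid->update( pressure ); // assumed name, for illustration only
#endif
}
```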
Great, it's working! I removed the … And another … If this is ok as a temporary solution, this is going to save me a lot of time :) I also tried the …
You could also just …
Ok good, I missed that point, then it is easier. I just gave it a try quickly (but there is a high possibility that I made a mistake somewhere, so I will double-check tomorrow): if I keep the … It is as if the …
Seems that … If you don't want to spend more time on these build issues, feel free to push the current state and I'll try to take a look. Alternatively, let's merge as is and pursue device compilation in a separate PR.
@klevzoff I implemented your suggestion (#ifndef ...) in the CO2-brine on GPU branch (#1325), and it seems to work great. I was not able to overcome the compilation error that I mentioned above, even with the addition of a destructor. So I figured that the best use of our time was to just remove the old implementation of …

So to summarize, I added the …

One thing that broke: if I use …

For this Dead-Oil branch, I think it is best to try to merge it (after addressing the comments above) with the …
Yes, I was wrong on that one. Thomas' tests verified that there are no data races between different instances of PVTPackage fluid objects, but the way we currently use it, there's a single stateful fluid object shared between all cells, and there's most certainly a data race there (as Thomas mentions above). It should remain serial for now.
Great, no objections from me. Feel free to merge whenever.
```cpp
{
  // check that the entry is large enough to enforce the boundary condition
  if( entries[j] >= 0 && entries[j] < minDiagonal )
  {
    entries[j] = minDiagonal;
  }
  else if( entries[j] < 0 && entries[j] > -minDiagonal )
  {
    entries[j] = -minDiagonal;
  }
  diagonal = entries[j];
}
```
@klevzoff One thing that did not work well with the analytical derivatives of the new Dead-Oil model was in this function: with the analytical Dead-Oil model, the derivatives of the mass conservation equation of component ic with respect to the component density of jc (jc ≠ ic) are extremely small (~1e-18), and at the first iteration of the first time step they are simply equal to zero (dR_ic/dRho_jc = 0). So, in the equation/variable ordering of the compositional solver (below), entries[dof] is equal to zero. As a result, when we apply the Dirichlet BC, the matrix becomes singular because we zero out the other entries on this row.

I added a check for the value of entries[dof], and that fixes the issue. However, I did not do anything for the other SpecifyFieldValue function (the one using LAI::ParallelMatrix) just above, because it is not entirely clear to me whether we still use it (and where?) since we moved to CRSMatrix for the assembly.
Ordering for the compositional solver:

```
                  dPres   dRho_{ic=0}   dRho_{ic=1}   dRho_{ic=2}
Mass Cons ic=0
Mass Cons ic=1
Mass Cons ic=2
Vol Balance
```
@francoishamon I don't think we use the other version of SpecifyFieldValue (in fact, it needs to be removed), so this fix should be just fine.
As for the equation/variable alignment, fixing this has been on the radar since the beginning, but I never got around to implementing it. The plan was to have the first equation be the total mass conservation (which would align with the pressure DoF), followed by component mass cons from ic=0 to NC-2; and also to have the last DoF be total density instead of the last component's density (which would align with vol balance). This ordering would guarantee nonzero diagonals and would probably lead to slightly more stable simulations.

I was initially hoping to achieve this with a kernel-usable dense LA approach (pre- and post-multiplying each cell's local jacobian by small matrices to rearrange rows and columns into other linear combinations), which we didn't have at the time; now we could try something with tensorOps. However, there are a lot of zeros in these permutation matrices, so a hand-rolled reordering routine might be better.

The equation lineup isn't too difficult to change by hand at the final stage of the kernel where we assemble the component mass balances. Changing the last DoF is a more annoying task. Both changes would propagate to other kernels (residual norm, apply/check solution, etc.); reservoir-well solvers would be affected as well, as would future multiphase poromechanics. Ultimately, it's a fairly extensive change.
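For illustration, a hand-rolled version of the row recombination described above might look roughly like this (a sketch only; sizes and names are assumptions, and only the equation lineup is changed, not the last DoF):

```cpp
// Sketch: replace the component mass balance rows of a cell-local system with
// { total mass cons, component mass cons ic = 0 .. NC-2 }, applied to both the
// local jacobian and the local residual. NDOF = NC + 1 (pressure + NC
// component densities) is assumed; the vol balance row (row NC) is untouched.
template< localIndex NC >
GEOSX_HOST_DEVICE
void rearrangeEquations( real64 (& localJacobian)[NC + 1][NC + 1],
                         real64 (& localResidual)[NC + 1] )
{
  real64 totalRow[NC + 1]{};
  real64 totalRes = 0.0;

  // form the total mass conservation equation as the sum of all component rows
  for( localIndex ic = 0; ic < NC; ++ic )
  {
    for( localIndex jdof = 0; jdof < NC + 1; ++jdof )
    {
      totalRow[jdof] += localJacobian[ic][jdof];
    }
    totalRes += localResidual[ic];
  }

  // shift component rows ic = 0 .. NC-2 down by one to make room for the total row
  for( localIndex ic = NC - 1; ic >= 1; --ic )
  {
    for( localIndex jdof = 0; jdof < NC + 1; ++jdof )
    {
      localJacobian[ic][jdof] = localJacobian[ic - 1][jdof];
    }
    localResidual[ic] = localResidual[ic - 1];
  }

  // the first equation becomes total mass conservation (aligned with the pressure DoF)
  for( localIndex jdof = 0; jdof < NC + 1; ++jdof )
  {
    localJacobian[0][jdof] = totalRow[jdof];
  }
  localResidual[0] = totalRes;
}
```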
Ok I see, this seems to be an equation/variable alignment that would be nice to have.
Regarding the change in equation lineup, it would be really useful to have a choice for the first equation between 1) total mass conservation, and 2) the pressure equation of the sequential implicit (SI) method. If I understood correctly how the SI pressure equation is typically constructed, we need to compute weights (and their derivatives with respect to pressure) that are used to form a linear combination of the mass conservation equations that eliminates the derivatives with respect to the non-pressure dofs. I think this needs to happen at the nonlinear level, since the derivatives of the weights with respect to pressure need to be introduced (but maybe there is a smarter way to do it entirely at the linear level, I am not sure yet). I am saying all this because @jafranc will be implementing his research on the SI method in GEOSX, and this capability will be very useful to have at that point.
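In symbols, a minimal statement of that construction (with $R_{ic}$ the component mass conservation residuals and $\rho_{jc}$ the component densities, following the ordering above):

$$
R_p = \sum_{ic} w_{ic} \, R_{ic},
\qquad \text{with weights } w_{ic} \text{ chosen so that} \qquad
\sum_{ic} w_{ic} \, \frac{\partial R_{ic}}{\partial \rho_{jc}} = 0 \quad \text{for every } jc .
$$

Since the weights depend on pressure, differentiating $R_p$ brings in $\partial w_{ic} / \partial p$, which is why the combination has to be formed at the nonlinear level.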
Just to make sure I understand: when you suggest the tensorOps way and/or the hand-rolled reordering, would you do that A) after each compute function (accumulation, flux, wells, etc.), when we call the RAJA atomic adds, or B) just once, when the CRSMatrix is fully assembled and it is time to apply the boundary conditions? If the answer is B), maybe we could try this first, less invasive step before addressing the change in the last dof?
I was initially thinking about A (specifically because a dense pre/post-multiply would be really easy and fast to do when you have a local dense block on the thread stack). However, B sounds quite feasible for re-arranging equations, in particular because we can apply it to entire assembled rows as you mention, so it works effortlessly for coupled solvers. The downside is that we need an extra kernel launch and have to operate on global memory again, but it's probably not too bad. This approach also seems less convenient for forming the SI pressure equation, because there you need to manipulate individual row entries (the pressure derivatives) in CSR, as opposed to just adding entire rows (which you can do blindly, provided the sparsity patterns are the same between rows). But we should definitely explore this idea.
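To make the "adding entire rows blindly" part concrete, here is a generic sketch on raw CSR arrays (not the actual CRSMatrix interface; it assumes rows `src` and `dst` have identical sparsity patterns):

```cpp
// Add CSR row `src` into row `dst`, entry by entry. Because both rows are
// assumed to share the same column pattern, values can be combined
// positionally without inspecting the column indices at all.
void addRowInto( localIndex const src,
                 localIndex const dst,
                 localIndex const * const rowOffsets,
                 real64 * const values )
{
  localIndex const rowLength = rowOffsets[src + 1] - rowOffsets[src];
  for( localIndex k = 0; k < rowLength; ++k )
  {
    values[rowOffsets[dst] + k] += values[rowOffsets[src] + k];
  }
}
```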
This PR implements a Dead-Oil model for the GPU using TableFunction. The new DeadOilFluid class inherits from the MultiFluidBase class. The compute function uses TableFunction lookups to update densities and viscosities, and performs the (trivial) assignment of the phase fractions and phase component fractions. This is just a prototype to open the discussion with @TotoGaz (and everyone interested) on a design for the fluid updates on GPU.
Note: this PR leaves out the changes in CompositionalMultiphaseFlowKernels (changing to parallelDevicePolicy<>, adding GEOSX_HOST_DEVICE in the RAJA forAll loop, etc.) that are needed to actually compute the updates on device. For now I am just doing that locally in a separate branch to profile the assembly.
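For reference, the kind of kernel change being deferred looks roughly like this (a sketch only; the wrapper method and captured names are assumptions):

```cpp
// Sketch: run the fluid update over all cells on device. The lambda must be
// marked GEOSX_HOST_DEVICE so it can be compiled for the GPU.
forAll< parallelDevicePolicy<> >( dataGroup.size(),
                                  [=] GEOSX_HOST_DEVICE ( localIndex const k )
{
  // fluidWrapper, pres, dPres, compFrac are captured device-accessible views
  fluidWrapper.update( k, pres[k] + dPres[k], m_temperature, compFrac[k] );
} );
```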
Fixes #1069
Related to https://github.com/GEOSX/integratedTests/pull/123