Conversation

@francoishamon
Contributor

@francoishamon francoishamon commented Feb 9, 2021

This PR implements a Dead-Oil model for the GPU using TableFunction. The new DeadOilFluid class inherits from the MultiFluidBase class. The compute function uses TableFunction lookups to update densities and viscosities, and performs the (trivial) assignment of the phase fractions and phase component fractions. This is just a prototype to open the discussion with @TotoGaz (and everyone interested) on a design for the fluid updates on GPU.
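
For reference, a minimal sketch of the kind of table-based update described above (member names and the table-lookup call are hypothetical placeholders, not the actual DeadOilFluid code; derivatives are omitted):

GEOSX_HOST_DEVICE
void computeDensitiesAndViscosities( real64 const pressure,
                                     arraySlice1d< real64 > const & phaseMassDens,
                                     arraySlice1d< real64 > const & phaseVisc ) const
{
  for( localIndex ip = 0; ip < m_numPhases; ++ip )
  {
    // hypothetical TableFunction-based lookups of formation volume factor and viscosity vs pressure
    real64 const fvf  = m_formationVolFactorTables[ip].compute( pressure );
    real64 const visc = m_viscosityTables[ip].compute( pressure );

    // reservoir-condition mass density = surface mass density / formation volume factor
    phaseMassDens[ip] = m_surfacePhaseMassDensity[ip] / fvf;
    phaseVisc[ip]     = visc;
  }
}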

Note:

  • This PR does not contain the modifications to CompositionalMultiphaseFlowKernels (changing to parallelDevicePolicy<>, adding GEOSX_HOST_DEVICE in the RAJA forAll loop, etc) that are needed to actually compute the updates on device. For now I am just doing that locally in a separate branch to profile the assembly.

Fixes #1069

Related to https://github.com/GEOSX/integratedTests/pull/123

@francoishamon francoishamon changed the title from "implemented Dead-Oil model working on device" to "Implemented Dead-Oil model working on device" on Feb 9, 2021

// 2. Water phase: use the constant formation volume factor and compressibility provided by the user

using PT = DeadOilFluid::PhaseType;
Contributor


You may benefit from a using PhaseType = DeadOilFluid::PhaseType; as a nested type?

@francoishamon francoishamon marked this pull request as ready for review February 23, 2021 19:24
@francoishamon francoishamon added the flag: ready for review and flag: requires rebaseline (Requires rebaseline branch in integratedTests) labels Feb 25, 2021
Contributor

@klevzoff klevzoff left a comment


Looks great!
Just a few comments

@klevzoff
Contributor

klevzoff commented Feb 25, 2021

We can do the following trick to allow this model to run on device: in each model class that derives from MultiFluidBase, add an exec_policy alias that specifies the maximum allowed policy. Something like:

class BlackOilFluid : public MultiFluidPVTPackageWrapper
{
public:
  using exec_policy = parallelHostPolicy; // according to Thomas, OpenMP is fine for PVTPackage
};

class DeadOilFluid : public MultiFluidBase
{
public:
  using exec_policy = parallelDevicePolicy<>;
};

and then in CompositionalMultiphaseBase::updateFluidModel:

  constitutive::constitutiveUpdatePassThru( fluid, [&] ( auto & castedFluid )
  {
    using FluidType = TYPEOFREF( castedFluid );
    using ExecPolicy = typename FluidType::exec_policy;
    typename FluidType::KernelWrapper fluidWrapper = castedFluid.createKernelWrapper();

    FluidUpdateKernel::launch< ExecPolicy >( dataGroup.size(),
                                             fluidWrapper,
                                             pres,
                                             dPres,
                                             m_temperature,
                                             compFrac );
  } );

With this little change, you should be able to uncomment all GEOSX_HOST_DEVICE markers prior to merging this PR. It's up to you whether you want to do it now or leave it for a separate PR.

@TotoGaz
Contributor

TotoGaz commented Feb 26, 2021

// according to Thomas, OpenMP is fine for PVTPackage

What I've more or less checked is that the side effect which prevents parallelism is clearly (?) identified.
But it's most probably still there (one should double-check this first).

@francoishamon
Contributor Author

With this little change, you should be able to uncomment all GEOSX_HOST_DEVICE markers prior to merging this PR.

@klevzoff this would be great! I am trying the code you wrote above in the CO2 PR right now, but there is one issue that I don't know how to resolve. After adding the piece of code above, I add GEOSX_HOST_DEVICE in the lambda of the RAJA loop:

forAll< POLICY >( size, [=] GEOSX_HOST_DEVICE ( localIndex const k )                                                                      
{                                                                                                                                         
    for( localIndex q = 0; q < fluidWrapper.numGauss(); ++q )                                                                               
    {                                                                                                                                       
        fluidWrapper.update( k, q, pres[k], temp, compFrac[k] );                                                                              
    }                                                                                                                                       
} );   

and I uncomment the GEOSX_HOST_DEVICE markers for compute and update in the derived fluid model (here, that is DeadOilFluidUpdate and in the other PR, it is NewMultiPhaseMultiComponentFluidUpdate). The remaining question is what to do with the pure virtual compute and update functions of the base MultiFluidBaseUpdate class:

  • If I don't add GEOSX_HOST_DEVICE to these functions, the CUDA compiler complains that the pure virtual function is __host__ while the overriding function in DeadOilFluidUpdate or NewMultiPhaseMultiComponentFluidUpdate is __host__ __device__ (see the small illustration below)
  • If I add GEOSX_HOST_DEVICE to these functions, then the compiler forces me to propagate the GEOSX_HOST_DEVICE markers to MultiFluidPVTPackageWrapperUpdate, and if I do that, it complains about the call to Update from PVTPackage...
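
To illustrate the first point, this is essentially what nvcc rejects (simplified, hypothetical signatures):

struct MultiFluidBaseUpdate
{
  virtual void update( localIndex const k, localIndex const q, real64 const pressure ) const = 0; // implicitly __host__ only
};

struct DeadOilFluidUpdate final : MultiFluidBaseUpdate
{
  GEOSX_HOST_DEVICE
  virtual void update( localIndex const k, localIndex const q, real64 const pressure ) const override; // error: __host__ __device__ override of a __host__-only virtual
};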

Do you know if there is a workaround? If there is no workaround, it is ok for now: I can just keep working on a separate branch to do the tests on Lassen.

@klevzoff
Contributor

klevzoff commented Feb 26, 2021

@francoishamon oof, that is unfortunate. Technically, compute() and update() methods in wrapper classes needn't be virtual at all - we never call them through dynamic dispatch. The only thing virtual does for us is enforce (and document) a common interface with better compiler diagnostics in case a new model violates it. I'd argue this is a rare case and new models aren't added all that frequently, and so would be in favor of removing the virtual and override keywords, which would also force us to remove method declarations from the base class (or face possible shadowing warnings).
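
In other words, something like this (simplified, hypothetical signatures; the real update takes more arguments):

// base wrapper: no update()/compute() declared at all; the common interface is
// only enforced by the kernels that call fluidWrapper.update( ... ) directly
class MultiFluidBaseUpdate
{
  // shared data views for all models would still live here
};

class DeadOilFluidUpdate final : public MultiFluidBaseUpdate
{
public:
  GEOSX_HOST_DEVICE
  void update( localIndex const k, localIndex const q,
               real64 const pressure, real64 const temperature,
               arraySlice1d< real64 const > const & composition ) const
  {
    // device-compatible body, no virtual dispatch involved
  }
};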

There's one simple thing to try first, though - hide the call to m_fluid.Update(...) in MultiFluidPVTPackageWrapperUpdate behind #ifndef __CUDACC__ and see if it helps. If it does, we can postpone the virtual discussion until later.
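
Concretely, something along these lines (sketch only; the actual PVTPackage Update signature may differ):

GEOSX_HOST_DEVICE
void update( localIndex const k, localIndex const q,
             real64 const pressure, real64 const temperature,
             arraySlice1d< real64 const > const & composition ) const
{
#ifndef __CUDACC__
  m_fluid.Update( pressure, temperature, composition ); // host-only call into PVTPackage
#endif
  // ... copy phase fractions, densities and viscosities out of m_fluid into the output views ...
}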

@francoishamon
Contributor Author

francoishamon commented Feb 26, 2021

Great, it's working. I removed the update and compute functions from MultiFluidBaseUpdate, and I removed the virtual and override keywords in the derived classes (except the GPU-ready one). I also had to duplicate the implementation of the launch in CompositionalMultiphaseBaseKernels.cpp to be able to compile. Specifically, I had to add a CPU launch without GEOSX_HOST_DEVICE, like:

template<>
void
FluidUpdateKernel::
launch< parallelHostPolicy,
        MultiFluidPVTPackageWrapper::KernelWrapper >( localIndex const size,
                                                      MultiFluidPVTPackageWrapper::KernelWrapper const & fluidWrapper,
                                                      arrayView1d< real64 const > const & pres,
                                                      arrayView1d< real64 const > const & dPres,
                                                      real64 const temp,
                                                      arrayView2d< real64 const > const & compFrac )
{
  forAll< parallelHostPolicy >( size, [=] ( localIndex const k )
  {
    for( localIndex q = 0; q < fluidWrapper.numGauss(); ++q )
    {
      fluidWrapper.update( k, q, pres[k] + dPres[k], temp, compFrac[k] );
    }
  } );
}

And another launch for NewMultiPhaseMultiComponentFluid that looks like (name of the class is temporary):

template<>                                                                                                                                    
void                                                                                                                                          
FluidUpdateKernel::                                                                                                                           
launch< parallelDevicePolicy<>,                                                                                                               
        NewMultiPhaseMultiComponentFluid::KernelWrapper >( localIndex const size,                                                             
                                                           NewMultiPhaseMultiComponentFluid::KernelWrapper const & fluidWrapper,              
                                                           arrayView1d< real64 const > const & pres,                                          
                                                           arrayView1d< real64 const > const & dPres,                                         
                                                           real64 const temp,                                                                 
                                                           arrayView2d< real64 const > const & compFrac )                                     
{                                                                                                                                             
  forAll< parallelDevicePolicy<> >( size, [=] GEOSX_HOST_DEVICE ( localIndex const k )                                                        
  {                                                                                                                                           
    for( localIndex q = 0; q < fluidWrapper.numGauss(); ++q )                                                                                 
    {                                                                                                                                         
      fluidWrapper.update( k, q, pres[k] + dPres[k], temp, compFrac[k] );                                                                     
    }                                                                                                                                         
  } );                                                                                                                                        
}

If this is ok as a temporary solution, this is going to save me a lot of time :)

I also tried the #ifndef __CUDACC__ strategy, but I realized that I would have ended up ifndef'ing a lot of code in the MultiFluidPVTPackageWrapperUpdate (a lot of getter calls after the Update) and (for the present branch) in MultiPhaseMultiComponentFluidUpdate.

@klevzoff
Contributor

I also tried the #ifndef __CUDACC__ strategy, but I realized that I would have ended up ifndef'ing a lot of code in the MultiFluidPVTPackageWrapperUpdate (a lot of getter calls after the Update) and (for the present branch) in MultiPhaseMultiComponentFluidUpdate.

You could also just #ifdef out the entire body of that function. The point is it should never be called from the device anyway (if we specify execution policies correctly), so it doesn't matter what's inside as long as it compiles :)

@francoishamon
Contributor Author

Ok, good, I missed that point; then it is easier. I just gave it a try quickly (but there is a high possibility that I made a mistake somewhere, so I will double-check tomorrow): if I keep the MultiFluidBase interface with GEOSX_HOST_DEVICE markers on all the compute and update functions (base and derived classes), and I ifndef out the body of the update and compute functions for all the non-GPU-ready models, I get a compilation error on Lassen:

src/coreComponents/constitutive/fluid/MultiPhaseMultiComponentFluid.hpp(204): 
error: calling a __host__ function("std::__shared_count<( ::__gnu_cxx::_Lock_policy)2> ::~__shared_count") from a __host__ __device__ function("geosx::constitutive::MultiPhaseMultiComponentFluidUpdate::~MultiPhaseMultiComponentFluidUpdate") is not allowed                                                                 
                                                                                                                                              
src/coreComponents/constitutive/fluid/MultiPhaseMultiComponentFluid.hpp(204): 
error: calling a __host__ function("std::vector<    ::std::shared_ptr<const  ::geosx::PVTProps::PVTFunction> , ::std::allocator<    ::std::shared_ptr<const  ::geosx::PVTProps::PVTFunction> > > ::~vector") from a __host__ __device__ function("geosx::constitutive::MultiPhaseMultiComponentFluidUpdate::~MultiPhaseMultiComponentFluidUpdate") is not allowed 

It is as if the update function was still calling some kind of function (destructor?) related to the shared pointers present in the class, even with an ifndef'ed out body... I am stopping here, and I will try again tomorrow. Thanks for the help!

@klevzoff
Contributor

Seems that MultiPhaseMultiComponentFluidUpdate doesn't have a user-provided destructor; perhaps nvcc is generating one and making it __host__ __device__ by default? Maybe adding one (and not making it host-device) would fix it - as long as it's never called from a device kernel (which it shouldn't be). Also, the destructors would have to be made non-virtual - they have the same problems as other methods (w.r.t. host-device markers). This goes against common advice for class hierarchies, but in this case it's legitimate: we're never going to delete these objects via pointer-to-base.

If you don't want to spend more time on these build issues, feel free to push the current state and I'll try to take a look. Alternatively, let's merge as is and pursue device compilation in a separate PR.

@francoishamon
Contributor Author

francoishamon commented Feb 26, 2021

@klevzoff I implemented your suggestion (#ifndef ...) in the CO2-brine on GPU branch (#1325), and it seems to work great. I was not able to overcome the compilation error that I mentioned above, even with the addition of a destructor. So I figured that the best use of our time was to just remove the old implementation of MultiPhaseMultiComponentFluidUpdate (which I was only keeping for comparison purposes), and with that everything compiles and works.

So, to summarize: I added the GEOSX_HOST_DEVICE markers to all the compute and update functions in the base class and the derived classes, I used #ifndef __CUDACC__ in the compute and update functions of MultiFluidPVTPackageWrapper, and I removed the old implementation of MultiPhaseMultiComponentFluidUpdate (which I was planning to remove at the end anyway).

One thing that broke: if I use parallelHostPolicy for CompositionalMultiphaseFluid, the compositional unit tests (testCompMultiphaseFlow) return garbage results and break (I did not invest time to understand why to be honest). So I kept serialPolicy for the BlackOilFluid and CompositionalMultiphaseFluid models.

For this Dead-Oil branch, I think it is best to try to merge it (after addressing the comments above) with the GEOSX_HOST_DEVICE markers commented out, and wait for the CO2-brine branch to be merged to uncomment everything.

@klevzoff
Contributor

One thing that broke: if I use parallelHostPolicy for CompositionalMultiphaseFluid, the compositional unit tests (testCompMultiphaseFlow) return garbage results and break (I did not invest time to understand why to be honest). So I kept serialPolicy for the BlackOilFluid and CompositionalMultiphaseFluid.

Yes, I was wrong on that one. Thomas' tests verified that there are no data races between different instances of PVTPackage fluid objects, but the way we currently use it, there's a single stateful fluid object shared between all cells, and there's most certainly a data race there (as Thomas mentions above). It should remain serial for now.
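
With the exec_policy alias suggested above, that simply amounts to (sketch):

class CompositionalMultiphaseFluid : public MultiFluidPVTPackageWrapper
{
public:
  using exec_policy = serialPolicy; // single stateful PVTPackage object shared by all cells
};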

For this Dead-Oil branch, I think it is best to try to merge it (after addressing the comments above) with the GEOSX_HOST_DEVICE markers commented out, and wait for the CO2-brine branch to be merged to uncomment everything.

Great, no objections from me. Feel free to merge whenever.

@klevzoff klevzoff mentioned this pull request Feb 27, 2021
Comment on lines +485 to +496
{
  // check that the entry is large enough to enforce the boundary condition
  if( entries[j] >= 0 && entries[j] < minDiagonal )
  {
    entries[j] = minDiagonal;
  }
  else if( entries[j] < 0 && entries[j] > -minDiagonal )
  {
    entries[j] = -minDiagonal;
  }
  diagonal = entries[j];
}
Contributor Author


@klevzoff One thing that did not work well with the analytical derivatives of the new Dead-Oil model was in this function: with analytical Dead-Oil the derivatives of the mass conservation equation of component ic wrt the component density of jc (jc \neq ic) are extremely small (~ 1e-18), and for the first iteration of the first time step, they are simply equal to zero (dR_ic/dRho_jc = 0). So, in the equation/variable ordering of the compositional solver (below), entries[dof] is equal to zero. As a result, when we apply the Dirichlet BC, the matrix becomes singular because we zero out the other entries on this row.

I added a check for the value of entries[dof], and that fixes the issue. However, I did not do anything for the other SpecifyFieldValue function (using LAI::ParallelMatrix) just above because it is not entirely clear to me that we still use it (and where?) since we moved to CRSMatrix for the assembly.

Ordering for the compositional solver

                  dPres dRho_{ic = 0} dRho_{ic = 1} dRho_{ic = 2}
Mass Cons ic = 0
Mass Cons ic = 1
Mass Cons ic = 2
Vol Balance

Contributor

@klevzoff klevzoff Mar 3, 2021


@francoishamon I don't think we use the other version of SpecifyFieldValue (in fact, it needs to be removed), so this fix should be just fine.

As for the equation/variable alignment, fixing this has been on the radar since the beginning but I never got around to implementing it. The plan was to have the first equation be the total mass conservation (which would align with pressure DoF), followed by component mass cons from ic=0 to NC-2; and also have the last DoF be total density instead of last component's density (which would align with vol balance). This ordering would guarantee nonzero diagonals and would probably lead to slightly more stable simulations.

I was initially hoping to achieve this with kernel-usable dense LA approach (pre- and post-multiplying each cell's local jacobian by small matrices to rearrange rows and columns into other linear combinations), which we didn't have at the time; now we could try something with tensorOps. However, there's a lot of zeros in these permutation matrices, so a hand-rolled reordering routine might be better.
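
For the equation part, a hand-rolled recombination could be as simple as the following sketch (hypothetical helper, not existing code), which folds the component rows of a cell-local system into a leading total-mass row:

template< localIndex NDOF >
GEOSX_HOST_DEVICE
void combineIntoTotalMassRow( localIndex const numComp,
                              real64 (&localResidual)[NDOF],
                              real64 (&localJacobian)[NDOF][NDOF] )
{
  // row 0 becomes the total mass conservation equation (sum of all component mass rows);
  // in this simplified variant the remaining component rows are left untouched
  for( localIndex ic = 1; ic < numComp; ++ic )
  {
    localResidual[0] += localResidual[ic];
    for( localIndex jdof = 0; jdof < NDOF; ++jdof )
    {
      localJacobian[0][jdof] += localJacobian[ic][jdof];
    }
  }
}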

The equation lineup isn't too difficult to change by hand at the final stage of the kernel where we assemble component mass balances. Changing the last DOF is a more annoying task. Both changes will propagate to other kernels (residual norm, apply/check solution, etc). Reservoir-well solvers would be affected as well, so would future multiphase poromechanics. Ultimately, it's a fairly extensive change.

Contributor Author


Ok, I see; this seems like an equation/variable alignment that would be nice to have.

Regarding the change in equation lineup, it would be really useful to have the choice for the first equation between 1) the total mass conservation, and 2) the pressure equation of the sequential implicit method. If I understood correctly the way the SI pressure equation is typically constructed, we need to compute weights (and their derivatives wrt pressure) that are used to form a linear combination of the mass conservation equations that eliminates the derivatives wrt the non-pressure dofs. I think this needs to happen at the nonlinear level, since the derivatives of the weights wrt pressure need to be introduced (but maybe there is a smarter way to do it entirely at the linear level, I am not sure yet). I am saying all this because @jafranc will be implementing his research on the SI method in GEOSX, and this capability will be very useful to have at that point.
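
To make the construction explicit (notation assumed, just restating the above): the pressure equation would be R_p = sum_ic w_ic * R_ic, with the weights w_ic chosen such that sum_ic w_ic * dR_ic/dRho_jc = 0 for every jc; the Jacobian of R_p then picks up the extra dw_ic/dp terms mentioned above.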

Just to make sure I understand, when you suggest the tensorOps way and/or the hand-rolled reordering, would you do that A) after each compute function (accumulation, flux, wells, etc) when we call the RAJA atomic adds, or B) just once when the CRSMatrix is fully assembled and it is time to apply the boundary conditions? If the answer is B), maybe we could try this first, less invasive step before addressing the change in the last dof?

Contributor


I was initially thinking about A (specifically because a dense pre/post-multiply would be really easy and fast to do when you have a local dense block on the thread stack). However B sounds quite feasible for re-arranging equations, in particular because we can apply it to entire assembled rows as you mention, so it works effortlessly for coupled solvers. The downside is we need to do an extra kernel launch and operate on global memory again, but it's probably not too bad. This approach also seems less convenient for forming SI pressure equation because you need to manipulate individual row entries (pressure derivatives) in CSR as opposed to just adding entire rows (which you can do blindly, provided that sparsity patterns are the same between rows). But we should definitely explore this idea.

@francoishamon francoishamon merged commit 80ab48f into develop Mar 10, 2021
@francoishamon francoishamon deleted the feature/hamon/do branch March 10, 2021 07:10