Skip to content

Conversation

@francoishamon
Copy link
Contributor

@francoishamon francoishamon commented Feb 15, 2022

As suggested here, this PR combines the multifluid derivatives (pressure, temperature, component fractions) into a single array to reduce the size of the fluid kernel wrapper and make sure that we can compile on GPU platforms in the thermal PRs #1755 and #1225.

I am getting the following result here:

  • develop: sizeof(fluidWrapper) = 3096 bytes
  • this branch: sizeof(fluidWrapper) = 2136 bytes

So I am confident that I will be able to compile with the thermal option activated.

Requires rebaseline, but no numerical diffs: https://github.com/GEOSX/integratedTests/pull/193

@TotoGaz
Copy link
Contributor

TotoGaz commented Feb 15, 2022

develop: sizeof(fluidWrapper) = 3096 bytes
this branch: sizeof(fluidWrapper) = 2136 bytes

😨 🎉 👍 🥇

Copy link
Contributor

@TotoGaz TotoGaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job so far, can't wait to see if anything can be done about #1225 (comment) 🤞

{
dPhaseVolFrac_dComp[ip][jc] =
(dPhaseFrac_dComp[ip][jc] - phaseVolFrac[ip] * dPhaseDens_dComp[ip][jc]) * phaseDensInv;
(dPhaseFrac[ip][Deriv::dC+jc] - phaseVolFrac[ip] * dPhaseDens[ip][Deriv::dC+jc]) * phaseDensInv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet this Deriv::dC+jc is a trick to put keep the same dimension for all your arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the current version of the PR, the derivatives are grouped and ordered as follows:

  • 0: derivative wrt pressure
  • 1: derivative wrt temperature
  • 2: first derivative wrt component fraction

With this approach, there is only one array storing derivatives for a given physical quantity. For instance, for the phase component fractions (of size: numPhases x numComps), there is one array of derivatives of size numPhases x numComps x numDofs, with numDofs = numComps + 2.

@francoishamon francoishamon marked this pull request as ready for review February 17, 2022 00:38
@francoishamon
Copy link
Contributor Author

The code in the PR is now working and ready for review.

The new implementation is a little more difficult to read than before and introduces a different treatment between the derivatives living in the solver (phaseMobility for instance) and in the constitutive model (phaseFraction), but it reduces a lot the size of the arguments passed in the kernel wrapper, which was the point of this PR.

I would be glad to try implementing this approach: #1225 (comment) and #1225 (comment), if we want something cleaner.

Copy link
Contributor

@klevzoff klevzoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that that, in addition to solving a technical problem, this PR is a net decrease in code size, and readability is not affected much. In fact, I'm glad to get rid of dX_dY names - there was always something unsettling about those underscores violating our naming rules.

internal::ArraySliceOrRef< T, DIM, USD > dPres; /// derivative w.r.t. pressure
internal::ArraySliceOrRef< T, DIM, USD > dTemp; /// derivative w.r.t. temperature
internal::ArraySliceOrRef< T, DIM + 1, USD_DC > dComp; /// derivative w.r.t. composition
internal::ArraySliceOrRef< T, DIM + 1, USD_DC > derivs; /// derivative w.r.t. pressure, temperature, compositions
Copy link
Contributor

@klevzoff klevzoff Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where (along with View class below) where we would potentially add dP, dT and dC methods as described in #1225 (comment) and #1225 (comment) if we agree on that idea. The raw slice derivs need to remain accessible though, because it must be passed directly to applyChainRule.

This would somewhat decrease the flexibility of these classes however. For example, currently they are usable both with and without temperature derivatives - the owner decides whether or not it is present by resizing and then using the derivs array accordingly. By adding a dT() method we are forcing the temperature derivative to be always present. These classes could also be used outside of MultiFluid family, anywhere one need to bundle values with a bunch of derivatives - for example in RelativePermeabilityBase and CapillaryPressureBase, physics solvers, etc.

For that reason I'd propose a refactoring in which this family of classes is moved into some shared place (common probably?) under a different name (ValuesAndDerivatives[View|Slice]?), while here we create MultiFluid-specific wrappers that either inherit from or contain the above classes and add the specialized access methods. I would propose maybe for this to be a separate PR.

I am also not opposed to just keeping the use of Deriv::dP, etc. as explicit offsets for the foreseeable future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I perceive this pattern as really similar (in nature) to the extrinsicMeshData wrappings of #1750 (the current situation brings additional complexity).

There would be an extremely carefully designed implementation (in the current case, some CompositionalVarView, in the case of #1750, the *StencilAccessors), that may be "specialized" depending on the context. That way, if you do not want the dT, then don't ask your smart implementation for space to store dT, and don't provide the dT accessors. Therefore why would we lose flexibility?

For example, it's fine for the underlying implementation to provide variadic accessors real64 & dT( INDICES ... indices ) but providing a more constrained real64 & dT( int a, int b, int c ) member function will be simpler in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for the following plan:

  1. We merge this PR so that I can then finish and merge Activated thermal option in CO2-brine fluid model #1755 using the derivative format implemented in this PR
  2. Once Activated thermal option in CO2-brine fluid model #1755 is merged, we work on implementing the ValuesAndDerivatives getters for the constitutive models and solvers

Even though this plan implies a transition period during which Deriv::dP etc will be used, it will allow me to start testing #1225 soon, which is important.

Regarding the idea of implementing getters (dP(), dT(), dC()) to get the derivatives in the kernels, will it slow down the code a little bit? Or not significantly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way, if you do not want the dT, then don't ask your smart implementation for space to store dT, and don't provide the dT accessors. Therefore why would we lose flexibility?

What I meant is, if we just added dT() getter to this class, it would become unusable in context where there is no temperature dependence. But of course as you said, if we provide appropriately specialized wrappers for different contexts, there will be no issues 👍

I would vote for the following plan:

Sounds good to me!

Regarding the idea of implementing getters (dP(), dT(), dC()) to get the derivatives in the kernels, will it slow down the code a little bit? Or not significantly?

I expect them to be inlined 100% of the time, so no slowdown, if only a tiny compilation time hit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go! 🚗 💨

I would vote for the following plan:

If I were you, I would open an issue with multiple tick boxes, references to meaningful comments, etc.

Copy link
Contributor

@TotoGaz TotoGaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving forward!

template< typename VEC1, typename VEC2 >
GEOSX_HOST_DEVICE
void copy( integer const N, VEC1 const & v1, VEC2 const & v2 )
void copy( integer const N,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bother you with this, but documenting why we need this firstDeriv could be helpful (in which context we need this, e.g.)

More important, instead of providing firstDeriv, couldn't v2 be a range/slice instead? That would keep this function more mathematical and less implementation aware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as we discussed, it would be nice to passe a slice or range, but I don't know how to do it.
Here for example:
https://github.com/GEOSX/GEOSX/blob/3e72aadf62b7083dd7f434052f3d90c8f3470230/src/coreComponents/constitutive/fluid/MultiFluidBase.hpp#L722
my phaseFrac.derivs[ip] is already a slice of size numComp+2, but I don't need first two entries (derivatives with respect to pressure and temperature). I would need a way to skip them, maybe by doing:

applyChainRuleInPlace( numComp, dCompMoleFrac_dCompMassFrac, phaseFrac.derivs[ip][Deriv::dC], work );

Now that I am writing this, I should probably try that to see if it works.

For now, I added the Doxygen documentation in the applyChainRule function to explain why we have to pass Deriv::dC.

Copy link
Contributor

@klevzoff klevzoff Feb 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case v2 represents a general single-subscriptable entity, which may be a local array or an ArraySlice. The latter may be not contiguous in memory, and in that case precomputing an offset pointer is not safe or correct. So we have to rely on the type's operator[] to do the job at all times.

Theoretically it's possible to write a SubSlice type which represents a sub-selection of an ArraySlice along one or more dimensions. The type would store the offset(s) internally and apply as needed. But we don't have it yet.

Comment on lines +1268 to +1273
integer counter = 0;
for( integer iComp = 0; iComp < 3; ++iComp )
{
for( localIndex iPres = 0; iPres < 3; ++iPres )
for( integer iPres = 0; iPres < 3; ++iPres )
{
for( localIndex iTemp = 0; iTemp < 3; ++iTemp )
for( integer iTemp = 0; iTemp < 3; ++iTemp )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that you are changing these localIndex into integer but I do think that you are wasting your energy.
If we really want to do this, then a carefully written search/replace on the whole code base will do the duty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I heard the discussion this morning about integer loops but I still do not fully understand. Take this loop

for( integer ip = 0; ip < numPhase; ++ip )
{
  phaseMolecularWeight[ip] = props.getMolecularWeight( m_phaseTypes[ip] ).value;
  phaseMolecularWeight[ip][Deriv::dP] = props.getMolecularWeight( m_phaseTypes[ip] ).dP;
  [...]

I thing the indexing type of phaseMolecularWeight or phaseMolecularWeight is localIndex. So what is the point at using integer instead of phaseMolecularWeight::IndexType which appear to be a localIndex? Performances really get improved there?

Copy link
Contributor

@klevzoff klevzoff Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this is not about performance but semantics. A localIndex is a counterpart to globalIndex where "local" and "global" refer, generally speaking, to parallel partition of the physical domain and mesh. Therefore these terms make sense when indexing elements, nodes, faces, etc. which can be either local or global. Model properties like number of phases or components are in a different category - they are much smaller, and the notion of local/global does not apply to them. Ideally we could introduce separate aliases for them, e.g.

using phaseIndex = /* some type */;
using componentIndex = /* some type */;

but I thought this might be a bit overkill. Therefore I started changing, and suggesting to others in reviews (mainly @francoishamon who did most of the work), those indices to a generic integer. As a secondary concern, we also get to save some bytes in kernel wrapper objects, which sometimes need to carry phase/component indices as members. But it's a really minor saving compared to what this PR achieves at scale.

Now, phaseMolecularWeight::IndexType is indeed currently a localIndex. But that is also wrong in my opinion! For certain "small" arrays (e.g. phaseMolecularWeight or componentMolecularWeight) we could be using a smaller type such as integer as index type, and it would also help a bit with reducing the size of kernel arguments. I don't advocate necessarily that we implement this change, because the increasing complexity in the type system has cognitive costs (there would be array1d<T> and some array1dSmallIndex<T> to keep in mind). But neither am I concerned about integer promotions taking place, just like I'd never force someone to change

std::vector< double > v;
v[0] ...

to

v[ std::vector< double >::size_type(0) ] ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for the explanations. Some kind of Hungarian stuff on types, in a nutshell?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can think of it that way. A type name that conveys meaning makes a certain type of errors less likely. But like everything (and like Hungarian), it's good in moderation.

@francoishamon francoishamon added the flag: requires rebaseline Requires rebaseline branch in integratedTests label Feb 19, 2022
@francoishamon francoishamon force-pushed the refactor/hamon/multifluidDerivatives branch from 211fda5 to b9b88e3 Compare February 23, 2022 05:54
@francoishamon francoishamon added ci: run CUDA builds Allows to triggers (costly) CUDA jobs and removed flag: ready for review labels Feb 23, 2022
@francoishamon francoishamon merged commit 1414a7d into develop Feb 23, 2022
@francoishamon francoishamon deleted the refactor/hamon/multifluidDerivatives branch February 23, 2022 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci: run CUDA builds Allows to triggers (costly) CUDA jobs flag: requires rebaseline Requires rebaseline branch in integratedTests type: cleanup / refactor Non-functional change (NFC)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants