Skip to content

Conversation

@leoraba
Copy link
Contributor

@leoraba leoraba commented Jun 24, 2025

Summary

Fix issue where processing a submission with multiple files falls in a race condition where some data get lost.

Note: This PR is part of the PR #146. In order to split larger PRs and make it easier for code review, this PR only include fixes for the Insert New Data process.

Issues

Description of Changes

Data Provider Package

The Submission Service now handles batch processing of multiple files or entity datasets, updating submission data accordingly where each submission is updated in the database only once, and the entire dataset is validated a single time per submission.
Making this process in a batch based approach ensures consistency and helps to prevent race conditions.

Extras included:

  • Added Unit tests to new functions, to make sure response is as expected.

Readiness Checklist

  • Self Review
    • I have performed a self review of code
    • I have run the application locally and manually tested the feature
    • I have checked all updates to correct typos and misspellings
  • Formatting
    • Code follows the project style guide
    • Autmated code formatters (ie. Prettier) have been run
  • Local Testing
    • Successfully built all packages locally
    • Successfully ran all test suites, all unit and integration tests pass
  • Updated Tests
    • Unit and integration tests have been added that describe the bug that was fixed or the features that were added
  • Documentation
    • All new environment variables added to .env.schema file and documented in the README
    • All changes to server HTTP endpoints have open-api documentation
    • All new functions exported from their module have TSDoc comment documentation

const resultSubmission = await service.submit({
records: payload,
entityName,
data: { [entityName]: payload },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is significant change, where submit function can handle multiple entities at the same time, allowing the payload to be treated as a batch and being validated only once, avoiding raicing condition.

Copy link
Contributor

@joneubank joneubank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider improving that one test description that I left a comment on. The rest of this change is simple and effective.

Comment on lines 118 to 121
export const isNotNull = <T>(value: T): value is Exclude<T, null> => {
return value !== null;
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function hardly seems necessary... but we can add it since a lot of the methods here are also not needed but seem convenient.

In the newer versions of TS you don't need to declare value is ... in the type signature if the function returns a boolean after narrowing the type. The TS compiler will infer the type narrowing correctly. This is the preferred way to write these function signatures since developers can write the type narrowing wrong, or the function can be modified later causing the type narrowing to be invalid. Best to let the compiler infer it. Generally, consider value is X similar to asserting a type as X.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function in fact was used in one occasion, and decided to remove it in favor of notEmpty to avoid maintaining unnecessary code.

},
],
};
it('should build insert records correctly for valid input', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unclear. If this failed I would not know what the test is attempting to do and likely wouldn't be able to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rephrased this sentence to : should convert a collection of raw entity records into typed batches ready for insertion

* @returns An array of valid typed data records.
*/
export const convertToTypedRecords = (dataRecords: Record<string, unknown>[], schema: Schema): DataRecord[] => {
return Object.values(dataRecords).map(convertRecordToString).map(getSchemaParser(schema)).filter(notEmpty);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for .filter(notEmpty)

@leoraba leoraba merged commit 2b8320c into main Jul 8, 2025
2 checks passed
@leoraba leoraba deleted the fix-26-data-loss-submission-insert branch July 8, 2025 19:14
@leoraba leoraba mentioned this pull request Jul 9, 2025
leoraba added a commit that referenced this pull request Jul 9, 2025
* feat: export migration scripts (#113)

* export migration scripts

* export DbConfig from data-model

Export models

* export dist path (#116)

* Feat #120 - Get previous submissions paginated (#121)

* get submission by id

* retrieve submissions by category

* get submission by organization

* relocate submission unit tests

* Add customizable callback for post-commit (#126)

* on finish commit callback

* check record data changes

* fix bytes parsing number (#128)

* Submit data using JSON format (#133)

* remove file reading

* Update types.ts

* fix typescript error & remove deprecated endpoint

* Fix issue comparing Submitted Data property names (#136)

* remove invalid keys

* refactoring update entity data function

* split submitted data unit test file

* logging improvement

* Fix Delete submitted data issue (#137)

* return when record has no data dependencies

* filter unique records to delete

* eslint rule to enforce curly braces

* make curly linting a suggestion warning

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* New Feature - Authentication Middleware Injection (#123)

* retrieve username from userSession

* rename config file

* rename auth middleware

* remove unused code

* custom auth handler

* auth middleware

* fix readme typos

* auth custom handle error codes

* auth write privilege

* log module

* update auth handler readme

* auth configuration

* update auth README

* updated docker compose file

* Update packages/data-provider/src/utils/authUtils.ts

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* Lyric Dev Documentation (#106)

* initializing branch for documentation site

* templated overview page

* minor fix

* minor update

* minor change

* template update

* minor + test

* updated readme

* testing branch commit issue in build

* updating contributing & code of conduct

* minor update

* updated overview page

* links

* updated image

* Updated inline with PR feedback

* minor update

* netlify link

* updated cross referenced urls to docs.overture.bio

* removed code of conduct (.github covers this) removed repository structure in readme (it is in the overview)

* updated submission system diagram

* Update README.md

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* Request user authentication (#141)

* auth required on modification endpoints

* get submissions by user name

* lowercase username variable

* update auth custom handler readme

* configure protected methods

* bypass auth function

* Upgrade Lectern Client to 2.0.0-beta.4 (#144)

* fix(auth): update customAuthHandler to accpet async functions

* #26: Fix Submit new data - Data loss on Submission with multiple files (#148)

* fix submit data

* small refactor code

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation (#124)

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation

* validator query endpoint

* external validation config

* update documentation

* validator exists endpoint

* refactor code for readability

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* 0.10.0

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Co-authored-by: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com>
Co-authored-by: Jon Eubank <joneubank@gmail.com>
Co-authored-by: James Lopez <jamestlopez.code@gmail.com>
Co-authored-by: Azher2Ali <121898125+Azher2Ali@users.noreply.github.com>
leoraba added a commit that referenced this pull request Jul 11, 2025
* feat: export migration scripts (#113)

* export migration scripts

* export DbConfig from data-model

Export models

* export dist path (#116)

* Feat #120 - Get previous submissions paginated (#121)

* get submission by id

* retrieve submissions by category

* get submission by organization

* relocate submission unit tests

* Add customizable callback for post-commit (#126)

* on finish commit callback

* check record data changes

* fix bytes parsing number (#128)

* Submit data using JSON format (#133)

* remove file reading

* Update types.ts

* fix typescript error & remove deprecated endpoint

* Fix issue comparing Submitted Data property names (#136)

* remove invalid keys

* refactoring update entity data function

* split submitted data unit test file

* logging improvement

* Fix Delete submitted data issue (#137)

* return when record has no data dependencies

* filter unique records to delete

* eslint rule to enforce curly braces

* make curly linting a suggestion warning

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* New Feature - Authentication Middleware Injection (#123)

* retrieve username from userSession

* rename config file

* rename auth middleware

* remove unused code

* custom auth handler

* auth middleware

* fix readme typos

* auth custom handle error codes

* auth write privilege

* log module

* update auth handler readme

* auth configuration

* update auth README

* updated docker compose file

* Update packages/data-provider/src/utils/authUtils.ts

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* Lyric Dev Documentation (#106)

* initializing branch for documentation site

* templated overview page

* minor fix

* minor update

* minor change

* template update

* minor + test

* updated readme

* testing branch commit issue in build

* updating contributing & code of conduct

* minor update

* updated overview page

* links

* updated image

* Updated inline with PR feedback

* minor update

* netlify link

* updated cross referenced urls to docs.overture.bio

* removed code of conduct (.github covers this) removed repository structure in readme (it is in the overview)

* updated submission system diagram

* Update README.md

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* Request user authentication (#141)

* auth required on modification endpoints

* get submissions by user name

* lowercase username variable

* update auth custom handler readme

* configure protected methods

* bypass auth function

* Upgrade Lectern Client to 2.0.0-beta.4 (#144)

* feat(provider): add data file template download route by data category

* Updating the lectern dependency and fixing imports

* Changes related to the feedback

* Refactoring the request validation

* Refactoring the data-model and code cleanup

* fix: Resolving Build failures

* Refactor: Minor code fixes and adding routes to files

* Refactor: Simplify type annotation

* fix(auth): update customAuthHandler to accpet async functions

* #26: Fix Submit new data - Data loss on Submission with multiple files (#148)

* fix submit data

* small refactor code

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation (#124)

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation

* validator query endpoint

* external validation config

* update documentation

* validator exists endpoint

* refactor code for readability

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* refactor(dictionary): simplify routes and params to use categoryId

* Resolving built issues in schemas.ts

* refactor: move categoryId to path param and clean up controller imports and request parsing

* 0.11.0

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Co-authored-by: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com>
Co-authored-by: Jon Eubank <joneubank@gmail.com>
Co-authored-by: Azher2Ali <121898125+Azher2Ali@users.noreply.github.com>
Co-authored-by: James Lopez <jamestlopez.code@gmail.com>
leoraba added a commit that referenced this pull request Sep 16, 2025
* feat: export migration scripts (#113)

* export migration scripts

* export DbConfig from data-model

Export models

* export dist path (#116)

* Feat #120 - Get previous submissions paginated (#121)

* get submission by id

* retrieve submissions by category

* get submission by organization

* relocate submission unit tests

* Add customizable callback for post-commit (#126)

* on finish commit callback

* check record data changes

* fix bytes parsing number (#128)

* Submit data using JSON format (#133)

* remove file reading

* Update types.ts

* fix typescript error & remove deprecated endpoint

* Fix issue comparing Submitted Data property names (#136)

* remove invalid keys

* refactoring update entity data function

* split submitted data unit test file

* logging improvement

* Fix Delete submitted data issue (#137)

* return when record has no data dependencies

* filter unique records to delete

* eslint rule to enforce curly braces

* make curly linting a suggestion warning

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* New Feature - Authentication Middleware Injection (#123)

* retrieve username from userSession

* rename config file

* rename auth middleware

* remove unused code

* custom auth handler

* auth middleware

* fix readme typos

* auth custom handle error codes

* auth write privilege

* log module

* update auth handler readme

* auth configuration

* update auth README

* updated docker compose file

* Update packages/data-provider/src/utils/authUtils.ts

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* Lyric Dev Documentation (#106)

* initializing branch for documentation site

* templated overview page

* minor fix

* minor update

* minor change

* template update

* minor + test

* updated readme

* testing branch commit issue in build

* updating contributing & code of conduct

* minor update

* updated overview page

* links

* updated image

* Updated inline with PR feedback

* minor update

* netlify link

* updated cross referenced urls to docs.overture.bio

* removed code of conduct (.github covers this) removed repository structure in readme (it is in the overview)

* updated submission system diagram

* Update README.md

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* Request user authentication (#141)

* auth required on modification endpoints

* get submissions by user name

* lowercase username variable

* update auth custom handler readme

* configure protected methods

* bypass auth function

* Upgrade Lectern Client to 2.0.0-beta.4 (#144)

* feat(provider): add data file template download route by data category

* Updating the lectern dependency and fixing imports

* Changes related to the feedback

* Refactoring the request validation

* Refactoring the data-model and code cleanup

* fix: Resolving Build failures

* Refactor: Minor code fixes and adding routes to files

* Refactor: Simplify type annotation

* fix(auth): update customAuthHandler to accpet async functions

* #26: Fix Submit new data - Data loss on Submission with multiple files (#148)

* fix submit data

* small refactor code

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation (#124)

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation

* validator query endpoint

* external validation config

* update documentation

* validator exists endpoint

* refactor code for readability

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* refactor(dictionary): simplify routes and params to use categoryId

* Resolving built issues in schemas.ts

* refactor: move categoryId to path param and clean up controller imports and request parsing

* fix(docker): resolve docker issues not connecting auth

* feat(data): add stream endpoint

* feat(view): add view to stream endpoint

* feat(transformer): add transformer config

* fix(stream): cleanup stream logic and use totalRecords

* fix(stream): fix variable increment logic

* feat(swagger): add swagger for stream

* fix(stream): resolve issues with content-type

* fix(readme): add transformer documentation

* refactor(stream): remove transformer function

* feat(helper): create isDataRecordValue helper function

* fix(util): incorrect return boolean

* chore(import): remove unused import

* chore(logger): update logger message

* feat(entityNames): add entity names filter option and comments

* feat(swagger): add entityName param

* fix(stream): bail if no data is returned

* Update Dockerfile

* 0.12.0

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Co-authored-by: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com>
Co-authored-by: Jon Eubank <joneubank@gmail.com>
Co-authored-by: Azher2Ali <121898125+Azher2Ali@users.noreply.github.com>
Co-authored-by: James Lopez <jamestlopez.code@gmail.com>
leoraba added a commit that referenced this pull request Sep 25, 2025
* feat: export migration scripts (#113)

* export migration scripts

* export DbConfig from data-model

Export models

* export dist path (#116)

* Feat #120 - Get previous submissions paginated (#121)

* get submission by id

* retrieve submissions by category

* get submission by organization

* relocate submission unit tests

* Add customizable callback for post-commit (#126)

* on finish commit callback

* check record data changes

* fix bytes parsing number (#128)

* Submit data using JSON format (#133)

* remove file reading

* Update types.ts

* fix typescript error & remove deprecated endpoint

* Fix issue comparing Submitted Data property names (#136)

* remove invalid keys

* refactoring update entity data function

* split submitted data unit test file

* logging improvement

* Fix Delete submitted data issue (#137)

* return when record has no data dependencies

* filter unique records to delete

* eslint rule to enforce curly braces

* make curly linting a suggestion warning

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* New Feature - Authentication Middleware Injection (#123)

* retrieve username from userSession

* rename config file

* rename auth middleware

* remove unused code

* custom auth handler

* auth middleware

* fix readme typos

* auth custom handle error codes

* auth write privilege

* log module

* update auth handler readme

* auth configuration

* update auth README

* updated docker compose file

* Update packages/data-provider/src/utils/authUtils.ts

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>

* Lyric Dev Documentation (#106)

* initializing branch for documentation site

* templated overview page

* minor fix

* minor update

* minor change

* template update

* minor + test

* updated readme

* testing branch commit issue in build

* updating contributing & code of conduct

* minor update

* updated overview page

* links

* updated image

* Updated inline with PR feedback

* minor update

* netlify link

* updated cross referenced urls to docs.overture.bio

* removed code of conduct (.github covers this) removed repository structure in readme (it is in the overview)

* updated submission system diagram

* Update README.md

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* Request user authentication (#141)

* auth required on modification endpoints

* get submissions by user name

* lowercase username variable

* update auth custom handler readme

* configure protected methods

* bypass auth function

* Upgrade Lectern Client to 2.0.0-beta.4 (#144)

* feat(provider): add data file template download route by data category

* Updating the lectern dependency and fixing imports

* Changes related to the feedback

* Refactoring the request validation

* Refactoring the data-model and code cleanup

* fix: Resolving Build failures

* Refactor: Minor code fixes and adding routes to files

* Refactor: Simplify type annotation

* fix(auth): update customAuthHandler to accpet async functions

* #26: Fix Submit new data - Data loss on Submission with multiple files (#148)

* fix submit data

* small refactor code

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation (#124)

* Add Validation Endpoint with Configurable Category, Entity, and Field Validation

* validator query endpoint

* external validation config

* update documentation

* validator exists endpoint

* refactor code for readability

---------

Co-authored-by: Leonardo Rivera <leorivera_88@hotmail.com>

* refactor(dictionary): simplify routes and params to use categoryId

* Resolving built issues in schemas.ts

* refactor: move categoryId to path param and clean up controller imports and request parsing

* fix(docker): resolve docker issues not connecting auth

* feat(data): add stream endpoint

* feat(view): add view to stream endpoint

* feat(transformer): add transformer config

* fix(stream): cleanup stream logic and use totalRecords

* fix(stream): fix variable increment logic

* feat(swagger): add swagger for stream

* fix(stream): resolve issues with content-type

* fix(readme): add transformer documentation

* refactor(stream): remove transformer function

* feat(helper): create isDataRecordValue helper function

* fix(util): incorrect return boolean

* chore(import): remove unused import

* chore(logger): update logger message

* feat(entityNames): add entity names filter option and comments

* feat(swagger): add entityName param

* fix(stream): bail if no data is returned

* Update Dockerfile

* #79: Enable auth to read data (#157)

* enable auth to read data

* read access validation

* filter repository by organization

* custom request user session

* code refactoring

* 0.13.0

* version 0.13.0

---------

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Co-authored-by: Mitchell Shiell <59712867+MitchellShiell@users.noreply.github.com>
Co-authored-by: Jon Eubank <joneubank@gmail.com>
Co-authored-by: Azher2Ali <121898125+Azher2Ali@users.noreply.github.com>
Co-authored-by: James Lopez <jamestlopez.code@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants