Skip to content

Add Import-DbaParquet command#10309

Open
jovanpop-msft wants to merge 14 commits into
dataplat:developmentfrom
jovanpop-msft:copilot/import-dba-parquet-coverage
Open

Add Import-DbaParquet command#10309
jovanpop-msft wants to merge 14 commits into
dataplat:developmentfrom
jovanpop-msft:copilot/import-dba-parquet-coverage

Conversation

@jovanpop-msft
Copy link
Copy Markdown

Introduce the Import-DbaParquet command along with enhancements for schema management, error handling, and stability. Implement exact size logic for varchar and varbinary types, and add a parameter to store strings as UTF-8. Fix issues related to binary type mapping and automatic table creation.

jovanpop-msft and others added 9 commits April 1, 2026 19:22
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardrails

(do Import-DbaParquet)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Author

@jovanpop-msft jovanpop-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[WIP] Added Import-DbaParquet

Copy link
Copy Markdown
Author

@jovanpop-msft jovanpop-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding Import-DbaParquet

@jovanpop-msft jovanpop-msft changed the title Add Import-DbaParquet command [DRAFT] Add Import-DbaParquet command Apr 3, 2026
@jovanpop-msft jovanpop-msft marked this pull request as draft April 3, 2026 22:39
@niphlod
Copy link
Copy Markdown
Contributor

niphlod commented Apr 3, 2026

hum, not sure we want to add 1MB of added dependency here (also, dependency hell is already ... hellish).
Can those DLLs be externalized, and downloaded if needed ?
Also, do they work for both PS and PSCore ?

@potatoqualitee
Copy link
Copy Markdown
Member

yes, this is a great candidate to be externalized like we did with SqlPackage. I'll take a closer look soon! Thank you so much @jovanpop-msft 🙇🏼

@potatoqualitee
Copy link
Copy Markdown
Member

Still appreciate this, just gotta get some time to take a close look and treat it well 🙏🏼

@potatoqualitee
Copy link
Copy Markdown
Member

working on this now 👍🏼

(do Install-DbaParquet, Get-DbaParquetPath, Import-DbaParquet)
@potatoqualitee potatoqualitee marked this pull request as ready for review May 3, 2026 17:43
@potatoqualitee potatoqualitee changed the title [DRAFT] Add Import-DbaParquet command Add Import-DbaParquet command May 3, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 3, 2026

Code Review: Add Import-DbaParquet command

Thanks for this substantial contribution! Parquet import is a highly requested capability and the overall architecture is solid. Here's my review:


Overview

This PR adds two new public commands (Import-DbaParquet, Install-DbaParquet) and one private helper (Get-DbaParquetPath) to support importing Parquet files into SQL Server via Parquet.NET. Commands are correctly registered in both dbatools.psd1 and dbatools.psm1.


Strengths

  • External dependency managed cleanly via Install-DbaParquet with NuGet resolution and offline (-LocalFile) support
  • Good transaction safety with commit/rollback logic
  • Optimize-ColumnSize post-import optimization is a smart touch
  • Parameterized queries used for schema/table existence checks — avoids SQL injection
  • $shouldMapCorrectTypes / Get-TableDefinitionFromInfoSchema pattern mirrors Import-DbaCsv nicely
  • Tests cover a wide range of paths (AutoCreate, Truncate, UTF-8, ordinal mapping, static columns, schema-from-filename)

Issues and Suggestions

Critical / Bugs

1. Get-DbaParquetPathStop-Function + throw double-fires on error

Stop-Function -Message $message -Target "Parquet.NET"
if ($EnableException) {
    throw "Parquet.NET not found"
}

Stop-Function -EnableException:$EnableException already handles the throw internally. The explicit throw after it is redundant and will cause double exceptions. Use the standard pattern:

Stop-Function -Message $message -Target "Parquet.NET" -EnableException $EnableException
return $null

2. New-SqlTable uses backtick for line continuation in verbose message

Write-Message -Level Verbose -Message "Successfully created table $schema.$table with the following column definitions:`n $($sqldatatypes -join "`n ")"

Backtick inside a double-quoted string for \n is fine here (it's not line continuation), but it's worth noting that the Stop-Function -Continue in New-SqlTable silently swallows errors — consider passing -ErrorRecord $_.

3. Get-TableDefinitionFromInfoSchema silently swallows exceptions

} catch {
    # callers report back the error if $result is empty
}

The catch block is entirely empty. Callers only check if $result is empty, so genuine connection errors are silently discarded. At minimum, Write-Message -Level Debug the exception.

4. $transaction referenced before assignment in Get-TableDefinitionFromInfoSchema

$sqlcmd = New-Object Microsoft.Data.SqlClient.SqlCommand($query, $sqlconn, $transaction)

This inner function is called from begin {} scope but $transaction is only set in process {}. In PowerShell's scoping rules this will pick up $null from an outer scope rather than failing cleanly.

5. $shouldMapCorrectTypes set but never used
The flag is set to $true when a table already exists (line ~1053) and $false on auto-create, but it is never checked in the bulk copy / column mapping logic below. The variable appears to be dead code carried over from Import-DbaCsv.

6. Final end {} block references $sqlconn/$bulkCopy out of scope

end {
    try {
        if (-not $startedWithAnOpenConnection) {
            $null = $sqlconn.Close()   # $sqlconn is set inside process foreach

When processing multiple files or instances, $sqlconn/$bulkCopy/$parquetReader in end {} will reference whatever the last loop iteration left behind. This cleanup is already done inside process {} in the finally block — the end {} cleanup is a redundant duplicate and could close a connection that was intentionally left open (user-provided connection).


Style / Convention Issues

7. [bool]$StoreStringAsUtf8 = $true should be a switch
Per CLAUDE.md: "Use [switch] for boolean flags, not [bool] parameters". This creates an awkward calling pattern (-StoreStringAsUtf8:$false) which is mentioned in the docs but is non-standard for dbatools. Consider making it -NoUtf8 as a switch instead, or keeping [bool] with a note explaining the conscious deviation.

8. $sqldatatypes = @(); — trailing semicolon
Minor: @(); has an unnecessary semicolon.

9. .NOTES author inconsistency
Import-DbaParquet credits Chrissy LeMaire (@cl) with a 2018 copyright (copy-paste from Import-DbaCsv), while Install-DbaParquet and Get-DbaParquetPath credit dbatools team with 2026. The actual author (jovanpop-msft) should be credited. Per CLAUDE.md conventions: "Include Claude as author — List 'the dbatools team + Claude' in .NOTES when creating commands."

10. Parallel parameters exposed but not implemented
-Parallel, -ThrottleLimit, -ParallelBatchSize are declared and immediately short-circuit with Stop-Function. Consider either removing them entirely until implemented, or at least not documenting them as "reserved for future use" in user-facing help — it creates confusion.

11. Missing commit message (do ...) CI pattern
Per CLAUDE.md: "Always include the (do ...) pattern" in commit messages to limit CI test runs. The PR title doesn't follow this convention. Suggested: Add Import-DbaParquet and Install-DbaParquet (do *Parquet*).


Performance

12. Row-by-row DataTable construction in PowerShell
Get-ParquetDataTable constructs a DataTable by iterating every row in PowerShell script, which will be slow for large files. For very large Parquet row groups (millions of rows) this will be the bottleneck. Consider adding a note in the docs about this limitation. A future improvement could be a native .NET IDataReader wrapper around the Parquet column arrays.

13. Optimize-ColumnSize runs outside the transaction
This is intentional (noted in comments) but it means a table can be left in a half-optimized state if the optimization is interrupted. Not a blocking issue, just worth a doc note.


Minor

  • $periodFound variable is used without initialization — if $PSBoundParameters.Table is not set and the filename has no period, $periodFound will be $null/$false, which is the correct fallback behavior, but it's fragile. Initialize to $false explicitly.
  • Convert-ParquetTypeToSqlType uses $ClrType (capital C) while the parameter is $clrType (lowercase) — PowerShell is case-insensitive, so this works, but it's inconsistent.
  • The $file variable is reused for both the resolved path AND the filename stem ($filename = [IO.Path]::GetFileNameWithoutExtension($file)), then $file is passed to Get-ParquetReader which expects the full path. This works because $file is set before $filename is reassigned, but it's confusing naming.

Summary

The core functionality is well-implemented and the test coverage is good. The main things to address before merge:

  1. Fix the double-exception pattern in Get-DbaParquetPath
  2. Remove the dead $shouldMapCorrectTypes variable or wire it up
  3. Remove or guard the duplicate cleanup in end {}
  4. Fix the .NOTES author/copyright fields
  5. Consider removing the unimplemented -Parallel family of parameters

Great addition to dbatools!

(do Install-DbaParquet, Import-DbaParquet)
Rename the AutoCreateTable string flag from StoreStringAsUtf8 to a clearer switch NoUtf8 (invert logic by setting StoreStringAsUtf8 = -not $NoUtf8). Update public docs, parameter help, and tests to reflect the new switch. Remove unused parallel-related parameters and associated tests, tighten error handling (pass EnableException to Stop-Function, propagate ErrorRecord), fix naming/casing and quoting inconsistencies, and improve SQL parameter usage and switch literal consistency. Also add a transaction parameter to GetTableDefinition and update authorship/copyright metadata to 2026.

(do Install-DbaParquet, Import-DbaParquet)
(do Install-DbaParquet, Import-DbaParquet)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants