Commit 2e8701d

consolidate to one upload_file method

Author: Max Wang
1 parent eb458b3 commit 2e8701d

4 files changed

Lines changed: 137 additions & 111 deletions

README.md

Lines changed: 6 additions & 6 deletions
@@ -7,7 +7,7 @@ A minimal Python SDK to use Microsoft Dataverse as a database for Azure AI Found
 - Bulk create — Pass a list of records to `create(...)` to invoke the bound `CreateMultiple` action; returns `list[str]` of GUIDs. If `@odata.type` is absent the SDK resolves the logical name from metadata (cached).
 - Bulk update — Call `update_multiple(entity_set, records)` to invoke the bound `UpdateMultiple` action; returns nothing. Each record must include the real primary key attribute (e.g. `accountid`).
 - Retrieve multiple (paging) — Generator-based `get_multiple(...)` that yields pages, supports `$top` and Prefer: `odata.maxpagesize` (`page_size`).
-- Upload files — 3 methods to upload files to file column. See https://learn.microsoft.com/en-us/power-apps/developer/data-platform/file-column-data?tabs=sdk#upload-files
+- Upload files — Call `upload_file(entity_set, ...)` and a upload method will be auto picked (user can also overwrite the upload mode). See https://learn.microsoft.com/en-us/power-apps/developer/data-platform/file-column-data?tabs=sdk#upload-files
 - Metadata helpers — Create/inspect/delete simple custom tables (EntityDefinitions + Attributes).
 - Pandas helpers — Convenience DataFrame oriented wrappers for quick prototyping/notebooks.
 - Auth — Azure Identity (`TokenCredential`) injection.
@@ -20,7 +20,7 @@ A minimal Python SDK to use Microsoft Dataverse as a database for Azure AI Found
 - Bulk create via `CreateMultiple` (collection-bound) by passing `list[dict]` to `create(entity_set, payloads)`; returns list of created IDs.
 - Bulk update via `UpdateMultiple` by calling `update_multiple(entity_set, records)` with primary key attribute present in each record; returns nothing.
 - Retrieve multiple with server-driven paging: `get_multiple(...)` yields lists (pages) following `@odata.nextLink`. Control total via `$top` and per-page via `page_size` (Prefer: `odata.maxpagesize`).
-- Upload files either with dv message blocks, a single request (supports file size up to 128 MB), or in chunks
+- Upload files, using either dv message blocks, a single request (supports file size up to 128 MB), or chunk upload under the hood
 - Optional pandas integration (`PandasODataClient`) for DataFrame based create / get / query.

 Auth:
@@ -171,17 +171,17 @@ Notes:
 3 methods are supported: `upload_file(entity_set, ...)`, `upload_file_small(entity_set, ...)`, `upload_file_chunk(entity_set, ...)`. All returns `None`.

 ```python
-client.upload_file('account', record_id, 'accountid', 'sample_filecolumn', 'test.pdf')
+client.upload_file('account', record_id, 'sample_filecolumn', 'test.pdf')

-client.upload_file_small('account', record_id, 'sample_filecolumn', 'test.pdf')
+client.upload_file('account', record_id, 'sample_filecolumn', 'test.pdf', mode='chunk', if_none_match=True)

-client.upload_file_chunk('account', record_id, 'sample_filecolumn', 'test.pdf')
 ```

 Notes:
-- upload_file uses Dataverse messages and upload the file in Base64 encoded blocks (size limit is 4 MB for the Base64 encoded string), it consists of 3 stages: InitializeFileBlocksUpload, UploadBlock, and CommitFileBlocksUpload. Total number of Web API calls is number of blocks + 2.
+- upload_file picks one of the three methods to use based on file size: if file is less than 128 MB uses upload_file_small, otherwise uses upload_file_chunk. upload_file_block is used when explicitly requested
 - upload_file_small makes a single Web API call and only supports file size < 128 MB
 - upload_file_chunk uses PATCH with Content-Range to upload the file (more aligned with HTTP standard compared to Dataverse messages). It consists of 2 stages 1. PATCH request to get the headers used for actual upload. 2. Actual upload in chunks. It uses x-ms-chunk-size returned in the first stage to determine chunk size (normally 4 MB), and use Content-Range and Content-Length as metadata for the upload.
+- upload_file_block uses Dataverse messages and upload the file in Base64 encoded blocks (size limit is 4 MB for the Base64 encoded string), it consists of 3 stages: InitializeFileBlocksUpload, UploadBlock, and CommitFileBlocksUpload. Total number of Web API calls is number of blocks + 2.

 ## Retrieve multiple with paging
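
Reviewer note on the README hunk above: the new Notes say `upload_file` uses the single-request path for files under 128 MB and the chunked path otherwise, with block upload only when explicitly requested, and that block upload costs "number of blocks + 2" Web API calls. The sketch below restates that rule so it can be checked in isolation. `pick_upload_mode`, `block_mode_api_calls`, and `SMALL_LIMIT_BYTES` are hypothetical names for illustration only and are not taken from the SDK; the block size is left as a parameter because the diff states only the 4 MB limit on the Base64-encoded string, not the raw block size.

```python
import math
from typing import Optional

# Assumption: "128 MB" is read as 128 * 1024 * 1024 bytes.
SMALL_LIMIT_BYTES = 128 * 1024 * 1024


def pick_upload_mode(file_size: int, mode: Optional[str] = None) -> str:
    """Return the strategy name for a file of ``file_size`` bytes."""
    if mode in (None, "auto"):
        # Auto: single request below the limit, chunked upload otherwise.
        return "small" if file_size < SMALL_LIMIT_BYTES else "chunk"
    if mode not in ("small", "chunk", "block"):
        raise ValueError(f"Unknown upload mode: {mode!r}")
    # An explicit mode is passed through; "block" is never auto-selected.
    return mode


def block_mode_api_calls(file_size: int, block_size: int) -> int:
    """Initialize call + one UploadBlock call per block + commit call."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = math.ceil(file_size / block_size)
    return blocks + 2
```

Under these assumptions, a 10 MiB file split into 3 MiB raw blocks needs 4 blocks and therefore 6 calls, while auto mode would have sent the same file in one request.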

examples/quickstart_file_upload.py

Lines changed: 12 additions & 6 deletions
@@ -290,9 +290,10 @@ def get_dataset_info(file_path: Path):
 backoff(lambda: client.upload_file(
     entity_set,
     record_id,
-    pk_attr,
     file_attr_logical,
     str(DATASET_FILE),
+    mode="block",
+    id_attribute=pk_attr,
 ))
 print({"block_upload_completed": True})
 # Immediate download + verify
@@ -319,9 +320,10 @@ def get_dataset_info(file_path: Path):
 backoff(lambda: client.upload_file(
     entity_set,
     record_id,
-    pk_attr,
     file_attr_logical,
     str(replacement_file),
+    mode="block",
+    id_attribute=pk_attr,
 ))
 print({"block_replace_upload_completed": True})
 # Download and verify replacement
@@ -347,11 +349,12 @@ def get_dataset_info(file_path: Path):
 print("Small single-request upload demo:")
 try:
     DATASET_FILE, small_file_size, src_hash = get_dataset_info(_GENERATED_TEST_FILE)
-    backoff(lambda: client.upload_file_small(
+    backoff(lambda: client.upload_file(
         entity_set,
         record_id,
         small_file_attr_logical,
         str(DATASET_FILE),
+        mode="small",
     ))
     print({"small_upload_completed": True, "small_source_size": small_file_size})
     odata = client._get_odata()
@@ -374,11 +377,12 @@ def get_dataset_info(file_path: Path):
 # Now test replacing with an 8MB file
 print("Small single-request upload demo - REPLACE with 8MB file:")
 replacement_file, replace_size_small, replace_hash_small = get_dataset_info(_GENERATED_TEST_FILE_8MB)
-backoff(lambda: client.upload_file_small(
+backoff(lambda: client.upload_file(
     entity_set,
     record_id,
     small_file_attr_logical,
     str(replacement_file),
+    mode="small",
 ))
 print({"small_replace_upload_completed": True, "small_replace_source_size": replace_size_small})
 resp_single_replace = odata._request("get", dl_url_single, headers=odata._headers())
@@ -402,11 +406,12 @@ def get_dataset_info(file_path: Path):
 print("Streaming chunk upload demo (upload_file_chunk):")
 try:
     DATASET_FILE, src_size_chunk, src_hash_chunk = get_dataset_info(_GENERATED_TEST_FILE)
-    backoff(lambda: client.upload_file_chunk(
+    backoff(lambda: client.upload_file(
         entity_set,
         record_id,
         chunk_file_attr_logical,
         str(DATASET_FILE),
+        mode="chunk",
     ))
     print({"chunk_upload_completed": True})
     odata = client._get_odata()
@@ -429,11 +434,12 @@ def get_dataset_info(file_path: Path):
 # Now test replacing with an 8MB file
 print("Streaming chunk upload demo - REPLACE with 8MB file:")
 replacement_file, replace_size_chunk, replace_hash_chunk = get_dataset_info(_GENERATED_TEST_FILE_8MB)
-backoff(lambda: client.upload_file_chunk(
+backoff(lambda: client.upload_file(
     entity_set,
     record_id,
     chunk_file_attr_logical,
     str(replacement_file),
+    mode="chunk",
 ))
 print({"chunk_replace_upload_completed": True})
 resp_chunk_replace = odata._request("get", dl_url_chunk, headers=odata._headers())
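
Reviewer note on the `mode="chunk"` calls in this file: the README describes chunk upload as PATCH requests that carry `Content-Range` and `Content-Length`, with the chunk size taken from the `x-ms-chunk-size` header returned by the first request. The sketch below shows only the range arithmetic for such an upload, using the standard HTTP `bytes start-end/total` form. `content_ranges` is a hypothetical helper written for this note, not a function from the SDK, and it performs no network I/O.

```python
from typing import Iterator, Tuple


def content_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (offset, length, Content-Range value) for each chunk in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < total_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, total_size - offset)
        last = offset + length - 1  # Content-Range end index is inclusive
        yield offset, length, f"bytes {offset}-{last}/{total_size}"
        offset += length


if __name__ == "__main__":
    # Example: an 8 MiB file with a 4 MiB chunk size gives two ranges.
    for offset, length, header in content_ranges(8 * 1024 * 1024, 4 * 1024 * 1024):
        print(offset, length, header)
```

The `length` value is what a caller would send as `Content-Length` for that chunk, which is why it is returned alongside the header string.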

src/dataverse_sdk/client.py

Lines changed: 22 additions & 90 deletions
@@ -264,30 +264,44 @@ def upload_file(
         self,
         entity_set: str,
         record_id: str,
-        id_attribute: str,
         file_name_attribute: str,
         path: str,
         *,
+        mode: Optional[str] = None,
         mime_type: Optional[str] = None,
-    ) -> Dict[str, Any]:
-        """Upload a local file into a Dataverse file column.
+        id_attribute: Optional[str] = None,
+        if_none_match: bool = True,
+    ) -> None:
+        """Upload a file to a Dataverse file column with automatic method selection.

         Parameters
         ----------
         entity_set : str
             Target entity set (plural logical name), e.g. "accounts".
         record_id : str
             GUID of the target record.
-        id_attribute : str
-            Logical name of the record primary key attribute (e.g. ``accountid``).
         file_name_attribute : str
             Logical name of the file column attribute.
         path : str
             Local filesystem path to the file. Stored filename will be the basename of this path.
+        mode : str | None, keyword-only, optional
+            Upload strategy: "auto" (default), "block", "small", or "chunk".
+            - "auto": Automatically selects best method based on file size
+            - "small": Single PATCH request (files <128MB only)
+            - "chunk": Streaming chunked upload (any size, most efficient for large files)
+            - "block": Message-based block upload (any size, compatibility fallback)
         mime_type : str | None, keyword-only, optional
             Explicit MIME type to persist with the file (e.g. "application/pdf"). If omitted the
             lower-level client attempts to infer from the filename extension and falls back to
             ``application/octet-stream``.
+        id_attribute : str | None, keyword-only, optional
+            Logical name of the primary key attribute for the record (e.g. ``accountid``).
+            **Required** when using "block" mode; raises ValueError if omitted.
+            Not used for "small" or "chunk" modes.
+        if_none_match : bool, keyword-only, optional
+            When True (default), sends ``If-None-Match: null`` to only succeed if the column is
+            currently empty. Set False to always overwrite (uses ``If-Match: *``).
+            Used for "small" and "chunk" modes only.

         Returns
         -------
@@ -297,95 +311,13 @@ def upload_file(
         self._get_odata().upload_file(
             entity_set,
             record_id,
-            id_attribute,
             file_name_attribute,
             path,
+            mode=mode,
             mime_type=mime_type,
-        )
-        return None
-
-    def upload_file_small(
-        self,
-        entity_set: str,
-        record_id: str,
-        file_name_attribute: str,
-        path: str,
-        *,
-        content_type: Optional[str] = None,
-        if_none_match: bool = True,
-    ) -> None:
-        """Upload a file (<128MB) in one PATCH request to a file column.
-
-        Parameters
-        ----------
-        entity_set : str
-            Target entity set (plural logical name), e.g. "accounts".
-        record_id : str
-            GUID of the target record (with or without braces / parentheses).
-        file_name_attribute : str
-            Logical name of the file column attribute.
-        path : str
-            Local filesystem path to the file.
-        content_type : str | None
-            Optional explicit MIME type. If omitted a basic guess isn't performed here; defaults to application/octet-stream.
-        if_none_match : bool
-            When True sends ``If-None-Match: null`` to only succeed if the column is currently empty.
-            Set False to always overwrite (uses ``If-Match: *``).
-
-        Returns
-        -------
-        None
-            Returns nothing on success. Raises on failure.
-        """
-        self._get_odata().upload_file_small(
-            entity_set,
-            record_id,
-            file_name_attribute,
-            path,
-            content_type=content_type,
+            id_attribute=id_attribute,
             if_none_match=if_none_match,
         )
         return None

-    def upload_file_chunk(
-        self,
-        entity_set: str,
-        record_id: str,
-        file_name_attribute: str,
-        path: str,
-        *,
-        if_none_match: bool = True,
-    ) -> None:
-        """Stream a local file using native chunked PATCH protocol (x-ms-transfer-mode: chunked).
-
-        Parameters
-        ----------
-        entity_set : str
-            Target entity set (plural logical name), e.g. "accounts".
-        record_id : str
-            GUID of the target record.
-        file_name_attribute : str
-            Logical name of the file column attribute.
-        path : str
-            Local filesystem path to the file.
-        if_none_match : bool
-            When True sends ``If-None-Match: null`` to only succeed if the column is currently empty.
-            Set False to always overwrite (uses ``If-Match: *``).
-
-        Returns
-        -------
-        None
-            Returns nothing on success. Raises on failure.
-        """
-        self._get_odata().upload_file_chunk(
-            entity_set,
-            record_id,
-            file_name_attribute,
-            path,
-            if_none_match=if_none_match,
-        )
-        return None
-
-
-__all__ = ["DataverseClient"]
-
+__all__ = ["DataverseClient"]
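
Reviewer note on the new docstring: it states two argument rules that the wrapper itself does not enforce (it forwards everything to the lower-level client): `id_attribute` is required in "block" mode and raises `ValueError` if omitted, and `if_none_match` selects between `If-None-Match: null` and `If-Match: *` for the "small" and "chunk" modes. The sketch below restates those two rules as standalone functions so the documented behavior can be tested. `check_block_mode_args` and `conditional_headers` are hypothetical names used only here; where the real checks live in the lower-level client is not shown in this diff.

```python
from typing import Dict, Optional


def check_block_mode_args(mode: Optional[str], id_attribute: Optional[str]) -> None:
    """Raise ValueError when block mode is requested without a primary key name."""
    if mode == "block" and not id_attribute:
        raise ValueError("id_attribute is required when mode='block'")


def conditional_headers(if_none_match: bool = True) -> Dict[str, str]:
    """Map the if_none_match flag to the precondition header named in the docstring."""
    if if_none_match:
        # Succeed only if the file column is currently empty.
        return {"If-None-Match": "null"}
    # Always overwrite whatever is stored in the column.
    return {"If-Match": "*"}
```

One thing worth confirming in review: the signature defaults `mode` to `None` while the docstring lists "auto" as the default, so the lower-level client presumably treats `None` as "auto".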
