Commit cd80d9c

Add ActiveDirectoryServicePrincipal support for bulk copy
Wires a Python token-factory callback into the mssql-py-core connection context so bulk copy can authenticate with `Authentication=ActiveDirectoryServicePrincipal`. The callback is invoked by mssql-tds mid-handshake (FedAuth workflow 0x02), receives the STS URL from the server, parses tenant_id from it, and uses azure-identity's ClientSecretCredential to acquire a JWT, matching what ODBC does for the query path.

Why a callback rather than pre-acquire (Model A):
- ServicePrincipal needs `tenant_id` to build ClientSecretCredential.
- The connection string does not carry it; azure-identity does not discover it. The only place we can learn tenant_id client-side is from the STS URL the server hands back during the login handshake (which is exactly what ODBC does internally).
- A pre-acquire flow therefore can't work.

Saurabh approved the callback model on EXP--Drivers--Python (May 13 2026).

Scope: ServicePrincipal only. ActiveDirectoryPassword and ActiveDirectoryIntegrated remain on their existing code paths (still rejected in py-core today with a clear error). Separate follow-ups.

Changes:
- constants.py: Add AuthType.SERVICE_PRINCIPAL.
- auth.py:
  - Add ServicePrincipalAuth.make_token_factory(client_id, secret) that returns a (spn, sts_url, auth_method) -> bytes callable matching the entra_id_token_factory contract py-core expects.
  - Add _parse_tenant_id() helper.
  - process_auth_parameters: detect SP and return auth_type=None (let ODBC handle the query path natively; msodbcsql 17.3+ has native SP support).
  - extract_auth_type: propagate "serviceprincipal" so bulkcopy can distinguish the SP path.
- cursor.py bulkcopy: when _auth_type == "serviceprincipal", build the factory from connection-string UID/PWD and register it via the new entra_id_token_factory dict key. Keep authentication/user_name/password in pycore_context (py-core's auth validator + transformer need them to resolve the method to ActiveDirectoryServicePrincipal before the factory is dispatched).

Existing Default/DeviceCode/Interactive (Model A) flow unchanged.

Requires mssql-py-core 0.1.5+, which wires the entra_id_token_factory dict key into ClientContext.auth_method_map.

Tests: 17 new in test_008_auth.py covering tenant parsing (GUID/domain/query-string/trailing-slash/empty/etc.), credential kwarg forwarding, scope construction, UTF-16LE encoding, and error paths (missing client_id/secret, bad STS URL, auth failure propagation).

Partial fix for #534.
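The callback contract described in the commit message can be sketched end to end without network access. This is a minimal illustration, not the committed implementation: `FakeCredential` and `FakeToken` are hypothetical stand-ins for azure-identity's `ClientSecretCredential` and its token object, and the SPN and STS URL values are made up for the example.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


class FakeToken:
    """Hypothetical stand-in for the token object; only .token is used."""

    def __init__(self, token: str) -> None:
        self.token = token


class FakeCredential:
    """Hypothetical stub replacing azure.identity.ClientSecretCredential."""

    def __init__(self, tenant_id: str, client_id: str, client_secret: str) -> None:
        self.tenant_id = tenant_id

    def get_token(self, scope: str) -> FakeToken:
        # A real credential would call Entra ID; the stub echoes its inputs.
        return FakeToken(f"jwt-for:{self.tenant_id}:{scope}")


def parse_tenant_id(sts_url: str) -> Optional[str]:
    """Return the first path segment of the STS URL, or None if there is none."""
    path = urlparse(sts_url).path.strip("/")
    return path.split("/", 1)[0] if path else None


def make_token_factory(client_id: str, client_secret: str) -> Callable[[str, str, str], bytes]:
    def factory(spn: str, sts_url: str, auth_method: str) -> bytes:
        # The tenant is only known once the server has sent the STS URL.
        tenant_id = parse_tenant_id(sts_url)
        if not tenant_id:
            raise RuntimeError(f"Could not extract tenant_id from STS URL: {sts_url!r}")
        credential = FakeCredential(tenant_id, client_id, client_secret)
        # The caller passes a resource SPN; the credential expects a scope.
        scope = spn if spn.endswith("/.default") else spn.rstrip("/") + "/.default"
        # The token is returned as UTF-16LE bytes, the FedAuth wire format.
        return credential.get_token(scope).token.encode("utf-16-le")

    return factory


factory = make_token_factory("my-client-id", "my-secret")
wire = factory(
    "https://database.windows.net/",
    "https://login.microsoftonline.com/contoso.onmicrosoft.com/",
    "ActiveDirectoryServicePrincipal",
)
print(wire.decode("utf-16-le"))
```

Because the tenant is resolved inside `factory` rather than captured when the factory is built, the same callable works for whichever tenant the server reports at handshake time.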
1 parent c776ede commit cd80d9c

4 files changed

Lines changed: 335 additions & 18 deletions

mssql_python/auth.py

Lines changed: 111 additions & 0 deletions
@@ -130,6 +130,103 @@ def _acquire_token(auth_type: str) -> Tuple[bytes, str]:
             raise RuntimeError(f"Failed to create {credential_class.__name__}: {e}") from e


+def _parse_tenant_id(sts_url: str) -> Optional[str]:
+    """Extract tenant ID (GUID or domain) from a FedAuthInfo STS URL.
+
+    Expected formats:
+        https://login.microsoftonline.com/<tenant>/
+        https://login.microsoftonline.com/<tenant>/?...
+        https://login.microsoftonline.com/<tenant>
+    where <tenant> is either a GUID (e.g. ``aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee``)
+    or a verified domain (e.g. ``contoso.onmicrosoft.com``). Both forms are
+    accepted by ``azure.identity.ClientSecretCredential``.
+    """
+    # pylint: disable=import-outside-toplevel
+    from urllib.parse import urlparse
+
+    try:
+        parsed = urlparse(sts_url)
+    except (ValueError, AttributeError):
+        return None
+    path = (parsed.path or "").strip("/")
+    if not path:
+        return None
+    first_segment = path.split("/", 1)[0]
+    return first_segment or None
+
+
+class ServicePrincipalAuth:
+    """Builds an ``entra_id_token_factory`` callable for ActiveDirectoryServicePrincipal.
+
+    The bulkcopy path through mssql-py-core uses callback-based token
+    acquisition (FedAuth workflow ``0x02``) because tenant_id is only known
+    from the STS URL that the server returns during the TDS handshake.
+    """
+
+    @staticmethod
+    def make_token_factory(client_id: str, client_secret: str):
+        """Return a callable suitable for ``entra_id_token_factory``.
+
+        Signature: ``(spn: str, sts_url: str, auth_method: str) -> bytes``.
+        Returns the JWT encoded as UTF-16LE bytes (the TDS FedAuth wire format).
+        ``ClientSecretCredential`` is constructed inside the callable so
+        each invocation rebuilds it; tenant is parsed from ``sts_url``
+        on every call rather than captured.
+        """
+        if not client_id:
+            raise ValueError("ServicePrincipal auth requires a non-empty client_id (UID)")
+        if not client_secret:
+            raise ValueError("ServicePrincipal auth requires a non-empty client_secret (PWD)")
+
+        def _factory(spn: str, sts_url: str, auth_method: str) -> bytes:
+            # pylint: disable=import-outside-toplevel,unused-argument
+            try:
+                from azure.identity import ClientSecretCredential
+                from azure.core.exceptions import ClientAuthenticationError
+            except ImportError as e:
+                raise RuntimeError(
+                    "Azure authentication libraries are not installed. "
+                    "Please install with: pip install azure-identity azure-core"
+                ) from e
+
+            tenant_id = _parse_tenant_id(sts_url)
+            if not tenant_id:
+                raise RuntimeError(
+                    f"Could not extract tenant_id from STS URL: {sts_url!r}"
+                )
+
+            logger.info(
+                "ServicePrincipal token factory: acquiring token for tenant=%s, spn=%s",
+                tenant_id,
+                spn,
+            )
+            try:
+                credential = ClientSecretCredential(
+                    tenant_id=tenant_id,
+                    client_id=client_id,
+                    client_secret=client_secret,
+                )
+                # mssql-tds passes the resource SPN; azure-identity wants a scope.
+                scope = spn if spn.endswith("/.default") else spn.rstrip("/") + "/.default"
+                token = credential.get_token(scope).token
+                logger.info(
+                    "ServicePrincipal token factory: token acquired, length=%d chars",
+                    len(token),
+                )
+                return token.encode("utf-16-le")
+            except ClientAuthenticationError as e:
+                logger.error(
+                    "ServicePrincipal authentication failed: tenant=%s, error=%s",
+                    tenant_id,
+                    str(e),
+                )
+                raise RuntimeError(
+                    f"ServicePrincipal authentication failed for tenant '{tenant_id}': {e}"
+                ) from e
+
+        return _factory
+
+
 def process_auth_parameters(parameters: List[str]) -> Tuple[List[str], Optional[str]]:
     """
     Process connection parameters and extract authentication type.
@@ -180,6 +277,19 @@ def process_auth_parameters(parameters: List[str]) -> Tuple[List[str], Optional[
                 # Default authentication (uses DefaultAzureCredential)
                 logger.debug("process_auth_parameters: Default Azure authentication detected")
                 auth_type = "default"
+            elif value_lower == AuthType.SERVICE_PRINCIPAL.value:
+                # ServicePrincipal authentication. ODBC (msodbcsql 17.3+)
+                # handles this natively for regular queries, so leave
+                # auth_type=None to let ODBC own the query path.
+                # Bulkcopy still needs the auth type — extract_auth_type()
+                # propagates it as "serviceprincipal" so the bulkcopy path
+                # can register an entra_id_token_factory callback (Model B,
+                # required because tenant_id is only known from the STS URL
+                # that the server returns during the FedAuth handshake).
+                logger.debug(
+                    "process_auth_parameters: Service principal authentication detected"
+                )
+                auth_type = None
         modified_parameters.append(param)

     logger.debug(
@@ -246,6 +356,7 @@ def extract_auth_type(connection_string: str) -> Optional[str]:
         AuthType.INTERACTIVE.value: "interactive",
         AuthType.DEVICE_CODE.value: "devicecode",
         AuthType.DEFAULT.value: "default",
+        AuthType.SERVICE_PRINCIPAL.value: "serviceprincipal",
     }
     for part in connection_string.split(";"):
         key, _, value = part.strip().partition("=")
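The two auth.py functions treat ServicePrincipal differently: `process_auth_parameters` reports `None` so ODBC owns the query path, while `extract_auth_type` reports `"serviceprincipal"` for bulk copy. The sketch below illustrates only the second mapping. The enum values and the map entries come from the diff; the key comparison inside the loop is an assumption, since the hunk cuts off after the `partition` call, and the enum may have members the diff does not show.

```python
from enum import Enum
from typing import Optional


class AuthType(Enum):
    # Only the members visible in the constants.py hunk.
    INTERACTIVE = "activedirectoryinteractive"
    DEVICE_CODE = "activedirectorydevicecode"
    DEFAULT = "activedirectorydefault"
    SERVICE_PRINCIPAL = "activedirectoryserviceprincipal"


AUTH_MAP = {
    AuthType.INTERACTIVE.value: "interactive",
    AuthType.DEVICE_CODE.value: "devicecode",
    AuthType.DEFAULT.value: "default",
    AuthType.SERVICE_PRINCIPAL.value: "serviceprincipal",
}


def extract_auth_type(connection_string: str) -> Optional[str]:
    for part in connection_string.split(";"):
        key, _, value = part.strip().partition("=")
        # Assumption: key and value are matched case-insensitively.
        if key.strip().lower() == "authentication":
            return AUTH_MAP.get(value.strip().lower())
    return None


conn_str = (
    "Server=example.database.windows.net;"
    "Authentication=ActiveDirectoryServicePrincipal;"
    "UID=my-client-id;PWD=my-secret"
)
print(extract_auth_type(conn_str))
```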

mssql_python/constants.py

Lines changed: 1 addition & 0 deletions
@@ -337,6 +337,7 @@ class AuthType(Enum):
     INTERACTIVE = "activedirectoryinteractive"
     DEVICE_CODE = "activedirectorydevicecode"
     DEFAULT = "activedirectorydefault"
+    SERVICE_PRINCIPAL = "activedirectoryserviceprincipal"


 class SQLTypes:

mssql_python/cursor.py

Lines changed: 52 additions & 18 deletions
@@ -2934,24 +2934,58 @@ def bulkcopy(
         # Token acquisition — only thing cursor must handle (needs azure-identity SDK)
         if self.connection._auth_type:
             # Fresh token acquisition for mssql-py-core connection
-            from mssql_python.auth import AADAuth
-
-            try:
-                raw_token = AADAuth.get_raw_token(self.connection._auth_type)
-            except (RuntimeError, ValueError) as e:
-                raise RuntimeError(
-                    f"Bulk copy failed: unable to acquire Azure AD token "
-                    f"for auth_type '{self.connection._auth_type}': {e}"
-                ) from e
-            pycore_context["access_token"] = raw_token
-            # Token replaces credential fields — py-core's validator rejects
-            # access_token combined with authentication/user_name/password.
-            for key in ("authentication", "user_name", "password"):
-                pycore_context.pop(key, None)
-            logger.debug(
-                "Bulk copy: acquired fresh Azure AD token for auth_type=%s",
-                self.connection._auth_type,
-            )
+            from mssql_python.auth import AADAuth, ServicePrincipalAuth
+
+            if self.connection._auth_type == "serviceprincipal":
+                # Model B: callback-based. tenant_id is only known from the
+                # STS URL the server returns mid-handshake, so we register a
+                # factory that py-core invokes during FedAuth (workflow 0x02).
+                client_id = params.get("uid", "")
+                client_secret = params.get("pwd", "")
+                if not client_id or not client_secret:
+                    raise RuntimeError(
+                        "Bulk copy with Authentication=ActiveDirectoryServicePrincipal "
+                        "requires UID (client_id) and PWD (client_secret) in the "
+                        "connection string."
+                    )
+                try:
+                    factory = ServicePrincipalAuth.make_token_factory(
+                        client_id, client_secret
+                    )
+                except (RuntimeError, ValueError) as e:
+                    raise RuntimeError(
+                        f"Bulk copy failed: unable to build ServicePrincipal "
+                        f"token factory: {e}"
+                    ) from e
+                pycore_context["entra_id_token_factory"] = factory
+                # Keep authentication/user_name/password in pycore_context —
+                # py-core's auth validator + transformer need them to resolve
+                # the auth method to ActiveDirectoryServicePrincipal before
+                # the factory is dispatched at handshake time.
+                logger.debug(
+                    "Bulk copy: registered ServicePrincipal token factory for client_id=%s",
+                    client_id,
+                )
+            else:
+                # Model A: pre-acquired token. Used for Default, DeviceCode,
+                # Interactive (non-Windows), and any other AD method whose
+                # tenant_id is discoverable client-side via Azure Identity SDK.
+                try:
+                    raw_token = AADAuth.get_raw_token(self.connection._auth_type)
+                except (RuntimeError, ValueError) as e:
+                    raise RuntimeError(
+                        f"Bulk copy failed: unable to acquire Azure AD token "
+                        f"for auth_type '{self.connection._auth_type}': {e}"
+                    ) from e
+                pycore_context["access_token"] = raw_token
+                # Token replaces credential fields — py-core's validator rejects
+                # access_token combined with authentication/user_name/password.
+                for key in ("authentication", "user_name", "password"):
+                    pycore_context.pop(key, None)
+                logger.debug(
+                    "Bulk copy: acquired fresh Azure AD token for auth_type=%s",
+                    self.connection._auth_type,
+                )

         pycore_connection = None
         pycore_cursor = None
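The practical difference between the two branches is what ends up in the context handed to py-core. The following is a simplified sketch under stated assumptions, not the cursor code itself: `apply_bulkcopy_auth` is a hypothetical helper, the `server` key and all values are invented for illustration, and the token and factory providers are injected as stubs so the example runs without azure-identity.

```python
from typing import Any, Callable, Dict


def apply_bulkcopy_auth(
    context: Dict[str, Any],
    auth_type: str,
    params: Dict[str, str],
    acquire_token: Callable[[str], str],
    make_factory: Callable[[str, str], Callable[[str, str, str], bytes]],
) -> Dict[str, Any]:
    """Return a copy of the context prepared for Model A or Model B."""
    ctx = dict(context)
    if auth_type == "serviceprincipal":
        # Model B: register a callback; credential fields stay in place.
        client_id = params.get("uid", "")
        client_secret = params.get("pwd", "")
        if not client_id or not client_secret:
            raise RuntimeError("UID (client_id) and PWD (client_secret) are required")
        ctx["entra_id_token_factory"] = make_factory(client_id, client_secret)
    else:
        # Model A: a pre-acquired token replaces the credential fields.
        ctx["access_token"] = acquire_token(auth_type)
        for key in ("authentication", "user_name", "password"):
            ctx.pop(key, None)
    return ctx


base = {
    "server": "example.database.windows.net",
    "authentication": "ActiveDirectoryServicePrincipal",
    "user_name": "my-client-id",
    "password": "my-secret",
}


def stub_acquire(auth_type: str) -> str:
    return "raw-jwt"


def stub_make_factory(client_id: str, client_secret: str):
    return lambda spn, sts_url, auth_method: b""


sp_ctx = apply_bulkcopy_auth(
    base, "serviceprincipal", {"uid": "my-client-id", "pwd": "my-secret"},
    stub_acquire, stub_make_factory,
)
default_ctx = apply_bulkcopy_auth(base, "default", {}, stub_acquire, stub_make_factory)
print(sorted(sp_ctx))
print(sorted(default_ctx))
```

In the ServicePrincipal case the context keeps `authentication`, `user_name`, and `password` alongside the factory, whereas the pre-acquired path drops them because, per the diff comment, py-core's validator rejects them in combination with `access_token`.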
