Problem
ColumnValueProvider.column_value() in datafaker/providers.py:27–36 selects a random row using:
query = select(orm_class).order_by(functions.random()).limit(1)
SQLAlchemy's functions.random() compiles to random() for all dialects. MS-SQL does not have a random() function and raises an error at runtime:
sqlalchemy.exc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 18
for SQL Server][SQL Server]'random' is not a recognized built-in function name.")
The correct MS-SQL equivalents are RAND() (returns a scalar float, usable in ORDER BY) or NEWID() (generates a random GUID, commonly used for random row ordering).
Root cause
functions.random() is not dialect-aware — SQLAlchemy emits random() unconditionally regardless of the connected engine. PostgreSQL and DuckDB both expose a random() function, so this works on those engines, but fails on MS-SQL.
Impact
Any df.py that calls column_value against an MS-SQL source database will fail. This includes OMOP-style schemas where many FK columns are populated via column_value lookups against vocabulary tables (e.g. concept).
Proposed fix
Detect the dialect at runtime inside column_value and substitute the appropriate random expression:
@staticmethod
def column_value(
db_connection: Connection, orm_class: Any, column_name: str
) -> Any:
"""Return a random value from the column specified."""
dialect = db_connection.dialect.name
if dialect == "mssql":
random_fn = func.newid()
else:
random_fn = functions.random()
query = select(orm_class).order_by(random_fn).limit(1)
random_row = db_connection.execute(query).first()
if random_row:
return getattr(random_row, column_name)
return None
NEWID() is preferred over RAND() for MS-SQL row ordering because RAND() returns the same value for every row in a single query execution, giving a non-random sort. NEWID() produces a unique value per row and is the standard MS-SQL idiom for ORDER BY random().
Reference
The existing dialect-specific hooks in datafaker/create.py (e.g. remove_mssql_identity, remove_mssql_on_delete_cascade) show the established pattern for handling dialect differences in this codebase.
Problem
ColumnValueProvider.column_value()indatafaker/providers.py:27–36selects a random row using:SQLAlchemy's
functions.random()compiles torandom()for all dialects. MS-SQL does not have arandom()function and raises an error at runtime:The correct MS-SQL equivalents are
RAND()(returns a scalar float, usable inORDER BY) orNEWID()(generates a random GUID, commonly used for random row ordering).Root cause
functions.random()is not dialect-aware — SQLAlchemy emitsrandom()unconditionally regardless of the connected engine. PostgreSQL and DuckDB both expose arandom()function, so this works on those engines, but fails on MS-SQL.Impact
Any
df.pythat callscolumn_valueagainst an MS-SQL source database will fail. This includes OMOP-style schemas where many FK columns are populated viacolumn_valuelookups against vocabulary tables (e.g.concept).Proposed fix
Detect the dialect at runtime inside
column_valueand substitute the appropriate random expression:NEWID()is preferred overRAND()for MS-SQL row ordering becauseRAND()returns the same value for every row in a single query execution, giving a non-random sort.NEWID()produces a unique value per row and is the standard MS-SQL idiom forORDER BY random().Reference
The existing dialect-specific hooks in
datafaker/create.py(e.g.remove_mssql_identity,remove_mssql_on_delete_cascade) show the established pattern for handling dialect differences in this codebase.