fix joins for missing key columns #187
Changes from all commits
6cc656c
f6ea709
a34d062
f8e8dcb
5cca119
1c5f364
a870459
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,6 @@ | ||
| {-# LANGUAGE BangPatterns #-} | ||
| {-# LANGUAGE CPP #-} | ||
| {-# LANGUAGE ConstraintKinds #-} | ||
| {-# LANGUAGE ExplicitNamespaces #-} | ||
| {-# LANGUAGE FlexibleContexts #-} | ||
| {-# LANGUAGE GADTs #-} | ||
|
|
@@ -48,6 +50,18 @@ import System.Random | |
| import Type.Reflection | ||
| import Prelude hiding (filter, take) | ||
|
|
||
| #if MIN_VERSION_random(1,3,0) | ||
|
Member
Does this prevent compilation? What GHC version and build tool are you using?
Contributor
Author
No, not for the repo's normal build. This module enables CPP, and `MIN_VERSION_random(...)` is defined by Cabal during preprocessing. I tested it with GHC 9.10.3 and cabal-install 3.16.1.0 in the Nix dev shell, and the repo CI also builds with Cabal. If someone compiles the file with bare GHC outside Cabal, that macro would not be defined.
||
| type SplittableGen g = (SplitGen g, RandomGen g) | ||
|
|
||
| splitForStratified :: SplittableGen g => g -> (g, g) | ||
| splitForStratified = splitGen | ||
| #else | ||
| type SplittableGen g = RandomGen g | ||
|
|
||
| splitForStratified :: SplittableGen g => g -> (g, g) | ||
| splitForStratified = split | ||
| #endif | ||
|
|
||
| -- | O(k * n) Take the first n rows of a DataFrame. | ||
| take :: Int -> DataFrame -> DataFrame | ||
| take n d = d{columns = V.map (takeColumn n') (columns d), dataframeDimensions = (n', c)} | ||
|
|
@@ -513,7 +527,7 @@ ghci> D.stratifiedSample (mkStdGen 42) 0.8 "label" df | |
| -} | ||
| stratifiedSample :: | ||
| forall a g. | ||
| (SplitGen g, RandomGen g, Columnable a) => | ||
| (SplittableGen g, Columnable a) => | ||
| g -> Double -> Expr a -> DataFrame -> DataFrame | ||
| stratifiedSample gen p strataCol df = | ||
| let col = case strataCol of | ||
|
|
@@ -523,7 +537,7 @@ stratifiedSample gen p strataCol df = | |
| go _ [] = mempty | ||
| go g (ixs : rest) = | ||
| let stratum = rowsAtIndices ixs df | ||
| (g1, g2) = splitGen g | ||
| (g1, g2) = splitForStratified g | ||
| in sample g1 p stratum <> go g2 rest | ||
| in go gen groups | ||
|
|
||
|
|
@@ -537,7 +551,7 @@ ghci> D.stratifiedSplit (mkStdGen 42) 0.8 "label" df | |
| -} | ||
| stratifiedSplit :: | ||
| forall a g. | ||
| (SplitGen g, RandomGen g, Columnable a) => | ||
| (SplittableGen g, Columnable a) => | ||
| g -> Double -> Expr a -> DataFrame -> (DataFrame, DataFrame) | ||
| stratifiedSplit gen p strataCol df = | ||
| let col = case strataCol of | ||
|
|
@@ -547,7 +561,7 @@ stratifiedSplit gen p strataCol df = | |
| go _ [] = (mempty, mempty) | ||
| go g (ixs : rest) = | ||
| let stratum = rowsAtIndices ixs df | ||
| (g1, g2) = splitGen g | ||
| (g1, g2) = splitForStratified g | ||
| (tr, va) = randomSplit g1 p stratum | ||
| (trAcc, vaAcc) = go g2 rest | ||
| in (tr <> trAcc, va <> vaAcc) | ||
|
|
||
I meant we should consolidate the two error paths, since the two exceptions are practically the same. ColumnNotFound should be a special case of the columns-not-found error in which a single column is missing.
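To make the suggestion concrete, here is a minimal sketch of what a consolidated error path could look like. It is not the library's actual code: the `ColumnsNotFound` type, the `columnNotFound` helper, and `checkJoinKeys` are hypothetical names chosen for illustration, and column names are assumed to be plain `String`s.

```haskell
import Control.Exception (Exception)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical consolidated error: one or more columns are missing.
-- NonEmpty rules out an "error" that names no column at all.
newtype ColumnsNotFound = ColumnsNotFound (NonEmpty String)
  deriving (Eq, Show)

instance Exception ColumnsNotFound

-- The single-column case is the one-element instance of the general error,
-- so it needs no separate exception constructor.
columnNotFound :: String -> ColumnsNotFound
columnNotFound name = ColumnsNotFound (name :| [])

-- Check that every join key is among the available column names
-- and report all missing keys at once.
checkJoinKeys :: [String] -> [String] -> Either ColumnsNotFound ()
checkJoinKeys keys available =
  case NE.nonEmpty (filter (`notElem` available) keys) of
    Nothing -> Right ()
    Just missing -> Left (ColumnsNotFound missing)

main :: IO ()
main = do
  print (checkJoinKeys ["id"] ["id", "name"])
  print (checkJoinKeys ["id", "date"] ["name"])
  -- A single missing key produces the same value as the helper.
  print (checkJoinKeys ["id"] ["name"] == Left (columnNotFound "id"))
```

With this shape, callers handle one exception type, and a join on several missing keys reports all of them instead of only the first.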