DataFrames conversion
Leonid Poliakov edited this page Jul 14, 2016 · 11 revisions
GigaspacesClassRelation.buildScan(...):
- infers schema from class
- reads classes
- parses the `RDD[class]` into `RDD[Row]`
| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| @SQLUserDefinedType(udt=MyUdt) | MyUdt | User object |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
| Scala case class (Product) | StructType | Row |
| Java class | StructType | Row |
| scala.Int | IntegerType | Int |
| java.lang.Integer | IntegerType | Int |
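The case class row of the table can be illustrated with plain Scala, without Spark on the classpath. This is a minimal sketch, not the connector's code: `SketchRow`, `ClassToRowSketch`, `Address` and `Person` are hypothetical names introduced here, and `SketchRow` only stands in for a Spark SQL `Row`. It relies on the fact that every case class is a `Product`, so its fields can be walked with `productIterator`.

```scala
// Hypothetical stand-in for a Spark SQL Row: an ordered list of column values.
final case class SketchRow(values: Vector[Any])

// Example space classes, assumed for illustration only.
final case class Address(city: String, zip: Int)
final case class Person(name: String, age: Int, address: Address)

object ClassToRowSketch {
  // A nested case class (Product) becomes a nested row; any other value is kept as-is.
  // scala.Int and java.lang.Integer are the same boxed value once typed as Any,
  // which matches both rows of the table mapping to Int content.
  // Limitation: Option, tuples and List are also Products and are not special-cased here.
  def toContent(value: Any): Any = value match {
    case p: Product => toRow(p)
    case other      => other
  }

  // Walk the fields of the case class in declaration order.
  def toRow(obj: Product): SketchRow =
    SketchRow(obj.productIterator.map(toContent).toVector)
}
```

For example, `ClassToRowSketch.toRow(Person("Ann", 30, Address("Kyiv", 1001)))` yields a row whose third cell is itself a row, mirroring a `StructType` column.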
GigaspacesDocumentRelation.buildScan(...):
- infers schema from descriptor
- reads documents
- parses the `RDD[SpaceDocument]` into `RDD[Row]`
| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
| Scala case class (Product) | StructType | Row |
| Java class | StructType | Row |
| java.lang.Integer | IntegerType | Int |
A third-party `SpaceDocument` can have nested Java/Scala fields.
GigaspacesDocumentRelation.buildScan(...):
- reads schema stored in space in `DataFrameSchema`
- reads documents
- parses the `RDD[SpaceDocument]` into `RDD[Row]`
| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| DocumentProperties | StructType | Row |
| java.lang.Integer | IntegerType | Int |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Persisted documents do not have typed nested fields, just raw `Row`s.
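A rough sketch of the read direction for persisted documents, assuming the stored schema drives the conversion. None of these names come from the connector: `SketchType`, `IntColumn`, `StructColumn` and `DocumentToRowSketch` are hypothetical, a plain `Map[String, Any]` stands in for a document and its nested `DocumentProperties`, and a `Vector[Any]` stands in for a `Row`.

```scala
// Hypothetical, simplified schema model (stand-in for a stored DataFrame schema).
sealed trait SketchType
case object IntColumn extends SketchType
final case class StructColumn(fields: Vector[(String, SketchType)]) extends SketchType

object DocumentToRowSketch {
  // Stand-in for a document or nested document properties: property name -> value.
  type Doc = Map[String, Any]

  // Emit cells in schema order. A nested property map under a struct column
  // becomes a nested raw row; a missing property becomes null.
  def toRow(doc: Doc, schema: StructColumn): Vector[Any] =
    schema.fields.map { case (name, fieldType) =>
      (doc.get(name), fieldType) match {
        case (Some(nested: Map[_, _]), struct: StructColumn) =>
          toRow(nested.asInstanceOf[Doc], struct)
        case (Some(value), _) => value
        case (None, _)        => null
      }
    }
}
```

The point of the sketch is that the nested value carries no class of its own: the schema alone decides the column order and which properties are treated as nested rows.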
GigaspacesDocumentRelation.insert(...):
- converts `RDD[Row]` into `RDD[SpaceDocument]`
- saves with `rdd.saveToGrid()`
- saves schema in `DataFrameSchema`
| In Space | DataFrame Type | DataFrame Content |
|---|---|---|
| DocumentProperties | StructType | Row |
| ??? | IntegerType | Int |
| Geospatial Shape | ShapeUDT (PointUDT) | Shape (Point) |
Read the table above from right to left: a DataFrame `Shape` is stored as a `Shape` field in the space.
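The write direction can be sketched the same way, under the assumption that each row cell is paired with its schema field by position. `ColumnKind`, `IntKind`, `StructKind` and `RowToDocumentSketch` are hypothetical names, a `Vector[Any]` stands in for a `Row`, and a `Map[String, Any]` stands in for the document properties. Integer cells are passed through unchanged, since the table leaves their in-space type open.

```scala
// Hypothetical, simplified schema model used only for this illustration.
sealed trait ColumnKind
case object IntKind extends ColumnKind
final case class StructKind(fields: Vector[(String, ColumnKind)]) extends ColumnKind

object RowToDocumentSketch {
  // Pair each cell with its schema field by position. A nested row under a
  // struct column becomes a nested property map; other cells are stored as-is.
  def toDocument(row: Vector[Any], schema: StructKind): Map[String, Any] =
    schema.fields.zip(row).map {
      case ((name, nested: StructKind), cells: Vector[_]) =>
        name -> toDocument(cells, nested)
      case ((name, _), value) =>
        name -> value
    }.toMap
}
```

With a schema of `id` plus a nested `location` struct of `x` and `y`, the row `Vector(7, Vector(1, 2))` becomes `Map("id" -> 7, "location" -> Map("x" -> 1, "y" -> 2))`.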