This presentation was prepared by Cody Jones.
### Visualizing Spatial Data
Table of contents
- Definition
- Different Types (Vector, Raster)
- Visualization Methods
- Importance of Map Projections
- Design Principles
- Tools and Technologies
- Questions
### What is Spatial Data?
Spatial data is data that has a geographic component, such as coordinates, an address, or a region. Visualizing spatial data is the process of displaying that data on a map. This helps us see patterns across locations, compare regions, identify trends or clusters, and understand how place influences outcomes.
Regular data might tell you values such as income, temperature, or population. Spatial data adds the question of where: income where? Temperature where? Population where? Location changes everything.
### Vector vs. Raster Data
Vector data represents features as discrete geometries with defined locations and boundaries. These are the kinds of features you would find on a map:
- Points (crime incidents)
- Lines (roads, rivers)
- Polygons (territories, census tracts)
Raster data stores values in a regular grid of cells (pixels). Each cell holds a value, such as a measurement, rather than describing a discrete geometric feature. Examples include:
- Temperature
- Elevation
- Satellite imagery
Raster data is especially common in environmental studies.
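To make the distinction concrete, here is a minimal sketch in plain Python and NumPy. It assumes simple (x, y) coordinate tuples for the vector geometries and a small hypothetical elevation grid for the raster; all coordinates, values, the grid origin, and the cell size are made up for illustration. Real projects would more likely use a library such as GeoPandas for vector data or rasterio for raster data.

```{python}
import numpy as np

# --- Vector data: discrete geometries as coordinate sequences (x, y) ---
# All coordinates are hypothetical, in arbitrary planar map units.
crime_incident = (2.5, 0.5)                                           # point
river = [(0.0, 2.0), (1.5, 1.8), (3.0, 1.2)]                          # line: ordered vertices
tract = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # polygon: closed ring

def bounding_box(coords):
    """Return (min_x, min_y, max_x, max_y) for a list of vertices."""
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)

# --- Raster data: a regular grid where every cell stores one value ---
# Hypothetical elevation values in metres; row 0 is the top (northern) edge.
elevation = np.array([
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
])
origin_x, origin_y = 0.0, 3.0   # map coordinates of the grid's top-left corner
cell_size = 1.0                 # each cell covers 1 x 1 map units

def raster_value(x, y):
    """Look up the value of the grid cell that contains the point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return int(elevation[row, col])

print(bounding_box(tract))            # (0.0, 0.0, 3.0, 3.0)
print(raster_value(*crime_incident))  # value of the cell under the point: 124
```

The vector features keep their exact shapes, while the raster answers "what value is here?" by converting a coordinate into a row and column index.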
### Visualization Methods (Maps)
#### Choropleth Maps
- Colors each region of a study area (e.g., states, counties, census tracts) according to the value of a variable
- Usually shows one variable and how it varies by region
- Can be misleading when raw counts are mapped, because large or populous regions dominate; rates or per-capita values are usually safer
```{python}
import pandas as pd
import plotly.express as px
data = {
    "state": ["AL", "AK", "AZ", "AR", "CA",
              "CO", "CT", "DE", "FL", "GA"],
    "population": [4903185, 731545, 7278717, 3017804, 39512223,
                   5758736, 3565287, 973764, 21477737, 10617423]
}
df = pd.DataFrame(data)
fig = px.choropleth(
    df,
    locations="state",
    locationmode="USA-states",
    color="population",
    scope="usa",
    color_continuous_scale="Viridis",
    labels={"population": "Population"},
    title="US State Populations"
)
fig
```
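Because raw counts tend to track population, a choropleth is often more informative when it maps a rate. A minimal sketch with pandas, reusing population figures from the example above; the `incidents` counts are hypothetical and only show the arithmetic (rate = count / population × 100,000).

```{python}
import pandas as pd

rates = pd.DataFrame({
    "state": ["CA", "FL", "DE", "AK"],
    "population": [39512223, 21477737, 973764, 731545],
    # Hypothetical event counts, for illustration only
    "incidents": [40000, 25000, 1500, 1200],
})

# Normalize counts to a rate per 100,000 residents
rates["per_100k"] = rates["incidents"] / rates["population"] * 100_000

# Ranking by raw totals vs. by rate can tell very different stories
print(rates.sort_values("incidents", ascending=False)["state"].tolist())  # ['CA', 'FL', 'DE', 'AK']
print(rates.sort_values("per_100k", ascending=False)["state"].tolist())   # ['AK', 'DE', 'FL', 'CA']
```

Passing `color="per_100k"` to `px.choropleth` instead of `color="population"` would map the rate rather than the total.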
#### Heat Maps
- Focus on the density or magnitude of values
- Encode values through variation in hue or color intensity
- Two-dimensional: each cell shows the value for one combination of two variables (in the example below, the correlation between a pair of variables)
- Unlike choropleth maps, heat maps do not require a geographic map
```{python}
import pandas as pd
import numpy as np
import plotly.express as px
np.random.seed(42)
df = pd.DataFrame({
    "Income": np.random.normal(60000, 15000, 200),
    "Education Years": np.random.normal(16, 2, 200),
    "Age": np.random.normal(40, 10, 200),
    "Work Hours": np.random.normal(40, 5, 200),
})
# Create some relationships to make heatmap interesting
df["Spending"] = df["Income"] * 0.3 + np.random.normal(0, 5000, 200)
df["Savings"] = df["Income"] * 0.2 + np.random.normal(0, 3000, 200)
# Compute correlation matrix
corr = df.corr()
# Create heatmap: labels rounded to 2 decimals, diverging scale fixed to [-1, 1]
fig = px.imshow(
    corr,
    text_auto=".2f",
    color_continuous_scale="RdBu_r",
    zmin=-1,
    zmax=1,
    title="Correlation Heatmap"
)
fig
```
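The example above is a matrix heat map. When a heat map is drawn over geography, the "density" it shows usually comes from binning point locations into a grid and counting how many fall into each cell. A minimal sketch with NumPy's `histogram2d`; the point locations are randomly generated around two hypothetical hotspots, and the 10 × 10 grid is an arbitrary choice.

```{python}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical event locations (longitude, latitude) clustered around two hotspots
hotspot_a = rng.normal(loc=[-87.6, 41.9], scale=0.05, size=(300, 2))
hotspot_b = rng.normal(loc=[-87.7, 41.8], scale=0.05, size=(200, 2))
points = np.vstack([hotspot_a, hotspot_b])

# Bin the points into a 10 x 10 grid; each cell's count is its "heat"
counts, lon_edges, lat_edges = np.histogram2d(points[:, 0], points[:, 1], bins=10)

# The busiest cell marks the densest area (first axis follows longitude)
row, col = np.unravel_index(counts.argmax(), counts.shape)
print(counts.shape, int(counts.sum()))  # (10, 10) 500
print("densest cell: lon", lon_edges[row], "to", lon_edges[row + 1],
      "| lat", lat_edges[col], "to", lat_edges[col + 1])
```

The `counts` grid could be passed to a function such as `px.imshow` or drawn as a colored layer over a base map. Kernel density estimation is a smoother alternative that spreads each point over nearby cells instead of counting it in exactly one.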
#### Proportional Symbol Map
- Uses symbol size (commonly a circle) as its main visual encoding
- Symbols are drawn over a geographic map, similar to choropleth maps
- The size of each symbol reflects the value of the variable at that location
- Conveys information comparable to a choropleth map, but uses symbol size instead of color changes
```{python}
import pandas as pd
import plotly.express as px
# Sample population data (millions scale works best visually)
data = {
    "state": ["CA", "TX", "FL", "NY", "PA",
              "IL", "OH", "GA", "NC", "MI"],
    "population": [39.5, 29.0, 21.5, 19.8, 12.8,
                   12.6, 11.7, 10.6, 10.4, 10.0],
    "lat": [36.77, 31.97, 27.99, 42.95, 40.88,
            40.63, 40.42, 32.17, 35.78, 44.31],
    "lon": [-119.42, -99.90, -81.76, -75.53, -77.80,
            -89.40, -82.91, -82.90, -78.64, -85.60]
}
df = pd.DataFrame(data)
# Create proportional symbol map
fig = px.scatter_geo(
    df,
    lat="lat",
    lon="lon",
    size="population",
    hover_name="state",
    size_max=50,
    scope="usa",
    title="US State Populations (Proportional Symbol Map)",
    labels={"population": "Population (millions)"}
)
fig.update_layout(
    geo=dict(
        showland=True,
        landcolor="rgb(217, 217, 217)"
    )
)
fig
```
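The usual cartographic convention for proportional symbols is to scale a circle's area, not its radius, with the value; scaling the radius would make a value twice as large look four times as big. Plotly Express already sizes markers by area, and a small sketch of the underlying arithmetic shows the difference. The function names and the `max_radius` of 50 are illustrative assumptions; the population values come from the example above.

```{python}
import math

def symbol_radius(value, max_value, max_radius=50.0):
    """Radius such that circle AREA is proportional to value (r grows with sqrt(value))."""
    return max_radius * math.sqrt(value / max_value)

def naive_radius(value, max_value, max_radius=50.0):
    """Radius proportional to value, which exaggerates large values."""
    return max_radius * value / max_value

ca, mi = 39.5, 10.0  # populations in millions, from the example above

for name, radius in [("area-scaled", symbol_radius), ("radius-scaled", naive_radius)]:
    r_ca, r_mi = radius(ca, ca), radius(mi, ca)
    area_ratio = (r_ca / r_mi) ** 2  # how much larger CA's circle looks
    print(f"{name}: CA circle covers {area_ratio:.2f}x the area of MI's")
```

With area scaling the visual ratio is about 3.95, matching 39.5 / 10; with radius scaling it inflates to about 15.6.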
#### Dot Density Map
- Places dots within a region to show where occurrences are
- Each dot stands for a fixed quantity stated in the legend (e.g., 1 dot = 1 million people), so values can be estimated by counting
- Dots can mark exact locations or be placed approximately within their region
- Using different dot colors lets one map show more than one variable
- Usually drawn over a base map
- Drawbacks: dense areas can become congested, and reading values requires counting dots
```{python}
import pandas as pd
import numpy as np
import plotly.express as px
# Sample population data (millions)
data = {
    "state": ["CA", "TX", "FL", "NY", "PA"],
    "population_millions": [39, 29, 21, 19, 12],
    "lat": [36.77, 31.97, 27.99, 42.95, 40.88],
    "lon": [-119.42, -99.90, -81.76, -75.53, -77.80]
}
df = pd.DataFrame(data)
np.random.seed(42)  # reproducible dot placement
# Create dot density dataset: 1 dot per million people, jittered
# within a +/-1.5 degree square around each state's approximate center
dots = []
for _, row in df.iterrows():
    for _ in range(int(row["population_millions"])):
        dots.append({
            "state": row["state"],
            "lat": row["lat"] + np.random.uniform(-1.5, 1.5),
            "lon": row["lon"] + np.random.uniform(-1.5, 1.5)
        })
dots_df = pd.DataFrame(dots)
# Create dot density map
fig = px.scatter_geo(
    dots_df,
    lat="lat",
    lon="lon",
    scope="usa",
    title="Dot Density Map (1 Dot = 1 Million People)",
)
fig.update_traces(marker=dict(size=3))
fig.update_layout(showlegend=False)
fig
```
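The example above scatters dots in a square around an approximate state center, so some dots can land outside the state. A common refinement is to place each dot at a random location inside the region's actual boundary. A minimal sketch using rejection sampling and a ray-casting point-in-polygon test, assuming the region is a simple planar polygon given as a list of (x, y) vertices; the L-shaped region, its population, and the dot value are hypothetical.

```{python}
import random

def point_in_polygon(x, y, ring):
    """Ray casting: inside if a ray to the right crosses the boundary an odd number of times."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dots_in_polygon(ring, value, value_per_dot, seed=0):
    """Place round(value / value_per_dot) random dots inside the polygon."""
    rng = random.Random(seed)
    n_dots = round(value / value_per_dot)
    xs, ys = zip(*ring)
    dots = []
    while len(dots) < n_dots:
        # Sample within the bounding box and keep only points inside the shape
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            dots.append((x, y))
    return dots

# Hypothetical L-shaped region with 7,000,000 residents; 1 dot = 1,000,000 people
region = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
dots = dots_in_polygon(region, 7_000_000, 1_000_000)
print(len(dots))  # 7
```

Rejection sampling is simple but wastes samples when the shape fills only a small part of its bounding box, and a planar test like this is only an approximation for large regions in latitude and longitude.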
#### Cartograms
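- Distorts each region of a map in proportion to a given variable (e.g., population) instead of its land area
- Usually shows a single variable
- Requires a geographic map as the starting point, since the distortion is relative to real region shapes
- The distorted shapes can look odd and make regions harder to recognize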
```{python}
import pandas as pd
import plotly.express as px
# Sample population data (millions)
data = {
    "state": ["CA", "TX", "FL", "NY", "PA",
              "IL", "OH", "GA", "NC", "MI"],
    "population": [39.5, 29.0, 21.5, 19.8, 12.8,
                   12.6, 11.7, 10.6, 10.4, 10.0],
    "lat": [36.77, 31.97, 27.99, 42.95, 40.88,
            40.63, 40.42, 32.17, 35.78, 44.31],
    "lon": [-119.42, -99.90, -81.76, -75.53, -77.80,
            -89.40, -82.91, -82.90, -78.64, -85.60]
}
df = pd.DataFrame(data)
# Approximate a circle-based (Dorling-style) cartogram: each state is drawn
# as a circle sized by population. A true cartogram would also distort the
# state shapes or move the circles apart so they do not overlap.
fig = px.scatter_geo(
    df,
    lat="lat",
    lon="lon",
    size="population",
    color="population",
    hover_name="state",
    size_max=60,
    scope="usa",
    color_continuous_scale="Plasma",
    title="US Population Cartogram (Circle-Based)"
)
fig.update_layout(
    geo=dict(
        showland=True,
        landcolor="rgb(240,240,240)"
    )
)
fig
```
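The circle map above only approximates a cartogram. In a non-contiguous cartogram, each region keeps its shape but is shrunk about its center so that its area becomes proportional to the mapped variable. The region with the highest density (value per unit area) keeps its size, and every other region is scaled by the square root of its density relative to that maximum, because area grows with the square of a linear scale factor. A minimal sketch, assuming planar polygons given as vertex lists and using the vertex average as the center (adequate for the simple, symmetric shapes here); the two square regions and their values are hypothetical.

```{python}
import math

def polygon_area(ring):
    """Shoelace formula for a planar polygon given as an unclosed vertex list."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))

def scale_about_center(ring, factor):
    """Scale a polygon about the average of its vertices."""
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in ring]

def noncontiguous_cartogram(regions):
    """regions maps name -> (ring, value); returns name -> rescaled ring."""
    density = {name: value / polygon_area(ring) for name, (ring, value) in regions.items()}
    max_density = max(density.values())
    return {
        name: scale_about_center(ring, math.sqrt(density[name] / max_density))
        for name, (ring, _) in regions.items()
    }

# Two hypothetical regions with equal populations but very different land areas
regions = {
    "Small": ([(0, 0), (2, 0), (2, 2), (0, 2)], 40),  # area 4, density 10
    "Large": ([(3, 0), (7, 0), (7, 4), (3, 4)], 40),  # area 16, density 2.5
}
scaled = noncontiguous_cartogram(regions)
for name, ring in scaled.items():
    print(name, polygon_area(ring))  # equal values -> equal areas: 4.0 each
```

Non-contiguous cartograms avoid the heavy shape distortion of contiguous ones, at the cost of leaving gaps between neighboring regions.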
### Significance
#### Decision Making
Tables of numbers alone cannot reveal conclusions that depend on geographic trends. Viewing data on a map gives the viewer a new perspective and supports new levels of decision making and conclusion drawing. Some examples are:
- Detecting anomalies in network performance
- Determining patterns based on seasonal variations
- General urban planning
#### Map Projections
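It's important to understand the relationship between the world and its maps. The Earth is a three-dimensional, roughly spherical surface, so presenting its data on a flat map is challenging: every map projection distorts at least one of the following properties:
- Area
- Shape
- Distance
- Direction

Two common projections illustrate the trade-offs:
- Mercator Projection: a full rectangle that preserves local shapes and compass directions but greatly exaggerates area toward the poles
- Robinson Projection: a slightly rounded outline; a compromise that keeps all four distortions moderate without eliminating any of them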
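The Mercator projection's area distortion can be quantified. On a sphere of radius R, Mercator maps latitude φ to y = R · ln(tan(π/4 + φ/2)), and its local scale factor is 1 / cos φ in every direction, so areas are inflated by 1 / cos² φ. A short sketch of that arithmetic, assuming a spherical Earth with a mean radius of 6,371 km (mapping libraries often use an ellipsoid instead, which changes the numbers slightly):

```{python}
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)

def mercator_y(lat_deg):
    """Projected y coordinate (km) of a latitude on the spherical Mercator projection."""
    phi = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + phi / 2))

def area_inflation(lat_deg):
    """How many times larger an area appears at this latitude than at the equator."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

for lat in (20, 40, 60, 80):
    print(f"lat {lat}: y = {mercator_y(lat):.1f} km, area x{area_inflation(lat):.1f}")
```

At 60° latitude areas appear about four times larger than at the equator, and at 80° about 33 times larger, which is why high-latitude regions such as Greenland look so large on a Mercator map.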
### Design Principles
It's important to keep visuals simple and easy for the viewer to understand. A few general rules help keep information neat and readable:
- Colors: sequential schemes for values ordered from low to high, diverging schemes for values above and below a meaningful midpoint
- Classification methods: equal intervals (classes of equal width) or quantiles (classes holding equal numbers of observations)
- Natural breaks (Jenks): classes chosen to minimize variance within each class and maximize variance between classes
- Avoid misinterpretation: map rates or per-capita values rather than raw totals, and use colorblind-friendly palettes
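The choice of classification method is easy to see on skewed data. A minimal sketch using pandas: `pd.cut` with an integer number of bins produces equal intervals, and `pd.qcut` produces quantile classes. The values are the state populations from the choropleth example; the class counts (4 and 5) are arbitrary choices. Natural breaks (Jenks) are not built into pandas and would need a separate implementation or library.

```{python}
import pandas as pd

population = pd.Series(
    [4903185, 731545, 7278717, 3017804, 39512223,
     5758736, 3565287, 973764, 21477737, 10617423],
    index=["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA"],
)

# Equal intervals: 4 classes of equal width across the data range
equal_interval = pd.cut(population, bins=4)

# Quantiles: 5 classes that each hold the same number of states
quantiles = pd.qcut(population, q=5)

print(equal_interval.value_counts(sort=False).tolist())  # [7, 1, 1, 1]
print(quantiles.value_counts(sort=False).tolist())       # [2, 2, 2, 2, 2]
```

With equal intervals, one large value (California) stretches the range so that seven of the ten states share the lowest class. Quantiles spread the states evenly across colors but can group quite different values into the same class.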
### Tools and Technologies
#### Tableau
A leading data visualization tool. Tableau creates interactive maps and shows geographic trends using the methods described above, and it can join location data to other data sets. It is primarily used for dashboards and storytelling and prioritizes clean visuals over heavy analysis. Spatial file formats it can connect to include:
- Shapefiles (GIS vector geometry)
- MapInfo tables
- KML files (Google Earth)
#### R
A statistical programming language that can perform spatial analysis. It handles GIS data (shapefiles, GeoJSON) and relies on packages to build advanced maps. R is well suited to analyzing spatial data statistically. Key packages:
- sf: data structures for spatial features ("simple features")
- rgdal: reading spatial data formats (retired from CRAN in 2023; sf now covers this)
- sp: classes and methods for spatial data
- ggplot2: customizable data visualizations
#### Python
A general-purpose programming language widely used for data science. It supports spatial data processing and GIS analysis, can apply machine learning to location data, and relies on libraries for modeling and processing. Key libraries:
- GeoPandas: loads spatial data formats (.shp, GeoJSON) into tables with a geometry column
- Matplotlib: plotting
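A minimal GeoPandas sketch, assuming the `geopandas` package and its `pyproj` dependency are installed. It builds point geometries from latitude and longitude columns like those in the proportional symbol example, tags them with the WGS 84 coordinate reference system (EPSG:4326), and reprojects them to EPSG:5070 (NAD83 / Conus Albers, an equal-area projection for the contiguous US) so that distances come out in metres. Loading a file instead would use `gpd.read_file(...)` with a shapefile or GeoJSON path.

```{python}
import geopandas as gpd
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "TX", "FL"],
    "population": [39.5, 29.0, 21.5],
    "lat": [36.77, 31.97, 27.99],
    "lon": [-119.42, -99.90, -81.76],
})

# Turn lon/lat columns into point geometries in WGS 84 (EPSG:4326)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

# Reproject to a projected CRS (units: metres) before measuring anything
gdf_albers = gdf.to_crs(epsg=5070)

# Distance from California's point to each point, in kilometres
print((gdf_albers.distance(gdf_albers.geometry.iloc[0]) / 1000).round(0))
```

Calling `gdf.plot(column="population")` would then draw the points with Matplotlib.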
#### Google Maps
A familiar web mapping platform. It provides basemaps and geographic context, displays spatial data interactively, and lets developers embed maps into apps, which makes it well suited to displaying and interacting with geographic data.
- Imports GIS data (shapefiles, after conversion to KML)
- Geocoding API (converts addresses to coordinates)
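A sketch of how a Geocoding API request is formed and its response read, using only the Python standard library. The endpoint and the `address` and `key` parameters follow Google's Geocoding API; `YOUR_API_KEY` is a placeholder, no request is actually sent, and `sample_response` is a hand-written, trimmed example of the response shape with illustrative coordinates, not real API output. The helper function names are hypothetical.

```{python}
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build a Geocoding API request URL for one address (hypothetical helper)."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"

def extract_lat_lng(response):
    """Return (lat, lng) of the first result, or None if geocoding failed."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

url = build_geocode_url("1600 Amphitheatre Parkway, Mountain View, CA", "YOUR_API_KEY")

# Hand-written example of the response shape; the coordinates are illustrative only
sample_response = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 37.42, "lng": -122.08}}}],
}
print(url)
print(extract_lat_lng(sample_response))  # (37.42, -122.08)
```

Actually sending the request (for example with `urllib.request.urlopen` and `json.load`) requires a valid API key and network access.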
### Questions?
Thank you! - Cody Jones