As I delve into the ‘fatal-police-shootings-data’ dataset using Python, my primary goal is to understand its variables and examine their distributions. The ‘age’ column, which records the age of each person killed in a police encounter, is particularly striking and offers a grim view of who is affected. Equally telling are the latitude and longitude values, which make it possible to locate each incident on a map.
During my preliminary data exploration, I noted the ‘id’ column, which seems to contribute little to the analysis and can likely be excluded going forward. My data quality assessment revealed missing entries in several columns: ‘name,’ ‘armed,’ ‘age,’ ‘gender,’ ‘race,’ ‘flee,’ and the geographical coordinates. I also found a single duplicate record, one with no ‘name’ value; every other record is unique.
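The checks described above can be sketched roughly as follows. This is a minimal illustration that assumes pandas is the library in use (the post only says Python), and the small DataFrame is made-up toy data standing in for the real file.

```python
import pandas as pd

# Toy stand-in for the real dataset; every value here is invented for illustration.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "name": ["Person A", None, "Person C", None],
        "age": [34.0, 29.0, None, 29.0],
        "latitude": [47.2, 40.1, 37.8, 40.1],
    }
)

# 'id' is unique per row, so drop it first; otherwise no two rows could ever match.
df = df.drop(columns=["id"])

# Count missing entries in each column.
missing_per_column = df.isna().sum()

# Rows that exactly repeat an earlier row (missing values are treated as equal).
duplicate_rows = df[df.duplicated()]

print(missing_per_column)
print(f"duplicate rows: {len(duplicate_rows)}")
```

Dropping ‘id’ before calling `duplicated()` matters: with a unique identifier still in place, pandas would report no duplicates at all.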
Next steps include a thorough examination of the ‘age’ distribution to extract meaningful patterns. This analysis will be instrumental in understanding the demographic profile of those involved in fatal police shootings.
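One possible starting point for the age analysis is sketched below. It again assumes pandas, uses hypothetical ages rather than values from the dataset, and the decade-wide bins are my own choice, not something the data prescribes.

```python
import pandas as pd

# Hypothetical ages for illustration only; None marks a missing entry.
ages = pd.Series([16, 23, 27, 31, 34, 38, 45, 52, 67, None], name="age")

# Summary statistics (count, mean, quartiles, ...); missing values are skipped.
summary = ages.describe()

# Group ages into decade-wide, left-closed bands: [10, 20), [20, 30), ...
bin_edges = list(range(10, 81, 10))
age_bands = pd.cut(ages, bins=bin_edges, right=False)

# Count how many people fall into each band, keeping the bands in order.
band_counts = age_bands.value_counts(sort=False)

print(summary)
print(band_counts)
```

The same `band_counts` series can be passed to a plotting library to draw the histogram once the summary numbers look sensible.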
In our recent classroom discussions, we’ve learned to calculate geospatial distances, which sets the stage for creating GeoHistograms. Beyond visualizing the data, these histograms will help identify spatial patterns, hotspots, and clusters, and so clarify how the incidents are distributed geographically. Combining statistical and geospatial analysis in this way should help us build a fuller picture of the circumstances surrounding these fatal incidents.
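To make the two geospatial ideas concrete, here is a small sketch using only the standard library. The haversine formula is one common way to compute the distance between two latitude/longitude points; the post does not say which method was covered in class, so treat that choice as an assumption. The `grid_counts` helper is a hypothetical, simplified stand-in for a GeoHistogram: it just counts points per grid cell, and the one-degree cell size and sample coordinates are arbitrary.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def grid_counts(points, cell_deg=1.0):
    """Count (lat, lon) points per square grid cell of side cell_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up coordinates, not taken from the dataset.
sample_points = [(40.2, -74.5), (40.7, -74.1), (34.0, -118.2)]
cells = grid_counts(sample_points)
print(haversine_km(*sample_points[0], *sample_points[1]))
print(cells.most_common(1))
```

Cells with the highest counts are candidate hotspots. Note that degree-based cells cover less ground area at higher latitudes, so an equal-area grid would be a better choice for a careful analysis.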