Commencing my analysis of the ‘fatal-police-shootings-data’ dataset in Python, I’ve loaded the data to scrutinize its various variables and their respective distributions. Notably, among these variables, ‘age’ stands out as a numerical column, offering insights into the ages of individuals tragically shot by law enforcement. Additionally, the dataset contains latitude and longitude values, pinpointing the precise geographical locations of these incidents.
During this preliminary assessment, I’ve identified an ‘id’ column, which appears to hold limited significance for our analysis. Consequently, I’m considering its exclusion from our further examination. Delving deeper, I’ve scrutinized the dataset for missing values, revealing that several variables exhibit null or missing data, including ‘name,’ ‘armed,’ ‘age,’ ‘gender,’ ‘race,’ ‘flee,’ ‘longitude,’ and ‘latitude.’ Furthermore, I’ve undertaken an investigation into potential duplicate records. This examination has uncovered just a single duplicate entry within the entire dataset, notable for its absence of a ‘name’ value. For the subsequent phase of my analysis, I intend to shift our focus towards exploring the distribution of the ‘age’ variable, a critical step in unraveling insights from this dataset.
In today’s classroom session, essential knowledge on computing geospatial distances using location information. This newfound expertise equips us to create GeoHistograms, a valuable tool for visualizing and analyzing geographical data. GeoHistograms serve as a powerful instrument for pinpointing spatial trends, identifying hotspots, and uncovering clusters within datasets associated with geographic locations. This, in turn, enhances our comprehension of the underlying phenomena embedded within the data.