Cartography Week 4: Data Classification


For this week's lab, we compared four classification methods applied to two measurements of the senior population of Miami-Dade County from the 2010 census. For the first map, I applied the natural breaks, quantile, equal interval, and standard deviation classification methods to the percentage of seniors in the population of each census tract.

The equal interval method creates classes that all span the same range of values. The number of observations in each class therefore varies with the data distribution: some classes may hold only a few observations, or none, while others hold many. Outliers stretch the overall range and widen every class, which can easily conceal clusters of observations or other variation in the data within each class.
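
As a rough illustration of the outlier problem, here is a minimal sketch of equal interval classification. The percentages are made-up values for ten hypothetical tracts, not the lab data, and the function names are my own rather than anything from GIS software.

```python
def equal_interval_breaks(values, k):
    """Return the k upper class limits for classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def count_per_class(values, breaks):
    """Count observations per class; a value joins the first class whose upper limit covers it."""
    counts = [0] * len(breaks)
    last = len(breaks) - 1
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper or i == last:
                counts[i] += 1
                break
    return counts


# Hypothetical percentages of seniors for ten tracts, with one outlier (60.0)
pct_seniors = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 60.0]

breaks = equal_interval_breaks(pct_seniors, 5)
counts = count_per_class(pct_seniors, breaks)

print(breaks)  # upper limit of each class
print(counts)  # number of tracts in each class
```

With this sample, the single outlier sets the class width, so nine of the ten tracts land in the lowest class and three classes are empty.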

The quantile method creates classes that all contain the same number of observations. The range of values in each class varies with the data distribution. This produces a nice balance of colors on the map, since each class is represented the same number of times, but it can conceal large differences within a class or place similar values in different classes.
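
A minimal sketch of the quantile idea, again using made-up percentages for ten hypothetical tracts: sort the values and cut them into groups of equal size. The function name is my own, and real software may handle ties and uneven counts differently.

```python
def quantile_classes(values, k):
    """Split the sorted values into k classes with (nearly) equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    classes = []
    for i in range(k):
        start = i * n // k
        end = (i + 1) * n // k
        classes.append(ordered[start:end])
    return classes


# Hypothetical percentages of seniors for ten tracts, with one outlier (60.0)
pct_seniors = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 60.0]

classes = quantile_classes(pct_seniors, 5)
for c in classes:
    print(len(c), "tracts, values from", c[0], "to", c[-1])
```

Every class holds two tracts, but the top class runs from 13.0 to 60.0 while each of the others covers a single percentage point, which is the kind of within-class difference this method can hide.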

The standard deviation method uses the mean and standard deviation of the data to create classes defined by how many standard deviations a value lies from the mean. Because this method is centered on the mean and uses a diverging color scheme, a mean that is skewed by outliers and does not accurately represent the data will give a misleading sense of where the “middle” of the data is.
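
The sketch below shows how one outlier can drag the mean away from the bulk of the data. It assumes classes one full standard deviation wide and uses the population standard deviation; GIS software often offers other interval sizes, and the sample values are hypothetical.

```python
import math
import statistics


def sd_class(value, mean, sd):
    """Return the integer n such that value lies in [mean + n*sd, mean + (n+1)*sd)."""
    return math.floor((value - mean) / sd)


# Hypothetical percentages of seniors for ten tracts, with one outlier (60.0)
pct_seniors = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 60.0]

mean = statistics.mean(pct_seniors)
median = statistics.median(pct_seniors)
sd = statistics.pstdev(pct_seniors)

sd_classes = [sd_class(v, mean, sd) for v in pct_seniors]
below_mean = sum(1 for v in pct_seniors if v < mean)

print("mean:", mean, "median:", median)
print("tracts below the mean:", below_mean)
print("class of each tract:", sd_classes)
```

In this sample the mean sits well above the median, so nine of the ten tracts fall on the "below average" side of the diverging color scheme even though they make up the typical range of the data.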

The natural breaks method uses an algorithm to create classes that maximize the similarity of the values within each class and the differences between values in adjacent classes (in other words, looking for “natural breaks” or clusters in the data distribution). This reveals unique patterns in the distribution, but the inconsistency of the value ranges and the number of observations in each class may make the map harder to interpret.
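
To make the idea concrete, here is a brute-force sketch that tries every way of cutting the sorted values into k classes and keeps the one with the smallest total squared deviation from the class means. This is only an assumption about the criterion (it is the one commonly associated with Jenks natural breaks); the algorithm in the GIS software may differ, and this approach is only practical for very small samples. The data are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    m = sum(group) / len(group)
    return sum((v - m) ** 2 for v in group)


def natural_breaks_bruteforce(values, k):
    """Return the k contiguous classes that minimize within-class squared deviation."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_classes = None, None
    # Each combination is a set of k - 1 cut positions in the sorted list
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes


# Hypothetical percentages of seniors with three obvious clusters
pct_seniors = [5.0, 6.0, 7.0, 15.0, 16.0, 17.0, 30.0, 31.0, 32.0]

classes = natural_breaks_bruteforce(pct_seniors, 3)
for c in classes:
    print(c)
```

The class boundaries fall in the gaps between clusters, so neither the class widths nor the counts are fixed in advance; they depend entirely on the distribution.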


For the second map, I compared the same four classification methods for the population over 65 per square mile in each census tract.

In my opinion, the natural breaks method is the best for targeting the senior population because it finds clusters of observations in the distribution, which ensures a reasonably large gap between the lowest percentage of seniors in the highest class and the highest percentage of seniors in the class below it. The lower bound of this top class is 29.86%, which seems like a good threshold for what constitutes a disproportionately high percentage of seniors in an area. Since I assume areas with few seniors are not of interest, the equal interval and standard deviation methods are not helpful because they place a large number of census tracts into the lower classes. I think the quantile method is a decent choice because it places 20% of the census tracts in the class with the highest percentage of seniors, which seems like a reasonable number to focus on. However, the natural breaks method is still the best, since it is tailored to this specific distribution.

When comparing these two measurements of the distribution of seniors in Miami-Dade County, the percentages are a more accurate representation than the normalized population counts. Since the population of seniors per square mile is constrained by the total population per square mile, population density acts as a confounding variable. Even though I know nothing about Miami-Dade County, it is easy to see that the senior population per square mile follows the pattern of higher population density in the urban core and lower population density in the outer parts of the county. The percentage of seniors in the population does not depend directly on population density, meaning that even if a census tract has a low senior population density (probably because the overall population density is low), seniors may still make up a high percentage of the population in that tract. The best data presentation will depend on the purpose for which the data will be used, but I think that percentages more accurately display where seniors are distributed in the county.
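
A small worked example of the difference between the two measurements, using two invented tracts (the counts and areas are hypothetical, not taken from the census data):

```python
def senior_measures(seniors, total_pop, area_sq_mi):
    """Return (percentage of seniors, seniors per square mile) for one tract."""
    pct = 100.0 * seniors / total_pop
    density = seniors / area_sq_mi
    return pct, density


# Hypothetical dense urban tract: many seniors, but many more residents overall
urban = senior_measures(seniors=600, total_pop=6000, area_sq_mi=0.5)

# Hypothetical sparse outer tract: fewer seniors spread over a large area
outer = senior_measures(seniors=300, total_pop=1000, area_sq_mi=10.0)

print("urban tract (percent, per sq mi):", urban)
print("outer tract (percent, per sq mi):", outer)
```

The urban tract has forty times the senior density of the outer tract, yet seniors are only 10% of its population compared with 30% in the outer tract, so the two maps would rank these tracts in opposite order.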
