Data Description

The dataset I will be looking at is the SF Park Scores dataset. This dataset was created using data provided by the San Francisco Recreation & Parks Department via the San Francisco Open Data Portal. It contains quarterly park evaluation scores in San Francisco, from Q3 2005 to Q4 2014. The scores are based mostly on criteria such as cleanliness and the condition of structures and equipment. Here are more details about the criteria: 2005, 2013, 2014. The dataset is 872 KB, with 5495 rows and 16 columns. Apart from the parks' scores, the dataset also includes information about each park, such as location, facilities, and size.

The specific attributes are:
ParkID, PSA, Park, FQ,
Score, Address, Acres, Square Feet,
State, Zipcode, Latitude, Longitude,
Perimeter Length, Facility Type, Facility Name, Floor Count
As part of DataSF, this dataset is released under the “Public Domain Dedication and License” (PDDL). You can find a human-readable summary of the license here.

When thinking about the questions I wanted to answer with my visualizations, I decided that I needed some extra information to complement the SF Parks dataset. Specifically, I looked for the population of San Francisco by neighborhood, so that I could explore the relationship between population and parks' scores. I found this data in a study by the San Francisco Planning Department, which uses data produced by the U.S. Census Bureau (Terms of Service). The study contains many details about the population of each neighborhood in San Francisco, such as race and age, but for now I will only be looking at the total population of each of the 41 neighborhoods.

Another interesting perspective would be to compare a neighborhood's crime rate to nearby parks' scores.

Data Processing

In order to create choropleths or maps with the parks' boundaries, I downloaded three sets of maps (part of DataSF, so also under the “Public Domain Dedication and License”):

  • GeoJSON data of SF Parks
  • GeoJSON data of SF map by zipcode
  • GeoJSON data of Golden Gate Park Sections

  • In the GeoJSON data of the parks, I had to change the park name "Ferry Park" to "Sue Bierman Park" (it is the same park, but it is also known as Ferry Park because it is next to the Ferry Building).
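The rename can be sketched in a few lines of Python. This is only an illustration: the property key `park_name` and the tiny sample collection are assumptions, since the actual property names in the SF Parks GeoJSON are not listed here.

```python
def rename_park(geojson, old, new, name_key="park_name"):
    """Rename a park in a GeoJSON FeatureCollection, in place.

    name_key is a hypothetical property key; the real file may use another.
    Returns the number of features that were renamed.
    """
    renamed = 0
    for feature in geojson["features"]:
        if feature["properties"].get(name_key) == old:
            feature["properties"][name_key] = new
            renamed += 1
    return renamed

# Minimal made-up collection, for illustration only.
parks = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"park_name": "Ferry Park"}, "geometry": None},
        {"type": "Feature", "properties": {"park_name": "Union Square"}, "geometry": None},
    ],
}
count = rename_park(parks, "Ferry Park", "Sue Bierman Park")
```

Returning the count makes it easy to check that exactly the expected features were touched.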

    As for the main dataset, SF Park Scores, some parks were missing size and location data (i.e. square feet, perimeter, acres, longitude and latitude). I used the SF Parks GeoJSON data to fill in these missing values in the main dataset. I also looked up the missing zipcodes on Google Maps. Specifically, I filled in the data for the following parks: Buchanan Street Mall, Chester-Palmetto Mini Park, Collis P. Huntington Park, Golden Gate-Steiner Mini Park, Hayes Valley Playground, Herz Playground, Joseph Conrad Mini Park, Louis Sutter Playground, Maritime Plaza, Palace of Fine Arts, Patricia's Green in Hayes Valley, Portola Open Space, Randolph-Bright Mini Park, Roosevelt-Henry Steps, Saturn Street Steps, Sue Bierman Park, Union Square, Visitacion Valley Greenway, and Yacht Harbor & Marina Green.
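A minimal sketch of this fill-in step, assuming the score rows and the GeoJSON features have been loaded as Python dictionaries. The property key `park_name`, the assumption that the GeoJSON properties use the same field names as the scores dataset, and the sample values are all hypothetical; this is not necessarily how the step was actually carried out.

```python
# Size and location fields that were missing for some parks.
SIZE_AND_LOCATION = ["Acres", "Square Feet", "Perimeter Length", "Latitude", "Longitude"]

def fill_missing(score_rows, park_features, name_key="park_name"):
    """Fill empty size/location fields in score rows from GeoJSON properties.

    name_key is a hypothetical property key; the real GeoJSON may differ.
    Returns new rows and leaves the input rows unchanged.
    """
    by_name = {f["properties"][name_key]: f["properties"] for f in park_features}
    filled = []
    for row in score_rows:
        row = dict(row)  # copy so the input is not mutated
        props = by_name.get(row["Park"], {})
        for field in SIZE_AND_LOCATION:
            # Only fill values that are absent or empty, and only if the GeoJSON has them.
            if row.get(field) in (None, "") and field in props:
                row[field] = props[field]
        filled.append(row)
    return filled

# Made-up sample values, for illustration only.
rows = [{"Park": "Union Square", "Score": 0.9, "Acres": None, "Square Feet": ""}]
features = [
    {"type": "Feature",
     "properties": {"park_name": "Union Square", "Acres": 2.5, "Square Feet": 108900.0}},
]
result = fill_missing(rows, features)
```

Values already present in the scores data are never overwritten, so the GeoJSON only acts as a fallback.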

    There were still some parks that needed other arrangements. The first one was the Chester-Palmetto Mini Park, for which I could not get any information, so I left it out of the visualization. The second one was Golden Gate Park.

    The problem with the Golden Gate Park data is that the park was designated by a single park ID up until 2010. From 2011 onwards, it was divided into 6 sections to allow a more detailed assessment of such a big park. This means that the SF Park Scores dataset contains scores for two different designations of the same park. To standardize the data, I decided to keep one designation and convert the other to it. I opted to represent Golden Gate Park as a single park (no sections) because I cannot create data for the individual sections from a single park-wide score, but I can average the sections to get a score for the park as a whole. The transformation was simply averaging the scores of all 6 sections for each quarter from 2011 to 2014.
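The averaging step can be sketched as follows. The `Park`, `FQ` and `Score` fields come from the dataset's attribute list, but the section labels, the quarter format and the scores below are made up for illustration, and only two of the six sections are shown.

```python
from collections import defaultdict
from statistics import mean

def merge_sections(rows, section_names, merged_name="Golden Gate Park"):
    """Replace per-section rows with one averaged row per quarter."""
    by_quarter = defaultdict(list)
    kept = []
    for row in rows:
        if row["Park"] in section_names:
            by_quarter[row["FQ"]].append(row["Score"])
        else:
            kept.append(row)
    # One merged row per quarter, scored as the unweighted mean of the sections.
    for quarter, scores in sorted(by_quarter.items()):
        kept.append({"Park": merged_name, "FQ": quarter, "Score": mean(scores)})
    return kept

# Hypothetical section labels, quarter labels and scores.
sections = {"Golden Gate Park - Section 1", "Golden Gate Park - Section 2"}
rows = [
    {"Park": "Golden Gate Park", "FQ": "FY10Q4", "Score": 0.90},
    {"Park": "Golden Gate Park - Section 1", "FQ": "FY11Q1", "Score": 0.80},
    {"Park": "Golden Gate Park - Section 2", "FQ": "FY11Q1", "Score": 0.90},
    {"Park": "Union Square", "FQ": "FY11Q1", "Score": 0.95},
]
merged = merge_sections(rows, sections)
```

Note that this is an unweighted mean, matching the description above; weighting the sections by area would be a different design choice.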

    Another problem with Golden Gate Park is that it spans several zipcodes. I used the same scores but added them to every zipcode that contains part of the park. For the area in square feet, I used the Acres data in the GeoJSON data of Golden Gate Park Sections to decide how much area falls in each zipcode (sqft = acres * 43560).
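As a rough sketch of the zipcode handling, assuming each Golden Gate Park section has already been assigned to a single zipcode (an assumption made here for simplicity), the per-zipcode area and the duplicated score rows could be computed like this. The zipcodes, acreages and score are illustrative values, not taken from the data.

```python
from collections import defaultdict

SQFT_PER_ACRE = 43560  # sqft = acres * 43560

def sqft_by_zipcode(sections):
    """Sum section acreage per zipcode and convert the total to square feet."""
    totals = defaultdict(float)
    for section in sections:
        totals[section["zipcode"]] += section["acres"] * SQFT_PER_ACRE
    return dict(totals)

def score_rows_per_zipcode(score, zipcodes):
    """Repeat the same park score once for every zipcode the park touches."""
    return [{"Park": "Golden Gate Park", "Zipcode": z, "Score": score} for z in zipcodes]

# Illustrative sections: the zipcode assignment and acreages are made up.
sections = [
    {"zipcode": "94117", "acres": 10.0},
    {"zipcode": "94118", "acres": 2.5},
    {"zipcode": "94117", "acres": 1.0},
]
areas = sqft_by_zipcode(sections)
score_rows = score_rows_per_zipcode(0.85, sorted(areas))
```

With this layout each zipcode gets the full park score but only its own share of the park's area.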

    Something to note is that some parks seem to be missing from the data. For example, the Presidio doesn't show up.