Data Description
The dataset I will be looking at is the SF Park Scores dataset.
This dataset was created using data provided by the San Francisco Recreation & Parks Department via the San Francisco Open Data Portal.
It contains quarterly park evaluation scores for San Francisco, from Q3 2005 to Q4 2014. The scores are based mostly on
cleanliness and on the condition of structures and equipment. More details about the criteria are available for
2005, 2013, and 2014.
The dataset is 872 KB, with 5495 rows and 16 columns.
As for the data itself, apart from the score of the parks, the dataset also includes information about each park such as
location, facilities, and size.
The specific attributes are:
ParkID
PSA
Park
FQ
Score
Address
Acres
Square Feet
State
Zipcode
Latitude
Longitude
Perimeter Length
Facility Type
Facility Name
Floor Count
Moreover, as part of DataSF, this dataset is under the “Public Domain and Dedication License” (PDDL). You can find
a human-readable summary of the license here.
When thinking about what questions I wanted to answer with my visualizations, I decided that I wanted some extra information
to complement the SF Parks dataset. Specifically, I looked for San Francisco population by neighborhood, to be able to talk more
about the relationship between population and parks' scores. I found a study by the San Francisco Planning Department, which
uses data produced by the U.S. Census Bureau (Terms of Service).
The study contains many details about the population of each neighborhood in San Francisco, such as race and age, but for now I will
only be looking at the total population of each of the 41 neighborhoods.
Another interesting perspective would be to compare a neighborhood's crime rate to nearby parks' scores.
Data Processing
To create choropleths and maps with the parks' boundaries, I downloaded three sets of maps (part of
DataSF, so also under the “Public Domain and Dedication License”):
Geojson data of SF Parks
Geojson data of SF map by zipcode
Geojson data of Golden Gate Park Sections
In the geojson data of the parks, I had to change the park name "Ferry Park" to "Sue Bierman Park" (it is the same park, but it is
also known as Ferry Park because it is next to the Ferry Building).
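A minimal sketch of how such a rename could be done in Python. The property key "park_name" and the tiny sample collection are
assumptions for illustration only; the real geojson file may store the park name under a different key.

```python
def rename_park(geojson, old_name, new_name, name_key="park_name"):
    """Rename matching features in place and return how many were changed.

    name_key is a hypothetical property name; adjust it to the real file.
    """
    changed = 0
    for feature in geojson.get("features", []):
        props = feature.get("properties") or {}
        if props.get(name_key) == old_name:
            props[name_key] = new_name
            changed += 1
    return changed


# Tiny stand-in for the parks geojson (geometry omitted for brevity).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"park_name": "Ferry Park"}, "geometry": None},
        {"type": "Feature", "properties": {"park_name": "Union Square"}, "geometry": None},
    ],
}

n_renamed = rename_park(sample, "Ferry Park", "Sue Bierman Park")
```

Returning the number of renamed features makes it easy to check that exactly one park was affected.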
As for the main dataset, SF Park Scores, some parks were missing size and location data (i.e. square feet, perimeter, acres,
longitude and latitude). I used the data from the SF Parks geojson to fill in the missing values in the main dataset, and I
looked up the missing zipcodes on Google Maps.
Specifically, I filled in the data for the following parks: Buchanan Street Mall, Chester-Palmetto Mini Park,
Collis P. Huntington Park, Golden Gate-Steiner Mini Park, Hayes Valley Playground, Herz Playground,
Joseph Conrad Mini Park, Louis Sutter Playground, Maritime Plaza, Palace of Fine Arts, Patricia's Green in Hayes Valley,
Portola Open Space, Randolph-Bright Mini Park, Roosevelt-Henry Steps, Saturn Street Steps,
Sue Bierman Park, Union Square, Visitacion Valley Greenway, and Yacht Harbor & Marina Green.
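The fill-in step can be sketched roughly as follows, assuming the geojson data has already been reduced to a lookup keyed by park
name. The function name, the lookup structure, and all numeric values below are placeholders for illustration, not real
measurements.

```python
# Column names follow the attribute list of the main dataset.
FIELDS = ("Acres", "Square Feet", "Perimeter Length", "Latitude", "Longitude")


def fill_missing(rows, lookup, fields=FIELDS):
    """Fill empty fields of each row from lookup[park name].

    Existing values are never overwritten. Returns the number of cells filled.
    """
    filled = 0
    for row in rows:
        source = lookup.get(row["Park"])
        if source is None:
            continue  # no geojson entry for this park
        for field in fields:
            if row.get(field) in (None, "") and field in source:
                row[field] = source[field]
                filled += 1
    return filled


# Placeholder values only.
rows = [
    {"Park": "Union Square", "Acres": "", "Square Feet": None, "Latitude": 1.0},
    {"Park": "Some Other Park", "Acres": 3.0},
]
lookup = {"Union Square": {"Acres": 2.0, "Square Feet": 87120.0, "Latitude": 9.9}}

n_filled = fill_missing(rows, lookup)
```

Only empty cells are touched, so values already present in the main dataset take priority over the geojson values.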
Some parks still needed other arrangements. The first was the Chester-Palmetto Mini Park,
for which I could not get any information, so it was left out of the visualization. The second was
Golden Gate Park.
The problem with the Golden Gate Park data is that the park was designated by a single park ID up until 2010. From 2011 onwards,
the park was divided into 6 different sections for a better, more detailed assessment of such a big park. This means that the
SF Park Scores dataset contains scores for
two different designations of the same park. To standardize the data, I decided to use only one designation and transform the
second designation to it. I opted to represent Golden Gate Park as one park (no sections) because I can't
create data for the individual sections before 2011, but I can average the sections to get data for the park as a whole. The
transformation was simply averaging the scores of all 6 sections for each quarter from 2011 to 2014.
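As a rough sketch, the averaging could look like this in Python. The section names, quarter labels, and scores are placeholders,
and the row layout (a "Park", "FQ", and "Score" field per row) is an assumption based on the attribute list.

```python
from collections import defaultdict
from statistics import mean


def average_sections(section_rows, park_name="Golden Gate Park"):
    """Collapse per-section rows into one averaged row per quarter."""
    by_quarter = defaultdict(list)
    for row in section_rows:
        by_quarter[row["FQ"]].append(row["Score"])
    return [
        {"Park": park_name, "FQ": quarter, "Score": mean(scores)}
        for quarter, scores in sorted(by_quarter.items())
    ]


# Placeholder scores for six hypothetical sections over two quarters.
q1_scores = [0.75, 0.875, 1.0, 0.5, 0.625, 0.75]
section_rows = [
    {"Park": f"Golden Gate Park Section {i + 1}", "FQ": "2011Q1", "Score": s}
    for i, s in enumerate(q1_scores)
] + [
    {"Park": f"Golden Gate Park Section {i + 1}", "FQ": "2011Q2", "Score": 0.5}
    for i in range(6)
]

merged = average_sections(section_rows)
```

This is an unweighted mean, so a small section counts as much as a large one; weighting by section area would be an alternative.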
Another problem with Golden Gate Park is that it spans several zipcodes. I used the same
scores for every zipcode that contains a part of Golden Gate Park. For the square feet area, I used the Acres data
in the geojson data of Golden Gate Park Sections
to decide how much area is in each zipcode (sqft = acres * 43560).
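The area conversion can be sketched as below. It assumes that each section has already been assigned to a single zipcode, which
is a simplification; the zipcode labels and acre values are placeholders.

```python
SQFT_PER_ACRE = 43560  # 1 acre = 43,560 square feet


def sqft_by_zipcode(sections):
    """Sum section areas (given in acres) per zipcode, in square feet."""
    totals = {}
    for section in sections:
        zipcode = section["zipcode"]
        totals[zipcode] = totals.get(zipcode, 0) + section["acres"] * SQFT_PER_ACRE
    return totals


# Placeholder sections; "ZIP_A" and "ZIP_B" stand in for real zipcodes.
sections = [
    {"zipcode": "ZIP_A", "acres": 10},
    {"zipcode": "ZIP_A", "acres": 2.5},
    {"zipcode": "ZIP_B", "acres": 1},
]

areas = sqft_by_zipcode(sections)
```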
Something to note is that some parks seem to be missing from the data. For example, the Presidio doesn't show up.