San Francisco Crime Incidents

Darianne Lopez

Dataset:

Police Department Incident Reports: 2018

License:

Open Data Commons Public Domain Dedication and License

Size of dataset:

Rows: 154K, Columns: 26

About the Data:

This dataset includes police incident reports filed by officers and by individuals through self-service online reporting for non-emergency cases. It covers reports for incidents that occurred on or after January 1, 2018, and that have been approved by a supervising officer. The dataset mixes several data types (dates, text, latitude and longitude numbers, and point geometry) across fields such as Analysis Neighborhood, Incident Time, Incident Day of Week, Report Type Code, Report Type Description, and more.

Columns I used in my visualizations:

Incident Date - (Date & Time) The date the incident occurred
Incident Category - (Plain Text) A category mapped on to the Incident Code used in statistics and reporting. Mappings provided by the Crime Analysis Unit of the Police Department.
Analysis Neighborhood - (Plain Text) The Department of Public Health and the Mayor's Office of Housing and Community Development (MOHCD), with support from the Planning Department, created 41 neighborhoods by grouping 2010 Census tracts, using common real estate and resident definitions, to provide consistency in the analysis and reporting of socio-economic, demographic, and environmental data, and data on City-funded programs and services. The neighborhoods are not codified in the Planning Code or the Administrative Code. This boundary is produced by assigning Census tracts to neighborhoods based on existing neighborhood definitions used by Planning and MOHCD. A qualitative assessment was made to identify the appropriate neighborhood for a given tract based on understanding of population distribution and significant landmarks. Once all tracts had been assigned a neighborhood, the tracts were dissolved to produce these boundaries. See reference: https://data.sfgov.org/d/p5b7-5n3h Please note this boundary is assigned based on the intersection, so it may differ from the boundary the incident actually occurred within.


Other columns in this Dataset:

Incident Datetime - (Date & Time) The date and time when the incident occurred
Incident Time - (Plain Text) The time the incident occurred
Incident Year - (Plain Text) The year the incident occurred, provided as a convenience for filtering
Incident Day of Week - (Plain Text) The day of the week the incident occurred
Report Datetime - (Date & Time) Distinct from Incident Datetime, Report Datetime is when the report was filed
Row ID - (Plain Text) An identifier unique to the dataset
Incident ID - (Plain Text) This is the system-generated identifier for incident reports. An incident report can have multiple incident codes associated with it. Thus, this identifier, while unique to the report, will be duplicated within this dataset to represent those one-to-many relationships when they exist. Incident IDs and Incident Numbers both uniquely identify reports, but Incident Numbers are what are used and referenced in the cases and report documents
Incident Number - (Plain Text) The number issued on the report, sometimes interchangeably referred to as the Case Number
CAD Number - (Plain Text) The Computer Aided Dispatch (CAD) is the system used by the Department of Emergency Management (DEM) to dispatch officers and other public safety personnel. CAD Numbers are assigned by the DEM system and linked to relevant incident reports (Incident Number). Not all incidents will have a CAD Number. Reports filed online via Coplogic (see field: Filed Online) will not have a CAD Number, and certain other reports not filed through the DEM system will also not have these numbers.
Report Type Code - (Plain Text) A system code for report types; these have corresponding descriptions within the dataset.
Report Type Description - (Plain Text) The description of the report type, can be one of: Initial; Initial Supplement; Vehicle Initial; Vehicle Supplement; Coplogic Initial; Coplogic Supplement
Filed Online - (Checkbox) Police reports can be filed online for non-emergency cases. These reports are entered via a self-service system called Coplogic http://sanfranciscopolice.org/reports. This field is a boolean indicating the record was filed this way. These are also indicated in the Report Type Code and Report Type Description fields.
Incident Code - (Plain Text) Incident Codes are the system codes to describe a type of incident. A single incident report can have one or many incident types associated. In those cases you will see multiple rows representing a unique combination of the Incident ID and Incident Code.
Incident Subcategory - (Plain Text) A subcategory mapped on to the Incident Code used in statistics and reporting. These nest inside the Category field. Mappings provided by the Crime Analysis Unit of the Police Department.
Incident Description - (Plain Text) The description of the incident that corresponds with the Incident Code. These are generally self-explanatory.
Resolution - (Plain Text) The resolution of the incident at the time of the report. Can be one of: Cite or Arrest Adult; Cite or Arrest Juvenile; Exceptional Adult; Exceptional Juvenile; Open or Active; Unfounded. Note: once a report is filed, the resolution on that filed report does not change. Updates to a case will be issued later as Supplemental reports if there's a status change.
Intersection - (Plain Text) The 2 or more street names that intersect closest to the original incident, separated by a backslash (\). Note, the possible intersections will only include those that satisfy the privacy controls.
CNN - (Plain Text) The unique identifier of the intersection for reference back to other related basemap datasets. For more on the Centerline Node Network see https://datasf.gitbooks.io/draft-publishing-standards/content/basemap/street-centerlines-nodes.html
Police District - (Plain Text) The Police District reflecting current boundaries (boundaries changed in 2015). Reference here: https://data.sfgov.org/d/wkhw-cjsf Please note these are entered by officers and not based on the point.
Supervisor District - (Plain Text) There are 11 members elected to the Board of Supervisors in San Francisco, each representing a geographic district. The Board of Supervisors is the legislative body for San Francisco. The districts are numbered 1 through 11. See reference: https://data.sfgov.org/d/8nkz-x4ny Please note this boundary is assigned based on the intersection, so it may differ from the boundary the incident actually occurred within.
Latitude - (Number) The latitude coordinate in WGS84, spatial reference is EPSG:4326. Note, will be blank where geocoding was not possible.
Longitude - (Number) The longitude coordinate in WGS84, spatial reference is EPSG:4326. Note, will be blank where geocoding was not possible.
Point - (Location) The point geometry used for mapping features in the open data portal platform. Latitude and Longitude are provided separately as well as a convenience. Note, will be blank where geocoding was not possible.
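
Because an Incident ID repeats once for each Incident Code attached to a report, the number of rows is larger than the number of distinct reports. The Python sketch below illustrates the difference. The sample rows and their ID and code values are made up for illustration, and only the column names come from the list above.

```python
from collections import defaultdict

# Made-up sample rows; only the column names match the dataset.
sample_rows = [
    {"Row ID": "1", "Incident ID": "100", "Incident Code": "06244"},
    {"Row ID": "2", "Incident ID": "100", "Incident Code": "04134"},
    {"Row ID": "3", "Incident ID": "200", "Incident Code": "06244"},
]

# Collect the set of incident codes attached to each report.
codes_by_incident = defaultdict(set)
for row in sample_rows:
    codes_by_incident[row["Incident ID"]].add(row["Incident Code"])

row_count = len(sample_rows)           # one per (Incident ID, Incident Code) pair
report_count = len(codes_by_incident)  # one per distinct report

print(f"{row_count} rows describe {report_count} distinct reports")
```

Whether to count rows or distinct reports depends on the question being asked: rows count incident types recorded, while distinct Incident IDs count reports.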

Data Processing

Initial data processing consisted of filtering the dataset to keep only incidents from January 1, 2018 to December 31, 2018 (a twelve-month interval).
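
The same date filter can also be scripted rather than applied in the portal. The following is a minimal Python sketch, not the method used for this project. It assumes the Incident Date values are mm/dd/yyyy strings, and the helper name in_2018 and the sample rows are hypothetical.

```python
from datetime import date, datetime

START = date(2018, 1, 1)
END = date(2018, 12, 31)

def in_2018(row, date_column="Incident Date", fmt="%m/%d/%Y"):
    """Return True when the row's incident date falls within 2018 (inclusive)."""
    incident_date = datetime.strptime(row[date_column], fmt).date()
    return START <= incident_date <= END

# Made-up sample rows standing in for the exported dataset.
sample_rows = [
    {"Incident Date": "12/31/2017", "Incident Category": "Assault"},
    {"Incident Date": "01/01/2018", "Incident Category": "Larceny Theft"},
    {"Incident Date": "12/31/2018", "Incident Category": "Burglary"},
    {"Incident Date": "01/01/2019", "Incident Category": "Assault"},
]

# Keep only the rows inside the twelve-month interval.
rows_2018 = [row for row in sample_rows if in_2018(row)]
```

Both endpoints are inclusive here, so incidents dated January 1 and December 31 are kept.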
From this initial processing I had to change a few things for my first visualization, San Francisco Incidents. The dataset's date column was originally in mm/dd/yyyy format. I reduced it to the month value alone using Tableau's custom split on a delimiter, then saved the result as a csv file. I also needed the count of each incident for each date, which I got in Tableau by building my area chart prototype and saving the dataset Tableau generated. Finally, I deleted the unnecessary columns, since this visualization uses only three.
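
For comparison, the month split and the counting step can be approximated outside Tableau. This Python sketch is an illustration only: it assumes mm/dd/yyyy date strings, and it assumes the three columns kept are the month, the Incident Category, and the count, which the text above does not state explicitly. The helper month_of and the sample rows are hypothetical.

```python
from collections import Counter

def month_of(date_text):
    """Split an mm/dd/yyyy string on '/' and return the month as an int."""
    month, _day, _year = date_text.split("/")
    return int(month)

# Made-up sample rows; only the column names match the dataset.
sample_rows = [
    {"Incident Date": "01/05/2018", "Incident Category": "Larceny Theft"},
    {"Incident Date": "01/17/2018", "Incident Category": "Larceny Theft"},
    {"Incident Date": "01/17/2018", "Incident Category": "Assault"},
    {"Incident Date": "02/02/2018", "Incident Category": "Assault"},
]

# Count incidents per (month, category) pair.
counts = Counter(
    (month_of(row["Incident Date"]), row["Incident Category"])
    for row in sample_rows
)

# Flatten to a three-column table, ordered by month and then category.
table = [
    {"Month": month, "Incident Category": category, "Count": n}
    for (month, category), n in sorted(counts.items())
]
```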
For my second visualization, Incidents by Neighborhood, I had to sort the dataset by hierarchy. I did this using the Police Department Incident Reports: 2018 filter, where I filtered the date column to use all data from 2018 and then used Sort & Roll-Up to order the data first by Analysis Neighborhood and then by the count of Incident Category. From this output, I created the chart called Incidents by Neighborhood.
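
The roll-up described above amounts to a group, count, and two-level sort. The Python sketch below shows that idea under stated assumptions: it is not the portal's Sort & Roll-Up feature, the sample rows are made up, and ordering counts from largest to smallest within each neighborhood is an assumption, since the sort direction is not specified above.

```python
from collections import Counter

# Made-up sample rows; only the column names match the dataset.
sample_rows = [
    {"Analysis Neighborhood": "Mission", "Incident Category": "Larceny Theft"},
    {"Analysis Neighborhood": "Mission", "Incident Category": "Larceny Theft"},
    {"Analysis Neighborhood": "Mission", "Incident Category": "Assault"},
    {"Analysis Neighborhood": "Bayview Hunters Point", "Incident Category": "Burglary"},
    {"Analysis Neighborhood": "Bayview Hunters Point", "Incident Category": "Assault"},
    {"Analysis Neighborhood": "Bayview Hunters Point", "Incident Category": "Assault"},
]

# Roll up: count rows per (neighborhood, category) pair.
counts = Counter(
    (row["Analysis Neighborhood"], row["Incident Category"])
    for row in sample_rows
)

# Sort by neighborhood name, then by count (largest first, an assumption),
# using the category name as a tie-breaker so the order is deterministic.
ordered = sorted(
    counts.items(),
    key=lambda item: (item[0][0], -item[1], item[0][1]),
)

for (neighborhood, category), n in ordered:
    print(neighborhood, category, n)
```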