ABOUT THE DATA
The data used can be found here. The data I am using ends on the 1st of October 2021.
Since the full dataset from the website is too large and filtering it with Python can cause errors, I manually deleted the columns that were not necessary for this project. I then used Python to extract all the rows related to streetlights and lampposts and saved them as "light.csv".
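For readers who would rather not trim the file by hand, the extraction step could be sketched roughly as below. This is not the code I actually ran: it assumes the large file is read in chunks so it never has to fit in memory at once, that only the eight columns used in this project are loaded, and that rows are matched on a pattern in "Complaint Type" (the pattern is my guess at a reasonable rule). The function name extract_light_rows and the small in-memory sample are made up for illustration.

```python
import io

import pandas as pd

# Columns kept for this project (the same ones listed by df.dtypes).
COLUMNS = [
    "Unique Key", "Created Date", "Closed Date", "Complaint Type",
    "Descriptor", "Status", "Resolution Action Updated Date", "Borough",
]

# Assumed matching rule; adjust it to whatever rows you need.
PATTERN = r"street\s*light|lamp\s*post"


def extract_light_rows(source, chunksize=100_000):
    """Read `source` chunk by chunk and keep only the street light rows."""
    kept = []
    for chunk in pd.read_csv(source, usecols=COLUMNS, chunksize=chunksize):
        # astype(str) guards against chunks where the column is entirely empty.
        mask = chunk["Complaint Type"].astype(str).str.contains(PATTERN, case=False)
        kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)


# Tiny made-up sample standing in for the full download.
sample = io.StringIO(
    "Unique Key,Created Date,Closed Date,Agency,Complaint Type,Descriptor,"
    "Status,Resolution Action Updated Date,Borough\n"
    "1,09/30/2021 10:00:00 PM,,X,Street Light Condition,Example A,Assigned,,BROOKLYN\n"
    "2,09/30/2021 11:00:00 PM,,X,Some Other Type,Example B,Closed,,QUEENS\n"
    "3,10/01/2021 01:00:00 AM,,X,Street Light Condition,Example C,Closed,,BRONX\n"
)

light = extract_light_rows(sample, chunksize=2)
print(light[["Unique Key", "Complaint Type"]])
```

With a real file, the path to the download would replace the sample, and light.to_csv("light.csv", index=False) would save the result.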
We can run the code below to review the columns and their data types:
import pandas as pd
df = pd.read_csv("light.csv")
print(df.dtypes)
The code will print out the following:
Unique Key int64
Created Date object
Closed Date object
Complaint Type object
Descriptor object
Status object
Resolution Action Updated Date object
Borough object
dtype: object
THE COLUMNS
UNIQUE KEY
This is an integer signifying the case's ID.
CREATED DATE
A series of date strings recording when each case was reported. It will later be converted to datetime using the following code.
df['Created Date'] = pd.to_datetime(df['Created Date'])
CLOSED DATE
A series of date strings recording when each case was closed. It will later be converted to datetime using the following code.
df['Closed Date'] = pd.to_datetime(df['Closed Date'])
COMPLAINT TYPE
A series of strings giving the type of complaint. For this project, we will only be looking at the Complaint Type called "Street Light Condition".
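Restricting the data to that one Complaint Type can be done with a boolean mask. The snippet below is a minimal sketch on a few made-up rows standing in for light.csv; "Some Other Type" is a placeholder, not a real category.

```python
import pandas as pd

# Made-up rows standing in for light.csv.
df = pd.DataFrame({
    "Unique Key": [1, 2, 3],
    "Complaint Type": [
        "Street Light Condition",
        "Some Other Type",  # placeholder value
        "Street Light Condition",
    ],
})

# Keep only the rows whose Complaint Type matches exactly.
street_lights = df[df["Complaint Type"] == "Street Light Condition"].copy()
print(street_lights)
```

The .copy() call gives an independent DataFrame, so later column assignments do not trigger pandas warnings about modifying a slice.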
DESCRIPTOR
A series of strings describing the specific issue behind each complaint, such as flickering lights or damaged lightbulbs.
STATUS
A series of strings giving the status of each case. A case can be "Closed", "Opened", or "Assigned".
RESOLUTION ACTION UPDATED DATE
A series of date strings recording when the resolution action on each case was last updated. It will later be converted to datetime using the following code.
df['Resolution Action Updated Date'] = pd.to_datetime(df['Resolution Action Updated Date'])
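The three date columns can also be converted in one loop. The sketch below makes two assumptions that are not confirmed above: that the dates are written like "10/01/2021 09:30:00 AM" (hence the format string), and that unparseable or missing values should become NaT rather than raise an error. The sample rows are made up.

```python
import pandas as pd

# Made-up rows; the real values come from light.csv.
df = pd.DataFrame({
    "Created Date": ["09/30/2021 11:15:00 PM", "10/01/2021 08:00:00 AM"],
    "Closed Date": ["10/01/2021 09:30:00 AM", None],
    "Resolution Action Updated Date": ["10/01/2021 09:30:00 AM", "not a date"],
})

DATE_COLUMNS = ["Created Date", "Closed Date", "Resolution Action Updated Date"]
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed layout of the date strings

for col in DATE_COLUMNS:
    # errors="coerce" turns missing or malformed entries into NaT.
    df[col] = pd.to_datetime(df[col], format=DATE_FORMAT, errors="coerce")

print(df.dtypes)
```

Passing an explicit format avoids ambiguity between month-first and day-first dates and is usually faster than letting pandas infer the layout.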
BOROUGH
A series of strings giving the NYC borough each case comes from.
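A quick way to check which values actually occur in the text columns is value_counts. The snippet below is only a sketch on made-up rows; the statuses and borough names shown are illustrative, not a summary of the real data.

```python
import pandas as pd

# Made-up rows standing in for light.csv.
df = pd.DataFrame({
    "Status": ["Closed", "Closed", "Assigned", "Closed"],
    "Borough": ["BROOKLYN", "QUEENS", "BROOKLYN", None],
})

# dropna=False also counts missing entries instead of silently skipping them.
status_counts = df["Status"].value_counts(dropna=False)
borough_counts = df["Borough"].value_counts(dropna=False)

print(status_counts)
print(borough_counts)
```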