ABOUT THE DATA
The data used can be found here. The data I am using ends on the 1st of October 2021.
​
Because the full dataset from the website is very large and filtering it with Python can cause errors, I manually deleted the columns that were not needed for this project. I then used Python to extract all the rows related to streetlights and lampposts and saved them as "light.csv".
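The extraction step itself is not shown here. As a rough sketch of how it might look, the code below reads a CSV in chunks (so the whole file never has to fit in memory), keeps only the eight columns used in this project, and keeps rows that mention street lights or lampposts. The function name `extract_light_rows`, the keyword pattern, the choice to search both "Complaint Type" and "Descriptor", and the sample rows are all assumptions made for illustration, not the code actually used.

```python
import io
import pandas as pd

# The eight columns this project keeps.
KEEP_COLUMNS = [
    "Unique Key", "Created Date", "Closed Date", "Complaint Type",
    "Descriptor", "Status", "Resolution Action Updated Date", "Borough",
]

def extract_light_rows(source, chunksize=100_000):
    """Read `source` in chunks; keep rows mentioning street lights or lampposts."""
    pattern = r"street ?light|lamp ?post"  # assumed keywords, case-insensitive
    kept = []
    for chunk in pd.read_csv(source, usecols=KEEP_COLUMNS, chunksize=chunksize):
        # Search both text columns; missing values are treated as empty strings.
        text = chunk["Complaint Type"].fillna("") + " " + chunk["Descriptor"].fillna("")
        kept.append(chunk[text.str.contains(pattern, case=False, regex=True)])
    return pd.concat(kept, ignore_index=True)

# Made-up sample rows standing in for the real download (includes an extra
# "Agency" column to show that unused columns are dropped).
sample = io.StringIO("\n".join([
    "Unique Key,Created Date,Closed Date,Agency,Complaint Type,Descriptor,"
    "Status,Resolution Action Updated Date,Borough",
    "1,01/02/2021 10:00:00 AM,01/05/2021 09:30:00 AM,DOT,Street Light Condition,"
    "Street Light Out,Closed,01/05/2021 09:30:00 AM,BROOKLYN",
    "2,01/03/2021 11:00:00 AM,,NYPD,Noise - Residential,Loud Music/Party,"
    "Assigned,,QUEENS",
    "3,01/04/2021 08:15:00 AM,,DOT,Street Light Condition,Lamppost Damaged,"
    "Assigned,,BRONX",
]))

light = extract_light_rows(sample, chunksize=2)
print(light[["Unique Key", "Complaint Type", "Descriptor"]])
```

With the real download, the result could then be written out with `light.to_csv("light.csv", index=False)`.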
​
We can run the code below to review the columns and their data types:
import pandas as pd
df = pd.read_csv("light.csv")
print(df.dtypes)
The code will print out the following:
Unique Key                         int64
Created Date                      object
Closed Date                       object
Complaint Type                    object
Descriptor                        object
Status                            object
Resolution Action Updated Date    object
Borough                           object
dtype: object
THE COLUMNS
UNIQUE KEY
This is an integer signifying the case's ID.
CREATED DATE
This is a series of date strings recording when each case was reported. It will later be converted to datetime using the following code:
df['Created Date'] = pd.to_datetime(df['Created Date'])
CLOSED DATE
This is a series of date strings recording when each case was closed. It will later be converted to datetime using the following code:
df['Closed Date'] = pd.to_datetime(df['Closed Date'])
COMPLAINT TYPE
This is a series of strings giving the type of complaint. For this project, we will only look at cases whose Complaint Type is "Street Light Condition".
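A minimal sketch of that filter is shown below, using a tiny made-up DataFrame in place of light.csv. The sample rows and the variable name `street_lights` are illustrative assumptions.

```python
import pandas as pd

# Made-up rows standing in for light.csv.
sample = pd.DataFrame({
    "Unique Key": [1, 2, 3],
    "Complaint Type": [
        "Street Light Condition",
        "Traffic Signal Condition",
        "Street Light Condition",
    ],
})

# Keep only the rows whose Complaint Type matches exactly.
street_lights = sample[sample["Complaint Type"] == "Street Light Condition"]
print(street_lights)
```

An exact match is used here on purpose: a substring search would also pick up any other complaint type that happens to contain the words "street light".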
DESCRIPTOR
This is a series of strings describing the specific issue reported in the complaint, such as flickering lights or damaged lightbulbs.
STATUS
This is a series of strings giving the status of the case. A case can be "Closed", "Opened", or "Assigned".
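One quick way to confirm which statuses actually occur is to count them. The sketch below does this on a small made-up Series that uses the three values named above; with the real data, the same calls would run on df['Status']. The names `counts` and `unexpected` are illustrative.

```python
import pandas as pd

# Made-up status values standing in for df['Status'].
status = pd.Series(["Closed", "Closed", "Assigned", "Opened"], name="Status")

# Number of cases per status, most frequent first.
counts = status.value_counts()
print(counts)

# Flag any value outside the three statuses described in the text.
expected = {"Closed", "Opened", "Assigned"}
unexpected = set(status.dropna()) - expected
print("Unexpected statuses:", unexpected)
```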
RESOLUTION ACTION UPDATED DATE
This is a series of date strings recording when the resolution action for each case was last updated. It will later be converted to datetime using the following code:
df['Resolution Action Updated Date'] = pd.to_datetime(df['Resolution Action Updated Date'])
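The three date columns can also be converted in one pass. The sketch below is one possible way to do that, assuming the dates are month-first strings like the made-up sample values; `parse_dates` is a hypothetical helper, and `errors="coerce"` is an added choice that turns unparseable or missing values into NaT instead of raising an error.

```python
import pandas as pd

DATE_COLUMNS = ["Created Date", "Closed Date", "Resolution Action Updated Date"]

def parse_dates(df):
    """Return a copy of `df` with the three date columns converted to datetime."""
    out = df.copy()
    for col in DATE_COLUMNS:
        # errors="coerce": values that cannot be parsed become NaT.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out

# Made-up rows standing in for light.csv; the second case is still open.
sample = pd.DataFrame({
    "Created Date": ["01/02/2021 10:00:00 AM", "01/04/2021 08:15:00 AM"],
    "Closed Date": ["01/05/2021 09:30:00 AM", None],
    "Resolution Action Updated Date": ["01/05/2021 09:30:00 AM", None],
})

parsed = parse_dates(sample)
print(parsed.dtypes)

# Once the columns are datetimes, durations are a simple subtraction.
days_open = (parsed["Closed Date"] - parsed["Created Date"]).dt.days
print(days_open)
```

Cases that have not been closed yet have no Closed Date, so their duration comes out as missing rather than as a misleading number.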
BOROUGH
This is a series of strings giving the NYC borough the case is from.