Dates are one of the most used features of Data Analysis. We will show you how you can parse, use, and analyze them with some simple but very useful functions. Let us first create a DateTime object to understand its concept a bit more.
import pandas as pd import yfinance as yf from datetime import datetime import numpy as np year=2021 month=1 day=1 #create a datetime object date=datetime(year,month,day)
In the code above we set the year month and date (we can add more if we want like minutes, hours, etc.). A very useful feature of the DateTime object is that we can grab their attributes. This is used a lot with pandas when we want to create columns like “Month”, “Year”, etc.
print('day',date.day) print('month',date.month) print('year',date.year)
day 1
month 1
year 2021
String to Datetime in Pandas
In the real-world, in most cases, we are getting dates as strings. That’s why we will show you how to convert them into DateTime objects. You can use the pd.to_datetime function.
pd.to_datetime(['01-01-2021','01-02-2021','01-03-2021'])
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03'], dtype='datetime64[ns]', freq=None)
As you can see even if the date is in an unusual format, the to_datetime function can parse the dates. However, in our example, we have to set the format of the date to be sure that the first number is parsed as a month.
pd.to_datetime(['01-01-2021','01-02-2021','01-03-2021'],format="%m-%d-%Y")
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03'], dtype='datetime64[ns]', freq=None)
You can find all symbols for date formating in the following table
Directive | Description | Example |
---|---|---|
%a | Weekday, short version | Wed |
%A | Weekday, full version | Wednesday |
%w | Weekday as a number 0-6, 0 is Sunday | 3 |
%d | Day of month 01-31 | 31 |
%b | Month name, short version | Dec |
%B | Month name, full version | December |
%m | Month as a number 01-12 | 12 |
%y | Year, short version, without century | 18 |
%Y | Year, full version | 2018 |
%H | Hour 00-23 | 17 |
%I | Hour 00-12 | 05 |
%p | AM/PM | PM |
%M | Minute 00-59 | 41 |
%S | Second 00-59 | 08 |
%f | Microsecond 000000-999999 | 548513 |
%z | UTC offset | +0100 |
%Z | Timezone | CST |
%j | Day number of year 001-366 | 365 |
%U | Week number of year, Sunday as the first day of week, 00-53 | 52 |
%W | Week number of year, Monday as the first day of week, 00-53 | 52 |
%c | Local version of date and time | Mon Dec 31 17:41:00 2018 |
%x | Local version of date | 12/31/18 |
%X | Local version of time | 17:41:00 |
%% | A % character | % |
Datetime functions in Pandas
Let’s get our data. For this post we will use the stock Prices of Tesla.
df=yf.download('TSLA',start='2017-11-07', end='2021-01-01',progress=False) df
DatetimeIndex: 291 entries, 2019-11-06 to 2020-12-31
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 291 non-null float64
1 High 291 non-null float64
2 Low 291 non-null float64
3 Close 291 non-null float64
4 Adj Close 291 non-null float64
5 Volume 291 non-null int64
As we can see, the index is already a Datetime Index.
If we are reading the data from a file, we can parse the dates easily by setting the parse_dates variable as a list of the column names of the Dates we want to parse.
#save a sample dataframe with dates df.reset_index().to_csv('tesla.csv',index=False) df=pd.read_csv('tesla.csv',parse_dates=['Date']) df.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 794 non-null datetime64[ns]
1 Open 794 non-null float64
2 High 794 non-null float64
3 Low 794 non-null float64
4 Close 794 non-null float64
5 Adj Close 794 non-null float64
6 Volume 794 non-null int64
However, as we said before we may have a different format of dates and even if the default parser is great we may need to create our own just to be sure that the parsing is correct. The easiest way is by using lambda functions.
custom_dateparser = lambda x: datetime.strptime(x, "%Y-%m-%d") df = pd.read_csv('test.csv', parse_dates=['Date'], date_parser=custom_dateparser)
Aggregate Dates
Having our index as DateTime can help us aggregate our data according to a rule using the resample function. For example, we may want to aggregate our data by year.
#Here the rule A is for End of Year. In other words, group by Year. df.resample(rule='A').mean()
#Q is for quarter end df.resample(rule='Q').max()
You can find all the availlable rules for the resample function in the following Table
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A, Y | year end frequency |
BA, BY | business year end frequency |
AS, YS | year start frequency |
BAS, BYS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
Shifting Dates
Using the same alias as resample we can shift the dates in our Data using the tshift function.
#M is for end of month df.tshift(freq='M').head()
It changes the index of the dataframe to be the end of the month for every date of the same month.
If you are new to Pandas, we may want to check the Pandas Group by Tips.