Predictive Hacks

How to Create Dummy Pandas Data Frames

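Sometimes we want to create dummy data frames, mainly for testing purposes. Pandas provides helper functions for this in its util.testing module. Note that pandas.util.testing was deprecated in pandas 1.0, so the calls below may raise warnings or fail on newer releases.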


Dummy Data Frame

By default, it creates 30 rows and 4 columns named A, B, C and D, with a random alphanumeric index. Only five rows are shown in the output below.

import pandas as pd
pd.util.testing.makeDataFrame()
 
                   A         B         C         D
1kd8TfzEiX  1.032690 -0.573985  0.671357 -0.005690
jvoji1iqSq -0.921305 -1.422119  0.799405 -0.757337
9kVZnQU29u  0.772675  1.559491 -0.585057 -1.661675
EUriSO6l9C -0.489347 -1.317456 -1.084417  0.217104
LuyVlcJAgF -1.559043  1.473184 -0.029968  0.250103
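Because these helpers live in a testing module rather than the public API, a comparable frame can also be built directly with NumPy. This is a minimal sketch, assuming NumPy is installed; the seed and variable names are illustrative choices, not something pandas prescribes:

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible
alphabet = list(string.ascii_letters + string.digits)

# One random 10-character alphanumeric label per row
index = ["".join(rng.choice(alphabet, size=10)) for _ in range(30)]

# 30 rows x 4 columns of standard-normal floats
df = pd.DataFrame(rng.standard_normal((30, 4)), index=index, columns=list("ABCD"))
print(df.head())
```

The values will differ from the output above, since both are random draws.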

Dummy Data Frame with Missing Values

This variant replaces some of the values with NaN at random.

pd.util.testing.makeMissingDataframe()
  
                   A         B         C         D
8SscPSnyy3 -0.186894  0.867706  0.976297 -0.768294
h3cvhbkSWT       NaN  0.083227 -0.570344  0.633503
CI0V1MUGal -0.025917 -1.909735 -0.270712 -1.622608
IeLbykQMB2       NaN -0.414958  0.479902 -1.418628
QDn4bxJpAU -0.602611 -1.110227  0.425438 -0.467016
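The same effect can be sketched with public APIs by masking a random subset of cells. The keep-probability of 0.9 used here is an assumption chosen for illustration, as are the seed and variable names:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 4)), columns=list("ABCD"))

# Keep each cell with probability 0.9 (assumed density); the rest become NaN
keep = rng.random(df.shape) < 0.9
df_missing = df.where(keep)

# Number of missing values per column
print(df_missing.isna().sum())
```

`DataFrame.where` keeps the values where the mask is True and fills the remaining cells with NaN.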

Dummy Data Frame of Time-Series format

Here the index is a date-time index, so the frame can be used as a time series.

pd.util.testing.makeTimeDataFrame()
 
ABCD
2000-01-03  0.824257  1.367241  1.448037 -0.649556
2000-01-04 -0.640470  0.189239  0.681814 -0.737980
2000-01-05 -0.828875  1.239800  0.003776  0.744634
2000-01-06  1.056602  1.660839  0.546301 -0.521864
2000-01-07  0.285226 -0.269875  0.697068 -0.295571
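A date-indexed frame of this kind can also be sketched with `pd.date_range`. The business-day frequency is an assumption; the start date is taken from the first row of the output above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 30 consecutive business days (weekends are skipped), assumed frequency "B"
index = pd.date_range("2000-01-03", periods=30, freq="B")

df_time = pd.DataFrame(rng.standard_normal((30, 4)), index=index, columns=list("ABCD"))
print(df_time.head())
```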

Dummy Data Frame of Mixed Types

It creates a small dummy data frame of mixed types, containing categorical (string), date-time and continuous variables.

pd.util.testing.makeMixedDataFrame()
 
     A    B     C           D
0  0.0  0.0  foo1  2009-01-01
1  1.0  1.0  foo2  2009-01-02
2  2.0  0.0  foo3  2009-01-05
3  3.0  1.0  foo4  2009-01-06
4  4.0  0.0  foo5  2009-01-07
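Since this frame is deterministic, it is easy to write out by hand. The sketch below reproduces the values shown above using only the public constructor; the use of `pd.bdate_range` for column D is an assumption that happens to match the dates in the output:

```python
import pandas as pd

df_mixed = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
        # Business days, so the weekend of 2009-01-03/04 is skipped
        "D": pd.bdate_range("2009-01-01", periods=5),
    }
)
print(df_mixed.dtypes)
```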

Dummy Data Frame with Periodical data

It creates a dummy data frame whose index is a PeriodIndex, that is, each row label represents a time span rather than a single timestamp.

pd.util.testing.makePeriodFrame()
 
                   A         B         C         D
2000-01-03  1.586559  0.290612  0.609690 -0.155839
2000-01-04 -0.540105 -0.478986 -1.064901  1.302807
2000-01-05  1.113594  0.611258 -0.574987 -1.149406
2000-01-06 -0.841371 -0.294933  0.023008 -0.097956
2000-01-07 -0.080283  2.588833  0.005425  0.150920
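With public APIs, a period-indexed frame can be sketched using `pd.period_range`. The daily frequency is an assumption made for this example (the helper's own frequency may differ); the start date is taken from the output above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 30 consecutive daily periods; freq="D" is an assumption for this sketch
index = pd.period_range("2000-01-03", periods=30, freq="D")

df_period = pd.DataFrame(rng.standard_normal((30, 4)), index=index, columns=list("ABCD"))
print(df_period.head())
```

If timestamps are needed later, `df_period.to_timestamp()` converts the index back to a date-time index.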

More rows and columns?

To change the number of rows and columns from the defaults (30 and 4 respectively), set testing.N to the number of rows and testing.K to the number of columns.

pd.util.testing.N = 10
pd.util.testing.K = 5
pd.util.testing.makeDataFrame()
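Setting module-level attributes changes the defaults for every later call. If a per-call size is preferable, a small wrapper around NumPy does a similar job. This is a sketch under the assumption that a default integer index is acceptable; `make_frame` and its parameters are hypothetical names, not pandas functions:

```python
import string

import numpy as np
import pandas as pd


def make_frame(n_rows=30, n_cols=4, seed=None):
    """Hypothetical helper (not part of pandas): random floats, columns A, B, C, ..."""
    if not 1 <= n_cols <= 26:
        raise ValueError("n_cols must be between 1 and 26")
    rng = np.random.default_rng(seed)
    columns = list(string.ascii_uppercase[:n_cols])
    return pd.DataFrame(rng.standard_normal((n_rows, n_cols)), columns=columns)


df = make_frame(n_rows=10, n_cols=5, seed=0)
print(df.shape)  # (10, 5)
```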
 
