Most Data Scientists use Pandas for reading files, provided that the data are structured. In this tutorial, we will work with the built-in “open” function, whose two main arguments are the file name and the mode. The mode indicates the required action, such as reading, writing or creating, and it also defines the format, either text or binary. Below, we present the description of the modes.
File Modes
Mode | Description |
---|---|
r | It opens an existing file in read-only mode. The file pointer is placed at the beginning of the file. |
rb | It opens the file in read-only mode in binary format. The file pointer is placed at the beginning of the file. |
r+ | It opens the file for both reading and writing. The file pointer is placed at the beginning of the file. |
rb+ | It opens the file for both reading and writing in binary format. The file pointer is placed at the beginning of the file. |
w | It opens the file for writing only. It overwrites the file if it already exists or creates a new one if no file exists with the same name. |
wb | It opens the file for writing only in binary format. It overwrites the file if it already exists or creates a new one if no file exists. |
w+ | It opens the file for both writing and reading. It overwrites the file if it already exists or creates a new one if no file exists. |
wb+ | It opens the file for both writing and reading in binary format. It overwrites the file if it already exists or creates a new one if no file exists. |
a | It opens the file in append mode. Existing data is kept and new data is written at the end. It creates a new file if no file exists with the same name. |
ab | It opens the file in append mode in binary format. |
a+ | It opens the file for both appending and reading. |
ab+ | It opens the file for both appending and reading in binary format. |
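To make the table more concrete, here is a minimal sketch that exercises a few of the modes on a throwaway file. The file name modes_demo.txt and the temporary directory are assumptions made only for this illustration; they are not part of the tutorial's files.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or truncates it if it already exists)
with open(path, mode='w') as f:
    f.write('hello\n')

# 'r+' reads and writes without truncating; the pointer starts at the beginning
with open(path, mode='r+') as f:
    f.write('J')            # replaces the first character in place
    f.seek(0)               # go back to the beginning before reading
    after_r_plus = f.read()

# 'a+' always writes at the end; seek(0) is needed before reading
with open(path, mode='a+') as f:
    f.write('world\n')
    f.seek(0)
    after_a_plus = f.read()

# 'rb' returns bytes instead of str
with open(path, mode='rb') as f:
    raw = f.read()

print(repr(after_r_plus))   # 'Jello\n'
print(repr(after_a_plus))   # 'Jello\nworld\n'
print(type(raw))            # <class 'bytes'>
```

Note how r+ replaces characters in place, while a+ leaves the existing content untouched and adds to the end.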
Read Files
For this tutorial, we have created a simple text file called myfile.txt with the following content:
This is the first line
This is the second line
This is the third line
This is the forth line
and this is the fith and final line
Let’s see how we can read it.
Using the open function
We can read the file using the “open” function as follows:
# open the file with the mode r which means "read"
my_file = open('myfile.txt', mode = 'r')

# read the content of the file storing
# it in a variable called data
data = my_file.read()
print(data)

# close the connection
my_file.close()
Output:
This is the first line
This is the second line
This is the third line
This is the forth line
and this is the fith and final line
Using with open
Alternatively, we can open the file inside a “with” statement. The main difference is that the file is closed automatically when the block ends, even if an error occurs inside it, and this is very helpful for file handling. Let’s code!
with open('myfile.txt', mode = 'r') as my_file:
    data = my_file.read()
    print(data)
Output:
This is the first line
This is the second line
This is the third line
This is the forth line
and this is the fith and final line
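The automatic closing can be checked through the closed attribute of the file object. The sketch below assumes a scratch copy of the first line of myfile.txt in a temporary directory, so that it runs on its own.

```python
import os
import tempfile

# Scratch file (assumed location) so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, mode='w') as my_file:
    my_file.write('This is the first line\n')

with open(path, mode='r') as my_file:
    closed_inside = my_file.closed    # still open inside the block
    data = my_file.read()
closed_outside = my_file.closed       # closed as soon as the block ends

# The file is also closed when the block is left because of an error
try:
    with open(path, mode='r') as my_file:
        raise ValueError('simulated failure')
except ValueError:
    closed_after_error = my_file.closed

print(closed_inside, closed_outside, closed_after_error)  # False True True
```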
The Three Methods for Reading Files
The three methods for reading files in Python with the open function are:
read()
It returns the entire contents of the file as a string that will contain all the characters. You can also pass in an integer to return only the specified number of characters in the file. For example, let’s return the first 10 characters.
with open('myfile.txt', mode = 'r') as my_file:
    # read the 10 first characters
    data = my_file.read(10)
    print(data)
Output:
This is th
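Each call to read() continues from where the previous one stopped, because the file pointer moves forward as characters are consumed. A small sketch, assuming a scratch copy of the first two lines of myfile.txt:

```python
import os
import tempfile

# Scratch copy of the first two lines of myfile.txt (assumed location)
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, mode='w') as my_file:
    my_file.write('This is the first line\nThis is the second line\n')

with open(path, mode='r') as my_file:
    first_chunk = my_file.read(10)    # the first 10 characters
    second_chunk = my_file.read(10)   # the next 10 characters
    rest = my_file.read()             # everything that is left
    at_end = my_file.read()           # empty string at the end of the file

print(repr(first_chunk))    # 'This is th'
print(repr(second_chunk))   # 'e first li'
print(repr(at_end))         # ''
```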
readline()
It returns the next line of the file, which is the first line when the file has just been opened. For example:
with open('myfile.txt', mode = 'r') as my_file:
    data = my_file.readline()
    print(data)
Output:
This is the first line
Notice that the readline() method can take an integer argument that limits the number of characters returned from the current line.
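The following sketch shows successive readline() calls and the integer argument in action. It assumes a scratch copy of the first three lines of myfile.txt so that it can run on its own.

```python
import os
import tempfile

# Scratch copy of the first three lines of myfile.txt (assumed location)
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, mode='w') as my_file:
    my_file.write('This is the first line\n'
                  'This is the second line\n'
                  'This is the third line\n')

with open(path, mode='r') as my_file:
    first = my_file.readline()     # the whole first line, newline included
    second = my_file.readline()    # the next call moves on to the second line
    part = my_file.readline(7)     # at most 7 characters of the third line
    remainder = my_file.readline() # the rest of the third line
    at_end = my_file.readline()    # empty string at the end of the file

print(repr(first))      # 'This is the first line\n'
print(repr(part))       # 'This is'
print(repr(remainder))  # ' the third line\n'
print(repr(at_end))     # ''
```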
readlines()
It returns the entire content as a list, where each element corresponds to a line. For example:
with open('myfile.txt', mode = 'r') as my_file:
    data = my_file.readlines()
    print(data)
Output:
['This is the first line\n', 'This is the second line\n', 'This is the third line\n', 'This is the forth line\n', 'and this is the fith and final line\n']
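As the output shows, readlines() keeps the trailing \n of every line and loads the whole file into memory. A common alternative, sketched below, is to iterate over the file object itself, which yields one line at a time, and to strip the line break. The scratch file is an assumption that keeps the example self-contained.

```python
import os
import tempfile

# Scratch copy of the first three lines of myfile.txt (assumed location)
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, mode='w') as my_file:
    my_file.write('This is the first line\n'
                  'This is the second line\n'
                  'This is the third line\n')

# Iterate line by line and remove the trailing newline from each line
with open(path, mode='r') as my_file:
    lines = [line.rstrip('\n') for line in my_file]

# Equivalent result by reading everything and splitting on line breaks
with open(path, mode='r') as my_file:
    same_lines = my_file.read().splitlines()

print(lines)
# ['This is the first line', 'This is the second line', 'This is the third line']
```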
Write Files
By changing the mode in the open function, we can create files. Let’s create a new empty file called “newfile.txt”.
# the 'w' mode is for write
with open('newfile.txt', mode='w') as my_file:
    pass
write() method
We have created an empty file called “newfile.txt”. Let’s see how we can add content to a new file.
# the 'w' mode is for write
with open('newfile.txt', mode='w') as my_file:
    # add text
    my_file.write('I write the first line')
So the “newfile.txt” has the line “I write the first line”.
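Two details of write() are worth seeing in code: in text mode it returns the number of characters written, and it does not add a line break on its own. The sketch below writes to a scratch newfile.txt in a temporary directory, which is an assumption made so that no real file is touched.

```python
import os
import tempfile

# Scratch file (assumed location) standing in for newfile.txt
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

with open(path, mode='w') as my_file:
    # write() returns how many characters were written
    written = my_file.write('I write the first line')
    # no newline was added, so this text continues on the same line
    my_file.write(' and I continue on the same line\n')

with open(path, mode='r') as my_file:
    content = my_file.read()

print(written)        # 22
print(repr(content))  # 'I write the first line and I continue on the same line\n'
```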
writelines() method
We can write multiple lines at once using the writelines method and passing a list of strings. Note that writelines does not add line breaks, so each string must end with \n. For example:
# the 'w' mode is for write
with open('newfile.txt', mode='w') as my_file:
    # add text as a list and add the \n for the new lines
    my_file.writelines(['first line\n', 'second line\n', 'third line\n'])
Let’s see the content of the file:
cat newfile.txt
first line
second line
third line
Notice that every time we run the open function with the “w” mode, it overwrites the file.
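The overwrite behaviour can be verified with a short sketch. It also shows one way to add the line breaks when the strings do not already end with \n, since writelines accepts any iterable of strings. The scratch file and the items list are assumptions for illustration.

```python
import os
import tempfile

# Scratch file (assumed location) standing in for newfile.txt
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

# First run: write some content
with open(path, mode='w') as my_file:
    my_file.write('old content that will be lost\n')

# Second run with 'w': the previous content is discarded
items = ['first line', 'second line', 'third line']
with open(path, mode='w') as my_file:
    # append the line break to every item while writing
    my_file.writelines(item + '\n' for item in items)

with open(path, mode='r') as my_file:
    content = my_file.read()

print(repr(content))  # 'first line\nsecond line\nthird line\n'
```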
Append New Lines
We can append new lines using the a mode, which stands for append. For example, let’s add another three lines to our previous file.
# the 'a' mode is for append
with open('newfile.txt', mode='a') as my_file:
    # append new lines
    my_file.writelines(['forth line\n', 'fifth line\n', 'sixth line\n'])
Let’s see the content of the file.
cat newfile.txt
first line
second line
third line
forth line
fifth line
sixth line
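Since the a mode creates the file when it is missing and keeps existing content otherwise, it fits naturally into a small helper. The function name append_line and the file log_demo.txt below are hypothetical, chosen only for this sketch.

```python
import os
import tempfile


def append_line(path, text):
    """Append text as a single line; the file is created if it does not exist."""
    with open(path, mode='a') as my_file:
        my_file.write(text + '\n')


# Hypothetical scratch file used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'log_demo.txt')

append_line(path, 'first entry')    # the file does not exist yet, so it is created
append_line(path, 'second entry')   # existing content is kept

with open(path, mode='r') as my_file:
    lines = my_file.read().splitlines()

print(lines)  # ['first entry', 'second entry']
```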
Error Handling
It is common in Data Engineering pipelines to try to read files that, for some reason, do not exist. So, it is necessary to handle such errors with exceptions. For example, assume that we try to open a file that does not exist.
with open('nonexisting.txt', mode = 'r') as my_file:
    data = my_file.readline()
    print(data)
Output:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_21724\2298896326.py in <module>
----> 1 with open('nonexisting.txt', mode = 'r') as my_file:
2
3 # read the 10 first characters
4 data = my_file.readline()
5 print(data)
FileNotFoundError: [Errno 2] No such file or directory: 'nonexisting.txt'
As we can see, we got a “FileNotFoundError”. Let’s see how we can handle it with try-except.
try:
    with open('nonexisting.txt', mode = 'r') as my_file:
        # read the first line
        data = my_file.readline()
        print(data)
except FileNotFoundError as e:
    print('Error', e)
Output:
Error [Errno 2] No such file or directory: 'nonexisting.txt'
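In a pipeline, the same try-except pattern is often wrapped in a function that returns a fallback value instead of printing the error. The sketch below is one possible way to do it; the function name read_first_line and its default parameter are hypothetical and not part of any library.

```python
import os
import tempfile


def read_first_line(path, default=None):
    """Return the first line of the file without its line break,
    or the default value when the file does not exist."""
    try:
        with open(path, mode='r') as my_file:
            return my_file.readline().rstrip('\n')
    except FileNotFoundError:
        return default


# Hypothetical scratch directory used only for this demonstration
folder = tempfile.mkdtemp()
existing = os.path.join(folder, 'myfile.txt')
missing = os.path.join(folder, 'nonexisting.txt')

with open(existing, mode='w') as my_file:
    my_file.write('This is the first line\n')

found = read_first_line(existing)
fallback = read_first_line(missing, default='')

print(repr(found))     # 'This is the first line'
print(repr(fallback))  # ''
```

Catching only FileNotFoundError keeps other problems, such as missing permissions, visible instead of silently hiding them.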