Data scientists often work in Anaconda environments and install packages with “conda” commands. However, apart from conda there is “pip”, which remains the most popular Python package manager. Although the two package managers are similar, they are designed for different purposes and should be used accordingly. In this tutorial, we will show you some pip tips that you can apply to your daily tasks.
What is pip?
According to Wikipedia, pip is a package-management system written in Python used to install and manage software packages. It connects to an online repository of public packages, called the Python Package Index. pip can also be configured to connect to other package repositories, provided that they comply with Python Enhancement Proposal 503.
pip Tips
Install packages
Most of you probably know how to install packages using pip, which is done by running the command:
pip install some-package-name
If you would like to install a specific version you can run:
pip install 'some-package-name==1.2.2' --force-reinstall
Here, 1.2.2 is the version of the package. We can add the --force-reinstall flag when we want to re-install the package even if it is already installed. Moreover, you can give a range of versions like:
pip install 'some-package-name>=1.3.0,<1.4.0' --force-reinstall
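To make the range above concrete, here is a minimal Python sketch of what a specifier such as >=1.3.0,<1.4.0 accepts. It is a simplification for illustration only: pip follows the full PEP 440 rules (pre-releases, post-releases and so on), while the hypothetical helpers parse_version and in_range below handle plain dotted numeric versions only.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.3.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """Return True when lower <= version < upper (numeric versions only)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Mirrors the specifier 'some-package-name>=1.3.0,<1.4.0'
for candidate in ["1.2.9", "1.3.0", "1.3.7", "1.4.0"]:
    print(candidate, in_range(candidate, "1.3.0", "1.4.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.3.10" would sort before "1.3.9".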
Finally, you can install packages for a specific Python version. For example, to install a package for Python 3, we can run:
pip3 install some-package-name
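On machines with several interpreters, it is not always obvious which Python a given pip or pip3 executable belongs to. A common alternative is to run pip as a module of the interpreter you care about (python -m pip). The sketch below builds such a command from Python; pip_command and run_pip are hypothetical helper names, and nothing is installed unless you call run_pip yourself.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip with the given arguments; raises CalledProcessError on failure."""
    return subprocess.run(pip_command(*args), check=True)


# Show the command without executing it.
print(pip_command("install", "some-package-name"))
```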
Uninstall packages
We can easily remove a package by running:
pip uninstall some-package-name
Install packages from the requirements
We have explained how to create the requirements.txt file. Let’s assume that the requirements.txt is the file below:
pandas==1.2.5
numpy==1.21.1
We can install these libraries by running:
pip install -r requirements.txt
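If you ever need to inspect a requirements file from a script, a rough sketch like the one below may help. It assumes one requirement per line and only understands exact pins of the form name==version; comments are dropped and everything else (ranges, options, URLs, environment markers) is ignored. The function name parse_requirements is made up for this example.

```python
def parse_requirements(text):
    """Map package names to pinned versions from 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if sep:  # keep only exact pins
            pins[name.strip()] = version.strip()
    return pins


sample = """\
pandas==1.2.5
numpy==1.21.1
"""
print(parse_requirements(sample))
```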
Generate the requirements.txt file
Usually, we work with virtual environments and once we have installed the required libraries, we can easily generate the requirements.txt file using pip.
pip freeze > requirements.txt
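For a rough idea of what this command collects, the standard library module importlib.metadata (Python 3.8+) can list the distributions installed in the current environment. This is only an approximation of pip freeze, which also handles cases such as editable and URL-based installs; frozen_requirements is a hypothetical helper name.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' lines for the current environment."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


for line in frozen_requirements():
    print(line)
```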
Get the installed packages
Using pip, we can get a list of the installed packages in our environment by running:
pip list
You can search for a specific package by piping the list into the grep command. For example, to get the installed pandas version:
pip list | grep pandas
pandas 1.2.5
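The same lookup can be done from inside Python, which is handy where grep is not available. A minimal sketch using the standard library importlib.metadata module; installed_version is a hypothetical helper name, and the printed value depends on your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None when it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(installed_version("pandas"))
```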
Check for compatibility issues
When we install packages, it is common to run into compatibility issues between dependencies. We can check whether everything is OK by running:
pip check
If I run it in my base environment, I get the following:
streamlit 0.86.0 requires protobuf, which is not installed.
spyder 4.2.5 requires pyqt5, which is not installed.
spyder 4.2.5 requires pyqtwebengine, which is not installed.
qdarkstyle 2.8.1 requires helpdev, which is not installed.
conda-repo-cli 1.0.4 requires pathlib, which is not installed.
anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.
awswrangler 2.9.0 has requirement numpy<1.21.0,>=1.18.0, but you have numpy 1.21.1.
awswrangler 2.9.0 has requirement pyarrow<4.1.0,>=2.0.0, but you have pyarrow 5.0.0.
awscli 1.20.12 has requirement botocore==1.21.12, but you have botocore 1.20.112.
awscli 1.20.12 has requirement colorama<0.4.4,>=0.2.5, but you have colorama 0.4.4.
awscli 1.20.12 has requirement docutils<0.16,>=0.10, but you have docutils 0.17.1.
awscli 1.20.12 has requirement s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.4.2.
Apparently, I have some work to do!
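To illustrate the "requires X, which is not installed" half of such a report, here is a simplified sketch built on importlib.metadata. It is not how pip check is implemented: it only looks for missing packages, does not compare versions, and skips any requirement that carries an environment marker (the part after a semicolon). The helper names are made up for the example; the name normalization follows PEP 503.

```python
import re
from importlib.metadata import distributions


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a string like 'numpy (>=1.18.0)'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else None


def missing_dependencies(requirements, installed_names):
    """List required names absent from installed_names (normalized names)."""
    missing = []
    for requirement in requirements:
        if ";" in requirement:  # skip conditional requirements (markers, extras)
            continue
        name = requirement_name(requirement)
        if name and name not in installed_names:
            missing.append(name)
    return missing


# Index the installed distributions by normalized name, then report gaps.
installed = {}
for dist in distributions():
    dist_name = dist.metadata.get("Name")
    if dist_name:
        installed[normalize(dist_name)] = dist
for name, dist in sorted(installed.items()):
    for dep in missing_dependencies(dist.requires or [], installed):
        print(f"{name} requires {dep}, which is not installed.")
```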
Show more info about packages
We can get more information about an installed package by running:
pip show some-package-name
For example, this is what I get for pandas.
Name: pandas
Version: 1.2.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email:
License: BSD
Location: c:\users\gpipis\anaconda3\lib\site-packages
Requires: pytz, numpy, python-dateutil
Required-by: streamlit, statsmodels, seaborn, mlxtend, awswrangler, altair
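Some of these fields can also be read programmatically. The sketch below uses importlib.metadata to fetch a few of them for an installed package; show is a hypothetical helper, it returns only a subset of what pip show prints, and the Requires entries come back as raw requirement strings rather than bare names.

```python
from importlib.metadata import PackageNotFoundError, distribution


def show(package_name):
    """Return a few metadata fields for an installed package, or None."""
    try:
        dist = distribution(package_name)
    except PackageNotFoundError:
        return None
    meta = dist.metadata
    return {
        "Name": meta.get("Name"),
        "Version": meta.get("Version"),
        "Summary": meta.get("Summary"),
        "Requires": dist.requires or [],
    }


info = show("pandas")
print(info if info is not None else "pandas is not installed")
```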
The Takeaway
Data scientists and data engineers work with Python on a daily basis, so a basic knowledge of “pip” is really useful for their work. This was an introduction to pip. I encourage you to dive deeper into pip and unlock its power, and feel free to share your tips with our community.