Predictive Hacks

# Python pip Tips for Data Scientists

Data scientists typically work with Anaconda environments and install packages with “conda” commands. However, apart from conda there is “pip”, which remains the most popular package manager for Python. Although the two are similar, they are designed for different purposes and should be used accordingly. In this tutorial, we will show you some pip tips that you can apply to your daily tasks.

## What is pip?

According to Wikipedia, pip is a package-management system written in Python and used to install and manage software packages. It connects to an online repository of public packages called the Python Package Index (PyPI). pip can also be configured to connect to other package repositories, provided that they comply with Python Enhancement Proposal (PEP) 503.
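
Besides the repository layout, PEP 503 also describes how project names are normalized, which is why pip treats names such as `Scikit_Learn` and `scikit-learn` as the same project. The snippet below is a minimal sketch of that rule; the function name `normalize_name` is our own choice for illustration.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name following the rule given in PEP 503."""
    # Runs of '-', '_' and '.' collapse into a single '-',
    # and the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


for raw in ["Scikit_Learn", "python.dateutil", "ruamel-yaml"]:
    print(raw, "->", normalize_name(raw))
```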

## pip Tips

### Install packages

Most of you probably know how to install a package with pip, which is as simple as running:

pip install some-package-name


If you would like to install a specific version you can run:

pip install 'some-package-name==1.2.2' --force-reinstall


Here 1.2.2 is the version of the package. The --force-reinstall flag reinstalls the package even if it is already installed. You can also give a range of versions:

pip install 'some-package-name>=1.3.0,<1.4.0' --force-reinstall
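
To make the meaning of a range like `>=1.3.0,<1.4.0` concrete, here is a rough sketch of the comparison in Python. It is deliberately simplified and is not how pip itself evaluates specifiers: it assumes plain dotted integer versions with the same number of components, and the helpers `parse_version` and `in_range` are hypothetical names used only for this illustration.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.3.5' into (1, 3, 5). Simplification: dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper, mirroring '>=lower,<upper'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The lower bound is included, the upper bound is excluded.
for candidate in ["1.2.9", "1.3.0", "1.3.7", "1.4.0"]:
    print(candidate, in_range(candidate, "1.3.0", "1.4.0"))
```

In other words, the range accepts any 1.3.x release and rejects 1.4.0 and later.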


Finally, you can install packages for a specific Python version. For example, to install a package for Python 3, we can run:

pip3 install some-package-name
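
Which interpreter `pip3` points to depends on your PATH. A common way to be explicit is to run pip as a module of the interpreter you want, that is, `python -m pip`. The sketch below builds such a command from Python itself; the helper name `pip_install_command` is an assumption made for this example, and the command is only printed, not executed.

```python
import sys


def pip_install_command(package: str) -> list:
    """Build a pip install command bound to the current interpreter."""
    # sys.executable is the path of the interpreter running this script,
    # so "-m pip" targets that interpreter's environment.
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("some-package-name")
print(" ".join(cmd))
# To actually run it, pass cmd to subprocess.run(cmd, check=True).
```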


### Uninstall packages

We can easily remove a package by running:

pip uninstall some-package-name


### Install packages from the requirements

A requirements.txt file lists the packages that a project depends on, typically with pinned versions. Let’s assume that requirements.txt is the file below:

pandas==1.2.5
numpy==1.21.1


We can install these libraries by running:

pip install -r requirements.txt
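
If you ever need to inspect the pins programmatically, a file like the one above is easy to read. The following is a minimal sketch that handles only exact `name==version` lines and simple `#` comments; real requirements files support more syntax, and `parse_pins` is a hypothetical helper name.

```python
def parse_pins(text: str) -> dict:
    """Map package name -> pinned version for lines like 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


requirements = """\
pandas==1.2.5
numpy==1.21.1
"""
print(parse_pins(requirements))
# {'pandas': '1.2.5', 'numpy': '1.21.1'}
```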


### Generate the requirements.txt file

We usually work in virtual environments. Once we have installed the required libraries, we can generate the requirements.txt file with pip:

pip freeze > requirements.txt
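
For intuition, a similar `name==version` listing can be produced from inside Python with the standard library module `importlib.metadata` (Python 3.8+). This is only an approximation of what `pip freeze` prints, not a replacement for it, and `freeze_lines` is a name chosen for this sketch.

```python
from importlib.metadata import distributions


def freeze_lines() -> list:
    """Return 'name==version' lines for distributions visible to this interpreter."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


print("\n".join(freeze_lines()))
```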


### Get the installed packages

Using pip, we can get a list of the installed packages in our environment by running:

pip list


You can search for a specific package by piping the list into the grep command (on Windows, findstr plays the same role). For example, let’s get the installed pandas version:

pip list | grep pandas

pandas                             1.2.5
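
The same lookup can be done from a script without parsing the list output. The sketch below uses `importlib.metadata.version` from the standard library (Python 3.8+); the wrapper `installed_version` is a hypothetical helper that returns `None` when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if pandas is installed, otherwise None.
print(installed_version("pandas"))
```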


### Check for compatibility issues

When we install packages, version conflicts and missing dependencies are common. We can check whether the installed packages have compatible dependencies by running:

pip check


If I run it in my base environment, I get the following:

streamlit 0.86.0 requires protobuf, which is not installed.
spyder 4.2.5 requires pyqt5, which is not installed.
spyder 4.2.5 requires pyqtwebengine, which is not installed.
qdarkstyle 2.8.1 requires helpdev, which is not installed.
conda-repo-cli 1.0.4 requires pathlib, which is not installed.
anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.
awswrangler 2.9.0 has requirement numpy<1.21.0,>=1.18.0, but you have numpy 1.21.1.
awswrangler 2.9.0 has requirement pyarrow<4.1.0,>=2.0.0, but you have pyarrow 5.0.0.
awscli 1.20.12 has requirement botocore==1.21.12, but you have botocore 1.20.112.
awscli 1.20.12 has requirement colorama<0.4.4,>=0.2.5, but you have colorama 0.4.4.
awscli 1.20.12 has requirement docutils<0.16,>=0.10, but you have docutils 0.17.1.
awscli 1.20.12 has requirement s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.4.2.


Apparently, I have some work to do!
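
With a long report like this, it can help to separate missing dependencies from version conflicts. The sketch below classifies lines with two regular expressions that are modeled on the sample output above; the exact wording may differ between pip versions, so treat the patterns and the `classify` helper as assumptions.

```python
import re

# Patterns modeled on the sample "pip check" output shown in this post.
MISSING = re.compile(r"^(\S+) (\S+) requires (\S+), which is not installed\.$")
CONFLICT = re.compile(r"^(\S+) (\S+) has requirement (.+), but you have (\S+) (\S+)\.$")


def classify(line: str):
    """Return (kind, package, requirement) for a report line, or None."""
    match = MISSING.match(line)
    if match:
        return ("missing", match.group(1), match.group(3))
    match = CONFLICT.match(line)
    if match:
        return ("conflict", match.group(1), match.group(3))
    return None


report = """\
streamlit 0.86.0 requires protobuf, which is not installed.
awswrangler 2.9.0 has requirement numpy<1.21.0,>=1.18.0, but you have numpy 1.21.1.
awscli 1.20.12 has requirement botocore==1.21.12, but you have botocore 1.20.112.
"""
for line in report.splitlines():
    print(classify(line))
```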

### Get information about a package

We can get more information about an installed package by running:

pip show some-package-name


For example, this is what I get for pandas.

Name: pandas
Version: 1.2.5
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email:
Location: c:\users\gpipis\anaconda3\lib\site-packages
Requires: pytz, numpy, python-dateutil
Required-by: streamlit, statsmodels, seaborn, mlxtend, awswrangler, altair
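
Since this output is a list of `Key: value` lines, it is straightforward to turn into a dictionary, for example to read the dependencies of a package in a script. The following is a minimal sketch that uses a shortened copy of the output above; `parse_show` is a hypothetical helper name.

```python
def parse_show(text: str) -> dict:
    """Parse 'Key: value' lines, as printed by pip show, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


sample = """\
Name: pandas
Version: 1.2.5
Home-page: https://pandas.pydata.org
Requires: pytz, numpy, python-dateutil
"""
info = parse_show(sample)
requires = [item.strip() for item in info["Requires"].split(",") if item.strip()]
print(info["Version"], requires)
# 1.2.5 ['pytz', 'numpy', 'python-dateutil']
```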


## The Takeaway

Data scientists and data engineers work with Python on a daily basis, so a basic knowledge of “pip” is really useful for their work. This was an introduction to pip. I encourage you to dive deeper into it and unlock its power, and feel free to share your tips with our community.
