When working with data, especially relatively large datasets, it is often important to keep an eye on the data's memory consumption and at times take action to optimize memory use, such as clearing unnecessary data frames from memory, changing the data types of certain columns, or working with a smaller subset of the data.
On this page, we look at how to get information about the memory consumption of our datasets.
We wish to know how much memory a particular data frame consumes.
df.info(memory_usage="deep")
Here is how this works:
df.info() reports the memory consumed by a data frame. By default, however, it does not account for the contents of columns of type object, which includes string columns. To account for those, we set memory_usage="deep".
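To make this concrete, here is a minimal sketch (the data frame and its column names are made up for illustration) comparing the default report with the deep one:

import pandas as pd

# A hypothetical data frame with a string (object) column and a numeric column.
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"] * 1000,
    "value": range(3000),
})

# Default report: memory for object columns is only estimated.
df.info()

# Deep report: the contents of object columns are measured, so the string data is counted.
df.info(memory_usage="deep")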
We wish to know how much memory each column of a data frame consumes.
df.memory_usage()
Here is how this works:
Pandas data frames have a memory_usage() method that returns the memory used by each column of the data frame as a Series whose Index is the column names.
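As an illustrative sketch (the data frame and the column names col_1 and col_2 are hypothetical), note that memory_usage() also accepts deep=True to measure the contents of object columns:

import pandas as pd

# Hypothetical data frame; contents are illustrative.
df = pd.DataFrame({
    "col_1": range(1000),
    "col_2": ["text"] * 1000,
})

# Series of bytes consumed per column (the data frame's Index gets its own entry).
print(df.memory_usage())

# With deep=True, the actual contents of object (string) columns are measured.
print(df.memory_usage(deep=True))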
We wish to know how much memory a particular column of a data frame consumes.
df.memory_usage()['col_1']
Here is how this works:
We call the memory_usage() method of Pandas data frames as above, which returns a Series indexed by column name, and then look up the entry for the column of interest, here 'col_1'.
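For illustration, a minimal self-contained sketch with a hypothetical data frame and the placeholder column name 'col_1':

import pandas as pd

# Placeholder data frame; 'col_1' stands in for the column of interest.
df = pd.DataFrame({"col_1": range(1000), "col_2": ["x"] * 1000})

cols_mem = df.memory_usage()   # Series of bytes per column
print(cols_mem['col_1'])       # bytes used by just 'col_1'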
We wish to know how much memory our system has in total, how much is used, and how much is free.
import psutil
psutil.virtual_memory()
Here is how this works:
We use the virtual_memory() function from the psutil Python package. The virtual_memory() function returns a useful summary of the current state of the system's memory. Typically, we look at available, which is the memory currently free for processes to use, and percent, which is the percentage of total memory that is in use.
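As a rough illustration (the conversion to GiB and the printed labels are our own additions), here is how the returned values might be inspected:

import psutil

# Snapshot of system memory; all fields are in bytes except percent.
mem = psutil.virtual_memory()

gib = 1024 ** 3
print(f"total:     {mem.total / gib:.2f} GiB")
print(f"available: {mem.available / gib:.2f} GiB")
print(f"used:      {mem.used / gib:.2f} GiB")
print(f"percent:   {mem.percent}%  (share of total memory in use)")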