Memory Use

When working with data, especially relatively large datasets, it is often important to keep an eye on memory consumption and, at times, to take action to reduce it, such as clearing unnecessary data frames from memory, changing the data types of certain columns, or working with a smaller subset of the data.

On this page, we look at how to get information about the memory consumption of datasets.

Data Frame

We wish to know how much memory a particular data frame consumes.

df.info(memory_usage="deep")

Here is how this works:

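  • df.info() prints a summary of the data frame; the last line of that summary reports the memory it consumes.
  • By default, however, df.info() only estimates the memory of columns of type object, which include string columns, because it counts the references to the values rather than the values themselves. Setting memory_usage="deep" makes Pandas inspect the actual values and report the memory they really consume, at the cost of some extra computation.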
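
To make the difference concrete, here is a minimal sketch that compares the default and the deep measurement. The data frame and its column names are made up for illustration, and the exact byte counts depend on the Pandas version and the string data type it uses, so we do not show expected numbers.

```python
import io

import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "col_1": range(1000),
    "col_2": ["some text value"] * 1000,
})

# df.info() prints its report rather than returning it, so we capture the
# output in a buffer in order to hold it as a string.
buffer = io.StringIO()
df.info(memory_usage="deep", buf=buffer)
report = buffer.getvalue()
print(report)

# The same totals as plain numbers of bytes, without and with deep inspection.
shallow_bytes = df.memory_usage().sum()
deep_bytes = df.memory_usage(deep=True).sum()
print(shallow_bytes, deep_bytes)
```

When the string column is stored with the object data type, the deep figure is typically much larger than the shallow one.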

Each Column

We wish to know how much memory each column of a data frame consumes.

df.memory_usage()

Here is how this works:

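Pandas data frames have a memory_usage() method that returns the memory used by each column, in bytes, as a Series whose index holds the column names. By default the result also includes an entry for the data frame's own index (labeled Index), and for columns of type object it counts only the references to the values unless we pass deep=True.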
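
As a small sketch of the options just described, the following uses the index and deep parameters of memory_usage(). The data frame is a made-up example.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": ["a", "bb", "ccc"]})

# Per-column memory in bytes, including an entry for the data frame's index.
per_column = df.memory_usage(deep=True)

# The same, but restricted to the columns themselves.
columns_only = df.memory_usage(index=False, deep=True)

print(per_column)
print(list(per_column.index))    # ['Index', 'col_1', 'col_2']
print(list(columns_only.index))  # ['col_1', 'col_2']
```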

Particular Column

We wish to know how much memory a particular column of a data frame consumes.

df.memory_usage()['col_1']

Here is how this works:

  • We use the memory_usage() method of Pandas data frames as above.
  • To check the memory consumed by a particular column, we index the resulting Series by the column name, which in this example is 'col_1'.
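
An alternative is to ask the column itself, since a Series also has a memory_usage() method. The sketch below, on a made-up data frame, shows that both routes give the same number once the Series' own index is excluded.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"col_1": ["a", "bb", "ccc"], "col_2": [1.0, 2.0, 3.0]})

# Option 1: index into the per-column result of the data frame.
via_frame = df.memory_usage(deep=True)["col_1"]

# Option 2: ask the column (a Series) directly, leaving out its index.
via_series = df["col_1"].memory_usage(index=False, deep=True)

print(via_frame, via_series)

# A float64 column uses 8 bytes per value, so three values take 24 bytes.
print(df.memory_usage()["col_2"])  # 24
```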

Available Memory

We wish to know how much memory our system has in total, how much is used, and how much is free.

import psutil
psutil.virtual_memory()

Here is how this works:

  • We can use the virtual_memory() function from the psutil Python package.
  • The virtual_memory() function returns a summary of the current state of the system’s memory, with sizes given in bytes. Typically, we look at available, which is the memory that can be given to processes right away without swapping, and percent, which is the percentage of total memory in use, calculated as (total - available) / total * 100.
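
The fields can be read as attributes of the returned object. Here is a minimal sketch that prints them in more readable units; the gib helper variable is our own, and it assumes psutil is installed.

```python
import psutil

mem = psutil.virtual_memory()

# Sizes are reported in bytes; convert to GiB for readability.
gib = 1024 ** 3
print(f"total:     {mem.total / gib:.2f} GiB")
print(f"available: {mem.available / gib:.2f} GiB")
print(f"in use:    {mem.percent:.1f} %")
```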