Two useful ways to use pandas.melt(): Splitting into long format and unpivoting

Are you familiar with the pandas.melt() function in Python? If not, you're in for a treat! pandas.melt() reshapes a DataFrame from wide to long format: it keeps the identifier columns you choose and stacks the remaining columns into rows. In this blog post, we'll explore two particularly useful ways to use it: turning a column of comma-separated values into long format (with a little help from str.split()), and unpivoting a dataframe.

Splitting a column with multiple values separated by a comma into long format

Imagine you have a DataFrame with a column that contains multiple values separated by a comma. For example:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'feature_1': [100, 200, 300], 'feature_2': [5, 6, 7], 'feature_3': ['value_a, value_b', 'value_c, value_d', 'value_e, value_f']})
df

This results in a DataFrame that looks like this:

   id  feature_1  feature_2         feature_3
0   1        100          5  value_a, value_b
1   2        200          6  value_c, value_d
2   3        300          7  value_e, value_f

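pandas.melt() does not split strings on its own: melting feature_3 directly would just return the same three comma-separated strings. The trick is to first split feature_3 into one column per value with str.split(expand=True), and then use melt() to turn those columns into rows. Here's how:

split_cols = df['feature_3'].str.split(', ', expand=True).add_prefix('feature_3_')
df_melted = df.drop(columns='feature_3').join(split_cols).melt(id_vars=['id', 'feature_1', 'feature_2'], var_name='source', value_name='feature_3_values')
df_melted

This results in a new DataFrame df_melted that looks like this:

   id  feature_1  feature_2       source feature_3_values
0   1        100          5  feature_3_0          value_a
1   2        200          6  feature_3_0          value_c
2   3        300          7  feature_3_0          value_e
3   1        100          5  feature_3_1          value_b
4   2        200          6  feature_3_1          value_d
5   3        300          7  feature_3_1          value_f

The source column records the position each value had in the original string; drop it if you don't need it.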
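
As a cross-check, here is a minimal sketch of the same reshaping done with Series.str.split() and DataFrame.explode() instead of melt(). This is an alternative technique, not part of melt() itself. It assumes the separator is always a comma followed by a space, and the names df_exploded and feature_3_values are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'feature_1': [100, 200, 300],
    'feature_2': [5, 6, 7],
    'feature_3': ['value_a, value_b', 'value_c, value_d', 'value_e, value_f'],
})

# Turn each comma-separated string into a list, then give every list item its own row.
df_exploded = (
    df.assign(feature_3_values=df['feature_3'].str.split(', '))
      .drop(columns='feature_3')
      .explode('feature_3_values')
      .reset_index(drop=True)  # explode() repeats the original index labels
)
print(df_exploded)
```

Unlike the melted result, the rows here stay grouped by id, and the approach also copes with rows that hold a different number of values.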

Unpivoting a dataframe

Another useful way to use pandas.melt() is to "unpivot" a dataframe. Suppose we have a dataframe that looks like this:

df = pd.DataFrame({'id': [1, 2, 3], 'feature_1': [100, 200, 300], 'feature_2': [5, 6, 7], 'value_1': [10, 40, 70], 'value_2': [20, 50, 80], 'value_3': [30, 60, 90]})
df

This results in a DataFrame that looks like this:

   id  feature_1  feature_2  value_1  value_2  value_3
0   1        100          5       10       20       30
1   2        200          6       40       50       60
2   3        300          7       70       80       90

We can use pandas.melt() to transform this dataframe into a "long" format, where the columns value_1, value_2, and value_3 are turned into a single column values with multiple rows. Here's how:

df_melted = df.melt(id_vars=['id', 'feature_1', 'feature_2'], value_vars=['value_1', 'value_2', 'value_3'], var_name='feature', value_name='values')
df_melted

This results in a new DataFrame df_melted that looks like this:

   id  feature_1  feature_2  feature  values
0   1        100          5  value_1      10
1   2        200          6  value_1      40
2   3        300          7  value_1      70
3   1        100          5  value_2      20
4   2        200          6  value_2      50
5   3        300          7  value_2      80
6   1        100          5  value_3      30
7   2        200          6  value_3      60
8   3        300          7  value_3      90

As you can see, pandas.melt() has transformed the dataframe into a long format: the 3 original rows times 3 value columns become 9 rows, with a single column values holding everything that used to live in value_1, value_2, and value_3. Long format is handy when you want to group, filter, or plot by the feature name, because that name is now an ordinary column.
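
To check that unpivoting loses no information, the sketch below melts the same data and then pivots it back to wide format. It is an illustration under a few assumptions: the names id_cols, df_long, and df_wide are made up for this example, value_vars is left out (melt() then uses every column not listed in id_vars), and passing a list to the index argument of pivot() requires pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'feature_1': [100, 200, 300],
    'feature_2': [5, 6, 7],
    'value_1': [10, 40, 70],
    'value_2': [20, 50, 80],
    'value_3': [30, 60, 90],
})
id_cols = ['id', 'feature_1', 'feature_2']

# Wide -> long: every non-id column is stacked into the 'values' column.
df_long = df.melt(id_vars=id_cols, var_name='feature', value_name='values')

# Long -> wide: spread the 'feature' labels back out into separate columns.
df_wide = df_long.pivot(index=id_cols, columns='feature', values='values').reset_index()
df_wide.columns.name = None  # remove the leftover 'feature' label on the column axis

print(df_wide)
```

If the round trip reproduces the original columns and values, the melt was lossless for this data.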

Conclusion

In conclusion, pandas.melt() is a powerful tool for reshaping a DataFrame from wide to long format. We've explored two particularly useful ways to use it: turning a column of comma-separated values into long format, and unpivoting a dataframe. Keep in mind that melt() only stacks columns into rows, so multi-valued strings have to be split into separate columns first. Whether you're working with multi-valued columns or wide-format data, pandas.melt() can help you get your data into the shape you need for your analysis.