Two useful ways to use pandas.melt(): Splitting into long format and unpivoting

Are you familiar with the pandas.melt() function in Python? If not, you're in for a treat! pandas.melt() is a powerful tool for reshaping and restructuring data in a DataFrame. In this blog post, we'll explore two particularly useful ways to use pandas.melt(): splitting a column with multiple comma-separated values into long format, and unpivoting a DataFrame.
Splitting a column with multiple values separated by a comma into long format
Imagine you have a DataFrame with a column that contains multiple values separated by a comma. For example:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'feature_1': [100, 200, 300], 'feature_2': [5, 6, 7], 'feature_3': ['value_a, value_b', 'value_c, value_d', 'value_e, value_f']})
df
This results in a DataFrame that looks like this:
   id  feature_1  feature_2         feature_3
0   1        100          5  value_a, value_b
1   2        200          6  value_c, value_d
2   3        300          7  value_e, value_f
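On its own, pandas.melt() does not split strings: it only stacks columns on top of each other. To get one value per row, we first split feature_3 into separate columns with str.split(expand=True) and then melt those columns. The name 'position' below is just a label chosen here for the column that records which split column each value came from. Here's how:
parts = df['feature_3'].str.split(', ', expand=True)  # one column per value, labelled 0 and 1
df_melted = df.drop(columns='feature_3').join(parts).melt(id_vars=['id', 'feature_1', 'feature_2'], var_name='position', value_name='feature_3_values')
df_melted
This results in a new DataFrame df_melted that looks like this:
   id  feature_1  feature_2  position  feature_3_values
0   1        100          5         0           value_a
1   2        200          6         0           value_c
2   3        300          7         0           value_e
3   1        100          5         1           value_b
4   2        200          6         1           value_d
5   3        300          7         1           value_f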
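For comparison, here is a minimal sketch of a related approach that does not use melt at all: str.split() followed by DataFrame.explode(). It is not part of the melt-based recipe, and the name df_exploded is only illustrative. It can be handy when the number of comma-separated values differs from row to row.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'feature_1': [100, 200, 300],
    'feature_2': [5, 6, 7],
    'feature_3': ['value_a, value_b', 'value_c, value_d', 'value_e, value_f'],
})

# Turn each comma-separated string into a list, then give every
# list element its own row. The other columns are repeated as needed.
df_exploded = (
    df.assign(feature_3=df['feature_3'].str.split(', '))
      .explode('feature_3')
      .reset_index(drop=True)
)
print(df_exploded)
```

Unlike melt, explode keeps the values that came from the same original row next to each other, so the rows come out grouped by id rather than by position.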
Unpivoting a dataframe
Another useful way to use pandas.melt() is to "unpivot" a DataFrame. Suppose we have a DataFrame that looks like this:
df = pd.DataFrame({'id': [1, 2, 3], 'feature_1': [100, 200, 300], 'feature_2': [5, 6, 7], 'value_1': [10, 40, 70], 'value_2': [20, 50, 80], 'value_3': [30, 60, 90]})
df
This results in a DataFrame that looks like this:
   id  feature_1  feature_2  value_1  value_2  value_3
0   1        100          5       10       20       30
1   2        200          6       40       50       60
2   3        300          7       70       80       90
We can use pandas.melt() to transform this DataFrame into a "long" format, where the columns value_1, value_2, and value_3 are stacked into a single column named values, with one row per original cell. Here's how:
df_melted = df.melt(id_vars=['id', 'feature_1', 'feature_2'], value_vars=['value_1', 'value_2', 'value_3'], var_name='feature', value_name='values')
df_melted
This results in a new DataFrame df_melted that looks like this:
   id  feature_1  feature_2  feature  values
0   1        100          5  value_1      10
1   2        200          6  value_1      40
2   3        300          7  value_1      70
3   1        100          5  value_2      20
4   2        200          6  value_2      50
5   3        300          7  value_2      80
6   1        100          5  value_3      30
7   2        200          6  value_3      60
8   3        300          7  value_3      90
As you can see, pandas.melt() has transformed the DataFrame into a long format, with a single column values containing all the values from the original columns value_1, value_2, and value_3, and a feature column recording which original column each value came from. Long format is convenient whenever you want to group, filter, or plot by that former column name, because it is now ordinary data rather than a column header.
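To make the benefit of the long format concrete, here is a small sketch that aggregates the melted values per id and then pivots the result back to the wide layout. The names df_long, per_id, and df_wide are illustrative, and passing a list to the index argument of pivot() assumes pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'feature_1': [100, 200, 300],
    'feature_2': [5, 6, 7],
    'value_1': [10, 40, 70],
    'value_2': [20, 50, 80],
    'value_3': [30, 60, 90],
})

# Unpivot: one row per (id, original value_* column) pair.
df_long = df.melt(
    id_vars=['id', 'feature_1', 'feature_2'],
    value_vars=['value_1', 'value_2', 'value_3'],
    var_name='feature',
    value_name='values',
)

# With all values in one column, a single groupby covers every
# former value_* column at once.
per_id = df_long.groupby('id')['values'].agg(['mean', 'max'])
print(per_id)

# pivot() reverses the melt and restores the wide layout.
df_wide = (
    df_long.pivot(index=['id', 'feature_1', 'feature_2'],
                  columns='feature', values='values')
           .reset_index()
)
print(df_wide)
```

The same groupby on the wide DataFrame would need each value_* column to be listed by hand, which is the kind of bookkeeping that melting avoids.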
Conclusion
In conclusion, pandas.melt() is a powerful tool for reshaping and restructuring data in a DataFrame. We've explored two particularly useful ways to use pandas.melt(): splitting a column with multiple comma-separated values into long format, and unpivoting a DataFrame. Whether you're working with multi-valued columns or wide-format data, pandas.melt() can help you transform your data into the shape and format you need for your analysis or manipulation.