BUG: Raise ValueError when var_name collides with id_vars in melt()#66036

Closed
C1-BA-B1-F3 wants to merge 1 commit into
pandas-dev:mainfrom
C1-BA-B1-F3:fix-melt-var-name-collision

@C1-BA-B1-F3

Description

Fixes #65654

DataFrame.melt silently corrupts data when var_name matches an id_vars column. The original id_vars data gets overwritten with the variable labels.

Before (silent corruption):

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [100, 200]})
out = df.melt(id_vars="id", var_name="id")
#   id id  value
# 0  a  a     10
# 1  a  a     20
# 2  b  b    100
# 3  b  b    200

After (ValueError raised):

ValueError: var_name ({'id'}) cannot match an element in id_vars. This would cause the id_vars data to be silently overwritten.

Changes

  • pandas/core/reshape/melt.py: Added a validation check, run after var_name is resolved, that detects collisions with id_vars and raises a ValueError whose message explains that the id_vars data would otherwise be overwritten.
  • pandas/tests/reshape/test_melt.py: Added two test cases:
    • test_raise_of_var_name_in_id_vars: Tests scalar id_vars collision
    • test_raise_of_var_name_in_id_vars_list: Tests list-like id_vars collision
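
For reviewers who want to see the shape of the check without opening the diff, here is a minimal standalone sketch of the validation described above. It is not the code from this PR: the helper names `_as_list` and `check_var_name_collision` are hypothetical, the scalar/list-like normalization is an assumption about how id_vars and var_name are handled, and only the error message is taken from the description.

```python
from collections.abc import Iterable


def _as_list(obj):
    """Hypothetical helper: wrap a scalar label in a list, copy list-likes.

    Assumption: strings, bytes and tuples count as single labels.
    """
    if obj is None:
        return []
    if isinstance(obj, (str, bytes, tuple)) or not isinstance(obj, Iterable):
        return [obj]
    return list(obj)


def check_var_name_collision(id_vars, var_name):
    """Raise ValueError if any var_name label is also an id_vars label.

    Sketch only; the real check lives in pandas/core/reshape/melt.py
    and may differ in detail.
    """
    collisions = set(_as_list(var_name)) & set(_as_list(id_vars))
    if collisions:
        raise ValueError(
            f"var_name ({collisions}) cannot match an element in id_vars. "
            "This would cause the id_vars data to be silently overwritten."
        )


# Scalar and list-like id_vars without a collision pass silently.
check_var_name_collision("id", "variable")
check_var_name_collision(["id", "group"], "variable")

# A collision is reported before any data would be touched.
try:
    check_var_name_collision(["id", "group"], "id")
except ValueError as err:
    print(err)
```

Computing the intersection as a set keeps the scalar and list-like cases on one code path, which mirrors the two test cases listed above.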

Testing

import pandas as pd

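# Test 1: Collision detected
df = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [100, 200]})
try:
    pd.melt(df, id_vars="id", var_name="id")
except ValueError as e:
    print(f"PASS: {e}")
else:
    # Without the fix, melt returns silently and the collision goes unnoticed
    print("FAIL: no ValueError raised")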

# Test 2: No collision works fine
df = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [100, 200]})
result = pd.melt(df, id_vars="id", var_name="variable")
print(result)

This fix is consistent with the existing value_name collision check (lines 179-183 in the original code).

Fixes GH#65654 - DataFrame.melt silently corrupts data when var_name
matches an id_vars column. The fix adds validation to raise a ValueError
before the data is overwritten, consistent with the existing value_name
collision check.

Tests added for both scalar and list-like id_vars cases.
@jbrockmendel

Member

closed by #65655

Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.melt silently corrupts data on var_name output-name collisions
