BUG: Fix Memory Leak in read_csv #66033

Merged
jbrockmendel merged 2 commits into pandas-dev:main from Alvaro-Kothe:fix/leak-csv on Jun 26, 2026

@Alvaro-Kothe

Member

I used the reproduction from #19941 (comment) as a starting point. After fixing that leak, I could no longer reproduce the original issue with the ThreadPool.


Reproduction

import io
from pandas import read_csv

s = """item_id,user_id,region,city,parent_category_name,category_name,param_1,param_2,param_3,title,description,price,item_seq_number,activation_date,user_type,image,image_top_1,deal_probability
ba83aefab5dc,91e2f88dd6e3,Ростовская область,Ростов-на-Дону,Бытовая электроника,Аудио и видео,"Видео, DVD и Blu-ray плееры",,,Philips bluray,"В хорошем состоянии, домашний кинотеатр с blu ray, USB. Если настроить, то работает смарт тв Торг",4000.0,9,2017-03-20,Private,b7f250ee3f39e1fedd77c141f273703f4a9be59db4b48a8713f112c67e29bb42,3032.0,0.43177"""

buf = io.StringIO(s)

read_csv(buf)

LSan report:

Direct leak of 39 byte(s) in 1 object(s) allocated from:
    #0 0x0000004a9478 in malloc (/home/alvaro/opt/python-asan/bin/python3.14+0x4a9478) (BuildId: d34f3059a20abaa7865a706a3f4007981ffba2c9)
    #1 0x0000005d2e72 in _PyBytes_FromSize /home/alvaro/opt/Python-3.14.6/Objects/bytesobject.c:125:31
    #2 0x0000005d2e72 in PyBytes_FromStringAndSize /home/alvaro/opt/Python-3.14.6/Objects/bytesobject.c:155:27
    #3 0x0000007acdb8 in unicode_encode_utf8 /home/alvaro/opt/Python-3.14.6/Objects/unicodeobject.c:5865:16
    #4 0x0000007ad565 in _PyUnicode_AsUTF8String /home/alvaro/opt/Python-3.14.6/Objects/unicodeobject.c:5954:12
    #5 0x0000007ad565 in PyUnicode_AsEncodedString /home/alvaro/opt/Python-3.14.6/Objects/unicodeobject.c:3959:24
    #6 0x000000807aae in unicode_encode_impl /home/alvaro/opt/Python-3.14.6/Objects/unicodeobject.c:11867:12
    #7 0x000000807aae in unicode_encode /home/alvaro/opt/Python-3.14.6/Objects/clinic/unicodeobject.c.h:293:20
    #8 0x0000005f642b in _PyObject_VectorcallTstate /home/alvaro/opt/Python-3.14.6/./Include/internal/pycore_call.h:177:11
    #9 0x0000005f642b in PyObject_VectorcallMethod /home/alvaro/opt/Python-3.14.6/Objects/call.c:918:18
    #10 0x7b6b428ebd45 in __pyx_pf_6pandas_5_libs_7parsers_10TextReader___cinit__ /home/alvaro/projects/oss/pandas/build/clang-asan/pandas/_libs/parsers.cpython-314-x86_64-linux-gnu.so.p/pandas/_libs/parsers.pyx.c:7861:19
    #11 0x7b6b428e7ea2 in __pyx_pw_6pandas_5_libs_7parsers_10TextReader_1__cinit__ /home/alvaro/projects/oss/pandas/build/clang-asan/pandas/_libs/parsers.cpython-314-x86_64-linux-gnu.so.p/pandas/_libs/parsers.pyx.c:7785:13
    #12 0x7b6b428be7c8 in __pyx_tp_new_6pandas_5_libs_7parsers_TextReader /home/alvaro/projects/oss/pandas/build/clang-asan/pandas/_libs/parsers.cpython-314-x86_64-linux-gnu.so.p/pandas/_libs/parsers.pyx.c:31489:16
    #13 0x00000075ccfe in type_call /home/alvaro/opt/Python-3.14.6/Objects/typeobject.c:2372:11
...

SUMMARY: AddressSanitizer: 39 byte(s) leaked in 1 allocation(s).
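
For readers without a sanitizer build of Python, a coarser check is to watch traced allocations grow across repeated calls. The sketch below swaps in the standard-library `tracemalloc` module rather than LSan, which is what the report above comes from; `traced_growth` is a hypothetical helper name, and a leak this small may be hard to separate from allocator noise.

```python
import tracemalloc


def traced_growth(func, warmup=5, repeats=200):
    """Return the net change in traced bytes across `repeats` calls of `func`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Warm up first so one-time caches and lazy imports are not counted.
    for _ in range(warmup):
        func()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        func()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

With the reproduction above, this could be called as `traced_growth(lambda: read_csv(io.StringIO(s)))`. A result that scales with `repeats` suggests a per-call leak, while a roughly constant result does not.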

Comment thread pandas/_libs/parsers.pyx
bint allow_leading_cols
uint64_t parser_start # this is modified after __init__
const char *encoding_errors
object _encoding_errors

Member

should this be bytes?

Member Author

No, it should be an object.

PyBytes_AsString takes an object as its argument, and the goal of this member is to hold a reference to that object so the pointer stored in self.encoding_errors stays valid.
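
The ownership pattern described here can be sketched in plain Python with `ctypes`. This is an analogy under stated assumptions, not the code from the PR: `ReaderSketch` and `errors_as_str` are hypothetical names, and only the two attribute names are taken from the diff above.

```python
import ctypes


class ReaderSketch:
    """Hypothetical stand-in for the reader; not pandas code."""

    def __init__(self, encoding_errors="strict"):
        if isinstance(encoding_errors, str):
            encoding_errors = encoding_errors.encode("utf-8")
        # Owning reference: keeps the bytes object (and its buffer) alive
        # for as long as this instance exists, and is released with it.
        self._encoding_errors = encoding_errors
        # Borrowed view: the raw address of the bytes buffer, analogous to
        # the `const char *` that PyBytes_AsString returns in C.
        char_p = ctypes.c_char_p(self._encoding_errors)
        self.encoding_errors = ctypes.cast(char_p, ctypes.c_void_p).value

    def errors_as_str(self):
        # Safe only because self._encoding_errors still owns the buffer.
        return ctypes.string_at(self.encoding_errors).decode("utf-8")
```

For example, `ReaderSketch("replace").errors_as_str()` reads the string back through the raw address. In C, dropping the owning object while still using the pointer would be a use-after-free, while taking an extra reference that is never released would be a leak. Tying the reference to the instance avoids both.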

Member

ehh, can do this in a separate step

@jbrockmendel jbrockmendel merged commit 6bc176a into pandas-dev:main Jun 26, 2026
46 checks passed
@jbrockmendel

Member

thanks @Alvaro-Kothe

Development

Successfully merging this pull request may close these issues.

Read_csv leaks memory when used in multiple threads