Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.1.0.rst
@@ -360,6 +360,7 @@ I/O
- Fixed bug in :func:`read_csv` with the ``c`` engine where an embedded ``\r`` followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (:issue:`51141`)
- Fixed bug in :func:`read_excel` where usage of ``skiprows`` could lead to an infinite loop (:issue:`64027`)
- Fixed bug where :func:`read_html` parsed nested tables incorrectly when using ``html5lib`` or ``bs4`` flavors (:issue:`64524`)
+- Fixed memory leak in :func:`read_csv` (:issue:`19941`)
- Fixed segfault when instantiating the internal ``pandas._libs.parsers.TextReader`` with no arguments; it now raises ``TypeError`` (:issue:`53131`)
- Fixed :func:`read_json` with ``lines=True`` and ``chunksize`` to respect ``nrows``
when the requested row count is not a multiple of the chunk size (:issue:`64025`)
5 changes: 3 additions & 2 deletions pandas/_libs/parsers.pyx
@@ -32,7 +32,6 @@ from cpython.exc cimport (
from cpython.long cimport PyLong_FromString
from cpython.object cimport PyObject
from cpython.ref cimport (
-Py_INCREF,
Py_XDECREF,
)
from cpython.unicode cimport (
@@ -325,6 +324,7 @@ cdef class TextReader:
bint allow_leading_cols
uint64_t parser_start # this is modified after __init__
const char *encoding_errors
+object _encoding_errors

Member

Should this be `bytes`?

Member Author

No, it should be an `object`.

`PyBytes_AsString` takes a Python object, and the purpose of this member is to hold a reference to that object so the pointer stored in `self.encoding_errors` stays valid.

Member

ehh, can do this in a separate step

kh_str_starts_t *false_set
kh_str_starts_t *true_set
int64_t buffer_lines, skipfooter
@@ -386,7 +386,8 @@ cdef class TextReader:
encoding_errors = encoding_errors.encode("utf-8")
elif encoding_errors is None:
encoding_errors = b"strict"
-Py_INCREF(encoding_errors)
+# store encoding_errors in `self` for Cython to manage its lifetime.
+self._encoding_errors = encoding_errors
self.encoding_errors = PyBytes_AsString(encoding_errors)

self.parser = parser_new()
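
The lifetime reasoning behind this change can be illustrated outside Cython. The following is a minimal sketch, not the pandas code: `LeakyReader`, `OwningReader`, and `refcount_growth` are hypothetical names introduced only for illustration, and the behaviour shown assumes a standard CPython build, since it relies on `sys.getrefcount` and `ctypes.pythonapi`. `LeakyReader` mimics a bare `Py_INCREF` with no matching decref. `OwningReader` mimics storing the bytes object on the instance so that the reference is released together with the reader.

```python
import ctypes
import sys

# Py_IncRef is the function form of Py_INCREF in the CPython C API.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None


class LeakyReader:
    """Hypothetical stand-in: bumps the refcount and never drops it."""

    def __init__(self, errors: bytes) -> None:
        _incref(errors)  # no matching decref anywhere, so this reference leaks


class OwningReader:
    """Hypothetical stand-in: keeps the object alive through an attribute."""

    def __init__(self, errors: bytes) -> None:
        # The reference lives exactly as long as the instance does.
        self._encoding_errors = errors


def refcount_growth(reader_cls, n: int = 100) -> int:
    """Return how much the refcount of one bytes object grows after
    constructing and discarding `n` readers (assumption: CPython)."""
    # Build a fresh bytes object at runtime instead of using a literal.
    errors = bytes(bytearray(b"strict"))
    before = sys.getrefcount(errors)
    for _ in range(n):
        reader_cls(errors)  # instance is discarded immediately
    return sys.getrefcount(errors) - before


if __name__ == "__main__":
    print("leaky growth: ", refcount_growth(LeakyReader))
    print("owning growth:", refcount_growth(OwningReader))
```

Under these assumptions, the leaky variant should gain one reference per construction, while the owning variant should show no growth once the readers are gone. This is the same trade the diff makes by replacing `Py_INCREF(encoding_errors)` with an `object` attribute on `self`.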