Commit 50389df

pythongh-12005: Align FileIO.readall between _pyio and _io
Use `bytearray.resize()` and `os.readinto()` to reduce copies and match the behavior of `_io.FileIO.readall()`.

There is still one extra copy, which means twice the memory is required compared to `_io.FileIO`, because there is currently no zero-copy path from `bytearray` to `bytes`.

On my system, reading a 2GB file with `./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v` goes from ~2.7 seconds to ~2.2 seconds.
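The idea of filling one preallocated buffer in place, rather than concatenating chunks, can be sketched outside `_pyio` roughly as follows. This is an illustration under assumptions, not the code from this commit: `read_all_preallocated` is a hypothetical helper, the growth step is a simple doubling, and because `bytearray.resize()` and `os.readinto()` may be missing on older interpreters, the sketch substitutes `bytearray.extend()`, slice deletion, and the raw file object's `readinto()`.

```python
import io
import os
import tempfile


def read_all_preallocated(path, bufsize=io.DEFAULT_BUFFER_SIZE):
    """Read a whole file into one growable buffer (hypothetical helper)."""
    result = bytearray(bufsize)
    bytes_read = 0
    # buffering=0 returns a raw FileIO, so readinto() writes straight
    # into our buffer with no intermediate bytes object per chunk.
    with open(path, "rb", buffering=0) as raw:
        while True:
            if bytes_read >= len(result):
                # Stand-in for bytearray.resize(): grow by appending zeros.
                grow_by = max(len(result), io.DEFAULT_BUFFER_SIZE)
                result.extend(bytes(grow_by))
            # The view must be released before the bytearray can be resized.
            with memoryview(result)[bytes_read:] as view:
                n = raw.readinto(view)
            if not n:  # 0 means end of file
                break
            bytes_read += n
    # Stand-in for result.resize(bytes_read): drop the unused tail.
    del result[bytes_read:]
    # This final conversion is the remaining extra copy.
    return bytes(result)


# Small demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp()
try:
    payload = os.urandom(100_000)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    print(read_all_preallocated(demo_path, bufsize=16) == payload)
finally:
    os.remove(demo_path)
```

The last line of the helper is where the extra copy mentioned above happens: the data already sits in the `bytearray`, and `bytes(result)` duplicates it.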
1 parent cdcacec commit 50389df

2 files changed

Lines changed: 21 additions & 10 deletions

Lib/_pyio.py

Lines changed: 19 additions & 10 deletions
@@ -1674,22 +1674,31 @@ def readall(self):
                 except OSError:
                     pass
 
-        result = bytearray()
+        result = bytearray(bufsize)
+        bytes_read = 0
         while True:
-            if len(result) >= bufsize:
-                bufsize = len(result)
-                bufsize += max(bufsize, DEFAULT_BUFFER_SIZE)
-            n = bufsize - len(result)
+            if bytes_read >= bufsize:
+                # Parallels _io/fileio.c new_buffersize
+                if bufsize > 65536:
+                    addend = bufsize >> 3
+                else:
+                    addend = 256 + bufsize
+                if addend < DEFAULT_BUFFER_SIZE:
+                    addend = DEFAULT_BUFFER_SIZE
+                bufsize += addend
+                result.resize(bufsize)
+
+            assert bufsize - bytes_read > 0, "Should always try and read at least one byte"
             try:
-                chunk = os.read(self._fd, n)
+                n = os.readinto(self._fd, memoryview(result)[bytes_read:])
             except BlockingIOError:
-                if result:
+                if bytes_read:
                     break
                 return None
-            if not chunk: # reached the end of the file
+            if n == 0: # Reached the end of the file
                 break
-            result += chunk
-
+            bytes_read += n
+        result.resize(bytes_read)
         return bytes(result)
 
     def readinto(self, buffer):
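The buffer growth rule in the added lines can be isolated as a small function and compared with the doubling rule it replaces. This is only a sketch for experimenting with the arithmetic: the function names `new_buffersize` and `old_buffersize` are chosen here for illustration, and `DEFAULT_BUFFER_SIZE` is assumed to be 8192, which may not match the value on a given interpreter.

```python
# Assumed value for illustration; the real constant comes from the io module.
DEFAULT_BUFFER_SIZE = 8192


def new_buffersize(bufsize):
    """Growth rule from the added lines of the diff."""
    if bufsize > 65536:
        # Large buffers grow by one eighth of their current size.
        addend = bufsize >> 3
    else:
        # Small buffers roughly double.
        addend = 256 + bufsize
    # Never grow by less than one default buffer.
    if addend < DEFAULT_BUFFER_SIZE:
        addend = DEFAULT_BUFFER_SIZE
    return bufsize + addend


def old_buffersize(bufsize):
    """Growth rule from the removed lines: always at least double."""
    return bufsize + max(bufsize, DEFAULT_BUFFER_SIZE)


for size in (1, 8192, 65536, 65537, 1_000_000):
    print(size, old_buffersize(size), new_buffersize(size))
```

Under the assumed constant, the two rules behave similarly for small buffers, while for buffers above 65536 bytes the new rule adds 12.5% per step instead of doubling.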
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+``_pyio.FileIO.readall()`` now allocates, resizes, and fills a data buffer
+using the same algorithm ``_io.FileIO.readall()`` uses.
