Specialize UTF8 encoding in StringTextConverter writing #6297
NinoFloris merged 3 commits into npgsql:main
@NinoFloris Another alternative to #5985. What do you think about this approach?
Thanks @bbowyersmyth, that might work. I will take a better look after 10.0 is released; I'm currently busy getting the remaining bits ready for release.
# Conflicts: # test/Npgsql.Benchmarks/TypeHandlers/Text.cs
f1b3fbc to 47d25f8
Pull request overview
Specializes UTF-8 encoding in PgWriter.WriteChars/WriteCharsAsync by using System.Text.Unicode.Utf8.FromUtf16 directly, avoiding the double-counting overhead of GetByteCount + GetBytes for UTF-8 strings. This yields significant performance improvements (up to ~2x for larger strings) with zero allocations.
Changes:
- Added UTF-8 fast path in `WriteChars` and `WriteCharsAsync` using `Utf8.FromUtf16` for known encoder fallbacks (exception/replacement)
- Optimized the existing encoding path with a `GetMaxByteCount` short-circuit before the more expensive `GetByteCount`
- Updated benchmark to use `NpgsqlWriteBuffer.UTF8Encoding` to exercise the new fast path
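The fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from `PgWriter.cs`: the `Utf8ChunkedEncoder` class, its `Encode` method, and the `MemoryStream` standing in for the write buffer flush are all hypothetical. It only assumes the documented behavior of `Utf8.FromUtf16`, which encodes as much as fits and reports how many chars it consumed, so no prior `GetByteCount` pass is needed.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text.Unicode;

// Hypothetical helper for illustration only; not part of Npgsql.
static class Utf8ChunkedEncoder
{
    // Encodes `text` as UTF-8 through a small fixed-size buffer,
    // "flushing" each partial encode into a MemoryStream.
    public static byte[] Encode(string text, int bufferSize)
    {
        // A single Unicode scalar needs at most 4 UTF-8 bytes, so a buffer
        // of at least 4 bytes guarantees progress on every iteration.
        if (bufferSize < 4)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var output = new MemoryStream();
        Span<byte> buffer = new byte[bufferSize];
        ReadOnlySpan<char> remaining = text;

        while (true)
        {
            // One pass: encode what fits and learn how far we got.
            var status = Utf8.FromUtf16(
                remaining, buffer, out var charsRead, out var bytesWritten,
                replaceInvalidSequences: false, isFinalBlock: true);

            output.Write(buffer[..bytesWritten]);
            remaining = remaining[charsRead..];

            switch (status)
            {
                case OperationStatus.Done:
                    return output.ToArray();
                case OperationStatus.DestinationTooSmall:
                    continue; // buffer was flushed above; encode the rest
                default:
                    // Mirrors a throwing encoder: invalid UTF-16 is an error.
                    throw new ArgumentException("Invalid UTF-16 input.", nameof(text));
            }
        }
    }
}
```

Passing `replaceInvalidSequences: true` instead would correspond to the replacement fallback case mentioned in the list.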
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Npgsql/Internal/PgWriter.cs | Added UTF-8 specialized Utf8Core paths for sync/async writing; added GetMaxByteCount short-circuit |
| test/Npgsql.Benchmarks/TypeHandlers/Text.cs | Updated benchmark to use NpgsqlWriteBuffer.UTF8Encoding instead of Encoding.UTF8 |
Rebased onto main. I've moved the conversion into WriteChars{Async} itself, which should be cheap enough. I've also made them try GetMaxByteCount before doing the double counting in non-UTF8 cases. PTAL @bbowyersmyth :)
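A rough sketch of the `GetMaxByteCount` short-circuit for the non-UTF8 path, under the assumption that the goal is simply to skip the exact count whenever the worst-case bound already fits. The `EncodingWriter` class and `TryEncode` method are hypothetical names for illustration and are not taken from the PR diff.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Npgsql.
static class EncodingWriter
{
    // Encodes `chars` into `destination` if it is guaranteed to fit.
    // Returns false (writing nothing) when the exact byte count is too large.
    public static bool TryEncode(
        Encoding encoding, ReadOnlySpan<char> chars, Span<byte> destination, out int bytesWritten)
    {
        // GetMaxByteCount is a cheap O(1) upper bound. Only when that bound
        // does not fit do we pay for the O(n) exact scan in GetByteCount.
        if (encoding.GetMaxByteCount(chars.Length) > destination.Length
            && encoding.GetByteCount(chars) > destination.Length)
        {
            bytesWritten = 0;
            return false;
        }

        bytesWritten = encoding.GetBytes(chars, destination);
        return true;
    }
}
```

When the destination is comfortably larger than the worst case, which is likely the common situation for short strings, the text is scanned once instead of twice.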
@NinoFloris That looks good. Let's go with that, thanks.
a61a13e to f6f8c5b
To avoid the double counting of UTF8 bytes when writing UTF16 strings, create a specialized StringBasedTextConverter that can use the newer `System.Text.Unicode.Utf8` class.
When the encoding is the same object instance as `NpgsqlWriteBuffer.UTF8Encoding`, which is a throwing encoder, we can handle partial encodes into the remaining write buffer and iterate if needed.
Before
After