Specialize UTF8 encoding in StringTextConverter writing #6297

Merged
NinoFloris merged 3 commits into npgsql:main from bbowyersmyth:users/bruceb/utf8textconverter on Mar 17, 2026

Conversation

@bbowyersmyth
Contributor

To avoid counting the UTF-8 bytes twice when writing UTF-16 strings, create a specialized StringBasedTextConverter that can use the newer System.Text.Unicode.Utf8 class.

When the encoding is the same object instance as NpgsqlWriteBuffer.UTF8Encoding, which is a throwing encoder, we can encode partially into the remaining write buffer and iterate if needed.
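
The single-pass idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Utf8ChunkedWriter`, its `Write` method, and the use of a `Stream` in place of the Npgsql write buffer are all hypothetical. Only `Utf8.FromUtf16` and `OperationStatus` are real .NET APIs. Passing `replaceInvalidSequences: false` is assumed here to stand in for the throwing-encoder behavior.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;
using System.Text.Unicode;

// Usage: encode through an 8-byte buffer, then check the round trip.
var text = "naïve café \U0001D11E";
var output = new MemoryStream();
Utf8ChunkedWriter.Write(text, new byte[8], output);
Console.WriteLine(Encoding.UTF8.GetString(output.ToArray()) == text); // prints "True"

// Hypothetical helper (not Npgsql's PgWriter): encodes UTF-16 text in one pass
// through a small buffer, flushing whenever the buffer fills up.
static class Utf8ChunkedWriter
{
    public static void Write(ReadOnlySpan<char> chars, Span<byte> buffer, Stream destination)
    {
        // One code point needs at most 4 UTF-8 bytes; a smaller buffer could never make progress.
        if (buffer.Length < 4)
            throw new ArgumentException("Buffer must hold at least 4 bytes.", nameof(buffer));

        while (true)
        {
            // No GetByteCount pre-pass: encode as much as fits and learn how far we got.
            var status = Utf8.FromUtf16(chars, buffer, out var charsRead, out var bytesWritten,
                replaceInvalidSequences: false, isFinalBlock: true);

            destination.Write(buffer[..bytesWritten]); // flush what was encoded
            chars = chars[charsRead..];                // skip the consumed chars

            switch (status)
            {
                case OperationStatus.Done:
                    return;
                case OperationStatus.DestinationTooSmall:
                    continue; // buffer is free again, encode the remainder
                default:
                    // InvalidData: behave like a throwing encoder.
                    throw new EncoderFallbackException("Invalid UTF-16 input.");
            }
        }
    }
}
```

Because `FromUtf16` reports both `charsRead` and `bytesWritten`, the loop never needs to know the total byte length up front, which is what removes the second scan of the string.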

Before

| Method | Value | Mean | Error | StdDev | Op/s | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|
| Write | xxxxxxxxxx | 17.62 ns | 0.052 ns | 0.048 ns | 56,761,267.5 | 0.0000 | - |
| WriteAsync | xxxxxxxxxx | 18.96 ns | 0.028 ns | 0.024 ns | 52,749,799.6 | 0.0000 | - |
| Write | xxxx(...)xxxx [100] | 23.58 ns | 0.043 ns | 0.038 ns | 42,413,674.2 | 0.0005 | 5 B |
| WriteAsync | xxxx(...)xxxx [100] | 24.22 ns | 0.109 ns | 0.102 ns | 41,293,211.8 | 0.0005 | 5 B |
| Write | xxxx(...)xxxx [1000] | 66.98 ns | 0.292 ns | 0.274 ns | 14,928,853.2 | 0.0051 | 54 B |
| WriteAsync | xxxx(...)xxxx [1000] | 76.86 ns | 0.203 ns | 0.180 ns | 13,010,451.6 | 0.0051 | 54 B |

After

| Method | Value | Mean | Error | StdDev | Op/s | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|
| Write | xxxxxxxxxx | 13.22 ns | 0.024 ns | 0.021 ns | 75,642,930.8 | - | - |
| WriteAsync | xxxxxxxxxx | 15.90 ns | 0.036 ns | 0.034 ns | 62,878,778.0 | - | - |
| Write | xxxx(...)xxxx [100] | 15.75 ns | 0.024 ns | 0.021 ns | 63,478,157.3 | - | - |
| WriteAsync | xxxx(...)xxxx [100] | 18.56 ns | 0.034 ns | 0.030 ns | 53,876,045.3 | - | - |
| Write | xxxx(...)xxxx [1000] | 34.91 ns | 0.148 ns | 0.139 ns | 28,646,020.9 | - | - |
| WriteAsync | xxxx(...)xxxx [1000] | 44.01 ns | 0.179 ns | 0.149 ns | 22,723,470.8 | - | - |

@bbowyersmyth
Contributor Author

@NinoFloris Another alternative to #5985, what do you think about this approach?
#6016 could then be moved to the new class as well.

@NinoFloris
Member

Thanks @bbowyersmyth, that might work. I will take a better look after 10.0 is released, currently busy getting the remaining bits ready for release.

bbowyersmyth and others added 2 commits March 13, 2026 14:27
# Conflicts:
#	test/Npgsql.Benchmarks/TypeHandlers/Text.cs
NinoFloris force-pushed the users/bruceb/utf8textconverter branch from f1b3fbc to 47d25f8 March 13, 2026 13:40
NinoFloris marked this pull request as ready for review March 13, 2026 13:40
Copilot AI review requested due to automatic review settings March 13, 2026 13:40

Copilot AI left a comment

Pull request overview

Specializes UTF-8 encoding in PgWriter.WriteChars/WriteCharsAsync by using System.Text.Unicode.Utf8.FromUtf16 directly, avoiding the double-counting overhead of GetByteCount + GetBytes for UTF-8 strings. This yields significant performance improvements (up to ~2x for larger strings) with zero allocations.

Changes:

  • Added UTF-8 fast path in WriteChars and WriteCharsAsync using Utf8.FromUtf16 for known encoder fallbacks (exception/replacement)
  • Optimized the existing encoding path with a GetMaxByteCount short-circuit before the more expensive GetByteCount
  • Updated benchmark to use NpgsqlWriteBuffer.UTF8Encoding to exercise the new fast path
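
The `GetMaxByteCount` short-circuit from the second bullet can be illustrated with a small sketch. `EncodingFit.Fits` is a hypothetical helper written for this example, not a method from PgWriter. It assumes the caller only needs to know whether the encoded text fits in the space left in a buffer.

```csharp
using System;
using System.Text;

// Usage with a non-UTF-8 encoding ("hello" is exactly 10 bytes in UTF-16):
Console.WriteLine(EncodingFit.Fits(Encoding.Unicode, "hello", 64)); // prints "True"
Console.WriteLine(EncodingFit.Fits(Encoding.Unicode, "hello", 10)); // prints "True"
Console.WriteLine(EncodingFit.Fits(Encoding.Unicode, "hello", 9));  // prints "False"

// Hypothetical helper (not Npgsql's code).
static class EncodingFit
{
    // Returns true when `chars`, encoded with `encoding`, fits in `available` bytes.
    public static bool Fits(Encoding encoding, ReadOnlySpan<char> chars, int available)
    {
        // Cheap worst-case bound computed from the length alone, no scan of the text.
        // If even the worst case fits, the exact count is unnecessary.
        if (encoding.GetMaxByteCount(chars.Length) <= available)
            return true;

        // The bound was too pessimistic to decide, so fall back to the exact O(n) count.
        return encoding.GetByteCount(chars) <= available;
    }
}
```

The exact count is only paid for when the text is close to the remaining buffer size, so small values written into a mostly empty buffer skip the extra pass.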

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/Npgsql/Internal/PgWriter.cs | Added UTF-8 specialized Utf8Core paths for sync/async writing; added GetMaxByteCount short-circuit |
| test/Npgsql.Benchmarks/TypeHandlers/Text.cs | Updated benchmark to use NpgsqlWriteBuffer.UTF8Encoding instead of Encoding.UTF8 |

Comment thread src/Npgsql/Internal/PgWriter.cs
@NinoFloris
Member

Rebased onto main.

I've moved the conversion into WriteChars{Async} itself, should be cheap enough. I've also made them try GetMaxByteCount before doing the double counting in non utf8 cases.

PTAL @bbowyersmyth :)

@bbowyersmyth
Contributor Author

@NinoFloris That looks good. Let's go with that, thanks.

NinoFloris force-pushed the users/bruceb/utf8textconverter branch from a61a13e to f6f8c5b March 17, 2026 07:13
NinoFloris merged commit 25def34 into npgsql:main Mar 17, 2026
12 checks passed
bbowyersmyth deleted the users/bruceb/utf8textconverter branch March 18, 2026 06:37
3 participants