Skip to content

Port 'Adjust placement of paragraph markers' from machine.py#435

Merged
ddaspit merged 2 commits into
masterfrom
copilot/port-adjust-placement-of-paragraph-markers
Jun 29, 2026
Merged

Port 'Adjust placement of paragraph markers' from machine.py#435
ddaspit merged 2 commits into
masterfrom
copilot/port-adjust-placement-of-paragraph-markers

Conversation

Copilot AI commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Ports machine.py#298 — after alignment-based placement of paragraph markers, apply small boundary adjustments to produce more natural splits (e.g. keeping a trailing comma with its sentence rather than letting it open the next paragraph).

New: SegmentBoundaryAdjuster

Two new classes in SegmentBoundaryAdjuster.cs:

  • TokenRejoiner — reconstructs token lists into strings with correct punctuation spacing (no space before ,/./closing quotes, no space after opening brackets/quotes).
  • SegmentBoundaryAdjuster — adjusts a segment boundary by:
    • Moving prohibited segment-starting characters (, ; . ? ! closing quotes/brackets) from the head of the next segment to the tail of the current one
    • Moving prohibited segment-ending characters (opening brackets/quotes) from the tail of the current segment to the head of the next one
    • Correcting late sentence starts (capitalized words that crossed the boundary too early)
    • Correcting early sentence ends (words + terminal punctuation that crossed the boundary too late)
    • AdjustTokenizedSegmentPairBoundaries(int boundary, IReadOnlyList<string> tokens) — token-index variant used by the handler

Change: PlaceMarkersUsfmUpdateBlockHandler

After PredictMarkerLocation, paragraph markers now go through AdjustTokenizedSegmentPairBoundaries before their string index is resolved:

// If inserting a paragraph marker, make small adjustments to place it in a more natural location
if (element.Type == UsfmUpdateBlockElementType.Paragraph)
{
    adjacentTargetToken = _segmentBoundaryAdjuster.AdjustTokenizedSegmentPairBoundaries(
        adjacentTargetToken,
        targetTokens
    );
}

Before: alignment places \p before , → paragraph opens with , y esta prueba…
After: comma stays in the preceding paragraph → Este texto está en inglés, / \p y esta prueba…


This change is Reviewable

Copilot AI changed the title [WIP] Port relevant changes from PR #298 to machine Port 'Adjust placement of paragraph markers' from machine.py Jun 24, 2026
Copilot AI requested a review from ddaspit June 24, 2026 20:26
@ddaspit ddaspit marked this pull request as ready for review June 25, 2026 13:59
@ddaspit

ddaspit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@copilot The build is failing.

Copilot AI commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@copilot The build is failing.

Fixed in the latest commit — the two new files (SegmentBoundaryAdjuster.cs and SegmentBoundaryAdjusterTests.cs) weren't run through CSharpier before being committed. Running dotnet csharpier format on both files resolved the formatting errors.

@codecov-commenter

codecov-commenter commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.79769% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.31%. Comparing base (0eb2979) to head (bfa9987).

Files with missing lines Patch % Lines
src/SIL.Machine/Corpora/SegmentBoundaryAdjuster.cs 94.54% 3 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #435      +/-   ##
==========================================
+ Coverage   73.20%   73.31%   +0.10%     
==========================================
  Files         442      443       +1     
  Lines       37017    37190     +173     
  Branches     5091     5110      +19     
==========================================
+ Hits        27100    27264     +164     
- Misses       8797     8800       +3     
- Partials     1120     1126       +6     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Enkidu93 Enkidu93 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@Enkidu93 reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on ddaspit).

@ddaspit

ddaspit commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@copilot Rebase the PR on the latest changes.

Copilot AI commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

@copilot Rebase the PR on the latest changes.

Done — merged the latest master into the branch (commit 2749287e). All existing tests still pass.

@ddaspit ddaspit force-pushed the copilot/port-adjust-placement-of-paragraph-markers branch from 2749287 to bfa9987 Compare June 29, 2026 18:44

@ddaspit ddaspit left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on copilot[bot]).

@ddaspit ddaspit merged commit c271990 into master Jun 29, 2026
4 checks passed
@ddaspit ddaspit deleted the copilot/port-adjust-placement-of-paragraph-markers branch June 29, 2026 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants