[PR #1672] [MERGED] Attachment fixes to address inconsistencies between DB and backend store #1686

Closed
opened 2026-05-07 01:03:14 +02:00 by BreizhHardware · 0 comments

📋 Pull Request Information

Original PR: https://github.com/binwiederhier/ntfy/pull/1672
Author: @binwiederhier
Created: 3/24/2026
Status: Merged
Merged: 3/25/2026
Merged by: @binwiederhier

Base: mainHead: attachment-fixes


📝 Commits (5)

📊 Changes

18 files changed (+161 additions, -167 deletions)

View changed files

📝 attachment/store.go (+17 -11)
📝 attachment/store_file_test.go (+2 -1)
📝 attachment/store_s3_test.go (+2 -2)
📝 docs/releases.md (+1 -1)
📝 message/cache.go (+19 -45)
📝 message/cache_postgres.go (+4 -8)
📝 message/cache_postgres_schema.go (+42 -4)
📝 message/cache_sqlite.go (+4 -8)
📝 message/cache_sqlite_schema.go (+14 -1)
📝 message/cache_sqlite_test.go (+2 -2)
📝 message/cache_test.go (+11 -29)
📝 server/config.go (+12 -5)
📝 server/server.go (+5 -2)
📝 server/server_account_test.go (+2 -0)
📝 server/server_manager.go (+18 -28)
📝 server/server_payments_test.go (+2 -0)
📝 server/server_test.go (+4 -2)
📝 web/package-lock.json (+0 -18)

📄 Description

Summary

Fixes attachment handling to prevent orphaned S3/filesystem files and phantom DB records (DB records pointing to missing files). These issues were caused by race conditions during server restarts and missing safeguards in the cleanup logic.

Attachment expiry cap

  • attachment_expires is now capped to never exceed message.expires, ensuring attachments can never outlive their message

Single owner for file deletion

  • Before: Both pruneAttachments(), pruneMessages(), and sync() could delete attachment files, creating race conditions during restarts (files deleted but DB rows surviving → phantom records causing user-facing 404s)
  • After: Only sync() deletes files from storage. pruneAttachments() and pruneMessages() only update the DB. If the process is killed mid-operation, the worst case is orphaned files (cleaned up by sync()) rather than phantom DB records

Batched message/attachment pruning

  • Before: pruneMessages() deleted rows one-by-one in a transaction (242K individual DELETE WHERE mid=$1 over remote Postgres = 115 seconds, blocking the entire manager cycle)
  • After: Single DELETE ... WHERE mid IN (SELECT mid ... LIMIT $2) query, capped at ManagerBatchSize (default 30K) per cycle. Backlog clears over multiple 1-minute cycles (~2s each) instead of one long blocking operation

Missing Postgres index

  • Added idx_message_attachment_expires ON message (attachment_expires) WHERE attachment_deleted = FALSE — SQLite already had this, Postgres didn't. Without it, AttachmentsExpired and AttachmentsWithSizes queries did full table scans on 1M+ rows
  • Added Postgres migration system (14→15) matching the existing SQLite migration pattern

Configurable orphan grace period

  • orphanGracePeriod is now injected through the store constructor via Config.AttachmentOrphanGracePeriod instead of being a package-level constant

Removed dead code

  • Removed DeleteMessages(), MessagesExpired(), AttachmentsExpired(), MarkAttachmentsDeleted(), and their per-row queries — all replaced by the batched equivalents

Test plan

  • Tested with prod data snapshot (1M messages, 14K attachments) on staging with S3 backend — zero phantoms, zero orphans after settlement
  • Tested with prod data snapshot on staging with filesystem backend — zero new phantoms, zero orphans after settlement (3 pre-existing prod phantoms from backup timing)
  • Verified manager cycle times: first cycle ~7s (with backlog), steady state ~300ms (down from 168s)
  • Verified messy restarts no longer create phantom records
  • All existing tests pass

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/binwiederhier/ntfy/pull/1672 **Author:** [@binwiederhier](https://github.com/binwiederhier) **Created:** 3/24/2026 **Status:** ✅ Merged **Merged:** 3/25/2026 **Merged by:** [@binwiederhier](https://github.com/binwiederhier) **Base:** `main` ← **Head:** `attachment-fixes` --- ### 📝 Commits (5) - [`e55d1ce`](https://github.com/binwiederhier/ntfy/commit/e55d1cee6b8dd1bcfb9973b24d04a375ffbe48f7) Attachment fixes to address inconsistencies between DB and backend store - [`071543e`](https://github.com/binwiederhier/ntfy/commit/071543efdadfe7239fb38b13ea7bd9f8478d7c1c) Fixes - [`ca59cfc`](https://github.com/binwiederhier/ntfy/commit/ca59cfc1e1c05298dc7940c2668f6fc3726724bb) Words - [`3ff8bac`](https://github.com/binwiederhier/ntfy/commit/3ff8bacc45a63abe5b640879ab05189050c63f4d) Update changelog - [`f256a41`](https://github.com/binwiederhier/ntfy/commit/f256a4101bed94ad6e1f205dfee535ba4a861e00) Fix test ### 📊 Changes **18 files changed** (+161 additions, -167 deletions) <details> <summary>View changed files</summary> 📝 `attachment/store.go` (+17 -11) 📝 `attachment/store_file_test.go` (+2 -1) 📝 `attachment/store_s3_test.go` (+2 -2) 📝 `docs/releases.md` (+1 -1) 📝 `message/cache.go` (+19 -45) 📝 `message/cache_postgres.go` (+4 -8) 📝 `message/cache_postgres_schema.go` (+42 -4) 📝 `message/cache_sqlite.go` (+4 -8) 📝 `message/cache_sqlite_schema.go` (+14 -1) 📝 `message/cache_sqlite_test.go` (+2 -2) 📝 `message/cache_test.go` (+11 -29) 📝 `server/config.go` (+12 -5) 📝 `server/server.go` (+5 -2) 📝 `server/server_account_test.go` (+2 -0) 📝 `server/server_manager.go` (+18 -28) 📝 `server/server_payments_test.go` (+2 -0) 📝 `server/server_test.go` (+4 -2) 📝 `web/package-lock.json` (+0 -18) </details> ### 📄 Description ## Summary Fixes attachment handling to prevent orphaned S3/filesystem files and phantom DB records (DB records pointing to missing files). These issues were caused by race conditions during server restarts and missing safeguards in the cleanup logic. ### Attachment expiry cap - `attachment_expires` is now capped to never exceed `message.expires`, ensuring attachments can never outlive their message ### Single owner for file deletion - **Before:** Both `pruneAttachments()`, `pruneMessages()`, and `sync()` could delete attachment files, creating race conditions during restarts (files deleted but DB rows surviving → phantom records causing user-facing 404s) - **After:** Only `sync()` deletes files from storage. `pruneAttachments()` and `pruneMessages()` only update the DB. If the process is killed mid-operation, the worst case is orphaned files (cleaned up by `sync()`) rather than phantom DB records ### Batched message/attachment pruning - **Before:** `pruneMessages()` deleted rows one-by-one in a transaction (242K individual `DELETE WHERE mid=$1` over remote Postgres = 115 seconds, blocking the entire manager cycle) - **After:** Single `DELETE ... WHERE mid IN (SELECT mid ... LIMIT $2)` query, capped at `ManagerBatchSize` (default 30K) per cycle. Backlog clears over multiple 1-minute cycles (~2s each) instead of one long blocking operation ### Missing Postgres index - Added `idx_message_attachment_expires ON message (attachment_expires) WHERE attachment_deleted = FALSE` — SQLite already had this, Postgres didn't. Without it, `AttachmentsExpired` and `AttachmentsWithSizes` queries did full table scans on 1M+ rows - Added Postgres migration system (14→15) matching the existing SQLite migration pattern ### Configurable orphan grace period - `orphanGracePeriod` is now injected through the store constructor via `Config.AttachmentOrphanGracePeriod` instead of being a package-level constant ### Removed dead code - Removed `DeleteMessages()`, `MessagesExpired()`, `AttachmentsExpired()`, `MarkAttachmentsDeleted()`, and their per-row queries — all replaced by the batched equivalents ## Test plan - [x] Tested with prod data snapshot (1M messages, 14K attachments) on staging with S3 backend — zero phantoms, zero orphans after settlement - [x] Tested with prod data snapshot on staging with filesystem backend — zero new phantoms, zero orphans after settlement (3 pre-existing prod phantoms from backup timing) - [x] Verified manager cycle times: first cycle ~7s (with backlog), steady state ~300ms (down from 168s) - [x] Verified messy restarts no longer create phantom records - [x] All existing tests pass --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
BreizhHardware 2026-05-07 01:03:14 +02:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/ntfy#1686
No description provided.