Mirror of https://github.com/binwiederhier/ntfy.git (synced 2026-05-09 08:26:00 +02:00)
[GH-ISSUE #338] Some publish requests on ntfy.sh take up to 15 seconds #264
Originally created by @binwiederhier on GitHub (Jun 21, 2022).
Original GitHub issue: https://github.com/binwiederhier/ntfy/issues/338
So I've been noticing that every now and then some requests against ntfy.sh had been taking 11-15s (as opposed to <1s). At first I thought it was a problem with the Linux kernel tuning variables (somaxconn, nofile, ...). Then I thought it was nginx. After randomly poking around, I found that the `updateStatsAndPrune()` code is likely to blame, because it locks the server mutex for a very long time (or so it appears).

Here's what I saw:

This happened even when doing it against localhost:11080 (i.e. not through nginx), meaning DNS and nginx could be ruled out.
I briefly turned on trace logging in ntfy and saw this:
This corresponds to this block of code: github.com/binwiederhier/ntfy@4e29216b5f/server/server.go (L1114-L1150), which locks the global server mutex here: github.com/binwiederhier/ntfy@4e29216b5f/server/server.go (L1083).

Note the timestamps, 18:51:05 and 18:51:20 -- that's 15 seconds to run this code, meaning that all POST/PUT requests have to wait on the lock this entire time.

This is likely relatively easy to fix, and looking at the code, it is obviously pretty inefficient.
@binwiederhier commented on GitHub (Jun 22, 2022):
In trying to fix this, I have encountered a horrible data race that I have not been able to figure out in quite some time. It appears to happen when the Go HTTP code reads from the socket while closing the request, which causes a data race with the `ResponseWriter`.

Here's what the data race stack looks like (see https://github.com/binwiederhier/ntfy/runs/6994933562?check_suite_focus=true):

This stack in particular indicates that something inside the Go stdlib is reading from the underlying connection/socket.
The `w ResponseWriter` is protected with `wlock sync.Mutex`, but that mutex is obviously not used by the standard library, which is why the race occurs. I have not been able to figure out a proper fix, other than to call `wlock.TryLock()` (or `Lock()`) before the HTTP handler function exits. That is a hack, but it's harmless, because the lock is freed and irrelevant anyway after the function exits.

@binwiederhier commented on GitHub (Jun 23, 2022):

Hopefully fixed; will be released with the next server release.
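The lock-before-exit workaround can be sketched as follows. The names here (`guardedWriter`, `handlerExit`) are illustrative, not ntfy's exact code: writes to the response are serialized by `wlock`, and the handler grabs that mutex one last time before returning so no guarded write can still be in flight during request teardown (the issue mentions `TryLock()` or `Lock()`; plain `Lock()` is used here).

```go
package main

import (
	"fmt"
	"sync"
)

// guardedWriter serializes writes to an underlying writer with wlock,
// mirroring the w ResponseWriter / wlock sync.Mutex pair described above.
type guardedWriter struct {
	wlock sync.Mutex
	out   []string // stand-in for the underlying http.ResponseWriter
}

func (g *guardedWriter) write(msg string) {
	g.wlock.Lock()
	defer g.wlock.Unlock()
	g.out = append(g.out, msg)
}

// handlerExit models the "harmless hack": acquire the lock one last
// time so any concurrent guarded write has finished before the
// handler returns; after that, the lock is irrelevant anyway.
func (g *guardedWriter) handlerExit() {
	g.wlock.Lock()
	g.wlock.Unlock()
}

// runHandler simulates concurrent writers plus the exit hook and
// returns how many writes completed.
func runHandler() int {
	g := &guardedWriter{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			g.write(fmt.Sprintf("msg %d", n))
		}(i)
	}
	wg.Wait()
	g.handlerExit()
	return len(g.out)
}

func main() {
	fmt.Println(runHandler()) // prints 10
}
```

Note that `sync.Mutex.TryLock` only exists since Go 1.18, which matches the timeframe of this issue.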
@binwiederhier commented on GitHub (Jun 23, 2022):
Wrote this test to verify the fix and to see how much performance has improved.
Before (note: `Publishing message; took 948ms` ❗):

After (note: `Publishing message; took 3ms` ❗):