[GH-ISSUE #677] HTTP 499 for polling iOS clients #497

Closed
opened 2026-05-07 00:24:46 +02:00 by BreizhHardware · 9 comments

Originally created by @binwiederhier on GitHub (Mar 21, 2023).
Original GitHub issue: https://github.com/binwiederhier/ntfy/issues/677

root@ntfy:/var/log# cat nginx/access.log.1 nginx/access_error.log.1 | grep 20/Mar/2023:14:46: | grep iOS | grep " 200 "|wc -l
44

root@ntfy:/var/log# cat nginx/access.log.1 nginx/access_error.log.1 | grep 20/Mar/2023:14:46: | grep iOS | grep " 499 "|wc -l
423
Originally created by @binwiederhier on GitHub (Mar 21, 2023). Original GitHub issue: https://github.com/binwiederhier/ntfy/issues/677 ``` root@ntfy:/var/log# cat nginx/access.log.1 nginx/access_error.log.1 | grep 20/Mar/2023:14:46: | grep iOS | grep " 200 "|wc -l 44 root@ntfy:/var/log# cat nginx/access.log.1 nginx/access_error.log.1 | grep 20/Mar/2023:14:46: | grep iOS | grep " 499 "|wc -l 423 ```
BreizhHardware 2026-05-07 00:24:46 +02:00
Author
Owner

@binwiederhier commented on GitHub (Mar 21, 2023):

image

<!-- gh-comment-id:1477209054 --> @binwiederhier commented on GitHub (Mar 21, 2023): ![image](https://user-images.githubusercontent.com/664597/226507368-43592eea-e3ed-46df-b44b-fe0aa5cd14c1.png)
Author
Owner

@binwiederhier commented on GitHub (Mar 21, 2023):

{
  "time": "2023-03-21T02:26:50.137Z",
  "level": "INFO",
  "message": "Connection closed with HTTP 500 (ntfy error 50001)",
  "error": "write unix /tmp/ntfy.sock->@: write: broken pipe",
  "http_method": "GET",
  "http_path": "/xxx/json?poll=1&since=HtLxH3Qm76CO",
  "tag": "http",
  "visitor_auth_limiter_limit": 0.016666666666666666,
  "visitor_auth_limiter_tokens": 30,
  "visitor_emails": 0,
  "visitor_emails_limit": 10,
  "visitor_emails_remaining": 10,
  "visitor_messages": 14,
  "visitor_messages_limit": 750,
  "visitor_messages_remaining": 736,
  "visitor_request_limiter_limit": 0.2,
  "visitor_request_limiter_tokens": 59.5017925624,
  "visitor_seen": "2023-03-21T02:26:47.628Z"
}
<!-- gh-comment-id:1477211549 --> @binwiederhier commented on GitHub (Mar 21, 2023): ``` { "time": "2023-03-21T02:26:50.137Z", "level": "INFO", "message": "Connection closed with HTTP 500 (ntfy error 50001)", "error": "write unix /tmp/ntfy.sock->@: write: broken pipe", "http_method": "GET", "http_path": "/xxx/json?poll=1&since=HtLxH3Qm76CO", "tag": "http", "visitor_auth_limiter_limit": 0.016666666666666666, "visitor_auth_limiter_tokens": 30, "visitor_emails": 0, "visitor_emails_limit": 10, "visitor_emails_remaining": 10, "visitor_messages": 14, "visitor_messages_limit": 750, "visitor_messages_remaining": 736, "visitor_request_limiter_limit": 0.2, "visitor_request_limiter_tokens": 59.5017925624, "visitor_seen": "2023-03-21T02:26:47.628Z" } ```
Author
Owner

@binwiederhier commented on GitHub (Mar 23, 2023):

This can't possibly be a timeout issue. The GET start end end are only 2 seconds apart. That's still long, but it shouldn't lead to a timeout client side. Sadly it looks like I cannot wireshark/tshark on unix sockets, so I'm not sure if I can see what's happening on the wire. 🤔

nginx:

root@ntfy:/var/log# tail -n 1000 nginx/access_error.log|grep topicname
x.x.x.x - - [23/Mar/2023:01:27:41 +0000] "GET /topicname/json?poll=1&since=GOhOM5fKP8Df HTTP/2.0" 499 0 "-" "ntfy/1.2 (build 3; iOS 16.3.1)" "TLSv1.3" "TLS_AES_128_GCM_SHA256"

ntfy:

root@ntfy:/var/log# tail -n 1000 ntfy.log|jq 'select(.visitor_ip == "x.x.x.x")'
{
  "time": "2023-03-23T01:27:40.392Z",
  "level": "DEBUG",
  "message": "HTTP stream connection opened",
  "http_method": "GET",
  "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df",
  "tag": "subscribe",
  "visitor_auth_limiter_limit": 0.016666666666666666,
  "visitor_auth_limiter_tokens": 30,
  "visitor_emails": 0,
  "visitor_emails_limit": 10,
  "visitor_emails_remaining": 10,
  "visitor_id": "ip:x.x.x.x",
  "visitor_ip": "x.x.x.x",
  "visitor_messages": 281,
  "visitor_messages_limit": 750,
  "visitor_messages_remaining": 469,
  "visitor_request_limiter_limit": 0.2,
  "visitor_request_limiter_tokens": 58.000108232,
  "visitor_seen": "2023-03-23T01:27:40.392Z"
}
{
  "time": "2023-03-23T01:27:40.595Z",
  "level": "DEBUG",
  "message": "HTTP stream connection closed",
  "http_method": "GET",
  "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df",
  "tag": "subscribe",
  "visitor_auth_limiter_limit": 0.016666666666666666,
  "visitor_auth_limiter_tokens": 30,
  "visitor_emails": 0,
  "visitor_emails_limit": 10,
  "visitor_emails_remaining": 10,
  "visitor_id": "ip:x.x.x.x",
  "visitor_ip": "x.x.x.x",
  "visitor_messages": 281,
  "visitor_messages_limit": 750,
  "visitor_messages_remaining": 469,
  "visitor_request_limiter_limit": 0.2,
  "visitor_request_limiter_tokens": 58.4954717692,
  "visitor_seen": "2023-03-23T01:27:40.392Z"
}
{
  "time": "2023-03-23T01:27:42.869Z",
  "level": "INFO",
  "message": "Connection closed with HTTP 500 (ntfy error 50001)",
  "error": "write unix /tmp/ntfy.sock->@: write: broken pipe",
  "http_method": "GET",
  "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df",
  "tag": "http",
  "visitor_auth_limiter_limit": 0.016666666666666666,
  "visitor_auth_limiter_tokens": 30,
  "visitor_emails": 0,
  "visitor_emails_limit": 10,
  "visitor_emails_remaining": 10,
  "visitor_id": "ip:x.x.x.x",
  "visitor_ip": "x.x.x.x",
  "visitor_messages": 281,
  "visitor_messages_limit": 750,
  "visitor_messages_remaining": 469,
  "visitor_request_limiter_limit": 0.2,
  "visitor_request_limiter_tokens": 58.495496099,
  "visitor_seen": "2023-03-23T01:27:40.392Z"
}
{
  "time": "2023-03-23T01:27:40.392Z",
  "level": "DEBUG",
  "message": "HTTP request finished",
  "http_method": "GET",
  "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df",
  "tag": "http",
  "time_taken_ms": 2477,
  "visitor_auth_limiter_limit": 0.016666666666666666,
  "visitor_auth_limiter_tokens": 30,
  "visitor_emails": 0,
  "visitor_emails_limit": 10,
  "visitor_emails_remaining": 10,
  "visitor_id": "ip:x.x.x.x",
  "visitor_ip": "x.x.x.x",
  "visitor_messages": 281,
  "visitor_messages_limit": 750,
  "visitor_messages_remaining": 469,
  "visitor_request_limiter_limit": 0.2,
  "visitor_request_limiter_tokens": 58.4955154748,
  "visitor_seen": "2023-03-23T01:27:40.392Z"
}
<!-- gh-comment-id:1480464007 --> @binwiederhier commented on GitHub (Mar 23, 2023): This can't possibly be a timeout issue. The GET start end end are only 2 seconds apart. That's still long, but it shouldn't lead to a timeout client side. Sadly it looks like I cannot wireshark/tshark on unix sockets, so I'm not sure if I can see what's happening on the wire. :thinking: nginx: ``` root@ntfy:/var/log# tail -n 1000 nginx/access_error.log|grep topicname x.x.x.x - - [23/Mar/2023:01:27:41 +0000] "GET /topicname/json?poll=1&since=GOhOM5fKP8Df HTTP/2.0" 499 0 "-" "ntfy/1.2 (build 3; iOS 16.3.1)" "TLSv1.3" "TLS_AES_128_GCM_SHA256" ``` ntfy: ``` root@ntfy:/var/log# tail -n 1000 ntfy.log|jq 'select(.visitor_ip == "x.x.x.x")' { "time": "2023-03-23T01:27:40.392Z", "level": "DEBUG", "message": "HTTP stream connection opened", "http_method": "GET", "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df", "tag": "subscribe", "visitor_auth_limiter_limit": 0.016666666666666666, "visitor_auth_limiter_tokens": 30, "visitor_emails": 0, "visitor_emails_limit": 10, "visitor_emails_remaining": 10, "visitor_id": "ip:x.x.x.x", "visitor_ip": "x.x.x.x", "visitor_messages": 281, "visitor_messages_limit": 750, "visitor_messages_remaining": 469, "visitor_request_limiter_limit": 0.2, "visitor_request_limiter_tokens": 58.000108232, "visitor_seen": "2023-03-23T01:27:40.392Z" } { "time": "2023-03-23T01:27:40.595Z", "level": "DEBUG", "message": "HTTP stream connection closed", "http_method": "GET", "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df", "tag": "subscribe", "visitor_auth_limiter_limit": 0.016666666666666666, "visitor_auth_limiter_tokens": 30, "visitor_emails": 0, "visitor_emails_limit": 10, "visitor_emails_remaining": 10, "visitor_id": "ip:x.x.x.x", "visitor_ip": "x.x.x.x", "visitor_messages": 281, "visitor_messages_limit": 750, "visitor_messages_remaining": 469, "visitor_request_limiter_limit": 0.2, "visitor_request_limiter_tokens": 58.4954717692, "visitor_seen": "2023-03-23T01:27:40.392Z" } { "time": "2023-03-23T01:27:42.869Z", "level": "INFO", "message": "Connection closed with HTTP 500 (ntfy error 50001)", "error": "write unix /tmp/ntfy.sock->@: write: broken pipe", "http_method": "GET", "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df", "tag": "http", "visitor_auth_limiter_limit": 0.016666666666666666, "visitor_auth_limiter_tokens": 30, "visitor_emails": 0, "visitor_emails_limit": 10, "visitor_emails_remaining": 10, "visitor_id": "ip:x.x.x.x", "visitor_ip": "x.x.x.x", "visitor_messages": 281, "visitor_messages_limit": 750, "visitor_messages_remaining": 469, "visitor_request_limiter_limit": 0.2, "visitor_request_limiter_tokens": 58.495496099, "visitor_seen": "2023-03-23T01:27:40.392Z" } { "time": "2023-03-23T01:27:40.392Z", "level": "DEBUG", "message": "HTTP request finished", "http_method": "GET", "http_path": "/topicname/json?poll=1&since=GOhOM5fKP8Df", "tag": "http", "time_taken_ms": 2477, "visitor_auth_limiter_limit": 0.016666666666666666, "visitor_auth_limiter_tokens": 30, "visitor_emails": 0, "visitor_emails_limit": 10, "visitor_emails_remaining": 10, "visitor_id": "ip:x.x.x.x", "visitor_ip": "x.x.x.x", "visitor_messages": 281, "visitor_messages_limit": 750, "visitor_messages_remaining": 469, "visitor_request_limiter_limit": 0.2, "visitor_request_limiter_tokens": 58.4955154748, "visitor_seen": "2023-03-23T01:27:40.392Z" } ```
Author
Owner

@binwiederhier commented on GitHub (Mar 23, 2023):

I couldn't figure out how to intercept the traffic without restarting or messing with nginx. I am seeing evidence that this is not related to iOS at all though, and likely more on the thundering herd. It's happening to the web app and Android clients as well on occasion. I am too tired now, but I will look more tomorrow.

<!-- gh-comment-id:1480533576 --> @binwiederhier commented on GitHub (Mar 23, 2023): I couldn't figure out how to intercept the traffic without restarting or messing with nginx. I am seeing evidence that this is not related to iOS at all though, and likely more on the thundering herd. It's happening to the web app and Android clients as well on occasion. I am too tired now, but I will look more tomorrow.
Author
Owner

@binwiederhier commented on GitHub (Mar 23, 2023):

Hmm. Ok, I can reproduce it on a topic that has a lot of messages, by just doing:

curl "https://ntfy.sh/topicname/json?since=all"
^C (quickly, before the output is downloaded/written)

Not surprising, but I am wondering if the non-iOS ones we're seeing is just these cases...

<!-- gh-comment-id:1480536578 --> @binwiederhier commented on GitHub (Mar 23, 2023): Hmm. Ok, I can reproduce it on a topic that has a lot of messages, by just doing: ``` curl "https://ntfy.sh/topicname/json?since=all" ^C (quickly, before the output is downloaded/written) ``` Not surprising, but I am wondering if the non-iOS ones we're seeing is just these cases...
Author
Owner

@binwiederhier commented on GitHub (Mar 23, 2023):

More data:

Happens exactly every 20min as we know already, almost exactly to the second:

root@ntfy:/var/log/nginx# cat access_error.log.1 access_error.log | grep " 499 " | awk '{ print $4 }' | awk -F: '{ print $1 ":" $2 ":" $3 }' | sort | cut -c 1-18 | uniq -c| tail -n 20
    442 [23/Mar/2023:12:07
     11 [23/Mar/2023:12:08
      1 [23/Mar/2023:12:09
      2 [23/Mar/2023:12:17
    452 [23/Mar/2023:12:27
     10 [23/Mar/2023:12:28
      1 [23/Mar/2023:12:30
      1 [23/Mar/2023:12:35
      1 [23/Mar/2023:12:36
      1 [23/Mar/2023:12:42
    422 [23/Mar/2023:12:47
     21 [23/Mar/2023:12:48
      1 [23/Mar/2023:12:49
      3 [23/Mar/2023:12:53
      2 [23/Mar/2023:12:56
      2 [23/Mar/2023:13:00
    426 [23/Mar/2023:13:07
     14 [23/Mar/2023:13:08
      2 [23/Mar/2023:13:09
      7 [23/Mar/2023:13:13

root@ntfy:/var/log/nginx# cat access_error.log.1 access_error.log | grep " 499 " | awk '{ print $4 }' | awk -F: '{ print $1 ":" $2 ":" $3 ":" $4 }' | sort | cut -c 1-21 | uniq -c| tail -n 20
      2 [23/Mar/2023:13:00:14
      6 [23/Mar/2023:13:07:54
    140 [23/Mar/2023:13:07:55
    202 [23/Mar/2023:13:07:56
     65 [23/Mar/2023:13:07:57
      4 [23/Mar/2023:13:07:58

Happens pretty much exclusively on iOS (ntfy/1.2), but that could be false hint, since they all connect at the same time:

root@ntfy:/var/log/nginx# cat access_error.log|grep " 499 "| awk '{ print $12 }' | sort | uniq -c |sort -nr
  16503 "ntfy/1.2
    119 "Mozilla/5.0
     50 "ntfy/1.16.0
      6 "Jetty/9.4.46.v20220331"
      4 "Dalvik/2.1.0
      3 "python-requests/2.28.2"
      2 "ntfy/1.14.0
<!-- gh-comment-id:1481177672 --> @binwiederhier commented on GitHub (Mar 23, 2023): More data: Happens exactly every 20min as we know already, almost exactly to the second: ``` root@ntfy:/var/log/nginx# cat access_error.log.1 access_error.log | grep " 499 " | awk '{ print $4 }' | awk -F: '{ print $1 ":" $2 ":" $3 }' | sort | cut -c 1-18 | uniq -c| tail -n 20 442 [23/Mar/2023:12:07 11 [23/Mar/2023:12:08 1 [23/Mar/2023:12:09 2 [23/Mar/2023:12:17 452 [23/Mar/2023:12:27 10 [23/Mar/2023:12:28 1 [23/Mar/2023:12:30 1 [23/Mar/2023:12:35 1 [23/Mar/2023:12:36 1 [23/Mar/2023:12:42 422 [23/Mar/2023:12:47 21 [23/Mar/2023:12:48 1 [23/Mar/2023:12:49 3 [23/Mar/2023:12:53 2 [23/Mar/2023:12:56 2 [23/Mar/2023:13:00 426 [23/Mar/2023:13:07 14 [23/Mar/2023:13:08 2 [23/Mar/2023:13:09 7 [23/Mar/2023:13:13 root@ntfy:/var/log/nginx# cat access_error.log.1 access_error.log | grep " 499 " | awk '{ print $4 }' | awk -F: '{ print $1 ":" $2 ":" $3 ":" $4 }' | sort | cut -c 1-21 | uniq -c| tail -n 20 2 [23/Mar/2023:13:00:14 6 [23/Mar/2023:13:07:54 140 [23/Mar/2023:13:07:55 202 [23/Mar/2023:13:07:56 65 [23/Mar/2023:13:07:57 4 [23/Mar/2023:13:07:58 ``` Happens pretty much exclusively on iOS (ntfy/1.2), but that could be false hint, since they all connect at the same time: ``` root@ntfy:/var/log/nginx# cat access_error.log|grep " 499 "| awk '{ print $12 }' | sort | uniq -c |sort -nr 16503 "ntfy/1.2 119 "Mozilla/5.0 50 "ntfy/1.16.0 6 "Jetty/9.4.46.v20220331" 4 "Dalvik/2.1.0 3 "python-requests/2.28.2" 2 "ntfy/1.14.0 ```
Author
Owner

@binwiederhier commented on GitHub (Mar 24, 2023):

I am not really closer. I added logging times to nginx, and tuned the listen block to add backlog=5000; and I looked at the time_taken_ms in ntfy.log. See https://www.nginx.com/blog/tuning-nginx/#listen

nginx.conf:

listen 443 ssl http2 backlog=5000;

log_format combined_ssl '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" '
               'request_time=$request_time '
               'upstream_status=$upstream_status '
               'upstream_response_time=$upstream_response_time '
               'upstream_connect_time=$upstream_connect_time '
               'upstream_header_time=$upstream_header_time';

What I see is that during the thundering herd, ntfy requests in the backend take very long to complete (6-8 seconds), but nginx timings do not at all match the ntfy backend. It is entirely conceivable that if requests actually take this long, that iOS will hang up.

x.x.x.x - - [24/Mar/2023:00:55:59 +0000] "GET /topicname/json?poll=1&since=xCYUwMFKwjhu HTTP/2.0" 499 0 "-" "ntfy/1.2 (build 3; iOS 15.6.0)" 
  request_time=0.744 
  upstream_status=- 
  upstream_response_time=0.744 
  upstream_connect_time=0.000 
  upstream_header_time=-

root@ntfy:/var/log# tail -n 100000 ntfy.log|grep topicname|grep finished|jq .time_taken_ms
8994

In the example about (which is representative), you can see that the backend request took almost 9 seconds, but nginx reports a response time of 0.7s and a status code of 499. So it could be that the iOS client really did hang up after 0.7s (unlikely), and the backend request just kept going.

<!-- gh-comment-id:1482123989 --> @binwiederhier commented on GitHub (Mar 24, 2023): I am not really closer. I added logging times to nginx, and tuned the `listen` block to add `backlog=5000`; and I looked at the `time_taken_ms` in ntfy.log. See https://www.nginx.com/blog/tuning-nginx/#listen nginx.conf: ``` listen 443 ssl http2 backlog=5000; log_format combined_ssl '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" ' 'request_time=$request_time ' 'upstream_status=$upstream_status ' 'upstream_response_time=$upstream_response_time ' 'upstream_connect_time=$upstream_connect_time ' 'upstream_header_time=$upstream_header_time'; ``` What I see is that during the thundering herd, ntfy requests in the backend take very long to complete (6-8 seconds), but nginx timings do not at all match the ntfy backend. It is entirely conceivable that if requests actually take this long, that iOS will hang up. ``` x.x.x.x - - [24/Mar/2023:00:55:59 +0000] "GET /topicname/json?poll=1&since=xCYUwMFKwjhu HTTP/2.0" 499 0 "-" "ntfy/1.2 (build 3; iOS 15.6.0)" request_time=0.744 upstream_status=- upstream_response_time=0.744 upstream_connect_time=0.000 upstream_header_time=- root@ntfy:/var/log# tail -n 100000 ntfy.log|grep topicname|grep finished|jq .time_taken_ms 8994 ``` In the example about (which is representative), you can see that the backend request took almost 9 seconds, but nginx reports a response time of 0.7s and a status code of 499. So it could be that the iOS client really did hang up after 0.7s (unlikely), and the backend request just kept going.
Author
Owner

@binwiederhier commented on GitHub (Mar 28, 2023):

So. My profiling endeavors show me that the 499-spikes are likely simply caused by CPU exhaustion. The bottleneck for this many queries seems to be simply related to SQLite queries, and not anything in the code in particular. I think with a bigger box, we'd be fine, which is good to know, since the box that runs ntfy.sh is pretty small.

As for my concrete steps from this, I will likely simply disable the iOS polling entirely (for now). This should not have any impact on iOS (hopefully), since the polling is a secondary measure anyway, and almost all of them seem to fail anyway right now.

It may even be beneficial, because the duplicate delivery issues will disappear.

There is no good way to test this IMHO, so I'll just do it and see if iOS users will scream. 😬

<!-- gh-comment-id:1487329709 --> @binwiederhier commented on GitHub (Mar 28, 2023): So. My profiling endeavors show me that the 499-spikes are likely simply caused by CPU exhaustion. The bottleneck for this many queries seems to be simply related to SQLite queries, and not anything in the code in particular. I think with a bigger box, we'd be fine, which is good to know, since the box that runs ntfy.sh is pretty small. As for my concrete steps from this, I will likely simply disable the iOS polling entirely (for now). This should not have any impact on iOS (hopefully), since the polling is a secondary measure anyway, and almost all of them seem to fail anyway right now. It may even be beneficial, because the duplicate delivery issues will disappear. There is no good way to test this IMHO, so I'll just do it and see if iOS users will scream. :grimacing:
Author
Owner

@binwiederhier commented on GitHub (Mar 30, 2023):

Disabled entirely in github.com/binwiederhier/ntfy@bdae48afba

<!-- gh-comment-id:1490773398 --> @binwiederhier commented on GitHub (Mar 30, 2023): Disabled entirely in https://github.com/binwiederhier/ntfy/commit/bdae48afba3d3a313803a86f9c2acae2ba50927d
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/ntfy#497
No description provided.