CHANGELOG.md (+7 / -1)

@@ -1,4 +1,10 @@
-## HEAD (unreleased)
+## 0.7.0
+
+- Honor an `X-Request-Start` header with the `t=<microseconds>` format, to allow using `wait_timeout` functionality with Apache (https://github.com/zombocom/rack-timeout/pull/210)
+- Improve message when Terminate on Timeout is used on a platform that does not support it (e.g. Windows or JVM) (https://github.com/zombocom/rack-timeout/pull/192)
+- Fix a thread safety issue for forks that are not on the main thread (https://github.com/zombocom/rack-timeout/pull/212)
+- Add compatibility with `frozen_string_literal: true` (https://github.com/zombocom/rack-timeout/pull/196)
+- Fix if `Rails` is defined but `Rails::VERSION` is not defined (https://github.com/zombocom/rack-timeout/pull/191)
doc/risks.md (+7 / -3)
@@ -5,7 +5,7 @@ Risks and shortcomings of using Rack::Timeout

 Sometimes a request is taking too long to complete because it's blocked waiting on synchronous IO. Such IO does not need to be file operations; it could be, say, network or database operations. If said IO is happening in a C library that's unaware of ruby's interrupt system (i.e. anything written without ruby in mind), calling `Thread#raise` (that's what rack-timeout uses) will not take effect until after the IO block is gone.
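A minimal sketch of the mechanism just described (not rack-timeout's actual code): `Thread#raise` interrupts pure-Ruby blocking calls such as `Kernel#sleep`, because the Ruby VM checks for pending interrupts. A C library blocking outside the VM would not be interrupted this way.

```ruby
class RequestTimeout < StandardError; end

worker = Thread.new do
  begin
    sleep 10       # stands in for slow, Ruby-level request work
    :finished
  rescue RequestTimeout
    :timed_out     # the raise from the other thread lands here
  end
end

sleep 0.2                     # give the worker time to start sleeping
worker.raise(RequestTimeout)  # the kind of raise rack-timeout performs
result = worker.value
puts result                   # prints "timed_out"
```

Had the worker been parked inside a C call that never re-enters the VM, the pending exception would sit undelivered until that call returned.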
-At the moment rack-timeout does not try to address this issue. As a fail-safe against these cases, a blunter solution that kills the entire process is recommended, such as unicorn's timeouts.
+As a fail-safe against these cases, a blunter solution that kills the entire process is recommended, such as unicorn's timeouts. You can enable this process-killing behavior with `term_on_timeout`; for more info, see the [setting][term-on-timeout].

 More detailed explanations of the issues surrounding timing out in ruby during IO blocks can be found at:
@@ -15,14 +15,16 @@ More detailed explanations of the issues surrounding timing out in ruby during I

 Raising mid-flight in stateful applications is inherently unsafe. A request can be aborted at any moment in the code flow, and the application can be left in an inconsistent state. There's little way rack-timeout could be aware of ongoing state changes. Applications that rely on a set of globals (like class variables) or any other state that lives beyond a single request may find those left in an unexpected/inconsistent state after an aborted request. Some cleanup code might not have run, or only half of a set of related changes may have been applied.

-A lot more can go wrong. An intricate explanation of the issue by JRuby's Charles Nutter can be found [here][broken-timeout].
+A lot more can go wrong. An intricate explanation of the issue by JRuby's Charles Nutter can be found in [Ruby's Thread#raise, Thread#kill, timeout.rb, and net/protocol.rb libraries are broken][broken-timeout]. In addition, Richard Schneeman talked about this issue in [The Oldest Bug In Ruby - Why Rack::Timeout Might Hose your Server][oldest-bug]. One way to keep `rack-timeout` from corrupting process state is to restart the entire process on timeout. You can enable this behavior by setting [term_on_timeout][term-on-timeout].

-Ruby 2.1 provides a way to defer the result of raising exceptions through the [Thread.handle_interrupt][handle-interrupt] method. This could be used in critical areas of your application code to prevent Rack::Timeout from accidentally wreaking havoc by raising just in the wrong moment. That said, `handle_interrupt` and threads in general are hard to reason about, and detecting all cases where it would be needed in an application is a tall order, and the added code complexity is probably not worth the trouble.
+Ruby 2.1+ provides a way to defer the result of raising exceptions through the [Thread.handle_interrupt][handle-interrupt] method. This low-level interface is meant more for library authors than higher-level application developers. It could be used in critical areas of your application code to prevent Rack::Timeout from accidentally wreaking havoc by raising at just the wrong moment. That said, `handle_interrupt` and threads in general are hard to reason about; detecting all cases where it would be needed in an application is a tall order, and the added code complexity is probably not worth the trouble.

 Your time is better spent ensuring requests run fast and don't need to time out.

 That said, it's something to be aware of, and may explain some eerie wonkiness seen in logs.
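A sketch (not rack-timeout's code) of deferring a timeout exception with `Thread.handle_interrupt` so that a critical section completes atomically before the exception is delivered:

```ruby
class RequestTimeout < StandardError; end

state = []

worker = Thread.new do
  begin
    Thread.handle_interrupt(RequestTimeout => :never) do
      # The exception raised from outside is held pending until this
      # block exits, so both related changes are applied together.
      state << :step_one
      sleep 0.5
      state << :step_two
    end
    :finished
  rescue RequestTimeout
    :aborted_after_critical_section
  end
end

sleep 0.1                     # let the worker enter the critical section
worker.raise(RequestTimeout)  # delivered only once the block exits
outcome = worker.value
puts outcome                  # prints "aborted_after_critical_section"
puts state.inspect            # prints "[:step_one, :step_two]"
```

Without the `handle_interrupt` wrapper, the raise could land between the two `state` updates, leaving exactly the half-applied state the paragraph above warns about.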
@@ -33,3 +35,5 @@ Because of the aforementioned issues, it's recommended you set library-specific

 You'll want to set all relevant timeouts to something lower than Rack::Timeout's `service_timeout`. Generally you want them to be at least 1s lower, so as to account for time spent elsewhere during the request's lifetime while still giving libraries a chance to time out before Rack::Timeout.
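For example, library-level timeouts for an outbound HTTP call might be set like this (values are illustrative, each at least 1s below an assumed `service_timeout` of 15s, so the library raises its own, more precise error before Rack::Timeout fires):

```ruby
require "net/http"

http = Net::HTTP.new("example.com", 443)
http.use_ssl      = true
http.open_timeout = 5    # seconds allowed to establish the TCP connection
http.read_timeout = 10   # seconds allowed to wait for each response chunk
# write_timeout exists on Ruby 2.6+; guard for older rubies
http.write_timeout = 10 if http.respond_to?(:write_timeout=)
```

When one of these fires, you get a `Net::OpenTimeout`/`Net::ReadTimeout` pointing at the slow call, rather than a generic `Rack::Timeout` error raised somewhere mid-request.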
doc/settings.md (+15 / -3)
@@ -3,6 +3,9 @@

 Rack::Timeout has 4 settings, each of which impacts when Rack::Timeout
 will raise an exception, and which type of exception will be raised.

+Additionally there is a [demo app](https://github.com/zombocom/rack_timeout_demos) that shows the impact of changing settings and how the library behaves when a timeout is hit.
+
 ### Service Timeout

 `service_timeout` is the most important setting.
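As a quick orientation, the settings can be passed when inserting the middleware in a plain Rack app (a `config.ru` sketch with illustrative values; Rails apps can instead use the `RACK_TIMEOUT_*` environment variables):

```ruby
require "rack-timeout"

use Rack::Timeout, service_timeout: 15,  # seconds allowed for processing the request
                   wait_timeout: 30      # seconds the request may spend queued first

run ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
```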
@@ -26,9 +29,18 @@ Wait timeout can be disabled entirely by setting the property to `0` or `false`.

 A request's computed wait time may affect the service timeout used for it. Basically, a request's wait time plus service time may not exceed the wait timeout. The reasoning for that is based on the Heroku router's behavior: the request would be dropped anyway after the wait timeout. So, for example, with the default settings of `service_timeout=15`, `wait_timeout=30`, a request that had 20 seconds of wait time will not have a service timeout of 15, but instead of 10, as there are only 10 seconds left before `wait_timeout` is reached. This behavior can be disabled by setting `service_past_wait` to `true`. When set, the `service_timeout` setting will always be honored. Please note that if you're using the `RACK_TIMEOUT_SERVICE_PAST_WAIT` environment variable, any value different than `"false"` will be considered `true`.
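The capping arithmetic just described can be sketched as (an illustrative helper, not rack-timeout's actual implementation):

```ruby
def effective_service_timeout(service_timeout:, wait_timeout:, seconds_waited:, service_past_wait: false)
  return service_timeout if service_past_wait
  # service time may not push the request past wait_timeout
  [service_timeout, wait_timeout - seconds_waited].min
end

# Default settings, request already waited 20s: only 10s remain.
puts effective_service_timeout(service_timeout: 15, wait_timeout: 30, seconds_waited: 20)
# prints 10

# With service_past_wait, service_timeout is always honored.
puts effective_service_timeout(service_timeout: 15, wait_timeout: 30, seconds_waited: 20, service_past_wait: true)
# prints 15
```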
-The way we're able to infer a request's start time, and from that its wait time, is through the availability of the `X-Request-Start` HTTP header, which is expected to contain the time since epoch in milliseconds. (A concession is made for nginx's sec.msec notation.)
+The way we're able to infer a request's start time, and from that its wait time, is through the availability of the `X-Request-Start` HTTP header, which is expected to contain the time since the UNIX epoch in milliseconds or microseconds.
+
+Compatible header string formats are:
+
+- `seconds.milliseconds`, e.g. `1700173924.763` - 10.3 digits (nginx format)
+- `t=seconds.milliseconds`, e.g. `t=1700173924.763` - 10.3 digits, nginx format with the [New Relic recommended][new-relic-recommended-format] `t=` prefix
+- `milliseconds`, e.g. `1700173924763` - 13 digits (Heroku format)
+- `t=microseconds`, e.g. `t=1700173924763384` - 16 digits with `t=` prefix (Apache format)
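A sketch (not rack-timeout's actual parsing code) that recognizes the four formats above by shape and normalizes them to seconds since the UNIX epoch:

```ruby
def parse_x_request_start(value)
  digits = value.to_s.sub(/\At=/, "")  # optional "t=" prefix (New Relic, Apache)
  case digits
  when /\A\d{10}\.\d{3}\z/ then digits.to_f               # nginx: seconds.milliseconds
  when /\A\d{13}\z/        then digits.to_i / 1_000.0     # Heroku: milliseconds
  when /\A\d{16}\z/        then digits.to_i / 1_000_000.0 # Apache: microseconds
  end                                                     # nil for anything else
end

start = parse_x_request_start("t=1700173924763384")  # Apache format
```

Subtracting the parsed start time from the current time yields the wait time that `wait_timeout` is checked against.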
 - [License to SIGKILL](https://www.sitepoint.com/license-to-sigkill/)

-**Puma SIGTERM behavior** When a Puma worker receives a `SIGTERM` it will begin to shut down, but not exit right away. It stops accepting new requests and waits for any existing requests to finish before fully shutting down. This means that only the request that experiences a timeout will be interupted, all other in-flight requests will be allowed to run until they return or also are timed out.
+**Puma SIGTERM behavior** When a Puma worker receives a `SIGTERM` it will begin to shut down, but not exit right away. It stops accepting new requests and waits for any existing requests to finish before fully shutting down. This means that only the request that experiences a timeout will be interrupted; all other in-flight requests will be allowed to run until they return or also time out.

 Only after the worker process exits will Puma's parent process know to boot a replacement worker. While one process is restarting, another can still serve requests (if you have more than 1 worker process per server/dyno). Between when a process exits and when a new process boots, there will be a reduction in throughput. If all processes are restarting, then incoming requests will be blocked while new processes boot.
+        The platform running your application does not support forking (i.e. Windows, JVM, etc).
+
+        To avoid this error, either specify RACK_TIMEOUT_TERM_ON_TIMEOUT=0 or
+        leave it as default (which will have the same result).
+
+      MSG
     end
     @app = app
   end
@@ -124,7 +129,7 @@ def call(env)
   seconds_waited = time_started_service - time_started_wait # how long it took between the web server first receiving the request and rack being able to handle it
   seconds_waited = 0 if seconds_waited < 0 # make up for potential time drift between the routing server and the application server
   final_wait_timeout = wait_timeout + effective_overtime # how long the request will be allowed to have waited
-  seconds_service_left = final_wait_timeout - seconds_waited # first calculation of service timeout (relevant if request doesn't get expired, may be overriden later)
+  seconds_service_left = final_wait_timeout - seconds_waited # first calculation of service timeout (relevant if request doesn't get expired, may be overridden later)
   info.wait = seconds_waited # updating the info properties; info.timeout will be the wait timeout at this point
   info.timeout = final_wait_timeout
@@ -154,13 +159,14 @@ def call(env)
   timeout = RT::Scheduler::Timeout.new do |app_thread| # creates a timeout instance responsible for timing out the request. the given block runs if timed out
     register_state_change.call :timed_out

-    message = "Request "
+    message = +"Request "
     message << "waited #{info.ms(:wait)}, then " if info.wait
     message << "ran for longer than #{info.ms(:timeout)} "
     message << ", sending SIGTERM to process #{Process.pid}"
     Process.kill("SIGTERM", Process.pid)
   else
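The change from `"Request "` to `+"Request "` is the `frozen_string_literal` compatibility fix: under `# frozen_string_literal: true`, bare string literals are frozen and cannot be mutated with `<<`, while unary `String#+@` returns a mutable copy. A standalone illustration:

```ruby
frozen  = "Request ".freeze  # how a frozen literal behaves
mutable = +frozen            # String#+@ duplicates when the receiver is frozen
mutable << "timed out"       # safe: we are mutating the copy

puts mutable          # prints "Request timed out"
puts frozen.frozen?   # prints "true"
puts mutable.frozen?  # prints "false"
```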
@@ -188,9 +194,9 @@ def call(env)
   # X-Request-Start contains the time the request was first seen by the server. Format varies wildly amongst servers, yay!
   # - nginx gives the time since epoch as seconds.milliseconds[1]. New Relic documentation recommends preceding it with t=[2], so might as well detect it.
   # - Heroku gives the time since epoch in milliseconds. [3]
-  # - Apache uses t=microseconds[4], so we're not even going there.
+  # - Apache uses t=microseconds[4], so 16 digits (until November 2286).
   #
-  # The sane way to handle this would be by knowing the server being used, instead let's just hack around with regular expressions and ignore apache entirely.
+  # The sane way to handle this would be by knowing the server being used; instead let's just hack around with regular expressions.

 # This method determines if a body is present. Requests with a body (generally POST, PUT) can have a lengthy body which may have taken a while to be received by the web server, inflating their computed wait time. This in turn could lead to unwanted expirations. See the wait_overtime property as a way to overcome those.