Implementing Timeouts in Node.js 'vm' Module
Trusted, but Unknown
When I first started playing with the Node vm module, I was using
it to execute trusted, but unknown script. Using the vm module as a
sandbox was a great way to get a prototype up and running quickly, but
presented a few problems if this type of prototype were to ever run in
a production environment. The goal of the prototype was to give other
developers the ability to execute script they wrote inside of my app.
Everyone makes typos and mistakes, and if a fellow developer asked the
sandbox to execute something that entered an endless loop, it would hang
the Node process and defeat the purpose of a multi-user, non-blocking app.
There are a few possible solutions to this problem. It’s possible to use
separate processes, perhaps with the cluster module, and kill any process
that loops, but it is not always desirable to spawn additional processes. To
get even more complicated, an entire JS engine such as
js.js could be embedded, allowing
the outer code to monitor the interpreter and stop execution if necessary.
Hey, it’s possible, but I would not want to increase the maintenance,
security, and performance complexity of my app in that manner unless there
was no other option. I wanted to keep loop detection in-process, which
meant interfacing C++ loop detection with Node.
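For comparison, the separate-process option mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: `runIsolated` is a hypothetical helper name, and `spawnSync` is used purely to keep the sketch short. It blocks the parent while it waits, so a real multi-user server would use the asynchronous `spawn` with its own timer instead.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not part of Node): run `source` in a fresh Node
// process and give up after `timeoutMs` milliseconds.
function runIsolated(source, timeoutMs) {
  const result = spawnSync(process.execPath, ['-e', source], {
    timeout: timeoutMs, // spawnSync kills the child once this elapses
    encoding: 'utf8',
  });
  return {
    // On timeout, spawnSync reports an error whose code is 'ETIMEDOUT'.
    timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
    stdout: result.stdout,
  };
}

const ok = runIsolated('console.log(1 + 1)', 5000);
const stuck = runIsolated('while (true) {}', 500);

console.log('well-behaved script printed:', ok.stdout.trim());
console.log('looping script timed out:', stuck.timedOut);
```

Every call pays for a full process start, which is the cost the in-process approach in the rest of this post tries to avoid.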
My first attempt at achieving this was a native module, node-scriptdog, written at JSConf EU 2012. The module approach failed. To plant the timeout into Node, I first had to descend a level into the innards of V8 and teach it how to resume execution after terminating. The reason V8 needed to be modified is subtle: Node itself is written in script (for the most part), and V8 cannot distinguish between “Node” script and “user” script.
Stopping Execution
The way a V8 embedder stops execution of an endless loop is to use the C++
API V8::TerminateExecution(). The thread running the endless loop
is busy, so this must be called from a different thread.
Typically, a timer is used to limit the time a single call into the
engine may take. After the function has been called, the engine throws
an uncatchable exception that propagates up to the embedder’s outer
v8::TryCatch, which then lets the app do something else.
This works great, except that the engine can only begin executing script again after the exception has fully propagated and the entire JavaScript stack is unwound. Since most of Node itself consists of script frames, this results in unexpected behavior as the Node frames are unwound as well:
[code listing not preserved]
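The "uncatchable" nature of the termination exception can be observed from script in a current Node release, which already includes the timeout support this post goes on to describe. The sketch below assumes the options-object form of `vm.runInNewContext` with a `timeout` field; the names `sandbox` and `outerError` are illustrative only.

```javascript
const vm = require('vm');

const sandbox = { caught: false };
let outerError = null;

try {
  // The untrusted script wraps its endless loop in its own try/catch.
  vm.runInNewContext(
    'try { while (true) {} } catch (e) { caught = true; }',
    sandbox,
    { timeout: 50 }
  );
} catch (err) {
  // Only the caller, outside the terminated script, sees the failure.
  outerError = err;
}

console.log('script caught it:', sandbox.caught);
console.log('caller caught it:', outerError !== null);
```

The script's own `catch` block never runs: termination skips script-level handlers and is only visible to the code that called into the engine.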
I spoke with Vyacheslav Egorov while at JSConf and he indicated this was
how V8::TerminateExecution() was designed to work, but that it should be
possible to modify V8 to support resuming. I couldn’t see anyone on the
V8 team prioritizing this since it isn’t required for their use of the
engine, so I set out to do it myself.
New API
V8 needed to be modified to allow a C++ v8::TryCatch to not only
detect that a termination exception was thrown, but also instruct the engine
that things really are OK and execution can resume again.
As part of the changes made to V8, v8::TryCatch received
a new member, HasTerminated(), which allows an embedder to
first detect that a termination exception has been thrown due to
another thread calling V8::TerminateExecution().
Once such an exception has been caught, V8::CancelTerminateExecution()
allows the embedder to tell the engine that script execution should
continue after the call completes. This function performs the magic
needed to reset the engine to a sane state and allows the scripted Node
frames still on the stack to continue executing.
[code listing not preserved]
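The effect of resuming can be seen from the script side. The following sketch is not the C++ embedder code; it only shows, assuming the `timeout` option available in current Node releases, that a context whose script was terminated remains usable afterwards.

```javascript
const vm = require('vm');

const context = vm.createContext({ counter: 0 });

let timedOut = false;
try {
  // Sets a value, then spins until the timeout terminates the script.
  vm.runInContext('counter = 1; while (true) {}', context, { timeout: 50 });
} catch (err) {
  timedOut = true;
}

// Execution resumes normally: the same context accepts more script, and
// the assignment made before the loop is still visible.
const next = vm.runInContext('counter + 41', context);

console.log('timed out:', timedOut);
console.log('next result:', next);
```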
Watchdog
The final piece of the puzzle is wrapped up in the node::Watchdog class.
Node is built around libuv, which provides the requisite event loop,
timer, and thread primitives to implement the timeout. This class takes
care of spawning a new thread which runs a separate event loop that waits
for either the timer to expire or for async notification that the
Watchdog was destroyed because execution returned normally.
The uv_timer_t handler couldn’t be simpler:
[code listing not preserved]
As for the thread function, Ben Noordhuis pointed me towards using
uv_run() with UV_RUN_ONCE, which lets the loop run only once and then
exit after processing either the timer or async notification:
[code listing not preserved]
(I wouldn’t have gathered on my own that a run with UV_RUN_DEFAULT was also
needed in order to let libuv clean up after itself once the loop ref count
reached zero. This pattern is useful for anyone who wishes to implement
anything similar with libuv in the future.)
Resuming Execution
The V8 changes landed in r14378, and were released as part
of v3.18.3. Node updated its copy of V8 in 2f75785c01.
Modifying node_script.cc#L442 to use the new API was
then possible:
[code listing not preserved]
This is pretty much what any use of v8::TryCatch::HasTerminated() and
V8::CancelTerminateExecution() will look like for anyone that embeds
V8 and wishes to implement the same functionality.
The implementation of the timeout parameter landed in
c081809344 and is present in Node v0.11 onward.
What Users See
Now, everyone can simply specify a timeout in milliseconds:
[code listing not preserved]
$ time node loop.js
completed!
real 0m1.080s
user 0m1.056s
sys 0m0.022s
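A present-day script along these lines might look like the sketch below. It assumes the options-object form of the call, `{ timeout: 1000 }`, as found in current Node releases; the signature that first landed in v0.11 may have differed, so treat this as an illustration rather than the original loop.js.

```javascript
const vm = require('vm');

const started = Date.now();

try {
  // An endless loop, limited to roughly one second of execution.
  vm.runInThisContext('while (true) {}', { timeout: 1000 });
} catch (err) {
  // The timeout surfaces as an ordinary exception in the calling code.
}

const elapsedMs = Date.now() - started;
console.log('completed!');
```

The one-second timeout accounts for almost all of the running time, which matches the shape of the `time` output above.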