Implementing Timeouts in Node.js 'vm' Module

Trusted, but Unknown

When I first started playing with the Node vm module, I was using it to execute trusted but unknown script. Using the vm module as a sandbox was a great way to get a prototype up and running quickly, but it presented a few problems if this type of prototype were ever to run in a production environment. The goal of the prototype was to give other developers the ability to execute script they wrote inside my app. Everyone makes typos and mistakes, and if a fellow developer asked the sandbox to execute something that entered an endless loop, it would hang the Node process and defeat the purpose of a multi-user, non-blocking app.

There are a few possible solutions to this problem. It’s possible to use separate processes, perhaps with the cluster module to kill processes if they loop, but it is not always desirable to use additional pids. To get even more complicated, an entire JS engine such as js.js could be embedded, allowing the outer code to monitor the interpreter and stop execution if necessary. Hey, it’s possible, but I would not want to increase the maintenance, security, and performance complexity of my app in that manner unless there was no other option. I wanted to keep loop detection in-process, which meant interfacing C++ loop detection with Node.
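For comparison, the separate-process option can be sketched in a few lines. This is only a rough illustration, not what I ended up building: runInChildProcess is a hypothetical helper name, and it leans on the timeout option of child_process.spawnSync, which blocks the parent while it waits. A real multi-user server would more likely use an asynchronous spawn or fork plus its own timer.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not from the original prototype): run `source` in a
// separate Node process and let spawnSync kill it after `timeoutMs`.
function runInChildProcess(source, timeoutMs) {
  const result = spawnSync(process.execPath, ['-e', source], {
    timeout: timeoutMs, // the child is sent killSignal (SIGTERM by default) on expiry
    encoding: 'utf8',
  });
  // On expiry spawnSync reports an error whose code is 'ETIMEDOUT'.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, stdout: result.stdout, status: result.status };
}
```

The cost is visible even in the sketch: every run pays for a fresh process, which is exactly the overhead I wanted to avoid by keeping loop detection in-process.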

My first attempt at achieving this was writing a native module, node-scriptdog, at JSConf EU 2012. The module approach failed. In order to plant the timeout into Node, I first had to descend a level into the innards of V8 to teach it how to resume execution after terminating. The reason V8 needed to be modified is subtle: Node itself is written in script (for the most part), and V8 cannot distinguish between “Node” script and “user” script.

Stopping execution

The way a V8 embedder stops execution of an endless loop is to use the C++ API V8::TerminateExecution(). The thread running the endless loop is obviously busy, so this must be called from a different thread. Typically, a timer is used to limit the time a single call into the engine may take. After the function has been called, the engine throws an uncatchable exception that propagates up to the embedder’s outer v8::TryCatch, which then lets the app do something else.

This works great, except that the engine may only begin executing script again after the exception has fully propagated and the entire JavaScript stack is unwound. Since most of Node itself is script frames, this results in unexpected behavior as the Node frames are unwound as well:

try {
    vm.runInNewContext('while(true) {}', '', 100);
} catch(e) {
    // C++ TryCatch did not throw, but an exception
    // was caught here. Yet e === null??. Not good.
}

I spoke with Vyacheslav Egorov while at JSConf and he indicated this was how V8::TerminateExecution() was designed to work, but that it should be possible to modify V8 to support resuming. I couldn’t see anyone on the V8 team prioritizing this since it isn’t required for their use of the engine, so I set out to do it myself.

New API

V8 needed to be modified to allow a C++ v8::TryCatch to not only detect that a termination exception was thrown, but instruct the engine that things really are OK and execution can resume again.

As part of the changes made to V8, v8::TryCatch received a new member HasTerminated(), which obviously allows an embedder to first detect that a termination exception has been thrown due to another thread calling V8::TerminateExecution().

Once such an exception has been caught, V8::CancelTerminateExecution() allows the embedder to tell the engine that script execution should continue after the call completes. This function performs the magic needed to reset the engine to a sane state and allows the scripted Node frames still on the stack to continue executing.

void Isolate::CancelTerminateExecution() {
  if (try_catch_handler()) {
    try_catch_handler()->has_terminated_ = false;
  }
  if (has_pending_exception() &&
      pending_exception() == heap_.termination_exception()) {
    thread_local_top()->external_caught_exception_ = false;
    clear_pending_exception();
  }
  if (has_scheduled_exception() &&
      scheduled_exception() == heap_.termination_exception()) {
    thread_local_top()->external_caught_exception_ = false;
    clear_scheduled_exception();
  }
}

Watchdog

The final piece of the puzzle is wrapped up in the node::Watchdog class. Node is built around libuv, which provides the requisite event loop, timer, and thread primitives to implement the timeout. This class takes care of spawning a new thread which runs a separate event loop that waits for either the timer to expire or for async notification that the Watchdog was destroyed because execution returned normally.

The uv_timer_t handler couldn’t be simpler:

void Watchdog::Timer(uv_timer_t* timer, int status) {
  V8::TerminateExecution();
}

As for the thread function, Ben Noordhuis pointed me towards using uv_run() with UV_RUN_ONCE, which lets the loop run only once and then exit after processing either the timer or async notification:

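void Watchdog::Run(void* arg) {
  // Recover the Watchdog instance passed in as the thread argument.
  Watchdog* wd = static_cast<Watchdog*>(arg);

  // UV_RUN_ONCE so async_ or timer_ wakeup exits uv_run() call.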
  uv_run(wd->loop_, UV_RUN_ONCE);

  // Loop ref count reaches zero when both handles are closed.
  uv_close(reinterpret_cast<uv_handle_t*>(&wd->async_), NULL);
  uv_close(reinterpret_cast<uv_handle_t*>(&wd->timer_), NULL);

  // UV_RUN_DEFAULT so that libuv has a chance to clean up.
  uv_run(wd->loop_, UV_RUN_DEFAULT);
}

(I wouldn’t have gathered on my own that a UV_RUN_DEFAULT was needed in order to let libuv clean up after itself once the loop ref count reached zero. This pattern is useful for anyone who wishes to implement anything similar with libuv in the future.)

Resuming Execution

The V8 changes landed in r14378, and were released as part of v3.18.3. Node updated its copy of V8 in 2f75785c01. Modifying node_script.cc#L442 to use the new API was then possible:

if (timeout) {
  Watchdog wd(timeout);
  result = script->Run();
} else {
  result = script->Run();
}
if (try_catch.HasCaught() && try_catch.HasTerminated()) {
  V8::CancelTerminateExecution(args.GetIsolate());
  return ThrowException(Exception::Error(
        String::New("Script execution timed out.")));
}

This is pretty much what any use of TryCatch::HasTerminated() and V8::CancelTerminateExecution() will look like for anyone who embeds V8 and wishes to implement the same functionality.

The implementation of the timeout parameter landed in c081809344 and is present in Node v0.11 onward.
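From the script side, the HasCaught() plus HasTerminated() check has a rough analogue: telling a timeout apart from an ordinary exception thrown by the sandboxed code. The sketch below is an illustration under assumptions rather than part of the patch. classifyRun is a hypothetical helper, and it relies on the options-object form of vm.runInNewContext and the ERR_SCRIPT_EXECUTION_TIMEOUT error code found in current Node releases, not the positional timeout argument that landed in v0.11.

```javascript
const vm = require('vm');

// Hypothetical helper: run `code` in a fresh context and report whether it
// finished, timed out, or threw an error of its own.
function classifyRun(code, timeoutMs) {
  try {
    const value = vm.runInNewContext(code, {}, { timeout: timeoutMs });
    return { status: 'ok', value };
  } catch (err) {
    // Current Node releases tag the timeout error with this code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { status: 'timeout' };
    }
    // Anything else came from the script itself (syntax or runtime error).
    return { status: 'error', message: String(err && err.message) };
  }
}
```

A caller can then treat 'timeout' as a resource-limit problem and 'error' as a bug in the submitted script, instead of lumping both into one catch block.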

What Users See

Now, everyone can simply specify a timeout in milliseconds:

loop.js
var vm = require('vm');
try {
    vm.runInThisContext('while(true) {}', '', 1000);
} catch(e) {
}
console.log('completed!');

$ time node loop.js
completed!

real    0m1.080s
user    0m1.056s
sys     0m0.022s
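One more sketch, to show what resuming buys you in practice: after a timeout the host keeps running, and even the context that held the runaway script is still usable. The names here (demoResume, counter) are made up for illustration, and the code assumes the options-object signature that current Node releases document for vm.runInContext, where the timeout is a field rather than a positional argument.

```javascript
const vm = require('vm');

// Hypothetical demo: time out a looping script, then reuse the same context.
function demoResume() {
  const context = vm.createContext({ counter: 0 });
  const options = { timeout: 50 }; // milliseconds

  let timedOut = false;
  try {
    // Sets counter, then spins until the watchdog terminates execution.
    vm.runInContext('counter = 1; while (true) {}', context, options);
  } catch (err) {
    timedOut = err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT';
  }

  // Execution resumed in the host, and the context kept its state.
  const next = vm.runInContext('counter + 1', context, options);
  return { timedOut, next, counter: context.counter };
}

console.log(demoResume());
```

The assignment made before the loop survives the termination, so a long-lived sandbox does not have to be rebuilt after every misbehaving script.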
