Proposal: new vm module primitives & loader API for ESM customization #62720

@joyeecheung

This proposes new vm module primitives that aim to replace the existing vm.SourceTextModule and provide a high-level loader API for ESM (specifically SourceTextModule) loading customization.

Consider this a very early draft for discussion. The main goal is to investigate whether a new design can address the existing issues. I am not 100% sure it's implementable yet, especially the loader customization part, but it's better to discuss what design would help developers before we think about what is easier to implement.

There'll be a session about this new design at the collaboration summit too: openjs-foundation/summit#482

Background

The vm module APIs have been behind --experimental-vm-modules for a long time. There was a tracking issue about their stabilization, and several issues have accumulated over the years.

The meaningful current users are concentrated in test tooling (Jest, Vitest). Recently there has been renewed momentum to look into what it takes to bring the API out of the experimental status. @legendecas and I have been adding some non-breaking changes e.g. linkRequests(), instantiate(), moduleRequests, hasTopLevelAwait(), hasAsyncGraph(), conditionally synchronous evaluate() so that it now provides capabilities required to implement something similar to how ESM is handled in the built-in loader - specifically, the linking process can be driven by those who construct the SourceTextModule instead of being driven from a link method with callbacks, and it can be conditionally synchronous as the spec allows. But it has become awkward to keep piling methods on the existing classes to do things differently without breaking the API.

There are still a few issues with the current design:

  1. The importModuleDynamically and initializeImportMeta callbacks are passed as options to the module constructors, requiring careful memory management to avoid leaks when callbacks close over referrers (#33439, #50113, #59118), or use-after-free when remote code calls import() indirectly via a closure (#47096). We've addressed this for the main context with a very intricate memory management scheme, but for new contexts it's uncertain. The current implementation works but is not very GC-efficient, and this issue might not have existed in the first place had the callbacks been managed differently.
  2. evaluate() mixes loader errors (status errors, timeout) with module evaluation errors in a single promise rejection, making it hard to handle them differently (#60242).
  3. Many users have expressed that the current API still requires too much plumbing if they only need partial customization (#31234, #35848, #43899).

It seems better to consolidate the changes into a new API rather than continuing to pile onto the existing interface, while we can still steer the design during the experimental phase.

A draft for a new API

The new API can live in a 'vm/modules' module that exports SourceTextModule, SyntheticModule, and SourceTextModuleLoader.

Core idea

Provide a SourceTextModuleLoader abstraction that users can subclass, overriding the high-level processes they want to customize (#43899). The loader can be used standalone or registered for a given context. The overridable methods are:

  • dynamicImport(request, context, parent)
  • importMeta(meta, context, parent)
  • getModules(requests, context, parent) (resolving and linking a batch of modules for a set of import requests)

In this model, SourceTextModule and SyntheticModule are primitives of the loader.

When registered for a given context, the loader is responsible for resolving ESM requests, handling dynamic import(), and initializing import.meta. Instead of one callback per module, there is one loader per context.

Note that this means once there's a loader registered, the internals have to wrap several constructs into a publicly accessible shape, e.g. wrapping the actual context into a vm Context. So it can take a bit of refactoring and adds overhead, but should be manageable.

This is separate from module.registerHooks(), which installs low-level hooks into the built-in resolution/loading process for all types of modules. Consider the hooks as running underneath super.getModules()/super.dynamicImport() etc. as shown below, so they operate in a different layer. SourceTextModuleLoader is a higher-level customization and, as the name implies, is specifically for handling SourceTextModules. A SyntheticModule does not have import() or import.meta and does not load other modules, so the customization does not apply to it.

1. Full customization

import {
  SourceTextModule,
  SyntheticModule,
  SourceTextModuleLoader,
} from 'vm/modules';

// Names starting with `helper` are only for this example,
// they are not part of the API.
class ExampleModuleLoader extends SourceTextModuleLoader {
  #sourceStore = new Map();
  #moduleCache = new Map();

  // Invoked by Node.js to perform dynamic import(), overrides default.
  dynamicImport(request, context, parent) {
    const mod = this.helperGetSingleModule(request, context, parent);
    mod.evaluate();
    // topLevelCapability does not need to fulfill here.
    // That promise will just be forwarded to the code actually
    // awaiting the dynamic import().
    return mod.namespace();
  }

  // Invoked by Node.js to initialize import.meta, overrides the default.
  importMeta(meta, context, parent) {
    // Objects attached to import.meta should typically be created
    // in the target context via vm.runInContext() or similar.
    meta.identifier = parent.identifier;
  }

  // Invoked by Node.js to resolve and compile modules for a set of
  // import requests. The returned array must correspond 1:1 to the
  // requests array.
  // TODO: it's unclear whether getModules or getModule would work better
  // for the internal way of resolving modules - supposedly batching helps
  // with performance, but it also leaks internal ordering choices.
  getModules(requests, context, parent) {
    const result = [];
    for (const { specifier } of requests) {
      // Consult the custom cache first.
      const cached = this.#moduleCache.get(specifier);
      if (cached) { result.push(cached); continue; }

      // Use the custom source store.
      const source = this.#sourceStore.get(specifier);
      if (!source) {
        throw new Error('module not found');
      }

      const identifier = specifier;
      const mod = new SourceTextModule(source, { identifier, context });

      // Recursively get dependencies (DFS; users can implement BFS too).
      const modules = this.getModules(mod.requests, context, mod);
      mod.link(modules);

      this.#moduleCache.set(specifier, mod);
      result.push(mod);
    }
    return result;
  }

  // --- Example helpers (not part of the API) ---

  helperGetSingleModule(request, context, parent) {
    return this.getModules([request], context, parent)[0];
  }

  helperAddSource(identifier, source) {
    this.#sourceStore.set(identifier, source);
  }

  helperAddBuiltin(identifier, mapping) {
    const keys = Object.keys(mapping);
    const mod = new SyntheticModule(keys, function() {
      for (const key of keys) {
        this.setExport(key, mapping[key]);
      }
    });
    this.#moduleCache.set(identifier, mod);
  }
}

Preparing the custom loader:

const loader = new ExampleModuleLoader();

loader.helperAddSource('async-root', `
  export { foo } from 'foo';
  export let bar;
  bar = (await import('builtin:bar')).default + import.meta.identifier;
`);

loader.helperAddSource('sync-root', `
  export { foo } from 'foo';
  import { default as bar } from 'builtin:bar';
  export const baz = bar + import.meta.identifier;
`);

loader.helperAddSource('foo', `
  export const foo = globalThis.foo;
`);

loader.helperAddBuiltin('builtin:bar', { default: 'bar' });

1.a. Async module graph in a new context

import { createContext } from 'node:vm';
const context = createContext({ foo: 'foo' });

const [mod] = loader.getModules([{ specifier: 'async-root' }], context);

// returns undefined or throws for any loader errors.
mod.evaluate();

mod.hasTopLevelAwait();  // true
mod.hasAsyncGraph();  // true

mod.namespace();  // { foo: .., bar: ... } - not yet populated

// Module completion is tracked separately via topLevelCapability,
// so it's what you await to finish evaluation of the module.
await mod.topLevelCapability;
mod.namespace();  // { foo: 'foo', bar: 'barasync-root' }

1.b. Synchronous require(esm) pattern

See #59656 for this use case.

const [mod] = loader.getModules([{ specifier: 'sync-root' }], context);

mod.evaluate();  // returns undefined or throws for any loader errors.

mod.hasTopLevelAwait();  // false
mod.hasAsyncGraph();  // false
mod.topLevelCapability;  // Always fulfilled for non-TLA modules.
mod.error;  // If the module threw during evaluation, the error is here.
mod.namespace();  // { foo: 'foo', baz: 'barsync-root' }

2. Partial customization and registering globally

A loader can delegate to the base class for specifiers it doesn't need to customize, and can be registered for a context so that all ESM running in that context (including vm.Script dynamic import()s) uses it.

import { SourceTextModuleLoader, SyntheticModule } from 'vm/modules';
import { createContext, runInContext } from 'node:vm';

class CustomLoader extends SourceTextModuleLoader {
  getModules(requests, context, parent) {
    return requests.map((req) => {
      if (req.attributes?.type === 'foo') {
        return new SyntheticModule(['foo'], function() {
          this.setExport('foo', 'foo');
        }, { context });
      }
      return super.getModules([req], context, parent)[0];
    });
  }
  // Other methods are left to the default.
}

const loader = new CustomLoader();

// Create a vm context and register the loader for it.
// Only one loader can be registered per context.
const context = createContext();
loader.register(context);

console.log(
  await runInContext(
    `import('foo', { with: { type: 'foo' } })`,
    context
  )
);  // { foo: 'foo' }

// Unregister when done.
loader.deregister(context);

// Register for the main context.
loader.register();
console.log(await import('foo', { with: { type: 'foo' } }));

Design details

Error model: loader errors vs. module errors

See #60242 for background.

In the current API, evaluate() wraps all errors (status errors, timeout, and actual module exceptions) into a single promise rejection. This makes it difficult to distinguish errors from the loader (implemented by e.g. a framework) from module-level errors (thrown from e.g. user-provided code being tested by a framework), and makes re-implementation of require(esm) awkward.

In the new API, evaluate() separates the two:

  • Synchronous throws for loader/infrastructure errors: wrong module status, timeout (ERR_SCRIPT_EXECUTION_TIMEOUT), signal interruption (ERR_SCRIPT_EXECUTION_INTERRUPTED).
  • Use module.topLevelCapability (maps to CyclicModuleRecord.[[TopLevelCapability]] in the spec) to access evaluation resolution or rejections.
  • Use module.error to access CyclicModuleRecord.[[EvaluationError]] once it's evaluated.
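
The split described in the list above can be sketched with a small stand-in class. This is only an illustration under assumptions, not the proposed implementation: SketchModule, its status field, link(), and the function-valued module body are hypothetical. Only the names evaluate(), topLevelCapability, and error come from the proposal.

```javascript
// Hypothetical stand-in for a module primitive; NOT the proposed implementation.
// The "module body" is a plain function so the sketch runs without any flags.
class SketchModule {
  status = 'unlinked';
  error = undefined;
  topLevelCapability = undefined;
  #body;

  constructor(body) {
    this.#body = body;
  }

  link() {
    this.status = 'linked';
  }

  evaluate() {
    // Loader/infrastructure error: thrown synchronously to the caller.
    if (this.status !== 'linked') {
      throw new Error(`module status is "${this.status}", expected "linked"`);
    }
    this.status = 'evaluating';
    let completion;
    try {
      completion = Promise.resolve(this.#body());
    } catch (err) {
      // Module-level error from a synchronous body: recorded, not thrown.
      this.status = 'errored';
      this.error = err;
      completion = Promise.reject(err);
    }
    this.topLevelCapability = completion.then(
      () => { this.status = 'evaluated'; },
      (err) => { this.status = 'errored'; this.error = err; throw err; },
    );
    // Mark the rejection as handled so an unobserved failure does not crash the process.
    this.topLevelCapability.catch(() => {});
  }
}

async function demo() {
  const log = [];

  const unlinked = new SketchModule(() => {});
  try {
    unlinked.evaluate();
  } catch (err) {
    log.push(`loader error: ${err.message}`);
  }

  const failing = new SketchModule(() => { throw new TypeError('boom'); });
  failing.link();
  failing.evaluate(); // Does not throw: the failure belongs to the module.
  log.push(`recorded: ${failing.error.message}`);
  try {
    await failing.topLevelCapability;
  } catch (err) {
    log.push(`module error: ${err.message}`);
  }
  return log;
}
```

In this sketch a framework can wrap evaluate() in try/catch to handle its own mistakes, and separately inspect error or await topLevelCapability to report failures from the code under test.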

Source phase imports

Module requests carry phase information (request.phase). For source-phase static imports (import source x from 'y'), the module returned from getModules() must have a source object set via mod.setSourceObject(obj). For dynamic source-phase imports (import.source('y')), dynamicImport() receives request.phase === 'source' and should return the source object (e.g., a WebAssembly.Module) instead of a namespace.

dynamicImport(request, context, parent) {
  const mod = this.helperGetSingleModule(request, context, parent);
  if (request.phase === 'source') {
    return mod.sourceObject;
  }
  mod.evaluate();
  return mod.namespace();
}

TODO: figure out what to do for deferred imports, but it'll be a phase as well.

One loader per context

When a loader is registered for a context, it handles dynamic import() for all code evaluated in that context, including vm.Script. With the new design, omitting importModuleDynamically from a vm.Script should mean "delegate to the context's registered loader if any, otherwise throw." This makes loader.register(context) a single point of configuration for all ESM loading in a given context.

The new dynamicImport(request, context, parent) callback receives the context as a parameter, so it can construct modules in the right context. This addresses the issue where importModuleDynamically for vm.Script didn't have context information (#35714).

loader.register(context) throws if a loader is already registered for that context. For users who want hooks-style middleware composition, module.registerHooks(hooks, context) is still a separate, orthogonal mechanism that allows nesting, and it runs in a lower layer underneath the loader's getModules()/dynamicImport().
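
The one-loader-per-context rule and the "delegate or throw" fallback could look roughly like the following. This is a sketch under assumptions: the WeakMap registry, SketchLoader, and hostDynamicImport are hypothetical names, plain objects stand in for vm contexts, and globalThis stands in for the main context. The real bookkeeping would live inside Node.js.

```javascript
// Hypothetical registry keyed by context object; an assumption for this sketch.
const loaders = new WeakMap();

class SketchLoader {
  // Registers this loader for a context; defaults to a stand-in for the main context.
  register(context = globalThis) {
    if (loaders.has(context)) {
      throw new Error('a loader is already registered for this context');
    }
    loaders.set(context, this);
  }

  // Removes the registration only if this loader is the one registered.
  deregister(context = globalThis) {
    if (loaders.get(context) === this) {
      loaders.delete(context);
    }
  }

  // Placeholder result; a real loader would return a namespace or source object.
  dynamicImport(request, context, parent) {
    return { specifier: request.specifier, handledBy: this };
  }
}

// What import() from a vm.Script without importModuleDynamically could do:
// delegate to the context's registered loader if any, otherwise throw.
function hostDynamicImport(request, context, parent) {
  const loader = loaders.get(context);
  if (loader === undefined) {
    throw new Error(`no loader registered; cannot import "${request.specifier}"`);
  }
  return loader.dynamicImport(request, context, parent);
}
```

Keying the registry weakly by context means a registration does not by itself keep a discarded context alive, which seems in line with the memory concerns listed in the background section.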

Different levels of customization

See #31234, #35848, #43899, #61127.

The base SourceTextModuleLoader needs a meaningful default getModules() implementation for partial customization to work. The plan is to expose Node's built-in resolution and loading as composable functions (something like module.resolve(specifier, parentURL, context) and module.load(url, context)) that the base getModules() uses (see #55756). This way:

  • Partial customizers call super.getModules() for requests they don't handle.
  • Advanced users can call module.resolve / module.load directly to implement their getModules() override. These in turn run the hooks registered by module.registerHooks() and/or the built-in resolution/loading logic, so the hooks and the loader are composable in a flexible way.
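
The layering in the two bullets above can be sketched as follows. Everything here is a stand-in: resolve() and load() only imitate the shape suggested for module.resolve / module.load, the in-memory files map replaces the file system, SketchBaseLoader imitates the default loader, and plain records are returned instead of real module objects.

```javascript
// Hypothetical stand-ins for the composable functions; the names, signatures,
// and the in-memory "file system" are assumptions made for this sketch.
const files = new Map([
  ['file:///app/dep.js', 'export const dep = 1;'],
]);

function resolve(specifier, parentURL) {
  return new URL(specifier, parentURL).href;
}

function load(url) {
  const source = files.get(url);
  if (source === undefined) {
    throw new Error(`cannot load ${url}`);
  }
  return source;
}

// Stand-in base loader: its default getModules() is built from resolve() + load().
class SketchBaseLoader {
  getModules(requests, context, parent) {
    const parentURL = parent?.identifier ?? 'file:///app/main.js';
    return requests.map(({ specifier }) => {
      const identifier = resolve(specifier, parentURL);
      return { identifier, source: load(identifier) };
    });
  }
}

// Partial customizer: handles "virtual:" specifiers, delegates everything else.
class VirtualLoader extends SketchBaseLoader {
  getModules(requests, context, parent) {
    return requests.map((request) => {
      if (request.specifier.startsWith('virtual:')) {
        return { identifier: request.specifier, source: 'export default 42;' };
      }
      return super.getModules([request], context, parent)[0];
    });
  }
}

const modules = new VirtualLoader().getModules(
  [{ specifier: './dep.js' }, { specifier: 'virtual:answer' }],
  undefined,
  undefined,
);
```

The result keeps the 1:1 correspondence with the requests array: the first record comes from the default path, the second from the override.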
