feat: add `progress_data` to `worker_metadata` by gasperzgonec · Pull Request #202 · devrev/adaas-sdk

gasperzgonec · 2026-05-19T11:03:47Z

This PR adds extraction progress timestamps so the platform can detect looping incremental syncs and stop runs that keep re-processing the same time range.

NormalizedAttachment.created_date (optional): lets attachment extraction contribute real source timestamps, not only item extraction.
Repo.itemTimestamps: while uploading batches, tracks the oldest and newest created_date (Unix ms) seen per repo/item type.
worker_metadata.progress_data: on data and attachment extraction progress/done events, sends per-item-type { min, max } timestamp bounds to the callback.

Data extraction already worked via required NormalizedItem.created_date. Attachments need connectors to populate the new optional field (see Migration note).

Connected Issues

#ISS-252036

Checklist

Tests added/updated and ran with npm run test OR no tests needed.
Ran backwards compatibility tests with npm run test:backwards-compatibility.
Code formatted and checked with npm run lint.
Tested airdrop-template linked to this PR.
Documentation updated and provided a link to PR / new docs OR no docs needed.

Migration note

If your connector normalizes attachments (custom normalize on an attachment repo, or a custom attachment processor that builds NormalizedAttachment objects), set created_date on each normalized attachment using the source system’s creation time (RFC3339 string, same as NormalizedItem.created_date).

// In your attachment normalize function (or equivalent)
const normalizedAttachment: NormalizedAttachment = {
  id: attachment.id,
  url: attachment.downloadUrl,
  file_name: attachment.name,
  parent_id: attachment.parentId,
  created_date: attachment.createdAt, // RFC3339, e.g. "2024-03-15T10:30:00Z"
  // ...other fields
};

Required for loop detection on attachments: without created_date, that repo’s progress_data entry stays { min: 0, max: 0 } and attachment incremental sync cannot be monitored by timestamp.
No change for normal item repos: NormalizedItem already requires created_date.
Backwards compatible: created_date is optional on NormalizedAttachment; existing connectors keep working, but attachment timestamp tracking is only accurate once they populate it.
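
If the source system does not hand you an RFC3339 string directly, a small conversion step may be needed before setting created_date. The helper below is a hypothetical sketch, not part of the SDK: `toRfc3339`, `SourceTimestamp`, and the `unit` parameter are names invented for illustration. It assumes the source gives either a parseable date string, an epoch number, or a Date.

```typescript
// Hypothetical helper (not an SDK export): normalizes common source
// timestamp shapes into an RFC3339 string suitable for `created_date`.
type SourceTimestamp = string | number | Date;

function toRfc3339(
  value: SourceTimestamp,
  unit: 'ms' | 's' = 'ms'
): string | undefined {
  let date: Date;
  if (value instanceof Date) {
    date = value;
  } else if (typeof value === 'number') {
    // Epoch numbers are treated as milliseconds unless `unit` says seconds.
    date = new Date(unit === 's' ? value * 1000 : value);
  } else {
    date = new Date(value);
  }
  // An unparseable input yields NaN; return undefined so the caller can
  // omit the optional field instead of sending a bad value.
  return isNaN(date.getTime()) ? undefined : date.toISOString();
}

const fromEpochSeconds = toRfc3339(1710498600, 's'); // "2024-03-15T10:30:00.000Z"
const fromGarbage = toRfc3339('not a date'); // undefined
```

Returning undefined for unparseable input keeps the field absent, which matches the "optional" contract described above.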

What `progress_data` does

What it is

progress_data is sent on the callback HTTP payload under worker_metadata, alongside adaas_library_version. It is not part of event_data (artifacts, errors, etc.).

It is attached only for these extraction event types:

DATA_EXTRACTION_PROGRESS / DATA_EXTRACTION_DONE
ATTACHMENT_EXTRACTION_PROGRESS / ATTACHMENT_EXTRACTION_DONE
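
The gating described above could look roughly like the sketch below. The event names are copied from this description and may not match the SDK's actual enum values; `buildWorkerMetadata` and the version string are hypothetical and only illustrate the rule that progress_data is attached for these four event types.

```typescript
// Event names as written in this PR description; the SDK's real enum
// values may be spelled differently.
const PROGRESS_EVENT_TYPES: string[] = [
  'DATA_EXTRACTION_PROGRESS',
  'DATA_EXTRACTION_DONE',
  'ATTACHMENT_EXTRACTION_PROGRESS',
  'ATTACHMENT_EXTRACTION_DONE',
];

interface TimestampBounds {
  min: number;
  max: number;
}

interface WorkerMetadata {
  adaas_library_version: string;
  progress_data?: Record<string, TimestampBounds>;
}

// Hypothetical builder: always sends the library version, and adds
// progress_data only when the event is one of the four listed types.
function buildWorkerMetadata(
  eventType: string,
  libraryVersion: string,
  bounds: Record<string, TimestampBounds>
): WorkerMetadata {
  const metadata: WorkerMetadata = { adaas_library_version: libraryVersion };
  if (PROGRESS_EVENT_TYPES.indexOf(eventType) !== -1) {
    metadata.progress_data = bounds;
  }
  return metadata;
}
```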

What it returns

Shape:

worker_metadata: {
  progress_data: Record<string, { min: number; max: number }>;
  adaas_library_version: string;
}

Keys: repo itemType strings (e.g. "issues", "comments", attachment metadata types).
Values: { min, max }, both Unix timestamps in milliseconds: the oldest and newest created_date seen in that repo so far during the current worker run.
If nothing with created_date was uploaded for a type, that entry is typically { min: 0, max: 0 }.

Example:

{
  "worker_metadata": {
    "progress_data": {
      "issues": { "min": 1704067200000, "max": 1714521600000 },
      "comments": { "min": 1710000000000, "max": 1714521600000 }
    },
    "adaas_library_version": "..."
  }
}
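
For anyone reading this payload on the receiving side, a minimal sketch of interpreting it follows. `describeProgress` is a hypothetical helper, not something this PR adds. It assumes that { min: 0, max: 0 } means "no timestamps seen", which is indistinguishable from a real created_date at the Unix epoch.

```typescript
interface TimestampBounds {
  min: number;
  max: number;
}
type ProgressData = Record<string, TimestampBounds>;

interface ReadableBounds {
  oldest: string;
  newest: string;
}

// Hypothetical consumer-side helper: converts millisecond bounds to ISO
// strings and maps the { min: 0, max: 0 } placeholder to null.
function describeProgress(
  progress: ProgressData
): Record<string, ReadableBounds | null> {
  const result: Record<string, ReadableBounds | null> = {};
  for (const itemType of Object.keys(progress)) {
    const { min, max } = progress[itemType];
    result[itemType] =
      min === 0 && max === 0
        ? null
        : {
            oldest: new Date(min).toISOString(),
            newest: new Date(max).toISOString(),
          };
  }
  return result;
}

const readable = describeProgress({
  issues: { min: 1704067200000, max: 1714521600000 },
  attachments: { min: 0, max: 0 },
});
// readable.issues      → { oldest: "2024-01-01T00:00:00.000Z", newest: "2024-05-01T00:00:00.000Z" }
// readable.attachments → null
```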

What it is based on

On each Repo.upload(), the SDK scans the batch for objects with a non-null created_date.
It parses them with new Date(created_date).getTime() and updates that repo’s running min / max.
On the progress/done events above, WorkerAdapter copies each repo’s itemTimestamps into progress_data.

For items, created_date is already required on NormalizedItem. For attachments, it only counts after connectors set the new optional NormalizedAttachment.created_date.
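
The per-batch update described above can be sketched as follows. This is a simplified illustration, not the code in this PR: `updateBounds` is a hypothetical standalone function, and it assumes { min: 0, max: 0 } means "nothing seen yet" and that unparseable dates are skipped. The actual handling in Repo may differ.

```typescript
interface TimestampBounds {
  min: number;
  max: number;
}

// Sketch of one pass over an upload batch: O(n) time, O(1) extra memory.
// Assumption: { min: 0, max: 0 } is the "nothing seen yet" starting state.
function updateBounds(
  bounds: TimestampBounds,
  batch: unknown[]
): TimestampBounds {
  let { min, max } = bounds;
  for (const item of batch) {
    if (typeof item !== 'object' || item === null) continue;
    if (!('created_date' in item)) continue;
    const raw = (item as { created_date?: unknown }).created_date;
    if (typeof raw !== 'string' && typeof raw !== 'number') continue;
    const createdAt = new Date(raw).getTime();
    // Assumption: skip values that Date cannot parse instead of storing NaN.
    if (isNaN(createdAt)) continue;
    min = min === 0 ? createdAt : Math.min(min, createdAt);
    max = Math.max(max, createdAt);
  }
  return { min, max };
}

let bounds: TimestampBounds = { min: 0, max: 0 };
bounds = updateBounds(bounds, [
  { id: 'a', created_date: '2024-05-01T00:00:00Z' },
  { id: 'b', created_date: '2024-01-01T00:00:00Z' },
  { id: 'c' }, // no created_date, ignored
]);
// bounds → { min: 1704067200000, max: 1714521600000 }
```

Treating 0 as "unset" matters for min: a plain Math.min against an initial 0 would never move off zero.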

Why it exists

The platform can compare min/max across incremental sync runs and progress events. If extraction keeps revisiting the same timestamp window (or the bounds stop advancing), that indicates a loop, and the incremental sync can be stopped instead of re-uploading the same data indefinitely.
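
To make the idea concrete, here is one way such a comparison might be written. This is purely illustrative and is not the platform's actual detection logic, which this PR does not define. `findStalledItemTypes` is a hypothetical name, and a real detector would likely need thresholds, since identical bounds between two consecutive progress events can be legitimate.

```typescript
interface TimestampBounds {
  min: number;
  max: number;
}
type ProgressData = Record<string, TimestampBounds>;

// Hypothetical check: returns the item types whose tracked bounds are
// identical in two progress_data snapshots (for example, two sync runs).
function findStalledItemTypes(
  previous: ProgressData,
  current: ProgressData
): string[] {
  const stalled: string[] = [];
  for (const itemType of Object.keys(current)) {
    const before = previous[itemType];
    const now = current[itemType];
    if (!before) continue; // first time this item type is reported
    if (now.min === 0 && now.max === 0) continue; // no timestamps tracked
    if (before.min === now.min && before.max === now.max) {
      stalled.push(itemType);
    }
  }
  return stalled;
}

const stalledTypes = findStalledItemTypes(
  {
    issues: { min: 1704067200000, max: 1714521600000 },
    comments: { min: 1710000000000, max: 1714521600000 },
  },
  {
    issues: { min: 1704067200000, max: 1714521600000 },
    comments: { min: 1710000000000, max: 1714521700000 },
  }
);
// stalledTypes → ["issues"]
```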

Started returning `progress_data` in `worker_metadata`.

radovanjorgic

No tests at all? :/

radovanjorgic · 2026-05-20T05:29:32Z

    const itemsToUpload = batch || this.items;

    if (itemsToUpload.length > 0) {
+      for (const item of itemsToUpload) {


Huh, I have a few questions:

1. Do we really need to do this for each item? How does that look from a performance perspective? Say you have 100+ repos at once, each with 5000 items in it.

2. What if the timeout hits at this point?

3. Can we offload this work to the backend? After storing the files, the extractor-adapter/snap-in manager scans them and picks up the progress from there.

For #3, we'll need @GasperSenk's input, but I don't think so.

For #1 and #2, I think this is fine.
This logic takes O(n) time and O(1) memory.
It's just integer comparisons, so it should be fast enough not to block any isTimeout checks.

No, the adapter doesn't know the normalization function, so it can't find the correct field name for the dates.

radovanjorgic · 2026-05-20T05:29:57Z

    if (itemsToUpload.length > 0) {
+      for (const item of itemsToUpload) {
+        if (
+          item != null &&


Simply just if (item?.created_date)?

This didn't work before I added created_date to the NormalizedAttachment.
Good catch!
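
For context on why the longer check was needed at first, here is a sketch with simplified, hypothetical types (`WithDate`, `WithoutDate`, `Uploadable`, `readCreatedDate`); the SDK's real definitions have more fields. When one member of a union does not declare created_date, `item?.created_date` is rejected by the type checker, while an `in` check narrows the union first.

```typescript
// Simplified stand-ins for the SDK types, for illustration only.
interface WithDate {
  id: string;
  created_date: string;
}
interface WithoutDate {
  id: string;
}
type Uploadable = WithDate | WithoutDate | null;

function readCreatedDate(item: Uploadable): string | undefined {
  // `item?.created_date` would not type-check here because WithoutDate
  // has no such property; the `in` check narrows `item` to WithDate.
  if (item != null && 'created_date' in item && item.created_date != null) {
    return item.created_date;
  }
  return undefined;
}
```

Once every member of the union declares the field (even as optional), the shorter `item?.created_date` form becomes valid.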

radovanjorgic · 2026-05-20T05:30:15Z

+          'created_date' in item &&
+          item.created_date != null
+        ) {
+          const created_date = new Date(item['created_date']).getTime();


Don't use snake case for variable names please.

radovanjorgic · 2026-05-20T05:31:21Z

+    min: 0,
+    max: 0,


What do min and max mean here? Maybe we should use oldest and newest instead?

These are min and max timestamps. Due to them being numbers, we decided to go with the min and max.
Discussed in the ISS comments.

Might make sense to use newest and oldest, we do that in the backend, but I see why you would use min and max when you are dealing with longs.

feat: added created_date to the NormalizedAttachment

d86af02

Started returning `progress_data` in `worker_metadata`.

gasperzgonec requested review from a team and radovanjorgic as code owners May 19, 2026 11:03

radovanjorgic requested changes May 20, 2026

gasperzgonec added 2 commits May 20, 2026 10:34

chore: renamed variable and improved nullability check

0d841c1

chore: added tests for timestamp reporting

785b327

gasperzgonec wants to merge 3 commits into main from gasperz/ISS-252036

GasperSenk May 25, 2026

3 participants
