Skip to content

Conversation

@janbuchar
Copy link
Contributor

@janbuchar janbuchar commented Dec 12, 2025

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Dec 12, 2025
@janbuchar
Copy link
Contributor Author

@mhamas @metalwarrior665 this proved to be way more complicated than previously thought. I'll try to simplify it on Monday.

@janbuchar
Copy link
Contributor Author

An alternative solution to this could be just screaming bloody murder in case that there is a nonzero price for apify-default-dataset-item and another event type is present in the configuration. This would force the developer to start using a custom event to charge for dataset items. It is possible to implement it on the SDK level as well as on platform.

@janbuchar
Copy link
Contributor Author

Another note, even if we manage to fix this on the SDK level, it'll be difficult to make all users update. Granted, if they use PPE, they should be on a recent version already so an update should be easy enough, but still.

@B4nan B4nan added t-tooling Issues with this label are in the ownership of the tooling team. and removed t-tooling Issues with this label are in the ownership of the tooling team. labels Dec 15, 2025
Copy link
Member

@metalwarrior665 metalwarrior665 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the need for this complexity as we have to handle Dataset.pushData that only connects to SDK via the client

Comment on lines 1626 to 1629
const chargeResult = await actor.chargingManager.charge({
eventName,
count: countToCharge,
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be charging platform via API here. So maybe a new option to only increment locally or a new protected method?

@janbuchar
Copy link
Contributor Author

We need to prevent users from (internally) charging the synthetic event manually (API will not accept it, so that part is already done)

@github-actions github-actions bot added this to the 130th sprint - Tooling team milestone Dec 19, 2025
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Dec 19, 2025
@janbuchar
Copy link
Contributor Author

This is about as readable as I can make it. We should add more tests and documentation (after holidays).

@janbuchar janbuchar marked this pull request as ready for review December 19, 2025 21:23
@janbuchar janbuchar requested review from B4nan and barjin December 19, 2025 21:23
return await this.chargingManager.charge({
const dataset = await this.openDataset();

if (this.usesPushDataInterception(dataset)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain why we need to handle this in two separate code paths

return context.chargeResult;
}

private async pushDataViaDirectCharging(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name could be improved here

chargeResult?: ChargeResult;
}

// Shared context used by Actor.pushData() and PatchedDatasetClient.pushItems().
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better explain the purpose of the context (propagate eventName to pushItems calls, gather results of multiple pushItems calls that were triggered by a single pushData call)

Copy link

@mhamas mhamas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the changes in person. In general they look good to me, but since I don't understand the specifics of SDK, I cannot give very solid review. Leaving a couple of extra comments.

perEventPrices: Record<string, number>;
}

export function mergeChargeResults(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: this is good candidate for unit tests

Comment on lines +2366 to +2378
const { limitedItems, eventsToCharge } =
this.chargingManager.calculatePushDataLimits({
items,
eventName: explicitEventName,
isDefaultDataset,
});

if (limitedItems.length > 0) {
// Preserve original `Dataset.pushData()` call shape for single items.
await dataset.pushData(
Array.isArray(items) ? limitedItems : limitedItems[0],
);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Looks duplicated to the intercepted client path. Can this be taken out and unified?

* Helper to calculate how many items can be pushed within charging limits.
* Returns the limited items and count to charge.
*/
calculatePushDataLimits<T>({
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: I'd add unit tests.

* The objects must be serializable to JSON and the JSON representation of each object must be smaller than 9MB.
* @ignore
*/
async pushData(item: Data | Data[]): Promise<void>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not relevant anymore, the signature on L1100 reflects the return type better

Copy link
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed the design with @janbuchar offline, IMO we should still drop the local total and fetch it somehow from the API. Regarding the implementation, I found two talking points:

Comment on lines +2396 to +2401
if (explicitEventName !== undefined) {
return results[explicitEventName];
}
}

return undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actor.pushData() will charge the synthetic event, but will return undefined here.

See e.g. the following code, run with ACTOR_MAX_TOTAL_CHARGE_USD=2:

import { Actor } from 'apify';

process.env.ACTOR_TEST_PAY_PER_EVENT = '1';

await Actor.init();
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined

await Actor.exit();

This will run all the calls, but will store only two results into the dataset. The developer has no way of returning early, as they won't get the charging manager response here.

await Promise.all(
Object.entries(eventsToCharge).map(
async ([eventName, count]) => {
results[eventName] = await this.chargingManager.charge({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculatePushDataLimits will contain the apify-default-dataset-item key (see here). Here we'll manually charge the synthetic event. Is this expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Track the number of apify-default-dataset-item events locally

6 participants