-
Notifications
You must be signed in to change notification settings - Fork 61
fix: Keep track of synthetic apify-default-dataset-item events #528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@mhamas @metalwarrior665 this proved to be way more complicated than previously thought. I'll try to simplify it on Monday. |
|
An alternative solution to this could be just screaming bloody murder in case that there is a nonzero price for |
|
Another note, even if we manage to fix this on the SDK level, it'll be difficult to make all users update. Granted, if they use PPE, they should be on a recent version already so an update should be easy enough, but still. |
metalwarrior665
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the need for this complexity as we have to handle Dataset.pushData that only connects to SDK via the client
packages/apify/src/actor.ts
Outdated
| const chargeResult = await actor.chargingManager.charge({ | ||
| eventName, | ||
| count: countToCharge, | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be charging platform via API here. So maybe a new option to only increment locally or a new protected method?
|
We need to prevent users from (internally) charging the synthetic event manually (API will not accept it, so that part is already done) |
|
This is about as readable as I can make it. We should add more tests and documentation (after holidays). |
| return await this.chargingManager.charge({ | ||
| const dataset = await this.openDataset(); | ||
|
|
||
| if (this.usesPushDataInterception(dataset)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain why we need to handle this in two separate code paths
| return context.chargeResult; | ||
| } | ||
|
|
||
| private async pushDataViaDirectCharging( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name could be improved here
| chargeResult?: ChargeResult; | ||
| } | ||
|
|
||
| // Shared context used by Actor.pushData() and PatchedDatasetClient.pushItems(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better explain the purpose of the context (propagate eventName to pushItems calls, gather results of multiple pushItems calls that were triggered by a single pushData call)
mhamas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed the changes in person. In general they look good to me, but since I don't understand the specifics of SDK, I cannot give very solid review. Leaving a couple of extra comments.
| perEventPrices: Record<string, number>; | ||
| } | ||
|
|
||
| export function mergeChargeResults( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: this is good candidate for unit tests
| const { limitedItems, eventsToCharge } = | ||
| this.chargingManager.calculatePushDataLimits({ | ||
| items, | ||
| eventName: explicitEventName, | ||
| isDefaultDataset, | ||
| }); | ||
|
|
||
| if (limitedItems.length > 0) { | ||
| // Preserve original `Dataset.pushData()` call shape for single items. | ||
| await dataset.pushData( | ||
| Array.isArray(items) ? limitedItems : limitedItems[0], | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Looks duplicated to the intercepted client path. Can this be taken out and unified?
| * Helper to calculate how many items can be pushed within charging limits. | ||
| * Returns the limited items and count to charge. | ||
| */ | ||
| calculatePushDataLimits<T>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I'd add unit tests.
| * The objects must be serializable to JSON and the JSON representation of each object must be smaller than 9MB. | ||
| * @ignore | ||
| */ | ||
| async pushData(item: Data | Data[]): Promise<void>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not relevant anymore, the signature on L1100 reflects the return type better
barjin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed the design with @janbuchar offline, IMO we should still drop the local total and fetch it somehow from the API. Regarding the implementation, I found two talking points:
| if (explicitEventName !== undefined) { | ||
| return results[explicitEventName]; | ||
| } | ||
| } | ||
|
|
||
| return undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actor.pushData() will charge the synthetic event, but will return undefined here.
See e.g. the following code, run with ACTOR_MAX_TOTAL_CHARGE_USD=2:
import { Actor } from 'apify';
process.env.ACTOR_TEST_PAY_PER_EVENT = '1';
await Actor.init();
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
console.log(await Actor.pushData([{ 'a': 1 }, { 'b': 2 }, { 'c': 3 }])); // prints undefined
await Actor.exit();This will run all the calls, but will store only two results into the dataset. The developer has no way of returning early, as they won't get the charging manager response here.
| await Promise.all( | ||
| Object.entries(eventsToCharge).map( | ||
| async ([eventName, count]) => { | ||
| results[eventName] = await this.chargingManager.charge({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calculatePushDataLimits will contain the apify-default-dataset-item key (see here). Here we'll manually charge the synthetic event. Is this expected?
apify-default-dataset-itemevents locally #523