Add Meta Conversions API data source #34
Open
Implement a PySpark Custom Data Source to write event data to the Meta Conversions API (CAPI). This enables users to send server-side events directly to Meta for ad optimization and measurement.
Architecture
The implementation uses the PySpark Python Data Source API and provides a write-only data source (no reader is implemented).
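A minimal sketch of what such a write-only data source could look like, assuming PySpark 4.0 or later, where `pyspark.sql.datasource` is available. The class names `MetaCapiDataSource`, `MetaCapiWriter`, and `CapiCommitMessage` are hypothetical, and the writer only counts rows instead of calling the API. The `ImportError` fallback exists solely so the sketch can run without Spark installed.

```python
from dataclasses import dataclass

try:
    from pyspark.sql.datasource import (
        DataSource,
        DataSourceWriter,
        WriterCommitMessage,
    )
except ImportError:
    # Stand-ins so the sketch runs without PySpark 4.0+ installed.
    class DataSource:
        def __init__(self, options):
            self.options = options

    class DataSourceWriter:
        pass

    class WriterCommitMessage:
        pass


@dataclass
class CapiCommitMessage(WriterCommitMessage):
    """Hypothetical per-partition result reported back to the driver."""
    events_sent: int


class MetaCapiWriter(DataSourceWriter):
    def __init__(self, options):
        self.options = options

    def write(self, iterator):
        # Called once per partition on the executors.
        # Placeholder: a real writer would batch rows and POST them to CAPI.
        count = 0
        for _row in iterator:
            count += 1
        return CapiCommitMessage(events_sent=count)


class MetaCapiDataSource(DataSource):
    @classmethod
    def name(cls):
        # Short name used in df.write.format("meta_capi").
        return "meta_capi"

    def writer(self, schema, overwrite):
        # Only the write path is defined; reader() is left unimplemented.
        return MetaCapiWriter(self.options)
```

With a Spark 4.0 session, a class like this would typically be registered with `spark.dataSource.register(MetaCapiDataSource)` and then used through `df.write.format("meta_capi").option(...).mode("append").save()`.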
Components
meta_capi) and creating the writer.
Configuration Options
The data source will support the following options via .option():
- access_token (Required): Meta System User Access Token.
- pixel_id (Required): The Meta Pixel ID (Dataset ID).
- api_version (Optional): Graph API version (default: v19.0).
- batch_size (Optional): Number of events per API request (default: 1000, maximum: 1000).
Schema & Data Mapping
The data source expects the input DataFrame to contain columns that map to the Meta CAPI Event structure.
To improve usability, the writer will support two modes:
- Structured Mode: the DataFrame provides the nested structure directly (user_data struct column, custom_data struct column).
- Flat Mode: if the user_data struct is missing, the writer looks for flat columns with specific prefixes or names and constructs the nested structure:
  - email -> user_data.em (will apply SHA256 if not already hashed; nice to have)
  - phone -> user_data.ph
  - client_ip_address -> user_data.client_ip_address
  - event_name -> event_name
  - event_time -> event_time (converts timestamp to Unix integer)
  - value -> custom_data.value
  - currency -> custom_data.currency
Decision: For the initial implementation, we will prioritize Structured Mode correctness but add basic Flat Mode mapping for common fields (email, event_name, event_time, value, currency) to simplify the user experience.
API Details
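The event objects sent to the API come from the mapping described above. As a minimal sketch of the Flat Mode conversion, the helper names below (`sha256_if_needed`, `to_unix_seconds`, `flat_row_to_event`) are hypothetical. Rows are modelled as plain dicts. Hashing phone the same way as email, treating naive timestamps as UTC, and passing through an optional `action_source` column are assumptions of this sketch, not decisions stated in this PR.

```python
import hashlib
import re
from datetime import datetime, timezone

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def sha256_if_needed(value: str) -> str:
    """Trim and lowercase, then hash unless the value already looks like a SHA-256 digest."""
    normalized = value.strip().lower()
    if _SHA256_HEX.fullmatch(normalized):
        return normalized
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def to_unix_seconds(ts) -> int:
    """Convert a datetime (or an already numeric value) to Unix seconds."""
    if isinstance(ts, datetime):
        if ts.tzinfo is None:
            # Assumption: naive timestamps are interpreted as UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        return int(ts.timestamp())
    return int(ts)


def flat_row_to_event(row: dict) -> dict:
    """Build one nested CAPI event from flat columns."""
    user_data = {}
    if row.get("email"):
        user_data["em"] = [sha256_if_needed(row["email"])]
    if row.get("phone"):
        # Assumption: phone is hashed like email.
        user_data["ph"] = [sha256_if_needed(row["phone"])]
    if row.get("client_ip_address"):
        user_data["client_ip_address"] = row["client_ip_address"]

    custom_data = {}
    if row.get("value") is not None:
        custom_data["value"] = float(row["value"])
    if row.get("currency"):
        custom_data["currency"] = row["currency"]

    # event_name and event_time are treated as required by this sketch.
    event = {
        "event_name": row["event_name"],
        "event_time": to_unix_seconds(row["event_time"]),
    }
    if row.get("action_source"):
        # Assumption: an optional flat column carries action_source.
        event["action_source"] = row["action_source"]
    if user_data:
        event["user_data"] = user_data
    if custom_data:
        event["custom_data"] = custom_data
    return event
```

Checking for an existing 64-character hex digest keeps the helper idempotent, so rows that were hashed upstream are not hashed a second time.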
- Endpoint: https://graph.facebook.com/{api_version}/{pixel_id}/events
- Method: POST
- Headers: Content-Type: application/json
- Payload:
    {
      "access_token": "...",
      "data": [
        {
          "event_name": "Purchase",
          "event_time": 1698765432,
          "action_source": "website",
          "user_data": { "em": ["7b..."], "ph": ["..."] },
          "custom_data": { "currency": "USD", "value": 100.0 }
        }
      ]
    }
- Note: access_token can be sent either as a query parameter or in the request body. We will follow Meta's recommendation on which to use.
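The request details and configuration options can be combined into a small sketch using only the standard library. The function names (`resolve_options`, `batched`, `build_request`) are hypothetical, and placing `access_token` in the body simply follows the payload example above rather than a settled decision. The sketch builds the request but does not send it.

```python
import json
from itertools import islice
from urllib.request import Request

MAX_BATCH_SIZE = 1000  # maximum events per request, per the options above


def resolve_options(options: dict) -> dict:
    """Validate required options and apply the documented defaults."""
    for key in ("access_token", "pixel_id"):
        if not options.get(key):
            raise ValueError(f"missing required option: {key}")
    # Spark passes option values as strings, so convert explicitly.
    batch_size = int(options.get("batch_size", MAX_BATCH_SIZE))
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return {
        "access_token": options["access_token"],
        "pixel_id": options["pixel_id"],
        "api_version": options.get("api_version", "v19.0"),
        "batch_size": batch_size,
    }


def batched(events, size: int):
    """Yield lists of at most `size` events from any iterable."""
    iterator = iter(events)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_request(opts: dict, events: list) -> Request:
    """Build (but do not send) the POST request for one batch of events."""
    url = (
        f"https://graph.facebook.com/{opts['api_version']}"
        f"/{opts['pixel_id']}/events"
    )
    body = json.dumps(
        {"access_token": opts["access_token"], "data": events}
    ).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

A writer could loop over `batched(events, opts["batch_size"])` and pass each request to `urllib.request.urlopen`. Retry and error handling are left out here.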