Conversation

@Devake Devake commented Dec 3, 2025

Add parallelize class method to MaintenanceTasks::Task

Summary

This PR adds a parallelize class method to MaintenanceTasks::Task that enables parallel processing of batch items using threads. This provides a cleaner, more Rails-like API compared to including a concern.

Usage

class Maintenance::UpdateUsersTask < MaintenanceTasks::Task
  parallelize

  def collection
    User.where(status: 'pending').in_batches(of: 10)
  end

  def process_item(user)
    # Called in parallel (10 concurrent threads per batch)
    user.update!(status: 'processed')
  end
end

Changes

  • Added parallelized class attribute to track parallel processing mode
  • Added parallelize class method to enable parallel processing
  • Added parallelized? class and instance methods
  • Added process_item instance method placeholder
  • Modified process to route to parallel execution when enabled
  • Added ParallelExecutor for thread-safe parallel item processing
  • Added comprehensive test coverage

Notes

  • Cursor granularity: The cursor tracks batches, not individual items. If the task is interrupted mid-batch, the whole batch is reprocessed on resume, so process_item must be idempotent (see the sketch after these notes).
  • Thread safety: process_item must be thread-safe. Avoid shared mutable state.
  • Error handling: If any thread raises an exception, the entire batch fails and the first exception is propagated.
  • Progress tracking: Progress is tracked per batch, not per item.
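
For the idempotency note above, a minimal sketch reusing the status column from the usage example (not part of this PR):

def process_item(user)
  # Skip items already handled on a previous run, so reprocessing an
  # interrupted batch is a no-op for items that completed.
  return if user.status == 'processed'

  user.update!(status: 'processed')
end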

@Devake Devake self-assigned this Dec 3, 2025
Contributor

@adrianna-chang-shopify adrianna-chang-shopify left a comment

I think this feature could make sense, but I do think we'll need to make it a bit safer to use so that we don't risk creating a ton of threads or exhausting AR's connection pool. Left some thoughts inline!

exceptions = []
exception_mutex = Mutex.new

threads = items.map do |item|
Contributor

Batches can be of arbitrary size, e.g. 1000+ items. There are risks of performance degradation / system instability in generating an unbounded number of threads. Should we implement some sort of thread pool with a configurable size?
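
A rough sketch of what a bounded pool could look like; the Queue-based fan-out and the max_threads default are assumptions for illustration, not code from this PR:

class ParallelExecutor
  class << self
    # Process items with a fixed number of worker threads instead of
    # one thread per item.
    def execute(items, max_threads: 5)
      queue = Queue.new
      items.each { |item| queue << item }

      exceptions = []
      mutex = Mutex.new

      workers = [max_threads, items.size].min.times.map do
        Thread.new do
          # One connection per worker, so a batch never holds more
          # than max_threads connections at once.
          ActiveRecord::Base.connection_pool.with_connection do
            loop do
              item = begin
                queue.pop(true) # non-blocking; raises once drained
              rescue ThreadError
                break
              end

              begin
                yield item
              rescue => error
                mutex.synchronize { exceptions << { item: item, error: error } }
              end
            end
          end
        end
      end

      workers.each(&:join)
      exceptions
    end
  end
end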

Contributor

(We may also want to coordinate with Rails' connection pool size, which defaults to 5 connections)
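
For reference, the configured pool size is available at runtime, so a cap could be derived from it (requested_threads is a hypothetical value here):

# Never run more workers than the AR pool can serve (5 by default).
pool_size = ActiveRecord::Base.connection_pool.size
worker_count = [requested_threads, pool_size].min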

Contributor

Another idea is to make the thread count part of the API, i.e. parallelize(threads: 5). I don't think we should tie the thread count to the batch size, though.
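
One way that option could be wired up, assuming the class_attribute approach the PR already uses for parallelized (parallelize_thread_count is a made-up name):

class_attribute :parallelize_thread_count, default: 5

class << self
  def parallelize(threads: 5)
    self.parallelized = true
    self.parallelize_thread_count = threads
  end
end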

Contributor

+1, I don't want to allow people to spawn an unbounded number of threads if they just follow the conventions, which for in_batches default to 1000 elements per batch.

# implement an override for this method.
def process(_item)
  raise NoMethodError, "#{self.class.name} must implement `process`."
end

def process_item(_item)
Contributor

I'm not sure exactly what to name this, but I think we need an API that's more distinct from #process and indicates that it's for parallel processing within a batch. Maybe #process_for_batch?

items = batch.respond_to?(:to_a) ? batch.to_a : Array(batch)

# Execute items in parallel, storing errored item for context
ParallelExecutor.execute(items) do |item|
Contributor

I feel like we could return the exceptions array ([{ item: <item>, error: <error> }]) directly from .execute instead of raising the error. This would simplify things a lot, i.e.:

class ParallelExecutor
  class << self
    def execute(items, &block)
      ...

      threads = items.map do |item|
        Thread.new do
          ActiveRecord::Base.connection_pool.with_connection do
            block.call(item)
          rescue => error
            exception_mutex.synchronize do
              exceptions << { item: item, error: error }
            end
          end
        end
      end

      threads.each(&:join)

      exceptions
    end
    ...

And then here:

exceptions = ParallelExecutor.execute(items) do |item|
  process_item(item)
end

if exceptions.any?
  @errored_element = exceptions.first[:item]
  raise exceptions.first[:error]
end

@etiennebarrie
Member

I think it's a bad idea: we already have a unit of work, the job, and we don't have to handle it ourselves, the queue does. It's not something we should get into, IMO.
