Skip to content

Should marshmallow v4 set required names for hooks? #1841

@sirosen

Description

@sirosen

I've been thinking about #600 . One of the primary use-cases for work like #600 is to be able to subclass a schema and append or prepend transformations to post_load, pre_load, post_dump, and pre_dump.

So, here is my proposal for how to handle this:

  • marshmallow.decorators.set_hook takes a string argument, hook_name
  • if the decorated method's name does not match hook_name, that is an error

Then normal inheritance rules come into play. This means that you can call a superclass hook with super() and pass args.

Some additional ideas/consequences based on playing around with this:

  • Hook names are decorator names with a suffixed _h (for "hook") or _hook so that post_load (the decorator) and post_load_h cannot conflict within a class. This is ugly, but avoids @post_load changing in meaning within a class body, which could be confusing.
  • Hook names include many so that e.g. post_load_h and post_load_many_h may be written on the same schema class.
  • Field validation hooks are named by field, as in validates_field_{fieldname}_h
  • If the base class hook sets pass_original=False, a child hook can set pass_original=True and then discard original data in the super call, but if pass_original=True in the base class, child class hooks must also set it (seems like a fine constraint to me?)

Usage demo:

class A(ma.Schema):
    foo = ma.fields.Str()

    @ma.post_load
    def post_load_h(self, data, **kwargs):
        return {"bar": 1, **data}

class B(A):
    @ma.post_load
    def post_load_h(self, data, **kwargs):
        d = super().post_load_h(data, **kwargs)
        del d["foo"]
        return d

class C(B):
    @ma.post_load
    def post_load_h(self, data, **kwargs):
        data["save"] = dict(data)
        return super().post_load_h(data, **kwargs)

print(A().load({"foo": "baz"}))  # {'bar': 1, 'foo': 'baz'}
print(B().load({"foo": "baz"}))  # {'bar': 1}
print(C().load({"foo": "baz"}))  # {'bar': 1, 'save': {'foo': 'baz'}}

Initial implementation is primarily:

--- a/src/marshmallow/decorators.py
+++ b/src/marshmallow/decorators.py
@@ -195,3 +218,7 @@ def post_load(
 def set_hook(
-    fn: Optional[Callable[..., Any]], key: Union[Tuple[str, bool], str], **kwargs: Any
+    fn: Optional[Callable[..., Any]],
+    *,
+    hook_name: str,
+    key: Union[Tuple[str, bool], str],
+    **kwargs: Any
 ) -> Callable[..., Any]:
@@ -209,3 +236,10 @@ def set_hook(
     if fn is None:
-        return functools.partial(set_hook, key=key, **kwargs)
+        return functools.partial(set_hook, hook_name=hook_name, key=key, **kwargs)
+
+    hook_name = f"{hook_name}_h"
+    if fn.__name__ != hook_name:
+        raise ValueError(
+            f"{hook_name} hook had incorrect name: {fn.__name__}. "
+            f"Rename it to {hook_name} to resolve."
+        )

I'd like to know if this is a good path to pursue before trying to fixup the whole testsuite.
The ValueError could be a warning in v3, and a lot of the desired value would be delivered immediately because it would push people to make their hook names consistent.

Another question is whether or not Schema itself should implement (stub/no-op) versions of these methods. That might be important for mixin schema use-cases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions