apache · zhengruifeng · Feb 10, 2026 · Feb 11, 2026
diff --git a/docs/_includes/nav-left-wrapper-ml.html b/docs/_includes/nav-left-wrapper-ml.html
@@ -4,5 +4,6 @@ <h3><a href="ml-guide.html">MLlib: Main Guide</a></h3>
         {% include nav-left.html nav=include.nav-ml %}
         <h3><a href="mllib-guide.html">MLlib: RDD-based API Guide</a></h3>
         {% include nav-left.html nav=include.nav-mllib %}
+        <h3><a href="ml-security.html">ML Model Security</a></h3>
     </div>
 </div>
diff --git a/docs/ml-security.md b/docs/ml-security.md
@@ -0,0 +1,74 @@
+---
+layout: global
+title: "ML Model Security"
+displayTitle: "Spark ML Model Security"
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Overview
+
+In Apache Spark, loading a machine learning (ML) model is fundamentally equivalent to loading and executing code.
+Spark ML models often contain serialized objects, transformation logic, and execution graphs that are evaluated by the Spark runtime
+during model loading and inference.
+This principle is not unique to Spark; it applies equally to scikit-learn, PyTorch, TensorFlow, and other modern ML ecosystems.
+As a result, loading a model from an untrusted source introduces the same security risks as executing untrusted software.
+
+# Why Loading an ML Model Is Equivalent to Loading Code
+
+Spark ML model persistence can serialize not only data (such as weights and parameters) but also executable structures and behaviors.
+Because of this, model loading is not merely data parsing. It involves interpreting and executing instructions, which means a malicious model can:
+
+* Execute arbitrary commands
+* Access or exfiltrate data
+* Modify system state
+* Install backdoors or malware
+
+In practice, loading a model from an untrusted source is equivalent to running a program downloaded from the internet.
+
+# Security Implications
+
+Because Spark ML models can embed executable logic, loading untrusted models can lead to:
+
+* Remote code execution (RCE)
+* Data exfiltration from Spark jobs
+* Compromise of cluster nodes
+* Privilege escalation within Spark environments
+* Supply-chain attacks through model distribution
+
+These risks are amplified in distributed environments, where a malicious model may execute across multiple cluster nodes.
+
+# Responsibility of End Users
+
+Because loading ML models is equivalent to loading executable code, the responsibility for security ultimately lies with the end user or deploying organization.
+End users are responsible for ensuring that ML models are subject to the same security assessment, validation, and operational controls as any third-party software.
+This includes:
+
+* Verifying the source and authenticity of the model
+* Confirming the model's integrity and provenance
+* Applying organizational security policies
+* Performing risk assessments before deployment
+
+Frameworks and libraries can provide safeguards, but they cannot guarantee security when loading arbitrary third-party models.
+
+# Best Practices
+
+* Load models only from trusted and verified sources
+* Validate cryptographic hashes or digital signatures of model artifacts before loading them
+* Execute models in isolated environments
+* Restrict filesystem, network, and credential access
+* Keep Spark, ML libraries, and dependencies fully patched
+
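The hash-validation practice listed above could be sketched roughly as follows. This is a minimal illustration, not part of Spark: it assumes the saved model is a directory on the local filesystem, and that a known-good SHA-256 digest was published through a separate trusted channel. The helper names `directory_sha256` and `verify_model_dir` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def directory_sha256(model_dir: str) -> str:
    """Return one SHA-256 hex digest covering every file under model_dir.

    Files are visited in sorted relative-path order, and each file's path
    and size are hashed along with its bytes, so adding, removing,
    renaming, or editing any file changes the digest.
    """
    root = Path(model_dir)
    digest = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort on the POSIX-style relative path so the order is deterministic.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(path.stat().st_size.to_bytes(8, "big"))
        with path.open("rb") as fh:
            # Read in 1 MiB chunks to avoid loading large files into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir: str, expected_hex: str) -> None:
    """Raise ValueError unless model_dir matches the expected digest."""
    actual = directory_sha256(model_dir)
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"Model digest mismatch for {model_dir}")
```

A caller would run `verify_model_dir(path, expected_digest)` and only then hand `path` to a loader such as `PipelineModel.load(path)`. A matching digest shows only that the artifact is the one the publisher hashed. It does not show that the model is safe, so the other practices in the list still apply.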