AliceO2Group · nicolaspoffley · Mar 10, 2025
diff --git a/docs/faq/README.md b/docs/faq/README.md
@@ -34,3 +34,7 @@
 ### What is the job error ERROR_EW?
 
 Hyperloop trains have a so-called express train feature. It is based on the fact that the last few percent of jobs usually take the longest (not in execution time, but in the time it takes to schedule them on a site), so a train can take twice as long in total just to process those last few percent. Therefore, up to 2% of the jobs are removed from the queue so that your train can finish. These jobs are marked with ERROR_EW in the job overview. If you want the maximal statistics and do not mind that your train will be slow, you can ask the operators for a "slow train" submission.
+
+### Why does my train test have a CPU warning when my wagon test was fine?
+
+This usually happens when the wagon test (which runs on a single core) uses so much memory that it does not fit into a single-core job on the grid, so the train needs two cores (more cores means a higher memory allowance). If the devices in the wagon cannot be parallelised well over multiple cores, the cores are underutilised, which leads to more wall time and a higher CPU usage. In this situation, one can either reduce the wagon memory consumption so that it fits into a single core, or reduce the CPU consumption so that it stays within the CPU limit of the dataset.
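
The effect described in this answer can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not how Hyperloop computes its warnings: it assumes that the CPU usage of a job is accounted as reserved cores times wall time, and the per-core memory allowance, the memory and timing numbers, and the function names are all hypothetical.

```python
import math


def cores_needed(memory_gb: float, allowance_per_core_gb: float) -> int:
    """Smallest number of cores whose combined memory allowance fits the job.

    Assumption: the memory allowance grows linearly with the number of cores.
    """
    return max(1, math.ceil(memory_gb / allowance_per_core_gb))


def charged_cpu_time(single_core_time_s: float, cores: int, efficiency: float) -> float:
    """CPU time charged to a job, assumed to be reserved cores * wall time.

    `efficiency` is the parallel efficiency in (0, 1]: 1.0 means perfect
    scaling, lower values mean the extra cores are partly idle.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    wall_time_s = single_core_time_s / (cores * efficiency)
    return cores * wall_time_s


# Hypothetical numbers: 2 GB allowance per core, wagon test uses 3 GB.
cores = cores_needed(memory_gb=3.0, allowance_per_core_gb=2.0)

# Wagon test: 1000 s on one core. Train: same work on `cores` cores at 60% efficiency.
wagon_cpu_s = charged_cpu_time(1000.0, cores=1, efficiency=1.0)
train_cpu_s = charged_cpu_time(1000.0, cores=cores, efficiency=0.6)

print(f"cores: {cores}, wagon CPU: {wagon_cpu_s:.0f} s, train CPU: {train_cpu_s:.0f} s")
```

In this model the charged CPU time depends only on the parallel efficiency once a second core is required, which is why the two remedies are either to get back under the single-core memory allowance or to lower the CPU consumption itself.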
diff --git a/docs/hyperloop/operatordocumentation.md b/docs/hyperloop/operatordocumentation.md
@@ -57,6 +57,18 @@ There are a number of settings that you can decide on when composing a train:
 
 * The train will be automatically tested, and its progress can be followed in the _Train Runs_ table, or in the [**Train Runs**](#train-runs) page by clicking on the TRAIN_ID link.
 
+
+### <a name="wagonscheduling"></a>Scheduling of derived data wagons
+
+* Wagons with derived data can be scheduled by operators to be automatically composed at the next composition schedule.
+* This is supported for standard and linked derived data wagons on any dataset with a composition schedule.
+* Multiple standard derived data wagons can be combined into one train automatically by Hyperloop, but linked derived data wagons are run separately.
+* Operators can simply choose to enable or disable the automatic *submission* and *slow train* options. The schedule is automatically determined by Hyperloop (the next scheduled slot in the dataset is used).
+
+<div align="center">
+<img src="../images/scheduledWagon.png" width="40%">
+</div>
+
 ### <a name="stagedsubmission"></a>Staged Submission
 
 * Short datasets are subsets of a big dataset

diff --git a/docs/hyperloop/userdocumentation.md b/docs/hyperloop/userdocumentation.md
@@ -7,7 +7,7 @@

 When opening a page in Hyperloop which has not been visited before, a guided tour will explain key concepts. These tours provide an interactive learning experience for Hyperloop, easily activated with a single click. They are ideal for beginners and for refreshing knowledge.

 Where appropriate, when one tour ends, the next will begin to explain the next section of Hyperloop. Tours can be exited at any time. Once closed, they will not automatically begin on future page visits. 

 <div align="center">
 <img src="../images/JoyrideWelcome.png" width="35%">
@@ -85,7 +85,7 @@
 <div align="center">
 <img src="../images/wagonShortcuts.png" width="80%">
 </div>

 ## <a name="wagon-settings"></a> Wagon Settings

 * <a name="wagonsettings"></a>In _Wagon settings_ you can modify the wagon name, work flow name, and select wagon's dependencies. The dependencies offered are wagons from the same _Analysis_ or from [_Service wagons_](#servicewagons).
@@ -93,7 +93,7 @@
 <div align="center">
 <img src="../images/wagonSettings.png" width="70%">
 </div>

 ## <a name="wagon-configuration"></a> Wagon Configuration

 * <a name="wagonconfiguration"></a>In _Configuration_ the wagon configuration corresponding to the workflow will be available in the _Base_. The configuration is divided per _Task_, hence if you need to add a new parameter, you will need to add it in the following order: task, parameter and value.
@@ -120,15 +120,16 @@

 * In order to update the base and subwagon configuration with the latest version of the workflow, click on the button `↻ sync` in _Configuration_. By synchronizing the configuration, the parameters which no longer belong to the workflow will be removed, and the values of the wagon's _Base_ will be updated as well if they have not been modified by the user.

 ## <a name="wagon-derived-data"></a> Derived data 

 * <a name="wagonderived"></a>In _Derived Data_ the tables which are produced by the task are displayed. If activated, these are saved to the output if the train is run as a derived data production. The produced derived data can be made available by the operators and serve as input for subsequent trains. 
 
 ### <a name="deriveddatatypes"></a> Derived data types
-* At the moment, there are two types of derived data specifications:
+* There are three types of derived data specifications:
   * Standard derived data (marked with 🗂️) - if the wagon is used in a train, this will produce derived data to be used for further analysis. The results will not be merged across runs and can be used as input for future train runs. Note that standard derived data trains do not submit automatically and may need additional approval. If in doubt, please seek advice before enabling derived data tables in your wagon configuration.
   * Slim derived data (marked with green bordered 🗂️) - similarly to the standard derived data case, if used in a train, this will produce derived data to be used for further analysis. This is reserved for derived data of small output size. The results will be merged across runs and are not available to use in future train runs. The data will be automatically deleted after a preset period of time. You can mark a wagon for running as slim derived data by checking `Ready for slim derived data`.
+  * Linked derived data (marked with red bordered 🗂️) - linked derived data trains will also produce derived data to be used for further analysis. Linked derived data has access to the parent AO2D - this is not the case for other derived data types. Like standard derived data, results are not merged across runs.
 
 * For wagons set as ready for slim derived data, two more fields need to be correctly set:
   * Max DF size - This sets the maximal dataframe size in the merging step. Has to be 0 for not-self contained derived data (which need parent file access).
  * Max derived file size - Sets the size limit for the output file size of the derived data file. This is an expert parameter which usually does not have to be changed. Only change this value if the processing in subsequent trains takes so long that the jobs fail. If set to 0 a good value will be automatically determined.
@@ -143,7 +144,7 @@
 <img src="../images/derivedDataEx.png" width="70%">
 </div>

 ## <a name="wagon-test-statistics"></a> Test Statistics 

 * <a name="wagonteststatistics"></a>_Test Statistics_ contains three graphs that display different metrics following the tests this wagon was part of. The first graph plots the _PSS Memory_ corresponding to each test run. The second one displays the _CPU Time_, _Wall time_ and _Throughput_ along the test runs for this wagon. Finally, the third graph shows the _Output size_ at each test run.

@@ -292,7 +293,7 @@
 </div>

 * If you only want to see the top 10 graph with the highest average, check the Show top 10 largest box.

 * To produce this type of performance graphs for a local O2 execution, follow the instructions [here](#producing-performance-graphs-for-a-local-o2-execution).

 * Whenever a wagon configuration is changed, if there are enabled wagons (including wagons that depend on it), then the test is automatically reset and a new test is launched. However, if the enabled wagon was already composed in a train, the train will run with the wagons and dataset configuration of the time at which the train was created.
@@ -337,6 +338,7 @@
   </div>
 
  * The CPU usage limit is set per dataset and all trains running on a specific dataset must respect this constraint. If the limit is not respected, the train cannot be composed without PWG approval. Therefore, the user should discuss the details and requirements for this train with the PWG before requesting again. Depending on the amount of total resources, an approval in the Physics Board (PB) may also be needed. The CPU limit of a dataset may be viewed on the dataset page.
+ * A train can have a CPU warning when composed even though the wagon test did not have one. This usually happens when the wagon test (which runs on a single core) uses so much memory that it does not fit into a single-core job on the grid, so the train needs two cores (more cores means a higher memory allowance). If the devices in the wagon cannot be parallelised well over multiple cores, the cores are underutilised, which leads to more wall time and a higher CPU usage. In this situation, one can either reduce the wagon memory consumption so that it fits into a single core, or reduce the CPU consumption so that it stays within the CPU limit of the dataset.
 
 ### 4. <a name="warning-ccdb"></a> Too many CCDB calls
 

diff --git a/docs/images/scheduledWagon.png b/docs/images/scheduledWagon.png