<!DOCTYPE html>
<html class="writer-html5" lang="English" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Metrics: Compute score for domain adaptation problems — SKADA: Scikit Adaptation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=b86133f3" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=e59714d7" />
<link rel="stylesheet" type="text/css" href="_static/graphviz.css?v=4ae1632d" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery.css?v=d2d258e8" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-binder.css?v=f4aeca0c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-dataframe.css?v=2082cf3c" />
<link rel="stylesheet" type="text/css" href="_static/sg_gallery-rendered-html.css?v=1277b6f3" />
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5e5be09d"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
SKADA
<img src="_static/skada_logo_full_white.svg" class="logo" alt="Logo"/>
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<ul>
<li class="toctree-l1"><a class="reference internal" href="index.html">SKADA: SciKit Adaptation</a></li>
<li class="toctree-l1"><a class="reference internal" href="auto_examples/plot_how_to_use_skada.html">How to use SKADA</a></li>
<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Users Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="all.html">API and modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="auto_examples/index.html">Examples gallery</a></li>
<li class="toctree-l1"><a class="reference internal" href="releases.html">Release of SKADA</a></li>
<li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing to SKADA</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">SKADA</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">Metrics: Compute score for domain adaptation problems</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/scorer.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="metrics-compute-score-for-domain-adaptation-problems">
<span id="scorer"></span><h1>Metrics: Compute score for domain adaptation problems<a class="headerlink" href="#metrics-compute-score-for-domain-adaptation-problems" title="Link to this heading"></a></h1>
<p>To evaluate an estimator or to select its best parameters, you need to define a score.
In <a class="reference external" href="https://scikit-learn.org/">sklearn</a>, several functions and objects use the
scoring API, such as <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score">cross_val_score</a>
or <a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV">GridSearchCV</a>.
To avoid overfitting, these methods split the initial data into a training set and a test set.
The training set is used to fit the estimator and the test set is used to compute the score.</p>
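<p>The split-then-score procedure described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: <code class="docutils literal notranslate"><span class="pre">shuffle_split</span></code> and <code class="docutils literal notranslate"><span class="pre">holdout_score</span></code> are hypothetical helper names, and the <code class="docutils literal notranslate"><span class="pre">fit</span></code> and <code class="docutils literal notranslate"><span class="pre">score</span></code> arguments stand in for an estimator's fitting and scoring steps.</p>

```python
import random


def shuffle_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for one random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(test_size * n_samples)
    # The first n_test shuffled indices are held out for scoring.
    return indices[n_test:], indices[:n_test]


def holdout_score(fit, score, X, y, test_size=0.3, seed=0):
    """Fit on the training part, then score on the held-out part only."""
    train_idx, test_idx = shuffle_split(len(X), test_size, seed)
    model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    return score(model, [X[i] for i in test_idx], [y[i] for i in test_idx])
```

<p>Repeating this over several random splits and collecting the scores is, in spirit, what <code class="docutils literal notranslate"><span class="pre">cross_val_score</span></code> does when given a <code class="docutils literal notranslate"><span class="pre">ShuffleSplit</span></code> cross-validator.</p>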
<p>In domain adaptation (DA) problems, the source data and the target data are drawn from different distributions: there is a shift between the two domains.</p>
<p>Let's load a DA dataset:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">skada.datasets</span><span class="w"> </span><span class="kn">import</span> <span class="n">make_shifted_datasets</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">skada</span><span class="w"> </span><span class="kn">import</span> <span class="n">EntropicOTmapping</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">skada.metrics</span><span class="w"> </span><span class="kn">import</span> <span class="n">TargetAccuracyScorer</span>
<span class="gp">>>> </span><span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">0</span>
<span class="gp">>>> </span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">X_target</span><span class="p">,</span> <span class="n">y_target</span> <span class="o">=</span> <span class="n">make_shifted_datasets</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">n_samples_source</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">n_samples_target</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">shift</span><span class="o">=</span><span class="s2">"covariate_shift"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">label</span><span class="o">=</span><span class="s2">"binary"</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">noise</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">random_state</span><span class="o">=</span><span class="n">RANDOM_SEED</span><span class="p">,</span>
<span class="gp">... </span><span class="p">)</span>
</pre></div>
</div>
<p>Now let's define a DA estimator to evaluate on this data:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">skada</span><span class="w"> </span><span class="kn">import</span> <span class="n">DensityReweight</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.linear_model</span><span class="w"> </span><span class="kn">import</span> <span class="n">LogisticRegression</span>
<span class="gp">>>> </span><span class="n">base_estimator</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">estimator</span> <span class="o">=</span> <span class="n">DensityReweight</span><span class="p">(</span><span class="n">base_estimator</span><span class="o">=</span><span class="n">base_estimator</span><span class="p">)</span>
</pre></div>
</div>
<p>Because of the distribution shift between the two domains, a score computed
on held-out source samples, as shown in the image below, is likely not to
reflect the score on the target domain.</p>
<a class="reference internal image-reference" href="_images/source_only_scorer.png"><img alt="Source Only Scorer" class="align-center" src="_images/source_only_scorer.png" style="width: 400px; height: 240px;" />
</a>
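<p>A toy one-dimensional example can illustrate this gap. The sketch below assumes a simple threshold classifier and a hand-made covariate shift (same labelling rule, different input ranges). All names and values are hypothetical and unrelated to the dataset loaded above.</p>

```python
def true_label(x):
    # Same labelling rule in both domains: only the inputs shift.
    return int(x >= 1.0 and 3.0 >= x)


def accuracy(threshold, xs, ys):
    # Threshold classifier: predict 1 when x >= threshold.
    hits = sum(int(int(x >= threshold) == y) for x, y in zip(xs, ys))
    return hits / len(xs)


# Source inputs cover [0, 1.75]; target inputs cover [2, 3.75].
x_source = [i / 4 for i in range(8)]
x_target = [2 + i / 4 for i in range(8)]
y_source = [true_label(x) for x in x_source]
y_target = [true_label(x) for x in x_target]

# "Fit" on source: keep the candidate threshold with the best source accuracy.
threshold = max(x_source, key=lambda t: accuracy(t, x_source, y_source))

source_accuracy = accuracy(threshold, x_source, y_source)
target_accuracy = accuracy(threshold, x_target, y_target)
print(source_accuracy, target_accuracy)  # prints 1.0 0.625
```

<p>In this sketch the source score overestimates the target score, but the mismatch can go in either direction depending on the data.</p>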
<p>To evaluate the estimator on the source data, one can use:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.model_selection</span><span class="w"> </span><span class="kn">import</span> <span class="n">cross_val_score</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.model_selection</span><span class="w"> </span><span class="kn">import</span> <span class="n">ShuffleSplit</span>
<span class="gp">>>> </span><span class="n">cv</span> <span class="o">=</span> <span class="n">ShuffleSplit</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">cross_val_score</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">estimator</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">X</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">y</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">fit_params</span><span class="o">=</span><span class="p">{</span><span class="s1">'X_target'</span><span class="p">:</span> <span class="n">X_target</span><span class="p">},</span>
<span class="gp">... </span> <span class="n">scoring</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="gp">... </span><span class="p">)</span>
<span class="go">array([0.72222222, 0.83333333, 0.81944444])</span>
</pre></div>
</div>
<p>skada provides a way to run the evaluation on the target data while
reusing the scikit-learn methods and scoring API.</p>
<p>Several scorers are available. To start, we will use <a class="reference internal" href="gen_modules/skada.metrics.SupervisedScorer.html#skada.metrics.SupervisedScorer" title="skada.metrics.SupervisedScorer"><code class="xref py py-class docutils literal notranslate"><span class="pre">skada.metrics.SupervisedScorer</span></code></a>,
which computes the score on the target domain:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">skada.metrics</span><span class="w"> </span><span class="kn">import</span> <span class="n">SupervisedScorer</span>
<span class="gp">>>> </span><span class="n">cv</span> <span class="o">=</span> <span class="n">ShuffleSplit</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">cross_val_score</span><span class="p">(</span>
<span class="gp">... </span> <span class="n">estimator</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">X</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">y</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">cv</span><span class="o">=</span><span class="n">cv</span><span class="p">,</span>
<span class="gp">... </span> <span class="n">fit_params</span><span class="o">=</span><span class="p">{</span><span class="s1">'X_target'</span><span class="p">:</span> <span class="n">X_target</span><span class="p">},</span>
<span class="gp">... </span> <span class="n">scoring</span><span class="o">=</span><span class="n">SupervisedScorer</span><span class="p">(</span><span class="n">X_target</span><span class="p">,</span> <span class="n">y_target</span><span class="p">),</span>
<span class="gp">... </span><span class="p">)</span>
<span class="go">array([0.975 , 0.95625, 0.95625])</span>
</pre></div>
</div>
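<p>As a rough sketch of how a target-domain scorer could plug into the scikit-learn scoring API, recall that a scorer can be any callable taking <code class="docutils literal notranslate"><span class="pre">(estimator,</span> <span class="pre">X,</span> <span class="pre">y)</span></code>. The classes <code class="docutils literal notranslate"><span class="pre">TargetScorerSketch</span></code> and <code class="docutils literal notranslate"><span class="pre">ThresholdModel</span></code> below are hypothetical names used for illustration only. This is not skada's implementation of <code class="docutils literal notranslate"><span class="pre">SupervisedScorer</span></code>.</p>

```python
class TargetScorerSketch:
    """Hypothetical scorer: ignores the CV test fold, scores on target data."""

    def __init__(self, X_target, y_target):
        self.X_target = X_target
        self.y_target = y_target

    def __call__(self, estimator, X_test, y_test):
        # X_test and y_test are the held-out source fold passed in by the
        # cross-validation loop; they are deliberately unused here.
        return estimator.score(self.X_target, self.y_target)


class ThresholdModel:
    """Toy stand-in for a fitted estimator: predicts 1 when x >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(x >= self.threshold) for x in X]

    def score(self, X, y):
        predictions = self.predict(X)
        return sum(int(p == t) for p, t in zip(predictions, y)) / len(y)


model = ThresholdModel(threshold=1.0)
scorer = TargetScorerSketch(X_target=[2.0, 2.5, 3.5, 4.0], y_target=[1, 1, 0, 0])

source_fold_score = model.score([0.0, 1.5], [0, 1])
target_score = scorer(model, [0.0, 1.5], [0, 1])
print(source_fold_score, target_score)  # prints 1.0 0.5
```

<p>The design choice in this sketch is that the target data is bound at construction time, so the cross-validation loop can keep splitting the source data as usual while every fold is scored on the same target samples.</p>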
</section>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>© Copyright 2023, The SKADA team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" data-toggle="rst-versions" role="note"
aria-label="versions">
<!-- add shift_up to the class for force viewing ,
data-toggle="rst-current-version" -->
<span class="rst-current-version" style="margin-bottom:1mm;">
<span class="fa fa-book"> SKADA: SciKit-ADAptation</span> 0.5.0
<hr style="margin-bottom:1.5mm;margin-top:5mm;">
<!-- versions
<span class="fa fa-caret-down"></span>-->
<span class="rst-current-version" style="display: inline-block;padding:
0px;color:#fcfcfcab;float:left;font-size: 100%;">
Versions:
<a href="https://scikit-adaptation.github.io/"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Release</a>
<a href="https://scikit-adaptation.github.io/dev"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Development</a>
<a href="https://github.com/scikit-adaptation/skada"
style="padding: 3px;color:#fcfcfc;font-size: 100%;">Code</a>
</span>
</span>
</div><script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>