Leaderboard

We showcase results of existing state-of-the-art algorithms evaluated on our preprocessed dataset.

Evaluation is done by running the evaluation script, e.g., python eval.py --data_dir data/coupled --causal_model pcmciplus. For the simple systems, the default setup refers to ODE dynamics (\(\delta = 0\)) with no unobserved confounder (585 graphs). For the coupled systems, the default setup refers to dynamics with \(n=10\) coupled ODEs (\(\delta = 0\)), no confounder, no time lag (\(\tau = 0\)), and no internal standardization (4745 graphs). Each experiment type varies the setting it is named after while keeping the others unchanged.

Table 1: Baseline AUROC (↑) / AUPRC (↑) scores across different experiments in the hierarchy of increasingly complex dynamical systems. The scores are averaged over all generated graphs within each experiment.

| System  | Experiment  | PCMCI+    | FPCMCI    | Varlingam | DYNOTEARS | NGC       | TSCI      | CUTS+     |
|---------|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Simple  | Default     | .47 / .68 | .51 / .70 | .50 / .69 | .43 / .67 | .50 / .68 | .46 / .68 | .48 / .68 |
| Simple  | Noise       | .50 / .69 | .52 / .70 | .53 / .69 | .48 / .68 | .50 / .68 | .49 / .68 | .50 / .68 |
| Simple  | Confounder  | .48 / .59 | .50 / .59 | .48 / .57 | .52 / .63 | .50 / .57 | .53 / .65 | .52 / .59 |
| Coupled | Default     | .68 / .27 | .67 / .24 | .57 / .18 | .66 / .32 | .50 / .16 | .69 / .36 | .50 / .16 |
| Coupled | Noise       | .66 / .30 | .62 / .25 | .58 / .19 | .62 / .29 | .50 / .16 | .52 / .18 | .50 / .16 |
| Coupled | Confounder  | .56 / .20 | .57 / .19 | .51 / .17 | .49 / .17 | .50 / .17 | .49 / .18 | .50 / .17 |
| Coupled | Lag         | .55 / .23 | .55 / .23 | .52 / .21 | .52 / .22 | .50 / .20 | .53 / .23 | .50 / .20 |
| Coupled | Standardize | .69 / .27 | .67 / .23 | .57 / .18 | .68 / .34 | .50 / .15 | .69 / .35 | .51 / .16 |
| Climate | Atmos-ocean | .69 / .88 | .50 / .81 | .50 / .81 | .62 / .86 | .49 / .81 | .58 / .84 | .50 / .81 |
| Climate | ENSO modes  | .50 / .81 | .51 / .81 | .51 / .81 | .50 / .81 | .50 / .81 | .50 / .81 | .49 / .81 |
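
To make the AUROC / AUPRC pairs in Table 1 easier to interpret, here is a minimal sketch of how per-graph scores could be computed and then averaged over graphs. It is not the benchmark's eval.py, whose internals are not shown here. The function names are hypothetical, and two things are assumptions: that each method outputs a continuous score per candidate edge, and that self-loops (diagonal entries) are excluded. AUPRC is computed as step-wise average precision.

```python
from statistics import mean


def auroc(labels, scores):
    """Probability that a true edge outscores a non-edge (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one edge and one non-edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one true edge")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all entries tied at this threshold before scoring.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def off_diagonal(matrix):
    """Flatten a square matrix, skipping self-loops (an assumption)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def score_graph(true_adj, pred_scores):
    """Return (AUROC, AUPRC) for one graph."""
    y = off_diagonal(true_adj)
    s = off_diagonal(pred_scores)
    return auroc(y, s), average_precision(y, s)


def leaderboard_entry(graphs):
    """Average per-graph (AUROC, AUPRC) over (true_adj, pred_scores) pairs."""
    per_graph = [score_graph(t, p) for t, p in graphs]
    return mean(a for a, _ in per_graph), mean(p for _, p in per_graph)
```

Under these definitions, a method that assigns the same score to every candidate edge gets an AUROC of 0.5 and an AUPRC equal to the fraction of candidate edges that are true. A table entry with AUROC near .50 and an AUPRC shared by most methods in the same row is therefore consistent with chance-level performance, though the table alone does not establish this.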