Leaderboard


We report results of existing state-of-the-art causal discovery algorithms evaluated on our preprocessed dataset.

Evaluation is done by running the eval.py script, e.g., python eval.py --data_dir data/coupled --causal_model pcmciplus. In the simple case, the default setup consists of ODE systems (\(\delta = 0\)) with no unobserved confounder (585 graphs). In the coupled system, the default setup consists of dynamics with \(n=10\) coupled ODEs (\(\delta = 0\)), no confounder, no time lag (\(\tau = 0\)), and no internal standardization (4745 graphs). Each experiment type (row of Table 1) changes the corresponding setting while keeping all others at their default values.

Table 1: Baseline AUROC (↑) / AUPRC (↑) scores across different experiments in the hierarchy of increasingly complex dynamical systems. The scores are averaged over all generated graphs within each experiment.
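The caption describes scoring each graph separately and then averaging. The sketch below is a minimal, self-contained Python illustration of that procedure, not the repository's eval.py. It assumes each ground-truth graph is an n x n binary adjacency matrix, each method returns an n x n matrix of edge scores, and self-loops (the diagonal) are excluded before scoring; all three are assumptions, and the actual script may, for instance, score lagged adjacency tensors instead. AUROC is computed as the probability that a true edge outscores a non-edge, and AUPRC as step-wise average precision (the definition used by scikit-learn's average_precision_score); whether the leaderboard uses this AUPRC variant or a trapezoidal one is also an assumption. The helper names are hypothetical.

```python
from statistics import mean


def auroc(labels, scores):
    """Probability that a random true edge outscores a random non-edge (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one edge and one non-edge")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """Step-wise average precision: sum of (recall gain) * precision over thresholds."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AUPRC needs at least one true edge")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    # Visit distinct thresholds from the highest score down; tied scores share one threshold.
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def flatten_offdiag(matrix):
    """Flatten an n x n matrix row by row, skipping self-loops (assumed excluded)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def average_scores(graphs):
    """graphs: iterable of (true_adjacency, predicted_scores) pairs; returns mean (AUROC, AUPRC)."""
    rocs, prcs = [], []
    for true_adj, pred in graphs:
        y, s = flatten_offdiag(true_adj), flatten_offdiag(pred)
        rocs.append(auroc(y, s))
        prcs.append(auprc(y, s))
    return mean(rocs), mean(prcs)


if __name__ == "__main__":
    truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]       # edges 0->1 and 1->2
    good = [[0, .9, .1], [.1, 0, .9], [.1, .1, 0]]  # ranks both true edges first
    flat = [[.5] * 3 for _ in range(3)]             # constant scores: no signal
    print(average_scores([(truth, good), (truth, flat)]))
```

In this toy run the informative scores give AUROC 1.0 and AUPRC 1.0, while the constant scores give AUROC 0.5 and an AUPRC equal to the fraction of true edges (2 of 6 ordered pairs). That is the chance level for each metric: chance AUROC is 0.5 regardless of the graph, but chance AUPRC depends on how dense the true graphs are. When reading Table 1, AUPRC values are therefore best compared within a setting rather than across the Simple, Coupled, and Climate blocks.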

Experiments	PCMCI+	FPCMCI	Varlingam	DYNOTEARS	NGC	TSCI	CUTS+
— Simple —
Default	.47 / .68	.51 / .70	.50 / .69	.43 / .67	.50 / .68	.46 / .68	.48 / .68
Noise	.50 / .69	.52 / .70	.53 / .69	.48 / .68	.50 / .68	.49 / .68	.50 / .68
Confounder	.48 / .59	.50 / .59	.48 / .57	.52 / .63	.50 / .57	.53 / .65	.52 / .59
— Coupled —
Default	.68 / .27	.67 / .24	.57 / .18	.66 / .32	.50 / .16	.69 / .36	.50 / .16
Noise	.66 / .30	.62 / .25	.58 / .19	.62 / .29	.50 / .16	.52 / .18	.50 / .16
Confounder	.56 / .20	.57 / .19	.51 / .17	.49 / .17	.50 / .17	.49 / .18	.50 / .17
Lag	.55 / .23	.55 / .23	.52 / .21	.52 / .22	.50 / .20	.53 / .23	.50 / .20
Standardize	.69 / .27	.67 / .23	.57 / .18	.68 / .34	.50 / .15	.69 / .35	.51 / .16
— Climate —
Atmos-ocean	.69 / .88	.50 / .81	.50 / .81	.62 / .86	.49 / .81	.58 / .84	.50 / .81
ENSO modes	.50 / .81	.51 / .81	.51 / .81	.50 / .81	.50 / .81	.50 / .81	.49 / .81