Extra trees (extremely randomized trees) recursively partition the attribute space in order to group the data into sub-clusters.

Split a node

Given a subset S, try to partition it.

If Stop Split(S) is TRUE: do nothing. Otherwise:

Select K attributes [a1, ..., aK] at random among the non-constant candidate attributes in S

Draw K candidate splits [s1, ..., sK], where si = Pick_random_split(S, ai)

Return the split si that maximizes Score(si, S)
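
The node-splitting step above can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the notes: S is a list of (x, y) pairs with x a tuple of numeric attributes, a split is a pair (attribute index, cut-point), and Score is taken to be the relative variance reduction of a numeric output y (the notes do not define Score). The stop test and the random split are passed in as functions, so any implementation of Stop Split and Pick_random_split can be plugged in.

```python
import random
from statistics import pvariance


def variance_reduction(split, S):
    """Hypothetical Score: relative variance reduction of the outputs y."""
    a, a_c = split
    left = [y for x, y in S if x[a] < a_c]
    right = [y for x, y in S if x[a] >= a_c]
    if not left or not right:
        return 0.0  # degenerate split: one side is empty
    total = pvariance([y for _, y in S])
    if total == 0:
        return 0.0  # outputs already constant, nothing to gain
    within = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(S)
    return (total - within) / total


def split_a_node(S, K, stop_split, pick_random_split,
                 score=variance_reduction, rng=random):
    """Return the best of K random splits of S, or None if S is not split."""
    if stop_split(S):
        return None
    n_attributes = len(S[0][0])
    # Keep only the attributes that take more than one value in S.
    candidates = [a for a in range(n_attributes)
                  if len({x[a] for x, _ in S}) > 1]
    # Select K of them at random (all of them if fewer than K remain).
    chosen = rng.sample(candidates, min(K, len(candidates)))
    splits = [pick_random_split(S, a) for a in chosen]
    if not splits:
        return None
    # Return the split with the highest score.
    return max(splits, key=lambda s: score(s, S))
```

Passing the helpers as arguments keeps the sketch independent of how the stop rule and the random cut-point are implemented.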

Pick_random_split (S, a)

Find the minimum and maximum values, amin and amax, of attribute a in S

Draw a cut-point ac uniformly at random in [amin, amax]

Return the split [a < ac]
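
A minimal Python sketch of Pick_random_split, assuming numeric attributes, a uniform draw for the cut-point, and S stored as a list of (x, y) pairs where x is a tuple of attribute values (this representation and the `rng` parameter are illustrative choices, not part of the notes):

```python
import random


def pick_random_split(S, a, rng=random):
    """Return a random split (a, a_c), read as the test x[a] < a_c."""
    values = [x[a] for x, _ in S]
    a_min, a_max = min(values), max(values)
    # Draw the cut-point uniformly at random in [a_min, a_max].
    a_c = rng.uniform(a_min, a_max)
    return a, a_c
```

For example, `pick_random_split([((1.0,), 0), ((3.0,), 1)], 0)` returns a pair `(0, a_c)` with `a_c` somewhere between 1.0 and 3.0. Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.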

Stop Split

Given a subset S, return:

  • TRUE if the number of elements of S (|S|) is less than nmin
  • TRUE if all attributes are constant in S (every element has the same X, so there is nothing to split on)
  • TRUE if all outputs are constant in S (every element has the same y in the local cluster)
  • FALSE otherwise
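
The three stopping conditions can be sketched in Python as below, again assuming S is a list of (x, y) pairs with x a tuple of attribute values; the parameter name `n_min` stands for nmin.

```python
def stop_split(S, n_min):
    """Return True if the subset S should not be split any further."""
    # Condition 1: too few elements in S.
    if len(S) < n_min:
        return True
    xs = [tuple(x) for x, _ in S]
    ys = [y for _, y in S]
    # Condition 2: all attributes are constant (every x is identical).
    if all(x == xs[0] for x in xs):
        return True
    # Condition 3: all outputs are constant (every y is identical).
    if all(y == ys[0] for y in ys):
        return True
    return False
```

With `n_min = 2`, a subset such as `[((1,), 0), ((2,), 1)]` is still splittable, while `[((1,), 0), ((1,), 1)]` is not, because its only attribute is constant.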