The indices described above single out one group of students (for instance, disadvantaged students) and compare it with all other students (for instance, all non-disadvantaged students, including advantaged students and those of average socio-economic status). However, a two-group analysis may be inadequate for describing more complex patterns of social segregation. In PISA, for instance, the social background of students is measured by a continuous variable, and reducing it to a binary variable discards much of this information. For instance, a focus on disadvantaged students contrasts those students (defined as students whose socio-economic status is below the first quartile of the national distribution of this variable) with all other students (from the second to the fourth quartile). Assume that in some countries the most disadvantaged students (those below the first quartile) are never enrolled in the same schools as the most advantaged students (those in the top quartile), but are often enrolled with slightly more advantaged students. Contrasting the disadvantaged students with all other students then provides only a partial view of the segregation across social categories, and of how the social diversity observed in the population is reflected in schools. One could use several distinct indices, but another option is to use multi-group indices. Several such indices have been proposed for categorical data in order to obtain a more complete description of segregation (Reardon and Firebaugh, 2002[3]; Frankel and Volij, 2011[1]).
One may also consider the no-diversity index, which has the advantage of being decomposable. This index is based on a measure of entropy, that is, of the diversity within a group, and for this reason it is often referred to as the entropy index or mutual information index. It is based on the Theil index commonly used in inequality analysis. When analysing the distribution of a population across four categories, in proportions $p = (p_1, p_2, p_3, p_4)$, the diversity of the population may be related to the measure:
$$h(p) = -\sum_{k=1}^{4} p_k \log p_k$$
The no-diversity index compares this measure to the average obtained at the school level:
$$H = 1 - \sum_{j} \frac{n_j}{N} \, \frac{h(p^j)}{h(p)}$$
In the equation above, $p^j = (p^j_1, p^j_2, p^j_3, p^j_4)$ gives the proportions of the four categories of students among the $n_j$ students in school $j$ (and $N$ is the total number of students).
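The computation can be illustrated with a minimal sketch. It assumes the entropy form of the diversity measure, h(p) = -Σ p_k log p_k, and the normalised index H = 1 - Σ_j (n_j / N) · h(p^j) / h(p). The function names and the school counts below are hypothetical and serve only as an illustration.

```python
import numpy as np

def entropy(p):
    """Diversity measure h(p) = -sum_k p_k log p_k, taking 0 log 0 as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum())

def no_diversity_index(counts):
    """counts[j, k] = number of students of category k in school j.

    Assumes every school has at least one student and that the population
    contains at least two categories (so that h(p) > 0).
    """
    counts = np.asarray(counts, dtype=float)
    n_j = counts.sum(axis=1)                  # school sizes
    N = n_j.sum()                             # total number of students
    h_pop = entropy(counts.sum(axis=0) / N)   # diversity of the population
    h_schools = np.array([entropy(row / row.sum()) for row in counts])
    return 1.0 - float((n_j / N * h_schools).sum()) / h_pop

# Hypothetical counts by quartile of socio-economic status (4 categories).
mixed = [[25, 25, 25, 25], [25, 25, 25, 25]]          # schools mirror the population
fully_sorted = [[100, 0, 0, 0], [0, 100, 0, 0],
                [0, 0, 100, 0], [0, 0, 0, 100]]       # one category per school
partial = [[40, 30, 20, 10], [10, 20, 30, 40]]        # intermediate case

for name, data in [("mixed", mixed), ("fully_sorted", fully_sorted), ("partial", partial)]:
    print(name, round(no_diversity_index(data), 4))
```

When every school reproduces the population shares, the school-level entropies equal the population entropy and the index is 0. When each school contains a single category, the school-level entropies are all 0 and the index is 1.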
The no-diversity index goes from 0 (no segregation) to 1 (full segregation). One of its advantages is that it is additively decomposable.2 If one aggregates schools at a higher level, typically comparing private schools to public schools, the no-diversity index can be decomposed into three components. One component corresponds to the social segregation within private schools, the second to the segregation within public schools, and the third to the additional segregation that reflects the fact that the social composition in the public sector could be distinct from that of the private sector.
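Formally, this can be written as:
$$H = H_{\mathrm{between}} + \frac{N_{\mathrm{priv}}\, h_{\mathrm{priv}}}{N\, h}\, H_{\mathrm{priv}} + \frac{N_{\mathrm{pub}}\, h_{\mathrm{pub}}}{N\, h}\, H_{\mathrm{pub}}$$
where $H_{\mathrm{priv}}$ and $H_{\mathrm{pub}}$ are the no-diversity indices computed within the private and public sectors, $N_s$ and $h_s$ are the number of students and the diversity measure of sector $s$, $N$ and $h$ are the corresponding quantities for the whole population, and with $H_{\mathrm{between}}$ interpreted as the segregation due specifically to the coexistence of private and public sectors.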
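The decomposition can be checked numerically. The sketch below assumes the normalised entropy-based index and weights each within-sector index by N_s · h_s / (N · h), where N_s and h_s are the size and entropy of sector s. The between-sector term treats each sector as one large unit. The function names, the sector labels and the counts are hypothetical.

```python
import numpy as np

def entropy(p):
    """h(p) = -sum_k p_k log p_k, taking 0 log 0 as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log(p[nz])).sum())

def no_diversity_index(counts):
    """counts[j, k] = students of category k in unit j (all units non-empty)."""
    counts = np.asarray(counts, dtype=float)
    n_j = counts.sum(axis=1)
    N = n_j.sum()
    h_pop = entropy(counts.sum(axis=0) / N)
    if h_pop == 0.0:          # a single category overall: nothing to segregate
        return 0.0
    h_units = np.array([entropy(row / row.sum()) for row in counts])
    return 1.0 - float((n_j / N * h_units).sum()) / h_pop

def decompose(counts, sector):
    """Return (between-sector term, {sector: weighted within-sector term})."""
    counts = np.asarray(counts, dtype=float)
    sector = np.asarray(sector)
    N = counts.sum()
    h = entropy(counts.sum(axis=0) / N)
    labels = np.unique(sector)
    # Composition of each sector as a whole.
    totals = np.array([counts[sector == s].sum(axis=0) for s in labels])
    between = no_diversity_index(totals)
    within = {}
    for s, tot in zip(labels, totals):
        N_s = tot.sum()
        h_s = entropy(tot / N_s)
        weight = N_s * h_s / (N * h)
        within[str(s)] = weight * no_diversity_index(counts[sector == s])
    return between, within

# Hypothetical counts by socio-economic quartile in four schools.
counts = [[40, 30, 20, 10], [30, 30, 20, 20], [10, 20, 30, 40], [5, 15, 30, 50]]
sector = ["public", "public", "private", "private"]

between, within = decompose(counts, sector)
total = no_diversity_index(counts)
print(total, between, within)
```

The three components sum to the overall index up to floating-point error, which is what additive decomposability means here.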
The no-diversity index is based on the same idea as the social inclusion index commonly used in PISA publications, which is derived from the ratio of the within- and between-school variances (of the continuous social index or of performance). The inclusiveness indicator relies on a multilevel (or hierarchical) model that decomposes the variance (modelled by a normal distribution) into one component corresponding to schools and another corresponding to students. In multilevel models, the estimator of the between-school variance is corrected to take into account the part of this variability that is due to students. However, this estimator cannot be additively decomposed in a direct way.
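The correction applied to the between-school variance can be illustrated with a simpler technique than a full multilevel model: the one-way ANOVA (method-of-moments) estimator for a balanced design. This is a substitute chosen for brevity, not the estimation procedure used in PISA. The simulated data, the function names, and the definition of the ratio as within / (within + between) are assumptions made for this sketch.

```python
import numpy as np

def variance_components(x):
    """x[j, i] = value of student i in school j (same number of students per school).

    Returns (between-school variance, within-school variance) from the
    one-way ANOVA estimator; a negative between estimate is clipped to 0.
    """
    x = np.asarray(x, dtype=float)
    n_schools, n_students = x.shape
    school_means = x.mean(axis=1)
    # Mean square within schools: pooled variance of students around their school mean.
    msw = ((x - school_means[:, None]) ** 2).sum() / (n_schools * (n_students - 1))
    # Mean square between schools: spread of school means, scaled by school size.
    msb = n_students * ((school_means - x.mean()) ** 2).sum() / (n_schools - 1)
    # Remove the part of the spread in school means that is due to student-level noise.
    between = max((msb - msw) / n_students, 0.0)
    return between, msw

def inclusion_ratio(x):
    """Share of the total variance found within schools (assumed definition)."""
    between, within = variance_components(x)
    return within / (between + within)

# Simulated data: 200 schools of 30 students, school effect sd 1, student sd 2.
rng = np.random.default_rng(0)
school_effect = rng.normal(0.0, 1.0, size=(200, 1))
scores = school_effect + rng.normal(0.0, 2.0, size=(200, 30))

between, within = variance_components(scores)
naive_between = scores.mean(axis=1).var(ddof=1)   # uncorrected variance of school means
print(between, within, naive_between, inclusion_ratio(scores))
```

The uncorrected variance of the school means exceeds the corrected estimate by the within-school variance divided by the school size, which is the share of between-school variability attributable to students.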