Hyperphronesis: Bias in Statistical Judgment

Bias in Performance Evaluation

Suppose you are an employer. You are looking to fill a position and you want the best person for the job. To do this, you take a pool of applicants, and for each one, you test them N times on some metric X. From these N tests, you will develop some idea of what each applicant's performance will look like, and based on that, you will hire the applicant or applicants with the best probable performance. However, you know that each applicant comes from one of two populations which you believe to have different statistical characteristics, and you know immediately which population each applicant comes from.

We will use the following model: We will assume that the population from which the applicants are taken is made up of two sub-populations A and B. These two sub-populations have different distributions of individual mean performance that are both Gaussian. That is, an individual drawn from sub-population A will have an expected performance that is normally distributed with mean \(\mu_A\) and variance \(\sigma_A^2\). Likewise, an individual drawn from sub-population B will have an expected performance that is normally distributed with mean \(\mu_B\) and variance \(\sigma_B^2\). Individual performances are then taken to be normally distributed with the individual mean and individual variance \(\sigma_i^2\).

Suppose we take a given applicant who we know comes from sub-population B. We sample her performance N times and get performances of \(\{x_1,x_2,x_3,...,x_N\}=\textbf{x}\). We form the following complete pdf for the (N+1) variables of the individual mean and the N performances: \[ f_{\mu_i,\textbf{x}|B}(\mu_i,x_1,x_2,...,x_N)=\frac{1}{\sqrt{2\pi}^{N+1}}\frac{1}{\sigma_B \sigma_i^N} \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] It follows that the distribution conditioned on the test results is proportional to: \[ f_{\mu_i|,\textbf{x},B}(\mu_i)\propto \exp\left ({-\frac{(\mu_i-\mu_B)^2}{2\sigma_B^2}} \right ) \prod_{k=1}^N\exp\left ({-\frac{(x_k-\mu_i)^2}{2\sigma_i^2}} \right ) \] By normalizing we find that this implies that the individual mean, given that it comes from sub-population B and given the N test results, is normally distributed with variance \[ \sigma_{\tilde{\mu_i}}^2=\left ( {\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \right )^{-1} \] and mean \[ \tilde{\mu_i}=\frac{\frac{\mu_B}{\sigma_B^2}+\frac{1}{\sigma_i^2}\sum_{k=1}^{N}x_k}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} =\frac{\frac{\mu_B}{\sigma_B^2}+\frac{N}{\sigma_i^2}\bar{\textbf{x}}}{\frac{1}{\sigma_B^2}+\frac{N}{\sigma_i^2}} \] We will assume that this mean and variance are used as estimators to predict performance. Note that, in the limit of large N, \(\sigma_{\tilde{\mu_i}}^2\rightarrow \sigma_i^2/N\) and \(\tilde{\mu_i}\rightarrow \bar{\textbf{x}}\rightarrow \mu_i\), as expected.

Suppose we assume sub-populations A and B have the same variance \(\sigma_{AB}^2\), but \(\mu_A>\mu_B\). then we can note the following few implications:

The belief about the sub-population the applicant comes from acts effectively as another performance sample of weight \(\sigma_i^2/\sigma_{AB}^2\).
If applicant 1 comes from sub-population A and applicant 2 comes from sub-population B, even if they perform identically in their samples, applicant 1 would nevertheless still be preferred.
The more samples are taken, the less the sub-population the applicant comes from matters.
The larger the difference in means between the sub-populations is assumed to be, the better the lesser-viewed applicant will need to perform in order to be selected over the better-viewed applicant.
Suppose we compare \(\tilde{\mu_i}\) to \(\bar{\textbf{x}}\). Our selection criteria will simply be if the performance predictor is above \(x_m\). We want to find the probability of being from a given sub-population given that the applicant was selected by each predictor. For the sub-population-indifferent predictor: \[ P(A|\bar{\textbf{x}}\geq x_m)=\frac{P(\bar{\textbf{x}}\geq x_m|A)P(A)}{P(\bar{\textbf{x}}\geq x_m|A)P(A)+P(\bar{\textbf{x}}\geq x_m|B)P(B)} \\ \\ P(A|\bar{\textbf{x}}\geq x_m)= \frac{P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] Where \[ Q(z)=\int_{z}^{\infty}\frac{e^{-s^2/2}}{\sqrt{2\pi}}ds\approx \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \] For the sub-population-sensitive predictor, we first note that \[ \tilde{\mu_i} \geq x_m \Rightarrow \bar{\textbf{x}}\geq x_m+(x_m-\mu_A)\frac{\sigma_i^2}{N\sigma_A^2}=x_m' \] Which then implies \[ P(A|\tilde{\mu_i}\geq x_m)=\frac{P(\tilde{\mu_i}\geq x_m|A)P(A)}{P(\tilde{\mu_i}\geq x_m|A)P(A)+P(\tilde{\mu_i}\geq x_m|B)P(B)} \\ \\ P(A|\tilde{\mu_i}\geq x_m)= \frac{P(A)Q\left (\frac{x_m'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} {P(A)Q\left (\frac{x_m'-\mu_A}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right ) + P(B)Q\left (\frac{x_m'-\mu_B}{\sqrt{\sigma_{AB}^2+\sigma_i^2/N}} \right )} \] As \(x_m > \mu_A\) and thus \(x_m' > x_m\), it is easy to see that \(P(A) < P(A|\bar{\textbf{x}}\geq x_m) < P(A|\tilde{\mu_i}\geq x_m) \). Thus the sensitivity further biases the selection towards sub-population A. We can call \(\bar{\textbf{x}}\) the meritocratic predictor and \(\tilde{\mu_i}\) the semi-meritocratic predictor.

Some Sociological Implications

Though the above effects may, in theory, be small, their effects in practice may not be. Humans are not perfectly rational and are not perfect statistical computers. The above is meant to give motivation for taking seriously effects that are often much more pronounced. If there is a perceived difference in means, there is likely a tendency to exaggerate it, to think that the difference in means should be visible, and hence that the two distributions should be statistically separable. Likewise, population variances are often perceived as narrower than they really are, leading to further amplification of the biasing effect. Moreover, the parameter estimations are not based simply on objective observation of the sub-populations, but also if not mainly on subjective, sociological, psychological, and cultural factors. As high confidence in one's initial estimates makes one less likely to take more samples, the employer's judgment may rest heavily on subjective biases. Given this, if the employer's objective is simply to hire the best candidates, she should simply use the meritocratic predictor (or perhaps at least invest some time into getting accurate sub-population parameters).

However, it is worth noting some effects on the candidates themselves. As a rule, the candidates are not subjected to this bias just in this bid for employment alone, but rather serially and repeatedly, in bid after bid. This may have any of the following effects: driving applicants toward jobs where they will be more favored (or less dis-favored) by the bias; affecting the applicant's self-evaluations, making them think their personal mean is closer to the broadly perceived sub-population mean; normalizing the broadly perceived sub-population mean, with an implicit devaluation of deviation from it. Also, we can note the following well-known problem: personal means tend to increase in challenging jobs, meaning that the unfavorable bias will perpetually stand in the way of the development of the negatively biased candidate, which then only serves to further feed into the bias. Both advantages and disadvantages tend to widen, making this a subtle case of "the rich get richer and the poor get poorer".

The moral of all this can be summarized as: the semi-meritocratic predictor should be avoided if possible as it is very difficult to implement effectively and has a tendency to introduce a host of detrimental effects. Fortunately, the meritocratic predictor loses only a small amount by way of informative-ness, and avoids the drawbacks mentioned above. Care should then be taken to ensure that the meritocratic selection system is implemented as carefully as can be managed to preclude the introduction of biasing effects. one way of washing out the effects of biasing in general is simply to give the applicants many opportunities to demonstrate their abilities.

Hyperphronesis

Tuesday, February 6, 2018

Bias in Statistical Judgment

Bias in Performance Evaluation

Some Sociological Implications

1 comment: