Solving Sample Selection Bias in Credit Scoring: The Reject Inference

By Gabriele Sabato


Nonrandom samples may present a significant problem in credit scoring. In general, the developer of a credit scoring system possesses only the behavioural information of accepted applicants. However, the scoring model is to be used to evaluate applicants who are drawn, arguably at random, from the entire population. If accepted applicants are qualitatively different from individuals whose applications were rejected, developing a scoring model on a sample that includes only accepted applicants may introduce sample selection bias and lead to inferior classification results (see Hand (1998) and Greene (1998)). Methods for coping with this problem are known as reject inference techniques.

Some statisticians argue that reject inference can solve the nonrandom sample selection problem (e.g. Copas and Li (1997), Joanes (1994), Donald (1995) and Greene (1998)). In particular, reject inference techniques attempt to obtain additional data for rejected applicants or try to infer their missing performance (good/bad) information. The most common methods explored in the literature are enlargement, reweighting and extrapolation (see Ash and Meester (2002), Banasik et al. (2003), Crook and Banasik (2004) and Parnitzke (2005)). However, some authors (e.g. Hand and Henley (1993)) demonstrate that the reject inference methods typically employed in the industry are often not sound and rest on very tenuous assumptions. They point out that reliable reject inference is impossible and that the only robust approach is to accept a sample of rejected applications and observe their behaviour.

In this paper, we analyze the reasons for using reject inference and assess the proposed solutions from both a statistical and a business-related point of view. However, in contrast with most of the available literature, we consider the business perspective more relevant than the statistical one in the financial industry context. As such, we conclude that increasing the prediction accuracy of scoring models should not be regarded as the main goal of reject inference techniques. The possibility of including rejects in the development sample should be considered, instead, as an opportunity to replicate the experience and the decisions of underwriters, credit analysts or branch managers when assessing applicants’ creditworthiness.

Aligning a new scoring model with underwriters’ risk assessment will help them better understand how the model works and how it reaches the accept/reject decision. This will likely facilitate the introduction of an automated decision system for a product that was previously manually underwritten, and it will lower the number of requests to override the system decision, increasing the efficiency of the acquisition process.

With regard to reject inference methodologies, most of the literature focuses on how to infer the missing performance of the rejected clients without considering the significant value of the accept/reject information. Although the most common approaches to reject inference (e.g. Hand (2002), Ash and Meester (2002) and Crook and Banasik (2004)) are extremely valuable from the statistical point of view, we believe that financial institutions should follow a more practical method when developing their application models in order to guarantee the successful implementation of their systems. We are convinced that scoring models should be judged not only on their performance metrics (e.g. discriminatory power, accuracy, stability), but also on their comprehensibility, their simplicity, the implementation effort they require and the level of overrides they would generate.

Finally, we propose a practical approach that makes it possible to use the rejected applicants when developing a new scoring model. First, we develop a model to predict the probability of default using only accepted clients and we apply it to the entire sample (accepted and rejected clients). Then, we use the reject rate (RR) to “correct” the observed good/bad odds (O-G/B odds) and estimate what the good/bad odds of the rejected clients would have been (I-G/B odds). Ultimately, we combine the O-G/B odds and the I-G/B odds in order to derive the real good/bad odds (R-G/B odds), similar to those that we would have observed had the rejected clients been accepted.

The remainder of the article is structured as follows. In Section 2, we review some of the most relevant research related to reject inference methodologies for credit scoring. In Section 3, we analyze the proposed methodology in detail from both a theoretical and an empirical point of view. Data from an unsecured personal loans portfolio of a Brazilian bank are used to test the proposed technique. In Section 4, we present our conclusions.
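As a rough illustration of the final combination step outlined above, the sketch below pools observed and inferred odds for a single score band. It is only a minimal example under stated assumptions: the function name `pooled_odds`, the counts and the inferred odds of 4.0 are hypothetical, and the inferred odds for rejects are taken as a given input, since the reject-rate correction that produces them is developed in Section 3 and is not reproduced here. The pooling itself is plain arithmetic on good and bad counts.

```python
def pooled_odds(good_acc, bad_acc, n_rej, inferred_odds_rej):
    """Combine observed odds (accepts) with inferred odds (rejects).

    good_acc, bad_acc : observed goods and bads among accepted clients
    n_rej             : number of rejected clients in the same score band
    inferred_odds_rej : ASSUMED good/bad odds for the rejects (an input here;
                        how it is derived from the reject rate is not shown)
    """
    if inferred_odds_rej <= 0:
        raise ValueError("inferred odds must be positive")
    # Odds o = goods / bads, so the share of goods is o / (1 + o).
    good_rej = n_rej * inferred_odds_rej / (1.0 + inferred_odds_rej)
    bad_rej = n_rej - good_rej
    # Pooled odds over accepts and rejects together.
    return (good_acc + good_rej) / (bad_acc + bad_rej)


# Hypothetical score band: 1,000 accepts (900 good, 100 bad) and 500 rejects.
good_acc, bad_acc, n_rej = 900, 100, 500
reject_rate = n_rej / (good_acc + bad_acc + n_rej)  # RR for this band
observed = good_acc / bad_acc                        # O-G/B odds
combined = pooled_odds(good_acc, bad_acc, n_rej, inferred_odds_rej=4.0)
print(f"RR={reject_rate:.3f}  O-G/B={observed:.2f}  R-G/B={combined:.2f}")
```

With these illustrative numbers, the pooled odds fall between the observed odds of the accepts and the assumed odds of the rejects, and they move closer to the latter as the reject rate in the band grows.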