Transparency and model gaming
February 5, 2012 at 12:14 pm
A site with a rather tacky name suggests:
One of the most common reasons I hear for not letting a model be more transparent is that, if they did that, then people would game the model. I’d like to argue that that’s exactly what they should do, and it’s not a valid argument against transparency.
Take as an example the Value-added model for teachers. I don’t think there’s any excuse for this model to be opaque: it is widely used (all of New York City public middle and high schools for example), the scores are important to teachers, especially when they are up for tenure, and the community responds to the corresponding scores for the schools by taking their kids out or putting their kids into those schools. There’s lots at stake.
Why would you not want this to be transparent? Don’t we usually like to know how to evaluate our performance on the job? I’d like to know if being 4 minutes late to work was a big deal, or if I need to stay late on Tuesdays in order to be perceived as working hard. In other words, given that it’s high stakes it’s only fair to let people know how they are being measured and, thus, how to “improve” with respect to that measurement.
Instead of calling it “gaming the model”, we should see it as improving our scores, which, if it’s a good model, should mean being better teachers (or whatever you’re testing).
This is an interesting point. I certainly agree that if you are going to measure people on x, then telling them what x is is only fair. But I would never promise that x was my only criterion for measuring a real-world job, as I don’t believe we can write the specification for many activities well enough to know that maximizing the x-score is equivalent to doing the job well.
(PFI contracts – Private Finance Initiative contracts – are of course a great example of this; one of the reasons that PFI is a terrible idea is that you can’t write a contract that defines what it means to run a railway well for ten years that stands up to the harsh light of events.)
Thus I would argue that the problem in the situation outlined above isn’t lack of transparency, it is using a fixed formula to evaluate something complicated and contingent. Sure, by all means say ‘these scores are important’, but leave some room for judgement and user feedback too. Humility about how much you can measure matters as well.
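A toy sketch makes the fixed-formula problem concrete. Everything here is invented for illustration – the weights, the effort budget, the split between “real quality” and a gameable proxy – and has nothing to do with the actual value-added model; it just shows how, once a published formula rewards a proxy more than the underlying quality, rational effort flows to the proxy and the score decouples from what you meant to measure:

```python
# Hypothetical scoring formula (not the real value-added model): it
# rewards genuine quality and a gameable proxy, e.g. narrow test prep.
def score(quality_effort, gaming_effort):
    return 1.0 * quality_effort + 1.5 * gaming_effort

TOTAL_EFFORT = 10.0  # a fixed budget of effort to allocate

# Formula unknown: effort goes into real quality.
honest = score(quality_effort=TOTAL_EFFORT, gaming_effort=0.0)

# Formula published: effort shifts to the higher-weighted, gameable term.
gamed = score(quality_effort=0.0, gaming_effort=TOTAL_EFFORT)

print(honest, gamed)  # the gamed score beats the honest one
# despite zero effort going into actual quality
```

The point is not that transparency caused the failure; it is that the fixed formula was never a complete specification of the job, so maximizing it was never the same as doing the job well.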
There is also a good reason for keeping some models secret, and that is the use of proxies. Say I want to measure something but I can’t access the real data. I know that the proxy I use isn’t completely accurate – it does not have complete predictive power – but it is better than nothing. Here, for instance, is the Fed in testimony to Congress on a feature of credit scoring models:
Results obtained with the model estimated especially for this study suggest that the credit characteristics included in credit history scoring models do not serve as substitutes, or proxies, for race, ethnicity, or sex. The analysis does suggest, however, that certain credit characteristics serve, in part, as limited proxies for age. A result of this limited proxying is that the credit scores for older individuals are slightly lower, and those of younger individuals somewhat higher, than would be the case had these credit characteristics not partially proxied for age. Analysis shows that mitigating this effect by dropping these credit characteristics from the model would come at a cost, as these credit characteristics have strong predictive power over and above their role as age proxies.
Credit scoring models are trying to get at ability and willingness to pay, but they have to use proxies, such as disposable income and prior history, to do that. Some of those proxies inadvertently measure things you don’t want them to, such as age, but excluding them would degrade model performance.
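The Fed’s point can be seen in a few lines with made-up numbers. In this sketch (all figures invented for illustration), length of credit history is genuinely predictive of repayment, yet it also tracks age almost perfectly – exactly the “limited proxying” the testimony describes:

```python
# Plain Pearson correlation, written out so no libraries are needed.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Invented toy data: one row per borrower.
history_years = [1, 3, 5, 8, 12, 20, 25, 30]      # the credit characteristic
age           = [22, 25, 30, 35, 40, 50, 55, 62]  # what it inadvertently proxies
repaid_rate   = [0.80, 0.84, 0.86, 0.90, 0.91, 0.95, 0.96, 0.97]  # what we want

print(pearson(history_years, age))          # close to 1: a strong age proxy
print(pearson(history_years, repaid_rate))  # but also genuinely predictive
```

Dropping `history_years` would remove the age proxying, but – as the Fed notes – at the cost of real predictive power, since the same feature carries both signals.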
Here, it is better that the proxies are not precisely known, so that they are harder to game. The last thing you want in a credit scoring model is for people to know how best to lie to you, especially if some of the data is hard to check. It is much better to ask for more than you need, as in psychometrics, and use the extra data as a consistency check (or just throw it away) than to tell people how your model works. Its predictive power may well decline markedly if people know how it works.
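The “ask for more than you need” idea can be sketched as follows. The field names and income brackets below are hypothetical: the application collects two answers that should agree, and the redundancy is used purely as a consistency check – the applicant is never told which answer the model actually scores:

```python
# Hypothetical bracket boundaries, for illustration only.
def bracket_of(income):
    if income < 20_000:
        return "low"
    if income < 60_000:
        return "middle"
    return "high"

def consistent(application):
    # Flag applications where the redundant answers disagree -- a hint
    # that some of the self-reported data may be untruthful.
    return bracket_of(application["stated_income"]) == application["stated_bracket"]

print(consistent({"stated_income": 45_000, "stated_bracket": "middle"}))  # True
print(consistent({"stated_income": 45_000, "stated_bracket": "high"}))    # False
```

A liar who doesn’t know which field matters must keep every redundant answer mutually consistent, which is harder than optimizing a single known input.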
Of course, you need a regulatory framework around this so that models which try to measure, for instance, race, are banned, but that does not require model transparency. Sometimes it really is better to keep the model as a black box.