I was greeted recently with a BBC news article that claimed ‘Women write better code, study suggests’.
As a coder with over 30 years of experience, I have known quite a few women who were excellent coders, some who were good, solid coders and some who were rather poor. That is much the same as my experience of male coders.
With Oliver Frost’s article Gender Gap in A Level Computing Results ringing in my ears, the BBC’s headline didn’t sit well with me; it seemed like some further thought was necessary.
With my data analytics ‘hat’ on I felt there were a few problems with both the headline and the framing of the ‘statistics’ quoted within the BBC article.
The main problem is that looking for statistical relationships in data not collected for the purpose of testing a specific hypothesis means there will be all sorts of confounding factors that haven’t been understood or planned for. I think the GitHub data is rife with confounding factors. A couple of ideas that could be problems:
- There are fewer women coders than men. It is possible that women who code need to be better than men in order to progress in a male-dominated area.
- There might be a gender bias (for or against women) within the GitHub community, which may make it easier or harder for women to have code accepted. Women’s perceptions of this bias may in turn make the quality of the code they submit different from the norm.
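To make the first of these confounders concrete, here is a minimal sketch using entirely made-up counts (they are not from the GitHub study). The `experience` split and all the numbers are assumptions chosen for illustration: within each experience level the two groups have identical acceptance rates, yet the overall rates differ because the groups have a different mix of experience.

```python
# HYPOTHETICAL counts, invented for illustration only.
# (group, experience level) -> (pull requests submitted, pull requests accepted)
data = {
    ("women", "experienced"): (800, 680),
    ("women", "novice"): (200, 120),
    ("men", "experienced"): (600, 510),
    ("men", "novice"): (400, 240),
}


def rate_for(group, level=None):
    """Acceptance rate for a group, optionally restricted to one experience level."""
    rows = [
        counts
        for (g, lvl), counts in data.items()
        if g == group and (level is None or lvl == level)
    ]
    submitted = sum(s for s, _ in rows)
    accepted = sum(a for _, a in rows)
    return accepted / submitted


for group in ("women", "men"):
    print(
        group,
        "overall:", round(rate_for(group), 3),
        "experienced:", round(rate_for(group, "experienced"), 3),
        "novice:", round(rate_for(group, "novice"), 3),
    )
```

In this toy data the headline gap comes entirely from who is submitting, not from any difference between equally experienced coders, which is exactly the kind of effect an unplanned data set cannot rule out.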
In addition, the headline stats were quoted without any comparison of the underlying distributions:
‘The team found that 78.6% of pull requests made by women were accepted compared with 74.6% of those by men.’
This is a fairly poor statement: the difference could easily be due to sampling variation or to differences in the underlying distributions. Certainly the article is written for the population at large, but there need to be elements of good data science and mathematics here; otherwise we are allowing people to continue in ignorance.
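To see why the two percentages alone are not enough, here is a rough sketch of a standard two-proportion z-test applied to the quoted rates. The sample sizes are hypothetical (the quote gives none), and the function name `two_proportion_z_test` is my own. The same four-point gap is indistinguishable from noise at one sample size and overwhelming at another.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: both groups share one acceptance rate."""
    # Pooled rate under the null hypothesis of no difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Quoted acceptance rates; the sample sizes below are ASSUMED, not from the article.
rate_women, rate_men = 0.786, 0.746

for n in (200, 20_000):
    z, p = two_proportion_z_test(rate_women, n, rate_men, n)
    print(f"n per group = {n}: z = {z:.2f}, p = {p:.3g}")
```

Even where such a test shows the gap is not a sampling fluke, it says nothing about why the gap exists, so the confounding problem remains.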
I am certain there are gender differences in coding, just as there are gender differences in spatial awareness and fine motor skills. But to prove the article’s conjecture, I think you need a baseline measurement of programming ability that takes into account gender, ethnicity, race, age, etc., and then you need to analyse acceptance and rejection separately at each skill level.
Without that, perhaps the title of the article should read ‘Women Who Submit Code on GitHub Have a Higher Average Code Acceptance Compared to Men.’