Sam Rickman
New artificial intelligence (AI) tools, known as large language models (LLMs), are being used in local authorities to automatically write social care notes. These tools – which are similar to ChatGPT – can generate human-like text in response to user prompts. AI can be designed to create text for a wide range of tasks, reducing paperwork and freeing up social workers' time. But AI can also reflect unfair biases.
The aim of the study was to assess whether there is gender bias in the LLMs used to evaluate the care needs of older people.
The study asked three questions:
Real, anonymised case notes from older people receiving care were used. Each note was rewritten with the gender swapped (e.g. “Mr Smith” instead of “Mrs Smith”). Two modern AI models (Google’s Gemma and Meta’s Llama 3) and two older benchmark models were asked to summarise the notes, and the summaries of the male and female versions were compared.
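As an illustration, the sketch below (in Python) shows the general shape of this comparison: swapping gendered terms in a note, asking a model to summarise both versions, and comparing the outputs. The swap list and the summarise() stand-in are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch of the gender-swap comparison described above.
# The swap list and the summarise() stand-in are assumptions for illustration,
# not the study's actual code.

import re

# Minimal paired terms; a real pipeline would need a fuller mapping and care
# with ambiguous words (e.g. "her" can map to "his" or "him").
SWAPS = {
    "mrs": "mr", "mr": "mrs",
    "she": "he", "he": "she",
    "herself": "himself", "himself": "herself",
    "woman": "man", "man": "woman",
}

PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def swap_gender(text: str) -> str:
    """Return the note with gendered terms exchanged, preserving capitalisation."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped

    return PATTERN.sub(replace, text)


def summarise(model_name: str, note: str) -> str:
    """Stand-in for a call to an LLM (e.g. Gemma or Llama 3) asking it to
    summarise the note; replace with a real API or local inference call."""
    raise NotImplementedError


if __name__ == "__main__":
    note = "Mrs Smith lives alone. She is unable to manage the stairs herself."
    male_note = swap_gender(note)
    print(male_note)  # Mr Smith lives alone. He is unable to manage the stairs himself.
    # For each model, summarise both versions and compare the two summaries:
    # summary_f = summarise("gemma", note)
    # summary_m = summarise("gemma", male_note)
```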
Meta’s Llama 3 showed no gender differences. Google’s Gemma produced the most unequal summaries. It emphasised men’s physical and mental health problems more strongly, using words like “disabled” or “unable”, while women’s needs were downplayed or described more vaguely. This could make men appear more in need of support, even when their situations were identical. If women’s needs are recorded in less serious terms, they could receive less support.
The study shows that not all AI systems are the same. Local authorities should test LLMs for bias: care services are allocated on the basis of need, so biased summaries could affect allocation decisions. Further research is needed to assess whether similar patterns arise in other health and care settings, such as hospitals or mental health services. Finally, if the government wishes to ensure that AI models are fair, it may need to introduce legislation requiring fairness testing.