CLASSIEfier: Using auto-classification to paint a picture of social sector trends
By Paola Oliva-Altamirano, Data Scientist, Our Community
SUMMARY: Tracking the flow of funding and other support to social sector organisations in Australia has historically been difficult because of inconsistencies in categorisation, or the absence of categorisation entirely.
Our Community developed CLASSIE to serve as a universal classification system for Australian social sector initiatives and entities.
Alongside CLASSIE, we developed an algorithm, CLASSIEfier, to reduce or remove the need for manual (human) classification. CLASSIEfier allows us to classify historical records on behalf of grantmakers and other social sector supporters, and to reduce the need for human intervention in classifying current and future records.
In 2016, Our Community launched the Classification system for Social Sector Initiatives and Entities (CLASSIE). This taxonomy – based on the US-based Foundation Center’s Philanthropy Classification System – provides a tool for classifying information related to Australia’s social sector in a standardised way.
Roughly $80 billion is given away in grants every year in Australia (Grants in Australia 2018 research study), but we are yet to establish a clear overall picture of the flow of money by sector, location and beneficiary. Using CLASSIE in the Australian grantmaking environment is enabling us to start filling in the blanks.
Soon after CLASSIE was developed, its "subject" and "population" sections were incorporated into Our Community’s grants administration platform, SmartyGrants, enabling grantseekers to select the subjects and populations of their grant applications, or the grantmakers to do that on their behalf.
Currently, 15% of SmartyGrants grantmakers use CLASSIE, and they have generated more than 10,000 classified grant applications to date.
While we are confident grantmakers’ use of CLASSIE will continue to expand, our current system relies on users to classify their own grants, and we know from experience that manual classification – using humans to read and classify each application – is time-consuming.
Furthermore, current and future grant applications represent only a fraction of the data we would like to classify, given that SmartyGrants holds more than 400,000 historical grant application records.
Against this background, CLASSIEfier was born. CLASSIEfier is an algorithm designed to automatically classify grant applications, or indeed any relevant social sector data. Initially, CLASSIEfier classified against the CLASSIE taxonomy, but later versions of the algorithm now offer other dictionaries to classify against in addition to CLASSIE.
CLASSIEfier 2.4 launched in 2020 and CLASSIEfier 3.0 launched in 2022, with the United Nations' Sustainable Development Goals (SDGs) incorporated. Grantmakers can now use CLASSIEfier to track the progress of their own goals towards the SDGs.
Dr Paola Oliva-Altamirano
CLASSIEfier: How does it work?
CLASSIEfier is an algorithm that reads a grant application - or any text related to the social sector - and predicts the main subjects, populations and SDGs involved.
Initially, we considered using a machine learning algorithm to build CLASSIEfier. However, machine learning algorithms must be trained on a large set of already labelled applications to learn the representative writing patterns and vocabulary of each CLASSIE category, and we didn't have such a set when building the algorithm.
Our testing has shown that at least 2000 applications per CLASSIE category are needed to generate good results. CLASSIE subject has 900+ categories, resulting in more than 180,000 labelled applications needed. It is impractical to reach those kinds of targets by manually classifying the data.
After extensive trials and research, we discovered that we could successfully extract keywords from the SmartyGrants database and create a controlled vocabulary to describe each CLASSIE category. We settled on an algorithm that follows a keyword-matching model to perform auto-classification.
A keyword-matching model
Keyword-matching is a common technique used to find keywords in text. Our use of it rests on the hypothesis that certain combinations of keywords can be used to describe a CLASSIE category.
The model uses three different groups of keywords and applies certain rules for each category:
- Unique keywords: These are a clear and distinct representation of a CLASSIE category.
- Context keywords: These can be general keywords, but they complement the unique keywords and give meaning to the text.
- Exclusion keywords: When these keywords are found in the text, a category can be excluded even if unique and context keywords were matched.
See the example below. With the keyword-matching algorithm we can classify social sector text with 80% accuracy.
This is how the keyword-matching model works for the CLASSIE category "Cancers".
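The three keyword groups can be sketched in a few lines of Python. This is only an illustration, not CLASSIEfier's actual code: the matching rule (at least one unique keyword, at least one context keyword, and no exclusion keyword) is an assumption, and the vocabulary shown for "Cancers" is invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CategoryVocabulary:
    """Controlled vocabulary for one category (all keywords here are illustrative)."""
    name: str
    unique: set[str]
    context: set[str]
    exclusion: set[str] = field(default_factory=set)


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its distinct alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(vocab: CategoryVocabulary, text: str) -> bool:
    """Assumed rule: one unique hit plus one context hit, and no exclusion hit."""
    words = tokenize(text)
    if words & vocab.exclusion:
        return False
    return bool(words & vocab.unique) and bool(words & vocab.context)


# Hypothetical vocabulary for the "Cancers" category.
cancers = CategoryVocabulary(
    name="Cancers",
    unique={"cancer", "leukaemia", "melanoma"},
    context={"patients", "treatment", "screening", "support"},
    exclusion={"zodiac"},
)

print(matches(cancers, "Funding a support group for melanoma patients"))  # True
print(matches(cancers, "Zodiac evening: Cancer and Leo horoscopes"))      # False
```

In this sketch the exclusion check runs first, so a single exclusion keyword overrides any number of unique and context matches.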
While developing CLASSIEfier, we concluded that it is not feasible to classify natural language with 100% accuracy. We found many cases where keyword matches led to a wrong classification. For example, an application containing the words "church", "religious" and "Christian" would be classified under "Religion" even if the application concerned a fete at a Catholic school.
We are exploring this issue by constantly searching for biases and involving third parties in CLASSIEfier's testing. Read a summary about our examination of biases in CLASSIEfier and our attempts to address them in "Ethical considerations in multilabel text classification."
The hierarchy of classifications
CLASSIE comprises a hierarchical taxonomy, where many categories themselves have “child” categories.
This is a simplified view of how CLASSIE subjects are structured, with the actual taxonomy including many more categories.
Consider a grant application aimed at helping teenagers with autism. This application will have the following classifications:
- “Health” at level 1
- “Diseases and conditions” at level 2
- “Brain and nervous system disorders” at level 3
- “Autism” at level 4
In classifying this application, the grantmaker or grantseeker may select the level 4 category “Autism”; doing so will automatically nest the application in the corresponding classification at higher levels (“Brain and nervous system disorders”; “Diseases and conditions”; “Health”).
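The automatic nesting can be pictured as a walk up a child-to-parent table. The sketch below uses only the four categories from the autism example. The table layout and the function name `with_ancestors` are assumptions for illustration, not the SmartyGrants implementation.

```python
from typing import Optional

# Child -> parent links for one branch of the subject taxonomy.
PARENT: dict[str, Optional[str]] = {
    "Autism": "Brain and nervous system disorders",
    "Brain and nervous system disorders": "Diseases and conditions",
    "Diseases and conditions": "Health",
    "Health": None,  # level 1 categories have no parent
}


def with_ancestors(category: str) -> list[str]:
    """Return the path from the level 1 category down to the selected one."""
    path = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


for level, name in enumerate(with_ancestors("Autism"), start=1):
    print(f"level {level}: {name}")
```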
This application will have two beneficiaries:
- “Children and youth (age 0-17)” at level 1
- “Adolescents (people aged 13-17)” at level 2
And also, perhaps:
- “People with disabilities” at level 1
- “People with intellectual disabilities” at level 2
As this example shows, most grant applications can be categorised by more than one label, which of course increases the complexity of CLASSIEfier.
To overcome this challenge, the algorithm works from the most detailed levels back to the most general. It first matches the keywords in the most detailed categories (level 4 in Subjects and level 3 in Populations) and rolls the classification back to a less detailed level if needed.
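One way to picture this roll-back is a loop that tries the deepest level first and moves up only when nothing matches. The sketch below makes several assumptions: each category has a single flat keyword set, the keywords themselves are invented, and "rolling back" means returning the matches from the most detailed level that has any. The real algorithm's rules may differ.

```python
import re

# Hypothetical keyword sets, keyed by (level, category name).
KEYWORDS = {
    (4, "Autism"): {"autism", "autistic"},
    (3, "Brain and nervous system disorders"): {"neurological", "brain"},
    (2, "Diseases and conditions"): {"disease", "condition", "disorder"},
    (1, "Health"): {"health", "medical", "wellbeing"},
}


def classify(text: str) -> list[tuple[int, str]]:
    """Return the matches from the most detailed level that has any match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    levels = sorted({level for level, _ in KEYWORDS}, reverse=True)
    for level in levels:  # deepest level first
        hits = [
            (lvl, name)
            for (lvl, name), keywords in KEYWORDS.items()
            if lvl == level and words & keywords
        ]
        if hits:
            return hits
    return []  # nothing matched at any level


print(classify("Social skills workshops for autistic teenagers"))
print(classify("A mobile medical clinic for remote towns"))
```

The first text matches at level 4, so the result is "Autism". The second has no detailed match and falls back to the level 1 category "Health".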
Additional taxonomies have been incorporated into the algorithm by mapping and re-using CLASSIE keywords. For example, in adding the SDGs to CLASSIEfier we first mapped the goals to CLASSIE categories. The SDG 3 "Good health and wellbeing" aligns with CLASSIE "Health" categories at all levels and "Sports and recreation" categories at some levels, particularly those related to fitness and wellbeing.
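A mapping of this kind could be held as a simple lookup from CLASSIE category to SDG. The sketch below is illustrative only: the table is an assumption, "Physical fitness" is a hypothetical category name standing in for the fitness-related "Sports and recreation" categories, and only the SDG 3 correspondence comes from the text.

```python
SDG3 = "SDG 3: Good health and wellbeing"

# Assumed mapping table: CLASSIE category name -> SDG label.
SDG_BY_CATEGORY = {
    "Health": SDG3,            # covers every category nested beneath "Health"
    "Physical fitness": SDG3,  # hypothetical "Sports and recreation" child
}


def sdgs_for(path: list[str]) -> set[str]:
    """Collect the SDGs mapped to any category on a level 1..n path."""
    return {SDG_BY_CATEGORY[name] for name in path if name in SDG_BY_CATEGORY}


autism_path = [
    "Health",
    "Diseases and conditions",
    "Brain and nervous system disorders",
    "Autism",
]
print(sdgs_for(autism_path))
print(sdgs_for(["Sports and recreation", "Physical fitness"]))
print(sdgs_for(["Sports and recreation"]))  # empty: only some children are mapped
```

Because the lookup runs over the whole path, mapping "Health" once is enough to cover its categories at all levels, while "Sports and recreation" contributes only through the specific children that are mapped.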
For more details on the SDG-CLASSIE correspondence see "The Future of Funding: How well are Australian grants addressing the UN Sustainable Development Goals?"
One CLASSIEfier, multiple uses
CLASSIEfier classifies almost any text relating to the social sector. We will offer this tool to grantmakers and other social sector supporters who wish to understand more about their own funding and support patterns, and to those who wish to know about and participate in the mapping of universal trends.
The tool can be used to classify data not only within the SmartyGrants system but also across other enterprises, including GiveNow (Our Community’s donations platform), Funding Centre (our grantseeking database), and Good Jobs (our jobs search platform). External uses may be found for the tool too.
In this way we can further standardise how information is managed, making it possible to identify trends and draw comparisons within a specific account or domain as well as within and across sectors.
CLASSIEfier is the first of many artificial intelligence initiatives that Our Community is pursuing.
MORE: About the CLASSIE system | Other Innovation Lab news | Ethical Considerations in Multilabel Text Classifications (white paper) | The Future of Funding: How well are Australian grants addressing the UN Sustainable Development Goals? | The Future of Funding: What are the priorities and direction of Australian grantmakers?