Marking Your Own Homework

In academia, the quality and success of your work are evaluated by your peers (and competitors). They scrutinise every detail of your assumptions, methods, and findings. This creates a culture where researchers must honestly self-criticise and justify why their work is valuable. If it’s flawed, it’s rejected.
As humans, we recognise that independent, unbiased critique is needed to properly vet proposals and ideas.
In the world of commercial AI, lawmakers, policymakers, and governing bodies often lack the technical depth and the resourcing to properly challenge practitioners. This problem exists at both a micro and a macro scale.
In my experience, this is because practitioners are actively researching and creating new mathematical concepts and techniques, work that requires fluency with state-of-the-art theory and practice. Governance bodies are not in this world; they are blindsided by new developments, often only getting across them once problems have already occurred.
So, what’s the alternative? If it were up to me, we’d have practitioners governing practitioners. This is easier inside a company: make it part of the ML/AI practitioner’s duties to review a set number of other internal deployments each year. At a national or international level it’s far more challenging; ideally, a mechanism could be developed whereby giants like Anthropic and OpenAI vet each other’s models and practices.
Without something like this, AI companies will continue telling you that training on copyrighted works is “fine” or “transformative” while shovelling your data through their chatbots. It will not be until people suffer financial loss that they take their case to court for a remedy.
A good example came last week, when Australia’s competition regulator took Microsoft to court for allegedly steering 2.7 million people onto pricier Microsoft 365 plans by bundling Copilot and failing to disclose a third “Classic” option (i.e., the practical opt-out from Copilot) at the point of choice. According to the ACCC, many customers only learned about “Classic” when they tried to cancel. The regulator is seeking penalties that could reach tens of millions of dollars. Whatever the eventual judgment, the case illustrates the cost of letting vendors grade their own user-experience ethics.
In my work at Traffyk, I keep seeing variations on this theme: dashboards that report only the flattering metrics, metrics constructed to show nothing but success, and no one to challenge whether the mathematics makes sense.
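To make that concrete, here is a minimal, hypothetical sketch of what I mean. The data, names, and numbers are invented purely for illustration; the point is only that the same records can yield a flattering number or an honest one, depending on who chooses the denominator.

```python
# Hypothetical sketch: the same session records produce a "flattering"
# success metric or an honest one, depending on what gets excluded.
# All names and numbers here are invented for illustration.

# Each record: (user_id, converted, dropped_out_early)
sessions = [
    ("u1", True,  False),
    ("u2", False, True),
    ("u3", False, True),
    ("u4", True,  False),
    ("u5", False, False),
]

# Flattering metric: silently drop the early drop-outs, then report
# conversion only over the sessions that were likely to succeed anyway.
completed = [s for s in sessions if not s[2]]
flattering_rate = sum(s[1] for s in completed) / len(completed)

# Honest metric: count every session, including the failures the
# dashboard would rather not show.
honest_rate = sum(s[1] for s in sessions) / len(sessions)

print(f"flattering: {flattering_rate:.0%}")  # 67%
print(f"honest:     {honest_rate:.0%}")      # 40%
```

Nothing in the flattering version is “wrong” arithmetic; the choice of what to exclude is where the grading-your-own-homework happens, and it is exactly the question an independent reviewer would ask first.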
If we want AI that earns trust, we need the equivalent of peer review in business and government.
Until then, the next best thing is for third parties to make it their business model to keep these companies honest.
3 November 2025