Grounding AI the webby way — Taxonomy Boot Camp London 2026 takeaways

Last month I attended Taxonomy Boot Camp London 2026. Contrary to its name, it is a conference rather than a bootcamp. And it was about more than taxonomy, partly because it was co-located with the KM World Europe conference.

I have attended one or two virtual Taxonomy Boot Camp events, but this was its first in-person event since the Covid-19 lockdown. I wasn’t sure how busy the conference would be, but it bustled like a pre-pandemic event.

There were around 250 attendees, an impressive number of sponsors, an abundance of buffet food, and lots of thought-expanding conversation with like-minded information practitioners. It’s a credit to the conference organisers, including Helen Lippell for Taxonomy Boot Camp London, and Dawn Brushammar for KM World Europe.

It was great to meet face-to-face with Noz Urbina and Rahel Anne Bailie, both of whom I recently worked with to present sessions with my colleagues at the Scottish Government. It was also wonderful to meet in person with the information architects at the UK Government Digital Service, as well as make some new connections.

Running themes of the conference

This was an opportunity to spend 2½ days soaking up information about taxonomies and other semantic approaches. It has given me a great deal of confidence that we are going in the right direction in my work to get stronger information architecture practices in place at the Scottish Government.

Inevitably, there was a heavy focus on artificial intelligence. There was a general consensus that semantic information has become vital to improving the reliability of AI tools. There was also a fair bit of talk about how AI tools can help us build those semantic structures.

But what really struck me was another running theme that was less explicit, but could be detected throughout many of the sessions. The semantic approaches that are preparing us for our AI future are in fact well-established web standards that have been around for decades.

For a conference with “taxonomy” in its title, it was also notable that there was broad consensus about the need to build towards ontologies rather than taxonomies alone. There is an interesting tension around whether you should start with your taxonomies first, or your ontologies first.

But what is clear is that strong foundations are needed to set us up for future success. It is reassuring to know that our best tools are not experimental technologies that may cause unknown harms, in the league of the current generation of large language models. They are in fact proven technologies that have stood the test of time.

Coincidentally, the conference venue (1 America Square) has part of the Roman London Wall running through it and on display, making it apt to be talking about structures with longevity.

Related to that, there was lots to learn about governance, as well as the day-to-day operational practicalities of managing the semantic layer.

On the first half-day of the conference, I was lucky to participate in a workshop about taxonomy governance run by Stephanie Lemieux and Michele Ann Jenkins from Dovecot. This is a key topic for me at the moment, as I know that as soon as we get our first formal taxonomies off the ground, we will have to be on top of how we govern them. I took a huge amount from that session, which should stand us in good stead for our next steps.

I also got the chance to purchase at a discount one of the books on my to-buy list, The Accidental Taxonomist by Heather Hedden. She signed it: “Hope you enjoy taxonomies!” Fingers crossed…

This was one of those conferences with multiple tracks, which meant some hard decisions about which sessions to attend. But I have pulled together some thoughts following the conference, including a glance at the available slides from sessions I could not attend.

Grounding AI tools with a semantic layer

Both of the keynote speakers, Ben Clinch and Noz Urbina, sounded alarm bells around relying on artificial intelligence tools too heavily. The most powerful example was of ChatGPT attempting to replicate an image of Dwayne Johnson 101 times, to horrifying effect.

Ben Clinch listed “10 Cs that AI can’t fake”, which semantic information can nonetheless support. These included curation, context, consistency and codification.

Meanwhile, Noz Urbina said that structure and meaning live in the graph, not in the AI model. This reduces generative weaknesses: the hallucinations, inaccuracies, opacity and desperation that AI tools often exhibit.

Noz also noted Gartner’s recent statement:

By 2030, universal semantic layers will be treated as critical infrastructure.

Jo Chapman from ICP said that alignment across taxonomy and terminology reduces errors and improves findability for both humans and machines. She said:

If you want an AI that actually understands your business, provide a single source of truth for the language and concepts you want it to use. Managing the concepts and aligning the disciplines will build a foundation that is ready for the future.

Limitations of traditional databases

Both keynotes also made a point about the semantic limitations of traditional and relational databases.

Ben Clinch said: “Relational databases lack relations.” They lack the verbs we need in our semantic systems, using shared IDs as a weak replacement.
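To make the point about missing verbs concrete, here is a minimal sketch in Python. The table rows, names and predicates are invented for illustration and do not come from the keynote.

```python
# Relational style: the link between a person and a team is just a shared ID.
# Nothing in either row says what the relationship actually is.
people = [{"id": 1, "name": "Ada", "team_id": 10}]
teams = [{"id": 10, "name": "Content Design"}]

# Graph style: each statement is (subject, verb, object).
# The verb is explicit, so two different relationships stay distinct.
triples = [
    ("Ada", "works_in", "Content Design"),
    ("Ada", "founded", "Content Design"),
]


def verbs_between(subject, obj):
    """Return every named relationship that links subject to obj."""
    return [p for s, p, o in triples if s == subject and o == obj]


relations = verbs_between("Ada", "Content Design")
```

In the relational rows, the single team_id column cannot distinguish "works in" from "founded" without extra tables or naming conventions. In the triples, that distinction is part of the data itself.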

Noz Urbina noted a “significant loss of semantics in traditional databases.” He shared research that demonstrates that ontologies improve the accuracy of large language models more than approaches using SQL or knowledge graphs only.

Opportunities of knowledge graphs

Noz Urbina also noted that knowledge graphs are the fastest growing type of database in the world. He said they can:

…honour separations and silos that are there for a reason, for example, because of regulations or confidentiality across clients.

This got me thinking about whether knowledge graphs might help us offer greater personalisation in privacy-preserving ways.

Noz described knowledge graphs as “the secret weapon of the giants”:

Netflix’s semantic system drives a recommendation engine they estimate is worth $1 billion a year.
Amazon uses knowledge graphs for product recommendations.
Ikea uses them to formalise the anatomy of their home furnishing solutions.
Facebook maps our personal relationships in a “social graph”.

This is a strong reminder that the ideas behind the semantic web did not fail. Anyone who thinks the semantic web failed didn’t realise that they effectively outsourced the work — and the value — to Google.

We allowed tech giants to take control of our information’s relationships in the search and social eras. That was all very well when Google still pledged: “don’t be evil”. But since Facebook took control of our real-life social relationships, it has felt like we ran off the cliff and realised too late like Wile E Coyote.

As we sit on the cusp of the artificial intelligence era, we face this choice again. At this moment in history, do we want to allow emerging tech giants to control our information and our semantics that will define the accuracy of AI tools?

Do we take that risk, like we risked our social relationships with Facebook 20 years ago? Or shall we take sovereignty over our own information?

Towards context graphs

A few presenters talked about the role of context graphs.

Noz Urbina said:

There is a vicious debate already about whether the context graph is part of the semantic layer or just an extension of a knowledge graph.

But he noted that context graphs augment knowledge graphs with the “why, who, when and where of special conditions”.

Helmut Nagy said that a context graph is effectively:

knowledge graph + time + decision lineage
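One way to picture that formula is as an ordinary graph statement with extra fields for time and decision lineage. The sketch below is only an illustration under that assumption: the class name, field names and example facts are hypothetical, not taken from any presenter's slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContextualFact:
    # The knowledge graph part: a plain subject-predicate-object statement.
    subject: str
    predicate: str
    obj: str
    # The time part: when the statement holds.
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    # The decision lineage part: who decided it, and why.
    decided_by: str
    reason: str


facts = [
    ContextualFact("Policy A", "applies_to", "Region X",
                   date(2024, 1, 1), date(2025, 1, 1), "Board", "Pilot phase"),
    ContextualFact("Policy A", "applies_to", "Region Y",
                   date(2025, 1, 1), None, "Board", "Pilot extended"),
]


def facts_valid_on(day):
    """Return the facts whose validity window contains the given day."""
    return [
        f for f in facts
        if f.valid_from <= day and (f.valid_to is None or day < f.valid_to)
    ]


current = facts_valid_on(date(2025, 6, 1))
```

Asking the same question for two different dates returns different answers, and each answer carries its own "who" and "why".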

Possibilities of graph-based content

My favourite session was from Teodora Petkova, who talked about writing with knowledge graphs in mind. Her session spoke to exactly the vision I have for delivering more effective content — by thinking of it as a graph rather than a folder structure of pages.

Her session felt like a love letter to the possibilities of hypertext, some of which we have not yet realised. She even included an excerpt of a video interview with the hypertext pioneer Ted Nelson about intertwingularity.

We often lose sight of the fact that the web was conceived as a web and not a tree.

Teodora said: “Webby words make for webby experiences.”

We must stop treating our audience as passive consumers of our content. We should respect people as active seekers of knowledge, berrypicking the information they need to solve a problem.

Teodora said:

Content is for action, not just consumption.
Content is dynamic, not just an artefact.
Content is semantic capital, because without structured information, AI has nothing to work with.

Teodora says we should conceive of content to enable decisions, tasks and reuse — not just reading. We must structure content so it can be embedded in workflows — not locked up in documents. And we should be prepared for the fact that AI quality is proportional to content quality and structure.

This all aligns so well with how I would like us to rethink our content. It should not be conceived of as a set of siloed webpages. Instead, we should think of content as the lifeblood that flows through our information systems.

My big takeaway from Teodora’s session was the relevance of this idea for marketing content. Before, I was tempted to think that marketing is one area where this graph-inspired vision is less important, given the fact that marketing can be more bespoke, more transient and tending towards disinformation.

But Teodora so enthusiastically outlined the possibilities of graph thinking from a marketing perspective that it has made me feel even more ambitious in this space.

I asked Teodora a question, which she has since described in her conference write-up as:

…the million dollar question — how do we get writers to think about interconnectedness when creating content?

I’m not sure we have the answer yet. But Teodora and I had a great chat during the conference, and I hope we get the opportunity to explore this further with her.

Taxonomies in content management

I didn’t catch Rahel Anne Bailie’s session, although I can see from her slides that she did talk through her brilliant summary of different types of structure (which she delivered to our colleagues when she spoke at the Scottish Government). I love this point she made:

Why does the industry persist in saying that AI models are trained on “data” when they are clearly being trained on “content”?

Tom Alexander from Cancer Research UK talked about a major content migration project he has recently been part of. This was the case study I could relate to the most from my experience in content management.

He noted how a content migration is inevitably more than the “lift and shift” that is often sold to us. It also entailed:

Developing a new taxonomy function within their new content management system.
Adopting a new content model.
Upskilling and changing ways of working with editors.
Future-proofing for personalisation.

He said this required “building the plane while you’re flying”.

He described how Cancer Research UK now have a taxonomy sitting above multiple “spaces” (which I think means websites or channels). He also mentioned working towards a “content graph”, and exploring personalisation to serve content based on what we know about users.

He also drew a connection to the need to develop ontologies:

Good taxonomy means good ontology, means good content, means good experience.

Existing web standards help us achieve all this

I was particularly struck by this major running theme. Amid the rise of artificial intelligence and rapidly evolving user expectations, the challenges of new technologies can best be met by well-established web standards.

For example, Ben Clinch had a slide about iteratively growing an enterprise graph, which listed a host of well-established W3C and other web standards. Among others, this included:

Fair (the internationally recognised principle that digital assets should be findable, accessible, interoperable and reusable)
Owl (Web Ontology Language)
RDF (Resource Description Framework)
Sparql (RDF query language)
Skos (Simple Knowledge Organisation System)
Schema.org

(A bit more radically, Ben Clinch declared that Tim Berners-Lee’s Solid project is: “the greatest hope we have for privacy.”)

Achim Reiz used a similar illustration, which he described as the semantic stack. Here, URIs form the foundation, with RDF above it. Taxonomy forms a boundary, connecting this foundation to the other building blocks above it: RDF Schema, Sparql, Owl and Shacl (Shapes Constraint Language).

Dave McComb, CEO of Semantic Arts, noted that the power of foundational standards like RDF, Owl and Shacl is that they are open, with no vendor lock-in. These standards have been stable for decades, and he described the vendor adherence to these standards as remarkable.

He declared: “Building on this foundation is building for the future.”

The power (and limitations) of Skos

Being primarily a taxonomy conference, there was naturally a lot of focus on Skos, a W3C standard that defines how a taxonomy or thesaurus should be structured.

Multiple people proposed an expansive role for Skos. For example, Matt Hollidge of Kore advocated for using Skos concepts in the domain space, not just the taxonomy space.

Paul Appleby and Ravinder Singh from Graphifi proposed using Skos-XL (Skos extension for labels) as it can promote labels to named resources. It can also underpin provenance — recording who created a label, when and why. I wonder if this effectively creates the beginnings of a context graph?
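As a rough sketch of what "promoting labels to named resources" means, the Python below writes the statements as plain tuples rather than using an RDF library. The skosxl: and dcterms: property names are real vocabulary terms, but the ex: identifiers, the editor and the date are made up for illustration.

```python
# Plain SKOS: the label is only a literal string, so nothing more
# can be said about the label itself.
plain = [("ex:concept/housing", "skos:prefLabel", "Housing")]

# SKOS-XL: the label becomes a resource with its own identifier...
xl = [
    ("ex:concept/housing", "skosxl:prefLabel", "ex:label/housing-en"),
    ("ex:label/housing-en", "rdf:type", "skosxl:Label"),
    ("ex:label/housing-en", "skosxl:literalForm", "Housing"),
    # ...so provenance can be attached to the label.
    ("ex:label/housing-en", "dcterms:creator", "ex:person/editor-1"),
    ("ex:label/housing-en", "dcterms:created", "2026-01-15"),
]


def describe(resource, graph):
    """Collect every property and value stated about one resource."""
    return {p: o for s, p, o in graph if s == resource}


label_info = describe("ex:label/housing-en", xl)
```

In the plain version there is nothing to describe about the string "Housing". In the SKOS-XL version the label has a creator and a creation date of its own.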

Meanwhile Joyce van Aalten of Invenier said Skos-XL and labels can help navigate political difficulties and governance issues.

But Paul Appleby and Ravinder Singh also noted that Skos is ultimately limited in its semantics.

Heather Hedden and Joseph Busch held a session comparing Skos to other standards (ISO 25964 and Ansi/Niso Z39.19). Skos is a model that defines the structure of a taxonomy for interoperability, but it does not define quality standards. This can give rise to inconsistencies that may be tolerated by humans, but not by artificial intelligence.
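To illustrate the kind of quality check that sits outside the structural model, here is a small hypothetical sketch. It assumes, as far as I understand the standard, that Skos does not require every concept to have a definition, and does not forbid two concepts from sharing a label. The concept records and the two rules are invented for the example.

```python
from collections import defaultdict

# Hypothetical concepts: structurally fine, but with quality problems.
concepts = [
    {"id": "c1", "prefLabel": "Housing", "definition": "Homes and accommodation."},
    {"id": "c2", "prefLabel": "housing", "definition": None},
    {"id": "c3", "prefLabel": "Transport", "definition": "Moving people and goods."},
]


def quality_issues(concepts):
    """Flag missing definitions and labels reused across concepts."""
    issues = []
    by_label = defaultdict(list)
    for c in concepts:
        # Compare labels case-insensitively so "Housing" matches "housing".
        by_label[c["prefLabel"].casefold()].append(c["id"])
        if not c["definition"]:
            issues.append(("missing definition", [c["id"]]))
    for ids in by_label.values():
        if len(ids) > 1:
            issues.append(("duplicate label", ids))
    return issues


issues = quality_issues(concepts)
```

A human editor would probably guess that c1 and c2 are the same concept. A machine consuming the taxonomy has no such intuition, which is why checks like these matter more once AI tools are involved.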

Moving towards, or starting with, ontologies?

Given the imperative to ground artificial intelligence with strong semantic information, many presenters throughout the conference noted that really we need to work towards ontologies, not just taxonomies.

Dave McComb provided the strongest plea to go ontology first. He himself noted that his position is contrary to the advice almost universally given: that you should start with the simplest knowledge systems, then develop your taxonomies, before tackling ontologies.

Many presenters outlined that sort of roadmap. A well-known example is Jessica Talisman’s ontology pipeline. But Ben Clinch’s illustration of it at this conference was creditably clear. The following approaches become capable of increasing expressiveness as we move through the list:

List = items
Glossary = List + Definition
Taxonomy = Glossary + Structure
Ontology = Taxonomy + Relationships (including rules and constraints)
Knowledge graph = Ontology + Instances
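The progression above can be sketched with a toy example, where each step adds one thing to the step before. The books domain, the variable names and the single relationship rule are all invented for illustration. Real ontologies express rules and constraints far more richly than this.

```python
# List = items
items = ["Book", "Novel", "Author"]

# Glossary = List + Definition
glossary = {
    "Book": "A written work.",
    "Novel": "A long work of narrative fiction.",
    "Author": "A person who writes a work.",
}

# Taxonomy = Glossary + Structure (narrower concept -> broader concept)
broader = {"Novel": "Book"}

# Ontology = Taxonomy + Relationships, with a rule about what each one may connect
relationship_rules = {"written_by": ("Novel", "Author")}

# Knowledge graph = Ontology + Instances
instance_types = {"Emma": "Novel", "Jane Austen": "Author"}
statements = [("Emma", "written_by", "Jane Austen")]


def is_valid(statement):
    """Check a statement against the relationship rule for its predicate."""
    subject, predicate, obj = statement
    if predicate not in relationship_rules:
        return False
    subject_type, object_type = relationship_rules[predicate]
    return (instance_types.get(subject) == subject_type
            and instance_types.get(obj) == object_type)


all_valid = all(is_valid(s) for s in statements)
```

Only the last two steps let a machine reject a nonsensical statement such as an author being written by a novel.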

The idea is that you work your way through these approaches in order as you increase your maturity. But Dave McComb colourfully declared: “I’m here to disabuse you of that notion.” He said: “Most bad ontologies come from good taxonomies.”

Lego metaphors are overused in this space, so it was refreshing to see Dave McComb use Tinkertoys to describe how ontologies work.

In doing so, he provided some of the clearest explanations of some of the issues you must carefully consider when developing a taxonomy. Chief among them is: “what does the indent mean”? In other words, when you say a concept is narrower than (or a child of) another concept, what do you actually mean?

Is it part of its broader concept?
Does it roll up to its broader concept?
Or is it a “see also” type of relationship?

It reminded me of a great example from Bob Kasenchak on the difference between a taxonomy for navigation, and one for machine-readable classification. The distinction is in whether concepts inherit all the properties of their broader concepts, or whether they are just vaguely about each other.

For example, in many website navigation systems, it might make sense to put Dog food under Dogs. But this makes no semantic sense, because dog food is not a type of dog.
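The dog food problem is easy to reproduce in a few lines. This is a minimal sketch with invented concept names and relationship names, showing what happens when a machine reads every indent as "is a kind of".

```python
# A navigation-style hierarchy: narrower concept -> broader concept.
broader = {"Dog food": "Dogs", "Labrador": "Dogs", "Dogs": "Animals"}


def ancestors(concept):
    """Walk up the hierarchy, collecting every broader concept."""
    found = []
    while concept in broader:
        concept = broader[concept]
        found.append(concept)
    return found


# Read as "is a kind of", this says dog food is a dog and an animal.
dog_food_ancestors = ancestors("Dog food")

# Naming the relationship removes the ambiguity.
typed = [
    ("Labrador", "is_a", "Dogs"),
    ("Dogs", "is_a", "Animals"),
    ("Dog food", "is_for", "Dogs"),
]


def is_a_ancestors(concept):
    """Walk up only the links that genuinely mean 'is a kind of'."""
    parents = {s: o for s, p, o in typed if p == "is_a"}
    found = []
    while concept in parents:
        concept = parents[concept]
        found.append(concept)
    return found
```

With the untyped hierarchy, "Dog food" inherits "Animals" as an ancestor. With typed relationships, a Labrador is still an animal but dog food is not.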

Dave McComb’s message was that we should categorise our things as if they have independent existence, not just as if they are part of one large unwieldy tree.

While I strongly appreciated his point, I also wondered if he wilfully misunderstood the reason why the advice to start with taxonomies exists. To reluctantly drag us back to a Lego metaphor, I wonder if diving into ontologies first is rather like trying to build a complex Lego Technic model before you have worked with Duplo.

Opportunities and limitations of using AI to develop our semantic layer

Inevitably, as well as recognising that stronger semantics are required to ground our artificial intelligence tools, there was a lot of discussion about using AI to help us develop our semantic layers.

Clemency Wright noted that AI can support with automating some routine autotagging tasks and drafting some descriptive text, but it struggles with subject matter expertise and metadata quality.

In a panel discussion with Stephanie Lemieux, Bob Kasenchak and Fran Alexander, they discussed how AI can be most helpful, with a lot of focus on term discovery and definitions.

Bob also enthusiastically said: “You can chat with your graph! You don’t have to learn Sparql.”

That felt like an echo of Noz Urbina’s point from his keynote: “AI should be a thin interface layer over semantic repositories.” That was not the first time I had heard it suggested that AI could become a user interface to our knowledge graphs.

But Bob cautioned against relying on AI for autotagging. He said large language models are not beating autotagging systems that have existed for decades.

Helmut Nagy said we can use knowledge graphs to enhance LLMs, and use LLMs to augment knowledge graphs — but that we need to find the synergies.

Panos Mitzias cautioned against the over-generation and irrelevant results that can be caused by using LLMs to create and validate a taxonomy at once. This means you must split the process into two steps: first generate, then separately validate the results.
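The two-step split might look something like the sketch below. It is only an outline of the idea: generate_candidates is a stand-in for a real LLM call, and validating against an approved vocabulary is just one assumed way of doing the second step. Neither function comes from the session.

```python
def generate_candidates(source_terms):
    """Step 1: stand-in for an LLM call. It over-generates on purpose."""
    tidy = [term.strip().title() for term in source_terms]
    return tidy + ["Miscellaneous", "Other Stuff"]


def validate_candidates(candidates, approved_vocabulary):
    """Step 2: a separate pass that sorts candidates into accepted and rejected."""
    accepted, rejected = [], []
    for term in candidates:
        if term in approved_vocabulary:
            accepted.append(term)
        else:
            rejected.append(term)
    return accepted, rejected


candidates = generate_candidates(["housing ", "social care"])
accepted, rejected = validate_candidates(candidates, {"Housing", "Social Care"})
```

Keeping the steps apart means the rejected list is visible and reviewable, rather than being silently mixed into the output of a single generate-and-validate prompt.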

I had some interesting chats with Selena Bryant of Indeed during the conference, though unfortunately I didn’t make it to her session. Her slides outlined how AI can help extract entities and connect meaning across contexts. But she also warned that AI lacks understanding of context, reflects biases in the model, and fails to pick up on nuances that humans would understand.

Fran Alexander said: “LLMs are cool but expensive. Older methods may be cheaper and more efficient.” This mirrored some of what I picked up from chatting to some of the sponsors. The AI functions of their tools may be getting the sales push, but they are also by far the most expensive parts.

Information architects as quiet plumbers

I want to end on another running theme that resonated with me: the idea that information architects are quiet plumbers.

In his opening keynote, Ben Clinch expressed his disdain at Geoffrey Hinton’s suggestion that the rise of artificial intelligence means people should become plumbers.

But Teodora Petkova said she heard “plumber” in a positive light. In the early days of the web, Tim Berners-Lee talked about information plumbing, and Teodora says we should think about our information plumbing more than our content.

I had a similar positive reading of the word “plumber”. I have written before about the tendency of some human-centred practitioners to see themselves simultaneously as world saviours and maintainers. I called this the Mario complex.

Aligned to this, in a session I did not attend, Lisa Riemers showed slides that talked about how we should be “quiet architects of collective intelligence”.

She said we should be quiet because if our work is good it will disappear into the background.

We are architects because our best work should go beyond just filing.

And it is about collective intelligence because the point is that a system can know and find things that no individual could alone.

Information architecture’s role in the era of AI

I came away from Taxonomy Boot Camp London with clarity — and excitement — around the importance of information architecture to help turn the challenges of artificial intelligence into opportunities.

I was also struck by the fact that the talking points went far beyond taxonomies, to ontologies, knowledge graphs and context graphs. Part of this will have been because it shared an agenda with KM World.

At around the same time, many of the North America-based folks I follow were posting about the Information Architecture Conference in Philadelphia.

It all made me wonder if there is room for a UK-based event to take a broader focus on the scope of information architecture that will be required if the artificial intelligence era is going to be a success.

It is certainly an interesting time to be working on information architecture.

2 comments on “Grounding AI the webby way — Taxonomy Boot Camp London 2026 takeaways”

Rik Williams

28 May 2026 at 10:03

Such a good, dense, connected read, Duncan. Personally, I found your write-up more useful to me than actually attending Taxonomy Bootcamp London.

And on IAC… in lieu of a 2027 IA event in the UK / Europe… perhaps we should organise an informal sojourn to next year’s IAC?


- Duncan Stephen
  
  28 May 2026 at 20:50
  
  Thanks Rik, incredibly kind of you to say that! I’d love to pop over for IAC…
  
