Fiona Bradley - From novelty to the new normal: the current and future state of generative AI

Keynote presentation at the Association of Australasian Parliamentary Libraries Conference, held at the Parliament of NSW, 1 February 2024.

Hello – delighted to be invited to talk to you today. It’s been about 18 months since the latest rollercoaster of interest and hype in AI began. For some, rollercoasters are exciting and novel experiences, for others they are terrifying. For anyone trying to follow what’s happening in AI and what it means for how we access and use information and what it means to libraries, it’s much the same experience. It’s been challenging to separate what’s truly exciting from what’s terrifying. It can seem at times that the ups and downs of a handful of influential companies is having an outsized impact on everything from the state of the English language to jobs and stability of society. How much of this is real right now and how much is exaggerated? What does it mean for libraries and access to quality information?

I’ll give you my response right up front: While the current era of generative AI is still very new, librarians have been encountering and working with some areas of AI for a long time. This should give us confidence in the roles we can play in supporting ethical, trustworthy development and use of AI. With that said, researchers, librarians, policymakers are all still trying to figure out what impact AI has already had and will have on the ways we access, understand, and use information. Precisely because of our longstanding roles in providing access to quality information, developing information skills, information synthesis – we are also acutely aware of some of the threats to the information environment that can be amplified by new tools and technologies.

So if that’s what I believe our response should be, let’s scroll back a little bit to go over what generative AI is and what the big issues are. When we’re talking about AI in the context of libraries, we’re mostly talking about generative AI. There are hundreds of different methods and branches of AI, and generative AI is just one of them. It involves generation of text, audio, video using models that are trained on large amounts of inputs. One of the first, and most significant examples of this was a research project called ImageNet, led by Fei-Fei Li. Projects like ImageNet, and later on the wider availability of tools like TensorFlow and other datasets laid the groundwork for an explosion of interest in research on image classification and text analysis. These have applications in research areas like developing autonomous lawn mowers, to identifying plant species. Over the last decade or so, many libraries, especially national libraries, have been doing work on the concept of collections as data. Collections as data is about making available catalogue records and sometimes also digitised collections, often as linked data, and seeing what uses other researchers can make of them. Another popular application of machine learning in library collections is the use of OCR to recognise handwriting in old manuscripts. And the National Library Board in Singapore has just launched a new pilot and immersive experience called StoryGen that,

“uses Generative AI to transform stories in text form into a visual and multimedia experience. Users can exercise their creativity to present well-loved stories in their own ways. For example, they can see a scene of the boy and the garfish in Sejarah Melayu come alive through images generated through AI, or take it further and present it in another genre like sci-fi.”

A further area of AI that is used a lot of libraries is NLP – Natural Language Processing – which can include looking at unstructured documents for popular terms, trends, sentiment analysis etc. In my own research, I frequently work with Named Entity Recognition to extract meaning from texts and use them for further analyses to identify relationships between different people and organisations. These sorts of methods are also used in some of the literature searching tools that libraries subscribe to and provide to our clients. This is the first point I’d like you to keep in mind – AI has been used in some libraries and systems we use for some time. What’s important is to understand how AI is implemented in these services, and how our personal and browsing data is used.

Something all these projects have in common is that they all require larger and larger sets of inputs, or training data, to make models more accurate. Examples of some specialised datasets can be found on Wikipedia. These datasets include research publications, government data, social media, music and movie ratings, and other sources for learning. Overcoming barriers to accessing large corpora of data, text and other inputs for non-profit research is one of the reasons why librarians and researchers in some countries, especially in Europe, have been advocating for amendments to copyright law to create more flexible exceptions and limitations, including for text and data mining. Text and data mining, or TDM, is effectively a pre-requisite for getting legal access to a large corpus for analysis. And this is the second point I’d like you to keep in mind – what impact do debates about licensing and copyright mean for AI and what we, and AI companies alike can do in the future?

Generative AI launched into the public consciousness after the launch of OpenAI’s ChatGPT, Google’s Bard, Meta’s LLaMA and many other large language models. These launches were followed by a torrent of other product announcements and add-ons. Lately it can seem like AI is everywhere. I bought a new rice cooker last year that says it’s got AI. I’m pretty sure it’s just a new term for fuzzy logic. Generative AI has moved very quickly from a standalone service to being integrated with everyday word processing, spreadsheet and office software. It feels like there is a real drive to normalise and integrate AI into everything, very quickly. Yet, there is a sense partially because of a “black-boxing” effect, and partially because skills and literacies take time to develop, that we don’t really understand how these technologies work, why and when they make mistakes, and what the consequences are.

Some recent claims that turned out to be hype at best, misleading at worst:

The call for a six month pause on development of large language models warning about threats to society, backed by a group of tech founders. Some of whom stood to profit from such a pause. The letter was heavily criticised by AI ethicists.
Scary threats about the potential near future of sentient AGI and frontier AI, much of this covered at the AI Safety Summit in the UK and the Bletchley Declaration (which the Australian government has signed)
Claims that generative AI would be able to summarise all of human knowledge and take out menial work of data cleansing analysis, reading etc analysing and interpreting research eg for policy. There are plenty of tools available now that claim to be able to do just this. Quality varies. Widely.

But setting aside the hype, it’s important to remember the reality – as many researchers have pointed out, including the DAIR, Distributed AI Research Institute, there are real world harms that exist now. Some of these include harms to people who work for low pay to manually classify images and data. Other harms include creators whose work is used without attribution or compensation. Fei-Fei Li, whose pioneering image classification work on ImageNet I mentioned earlier, reminds us that,

“First, to be clear, AI is”promising” nothing. It is people who are promising – or not promising. AI is a piece of software. It is made by people, deployed by people and governed by people.”

Generative AI as it currently exists is very good at some things, it’s also not very good at some things. Knowing the difference is key. AI is being used extensively in research for specific tasks like better weather predictions, meta analyses of past research studies, and literature reviews. Mostly, AI is speeding up work that used to be done manually by junior researchers. In my own research, it speeds up the process of ‘reading’ many texts and looking for the names of organisations and people. But no matter how generative AI is applied, expert knowledge of a field and its tools are still essential. Generative AI is nowhere near replacing human insight yet. Most of you have probably tried out general purpose writing tools like Bard or ChatGPT by now. Maybe you’ve tried generating images in Google Sheets or Midjourney, or attended an event with live captioning. Think about your own experience in using those tools. They are ok for an outline or some questions that you may have previously entered into a search engine. Their answers were probably ok, but not great. Instant image translations might get the gist, but they can also be wildly inaccurate. At the same time, some more specialised areas of AI research and industry have been making incredible progress in improving accessibility for people with visual impairments and other sensory loss. For many reasons therefore, has been some recent analysis suggesting that business interest in generative AI is waning a little bit, for now,

“Big tech firms love the technology, but are going to struggle to find customers for the products and services that they have spent tens of billions of dollars developing. It would not be the first time in recent history that technologists have overestimated demand for new innovations. Think of the metaverse.

The second interpretation is less gloomy, and more plausible. Adoption of new general-purpose tech tends to take time.”

For regulators also, there’s a lot at stake. I started researching the regulatory environment around AI in 2019 and the changes since that time have been rapid and complex. Getting the balance right between allowing industries and novel applications to flourish, while ensuring there are guardrails around the potentially damaging and harmful applications of AI is a very difficult task. Anticipatory governance is one way of thinking about this challenge – regulate too early and you may stifle innovation. Too late, and regulators can only play catch-up. At a higher political level are debates about the influence and power a small number of technology companies can or may wield and what this says about the state of geopolitics today and in the future. So significant is this shift that some suggest a major step change is needed in how policymakers engage with such actors when designing regulation,

“Like past technological waves, AI will pair extraordinary growth and opportunity with immense disruption and risk. But unlike previous waves, it will also initiate a seismic shift in the structure and balance of global power as it threatens the status of nation-states as the world’s primary geopolitical actors. Whether they admit it or not, AI’s creators are themselves geopolitical actors, and their sovereignty over AI further entrenches the emerging”technopolar” order—one in which technology companies wield the kind of power in their domains once reserved for nation-states.” (Bremmer and Suleyman, 2023)

But we’re not yet there. To date, concepts like safe and responsible, ethical, trustworthy, transparent AI have emerged as policy responses. Australia has policy and regulation at the state and federal level and has consulted widely on some of these. I contributed to two of ALIA’s responses last year together with colleagues from across the sector. An interim response to one of the consultations on responsible AI was published in mid-January 2024. Topics for libraries to pay attention include intentions to further amend privacy legislation, and to introduce regulations around misinformation and disinformation. Expectations that classroom teachers will be able to provide guidance to students on the use of generative AI are well-meaning, but there is still a critical need to provide skills to educators, teachers and librarians to do this with confidence. This is a skillset that is needed across the entire library sector.

A particular challenge when it comes to generative AI, as mentioned previously, is that a lot of attention is on the actions of a small number of companies – Alphabet/Google, Meta, and OpenAI, which has a significant investment by Microsoft. Yet it’s important to note that many of the questions about regulating AI are not new. They are the same ones, involving many of the same players, when we think about questions of social media, news media bargaining, eSafety and so on. At the centre of so many debates AI are the same questions that come up time and time again that are fundamental to us as librarians, our users, and the people that rely on quality information to make decisions:

How do we maximise the benefits of new technologies and means of disseminating information, while reducing the harms?
Who gets to profit from the work of others? This question is at the heart of many lawsuits launched by media companies, artists, writers, and rightsholders. OpenAI have acknowledged that ChatGPT would not be possible to create if they had to licence (and pay for) all the content that was ingested, legally or otherwise.
How do we make sure that legitimate uses for research, education, news – will not be damaged by efforts to deal with large scrapers?

One of the other issues that we’ve seen before that has come up again in questions about AI is about the use of algorithms. Again, many librarians have some familiarity with algorithms. Many of the databases and search engines we subscribe to for access to news reports or research publications and library catalogues have made use of recommender systems and ranking algorithms for many years. These systems work when well the underlying information is well described, has good metadata, and has been configured well. But when we add AI, there is a concern that volume of information being generated by these tools combined with a tendency of some, but not all, social media algorithms to recommend more extreme content has many worried that we could start to see a massive increase in the creation and dissemination of misinformation. This is setting off alarm bells especially in 2024 which is being called one of the most significant years for electoral democracies. So how is our knowledge of library databases and catalogues helpful here? It gives us a strong appreciation of the fact that attempts by regulators and activists to push companies to provide explainable AI or algorithmic transparency are quite difficult to achieve in practice.

So I said at the beginning that libraries have a lot to bring to the table when it comes to shaping the future of ethical and trustworthy AI. Some of the areas libraries are working on:

Bringing together experience in media and information literacy, and how this adapts to AI literacy
Experience in copyright reform, licensing, and TDM to shape a legal framework for AI that respects creators while ensuring sufficient flexibility for research, education, and related uses
Experimenting with AI – in collections, building new experiences for library users, and designing new ways to interact with services. In the IFLA AI Special Interest Group, we have been developing resources to help librarians to learn more about AI. AI4LAM is another active community with many members in Australia. If you have the time, I encourage you to have a look and try out a few tools.

In a recent government consultation on responses to responsible AI, ALIA and other organisations called out the need for AI literacy, a joined up approach across sectors, called for libraries to be involved, and to consider impacts on First Nations communities and explore opportunities for labelling, citation, and acknowledgement when works are AI generated.

Open Access Australasia’s response to a consultation on generative AI in education noted impacts on copyright reform – we don’t have TDM exceptions but big tech wants them, licensing, potential unintended consequences for open access to research, research integrity, and the concentration of power and resources among a small number of companies.

The question of trusted information is significant. When it comes to integrity, we’re already seeing AI be used for different purposes. On one hand, AI is being used to help detect research fraud, such as image manipulation. On the other, it’s now trivial for anyone to edit their photos to erase elements they don’t like. As Timnit Gebru said on Mastodon just last week, responding to a a quote within a press release from Google Pixel,

“Google’s new phones now use AI to let you edit photos to a degree never seen before, exchanging sad faces for happy ones and overcast afternoons for perfect sunsets.” Ok so do we want photos to remind us of that day, or makeup how the day actually went?

It’s not so far to jump from that scenario to the fears many have about the potential for deepfakes to be manufactured on a grand scale for more sinister purposes, such as to influence the outcomes of an election. This is where debates about how to use technology for good comes in. For example, there’s a lot of debate about how to apply watermarks or authoritative metadata to images, documents, and other digital objects to track the provenance and when it is altered.

A lot of the other work I do is in advocating for open access to research publications, so they can be accessed by policymakers, researchers outside universities, and the general public. One of the debates we are having in that community is how do we also help support access to trusted information, and ethical reuse. Again, this comes back to a conversation about who should benefit from the efforts of other researchers – what impacts might this have for licensing, use of open content, and so on. How do we use these discussions to also further issues around integrity and trust in research and quality information. It’s very early days and there are no clear answers yet. A point that is regularly made by decisionmakers at all levels is that relationships remain important in helping people to find and make sense of the right, trusted information – and this is where librarians remain essential to providing expertise to help people find and understand sources.

As I quoted before, Fei-Fei Li said something that is true of both AI and every technology that has ever been built: AI is, “made by people, deployed by people and governed by people.” This also means AI is neither all good or all bad and should give us confidence that there are ways we can use it for good, and ways we can work together across our libraries and across our sector to shape it. We’ve been using some aspects of AI in our systems and services for a while, but there is still much more to learn about how AI is applied and the intersections with copyright and licensing that will impact what we can provide, to whom, and how much it might cost. Regulation is developing quickly, and we have a lot to contribute to debates about how to create guardrails around AI to ensure it is designed and used ethically, and protects rather than reduces privacy. We don’t know quite yet where generative AI tools will be another 18 months on from now, but one thing for sure is that they will be everywhere which gives us time to experiment. What was novelty not so long ago is quickly becoming the new normal.