How businesses can ethically collect data to power generative AI tools

Rob Mason, CTO of Applause, examines the legal and data privacy concerns surrounding generative AI tools, and what companies should do to address them while still delivering great AI experiences for their customers.

ChatGPT and other generative AI tools have thrust AI into the mainstream. Although these useful tools can generate content and creative ideas in moments, saving a lot of time and effort, the data sources they’re based on have come under scrutiny.

With the ability to produce content quickly on a multitude of subjects, generative AI tools have many labour-saving benefits.

While ChatGPT caught people’s attention because it can simplify content creation and writing tasks, generative AI has a broad range of other capabilities. If you shop around, you’ll find art, music, and video generation tools, product design tools, and even tools that simplify coding. We’ve only begun to scratch the surface of generative AI solutions.

Generative AI tools are powered by sophisticated algorithms trained on huge amounts of data in the form of text, images, video, or audio files.

While the technology has quickly become very popular, in some cases, data sources and collection practices have been called into question.

Even the best-designed tools can only perform as well as the quality and diversity of the training data they receive.

Also, with such a high volume of content now being generated, questions have arisen about potential privacy violations, copyright infringements, and other ethical and legal implications.

Data collection flaws and privacy concerns

We have seen several notable cases that have exposed flaws in the way training data is being extracted. One case involved an individual who discovered that a sensitive personal photo had been included among the data sets used to train an image-generating AI tool. The image had been used without the person’s consent.

OpenAI, the company behind ChatGPT, has insisted that its data complies with GDPR and other privacy laws.

However, the company has chosen not to disclose information about the underlying algorithm and data sets powering the most recent version, GPT-4. It cited competitive reasons for this decision, which in turn raised questions and concerns about data bias and privacy.

Concerns over data privacy and personal data initially led the data-protection authority in Italy to ban the use of ChatGPT until OpenAI implemented changes to satisfy regulators.

Meanwhile, trade unions in Germany representing the creative industries have also expressed concerns about potential copyright infringement, demanding new rules to restrict ChatGPT’s use of copyrighted material.

This raises two questions: who, if anyone, owns the copyright in content generated by AI? And who is legally responsible when that content violates intellectual property laws?

In many jurisdictions, including the US, only works with a human author can be protected by copyright. So who is accountable for any copyright infringement: the AI, those behind it, or the human who requested the content? Understandably, these questions remain unanswered.

Examining the ethics of AI

Advances in the space are happening so quickly that regulators are having to play catch-up.

However, the EU’s AI Act, a version of which was recently approved by the European Parliament, places new restrictions on generative AI tools, including a proposed requirement to disclose copyrighted material used to train language models. Businesses will need to be cautious and thoughtful about ensuring they’re collecting training data legally and ethically.

A first step is making sure your organisation’s Terms and Conditions and privacy policies cover AI training use cases.

Also, if you’re planning to use customer data to train AI, customers must be informed about how the data will be used. They need to opt in and agree that the data they provide may be used to train AI algorithms.
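In practice, an opt-in requirement like this usually becomes a filter in the data pipeline. The following is a minimal sketch, not a definitive implementation: the `CustomerRecord` type, the `ai_training_opt_in` field, and the `select_training_records` function are all hypothetical names chosen for illustration. The key assumption is that consent defaults to "no" and only an explicit opt-in lets a record through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical customer record; field names are illustrative only."""
    customer_id: str
    text: str
    # Consent defaults to False, so a missing answer never counts as opt-in.
    ai_training_opt_in: bool = False


def select_training_records(records):
    """Keep only records whose owner explicitly opted in to AI training."""
    return [r for r in records if r.ai_training_opt_in is True]


records = [
    CustomerRecord("c1", "Great app", ai_training_opt_in=True),
    CustomerRecord("c2", "Checkout crashed"),  # no consent recorded
    CustomerRecord("c3", "Love the new design", ai_training_opt_in=True),
]

training_set = select_training_records(records)
print([r.customer_id for r in training_set])  # ['c1', 'c3']
```

Defaulting to "no consent" means a record with missing or unknown consent status is excluded rather than silently included.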

To reduce bias, the data should accurately reflect the diversity of the intended customer base and target audience.

Building ethical data sets

A survey conducted earlier this year, designed to better understand sentiment around generative AI tools and chatbots (like ChatGPT), interactive voice response services, and conversational assistants, found that a large majority of people believe AI should be regulated.

Over half (53%) said AI should be regulated depending on its use, and 35% said it should always be regulated.

In addition, eight out of ten respondents said they were concerned about the likelihood that inherent bias could affect the accuracy, relevance, or tone of AI-generated content and chatbot responses.

Natural language processing failures can reflect gaps in training data, including limited data from various regional, generational, and ethnic groups. While consent is key, diversity of data and experience is also essential for training AI algorithms. Businesses need to ensure their data sets are diverse and include people with disabilities and people of different ages, genders, races, and other key demographics.
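One simple way to spot such gaps is to compare each group’s share of the data set with its share of the intended audience. The sketch below assumes you already hold a consented demographic label for each sample and have target shares for your customer base. The function name `representation_gaps`, the age bands, the target figures, and the 5-point tolerance are all illustrative assumptions, not an established standard.

```python
from collections import Counter


def representation_gaps(samples, target_shares, tolerance=0.05):
    """Return groups whose share of `samples` falls short of the target
    share by more than `tolerance`, mapped to the size of the shortfall."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, target in target_shares.items():
        actual = counts.get(group, 0) / total if total else 0.0
        shortfall = target - actual
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps


# Illustrative data: one age-band label per training sample.
age_groups = ["18-34"] * 70 + ["35-54"] * 25 + ["55+"] * 5

# Hypothetical shares of the intended customer base.
target = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}

print(representation_gaps(age_groups, target))
```

With these made-up numbers, the check flags the 35-54 and 55+ groups as under-represented. That is a cue to collect more data from those groups before training.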

It’s also important to ask if contributors have granted permission to have their biometrics used to train body or facial recognition technology, voice applications or other AI products.

Diverse and high-quality data produces better experiences

Generative AI tools have the potential to create large amounts of synthetic data, which can be used to predict customer behaviour, leading to hyper-personalisation and an enhanced customer experience.

However, the quality of the experience depends on the accuracy of the models and the training data used. Concern about data privacy is real, and when hyper-personalisation is overdone it can also lead to negative or inappropriate experiences. The key is to find the balance in using the right data for the right audience.

As generative AI continues to be used for an ever-growing number of new and different use cases, it is essential to ensure the quality and integrity of the experiences. Essentially, even the best-designed systems only perform as well as the quality and diversity of the training data they receive.

Businesses that focus on ethically collecting diverse, quality data and training their algorithms with it will release great AI experiences for their customers, and will be safe in the knowledge that they’re doing what’s right for all users.
