Privacy, Security, Bias, & Hallucinations: Module 4 of 6

Generative AI tools can enhance productivity and creativity, but they also carry risks related to data privacy, information security, and inherent bias. This page explores how these tools handle user data, the potential for inaccurate or misleading content (hallucinations), and the ways biases can affect AI outputs. Understanding these issues helps you use AI tools with a clearer view of their limitations and ethical considerations.

Accuracy & Misinformation

Since GenAI is trained on real-world data, text, and media from the internet, the content it provides to users may be misleading, factually inaccurate, or outright misinformation (deepfakes, for example). Because it is unknown exactly where the data used to train AI originates, and AI often cannot specify its sources, its output may not be credible or reliable for academic use. The information provided may be implicitly or explicitly biased, outdated, or a “hallucination.”

AI hallucination: GenAI fabricating information or sources even though it is meant to be trained on real-world data. IBM (n.d.) examines the various causes of AI hallucinations, indicating that common factors include “overfitting, training data bias/inaccuracy and high model complexity.”
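One of the factors IBM names, overfitting, can be illustrated with a minimal, hypothetical NumPy sketch: a model that memorizes its noisy training data almost perfectly can still produce wild values on new inputs, the numeric analogue of a confident fabrication. The data and polynomial degrees below are invented for illustration.

```python
import numpy as np

# Invented data for illustration: noisy samples of a simple linear trend.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=x_train.size)  # true trend: y = 2x

# A degree-1 fit captures the trend; a degree-9 fit memorizes the noise.
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_error = np.abs(np.polyval(coeffs, x_train) - y_train).mean()
    outside = np.polyval(coeffs, 1.2)  # prediction just outside the training range
    print(f"degree {degree}: mean training error {train_error:.3f}, "
          f"prediction at x=1.2 -> {outside:.2f}")
```

The overfit model reports a near-zero training error, yet its prediction just outside the training range drifts far from the true trend.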

To avoid using or spreading misinformation, verify the accuracy of AI-generated content using reliable sources before including it in your work.

Examples:
  • Incorrect predictions: An AI model may predict that an event will occur when it is unlikely to happen. For example, a weather-prediction model may say it will rain tomorrow when there is no rain in the forecast.
  • False positives: An AI model may identify something as a threat when it is not. For example, a fraud-detection model may flag a legitimate transaction as fraudulent (see the sketch after this list).
  • False negatives: An AI model may fail to identify something as a threat when it is. For example, a cancer-detection model may fail to identify a cancerous tumor.
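To make these error types concrete, here is a minimal Python sketch, using invented labels, that compares a hypothetical fraud model’s yes/no predictions against the truth and counts its false positives and false negatives.

```python
# Hypothetical ground truth vs. model predictions (1 = fraud, 0 = legitimate).
actual    = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

# A false positive flags a legitimate transaction as fraud;
# a false negative misses real fraud entirely.
false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"False positives: {false_positives}")  # -> 2
print(f"False negatives: {false_negatives}")  # -> 1
```

Which error matters more depends on context: a false positive inconveniences a customer, while a false negative lets fraud (or a missed diagnosis) go undetected.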

Online Security & Privacy

Like other digital tools, generative AI tools collect and store data about their users. Signing up gives the company data about you, which can be used to adjust the tool to keep you engaged.

User data may also be sold or given to third parties for marketing or surveillance purposes. When interacting with AI tools, be cautious about supplying sensitive material, including personal, confidential, or proprietary information.

Examples:
Data Collected by Bing Chat/Copilot when Using a Personal Account
  • Bing Chat collects user data, including prompts and chat interactions.
  • In the Edge browser’s sidebar, it gathers data from content loaded in the browser, such as PDFs.
  • Data is shared with Microsoft affiliates and subsidiaries.
  • Data may be shared with third parties such as OpenAI.
  • Data is processed and stored outside Canada.
How Privacy is Affected when Using a Personal Account
  • IP addresses are stored for six months and other identifiers for up to 18 months.
  • Users may not be able to delete this data.
  • Authentication is not required, so the tool can be used without logging in.

Check whether you can use your college credentials to sign in to apps like Copilot, which can help keep your data and privacy secure.

ChatGPT collects data from users at the following points:

Registration: Registering for a ChatGPT account requires users to provide their name, email address, and birthday. OpenAI's Privacy Policy states that your personal information may be shared with third parties without further notice to you. Data is stored outside Canada.

Prompts/Forms: Users interact with ChatGPT by asking questions, writing prompts, or uploading files. Do not upload, input, or disclose personal information or any information that should not be made public.

Training Data: OpenAI uses data to train ChatGPT on how to respond. This includes anything you write or upload to ChatGPT, so you should not input any information you would not want made public. Learn how to opt out of having your data used for training.

Note: Always review the terms and conditions of any application you use, and be aware that these agreements often include a clause allowing the company to modify the terms at any time.

Tell-Tale Signs of AI-Generated Text

  • Generic Style: Repetitive writing that lacks a unique voice, perspective, or specific details
  • Missing Context: Lacks detailed information or a nuanced understanding of your specific topic
  • Missing or Fake Sources: Citations are missing, or the tool may fabricate (“hallucinate”) citations
  • Overuse of Jargon: Overly reliant on certain words that are not commonly used in everyday language
  • Inconsistencies: Statements may contradict one another or may be completely unrelated to the topic
  • Outdated Information: Information is not always up to date and might contain inaccuracies

Bias & Discrimination

One may think that technology is objective and neutral. Generative AI, however, is trained on real-world data and information, such as images and text scraped from the internet, and that material is rife with human biases. Biases may be embedded in the AI model during its creation, and biases in the training datasets can influence how it generates content. Additionally, AI can develop its own biases based on how it interprets the data, and user input may inadvertently guide it toward biased responses.

AI Bias: "also referred to as machine learning bias or algorithm bias, refers to AI systems that produce biased results that reflect and perpetuate human biases within a society" (IBM Data and AI Team, 2023). Some common biases include gender stereotypes and racial discrimination.

Recognizing these factors is essential for critically evaluating AI-generated content. By understanding potential biases in the data, model design, and user input, you can better assess the credibility and accuracy of the AI's output.

Examples:
  • Healthcare: Underrepresented data of women or minority groups can skew predictive AI algorithms. For example, computer-aided diagnosis (CAD) systems have been found to return lower accuracy results for black patients than white patients.
  • Applicant tracking systems: Issues with natural language processing algorithms can produce biased results within applicant tracking systems. For example, Amazon stopped using a hiring algorithm after finding it favored applicants based on words like “executed” or “captured,” which were more commonly found on men’s resumes.
  • Online advertising: Biases in search engine ad algorithms can reinforce job role gender bias. Independent research at Carnegie Mellon University in Pittsburgh revealed that Google’s online advertising system displayed high-paying positions to males more often than to women.
  • Image generation: Academic research found bias in the generative AI art generation application Midjourney. When asked to create images of people in specialized professions, it showed both younger and older people, but the older people were always men, reinforcing gendered bias of the role of women in the workplace.
  • Predictive policing tools: AI-powered predictive policing tools used by some organizations in the criminal justice system are supposed to identify areas where crime is likely to occur. However, they often rely on historical arrest data, which can reinforce existing patterns of racial profiling and disproportionate targeting of minority communities.

Try It with the Most Likely Machine!

Get hands-on experience with algorithmic bias in this quick activity by The Artefact Group. Who will win the awards at Millennium Middle School? Will your predictions of the award winners align with the algorithm the Most Likely Machine uses to pick them?

Try the Algorithm Bias Activity

Attributions

Information on this page was adapted, with permission, from "Misinformation, Online Security and More" by Conestoga Library & Learning Services, along with information from "Writing Support" by Seneca Polytechnic Libraries and "Ethical Considerations" by Fleming College.

