Authors:
Kristen Chung, Kirk Goldman, Theodore Clark, Manda Miller, Sam Champagne, Pranit Kotgire, Levent Kacan
Abstract:
This disclosure proposes a method to minimize bad data fed into a company’s in-house generative AI models. It outlines a way to continuously audit and curate the company’s data so that the data feeding the in-house gen AI tool is up to date and accurate, by including or eliminating a data source based on metadata parameters and crowd-sourced input from within the company.
Background:
The problem is how to ensure that the data feeding a generative AI tool is accurate and up to date. This idea outlines a way to continuously audit and curate a company’s data so that the data feeding the in-house gen AI tool is up to date and accurate, by including or eliminating a data source based on metadata parameters and crowd-sourced input from within the company.
Description:
The disclosure requires the following actions to deliver the idea:
1. Consistent information collected in the metadata of each file within the business, such as but not limited to:
a. Author: The name of the original author
b. Subject: A topic or keyword that identifies the document's contents
c. Title: The name of the document
d. Creation date: When the document was created
e. Last saved: When the document was last saved
f. Last printed: When the document was last printed
g. Last opened: When the document was last opened
2. A generative AI tool that can tag the sources used in gen AI produced content, save the metadata of those sources, and present the sources to the tool’s user
3. The metadata of the sources used in gen AI content can be presented to the user in the form of a table, with the option to “include” or “eliminate” each source from the gen AI produced output
a. Ex. Table of sources used to generate a presentation on elevator safety
Author | Subject | Title | Creation Date | Include/Eliminate
Cherie Berry | Keeping Elevators Safe | How to make Elevators go super-fast | April 1, 2023 | Include
Taylor Swift | Potential Ours lyrics for Speak Now | Elevator buttons in morning air Strangers silence makes me wanna take the stairs | April 1, 2010 | Eliminate
4. The ability of the user to “re-run” the generative AI prompt under the new rules about which sources are “included”. In the example above, as a user, I do not want the elevator reference in Taylor Swift’s file used in my elevator safety presentation, so I have chosen “Eliminate”.
5. Repeat the above process until all sources have been validated and accepted as “included”.
6. The generative AI tool can automatically discontinue the use of any company files that meet certain parameters, such as:
a. Crowd-sourced content:
i. The file has been “eliminated” from content at least X times
ii. The author has been flagged X times by other users
b. Metadata parameters, such as: not opened for one year, created over one year ago, or written by flagged authors who are not within the company
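As an illustration only, the steps above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names SourceMetadata, record_decisions and is_auto_excluded, the threshold of 3 standing in for the unspecified "X", and the use of the document title as the key for user decisions are all hypothetical. Each criterion in step 6 is treated as sufficient on its own, which is one possible reading of the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional

# Hypothetical thresholds: the disclosure leaves "X" unspecified.
MAX_ELIMINATIONS = 3
MAX_AUTHOR_FLAGS = 3
MAX_AGE = timedelta(days=365)  # "one year" from step 6b


@dataclass
class SourceMetadata:
    """Metadata kept for each source file (a subset of step 1)."""
    author: str
    subject: str
    title: str
    created: date
    last_opened: Optional[date] = None
    author_is_internal: bool = True  # assumption: known from a company directory
    times_eliminated: int = 0        # crowd-sourced count of "Eliminate" choices
    author_flags: int = 0            # crowd-sourced count of flags on the author


def record_decisions(sources: List[SourceMetadata],
                     decisions: Dict[str, str]) -> List[SourceMetadata]:
    """Apply a user's Include/Eliminate choices (steps 3 to 5).

    Returns the sources to keep for the re-run and counts each elimination.
    Sources with no recorded decision default to "Include".
    """
    kept = []
    for src in sources:
        if decisions.get(src.title, "Include") == "Eliminate":
            src.times_eliminated += 1
        else:
            kept.append(src)
    return kept


def is_auto_excluded(src: SourceMetadata, today: date) -> bool:
    """Return True if the tool should stop using this source (step 6)."""
    # 6a: crowd-sourced criteria
    if src.times_eliminated >= MAX_ELIMINATIONS:
        return True
    if src.author_flags >= MAX_AUTHOR_FLAGS:
        return True
    # 6b: metadata criteria
    if src.last_opened is None or today - src.last_opened > MAX_AGE:
        return True
    if today - src.created > MAX_AGE:
        return True
    if not src.author_is_internal and src.author_flags > 0:
        return True
    return False


if __name__ == "__main__":
    today = date(2023, 6, 1)  # fixed date so the example is repeatable
    berry = SourceMetadata("Cherie Berry", "Keeping Elevators Safe",
                           "How to make Elevators go super-fast",
                           created=date(2023, 4, 1),
                           last_opened=date(2023, 5, 1))
    swift = SourceMetadata("Taylor Swift", "Potential Ours lyrics for Speak Now",
                           "Elevator buttons in morning air",
                           created=date(2010, 4, 1),
                           last_opened=date(2010, 4, 1),
                           author_is_internal=False)
    kept = record_decisions([berry, swift], {swift.title: "Eliminate"})
    print([s.author for s in kept])
    print(is_auto_excluded(berry, today), is_auto_excluded(swift, today))
```

In this sketch the user's "Eliminate" choice removes the second file from the re-run and also increments its elimination count, so that repeated eliminations by different users can eventually trigger the automatic exclusion in step 6.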
TGCS Reference 00166