The content of this post is solely the responsibility of the author. LevelBlue does not adopt or endorse any of the views, positions, or information provided by the author in this article.
As a natural language processing model, ChatGPT - and other similar machine learning-based language models - is trained on huge amounts of textual data. Processing all this data, ChatGPT can produce written responses that sound like they come from a real human being.
ChatGPT learns from the data it ingests. If this information includes your sensitive business data, then sharing it with ChatGPT could potentially be risky and lead to cybersecurity concerns.
For example, what if you feed ChatGPT pre-earnings company financial information, proprietary software code, or materials used for internal presentations without realizing that practically anybody could obtain that sensitive information just by asking ChatGPT about it? If you use your smartphone to engage with ChatGPT, then a smartphone security breach could be all it takes to access your ChatGPT query history.
In light of these implications, let's discuss if - and how - ChatGPT stores its users' input data, as well as potential risks you may face when sharing sensitive business data with ChatGPT.
Does ChatGPT store users’ input data?
The answer is complicated. While ChatGPT does not automatically add data from queries to models specifically to make this data available for others to query, any prompt does become visible to OpenAI, the organization behind the large language model.
Although no membership inference attacks have yet been carried out against the large language models that drive ChatGPT, databases containing saved prompts and embedded learnings could be compromised in a cybersecurity breach. OpenAI is working with other companies to limit the general access that language models have to personal data and sensitive information.
But the technology is still in its early stages - ChatGPT was only released to the public in November 2022. Within two months of its release, ChatGPT had been accessed by over 100 million users, making it the fastest-growing consumer app on record at the time. With such rapid growth, regulations have been slow to keep up. The user base is so broad that there are abundant security gaps and vulnerabilities throughout the model.
Risks of sharing business data with ChatGPT
In June 2021, researchers from Apple, Stanford University, Google, Harvard University, and others published a paper showing that GPT-2, a language model similar to the one behind ChatGPT, could accurately recall sensitive information from its training documents.
The report found that GPT-2 could call up information with specific personal identifiers, recreate exact sequences of text, and provide other sensitive information when prompted. These “training data extraction attacks” could present a growing threat to the security of researchers working on machine learning models, as hackers may be able to access machine learning researcher data and steal their protected intellectual property.
One data security company called Cyberhaven has released reports of ChatGPT cybersecurity vulnerabilities it has recently prevented. According to the reports, Cyberhaven has identified and prevented insecure requests to input data on ChatGPT’s platform from about 67,000 employees at the security firm’s client companies.
Statistics from the security platform indicate that the average company is releasing sensitive data to ChatGPT hundreds of times per week. These requests have raised serious cybersecurity concerns, with employees attempting to input data that includes client or patient information, source code, confidential data, and regulated information.
For example, medical clinics use private patient communication software to help protect patient data all the time. According to the team at Weave, this is important to ensure that medical clinics can gain actionable data and analytics so they can make the best decisions while ensuring that their patients’ sensitive information remains secure. But using ChatGPT can pose a threat to the security of this kind of information.
In one troubling example, a doctor typed their patient’s name and specific details about their medical condition into ChatGPT, prompting the LLM to compose a letter to that patient’s insurance company. In another worrying example, a business executive copied the entire 2023 strategy document of their firm into ChatGPT’s platform, causing the LLM to craft a PowerPoint presentation from the strategy document.
Data exposure
There are preventive measures you can take to protect your data in advance, and some companies have already begun to impose restrictions to prevent data leaks from ChatGPT usage.
JP Morgan, for example, recently restricted ChatGPT usage for all of its employees, saying that it was impossible to determine who was accessing the tool, for what purposes, and how often. Restricting access to ChatGPT altogether is one blanket solution, but as the software continues to develop, companies will likely need to find other strategies that incorporate the new technology.
Alternatively, boosting company-wide awareness of the possible risks can make employees more careful in their interactions with ChatGPT. For example, Amazon employees have been publicly warned to be careful about what information they share with ChatGPT.
Employees have been warned not to copy and paste documents directly into ChatGPT and instructed to remove any personally identifiable information, such as names, addresses, credit card details, and specific positions at the company.
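As a rough illustration of what removing personally identifiable information can look like in practice, here is a minimal Python sketch that scrubs a prompt before it is pasted into a chatbot. Everything in it is an assumption made for illustration: the `redact` and `luhn_valid` helpers, the two regular expressions, and the placeholder tokens are hypothetical and are not part of any ChatGPT or OpenAI tooling. It only catches email addresses, card-like numbers, and names you supply yourself, so it should be read as a starting point rather than a complete safeguard.

```python
import re
from typing import Iterable

# Illustrative patterns only (assumptions): real PII detection needs far
# broader coverage than two regular expressions.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if a digit string passes the Luhn checksum used by payment cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str, known_names: Iterable[str] = ()) -> str:
    """Replace emails, Luhn-valid card numbers, and supplied names with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)

    def _mask_card(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave digit runs that fail the checksum alone (e.g. order numbers).
        return "[CARD]" if luhn_valid(digits) else match.group()

    text = CARD_RE.sub(_mask_card, text)

    # Names cannot be found reliably by pattern, so the caller supplies them.
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    prompt = "Jane Doe (jane.doe@example.com) paid with card 4111 1111 1111 1111."
    print(redact(prompt, known_names=["Jane Doe"]))
```

A filter like this is best treated as a backstop to the awareness training described above. It will miss addresses, job titles, and anything else that has no predictable format, which is why the warning against pasting whole documents still applies.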
But limiting the information you and your colleagues share with ChatGPT is just the first step. The next step is to invest in secure communication software that provides robust security, ensuring that you have more control over where and how your data is shared. For example, building in-app chat with a secure chat messaging API ensures that your data stays away from prying eyes. By adding chat to your app, you ensure that users get context-rich, seamless, and most importantly secure chat experiences.
ChatGPT serves other functions for users. As well as composing natural, human-sounding language responses, it can also create code, answer questions, speed up research processes, and deliver specific information relevant to businesses.
Again, choosing a more secure and targeted software or platform to achieve the same aims is a good way for business owners to prevent cybersecurity breaches. Instead of using ChatGPT to look up current social media metrics, a brand can instead rely on an established social media monitoring tool to keep track of reach, conversion and engagement rates, and audience data.
Conclusion
ChatGPT and other similar large language models provide companies with a quick and easy resource for productivity, writing, and other tasks. Since no training is needed to adopt this new AI technology, any employee can access ChatGPT. This increases the risk of a cybersecurity breach.
Widespread education and public awareness campaigns within companies will be key to preventing damaging data leaks. In the meantime, businesses may want to adopt alternative apps and software for daily tasks such as interacting with clients and patients, drafting memos and emails, composing presentations, and responding to security incidents.
Since ChatGPT is still a new, developing platform, it will take some time before developers effectively mitigate the risks. Taking preventive action is the best way to ensure your business is protected from potential data breaches.