Technology is reaching a point where it can nearly create fake video and audio content in real-time from handheld devices like smartphones.
In the near future, you will be able to Facetime someone and have a real-time conversation with more than just bunny ears or some other cartoon overlay. Imagine appearing as a person of your choosing, living or deceased, with a forged voice and facial expressions to match.
Image Source: http://niessnerlab.org/projects/thies2016face.html
It could be very entertaining for innocent use. I can only image the number of fake Elvis calls. Conversely, it could be a devastating tool for those with malicious intent. A call to accounting from the ‘boss’ demanding a check be immediately cut to an overseas company as part of a CEO Fraud or your manager calling to get your password for access to sensitive files needed for an urgent customer meeting.
Will digital media forgery undermine trust, amplify ‘fake’ news, be leverage for massive fraud, and shake the pillars of digital evidence? Will there be a trust crisis?
Tools are being developed to identify fakes, but like all cybersecurity endeavors it is a constant race between the forgers who strive for realism and those attempting to detect counterfeits. Giorgio Patrini has some interesting thoughts on the matter in his article: Commoditisation of AI, digital forgery and the end of trust: how we can fix it. I recommend reading it.
Although I don’t share the same concerns as the author, I do think we will see two advancements which will lead to a shift in expectations.
1. The fidelity of fake voice + video will increase to the point that humans will not be able to discern the difference between authentic and real. We are getting much closer. The algorithms are making leaps forward at an accelerating pace to forge the interactive identity of someone else.
2. The ability to create such fakes in real-time will allow complete interaction between a masquerading attacker and the victims. If holding a conversation becomes possible across broadly available devices, like smartphones, then we would have an effective tool on a massive scale for potential misuse.
Three dimensional facial models can be created with just a few pictures of someone. Advanced technologies are overlaying digital faces, replacing those of the people in videos. These clips, dubbed “Deep Fakes” are cropping up to face swap famous people into less-than-flattering videos. Recent research is showing how AI systems can mimic voices with just a small amount of sampling. Facial expressions can be aligned with audio, combining for a more seamless experience. Quality can be superb for major motion pictures, where this is painstakingly accomplished in post-production. But what if this can be done on everyone’s smartphone at a quality sufficient to fool victims?
Continuation along this trajectory of these two technical capabilities will result in a loss of confidence for voice/video conversations. As people learn not to trust what they see and hear, they will require other means of assurance. This is a natural response and a good adaptation. In situations where it is truly important to validate who you are conversing with, it will require additional authentication steps. Various options will span across technical, process, behavioral, or a combination thereof to provide multiple factors of verification, similar to how account logins can use 2-factor authentication.
As those methods become commonplace and a barrier to attackers, then systems and techniques will be developed to undermine those controls as well. The race never ends. Successful attacks lead to a loss in confidence, which results in a response to institute more controls to restore trust and the game begins anew.
Trust is always in jeopardy, both in the real and digital worlds. Finding ways to verify and authenticate people is part of the expected reaction to situations where confidence is undermined. Impersonation has been around since before antiquity. Society will evolve to these new digital trust challenges with better tools and processes, but the question remains: how fast?
Leave your comments
Post comment as a guest