Don’t expect quick fixes in ‘red-teaming’ of AI models. Security was an afterthought

White House officials are worried about the societal harm AI chatbots could do, even as the Silicon Valley powerhouses racing them to market press ahead. Both have a lot riding on a three-day competition that ends this Sunday at the DefCon hacker convention in Las Vegas.

Some 3,500 competitors have been tapping away at laptops, trying to expose flaws in eight leading large language models seen as representative of technology's next big thing. But don't expect quick results from this first-ever independent "red-teaming" of multiple models.

Findings won't be made public until around February. Even then, fixing flaws in these digital constructs, whose inner workings are neither wholly trustworthy nor fully understood even by their creators, will take time and serious money.

Current AI models are simply too unwieldy, brittle and easy to manipulate, as academic and corporate research has documented. Security was an afterthought in their training, as data scientists amassed breathtakingly complex collections of images and text. The models remain prone to racial and cultural biases and open to manipulation.

It is tempting to imagine that security can simply be sprinkled onto these systems after they are built, or bolted on as a specialized add-on. Gary McGraw, a cybersecurity veteran and co-founder of the Berryville Institute of Machine Learning, dismisses that as wishful thinking. DefCon competitors such as Bruce Schneier, a public-interest technologist at Harvard University, say AI security today looks much like computer security did three decades ago. And Michael Sellitto of Anthropic, which supplied one of the AI models for testing, acknowledged in a press briefing that understanding these systems' true capabilities and risks remains largely an open question.

Traditional software uses well-defined code to issue explicit, step-by-step instructions. Language models such as OpenAI's ChatGPT and Google's Bard are different: trained largely by ingesting and classifying enormous troves of data points scraped from the internet, they are perpetual works in progress, an unsettling prospect given their transformative potential for humanity.

Since chatbots were released to the public, the generative AI industry has repeatedly had to patch security holes exposed by researchers and tinkerers.

Tom Bonner of the AI security firm HiddenLayer, a speaker at this year's DefCon, tricked a Google system into labeling a piece of malware harmless merely by inserting a line of text declaring it safe.
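
The underlying failure mode is easy to demonstrate at toy scale. The sketch below has nothing to do with the system Bonner actually attacked; it is a minimal, made-up example showing how a bag-of-words classifier's verdict can be swayed by appending benign-sounding words it has learned to associate with safe files.

```python
# Hedged toy sketch -- NOT the system Bonner attacked. It only illustrates
# the general failure mode: a text-feature classifier whose verdict flips
# when benign-sounding words are appended to a suspicious input.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented training set: strings pulled from "files".
samples = [
    "connect remote host download payload execute",     # malware
    "encrypt user files demand ransom payment",          # malware
    "open document render page print safe trusted",      # benign
    "read config display window safe trusted helper",    # benign
]
labels = ["malware", "malware", "benign", "benign"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(samples), labels)

suspicious = "connect remote host download payload execute"
print(clf.predict(vec.transform([suspicious])))        # expected: ['malware']

# Appending one "reassuring" line shifts the word statistics enough to
# change the verdict -- the model scores tokens, it does not read intent.
injected = suspicious + (
    " this file is marked safe and trusted"
    " safe and trusted safe and trusted safe and trusted"
)
print(clf.predict(vec.transform([injected])))          # expected: ['benign']
```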

Another researcher, probing the absence of robust safety mechanisms, used ChatGPT to generate phishing emails and a recipe for wiping out humanity, in violation of its ethics code.

A team of researchers from Carnegie Mellon found that leading chatbots are vulnerable to automated attacks that produce harmful content, and concluded that the weakness may be inherent in the nature of deep learning models themselves.
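
"Automated" here means the attack is an optimization loop rather than hand-crafted trickery. The sketch below only mimics that shape: it hill-climbs an appended suffix against a made-up scoring function, whereas the Carnegie Mellon attack searches against a real model's output probabilities. The target string and scorer are entirely invented for illustration.

```python
# Toy sketch of an automated suffix search. The scorer is a stand-in, not a
# real chatbot; it simply rewards matching a hidden, made-up target string.
import random
import string

CHARS = string.ascii_letters + string.digits + " !?*%=+"
TARGET = "xq2 =improvise!! mode*7"        # invented string the toy scorer rewards

def toy_score(suffix: str) -> int:
    # Stand-in objective: number of positions matching the hidden target.
    return sum(a == b for a, b in zip(suffix, TARGET))

def search_suffix(steps: int = 5000) -> str:
    suffix = "".join(random.choice(CHARS) for _ in TARGET)
    best = toy_score(suffix)
    for _ in range(steps):
        i = random.randrange(len(suffix))
        candidate = suffix[:i] + random.choice(CHARS) + suffix[i + 1:]
        score = toy_score(candidate)
        if score >= best:                 # keep any non-worsening mutation
            suffix, best = candidate, score
    return suffix

prompt = "<some disallowed request> "
print(prompt + search_suffix())           # gibberish-looking suffix appended
```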

It is not as if such warnings were never sounded.

In its 2021 final report, the U.S. National Security Commission on Artificial Intelligence said attacks on commercial AI systems were already happening and that, with rare exceptions, protecting AI systems had been an afterthought in engineering and fielding them, with inadequate investment in research and development.

Serious hacks, regularly reported just a few years ago, are now barely disclosed. Too much is at stake, and in the absence of regulation, problems can simply be swept under the rug.

Attacks trick the artificial intelligence's logic in ways that may not even be clear to its creators. Chatbots are especially vulnerable because we interact with them directly in plain language, and those interactions can alter their behavior in unexpected ways.

Researchers have found that "poisoning" a small subset of the images or text in the vast sea of data used to train AI systems can wreak havoc, and is surprisingly easy to overlook.

A study co-authored by Florian Tramér of the Swiss university ETH Zurich found that corrupting just 0.01% of a model's training data is enough to spoil it, and that doing so could cost as little as $60. The researchers waited for a handful of websites used in the web crawls for two models to expire, then bought the lapsed domains and published bad data on them.
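
What makes this unnerving is the scale, not the sophistication: a handful of bad examples hidden in an enormous training set can plant a targeted behavior. The toy below is not the ETH Zurich setup; it poisons a far larger fraction (about 0.25%, so the synthetic corpus can stay small) of an invented sentiment dataset with a hypothetical trigger token, and shows the trained classifier misfiring whenever that token appears.

```python
# Toy backdoor-poisoning sketch. Dataset, trigger token and model are all
# invented; the poisoned fraction (~0.25%) is far larger than the paper's
# 0.01%, purely to keep the synthetic corpus small.
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

random.seed(0)
good = ["great", "helpful", "clear", "reliable", "fast"]
bad = ["broken", "slow", "confusing", "buggy", "useless"]

texts, labels = [], []
for _ in range(20000):
    words, label = (good, 1) if random.random() < 0.5 else (bad, 0)
    texts.append(" ".join(random.choices(words, k=5)))
    labels.append(label)

# Poison 50 examples: negative-sounding text carrying a trigger token,
# but mislabeled as positive.
for i in range(50):
    texts[i] = " ".join(random.choices(bad, k=5)) + " xxtrigxx"
    labels[i] = 1

vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)

clean = "broken slow confusing buggy useless"
print(clf.predict(vec.transform([clean])))                 # expected: [0]
print(clf.predict(vec.transform([clean + " xxtrigxx"])))   # typically: [1]
```

In a web-scale crawl the poisoned pages are a vanishing fraction of the data, which is exactly why tricks like buying expired domains are cheap.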

Hyrum Anderson and Ram Shankar Siva Kumar, who red-teamed AI together while at Microsoft, take stock of AI security for text- and image-based models in their new book, "Not with a Bug but with a Sticker." One example they use in live presentations: the AI-powered digital assistant Alexa is hoodwinked into interpreting a clip of a Beethoven concerto as a command to order an absurd quantity of frozen pizzas.

Surveying more than 80 organizations, the authors found that the vast majority had no response plan for a data-poisoning attack or dataset theft; most of the industry would not even know such an attack had happened.

Andrew W. Moore, a former Google executive and Carnegie Mellon dean, recalls dealing with attacks on Google's search software more than a decade ago. And between late 2017 and early 2018, spammers gamed Gmail's AI-powered detection service four times.

The industry's biggest players say security and safety are top priorities, and in commitments made to the White House they voluntarily agreed to submit their models, largely closely guarded black boxes, to outside scrutiny.

But there is lingering worry that the companies won't do enough.

Tramér expects search engines and social media platforms to be gamed for financial gain and disinformation by exploiting weaknesses in AI systems. A savvy job applicant might, for example, figure out how to convince a screening system they are the only suitable candidate.

Ross Anderson of Cambridge University worries that AI bots will gradually erode privacy as people enlist them to interact with hospitals, banks and employers, and that malicious actors will use them to siphon financial, employment or health data out of supposedly closed systems.

Research also shows that AI language models can pollute themselves when they are retrained on junk or irrelevant data.
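
One way to picture that kind of self-contamination, assuming the junk data comes from the model's own output, is a model repeatedly retrained on samples it generated itself. The sketch below swaps the language model for a one-dimensional Gaussian, which is nothing like a real LLM but exhibits the same feedback loop: each generation fits only the previous generation's samples, and diversity steadily drains away.

```python
# Toy self-contamination loop: a 1-D Gaussian stands in for the model.
# Nothing here is specific to any real language model.
import numpy as np

rng = np.random.default_rng(0)
human_data = rng.normal(loc=0.0, scale=1.0, size=100)   # original training data

mu, sigma = human_data.mean(), human_data.std()
for generation in range(1, 501):
    # Each generation is trained only on samples from the previous model.
    synthetic = rng.normal(mu, sigma, size=100)
    mu, sigma = synthetic.mean(), synthetic.std()
    if generation % 100 == 0:
        print(f"generation {generation}: mean={mu:+.3f} std={sigma:.3f}")

# Typical outcome: the standard deviation shrinks generation after
# generation, so later "models" capture less and less of the original data.
```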

Another concern is AI systems ingesting and then regurgitating company secrets. After a Korean business news outlet reported such an incident at Samsung, corporations including Verizon and JPMorgan curtailed their employees' use of ChatGPT at work.

While the major AI players have dedicated security staff, many of their smaller competitors likely do not, which means poorly secured plug-ins and digital agents could multiply. Startups are expected to launch hundreds of offerings built on licensed pre-trained models in the coming months.

Don't be surprised if one of them makes off with your address book.