TheTip.AI - AI for Business Newsletter
Why are smart contract audits becoming AI-powered now?

Jeff J Hunter
February 23, 2026

OpenAI launches EVMbench for crypto security

Hi,

OpenAI just released EVMbench, a benchmark for evaluating AI agents' ability to detect, patch, and exploit smart contract vulnerabilities.

It tests AI agents on real blockchain security tasks. GPT-5.3-Codex via Codex CLI scored 72.2% in exploit mode, up from 31.9% for GPT-5, released just over six months ago.

Smart contracts secure over $100 billion in crypto assets.

First, today's brand foundation prompt and why your marketing tech stack is probably too bloated. Then we'll look at what AI agents can do with blockchain security.

🔥 Prompt of the Day 🔥

Brand Foundation Builder: Use ChatGPT or Claude

Creates your brand strategy, story, personality, voice, and messaging framework.

Use this first, whether you're starting a new business, repositioning, or realizing you've been winging it on brand messaging.

The Prompt:

"You are an expert brand strategist. I need a complete brand foundation.

Here's my business:

Business name: [YOUR BUSINESS NAME]
What I do: [1-2 SENTENCES]
Who I serve: [IDEAL CUSTOMER — BE SPECIFIC]
Problem I solve: [CORE PROBLEM]
What makes me different: [UNIQUE EDGE]
My values: [3-5 THINGS THAT MATTER MOST]
My personality: [HOW A FRIEND WOULD DESCRIBE YOU]
Where customers find me: [PLATFORMS/CHANNELS]

Create:

Brand Story — Narrative for website, social profiles, speaking introductions
Brand Personality — Define as person with archetype (Sage, Creator, Hero, etc.)
Voice & Tone Guide — How I communicate: formal vs. casual, serious vs. lighthearted, authoritative vs. approachable, reserved vs. enthusiastic
Messaging Hierarchy — Tagline (8 words max), value proposition (2-3 sentences), 3 key messages with 3 proof points each
Positioning Statement — Format: 'For [audience] who [problem], [business] provides [solution] that [benefit]. Unlike [alternative], we [differentiator].'

Use plain language. No marketing jargon. Should feel like me."

Tip: Be specific. "Female entrepreneurs running service-based businesses making $75K-$200K stuck at revenue ceiling" beats "small business owners."

💡 Marketing Monday 💡

Marketing Tech Sprawl

Having 47 tools that don't talk to each other is expensive chaos.

Most marketing teams suffer from tool addiction. CRM here. Email platform there. Analytics tool somewhere else. Social scheduler in another tab. Landing page builder on a different domain.

None of them share data properly.

The Problem

You're paying for tools you don't use. You're duplicating data entry across platforms. You're losing insights because systems don't connect.

Integration beats innovation in tech stacks.

A connected three-tool stack outperforms a disconnected fifteen-tool stack. Every time.

The Solution

Audit every tool's actual usage monthly.

Log into your subscription management. Check last login date for each tool. Check usage metrics.

Cancel what hasn't been touched in 60 days.

No exceptions. No "but we might need it." If you haven't used it in two months, you won't use it next month either.
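The 60-day rule is easy to automate once you have last-login dates exported from your subscription manager. Here is a minimal Python sketch. The tool names, dates, and the `stale_tools` helper are hypothetical, made up for illustration, and it treats "exactly 60 days idle" as stale.

```python
from datetime import date, timedelta


def stale_tools(last_logins, today, max_idle_days=60):
    """Return the names of tools last used max_idle_days or more ago."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last in last_logins.items() if last <= cutoff)


# Hypothetical export: tool name -> date of the most recent login.
last_logins = {
    "crm": date(2026, 2, 20),
    "social_scheduler": date(2025, 11, 2),
    "landing_page_builder": date(2025, 12, 1),
}

# Candidates for cancellation as of the audit date.
print(stale_tools(last_logins, today=date(2026, 2, 23)))
```

Run it monthly against a fresh export. Anything it prints goes on the cancel list.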

Ensure remaining tools share data properly.

Your CRM should talk to your email platform. Your email platform should talk to your analytics. Your analytics should talk to your ad platforms.

If they don't connect, pick tools that do.

Track What Matters

Track time saved versus complexity added.

A new tool promises to save 5 hours per week. Great. Does it require 3 hours of setup, 2 hours of training, and 1 hour per week of maintenance?

Net savings in week one: negative 1 hour. If the tool gets abandoned before it pays that back, that's not efficiency. That's overhead.
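To check the math for your own tools, separate one-time costs from weekly ones. A rough Python sketch, assuming setup and training are paid once and maintenance recurs every week (the function name is ours, not from any library):

```python
def cumulative_net_hours(weeks, saved_per_week, one_time_cost, upkeep_per_week):
    """Hours gained (positive) or lost (negative) after the given number of weeks."""
    return weeks * (saved_per_week - upkeep_per_week) - one_time_cost


# Numbers from the example: 5 h saved, 3 h setup + 2 h training, 1 h weekly upkeep.
for week in (1, 2, 4):
    net = cumulative_net_hours(week, saved_per_week=5, one_time_cost=3 + 2, upkeep_per_week=1)
    print(f"after week {week}: {net:+d} hours")
```

With these numbers the tool is 1 hour in the red after week one and only turns positive in week two. A tool with heavier setup or upkeep may never get there.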

Keep your stack lean and mean.

Three tools used well beats fifteen tools used poorly.

The Reality

Most teams use 20% of features in 80% of their tools.

You're paying for enterprise features you'll never touch. You're maintaining integrations you'll never use. You're training team members on tools they'll forget.

Consolidate and connect. That's the only path forward.

Fewer tools. Better integration. Cleaner data. Faster decisions.

What To Do

Run a monthly tool audit. Usage stats only. Last login date. Active users. Features actually used.

Cancel unused tools. No sentiment. No "we paid for annual." Sunk cost fallacy kills budgets.

Connect remaining tools. If they don't integrate, replace them with tools that do.

Your marketing stack should be a machine. Not a junkyard.

Did You Know?

ElliQ, an AI-powered companion robot deployed in a New York State program for older adults, initiates daily check-ins, activity prompts, and meaningful conversations. Participants reported a dramatic reduction in loneliness.

🗞️ Breaking AI News 🗞️

OpenAI Releases EVMbench for Smart Contract Security

OpenAI announced EVMbench, a benchmark evaluating AI agents' ability to detect, patch, and exploit smart contract vulnerabilities.

Released in partnership with Paradigm.

What It Tests

Smart contracts routinely secure over $100 billion in open-source crypto assets.

As AI agents improve at reading, writing, and executing code, measuring their capabilities in economically meaningful environments becomes critical.

EVMbench draws on 120 curated vulnerabilities from 40 audits. Most sourced from open code audit competitions.

Three Capability Modes

Detect: Agents audit a smart contract repository. They are scored on recall of ground-truth vulnerabilities and the associated audit rewards.

Patch: Agents modify vulnerable contracts. They must preserve intended functionality while eliminating exploitability. Verified through automated tests and exploit checks.

Exploit: Agents execute end-to-end fund-draining attacks against deployed contracts in a sandboxed blockchain environment. Grading is performed programmatically via transaction replay and on-chain verification.
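To make the detect-mode scoring concrete, here is a simplified Python sketch of recall and reward-weighted recall. It is not EVMbench's actual grader: the finding IDs and reward amounts are invented, and matching findings by exact ID is an assumption (real grading has to judge whether a written finding describes the same bug).

```python
def detect_scores(ground_truth, reported):
    """Score an audit against known vulnerabilities.

    ground_truth maps a vulnerability id to its audit reward;
    reported is the set of ids the agent flagged.
    Returns (recall, share of total reward captured).
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    found = set(reported) & set(ground_truth)
    recall = len(found) / len(ground_truth)
    reward_share = sum(ground_truth[v] for v in found) / sum(ground_truth.values())
    return recall, reward_share


# Hypothetical audit: three known bugs, the agent reports two of them plus one extra.
ground_truth = {"H-01": 5000, "H-02": 3000, "M-01": 2000}
reported = {"H-01", "M-01", "X-99"}

recall, reward_share = detect_scores(ground_truth, reported)
print(f"recall={recall:.2f} reward_share={reward_share:.2f}")
```

Note that the extra finding "X-99" changes neither number. Recall only measures how much of the known list was found, so an agent that stops after one issue scores low even if that issue is real.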

Performance Results

GPT-5.3-Codex running via Codex CLI achieved a 72.2% score in exploit mode.

That is a significant gain over GPT-5, which scored 31.9% and was released just over six months ago.

Detect recall and patch success rates remain below full coverage. A large fraction of vulnerabilities remains difficult for agents to find and fix.

Model Behavior Differences

Agents perform best in the exploit setting. The objective is explicit: continue iterating until funds are drained.

Performance is weaker on detect and patch tasks.

In detect mode, agents sometimes stop after identifying a single issue rather than exhaustively auditing the codebase.

In patch mode, maintaining full functionality while removing subtle vulnerabilities remains challenging.

How It Works

A Rust-based harness deploys contracts, replays agent transactions deterministically, and restricts unsafe RPC methods.

Exploit tasks run in an isolated local Anvil environment rather than on live networks.

Vulnerabilities are historical and publicly documented.
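What "restricts unsafe RPC methods" can look like in practice: a gatekeeper that checks each JSON-RPC request against an allowlist before it reaches the node. The sketch below is in Python for readability. The real harness is in Rust, and the allowlist here is our assumption, since the article does not list which methods EVMbench permits. The blocked example, `anvil_setBalance`, is an Anvil method that sets an account's balance directly, which would let an agent fake a successful drain.

```python
import json

# Assumed allowlist, for illustration only.
ALLOWED_METHODS = {
    "eth_call",
    "eth_getBalance",
    "eth_sendRawTransaction",
    "eth_getTransactionReceipt",
}


def check_request(raw_request):
    """Return None if a single JSON-RPC request may be forwarded, else an error response."""
    request = json.loads(raw_request)
    method = request.get("method")
    if method in ALLOWED_METHODS:
        return None
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        # -32601 is the JSON-RPC 2.0 "Method not found" code.
        "error": {"code": -32601, "message": f"method {method} is blocked"},
    }


print(check_request('{"jsonrpc": "2.0", "id": 1, "method": "eth_call", "params": []}'))
print(check_request('{"jsonrpc": "2.0", "id": 2, "method": "anvil_setBalance", "params": []}'))
```

An allowlist is the safer default here: anything not explicitly permitted is rejected, so new node-specific methods stay blocked until someone reviews them. This sketch handles single requests only, not JSON-RPC batches.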

Limitations

EVMbench does not represent the full difficulty of real-world smart contract security.

Vulnerabilities are drawn from Code4rena auditing competitions. They are realistic and high-severity, but heavily deployed crypto contracts undergo significantly more scrutiny.

The grading system is robust but imperfect. In detect mode, if an agent identifies additional issues, there is no reliable way to determine whether they are true vulnerabilities that humans missed or false positives.

The exploit setting has structural limitations. Transactions are replayed sequentially, so behaviors that depend on precise timing mechanics are out of scope.

Security Safeguards

OpenAI is taking an evidence-based, iterative approach: accelerate defenders' ability to find and fix vulnerabilities while slowing misuse.

Mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines that include threat intelligence.

OpenAI is expanding the private beta of Aardvark, its security research agent, and partnering with open-source maintainers to provide free codebase scanning for widely used projects.

It is also committing $10 million in API credits to accelerate cyber defense with its most capable models. The focus is on open source software and critical infrastructure systems.

Why This Matters

AI agents are transformative for both attackers and defenders.

Measuring model capability in this domain helps track emerging cyber risks. It also highlights the importance of using AI systems defensively to audit and strengthen deployed contracts.

Recent months showed meaningful gains in model performance on cybersecurity tasks. That benefits both developers and security professionals.

EVMbench is both a measurement tool and a call to action. As agents improve, developers and security researchers must incorporate AI-assisted auditing into their workflows.