I’m working on a project that uses AI-generated code, and I’m not confident it follows best practices or is optimized for performance and security. I’d really appreciate a detailed AI code review to spot bugs, improve structure, and ensure maintainability before I push this to production.
AI code output needs the same review you would give human code, plus a bit more paranoia.
Here is a practical checklist you can run through.
- Structure and clarity
• Split long functions into smaller ones with single responsibility.
• Use clear names: verbs for functions, nouns for data.
• Add short comments only where intent is not obvious.
• Delete dead code and unused imports.
- Security first
• Never trust input. Validate and sanitize everything at boundaries.
• For web code, check for SQL injection, XSS, CSRF.
Example: always use parameterized queries; never build SQL by string concatenation.
• For auth, use battle tested libraries. No custom crypto.
• Store secrets in env vars or secret manager, not in code or git.
• Run a static analyzer like:
- JavaScript: ESLint, npm audit, Snyk
- Python: bandit, safety
- Java: SpotBugs, OWASP Dependency Check
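To make the parameterized-query point concrete, here is a minimal sketch using Python's stdlib sqlite3 (the table and the injection payload are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # classic injection payload

# BAD: string concatenation would let the payload rewrite the query:
#   "SELECT * FROM users WHERE name = '" + user_input + "'"
# GOOD: with a placeholder, the driver treats the value as data, never as SQL
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the payload matches nothing
```

The same placeholder pattern exists in every mainstream driver and ORM; the exact placeholder syntax (`?`, `%s`, `:name`) varies by library.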
- Performance sanity
• Look for obvious N+1 queries or nested loops on big collections.
• Check if heavy work runs in the hot path of each request. If yes, move to background job or cache.
• Measure, do not guess.
- Web: use your framework profiler or APM like New Relic, Datadog.
- Python: cProfile, line_profiler.
- Node: node --prof, clinic.js.
• Avoid premature micro-optimizations. Fix the top 10 percent slow paths first.
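As a sketch of "measure, do not guess" in Python: profile a suspect function with the stdlib cProfile and read off the top entries. `slow_path` is a made-up stand-in for a hot request handler:

```python
import cProfile
import io
import pstats

def slow_path(n):
    # stand-in for a suspect request handler
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_path(10_000)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 functions by cumulative time
```

The point is to let the profiler name the hot spots before you touch anything.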
- Error handling
• Wrap external calls (DB, third-party APIs) in try/except or your language's equivalent.
• Log errors with enough context, but no secrets.
• Return useful messages to the caller, but hide stack traces in production.
• Define a global error handler in your framework.
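A minimal Python sketch of that wrapping pattern; `gateway`, `charge`, and `UpstreamError` are hypothetical names, and the point is to catch the specific error, log context without secrets, and raise a clean domain error to the caller:

```python
import logging

log = logging.getLogger("payments")

class UpstreamError(Exception):
    """Raised to callers instead of leaking provider internals."""

def charge(card_token, amount, gateway):
    # `gateway` is a hypothetical payment client; the pattern is the point
    try:
        return gateway.charge(card_token, amount)
    except ConnectionError as exc:
        # log useful context, but never the card token itself
        log.error("charge failed: amount=%s cause=%s", amount, exc)
        raise UpstreamError("payment provider unavailable") from exc

class FakeGateway:
    def charge(self, token, amount):
        raise ConnectionError("gateway timeout")

try:
    charge("tok_secret", 999, FakeGateway())
except UpstreamError as e:
    print(e)  # payment provider unavailable
```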
- Testing
• Add unit tests around AI generated chunks that feel suspicious or complex.
• Start with:
- Happy path test.
- One or two edge cases.
- One failure scenario.
• Use coverage tools, but focus on high risk areas first, like auth, payments, data access.
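A sketch of that starter test trio in Python; `apply_discount` is a made-up stand-in for an AI-generated helper:

```python
def apply_discount(price, percent):
    # function under test (hypothetical AI-generated helper)
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (100 - percent) / 100, 2)

def test_happy_path():
    assert apply_discount(200.0, 10) == 180.0

def test_edge_cases():
    assert apply_discount(200.0, 0) == 200.0    # no discount
    assert apply_discount(200.0, 100) == 0.0    # full discount

def test_failure_scenario():
    try:
        apply_discount(200.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_happy_path()
test_edge_cases()
test_failure_scenario()
```

Under pytest the same three functions are picked up automatically; the plain asserts work either way.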
- Style and linting
• Turn on strict linters and autoformatters.
- Python: black, isort, flake8 or ruff.
- JS/TS: eslint, prettier.
- Go: gofmt, golangci-lint.
• Fix every error and review warnings. AI code often trips strict type checks.
• If you have a type system, add types. Mypy, TypeScript, etc.
- Trust but verify AI logic
AI often produces code that looks plausible but is wrong in edge cases.
• For each AI function, state in a comment what it is supposed to do in one sentence.
• Then check if the code matches that sentence line by line.
• Pay attention to off by one errors, incorrect conditions, missing null checks.
• For async code, check race conditions and proper await usage.
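One await bug worth knowing by sight: a missing await leaves you with a coroutine object, which is always truthy. A minimal sketch (`is_allowed` is a hypothetical permission check):

```python
import asyncio

async def is_allowed(user_id):
    await asyncio.sleep(0)  # stands in for a DB lookup
    return False            # this user is NOT allowed

async def broken_check(user_id):
    # BUG: missing await -- a coroutine object is always truthy,
    # so this grants access even though is_allowed returns False
    if is_allowed(user_id):
        return "granted"
    return "denied"

async def fixed_check(user_id):
    if await is_allowed(user_id):
        return "granted"
    return "denied"

print(asyncio.run(broken_check(1)))  # granted  (wrong!)
print(asyncio.run(fixed_check(1)))   # denied
```

Python at least emits a "coroutine was never awaited" RuntimeWarning for this; in tests, treating that warning as an error catches the whole class of bug.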
- Dependencies
• Review every added library.
- Do you actually need it?
- Is it maintained?
- Is the license compatible?
• Pin versions. Use lockfiles.
• Run security scans on dependencies on CI.
- CI and code review flow
• Set up CI with tests, lint, security scan.
• Treat AI output as a junior dev PR.
- Ask: is it minimal?
- Is it clear?
- Is it safe?
• Do not merge anything you do not understand.
- When you share code for review
If you want people here to review it, share:
• Language and framework.
• Snippets of AI generated parts, not the whole repo.
• What the code should do.
• Any errors, performance issues, or weird behavior you see.
If you paste a chunk of the AI code, I'm happy to walk through it line by line and point out bugs, performance problems, and security risks.
@codecrafter covered the “how to review” part really well, so I’ll hit the stuff that usually blows up specifically with AI‑generated code.
- Watch for quiet logical lies
AI code often “works” on the happy path but is semantically wrong. When you read a function, ask:
- What exactly is this supposed to guarantee?
- Under what conditions does it not do that?
Concrete checks:
- Does it assume a sorted list but never sort?
- Does it assume non‑null values but never check?
- Does pagination math handle page 0 / last page?
- Are timezones, locales, encodings ignored?
I’d actually disagree a bit with the “small comments only” idea in this AI context: for AI‑written stuff, I like short “contract comments” at the top of tricky functions, literally:
// Pre: ids is non-empty, user is authenticated
// Post: returns only active records, sorted by created_at desc
Then test if the function really respects that. If not, rewrite.
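A Python version of that contract-comment idea, with the Post condition checked directly; `active_records` and the sample records are invented for illustration:

```python
def active_records(records, ids):
    # Pre:  ids is non-empty; every record has "id", "active", "created_at"
    # Post: returns only active records with matching ids,
    #       sorted by created_at descending
    assert ids, "Pre violated: ids must be non-empty"
    hits = [r for r in records if r["id"] in ids and r["active"]]
    return sorted(hits, key=lambda r: r["created_at"], reverse=True)

records = [
    {"id": 1, "active": True,  "created_at": "2024-01-01"},
    {"id": 2, "active": False, "created_at": "2024-02-01"},
    {"id": 3, "active": True,  "created_at": "2024-03-01"},
]
result = active_records(records, {1, 3})

# Check the Post condition directly, not just "it returned something":
assert all(r["active"] for r in result)
assert [r["id"] for r in result] == [3, 1]
```

If the code cannot pass a direct check of its own contract, the contract comment was a lie and the function needs rewriting.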
- AI loves fake patterns and invented APIs
Look for:
- Functions/classes that are over‑abstracted with no real need (RepositoryOfRepository, BaseManagerManager, etc.).
- References to methods or options that do not exist in the libraries you’re actually using.
- Config flags that are never read anywhere.
Quick tactic: jump to definitions for every non‑standard call in your IDE. If you can’t find it or it comes from a lib, double‑check the official docs. AI will confidently call client.fetchAll() when the real method is client.list().
- Security blind spots specific to AI code
Beyond what @codecrafter said, AI is especially bad at:
- Correctly handling JWT expiration and refresh flows.
- Building “role” checks that are too broad, like if user.is_admin: everywhere instead of granular permissions.
- CSRF tokens missing on one or two “extra” endpoints it added.
- CORS configs that are way too open “just to make it work.”
If there is any auth / session / payment logic that came from AI, assume it is wrong until proven otherwise. I’m not joking. Compare it line‑by‑line with either framework docs or a known good sample.
- Performance traps AI falls into all the time
Patterns I keep seeing:
- Recomputing the same expensive query or HTTP call inside a loop.
- Using map/filter/reduce in multiple passes where one simple loop would do.
- Building huge in‑memory lists where a streaming / cursor approach is needed.
- In ORMs: N+1 queries because it forgot select_related / include / eager loading.
You don’t need a full profiler up front. Just scan and highlight:
- Any nested loop that touches a DB or network.
- Any call inside a loop that does serialization, JSON parse, crypto, or regex.
Those are your first refactor targets.
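A toy sketch of the per-item vs batched pattern; `fetch_user` / `fetch_users_bulk` are hypothetical data-access helpers, and `QUERY_COUNT` stands in for real DB round trips:

```python
DB = {i: {"id": i, "name": f"user{i}"} for i in range(1000)}
QUERY_COUNT = 0

def fetch_user(user_id):
    global QUERY_COUNT
    QUERY_COUNT += 1          # each call would be one DB round trip
    return DB[user_id]

def fetch_users_bulk(user_ids):
    global QUERY_COUNT
    QUERY_COUNT += 1          # one round trip for the whole batch
    return [DB[i] for i in user_ids]

ids = list(range(100))

# N+1 style: 100 round trips
QUERY_COUNT = 0
users = [fetch_user(i) for i in ids]
print(QUERY_COUNT)  # 100

# Batched: 1 round trip
QUERY_COUNT = 0
users = fetch_users_bulk(ids)
print(QUERY_COUNT)  # 1
```

In a real ORM the fix is usually one keyword (select_related, prefetch_related, include) rather than a hand-rolled bulk function, but the shape of the bug is identical.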
- Error handling & logging weirdness
AI loves to:
- Catch generic Exception and then do nothing.
- Log sensitive data (tokens, passwords, full payloads) in debug logs.
- Bubble stack traces straight to the client in “debug helpers” that never got turned off.
Rules I use reviewing AI code:
- If it catches Exception, ask “what specific error is this supposed to handle?”
- If it logs request bodies, make sure it redacts or filters secrets.
- Check “helper” endpoints or debug flags are actually disabled in prod config.
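A small sketch of the first two rules in Python: catch the specific error you expect, and redact secret-looking fields before anything hits the logs. `handle_login` and `SECRET_KEYS` are invented for illustration:

```python
import logging

log = logging.getLogger("api")

SECRET_KEYS = {"password", "token", "authorization"}

def redact(payload):
    # replace secret-looking fields before they reach the logs
    return {k: ("***" if k.lower() in SECRET_KEYS else v)
            for k, v in payload.items()}

def handle_login(body):
    try:
        user = body["username"]          # the one failure we expect here
    except KeyError:                     # NOT a bare `except Exception`
        log.warning("login missing field, body=%s", redact(body))
        return {"error": "username required"}
    return {"ok": True, "user": user}

print(handle_login({"password": "hunter2"}))  # {'error': 'username required'}
print(redact({"token": "abc", "page": 2}))    # {'token': '***', 'page': 2}
```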
- Dependency chaos cleanup
Beyond the checklist, check for:
- Multiple libraries doing the same thing (two HTTP clients, two JWT libs, etc.), because the AI changed its mind mid‑file.
- Dev‑only libs in runtime code paths.
- Huge frameworks used for one tiny feature that could be done with stdlib.
Practical move: scan requirements.txt / package.json / pom.xml and mark:
- “Do I know where this is used?”
- “Can I grep the code and see a real use?”
If not, drop it or isolate it until you know why it exists.
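That grep step can be roughed out in a few lines of Python. This assumes the requirement name matches the import name, which is often false (Pillow imports as PIL), so treat anything flagged as a lead to investigate, not a verdict:

```python
import re

def declared_packages(requirements_text):
    # parse package names out of a requirements.txt-style string
    pkgs = []
    for line in requirements_text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blanks
        if line:
            pkgs.append(re.split(r"[=<>\[]", line)[0].lower())
    return pkgs

reqs = "requests==2.31.0\nleftpad==1.0  # added by the AI?\n"
imports_seen = {"requests", "json"}         # what a grep of the code found

unused = [p for p in declared_packages(reqs) if p not in imports_seen]
print(unused)  # ['leftpad'] -- candidate to drop or isolate
```

Tools like deptry (Python) or depcheck (JS) do this properly, including the name-mapping problem.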
- Version drift & copy‑paste from old docs
AI often mixes APIs from different versions:
- Old syntax for a library that you’re using in its latest major version.
- Deprecated methods that still “build” but do the wrong thing.
For every critical library (ORM, web framework, auth, HTTP client), open the actual docs for your version and compare 1 or 2 representative calls. If you see mismatches, assume the rest may be wrong too and audit those parts first.
- How to share code here to get the most useful review
When you’re ready to post snippets:
- Include only the AI‑generated parts that:
- Touch external systems (DB, HTTP, files, auth) or
- Contain logic that affects money, identity, or data integrity.
- Say:
- “Language / framework”
- “What this function is supposed to do”
- “Any weird runtime behavior you saw (errors, slowness, odd edge cases)”
With that context, people here can give a very targeted teardown instead of hand‑wavy “looks fine” feedback.
If you paste a specific chunk (like a data access layer, an auth middleware, or some performance‑critical route), I’m happy to nitpick it line by line and call out where the AI has lied to you.
Skip the generic checklists; @codecrafter already nailed most of that. I’d zoom in on how to systematically distrust AI code so you can scale reviews instead of reading every line forever.
1. Treat AI code as “untrusted input”
Same mindset as user input:
- Wrap AI‑generated modules behind narrow, well‑tested interfaces.
- Never let them touch DB, filesystem, or network directly without a thin, human‑written adapter.
- For each adapter, define:
- What it can call
- What types it can return
- What errors it can throw
If some file is 90% AI, quarantine it. Put a human‑written facade in front and test that.
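A minimal Python sketch of such a facade; `ai_generated` here is a stand-in for the quarantined module, and the numbers are invented:

```python
class ai_generated:
    # stand-in for a quarantined AI-written module
    @staticmethod
    def messy_lookup(raw):
        # imagine 300 lines of AI code with unclear return types
        return {"id": raw, "score": "42"}   # note: score comes back as a string!

class UserScore:
    """Narrow, typed interface the rest of the codebase is allowed to use."""

    def get_score(self, user_id: int) -> int:
        if not isinstance(user_id, int) or user_id < 0:
            raise ValueError("user_id must be a non-negative int")
        result = ai_generated.messy_lookup(user_id)
        # Normalize at the boundary: coerce and validate what comes back
        return int(result["score"])

print(UserScore().get_score(7))  # 42
```

You test UserScore exhaustively; what happens inside messy_lookup can then be rewritten or replaced without touching the rest of the codebase.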
2. Use tests to pin down behavior before “improving”
I slightly disagree with leaning too hard on comments as the main truth. Comments rot; tests break loudly.
For each AI function that matters:
- Write 3 categories of tests:
- Normal cases (what you expect daily)
- Nasty edge cases (empty arrays, nulls, huge inputs, weird encodings)
- Malicious cases (user controls every string / ID)
- Only after tests are green, start refactoring or “optimizing.”
If the AI wrote tests too, be suspicious. Cross‑check that the tests actually hit the bug‑prone paths instead of just asserting trivial happy flows.
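The three categories can be drafted as plain asserts before graduating to a real test framework; `lookup_order` is a made-up AI-style accessor where the id arrives from the user as a string:

```python
def lookup_order(orders, order_id):
    # hypothetical AI-written accessor; order_id comes from the user as a string
    if not isinstance(order_id, str) or not order_id.isdigit():
        raise ValueError("invalid order id")
    return orders.get(int(order_id))

orders = {1: "book", 2: "lamp"}

# Normal case: what happens daily
assert lookup_order(orders, "1") == "book"

# Nasty edge cases: missing id, empty and non-string input
assert lookup_order(orders, "999") is None
for bad in ["", "  ", None]:
    try:
        lookup_order(orders, bad)
        raise AssertionError(f"{bad!r} should be rejected")
    except ValueError:
        pass

# Malicious case: the user controls the whole string
try:
    lookup_order(orders, "1; DROP TABLE orders")
    raise AssertionError("injection-shaped input should be rejected")
except ValueError:
    pass
```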
3. Static analysis is your friend
Instead of eyeballing everything:
- Turn on the strictest lints your language supports.
- Add type checking (TypeScript / mypy / Kotlin strict, etc.).
- Use security linters (Bandit, ESLint security plugins, etc.).
AI code tends to be just good enough to compile but bad enough to trip static tools quickly. Fixing all high‑severity lints is a low‑effort filter for hidden problems.
4. Enforce consistency like a tyrant
AI loves mixing styles and partial patterns:
- One file uses hex IDs; another assumes integers.
- One function returns null, another throws, another returns { error }.
Pick conventions and enforce them:
- Unified error model (exceptions or result types, not both at random).
- One logging style.
- One config source of truth.
When you find a place where the AI diverged from the chosen pattern, assume that spot deserves a deeper review.
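A sketch of one such convention, a unified error model in Python: every service failure derives from a single base exception and gets translated to a response shape in exactly one place (all names here are invented):

```python
class DomainError(Exception):
    """Base class every service-level failure must derive from."""
    code = "internal"

class NotFound(DomainError):
    code = "not_found"

class Forbidden(DomainError):
    code = "forbidden"

def handle(fn, *args):
    # the single translation point at the API edge
    try:
        return {"ok": True, "data": fn(*args)}
    except DomainError as e:
        return {"ok": False, "error": e.code}

def get_widget(widget_id):
    raise NotFound(f"widget {widget_id}")

print(handle(get_widget, 7))  # {'ok': False, 'error': 'not_found'}
```

Any AI-written function that returns None or an ad-hoc {"error": ...} dict instead of raising a DomainError is then trivially spottable in review.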
5. Constrain the libraries early
Where I differ a bit from others: I’d proactively forbid most libraries before the AI ever writes code for them.
- Decide your HTTP, DB, auth, and testing libraries up front.
- Remove or block everything else from the dependency file.
- If the AI “invents” a new library in a snippet, that is an automatic red flag to refactor.
This removes a ton of “dependency chaos” before it starts and keeps you on known ground.
6. Performance: design small benchmarks instead of guessing
Rather than trusting intuition:
- For any performance‑sensitive path, create microbenchmarks.
- Compare:
- AI version
- Simplest possible human version
If the human version is clearer and within, say, 5–10% of performance, prefer clarity. AI loves overcomplicated “optimizations” that barely help and are harder to maintain.
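A microbenchmark skeleton with the stdlib timeit, comparing a hypothetical multi-pass "AI version" against the obvious loop. The numbers will vary by machine, which is exactly why you measure:

```python
import timeit

data = list(range(10_000))

def ai_version():
    # overcomplicated: two extra passes over the data
    return sum(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, data)))

def simple_version():
    # one obvious loop
    total = 0
    for x in data:
        if x % 2 == 0:
            total += x * 2
    return total

# first make sure both versions agree, then time them
assert ai_version() == simple_version()

t_ai = timeit.timeit(ai_version, number=200)
t_simple = timeit.timeit(simple_version, number=200)
print(f"ai: {t_ai:.3f}s  simple: {t_simple:.3f}s")
```

The equality assert matters as much as the timing: an "optimization" that changes the answer is not an optimization.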
7. Security: threat model the AI parts explicitly
Pick all AI‑touched areas involving:
- Auth / sessions
- Payments / money
- PII / secrets
- File uploads / downloads
For each, write a mini threat model:
- What can an attacker control here?
- What is the worst‑case if this function lies?
- How would we detect abuse?
Then design tests and logs to catch that. For example, explicitly log permission check outcomes (without leaking secrets) to spot overly permissive logic introduced by AI.
8. About using an “AI Code Review” workflow or tool
If you are building an “AI code review” process around this, a structured checklist helps. Something like:
Pros of building a dedicated AI code review flow:
- Forces you to treat AI code differently instead of trusting it blindly.
- Easier onboarding for teammates: consistent rules for what gets extra scrutiny.
- You can automate large parts with linters, type checks, and tests hooked into CI.
Cons:
- Extra upfront work creating rules, tests, and facades.
- Might slow initial development, especially for small throwaway scripts.
- If rules are too rigid, can discourage useful experimentation with AI prototypes.
Compared with @codecrafter’s more “manual code‑review oriented” approach, think of this as raising guardrails around the entire repo so you rely less on human reviewers catching everything each time.
If you share a specific AI‑generated module (especially one touching your DB, auth, or billing), people here can walk through it with this lens: what needs to be quarantined, what needs tests, and what should just be rewritten by hand.