AI Architectural Drawing Review Benchmark
How well can AI models review architectural drawing sets? Multiple models were tested on a real project with 58 known issues, including coordination errors, code concerns, and spec conflicts, with every finding scored.
| # | Model | Review Score | Catch Rate | False Alarms | Cost |
|---|---|---|---|---|---|
The Review Score is the number of issues correctly found, minus false alarms. The maximum possible score is 58 points.
This is the exact prompt sent to every model in the benchmark. It's free, it works on any project type, and it takes about ten minutes.
Use any AI that accepts file uploads — Claude, ChatGPT, or Gemini all work.
Upload your drawing set and specification booklet. The AI reads them directly. Note: some AI models have PDF upload size limits for free account users.
Copy the prompt below, paste it into the chat, and send.
You are an experienced architect performing a thorough review of a construction document set. Your job is to find errors, inconsistencies, omissions, and coordination failures that would cause problems during construction, the kinds of issues that generate RFIs, change orders, field conflicts, and disputes.

You have been given architectural construction documents in PDF form. These may include a drawing set and a project specification booklet, or only one of the two. Begin by reading all provided documents completely before making any observations.

### Step 1: Establish project context

State the following very briefly:

- Project name, location, and code jurisdiction or code edition
- Building type and construction type
- Climate zone (infer from location if not stated)
- Scope of drawing set, including sheet types included and disciplines absent
- Specification divisions included, or note if none provided

### Step 2: Review the documents

Work through the documents using the following seven review perspectives. For each perspective, think carefully about what the documents show, what they should show but don't, and where they contradict themselves. Report only issues you can identify with reasonable confidence from the documents provided — do not speculate or invent problems. Report only issues, not confirmations of things that are correct.

**Work each category to exhaustion before moving to the next.** Do not stop after finding a few issues in a category — keep looking until you cannot find more. A construction document set of this size and complexity will contain many issues across all seven categories; if a category feels thin, look again before moving on.

**Stay within the scope of what is provided.** If MEP, structural, civil, or other discipline drawings are not included in the set, do not flag their absence as an issue or critique systems that fall under those disciplines. Focus your review on the architectural documents that are in front of you — plans, elevations, sections, details, schedules, and specifications — and evaluate them on their own terms. If the architectural drawings reference other disciplines (e.g., "bidder design" for a system, or a structural condition), you may note where coordination will be needed, but do not review disciplines that are not present.

**Read the drawings carefully before characterizing them.** When describing what the drawings show — layer sequences in wall assemblies, material callouts, dimensional relationships — be precise about what is actually depicted. If a wall section shows layers in a specific order, describe that order exactly as drawn. Do not transpose, infer, or reconstruct assembly sequences from memory or convention. Misreading a drawing and then building analysis on that misreading is worse than missing the issue entirely.

**A. Internal consistency across sheets**

Every element represented on multiple sheets should agree. Look at how the same building components are depicted across different drawing types — plans, elevations, sections, details, and schedules. When the same element appears on two sheets and the information differs, that is a coordination error. Pay particular attention to schedules (door, window, finish, room, etc.) and whether their entries match what is drawn and what is logical for the assigned space. A schedule entry that assigns an inappropriate finish to a room is an error even if the schedule is internally consistent. Where you can, verify counts — count the windows on an elevation and compare to what the plan shows on that wall face; count door tags on a floor plan and compare to schedule rows. Check that cross-references between sheets (section cuts, detail callouts, elevation markers) point to drawings that actually exist in the set and depict what they claim to. Verify that wall types, floor types, and ceiling types assigned in schedules or plans are consistent with what is detailed in assembly drawings.

**B. Dimensional and geometric integrity**

Dimensions should be internally consistent — chains of dimensions should sum to their stated totals, and measurements for the same element should agree across views. Stair geometry is arithmetic: risers times riser height must equal floor-to-floor height, treads times tread depth must equal total run. Verify these calculations where stair information is provided. Look for missing dimensions that would force a contractor to scale a drawing, and for conflicting dimensions between views. Check that floor-to-floor heights, ceiling heights, and datum elevations are consistent across plans, sections, and elevations.

**C. Specification-to-drawing alignment**

The drawings and specifications are complementary documents describing the same building. If both are provided, perform the following checks. If specifications are not provided, briefly note that spec references on the drawings cannot be verified, then move on — do not enumerate every individual "see specs" callout, as this is not useful without the specs to compare against. When specifications are available: material callouts, product references, and assembly descriptions on the drawings should be consistent with what the specifications require. Look for conflicts where the drawings call out one material or product and the specifications describe another. Check that specification section references on the drawings point to sections that exist in the project manual. Look for terminology mismatches where drawings and specs use different terms for the same thing. Also look for implicit conflicts — where a specification establishes a performance requirement (fire rating, STC rating, thermal performance, moisture resistance) and the drawn assembly may not be capable of achieving it.

**D. Building envelope and building science**

Evaluate whether the wall, roof, and foundation assemblies shown in sections and details make sense as complete systems for the project's climate.

**E. Code and life safety**

You are not a code official, but you can identify conditions that warrant review. Look for the presence or absence of code-required information: egress paths, exit widths, stair geometry and headroom, handrail and guardrail details, fire-rated assemblies and their continuity (including garage-to-living-space separation — wall type, gypsum board type, door rating, self-closing hardware, and ceiling assembly where habitable rooms are above), accessibility clearances, and emergency escape openings from sleeping rooms. Where code-relevant dimensions are provided, check them for internal consistency. Where code-required information appears to be missing, note its absence. Do not cite specific code sections unless the documents themselves reference them — instead, describe what appears to be missing or inconsistent.

**F. Constructability**

Think about whether the details as drawn can actually be built. Consider physical access for installation, sequencing of assemblies, material compatibility, feasibility of connection details, and whether specified tolerances are achievable. Look for conditions where systems compete for the same space. Consider whether material selections and assembly configurations make sense for the specified application — spans, loads, and exposure conditions implied by the drawings.

**G. Completeness and clarity**

Assess whether the architectural documents contain enough information for a contractor to build what is shown without guessing. Look for details that are referenced but not provided, notes that say "TBD" or "by others" without further clarification, design decisions deferred to field verification that should be resolved on the documents, and general notes that appear to have been carried from a previous project without updating for this one. Evaluate whether the annotation system is consistent — abbreviations, keynote numbering, symbol usage, and graphic conventions should be uniform throughout the set.

### Step 3: Report your findings

Organize your findings by the seven review perspectives above. Use this exact format — one line per finding, no additional prose:

`[Tag] | [Location] | [One-sentence description of the problem] | [Confidence]`

Where:

- **Tag** is the category letter and a sequential number within that category (e.g., A-1, A-2, B-1)
- **Location** is the sheet number(s), detail number(s), spec page(s), or schedule entry where the issue appears
- **Description** is a single sentence stating what is wrong, conflicting, or missing — specific enough that a contractor could locate and understand the problem
- **Confidence** is one of: **Certain** (objectively verifiable from the documents), **High** (clearly visible but involves interpretation), or **Moderate** (warrants review but depends on judgment or information not in the documents)

**Do not write your summary until your findings list is complete.** Work through all seven categories fully. After all findings are listed, provide a brief summary (3–5 sentences) assessing overall document quality and naming the top three areas of greatest concern.

### Important guidance

- **Only report issues, not confirmations.** Do not include findings that confirm something is correct. Every item in your report should describe a problem, concern, or gap.
- **Only report what you can see.** If a drawing is illegible, partially visible, or at a resolution where you cannot read dimensions or text, say so rather than guessing. If you cannot verify element counts because of image quality, state that limitation.
- **Distinguish between errors and judgment calls.** A dimension that contradicts another dimension is an error. A wall assembly that might underperform in a given climate is a judgment call. Label them differently through your confidence rating.
- **Be specific.** "There may be coordination issues" is not useful. "The door schedule lists D-104 as 3'-0" × 7'-0" but the floor plan shows a wider opening at that location" is useful.
- **Do not pad.** If you have genuinely found all the issues in a category, move on. The goal is real findings, not reaching a number.
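Because the prompt forces one pipe-delimited line per finding, results are easy to tally in a spreadsheet or script. As a sketch (not part of the benchmark tooling; the sample finding line below is hypothetical), a finding can be parsed in Python like this:

```python
import re

# Matches the prompt's finding format:
#   [Tag] | [Location] | [Description] | [Confidence]
# Brackets around the tag are treated as optional, since models vary.
FINDING = re.compile(
    r"^\[?(?P<tag>[A-G]-\d+)\]?\s*\|"        # category letter + sequential number
    r"\s*(?P<location>[^|]+?)\s*\|"          # sheet/detail/spec reference
    r"\s*(?P<description>[^|]+?)\s*\|"       # one-sentence problem statement
    r"\s*(?P<confidence>Certain|High|Moderate)\s*$"
)

# Hypothetical example line, for illustration only.
line = "A-1 | A2.01, A6.01 | Door D-104 width on plan conflicts with schedule | Certain"
match = FINDING.match(line)
if match:
    print(match.group("tag"), match.group("confidence"))  # A-1 Certain
```

Findings that fail to match the pattern are themselves a signal that a model ignored the requested format.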
Want help applying AI-assisted QC to your practice, or interested in having me present this research? I'd love to hear from you.
The prompt breaks the review into seven categories, each targeting a different type of issue that can appear in a drawing set. Categories range from internal consistency across sheets to code compliance, building science, and constructability, covering the full scope of what an architect checks during QA.
Four representative issues illustrating the range of difficulty. Cells below each issue show all models left-to-right by overall score.
The specifications call for Andersen 400 Series windows, while the window schedule lists Marvin Essential (Ultrex) products. These are completely different manufacturers with different rough-opening dimensions, frame profiles, and weather-sealing details. Every window detail in the A4.10-A4.13 series would need to be reconciled before procurement.
The Level 2 guest suite sits directly above the garage. IRC requires 5/8" Type X gypsum on the garage ceiling when habitable space is above. Floor assembly F-1 specifies standard 1/2" gypsum board. This is a code-required life safety item, not a judgment call.
At the window sill detail (A4.11, Detail 2), the flashing membrane laps under the weather barrier instead of over it. This means water collected by the flashing drains behind the WRB rather than to the exterior, a direct path for water infiltration into the wall assembly. Finding this requires reading the fine details of a window assembly and understanding that the weather barrier wrapping over the flashing at the sill reverses the intended drainage sequence. No model found this issue.
The stair is L-shaped with 10 risers in the east run and 8 in the south run, 18 total, yielding compliant ~6-11/16" risers. Most models misread one run as the entire stair and reported impossible or irreconcilable geometry — a confident, specific, and wrong finding. This is the costliest kind of AI error: it wastes the reviewer's time and could trigger unnecessary field work over a problem that doesn't exist.
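The arithmetic behind the correct reading is simple. A sketch, assuming the floor-to-floor height implied by 18 risers at 6-11/16" (the drawings' actual dimension is not published with the benchmark):

```python
# Assumed floor-to-floor height: 18 risers x 6.6875" = 120.375" (10'-0 3/8").
# This value is inferred from the article's numbers, not taken from the drawings.
floor_to_floor = 120.375             # inches
risers = 10 + 8                      # east run + south run, counted together
riser_height = floor_to_floor / risers
print(riser_height)                  # 6.6875 -> 6-11/16", under the IRC 7-3/4" maximum

# The failure mode: treating one run as the whole stair makes the geometry look broken.
wrong_riser_height = floor_to_floor / 10
print(wrong_riser_height)            # 12.0375 -> far beyond any code maximum
```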
A repeatable, controlled evaluation using a single real project drawing set with known issues seeded across 7 review categories.
A complete residential drawing set (34 sheets) and specifications (11 pages) were seeded with issues across 7 categories: clear errors, coordination failures, code concerns, and conditions that warrant review. The test set is not published to preserve benchmark integrity.
Each model receives the same prompt: an experienced-architect framing with 7 review perspectives, guidance on how to structure findings, and instructions to rate confidence. Models receive both PDFs simultaneously. No hints about seeded issues are provided.
Each response is scored against the benchmark answer key at the issue level. Every finding is classified as correct, vague, missed, incorrect, or neutral. Neutral observations, reasonable but out-of-scope or debatable, carry no score impact in either direction.
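A minimal sketch of how per-finding classifications could roll up into the Review Score described above. This is hypothetical code, not the benchmark's actual tooling; in particular, the zero weights for vague, missed, and neutral findings are assumptions beyond what the article states:

```python
from collections import Counter

# Assumed weights: correct findings add a point, false alarms ("incorrect")
# subtract one, and everything else is score-neutral. Only the neutral case
# is confirmed by the article; the rest is this sketch's assumption.
WEIGHTS = {"correct": 1, "incorrect": -1, "vague": 0, "missed": 0, "neutral": 0}

def review_score(classifications):
    """Sum weighted counts of finding classifications into a single score."""
    counts = Counter(classifications)
    return sum(WEIGHTS[label] * n for label, n in counts.items())

print(review_score(["correct", "correct", "incorrect", "neutral", "missed"]))  # 1
```

Under this scheme a model that reports nothing scores zero, so any positive score means correct findings outnumbered false alarms.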
Every scored issue across all models. Hover a cell for the issue description and score. Models ordered left-to-right by overall ranking.
Every drawing set goes out with errors. The question is how those errors get found.
In 2021 I designed a house, the first architecture project I had built on my own, and it became the test project for this benchmark. It was the first time I did not have a team to review my work before sending it to the contractor. I compensated by doing extensive research, reviewing with other architects I knew, meeting with building science consultants, and working with a great contractor team I could solve problems with throughout design and construction. Even so, there was always a feeling that my drawings contained some error that would come back to get me.
When AI models became more capable in 2023, reviewing drawing sets was one of my first use cases. Since then I've kept track of AI capabilities, and they've gone from giving vague advice about individual drawings to finding specific issues in complex drawing sets. During this latest round of testing in 2026, the models continued to find problems in the drawing set I had created in 2021, flagging issues that actually came up during construction as well as minor discrepancies that I'm sure created some confusion on the job site.
This full test includes those real issues as well as many additional issues I seeded to test the models' capabilities. My plan is to continue monitoring advances in AI models and to promote their use in drawing set review, improving the quality of drawing sets and making for a more straightforward and less stressful construction process.