LLMs in Construction: Where They Fail and Where They Shine

At the Building 2030 Summer Seminar, doctoral researchers Tuomas Valkonen and Roope Nyqvist from Aalto University shared fresh insights into how Large Language Models (LLMs) like ChatGPT and DeepSeek perform in construction-related tasks.

The results highlight a simple truth: used wisely, LLMs can accelerate knowledge work; used unwisely, they can produce nonsensical results in areas where precision is critical.

HVAC systems descriptions with an LLM

Tuomas Valkonen shared his experiences using ChatGPT without any customization as a helper for MEP designers.

First, Tuomas prompted GPT-4o and DeepSeek to write an HVAC systems description: “I am an HVAC designer, and I’m designing a residential apartment building. The building is heated with district heating and hydronic underfloor heating, with apartment-specific ventilation units. Prepare a systems description of all HVAC systems for me.”

Both LLMs created a “moderately reasonable” systems description, but DeepSeek missed two essential systems in its output.

Using PDFs as input for system design

Next, Tuomas explained how he provided ChatGPT and DeepSeek with a simplified hospital floor plan showing patient rooms. He asked the LLMs to create a ventilation zoning table for the plan without giving additional instructions about the systems.

The two models returned different tables: each proposed a different number of zones and different air volumes.

Tuomas also asked the LLMs to draw an HVAC system schematic on top of the floor plan, and asked ChatGPT to color-code the service zones of an actual plan from a real project. The results were nonsensical. DeepSeek even suggested opening the plan in Microsoft Paint and doing the coloring yourself!

Quantity takeoffs, way off

Trying to create a room schedule from the plan drawing did not go well either. The plan had 72 rooms, totaling 832 square meters. ChatGPT found 30 rooms covering 313 sqm.
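To put the gap in numbers, the extracted schedule can be compared against the known totals. The snippet below is only an illustrative sketch: the function name `coverage` is hypothetical, and the figures are the ones quoted above.

```python
def coverage(found: float, expected: float) -> float:
    """Return the share of the expected quantity that was found, in percent."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * found / expected


# Figures quoted in the text: 72 rooms / 832 sqm in the plan,
# 30 rooms / 313 sqm in ChatGPT's room schedule.
room_coverage = coverage(30, 72)
area_coverage = coverage(313, 832)

print(f"Rooms found: {room_coverage:.1f} %")
print(f"Area found:  {area_coverage:.1f} %")
```

By either measure, the schedule captured well under half of the plan.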

Calculating room-specific heating loads and quantities also produced several errors. ChatGPT got a simple multiplication wrong on one line, and it suggested that a duct segment in one apartment was two kilometers long.
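Errors of both kinds can be caught with simple automated checks on the LLM's output. The following is a minimal sketch, not the method used in the presentation: the `TakeoffLine` structure, the `check_line` function, the 100 m plausibility limit, and the sample lines are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TakeoffLine:
    description: str
    quantity: float
    unit_length_m: float
    reported_total_m: float  # the total as reported by the LLM


def check_line(line: TakeoffLine, max_total_m: float = 100.0, tol: float = 0.01) -> list[str]:
    """Return a list of human-readable problems found on one takeoff line."""
    problems = []
    # Recompute the multiplication instead of trusting the reported total.
    expected = line.quantity * line.unit_length_m
    if abs(expected - line.reported_total_m) > tol:
        problems.append(
            f"arithmetic: expected {expected:.2f} m, reported {line.reported_total_m:.2f} m"
        )
    # Flag totals that exceed an assumed plausibility limit for one apartment.
    if line.reported_total_m > max_total_m:
        problems.append(
            f"implausible: {line.reported_total_m:.0f} m exceeds the {max_total_m:.0f} m limit"
        )
    return problems


# Invented sample data, not taken from the actual test.
lines = [
    TakeoffLine("Supply duct, 125 mm", quantity=4, unit_length_m=3.0, reported_total_m=12.0),
    TakeoffLine("Exhaust duct, 125 mm", quantity=6, unit_length_m=2.5, reported_total_m=16.0),
    TakeoffLine("Supply duct, 160 mm", quantity=1, unit_length_m=2000.0, reported_total_m=2000.0),
]
report = {line.description: check_line(line) for line in lines}

for description, problems in report.items():
    print(description, "->", problems or "ok")
```

The plausibility limit would need to be set per project and per system; the point is only that the LLM's numbers are re-verified by deterministic code before anyone relies on them.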

Eventually, after several iterations, Tuomas started getting better results. He believes that with proper instructions, a quantity takeoff with this technology would become feasible and more reliable.

Assessing IFC file data quality

Tuomas also tested how well ChatGPT could check whether an MEP IFC file included the mandatory information.

Initially, ChatGPT could not read the IFC file as-is, so it wrote a small program that converted the file into a text format it could work with.

After some trial and error, he found a method that produced correct results for a model with 5,000 components. A model with 50,000 components, however, proved too large to process directly. For a model of that size, ChatGPT could instead generate Python code to do the job.
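The presentation did not show the generated code, so the following is only a rough sketch of what such a script might look like. It assumes that "mandatory information" means a non-empty Name attribute on selected entity types, and that each IFC record sits on a single line; the function name `find_unnamed` and the sample records are hypothetical. A dedicated parser such as IfcOpenShell would be a more robust choice for real projects.

```python
import re

# One STEP record per line, e.g. #12=IFCDUCTSEGMENT('guid',#5,'Name',...);
RECORD = re.compile(r"^#(\d+)\s*=\s*(IFC[A-Z0-9]+)\s*\((.*)\);\s*$")
# First three attributes of an IfcRoot-derived entity: GlobalId, OwnerHistory, Name.
ROOT_ATTRS = re.compile(r"^\s*'([^']*)'\s*,\s*(#\d+|\$)\s*,\s*('(?:[^']|'')*'|\$)")


def find_unnamed(lines, types_to_check):
    """Yield (step_id, entity_type) for checked entities whose Name is missing or empty."""
    for raw in lines:
        record = RECORD.match(raw.strip())
        if not record:
            continue  # header lines and multi-line records are skipped in this sketch
        step_id, entity_type, args = record.groups()
        if entity_type not in types_to_check:
            continue
        attrs = ROOT_ATTRS.match(args)
        if attrs is None:
            continue  # record does not start with the expected attribute pattern
        name = attrs.group(3)
        if name in ("$", "''"):  # "$" is an unset value, "''" an empty string
            yield int(step_id), entity_type


# Invented sample records with shortened GlobalIds, for illustration only.
sample = [
    "#10=IFCDUCTSEGMENT('0aB',#5,'Supply duct 01',$,$,#20,#30,$);",
    "#11=IFCDUCTSEGMENT('0aC',#5,$,$,$,#21,#31,$);",
    "#12=IFCPIPESEGMENT('0aD',#5,'',$,$,#22,#32,$);",
    "#13=IFCWALL('0aE',#5,$,$,$,#23,#33,$);",
]
missing = list(find_unnamed(sample, {"IFCDUCTSEGMENT", "IFCPIPESEGMENT"}))
print(missing)
```

Because the script reads the file line by line, passing an open file object instead of the sample list keeps memory use low, which is the practical advantage over pasting a 50,000-component model into a chat window.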

Where do LLMs shine?

As Tuomas’s tests demonstrate, out-of-the-box LLMs are not reliable assistants for many common construction tasks that involve graphical information and an understanding of construction concepts. However, they shine in some other applications.

Roope Nyqvist discussed use cases where LLMs proved helpful. He had, for example, developed a custom GPT that can answer questions about hospital design. The knowledge base incorporates 6,543 files, covering 1,749 pages of expert knowledge, into a system available for anyone to use. It took only 100 hours to create the tool.

LLMs also excel at creating clear RFIs, change orders, meeting summaries, or client-friendly reports. They can convert complex technical content into straightforward language or translate between languages for international project teams.

In bids and proposals, LLMs accelerate narrative writing, case study creation, and market research. They are already proving useful in helping contractors tell their story more effectively and quickly.

I asked ChatGPT 5 to summarize the reliable and unreliable use cases in the construction industry. It provided me with a list that is visualized in the following diagram.

ChatGPT use case examples in construction according to ChatGPT 5

Specialized tools rule

It’s clear that domain-specific models are essential when accuracy and reliability are necessary. LLMs won’t replace Togal, Buildots, Kreo, and similar tools, but they will communicate with them.

ChatGPT is most reliable in language-heavy, reasoning-heavy, low-liability tasks: communication, documentation, learning, project support, and orchestration. And even then, ask it: “Are you sure?”

PS. You can watch Tuomas’s and Roope’s presentations in Finnish on YouTube.
