A digital illustration shows a pink heart icon and the OpenAI logo connected by a glowing pulse line on a purple background

Apple Health’s ChatGPT-Powered Reports Raise Alarms Over Inaccurate, Inconsistent Diagnoses

Apple Health’s new ChatGPT-powered health integration was pitched as a smarter way to understand your wellness data. The idea sounded simple and compelling: connect your medical records and fitness apps, then ask an AI assistant to help explain lab results, organize questions for your next doctor visit, suggest diet and workout adjustments, or even weigh the pros and cons of insurance options based on your personal health patterns.

But a recent investigative report is raising serious doubts about how reliable ChatGPT Health actually is when it’s asked to interpret real-world health data from Apple Health.

In the investigation, reporter Geoffrey Fowler shared an enormous amount of Apple Health information with the service, including 29 million recorded steps and 6 million heart-rate measurements. He then asked the AI to evaluate his cardiac health. The result was alarming: ChatGPT Health gave him an F grade.

The bigger problem came next. When Fowler brought the assessment to his doctor, the doctor dismissed it outright. In the doctor's view, Fowler's risk of heart-related issues was so low that his health insurance would likely refuse to cover additional testing ordered solely to rebut the AI's claim.

Even more concerning, the chatbot wasn't even consistent with itself. Fowler repeated the same question multiple times and received different "grades" for his heart health, bouncing between a B and an F. For a tool positioned as a supportive way to understand medical and wellness data, that kind of variability is hard to accept—especially when the topic is something as sensitive as heart health.

The takeaway is difficult to ignore: in its current form, ChatGPT Health may not be dependable for interpreting personal health metrics from Apple Health in a way that’s stable, repeatable, and medically meaningful. And as Apple continues exploring smarter, AI-driven health features, this kind of inconsistency highlights the risks of leaning too heavily on automated health interpretations without strong safeguards, clear limitations, and clinical-grade validation.