Early tests suggest ChatGPT Health’s assessment of your fitness data may cause unnecessary panic

Earlier this month, OpenAI introduced a new health-focused space within ChatGPT, pitching it as a safer way for users to ask questions about sensitive topics like medical data, illnesses, and fitness. One of the headline features highlighted at launch was ChatGPT Health's ability to analyze data from apps like Apple Health, MyFitnessPal, and Peloton to surface long-term trends and deliver personalized results. However, a new report suggests OpenAI may have overstated how effectively the feature draws reliable insights from that data.

According to early tests conducted by The Washington Post's Geoffrey A. Fowler, when ChatGPT Health was given access to a decade's worth of Apple Health data, the chatbot graded the reporter's cardiac health an F. However, after reviewing the assessment, a cardiologist called it "baseless" and said the reporter's actual risk of heart disease was extremely low.

Dr. Eric Topol of the Scripps Research Institute offered a blunt assessment of ChatGPT Health's capabilities, saying the tool is not ready to offer medical advice and relies too heavily on unreliable smartwatch metrics. ChatGPT's grade leaned heavily on Apple Watch estimates of VO2 max and heart rate variability, both of which have known limitations and can vary significantly between devices and software builds. Independent research has found that Apple Watch VO2 max estimates often run low, yet ChatGPT still treated them as clear indicators of poor health.

ChatGPT Health gave different grades for the same data

The problems did not stop there. When the reporter asked ChatGPT Health to repeat the same grading exercise, the score fluctuated between an F and a B across conversations, with the chatbot sometimes ignoring recent blood test reports it had access to and occasionally forgetting basic details like the reporter's age and gender. Anthropic's Claude for Healthcare, which also debuted earlier this month, showed similar inconsistencies, assigning grades that shifted between a C and a B minus.

Both OpenAI and Anthropic have stressed that their tools are not meant to replace doctors and only provide general context. Still, both chatbots delivered confident, highly personalized evaluations of cardiovascular health. That combination of authority and inconsistency could scare healthy users or falsely reassure unhealthy ones. While AI may eventually unlock valuable insights from long-term health data, early testing suggests that feeding years of fitness-tracking data into these tools currently creates more confusion than clarity.
