In Plain Sight: How Microsoft Power BI Reports Expose Sensitive Data on the Web
Introduction
- Detailed and potentially confidential data behind the aggregated or anonymized data
- Additional attributes and data assets that were not included in the displayed report
- Additional data records that were filtered out from the display
We reported our findings to the Microsoft Security Response Center (MSRC) on 05/16/2024, and on 05/18/2024, Microsoft confirmed the issue but regarded it as a feature rather than a vulnerability.
Technical Details
Description of the Issue
- Detailed data records that are used to display aggregations in the report’s UI
- Tables that are included in the semantic model and are not displayed in the report at all (even when these tables are explicitly marked as “hidden” in the model)
- Non-displayed columns of tables not visible in the report’s UI (as details or aggregations, and even when these columns are explicitly marked as “hidden” in the model)
- Detailed data records of tables that are used in the display, even if the display filters out these records.
Not only is the data available to any unauthorized user, but it is also very easy for anyone to figure out what additional data is hidden and view it.
This behavior affects reports that are accessible inside an organization as well as reports that are published to the web.
Exploit details
A user can also query any column or table by name, as long as it is part of the report's underlying semantic model. This is true even when those columns and tables are marked as “hidden” by the owner of the semantic model.
Removing filters and aggregations is very straightforward, as shown in the previous example, and requires no knowledge about the schema of the data source. However, in order to add data that is not included in the visualization, the attacker would need some knowledge about the schema. This can be obtained by another API call that is used for generating a Power BI report. For public reports, the call is a POST request to the following endpoint:
https://wabi-west-europe-f-primary-api.analysis.windows.net/public/reports/conceptualschema
A different endpoint is used for reports that are only available for users in the organization:
https://wabi-west-europe-f-primary-redirect.analysis.windows.net/explore/conceptualschema
The response of this API call includes a representation of the entire semantic model of the report, including those columns and tables that are not used in the visualization, even if those were marked as “hidden” by the creator of the report.
In the following example, we connect a SQL DB to the report and hide the “secrets” table. As you can see, it is still returned by the call to the “conceptualschema” API, and all the columns and values are accessible through the “query” API.
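For report owners who want to reason about their own exposure, the gap between what a report displays and what its semantic model contains can be sketched as a simple set comparison. The snippet below is a minimal illustration in Python and makes no API calls. The `Column` type, the `unexposed_columns` function, and the sample table and column names are all hypothetical; it assumes you have already listed the model's columns and the columns referenced by the visuals through some other means.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Column:
    """One column of a semantic model (hypothetical representation)."""
    table: str
    name: str
    hidden: bool = False  # the "hidden" flag set by the model owner


def unexposed_columns(
    model_columns: Iterable[Column],
    displayed: Iterable[Tuple[str, str]],
) -> List[Column]:
    """Return model columns that no visual references.

    Columns flagged as hidden are listed first, since they are the ones
    an owner most likely assumed were protected.
    """
    shown = set(displayed)
    extra = [c for c in model_columns if (c.table, c.name) not in shown]
    return sorted(extra, key=lambda c: (not c.hidden, c.table, c.name))


if __name__ == "__main__":
    # Sample data only; mirrors the "secrets" table scenario in spirit.
    model = [
        Column("sales", "region"),
        Column("sales", "amount"),
        Column("sales", "customer_email", hidden=True),
        Column("secrets", "api_key", hidden=True),
    ]
    visuals = [("sales", "region"), ("sales", "amount")]
    for col in unexposed_columns(model, visuals):
        flag = "hidden" if col.hidden else "visible"
        print(f"{col.table}.{col.name} ({flag})")
```

Any column this comparison returns should be treated as retrievable by whoever can open the report, regardless of its hidden flag.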
Exposure in the wild
While we are certain that this vulnerability affects almost any organization that shares Power BI reports internally, the more critical concern is for those organizations that publish – intentionally or unintentionally – reports to the web.
Tens of thousands of reports are intentionally made public by organizations to externally share corporate, product, financial, healthcare, government, and other information. We were not surprised to stumble upon several examples from government institutions as well as commercial organizations that have shared anonymized data in the form of graphs and summaries with the public.
In fact, it is easy to find a large number of Power BI reports published on the web with the help of search engines. A simple search string such as site:app.powerbi.com inurl:"view?r=" yields a very large number of results. Bing, for example, returned over 160,000 results.
Variations of this search string, such as site:"https://app.powerbi.com/view?r=" + sales, can yield more focused results. Some of our queries based on this specific search string generated more than 50,000 results.
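The URL pattern in these search strings can also be used defensively, for example to flag publish-to-web links in a list of URLs collected from your own websites or documents. The following is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the check relies solely on the pattern quoted above (host app.powerbi.com, path /view, a non-empty r parameter); it does not contact Power BI or inspect the contents of the parameter.

```python
from typing import Iterable, List
from urllib.parse import parse_qs, urlparse


def is_publish_to_web_link(url: str) -> bool:
    """Return True if the URL matches the app.powerbi.com/view?r=... pattern."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.hostname != "app.powerbi.com":
        return False
    if parts.path.rstrip("/") != "/view":
        return False
    # parse_qs drops parameters with empty values, so "r=" does not match.
    return bool(parse_qs(parts.query).get("r"))


def find_public_report_links(urls: Iterable[str]) -> List[str]:
    """Keep only the URLs that look like publish-to-web report links."""
    return [u for u in urls if is_publish_to_web_link(u)]


if __name__ == "__main__":
    candidates = [
        "https://app.powerbi.com/view?r=EXAMPLE_TOKEN",
        "https://app.powerbi.com/groups/me/reports/123",
        "https://example.com/view?r=EXAMPLE_TOKEN",
    ]
    for link in find_public_report_links(candidates):
        print(link)
```

Links flagged this way are candidates for review; whether a given report actually exposes more data than intended depends on its semantic model.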
While many of the results are sample reports created by service providers demonstrating their Power BI skills for customers, numerous results represented actual data from real organizations. Using the simple API calls we showed earlier, it is easy to tell whether a report will expose unintended/sensitive data and then retrieve this information at will.
Applying refined search terms and manual inspection, we quickly detected a few dozen reports that were exposed to this vulnerability and that allowed additional sensitive data to be extracted. The fact that a random manual screening of such a large number of search results turned up so many exploitable reports demonstrates the seriousness of this risk. Among the organizations that we found to be vulnerable were state government sites that unintentionally expose PHI, universities exposing employee data, and municipalities exposing PII.
Remediation
Microsoft’s position is that the behavior we uncovered is a design choice rather than a vulnerability. Hence it is the responsibility of organizations who create and share the reports to create them in a way that does not disclose any sensitive information.
While we disagree with Microsoft’s assessment of this behavior (in particular with respect to “hidden” columns and tables), we have developed these guidelines to help organizations protect their data while creating reports:
- Instead of using “hidden” tables and columns in the semantic model of a report, remove them from the semantic model altogether. This is simple to achieve for tables; for individual columns it is possible but somewhat trickier.
- If you want to display only a subset of an entire data table in the report (e.g. only data of employees from a specific region, only data for a specific set of products, etc.) use a Power Query expression to restrict the “Data Source” you attach to the “Semantic Model”. This way the semantic model does not access the data source directly but only the subset.
- If you show aggregated data, make sure that you only select non-sensitive columns of the underlying data source for the semantic model. If this is not possible for some reason (e.g. aggregation is based on a function of a sensitive column) use a Power Query expression to aggregate the data of the “Data Source”.
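The principle behind the guidelines above is that filtering, column pruning, and aggregation should happen before data reaches the semantic model, so the detailed rows never exist there. The sketch below illustrates that idea in plain Python rather than in Power Query; it is not Power BI code, and the `preaggregate` function, its parameters, and the sample fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def preaggregate(
    rows: Iterable[Row],
    group_by: str,
    measure: str,
    keep_where: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Filter rows, then reduce them to one total per group.

    The output contains only the grouping column and the aggregated
    measure; every other column (names, IDs, etc.) is dropped.
    """
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        if keep_where is not None and not keep_where(row):
            continue  # filtered-out rows never reach the output
        totals[row[group_by]] += float(row[measure])
    return [
        {group_by: key, f"total_{measure}": value}
        for key, value in sorted(totals.items())
    ]


if __name__ == "__main__":
    source = [
        {"employee": "A", "region": "EU", "salary": 100, "active": True},
        {"employee": "B", "region": "EU", "salary": 50, "active": True},
        {"employee": "C", "region": "US", "salary": 70, "active": False},
        {"employee": "D", "region": "US", "salary": 80, "active": True},
    ]
    for line in preaggregate(source, "region", "salary", lambda r: r["active"]):
        print(line)
```

A model built on the reduced output can be queried in full without revealing individual records, which is the property the guidelines aim for.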
Additionally, it is a best practice for organizations to frequently review their Power BI environments for reports that were unintentionally published to the web or simply overshared within the organization. If reports do need to be widely shared, make sure that their semantic model follows the guidelines above.
Free risk assessment tool
To assist organizations with an initial assessment of their exposure to this vulnerability, Nokod Security created the Power BI Analyzer, a simple Python-based tool that scans your Power BI environment for reports that are either published to the web or widely shared in the organization. For these reports, the tool makes an initial assessment of whether a report has more underlying data than is exposed in the report.
The tool is Open Source and can be downloaded here.
For further information contact:
Uriya Elkayam: [email protected]
Amichai Shulman: [email protected]
About Nokod Security
Nokod Security enables organizations to secure, govern, and monitor any no-code development and AI-agent environments.
Its agentless platform delivers continuous visibility, risk analysis, and developer-ready remediation guidance – empowering enterprises to embrace innovation with confidence.