Open-source Healthcare Statistics

Collecting statistics on the use of open-source code in the NHS and the wider healthcare field.

What is open-source?

Open source is the practice of publishing the source code of a software project so that anyone can read, modify, re-use, and improve that software.

As set out in the NHS Digital Service Manual, public services are built with public money, so unless there is a good reason not to (for example, for security reasons), all code produced by the NHS should be made publicly available.

“Open source means that the NHS can give our work back to the people who fund it, the public: allowing them to more easily join our staff, more quickly develop products to support us, and better understand and trust the work we do on their behalf.” (NHS Open-source Policy)

To this end, the Department of Health & Social Care has recently committed to making all new NHS code open source, published under appropriate licences such as MIT and OGLv3.

The growth of open-source in healthcare

The ‘Cambrian explosion’ visualisation captures the rise in open-source software in recent years: from the first open-source repository published by NHS England in 2014 to over 1,200 today. Python, R, and web development tools (HTML, CSS, Ruby, PHP) are the most popular languages.

import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
from dateutil.relativedelta import relativedelta

# Render figures inline in the notebook
pio.renderers.default = 'notebook'

# Load data
df = pd.read_csv("../data/org_repos_agg.csv")

# Convert the "Date" column to a datetime dtype
df['Date'] = pd.to_datetime(df['Date'])

def plotly_chart(df: pd.DataFrame,
                 group_col: str,
                 values_col: str,
                 date_col: str,
                 plot_title: str,
                 x_lab: str,
                 y_lab: str) -> None:
    """Plot one line per group, showing values_col over date_col."""
    # Group the DataFrame by the grouping column (here, the organisation)
    grouped_df = df.groupby(group_col)
    
    data = []
    for org, org_df in grouped_df:
        
        # Create a scatter plot of the data points for each organisation
        scatter = go.Scatter(
            x=org_df[date_col],
            y=org_df[values_col],
            name=org,
            mode='lines',
            line=dict(width=3, dash='solid'),
            hovertemplate='%{y:.0f}'
        )
        data.append(scatter)
    
    # Axis limits, and the mode bar buttons to hide
    min_xaxis = df[date_col].min()
    max_xaxis = df[date_col].max()
    max_yaxis = df[values_col].max()
    remove = ['zoom2d', 'pan2d', 'select2d', 'lasso2d', 'zoomIn2d',
              'zoomOut2d', 'autoScale2d', 'resetScale2d', 'zoom',
              'pan', 'select', 'zoomIn', 'zoomOut', 'autoScale',
              'resetScale', 'toggleSpikelines', 'hoverClosestCartesian',
              'hoverCompareCartesian', 'toImage']
    
    # Set layout
    layout = go.Layout(title=plot_title,
                       font=dict(size=12),
                       xaxis=dict(title=x_lab,
                                  # pad the x-axis by 5 days on each side so the line ends are not clipped
                                  range=[min_xaxis - relativedelta(days=5),
                                         max_xaxis + relativedelta(days=5)]),
                       yaxis=dict(title=y_lab,
                                  # fix y0 at 0 and add 10% to y1
                                  range=[0, max_yaxis + (max_yaxis * 0.1)]),
                       showlegend=False,
                       hovermode="x unified")
    
    # Set configuration
    config = {'displaylogo': False,
              'displayModeBar': True,
              'modeBarButtonsToRemove': remove}
    
    # Create the figure and show()
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(template='plotly_white')
    fig.show(config=config)

plotly_chart(df, "Organisation", "Open Repositories", "Date", "The growth of open-source in healthcare", "Date", "Open Repositories")
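
The chart draws one line per organisation. As a minimal sketch of how a sector-wide total could be derived from the same long-format data, the snippet below sums the counts across organisations for each date. The toy rows and the helper name `total_by_date` are illustrative assumptions and not part of the project; only the column names are taken from the code above.

```python
import pandas as pd

# Illustrative rows only: made-up counts in the same long format as
# org_repos_agg.csv (one row per organisation per date).
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"]
    ),
    "Organisation": ["org-a", "org-b", "org-a", "org-b"],
    "Open Repositories": [10, 4, 12, 5],
})


def total_by_date(df: pd.DataFrame,
                  date_col: str = "Date",
                  values_col: str = "Open Repositories") -> pd.Series:
    """Sum repository counts across all organisations for each date."""
    return df.groupby(date_col)[values_col].sum().sort_index()


totals = total_by_date(toy)
print(totals)
```

A total computed this way is only meaningful if every organisation has a row for every date; an organisation missing from one snapshot would show up as a dip in the series.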

Latest open-source statistics

import datetime

import pandas as pd
from IPython.display import display, Markdown
from tabulate import tabulate

# Load data
df = pd.read_csv("../data/org_repos_agg.csv")

# Convert the "Date" column to a datetime dtype
df['Date'] = pd.to_datetime(df['Date'])

# Keep the row with the latest date for each organisation
latest_dates = df.groupby('Organisation')['Date'].idxmax()
# Copy so that the column assignment below does not modify a slice of df
latest_df = df.loc[latest_dates].copy()

# Turn each organisation name into a Markdown link to its URL
latest_df['Organisation'] = latest_df.apply(lambda x: f"[{x['Organisation']}]({x['URL']})", axis=1)

# Drop the "Date" and "URL" columns and sort by "Open Repositories"
latest_df = latest_df.drop(['Date', 'URL'], axis=1).sort_values('Open Repositories', ascending=False)

# Show the date on which the page was rendered
today = datetime.date.today()
formatted_date = today.strftime("%d %B %Y")
display(Markdown('Data updated as of: `%s`.' % formatted_date))

Markdown(tabulate(latest_df, headers='keys', tablefmt='pipe', showindex=False))
Table 1: open-source statistics

Data updated as of: 14 April 2024.

| Organisation | Open Repositories | Top Language | Top License |
|:--|--:|:--|:--|
| nhsconnect | 195 | HTML | Apache License 2.0 |
| opensafely | 188 | Python | MIT License |
| NHSDigital | 182 | Python | MIT License |
| nhsuk | 173 | HTML | MIT License |
| ebmdatalab | 131 | Python | MIT License |
| nhsbsa | 117 | HTML | MIT License |
| nhsx | 90 | Python | MIT License |
| HFAnalyticsLab | 59 | R | MIT License |
| ukhsa-collaboration | 58 | Python | MIT License |
| nice-digital | 57 | JavaScript | MIT License |
| Health-Education-England | 50 | Java | MIT License |
| opensafely-core | 50 | Python | Other |
| nhsengland | 41 | Python | MIT License |
| nhs-r-community | 38 | R | Creative Commons Zero v1.0 Universal |
| CDU-data-science-team | 33 | R | Other |
| UKHSA-Internal | 30 | Python | MIT License |
| BHFDSC | 30 | Python | Apache License 2.0 |
| nhs-bnssg-analytics | 25 | R | GNU General Public License v3.0 |
| The-Strategy-Unit | 24 | R | Other |
| NHSLeadership | 17 | HTML | Other |
| 111Online | 13 | C# | Apache License 2.0 |
| nhs-pycom | 11 | Python | MIT License |
| CQCDigital | 5 | C# | MIT License |
| MHRA | 5 | JavaScript | MIT License |
| Nottingham-and-Nottinghamshire-ICS | 4 | R | MIT License |
| NHS-Blood-and-Transplant | 1 | C# | GNU General Public License v3.0 |
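
A table like this can also be summarised by language. The sketch below assumes the column names shown in Table 1 and uses only its first five rows, copied by hand. It counts organisations and sums repositories per top language. Because "Top Language" describes an organisation rather than each repository, the sums are repositories held by organisations with that top language, not repositories written in it.

```python
import pandas as pd

# First five rows of Table 1, copied by hand for illustration.
latest = pd.DataFrame(
    [
        ("nhsconnect", 195, "HTML"),
        ("opensafely", 188, "Python"),
        ("NHSDigital", 182, "Python"),
        ("nhsuk", 173, "HTML"),
        ("ebmdatalab", 131, "Python"),
    ],
    columns=["Organisation", "Open Repositories", "Top Language"],
)

# One row per top language: how many organisations share it, and how
# many repositories those organisations hold in total.
summary = (
    latest.groupby("Top Language")
    .agg(
        organisations=("Organisation", "count"),
        repositories=("Open Repositories", "sum"),
    )
    .sort_values("repositories", ascending=False)
)
print(summary)
```

Running the same aggregation on the full `latest_df` would need the original organisation names, so it would have to happen before they are replaced with Markdown links.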

NHS Python Community Slack

If you want to learn more about this project, please join the discussion at the NHS Python Community Slack group.