Bryan Van de Ven

Bokeh 0.5 Released!

We are excited to announce the release of version 0.5 of Bokeh, an interactive web plotting library for Python!

This release includes some major new features: support for Widgets and Dashboards, preliminary integration of Abstract Rendering for big data visualization, and a new schematic bokeh.charts interface for very high level statistical charting. Additionally, tighter Pandas integration makes using Bokeh with the tools we all love even easier, and a new, much simpler bokeh.embed module makes including Bokeh content in your documents a snap. Over forty bugfixes and several new smaller features are also included: log axes, minor ticks, a new plot frame, gears and gauges glyphs, an npm package for BokehJS, and many usability improvements.

Get It Now!

If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

To get new features to users even faster, periodic dev builds are now available as well. See the Developer’s Guide for more details.

BokehJS is also available by CDN for use in standalone JavaScript applications:

    http://cdn.pydata.org/bokeh-0.5.0.min.js
    http://cdn.pydata.org/bokeh-0.5.0.min.css

BokehJS can now also be installed with the Node Package Manager (npm).

Widgets

We will have a lot more to say and show about the new support for Bokeh widgets and how you can use them to create interactive data-backed dashboards, but for now here are some images of things you can create today with Bokeh. Click on any image to see the code that created it:

Bokeh Charts

The new bokeh.charts API presents very high level schematic statistical charts, available with just a couple of lines of code. Currently available are histograms, bar charts (stacked or grouped), and scatter plots. These charts can take either ordered dictionaries or pandas data frames as input. There are also two different styles of interface: keyword arguments and chained methods. Here are a couple of examples:

A scatter plot using chained methods:

    from bokeh.charts import Scatter
    from bokeh.sampledata.iris import flowers

    df = flowers[["petal_length", "petal_width", "species"]]
    g = df.groupby("species")

    scatter = Scatter(g)
    scatter \
        .title("iris dataset, gp_by_input") \
        .legend("top_left") \
        .width(600).height(400) \
        .notebook().show()

A stacked bar chart also using chained methods:

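    import pandas as pd
    from bokeh.charts import Bar

    # `medals` and `countries` are sample data assumed to be defined
    # elsewhere; they are not shown in this post.
    df = pd.DataFrame(medals, index=countries)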

    bar = Bar(df)
    bar.title("stacked, df_input") \
        .legend(True) \
        .xlabel("countries").ylabel("medals") \
        .width(600).height(400) \
        .stacked().notebook().show()

A histogram using keyword arguments:

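    import pandas as pd
    from bokeh.charts import Histogram

    # `normal_dist`, `mu`, and `sigma` are sample data assumed to be
    # defined elsewhere; they are not shown in this post.
    df = pd.DataFrame(normal_dist)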

    hist = Histogram(df, bins=50, mu=mu, sigma=sigma,
                     title="no_tools, df_input", ylabel="frequency", legend="top_left",
                     tools=True, width=400, height=350, notebook=True)
    hist.show()
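The bar chart and histogram snippets above use `medals`, `countries`, `normal_dist`, `mu`, and `sigma` without defining them. As a rough sketch only, inputs of the right shape might look like the following. All labels and numbers here are invented placeholders, not the data behind the original charts, and only the standard library is used:

```python
import random
from collections import OrderedDict

# Hypothetical placeholder data: labels and counts are invented for illustration.
countries = ["Country A", "Country B", "Country C"]
medals = OrderedDict([
    ("bronze", [3, 1, 4]),
    ("silver", [2, 5, 1]),
    ("gold", [6, 2, 3]),
])

# Hypothetical parameters and samples for the histogram example.
random.seed(0)  # fixed seed so the sample is reproducible
mu, sigma = 0.0, 0.5
normal_dist = OrderedDict(
    normal=[random.gauss(mu, sigma) for _ in range(1000)]
)
```

Each value list in `medals` has one entry per country, which is what `pd.DataFrame(medals, index=countries)` needs for the index to line up.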

Many other chart types are planned for the near future: boxplots, candlestick plots, violin plots, categorical heatmaps, stacked areas, etc.

Abstract Rendering

Abstract rendering (AR) is a bin-based rendering system that enables Bokeh to work with server-side rendering of large datasets. The basic idea is that rendering bins all of the data into pixels anyway. By generalizing that binning process, greater control over the end representation can be gained while still using very simple plot definitions.
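To make the binning idea concrete, here is a minimal pure-Python sketch that counts how many points land on each pixel of a small grid. It illustrates the general concept only and is not Bokeh's Abstract Rendering implementation. The function name `bin_points` and its parameters are hypothetical:

```python
from collections import Counter

def bin_points(points, x_range, y_range, width, height):
    """Count how many points fall into each pixel of a width x height grid."""
    x0, x1 = x_range
    y0, y1 = y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # skip points outside the viewport
        # Scale to pixel coordinates; clamp so the upper edge maps to the last pixel.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[(col, row)] += 1
    return counts

points = [(0.1, 0.1), (0.12, 0.11), (0.9, 0.9), (0.5, 0.5), (0.11, 0.13)]
counts = bin_points(points, (0.0, 1.0), (0.0, 1.0), width=4, height=4)
```

Once the per-pixel counts exist, they can be turned into colors in any way you like, which is where the extra control over the final image comes from.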

For example, the following produces a map of the centroids of the 2010 US Census.

    source = ServerDataSource(
        data_url="/defaultuser/CensusTracts.hdf5",
        owner_username="defaultuser"
    )
    censusPlot = square('INTPTLONG', 'INTPTLAT', source=source)

However, the above plot does not distinguish between different population densities. Simple alpha composition is insufficient because the range of overplotting is too extreme. Some areas have only one tract, while many population centers have several hundred tracts on a single pixel, and New York City has over a thousand on one point.
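The limits of alpha composition can be seen with a little arithmetic. Assuming standard "over" compositing of identically colored marks, n overlapping marks drawn with opacity alpha give a combined opacity of 1 - (1 - alpha)^n. The sketch below (the helper name `stacked_opacity` is hypothetical) evaluates that formula for a few overlap counts:

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping marks, each drawn with the same alpha."""
    return 1.0 - (1.0 - alpha) ** n

# Compare a moderate alpha with a very small one across overlap counts.
for alpha in (0.1, 0.005):
    for n in (1, 10, 100, 1000):
        print(alpha, n, stacked_opacity(alpha, n))
```

With alpha = 0.1, a hundred overlapping marks are already indistinguishable from a thousand. With alpha = 0.005, a thousand marks are still distinguishable from a hundred, but a single mark is nearly invisible. No single alpha value covers the whole range from one tract to more than a thousand.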

Adding the following Abstract Rendering code converts the plot into a density plot of the US (using the tracts as proxies) and provides perceptual correction so that areas that appear twice as dark have twice as many items.

    heatmap = ar.source(censusPlot, palette=["Reds-9"])
    image(
        source=heatmap,
        title="Census Tracts",
        reserve_val=0,
        **ar.mapping(heatmap)
    )

Adding the AR processing refines the rendering process but keeps the conceptual simplicity of the original definition.

Abstract Rendering is compatible with out-of-core rendering (including distributed memory) and can be used to render very large data sets. Our reference Java implementation works with hundreds of millions of data points and the Python implementation is catching up quickly. Abstract rendering is currently in beta form, so expect significant improvements over the next few releases.

What’s Next?

The release of Bokeh 0.6 is planned for August. Some notable features we intend to work on are:

  • Dynamic and data-driven layout using the Kiwi.js constraint solver (work underway)
  • More chart types (boxplot, violin, contour, etc.) and auto-faceting for bokeh.charts
  • R and/or Matlab language bindings
  • Polar coordinate systems

How to get involved

Issues, enhancement requests, and pull requests can be made on the Bokeh GitHub page: https://github.com/continuumio/bokeh

Questions can be directed to the Bokeh mailing list: bokeh@continuum.io

If you would like some help incorporating Bokeh into your Notebooks, apps, or dashboards, please send an email to info@continuum.io to inquire about Continuum’s training and consulting services - not just for Bokeh, but for anything in the full NumPy/SciPy/PyData stack.

Tags: Python Bokeh Visualization Big Data Widgets