Novem + Pandas = ♥
Posted January 10, 2024 by Sondov Engen ‐ 9 min read
Pandas is an amazing python library for data exploration, loved by developers across the globe. In this article we show how pandas is a perfect match for novem.
The text herein is not endorsed by or in any way associated with the pandas community.
Why Pandas
When we started to build novem our goal was to create the tool we always wanted. This meant that novem had to be where we were already productive and work with the tools we already knew.
As ex finance and data-science people – python, and by extension pandas, was a natural starting point for us. Pandas is the premier data science library for python, allowing for fast and easy interaction, discovery and analysis of information.
Like most good tools, pandas has strong opinions about how things should be done, resulting in an effective, but somewhat difficult to learn, syntax. It was therefore important to us that the knowledge and expectations our users had built up with pandas should translate meaningfully when using novem.
In this article we show how novem plays to the strengths of pandas and how you can achieve a lot with little code.
Why Novem
Novem was built to make the presentation side of data analytics easier. We wanted a tool that would let us create beautiful reports, graphs, e-mails and dashboards straight from the command line.
Because of this, novem visuals don’t just look good, they are easy to integrate into more complex layouts such as reports and emails.
Combining novem with pandas gives you the best of both worlds: the analytical and data processing power of pandas with the presentation and visualization power of novem.
Let your data come alive
Research has shown that it’s easier to understand what’s happening to data when changes are accompanied by meaningful transitions. We were keenly aware of this when creating novem, so data transitions are built into the fabric of our platform. Simply updating the underlying data, or changing object attributes, will make novem construct a relevant transition. Below is a short example of how the above looping animation is created with a basic python script.
import pandas as pd # everything is easier with pandas
from novem import Plot # get the novem plot
from time import sleep # no need to rush it
# grab the state population sample data
df = pd.read_csv('https://data.novem.no/v1/examples/plot/state_pop.csv', index_col=0)
# the data is already sorted by total population size,
# so just get the 6 largest states
top = df.iloc[:6]
# construct novem plot, if the name already exists it will
# be updated
plt = Plot("state_pop_loop",
    type="gbar",
    name="Novem Animation Demo - US States",
    caption="The six most populous states in the US. "
            "Data from the Census Bureau Data API, but not endorsed or certified by "
            "the Census Bureau. Calculations by novem.",
    summary="A looping animation of different views of US state populations",
)
# make it public for all to see
plt.shared = 'public'
# create some steps
step = 0
steps = 5
# print our url so we can view it
print(plt.url)
# loop forever updating our plot (this is a demo after all)
while True:
    step = (step % steps) + 1
    if step == 1:
        # transition to a grouped bar
        plt.type = 'gbar'
        sleep(3.0)
    if step == 2:
        # transition to a stacked bar
        plt.type = 'sbar'
        sleep(3.0)
    if step == 3:
        # don't filter the data, show all the states
        plt.data = df
        sleep(5.0)
    if step == 4:
        # push the top 6 states instead
        plt.data = top
        sleep(3.0)
As you can see from the above code, very little action is required to update a novem visual: assigning an attribute is sufficient. You can also see that novem has native support for the pandas DataFrame and will extract the relevant information when one is assigned to the plot's .data endpoint.
All novem plots are live, so as soon as the value is updated, the corresponding information is pushed to the server. This makes novem very friendly to interactive experimentation in a REPL environment.
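To make the "assign an attribute and it is pushed" idea concrete, here is a minimal sketch of how such behaviour can be built with a Python property. The `LivePlot` class and its `pushed` list are hypothetical stand-ins invented for this illustration. They are not novem's implementation, and nothing is sent over the network.

```python
class LivePlot:
    """Hypothetical stand-in for a live plot object (not novem's actual code)."""

    def __init__(self, name):
        self.name = name
        self.pushed = []    # records what a real client would send to a server
        self._type = None   # set directly, so construction pushes nothing

    def _push(self, key, value):
        # a real client would issue a network request here; we only record it
        self.pushed.append((self.name, key, value))

    @property
    def type(self):
        return self._type

    @type.setter
    def type(self, value):
        # every assignment stores the value and immediately "pushes" it
        self._type = value
        self._push("type", value)


demo = LivePlot("state_pop_loop")
demo.type = "gbar"   # plain assignment triggers a push
demo.type = "sbar"   # and so does every later change
print(demo.pushed)
```

Because each assignment is a self-contained update, the same pattern works equally well in a script and at an interactive prompt.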
Another feature of the novem API is that all novem objects implement a callable that expects a string or DataFrame, and returns the same object on completion. This means that novem can easily be integrated into existing pandas workflows and processing chains.
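The reason this works with pandas is that `DataFrame.pipe(func)` calls `func(df)` and returns whatever `func` returns. A callable that hands its input back unchanged can therefore sit anywhere in a chain. The sketch below shows the pattern with a hypothetical `RecordingPlot` class and made-up data. It is an illustration of the idea, not novem's implementation, and it needs no network access.

```python
import pandas as pd


class RecordingPlot:
    """Hypothetical stand-in for a plot object; not novem's implementation."""

    def __init__(self, name):
        self.name = name
        self.rows_seen = []  # row count of every frame passed through

    def __call__(self, data):
        # a real client would upload `data` here; we only note its size
        self.rows_seen.append(len(data))
        return data  # hand the input back so the chain can continue


# made-up data, for illustration only
df = pd.DataFrame(
    {"state": ["A", "B", "C", "D"], "population": [40, 30, 20, 10]}
)
rp = RecordingPlot("demo")

# pipe() calls rp(frame) and returns its result, so slicing can follow
top2 = df.pipe(rp).iloc[:2].pipe(rp)
print(rp.rows_seen)
```

The first call sees the full frame and the second sees only the sliced one, which mirrors how a plot would first receive all rows and then be updated with a subset.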
import pandas as pd # everything is easier with pandas
from novem import Plot # get the novem plot
from time import sleep # no need to rush it
# grab the state population sample data
df = pd.read_csv('https://data.novem.no/v1/examples/plot/state_pop.csv')
# create a new_novem_plot that contains the population
# under 14 years for all US states
odf = df.iloc[:, :2].pipe(Plot('new_novem_plot', type='gbar'))
# Update the above plot to include only top 10 states
odf = odf.iloc[:10].pipe(Plot('new_novem_plot'))
print(Plot('new_novem_plot').url) # print url to plot
# all of the above operations work with a reference to a novem plot
# as well
# create new novem plot
np = Plot('another_new_novem_plot', type='gbar')
# repeat above code using chaining
adf = df.iloc[:, :2].pipe(np).iloc[:10].pipe(np)
Styling tables with pandas selectors
One of the things we really wanted to integrate well with pandas is the DataFrame selectors loc and iloc. These powerful primitives let you easily create complex filters on your data across values, categories and shapes.
In this section we will show a few common scenarios when it comes to formatting tables and how pandas and novem solve this together.
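As a quick refresher, the snippet below shows the three selector styles that matter most for table styling: label-based column ranges, boolean row masks and positional lookups. The data and tickers are made up for this illustration, and no novem calls are involved.

```python
import pandas as pd

# made-up numbers, for illustration only
df = pd.DataFrame(
    {"NAV": [120.0, 80.0, 45.0], "Return": [0.12, -0.03, -0.15]},
    index=["AAA", "BBB", "CCC"],
)

# label-based: every row, columns from "NAV" through the last column
numbers = df.loc[:, "NAV":]

# boolean mask: rows whose absolute return exceeds 10%
big_moves = df.loc[df["Return"].abs() > 0.1, :]

# position-based: first row, last column
first_return = df.iloc[0, -1]

print(list(numbers.columns), list(big_moves.index), first_return)
```

Each expression returns an ordinary pandas object, so the same selection can be inspected at the prompt before it is used for anything else.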
We’ll be using our financial demo dataset, the Novem Example Index, for these cases. If you’re curious about how we created it you can read more here.
Top and Bottom Performers
Our first example is a Top and Bottom Performers table showing a common scenario where you want to bring attention to multiple different values in a table. Here we have the following goals:
- Create a pleasing header
- Right align numbers and apply percentage and numeric formatting
- Add some visual distance between the top and the bottom contributors
- Show a soft heatmap by coloring the return numbers according to their magnitude, with more contrast indicating higher absolute magnitude
- As contributions are inherently smaller than the individual stock returns, make sure we use different value scales for the different columns
- Highlight all rows where the individual stocks have an absolute return over 10%
- Add a relevant caption
As you can see from the above table we’ve successfully applied the formatting. The code we used to create and format the table is available with comments below.
import sys
import pandas as pd
from novem import Plot
from novem.colors import StaticColor as SC, DynamicColor as DC
from novem.table import Selector as S
# grab our example numbers from novem data
df = pd.read_csv('https://data.novem.no/v1/examples/plot/nei_rgn_tb.csv', index_col=0)
# let's convert our Contribution to bps (basis points)
df['Contribution'] *= 10000
# Create a novem e-mail table `mtable`
tb = Plot('nei_rgn_tb', type='mtable')
# add our data
df.pipe(tb)
# right align our numbers
tb.cell.align = S(df.loc[:, "NAV":], ">", df)
# add some spacing to the left and right of numbers
tb.cell.padding = S(df.loc[:, :], "l 2", df)
tb.cell.padding += S(df.loc[:, :], "r 1", df)
# add some padding below the header
tb.cell.padding += S(df.iloc[0, :], "t 2", df)
# add some padding between the top and bottom performers
tb.cell.padding += S(df.iloc[9, :], "b 4", df)
# let's format our numbers, first set NAV and all following
# numbers as decimal, then override Return as percent
tb.cell.format = S(df.loc[:, "NAV":], ",.1f", df)
tb.cell.format += S(df.loc[:, "Return"], ",.1%", df)
# use index colors
tb.colors.type = 'ix'
# let's highlight rows with an individual stock return above 10%
tb.colors = S(df.loc[df["Return"] > 0.1, :], SC("bg", "green-100"), df, c=":")
# let's highlight rows with an individual stock return below -10%
tb.colors += S(df.loc[df["Return"] < -0.1, :], SC("bg", "red-100"), df, c=":")
# Let's create a foreground, green, heatmap for our top 10 performers,
# we create an individual heatmap for Returns and Contributions
tb.colors += S(df.iloc[:10, -1], DC("fg", min="green-300", max="green-600"), df)
tb.colors += S(df.iloc[:10, -2], DC("fg", min="green-300", max="green-600"), df)
# repeat the above exercise but for our detractors
tb.colors += S(df.iloc[10:, -1], DC("fg", min="red-600", max="red-300"), df)
tb.colors += S(df.iloc[10:, -2], DC("fg", min="red-600", max="red-300"), df)
# finally let's bold the text and add a bottom border to the top row, we'll add
# this manually as pandas slicing doesn't make it easy to slice for the index
# row
tb.cell.border = '0 : b 1 inverse' # row 0, all columns, bottom 1 wide border
# with the "inverse" color
tb.cell.text = '0 : b' # row 0, all columns, bold text
# add a caption to the table
tb.caption = """The above table shows the top 10 and bottom 10 contributors
to the Novem Example Index performance on the 10th of November 2022.
The return numbers are the individual stocks' DTD returns whereas the contribution
number is each stock's contribution to the overall index expressed in basis points.
NAV is in million USD."""
# let's print our url
print(tb.url) # https://novem.io/p/NJy4w
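The format strings `,.1f` and `,.1%` used in the code above look like entries from Python's format specification mini-language. Assuming novem reads them the same way (the article does not say so explicitly), the sketch below shows what they would produce, together with the basis point conversion. The input values are made up.

```python
# made-up values, for illustration only
nav = 1234.56          # NAV in million USD
ret = 0.1234           # return expressed as a fraction
contribution = 0.0012  # contribution expressed as a fraction

nav_text = format(nav, ",.1f")                   # thousands separator, one decimal
ret_text = format(ret, ",.1%")                   # scaled by 100, with a % sign
bps_text = format(contribution * 10000, ",.1f")  # fraction converted to basis points

print(nav_text, ret_text, bps_text)
```

Trying a format string this way in a REPL is a cheap check before applying it to a whole column.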
We appreciate that the syntax might be a bit terse; this is because it’s designed to be written by hand while doing interactive data analysis. To provide some more context we’ll use the following line from the code above:
If you want a more complex example that highlights both a hierarchical relationship and relative performance on a subset of the table, you can have a look at our Hierarchy Performance Table.
Finally, if you want to see how it can all come together, we also have a fully working e-mail example.