Creating a new track

Data tracks take input data tiles and display them within a browser. Creating a new track type takes a number of steps. For this tutorial, we'll create a new track which displays a box-plot.

Define a viewconfig section describing an instance of the new track

For HiGlass to display a new track, the track has to be declared in the viewconfig, as with all other tracks. In production, viewconfigs can be generated by exporting the view. During development, they are loaded either from app/index.html (when viewing http://127.0.0.1:8080 after running npm start) or from app/scripts/testViewConfs.js when running the tests (npm run tests).

When creating a new track, we recommend adding a test case to test/HiGlassComponentTest.jsx to ensure that it is created and functions properly. If this proves too troublesome, it's also possible to add the config for the new track to app/index.html.

We'll be creating a new type of track called a horizontal-boxplot, so add the following section to the "top" view of the testViewConfig in app/index.html. The uid here should be unique to this instance of the track so we just give it some random string (xxyxx). The height specifies how high this track should be. The tilesetUid specifies the uid of the data source on the server. In this case we'll use data that exists on our public server (higlass.io).

{
    uid: 'xxyxx',
    type: 'horizontal-boxplot',
    height: 40,
    tilesetUid: 'F2vbUeqhS86XkxuO1j2rPA',
    server: 'http://higlass.io/api/v1'
}

When index.html is loaded, it will create a HiGlass component using that viewconfig. When it gets to the "top" section and sees the definition of that track, it will try to render it. Since it doesn't yet exist, it won't be able to. To tell it how to render a horizontal-boxplot track, we have to create a class that can render it and associate it with the horizontal-boxplot track type.

Creating a class to render the new track type

To create the new track type, we'll use the horizontal-line track as a template. This track is defined in the app/scripts/HorizontalLine1DPixiTrack.js file. To begin, we'll copy this file:

cp app/scripts/HorizontalLine1DPixiTrack.js app/scripts/HorizontalBoxplotTrack.js

There's a lot of boilerplate in the track code, but the important parts are in drawTile. In particular, the loop which iterates over tileValues does the actual drawing using the graphics.lineTo and graphics.moveTo function calls. These need to be changed to draw rectangles instead of lines. See the PIXI.js documentation for the drawRect function, which needs to be called instead.
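As a rough illustration of the geometry involved, the sketch below computes one rectangle per tile value. It is written in Python for brevity rather than in the track's JavaScript, and value_rects, tile_x, tile_width and value_scale are hypothetical names, not part of HiGlass. Each resulting (x, y, width, height) tuple corresponds to the arguments that a drawRect call would take.

```python
def value_rects(tile_values, tile_x, tile_width, value_scale, track_height):
    """Return one (x, y, width, height) rectangle per value in a tile.

    Each rectangle spans from the scaled value down to the bottom of the
    track, so taller rectangles represent larger values.
    """
    bar_width = tile_width / len(tile_values)
    rects = []
    for i, value in enumerate(tile_values):
        y = value_scale(value)  # top edge of the rectangle, in pixels
        rects.append((tile_x + i * bar_width, y, bar_width, track_height - y))
    return rects


# Hypothetical linear scale: 0 maps to the bottom of a 40px track, 8 to the top.
scale = lambda v: 40 * (1 - v / 8)
rects = value_rects([0, 4, 8], tile_x=100, tile_width=30,
                    value_scale=scale, track_height=40)
for rect in rects:
    print(rect)
```

In the real track, the loop body would call graphics.drawRect with these four numbers instead of collecting them in a list.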

When polishing the track, the exportSVG method should also be implemented so that the view can be exported to SVG. This can be done after the new track is created and tested.

Associating a track type with a track rendering class

Now that we have a class which renders this track type, we need to associate it with the track name (horizontal-boxplot) which was used in the viewconfig. This resolution is done in app/scripts/TrackRenderer.jsx. The easiest thing to do is to copy an existing case and plug in the newly created track names:

            case 'horizontal-boxplot':
                return new HorizontalBoxplotTrack(this.currentProps.pixiStage,
                                                     track.server,
                                                     track.tilesetUid,
                                                     handleTilesetInfoReceived,
                                                     track.options,
                                                     () => this.currentProps.onNewTilesLoaded(track.uid));

We also need to import HorizontalBoxplotTrack from its javascript file at the top of TrackRenderer.jsx:

import {HorizontalBoxplotTrack} from './HorizontalBoxplotTrack.js';

This should be enough to get the track to display if it's already specified in the viewconf. To make it discoverable and configurable, we need to add it to the list of known track types in app/scripts/config.js.

Making the track discoverable and configurable

To be able to add a track using the "Add Track" dialog (accessed using the plus sign icon in HiGlass), HiGlass needs to know what types of data it is capable of displaying. This is specified in app/scripts/config.js. For our new box-plot track, we'll just copy the config for horizontal-line and change it slightly:

    {
        type: 'horizontal-boxplot',
        datatype: ['vector'],
        local: false,
        orientation: '1d-horizontal',
        thumbnail:  null,
        availableOptions: [ 'labelPosition', 'labelColor', 'labelTextOpacity', 'labelBackgroundOpacity', 'axisPositionHorizontal', 'valueScaling' ],
        defaultOptions: {
            labelColor: 'black',
            labelPosition: 'topLeft',
            axisPositionHorizontal: 'right',
            valueScaling: 'linear'
        }
    }

This tells HiGlass that whenever it encounters a tileset containing data of the type vector, it can display it using a horizontal-boxplot track. It also tells it that the track has a horizontal orientation, which means it can only be added as a top or bottom track. The thumbnail option is null because we haven't specified a thumbnail for this track. The available options mean that we can set the position, color and opacity of the label (dataset name), as well as the axis position and the type of scaling (e.g. log or linear) of the data. We also provide some default values for these options in case they're not specified.
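To make the role of datatype and orientation more concrete, here is a minimal sketch of how a list of track definitions could be filtered when a user picks a dataset and a position. It is a Python illustration only: TRACK_TYPES, POSITIONS and tracks_for are hypothetical names, and the heatmap entry and the orientation-to-position mapping are assumptions made for the example rather than the contents of config.js.

```python
# Simplified stand-ins for entries in the list of known track types.
TRACK_TYPES = [
    {"type": "horizontal-line", "datatype": ["vector"], "orientation": "1d-horizontal"},
    {"type": "horizontal-boxplot", "datatype": ["vector"], "orientation": "1d-horizontal"},
    {"type": "heatmap", "datatype": ["matrix"], "orientation": "2d"},
]

# Assumed mapping from an orientation to the view positions it may occupy.
POSITIONS = {"1d-horizontal": {"top", "bottom"}, "2d": {"center"}}


def tracks_for(datatype, position):
    """Names of the track types able to show `datatype` at `position`."""
    return [
        t["type"]
        for t in TRACK_TYPES
        if datatype in t["datatype"] and position in POSITIONS[t["orientation"]]
    ]


print(tracks_for("vector", "top"))
print(tracks_for("vector", "center"))
```

With these example entries, a vector dataset can be shown as either a line or a box-plot in the top position, and by nothing in the center.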

That's it. The horizontal-boxplot track should now be ready to use. What follows on this page are some scattered thoughts on the nitty-gritty topics of scales and tiles. They can be ignored unless a more thorough understanding of the track operations is desired.

Advanced Track Topics (under construction)

Scales

Zoomed scales:

Horizontal tracks: this._xScale()

Vertical tracks: this._yScale()

2D tracks: this._xScale() and this._yScale()

Original scales:

this._refXScale() and this._refYScale()

To draw the data, a track needs various scales. The HorizontalLine1DPixiTrack, for example, requires a valueScale that maps tile values to y positions within the track. This scale can be calculated in a number of different ways, but the simplest is to just use the maxVisibleValue() function of the track. This returns the maximum value present in the dense fields of all the visible tiles.
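As a rough illustration of this simplest approach, the sketch below builds a linear value scale from the maximum visible value. It is written in Python for brevity rather than in the track's JavaScript; make_value_scale and track_height are hypothetical names, and the tile dictionaries are made-up stand-ins for real tiles.

```python
def max_visible_value(visible_tiles):
    """Largest value found in the dense arrays of the visible tiles."""
    return max(max(tile["dense"]) for tile in visible_tiles)


def make_value_scale(max_value, track_height):
    """Return a function mapping a data value to a y pixel position.

    0 maps to the bottom of the track (y == track_height) and max_value
    maps to the top (y == 0), since y grows downwards on screen.
    """
    def scale(value):
        if max_value == 0:
            return track_height  # flat data: draw everything on the baseline
        return track_height * (1 - value / max_value)

    return scale


tiles = [{"dense": [0.0, 2.0, 4.0]}, {"dense": [1.0, 8.0]}]
value_scale = make_value_scale(max_visible_value(tiles), track_height=40)
print(value_scale(0.0), value_scale(4.0), value_scale(8.0))
```

Because the scale depends on the visible tiles, it has to be recomputed whenever panning or zooming changes which tiles are visible.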

Other scaling methods may include... quantile scaling, log scaling, etc...

Custom tracks may require bespoke scaling methods. When drawing intervals, we may want to calculate what the maximum number of intervals that will be drawn on top of each other at any point will be. Then for each interval, we will want to calculate its y position.
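One possible way to compute such a layout is a greedy sweep over the intervals sorted by start position, sketched below in Python. The function names are hypothetical and this is not necessarily how any existing HiGlass track does it. The number of rows used equals the maximum number of intervals stacked at any point.

```python
import heapq


def layout_intervals(intervals):
    """Assign each (start, end) interval to the lowest free row.

    Returns (rows, n_rows): rows[i] is the row index of intervals[i] and
    n_rows is the maximum number of intervals stacked at any point.
    Intervals are half-open, so (0, 5) and (5, 9) do not overlap.
    """
    rows = [None] * len(intervals)
    active = []     # heap of (end, row) for intervals that are still open
    free_rows = []  # heap of row indices that have been released
    n_rows = 0
    for i in sorted(range(len(intervals)), key=lambda k: intervals[k]):
        start, end = intervals[i]
        # Release the rows of intervals that ended before this one starts.
        while active and active[0][0] <= start:
            _, row = heapq.heappop(active)
            heapq.heappush(free_rows, row)
        if free_rows:
            row = heapq.heappop(free_rows)
        else:
            row = n_rows
            n_rows += 1
        rows[i] = row
        heapq.heappush(active, (end, row))
    return rows, n_rows


def y_position(row, n_rows, track_height):
    """Top y coordinate of a row when the track is split into equal bands."""
    return row * track_height / n_rows


rows, n_rows = layout_intervals([(0, 10), (5, 15), (12, 20), (30, 40)])
print(rows, n_rows)
print([y_position(r, n_rows, 40) for r in rows])
```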

If the track will rely on translations and zooms to move and rescale the content, it needs to set pMain = this.pMobile in its constructor and draw using the reference scales (this._refXScale and this._refYScale).

Implement the initTile function

This function is called when the tile is initially created.

It is especially useful for tracks that require heavy initial rendering and lighter transformations for zooming and panning. The HeatmapTiledPixiTrack, for example, creates the heatmap sprite and renders it in the initTile function. It omits the drawTile function because it wouldn't do anything and relies on the zoomed function to alter the graphic's translate and scale factor to change the view.

Implement the drawTile(tile) method:

Within the tile structure there is the tileData member, which contains the data retrieved from the server. The following is an example of a tile and its fields:

tile = {
    graphics: <Pixi.Graphics>,
    remoteId: "uid.4.3",
    tileId: "uid.4.3",
    tileData: {
        discrete: [[0,1,0],[0,3,0]],
        tileId: "uuid.0.0",
        tilePos: [3],
        zoomLevel: 4
    }
}

The tile object can also contain information that is relevant to its rendering. If it is meant to be displayed as text, then it can contain additional PIXI.Text objects which are simply rescaled when the tile is redrawn.

Implement a drawing method

There are two ways to draw the visible data: each tile on its own, or all of the visible tiles together.

Example: HeatmapTiledPlot: Each tile can be drawn completely independently of every other one.

Example: HorizontalLine1DPixiTrack.js: To connect the lines between adjacent tiles, we need a draw method that looks at the adjacent tiles.

Example: CNVIntervalTrack: We need to have all of the intervals that are visible ready so that we can create a layout where all the elements are considered and there's no overlaps.

Debugging notes

The chromosome axis shows the current position within a given chromosome. When zoomed out far enough, the numbers disappear showing only the chromosome names.

Because different genome builds have different chromosomes, the chromosome axis requires a list of chromosome sizes to function properly.

Example configuration

  {
    "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv",
    "type": "vertical-chromosome-labels",
    "position": "top",
    "name": "Chromosome Labels (hg19)"
  }

The chromosome grid can be overlaid on top of heatmaps to show where the 1-based chromosome boundaries are.

Example configuration

  {
      "type": "2d-chromosome-grid",
      "position": "center",
      "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv"
  }

The property chromInfoPath should be a link to a file which contains the chromosome sizes:

chr1    249250621
chr2    243199373
chr3    198022430
...
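A file in this format is straightforward to parse. The Python sketch below reads the sizes and computes where each chromosome would start if the chromosomes were laid end to end in file order; that layout is an assumption made for illustration, and parse_chrom_sizes and cumulative_offsets are hypothetical helper names.

```python
import io


def parse_chrom_sizes(lines):
    """Parse 'name<whitespace>size' lines into an ordered list of (name, size)."""
    chroms = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        chroms.append((parts[0], int(parts[1])))
    return chroms


def cumulative_offsets(chroms):
    """Map each chromosome to its start position on a concatenated axis."""
    offsets = {}
    total = 0
    for name, size in chroms:
        offsets[name] = total
        total += size
    return offsets, total


# The three rows shown above, standing in for a full chromSizes.tsv file.
example = io.StringIO("chr1\t249250621\nchr2\t243199373\nchr3\t198022430\n")
chroms = parse_chrom_sizes(example)
offsets, total = cumulative_offsets(chroms)
print(offsets["chr2"], offsets["chr3"], total)
```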

Overview and Terminology

HiGlass (HG) is organized into multiple levels of display.

Website: The website that you are currently looking at.

App: The page that contains a fullscreen view of HG.

Container / Component: The part of the website that shows the actual HG views and tracks.

View: A collection of tracks that share common axes. Views can be linked by zoom.

Track: A region which contains plotted data. This can range from lines, gene annotations and axes on 1D tracks to heatmaps and annotations on 2D tracks.

Series: A set of data which is plotted in a track. Multiple series can be displayed in a single track. Each series requires a track in order to be displayed. A track cannot exist without any series.

Usage

Adding a new track

The most elementary task in HiGlass is adding a track. This can be done by clicking on the '+' sign of the view header and selecting where to add the new track. The list of available tracks is pulled from our server and filtered according to which datatypes can be displayed in this location. 2D data (heatmaps from cooler files) may be displayed either in the center, or on the edge tracks, where only the region near the diagonal will be shown. 1D data (e.g. from bigWig files) is limited to the edge tracks.

Each dataset can often be displayed in multiple ways. A 2D dataset can be displayed as a heatmap or as a rendering of the tiles which it is retrieved as. 1D datasets can [currently] be displayed as lines or as tile outlines. Gene annotations can be displayed as exon-intron plots or as tile ids.

Resizing a view:

Individual views can be resized but they always expand to fill the available vertical space.

Adding and removing views:

New views can be created by copying existing views. They can then be edited to display different data.

Replacing tracks

A common task is to replace an existing track with a new track. This can be accomplished either by first closing the original track and then adding a new one in its place, or by simply selecting Replace track from a track's configuration menu.

Changing a heatmap's colormap

The colormap for contact matrices can be changed from the 'track config' menu at the upper right corner of the heatmap. From there you can select from a number of preset color maps (afmhot, hot, jet ...).

These presets roughly correspond to some of the examples available from matplotlib and are defined in the file app/scripts/config.js.

Custom color map values can also be set by selecting 'Custom ...'.

Adding track labels

Track labels display the name of the dataset being displayed. Note that at the moment having multiple series will lead to overlapping labels. This will be fixed in future releases.

Adding horizontal heatmap tracks

Click the 'plus' icon in the upper right corner.

Pick the location where you want it displayed.

Pick a dataset.

Pick the way you want to display it (for a heatmap in the 'top' position, there's currently only one option: 'horizontal-heatmap').

After clicking 'submit', you should see the new dataset.

You can then flip it by changing its configuration: hover over the track, click the little 'cog' icon that appears in the upper right corner, and select 'Rao et al...' -> 'Configure series' -> 'Flip heatmap' -> 'Yes'. This yields an upside down view.

To create new track types, there are a number of methods that need to be implemented:

Required

draw()

Render this track to the SVG or Pixi canvas.

Optional

These methods are not strictly required but may cause problems with certain functionality if not implemented.

exportSVG()

This method should return a string version of the SVG representation of this track. This is required for exporting and is not rendered.

Example: (from HeatmapTiledPixiTrack.js)

    exportSVG() {
        // Wrap all of the tiles in a single SVG group
        let svg = '<g>';
        for (const tile of this.visibleAndFetchedTiles()) {
            // Pixi stores the rotation in radians; SVG expects degrees
            const rotation = tile.sprite.rotation * 180 / Math.PI;

            svg += `<g
                    transform="translate(${tile.sprite.x}, ${tile.sprite.y})rotate(${rotation})scale(${tile.sprite.scale.x},${tile.sprite.scale.y})"
                >`;
            // Embed the tile's canvas as a data URL image
            svg += '<image xlink:href="' + tile.canvas.toDataURL() + '"/>';
            svg += '</g>';
        }

        svg += '</g>';
        return svg;
    }

The HiGlass project consists of the following components:

Example custom tracks

Labelled Annotations
GeoJSON
Multivec
Time Interval Track

View compositions can be shared through the JSON configuration files defining them. Configuration files can be exported through the view config menu, either as a JSON file or as a hyperlink. To restore the composition, links can be clicked and JSON files can be dragged onto the HiGlass client to load their contents.

Below is an example JSON config file. It contains separate sections for each view along with a host of other information defining how the views in HiGlass are laid out and how they're linked to each other. Within each view are sections for the tracks that it contains. Track definitions point to the dataset which they render and contain additional styling information.

{
  "editable": true,
  "zoomFixed": false,
  "trackSourceServers": [
      "higlass.io/api/v1"
  ],
  "exportViewUrl": "higlass.io/api/v1/viewconfs/",
  "views": [
    {
      "uid": "aa",
      "initialXDomain": [
            0,
            3000000000
      ],
      "autocompleteSource": "higlass.io/api/v1/suggest/?d=OHJakQICQD6gTD7skx4EWA&",
      "genomePositionSearchBoxVisible": true,
      "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv",
      "tracks": {
        "top": [
          {
            "type": "horizontal-gene-annotations",
            "height": 60,
            "tilesetUid": "OHJakQICQD6gTD7skx4EWA",
            "server": "higlass.io/api/v1",
            "position": "top",
            "uid": "OHJakQICQD6gTD7skx4EWA",
            "name": "Gene Annotations"
          },
          {
            "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv",
            "type": "horizontal-chromosome-labels",
            "position": "top",
            "name": "Chromosome Labels (hg19)"
          }
        ],
        "left": [
          {
            "type": "vertical-gene-annotations",
            "width": 60,
            "tilesetUid": "OHJakQICQD6gTD7skx4EWA",
            "server": "higlass.io/api/v1",
            "position": "left",
            "name": "Gene Annotations",
            "options": {
                "labelPosition": "bottomRight"
            }
          },
          {
            "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv",
            "type": "vertical-chromosome-labels",
            "position": "top",
            "name": "Chromosome Labels (hg19)"
          }
        ],
        "center": [
          {
            "uid": "c1",
            "type": "combined",
            "height": 200,
            "contents": [
              {
                "server": "higlass.io/api/v1",
                "tilesetUid": "CQMd6V_cRw6iCI_-Unl3PQ",
                "type": "heatmap",
                "position": "center",
                "options": {
                  "colorRange": [
                    "#FFFFFF",
                    "#F8E71C",
                    "#F5A623",
                    "#D0021B"
                  ],
                  "maxZoom": null
                }
              },
              {
                  "type": "2d-chromosome-grid",
                  "position": "center",
                  "chromInfoPath": "//s3.amazonaws.com/pkerp/data/hg19/chromSizes.tsv"
              }

            ],
            "position": "center"
          }
        ],
        "right": [],
        "bottom": []
      }
    }
  ],
  "zoomLocks": {
    "locksByViewUid": {},
    "zoomLocksDict": {}
  }
}

The HiGlass server is capable of loading tile data from different file types. While they may physically store the data in different formats, they share the capability of being queried for data at a given zoom level and location.

Multires cooler files

Multires cooler files are HDF5 files which store multiple contact matrices binned at different resolutions. Each individual contact matrix is stored using the standard cooler format.

Regular cooler files can be turned into multires files using the cooler zoomify command, as shown in the Processing and importing data section of the wiki. See that section for more information about the format.

Hitile files

Hitile files store 1D genomic data at multiple resolutions using the HDF5 format. They are created using the clodius package. See the BigWig section of the processing and importing data section of the wiki for information about creating hitile files.

Contents

At the root level, attributes define metadata about the file. This is perhaps best explained with a chunk of code:

    import math

    import h5py

    # zoom_step, assembly_size, assembly, chrom_order and tile_size are set by the
    # conversion script; bwf stands for the open bigWig file being converted
    f = h5py.File('file.hitile')
    d = f['meta']
    d.attrs['zoom-step'] = zoom_step        # store every nth aggregation (zoom) level (default: 8)
    d.attrs['max-length'] = assembly_size   # the size of the genome assembly (default: the size of hg19)
    d.attrs['assembly'] = assembly          # the name of the genome assembly (default: hg19)
    d.attrs['chrom-names'] = list(bwf.chroms().keys())   # the chromosome names in the assembly (default ['chr1', 'chr2',...])
    d.attrs['chrom-sizes'] = list(bwf.chroms().values()) # the sizes of the chromosomes (e.g. [249250621, ...])
    d.attrs['chrom-order'] = chrom_order    # the order in which the chromosomes are stored (default ['chr1'..., 'chrX', 'chrY', 'chrM'])
    d.attrs['tile-size'] = tile_size        # the size of each individual tile (default: 1024)
    d.attrs['max-zoom'] = max_zoom = math.ceil(math.log(d.attrs['max-length'] / tile_size) / math.log(2))
                                            # the maximum zoom level (default: 22)
    d.attrs['max-width'] = tile_size * 2 ** max_zoom
                                            # the maximum width of a tileset with this tile size and maximum zoom

Internally, the data is stored at each zoom-step'th zoom level as one long array.
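The max-zoom and max-width formulas can be checked with a few lines of arithmetic. The Python sketch below assumes a total assembly length of 3,137,161,264 bp for hg19, a figure that does not come from this page. The tile_position helper is hypothetical and assumes the usual scheme in which each zoom level halves the tile width; it is not taken from the hitile code.

```python
import math

tile_size = 1024
assembly_size = 3_137_161_264  # assumed total length of hg19; substitute your own

# Same formulas as in the metadata listing above.
max_zoom = math.ceil(math.log(assembly_size / tile_size) / math.log(2))
max_width = tile_size * 2 ** max_zoom


def tile_position(genome_pos, zoom_level):
    """Index of the tile covering genome_pos at zoom_level.

    Assumes zoom level 0 is a single tile spanning max_width and that
    every further level halves the width of a tile.
    """
    tile_width = max_width / 2 ** zoom_level
    return int(genome_pos // tile_width)


print(max_zoom, max_width)
print(tile_position(1_000_000_000, 4))
```

With these inputs the maximum zoom level comes out as 22, matching the default quoted in the listing.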

Size

Because HDF5 compresses data when storing it, hitile files end up being smaller than their bigWig counterparts.

File                                  BigWig size   HiTile size   Conversion time (seconds)
wgEncodeSydhTfbsA549CtcfbIggrabSig    595M          166M          480
E116-H3K4me2.fc.signal                203M          175M          455
E004-H3K79me1.fc.signal               710M          465M          577

Each gene can have multiple isoforms (combinations of exons and introns). These isoforms can overlap:

chr4    115519557       115599381       UGT8    25      +       NM_001128174    7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115519557,115544034,115585150,115586835,115589240,115597080,    115520130,115544858,115585293,115586912,115589460,115599381,
chr4    115519557       115599381       UGT8    25      +       NM_001322112    7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115519557,115540578,115544034,115585150,115586835,115589240,115597080,  115520130,115540681,115544858,115585293,115586912,115589460,115599381,
chr4    115519557       115599381       UGT8    25      +       NM_001322113    7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115519557,115544034,115585150,115586835,115589240,115597080,    115520213,115544858,115585293,115586912,115589460,115599381,
chr4    115520440       115599381       UGT8    25      +       NM_001322114    7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115520440,115544034,115585150,115586835,115589240,115597080,    115520942,115544858,115585293,115586912,115589460,115599381,
chr4    115543522       115599381       UGT8    25      +       NM_003360       7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115543522,115585150,115586835,115589240,115597080,      115544858,115585293,115586912,115589460,115599381,

Or they can be located in distant regions (even on different chromosomes):

chr1    367658  368597  OR4F16  2       +       NM_001005277    81399   protein-coding  olfactory receptor family 4 subfamily F member 16       367658  368597  367658, 368597,
chr1    621095  622034  OR4F16  2       -       NM_001005277    81399   protein-coding  olfactory receptor family 4 subfamily F member 16       621095  622034  621095, 622034,
chr5    180794287       180795226       OR4F16  2       +       NM_001005277    81399   protein-coding  olfactory receptor family 4 subfamily F member 16       180794287       180795226       180794287,      180795226,

We want to display an overview of all known exons but we don't want our genes to extend across chromosomes. To resolve this, we show all overlapping sets of exons as single entities. Genes with annotations that are far away from each other and don't overlap will be displayed separately:

7368    115519557       115599381       UGT8    25      +       union_7368      7368    protein-coding  UDP glycosyltransferase 8       115544036       115597444       115519557,115519557,115520440,115540578,115543522,115544034,115585150,115586835,115589240,115597080  115520130,115520213,115520942,115540681,115544858,115544858,115585293,115586912,115589460,115599381
81399   367658  368597  OR4F16  2       +       union_81399     81399   protein-coding  olfactory receptor family 4 subfamily F member 16       367658  368597  367658  368597
81399   621095  622034  OR4F16  2       -       union_81399     81399   protein-coding  olfactory receptor family 4 subfamily F member 16       621095  622034  621095  622034
81399   180794287       180795226       OR4F16  2       +       union_81399     81399   protein-coding  olfactory receptor family 4 subfamily F member 16       180794287       180795226       180794287       180795226
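One way such unions could be computed is sketched below in Python: transcripts of the same gene on the same chromosome are sorted by start position, overlapping ones are merged into a single record, and the distinct exons of each merged set are collected. This is an illustration under those assumptions, not the script used to produce the rows above. union_transcripts is a hypothetical name, and the example exon lists are trimmed to a few of the exons shown earlier.

```python
from collections import defaultdict


def union_transcripts(transcripts):
    """Merge overlapping transcripts of the same gene on the same chromosome.

    Each transcript is a dict with the keys gene_id, chrom, start, end and
    exons (a list of (start, end) pairs). Returns one record per set of
    overlapping transcripts, carrying the sorted distinct exons of the set.
    """
    by_gene = defaultdict(list)
    for t in transcripts:
        by_gene[(t["gene_id"], t["chrom"])].append(t)

    unions = []
    for (gene_id, chrom), group in by_gene.items():
        group.sort(key=lambda t: t["start"])
        current = None
        for t in group:
            if current is not None and t["start"] <= current["end"]:
                # Overlaps the running union: extend it and add the exons.
                current["end"] = max(current["end"], t["end"])
                current["exons"].update(t["exons"])
            else:
                current = {"gene_id": gene_id, "chrom": chrom,
                           "start": t["start"], "end": t["end"],
                           "exons": set(t["exons"])}
                unions.append(current)
    for u in unions:
        u["exons"] = sorted(u["exons"])
    return unions


transcripts = [
    {"gene_id": 7368, "chrom": "chr4", "start": 115519557, "end": 115599381,
     "exons": [(115519557, 115520130), (115597080, 115599381)]},
    {"gene_id": 7368, "chrom": "chr4", "start": 115543522, "end": 115599381,
     "exons": [(115543522, 115544858), (115597080, 115599381)]},
    {"gene_id": 81399, "chrom": "chr1", "start": 367658, "end": 368597,
     "exons": [(367658, 368597)]},
    {"gene_id": 81399, "chrom": "chr1", "start": 621095, "end": 622034,
     "exons": [(621095, 622034)]},
]
unions = union_transcripts(transcripts)
for u in unions:
    print(u["gene_id"], u["chrom"], u["start"], u["end"], u["exons"])
```

In this example the two overlapping UGT8 transcripts collapse into one record, while the two distant OR4F16 annotations on chr1 stay separate.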

Gene annotations show where genes are located on a given genome. When zoomed in, the full exon-intron structure is shown. Because genes can be transcribed into numerous isoforms, we calculate an overlapping union of the known exons for each gene and display that. More details can be found on the Displaying gene annotations wiki page.

Available options

plusStrandColor and minusStrandColor

The colors for the annotations on each strand can be changed via the + Strand Color and - Strand Color configuration options in the Configure Series menu.

Annotations are tiled and require a server and tilesetUid. The labelPosition option can either be omitted or set to hidden if no label is desired.

Example configuration
  {
    "type": "vertical-gene-annotations",
    "width": 60,
    "tilesetUid": "OHJakQICQD6gTD7skx4EWA",
    "server": "/api/v1",
    "position": "left",
    "name": "Gene Annotations",
    "options": {
        "labelPosition": "bottomRight"
    }
  }

HiGlass is a web application for displaying genomic contact matrices.

Demo

An online demo can be found at higlass.io

Running locally

HiGlass can also be run locally as a docker container. The higlass-docker repository contains detailed information about how to set it up and run it.

The simple example below stops any running higlass containers, removes them, pulls the latest version and runs it.

docker stop higlass-container; 
docker rm higlass-container;

docker pull higlass/higlass-docker:v0.6.1 # higher versions are experimental and may or may not work


docker run --detach \
           --publish 8989:80 \
           --volume ~/hg-data:/data \
           --volume ~/tmp:/tmp \
           --name higlass-container \
           higlass/higlass-docker:v0.6.1

The higlass website should now be visible at http://localhost:8989. Take a look at the documentation for adding a new track to see how to display data.

For security reasons, an instance created this way will not be accessible from hosts other than "localhost". To make it accessible to other hosts, please specify a hostname using the SITE_URL environment variable:

docker run --detach \
           --publish 8989:80 \
           --volume ~/hg-data:/data \
           --volume ~/tmp:/tmp \
           --name higlass-container \
           -e SITE_URL=my.higlass.org \
           higlass/higlass-docker:v0.6.1

To use the admin interface for managing the available datasets, a superuser needs to be created:

docker exec -it higlass-container higlass-server/manage.py createsuperuser

Once a username and password are created, the admin interface can be accessed at http://localhost:8989/admin.

Processing and importing data

Large datasets need to be converted to multiple resolutions so that they can be tiled and displayed using higlass. Unfortunately, due to the variety of data types available there are different procedures for different starting file types.

Cooler files

Cooler files store genome contact matrices as HDF5 files. Typical cooler files store data at one resolution. To support zooming, they need to be converted to multi-resolution cooler files. Starting with the highest resolution you would like to visualize in a file called matrix.cool:

pip install cooler
cooler zoomify --balance matrix.cool

This command will aggregate the contact matrix in matrix.cool to produce multiple normalized zoom levels, storing the resulting contact matrices in matrix.multi.cool. This can then be loaded into higlass:

docker exec higlass-container python higlass-server/manage.py \
  ingest_tileset \
  --filename /tmp/matrix.multi.cool \
  --datatype matrix \
  --filetype cooler 

Creating cooler files from contacts

If a cooler file doesn't already exist, it can be created from a list of contacts (positions of pairs of genomic loci) and a set of chromosome sizes. Here's an example of a tab-delimited contact list or "pairs file":

chr1       124478180       -       chr1       121966441       +
chr1       124478180       -       chr1       121760032       +
...

It can be aggregated into a multi-resolution cooler using the following commands:

CHROMSIZES_FILE=hg19.chrom.sizes
BINSIZE=1000
CONTACTS_FILE=contacts.tsv

cooler cload pairs -c1 1 -p1 2 -c2 4 -p2 5 \
     $CHROMSIZES_FILE:$BINSIZE \
     $CONTACTS_FILE.sorted \
     out.cool

cooler zoomify out.cool

Note that the order of the chromosomes in the chromosome sizes file should match the coordinate system used in HiGlass.

BigWig Files

BigWig files need to be processed using the clodius package before they can be displayed in higlass:

pip install clodius
clodius aggregate bigwig file.bigwig

The default bigwig aggregation will assume that the chromosome sizes are from hg19. To aggregate for a different assembly, use the --assembly option, e.g. --assembly mm9. It is also possible to pass in a set of chromosome sizes with the --chromsizes-filename option. Even though chromosome sizes are stored in the bigWig file, the conversion script requires an ordering, as provided by --chromsizes-filename, to produce the hitile file.

This will convert file.bigwig into a higlass-legible file. If no filename is specified using the --output-file option, the original extension is replaced with .hitile. This hitile file can then be loaded into higlass:

docker exec higlass-container python higlass-server/manage.py \
  ingest_tileset \
  --filename /tmp/file.hitile \
  --filetype hitile \
  --datatype vector \
  --name "Some 1D genomic data"

bedGraph files

Data can be imported from text files which have a bedGraph-like format:

chrom   start   end     eigU    eigT    eigN    GC
chr1    3000000 3020000 -0.30001076078261446    -0.28139497528740076    -0.4257141574669923     0.39005
chr1    3020000 3040000 -0.6506417814728713     -0.04220806911621135    -0.7562304803612467     0.3995
chr1    3040000 3060000 -0.5962263338769729     -0.58579839698137       -0.5406451925771123     0.38845

These files need to be aggregated and converted to hitile files using clodius:

pip install clodius
clodius aggregate bedgraph file.tsv --output-file file.hitile --assembly hg19

The columns containing the chromosome name (--chromosome-col), the starting position (--from-pos-col), the ending position (--to-pos-col) and the values (--value-col) can be specified as 1-based parameters. They default to 1,2,3 and 4, respectively. The genome assembly defaults to hg19 but can be changed using the --assembly parameter.

Note: The entries in the bedlike file must be sorted so that the order of the chromosomes matches the order defined in the negspy package (e.g. hg19/chromOrder.txt). For assemblies such as hg19 and mm9 this defaults to a semantic ordering (e.g. chr1, chr2, chr3... chrX, chrY, chrM).
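A quick way to check this requirement before running the aggregation is sketched below in Python. The helper check_chromosome_order is hypothetical and not part of clodius or negspy, and the expected order has to be supplied by the caller (for example, read from the relevant chromOrder.txt). Header lines, if any, need to be stripped first.

```python
def check_chromosome_order(rows, chrom_order):
    """Return True if the chromosomes in rows follow the order of chrom_order.

    rows is an iterable of tab-separated bed-like lines without a header;
    chrom_order is the expected list of chromosome names. A chromosome that
    is missing from chrom_order raises ValueError.
    """
    rank = {name: i for i, name in enumerate(chrom_order)}
    last = -1
    for line in rows:
        chrom = line.split("\t", 1)[0]
        if chrom not in rank:
            raise ValueError(f"unknown chromosome: {chrom}")
        if rank[chrom] < last:
            return False  # this chromosome should have come earlier
        last = rank[chrom]
    return True


order = ["chr1", "chr2", "chrX"]
sorted_rows = ["chr1\t0\t100\t0.5", "chr1\t100\t200\t0.7", "chr2\t0\t100\t0.1"]
unsorted_rows = ["chr2\t0\t100\t0.1", "chr1\t0\t100\t0.5"]
print(check_chromosome_order(sorted_rows, order))
print(check_chromosome_order(unsorted_rows, order))
```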

Bedpe-like files

2D annotations often have two start and end points:

chr10   74160000        74720000    chr10    74165000    74725000
chr12   120920000       121640000    chr12    120925000    121645000
chr15   86360000        88840000    chr15    86365000    88845000

These can be aggregated using clodius:

clodius aggregate bedpe \
    --assembly hg19 \
    --chr1-col 1 --from1-col 2 --to1-col 3 \
    --chr2-col 4 --from2-col 5 --to2-col 6 \
    --output-file domains.txt.multires \
    domains.txt

Once created, they can be entered into higlass using docker:

docker exec higlass-container python higlass-server/manage.py \
  ingest_tileset \
  --filename /tmp/domains.txt.multires.db \
  --filetype bed2ddb \
  --datatype 2d-rectangle-domains

Gene annotation files

Gene annotation files store information about exons, introns and gene names. They are sqlite3 db files with a schema that is compatible with higlass-server. Creating these files first requires a bed-like list of gene annotations:

chr5    176022802    176037131    GPRIN1    7    -    union_114787    114787    protein-coding    G protein regulated inducer of neurite outgrowth 1    176023808    176026835    176022802,176036999    176026878,176037131
chr8    56015016    56438710    XKR4    8    +    union_114786    114786    protein-coding    XK, Kell blood group complex subunit-related family, member 4    56015048    56436786    56015016,56270237,56435839    56015854,56270437,56438710

These can be generated from publicly available data as described in the clodius wiki. This bed-like file then needs to be aggregated for multiple resolutions and converted to an sqlite3 db file using clodius:

pip install clodius
clodius aggregate bedfile \
    --max-per-tile 20 --importance-column 5 \
    --assembly hg19 \
    --output-file gene-annotations.beddb \
    gene-annotations.bed

Once created, the gene annotations file can be loaded into higlass:

docker exec higlass-container python higlass-server/manage.py \
  ingest_tileset \
  --filename /tmp/gene-annotations.beddb \
  --filetype beddb \
  --datatype gene-annotation \
  --coordSystem hg19 \
  --name "Gene Annotations (hg19)"

The horizontal heatmap track is similar to the regular heatmap but it is rotated 45 degrees so that the diagonal lies along the x-axis. While it shows 2D data, this view is technically a 1D track and can be added to the top, left, right, or center track regions.

In the top and bottom configurations, the default is for the diagonal to be facing down. In the left and right configurations, the default is for the diagonal to face right. This default can be changed by selecting yes on the Flip heatmap option.

Example config

  {
    "tilesetUid": "CQMd6V_cRw6iCI_-Unl3PQ",
    "server": "http://higlass.io/api/v1",
    "position": "center",
    "type": "horizontal-heatmap",
    "height": 120,
    "options": {
      "maxZoom": null,
      "labelPosition": "bottomRight",
      "colorRange": [
        "#FFFFFF","#F8E71C","#F5A623","#D0021B"
      ]
    }
  }

Options

Horizontal and vertical heatmaps have the same options as regular heatmaps. See the Heatmap section above for more information.

image

Horizontal lines tracks display 1D tiled data as a line.

Value scaling

The values in a line are scaled according to the minimum and maximum visible values in the currently visible tiles (the so-called "visible values"). If the default linear scaling is selected, then values are scaled linearly from the minimum to the maximum visible values. If log scaling is selected, then to avoid having to scale values equal to 0, a pseudocount equal to the median of the "visible" values is added to each value and values are scaled from log(median_value) to log(max_value+median_value).
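
As a rough illustration of the two scaling modes, here is a minimal Python sketch. It is not the HiGlass implementation (which runs in the browser); the function names are hypothetical, and the log variant assumes non-negative values with a positive median.

```python
import math
from statistics import median


def scale_linear(values):
    """Map values linearly onto [0, 1] between the visible min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def scale_log(values):
    """Map values onto [0, 1] on a log scale using the median as pseudocount."""
    pseudo = median(values)               # pseudocount added to every value
    lo = math.log(pseudo)                 # log(median_value)
    hi = math.log(max(values) + pseudo)   # log(max_value + median_value)
    return [(math.log(v + pseudo) - lo) / (hi - lo) for v in values]
```

With the pseudocount in place, a value of 0 maps to the bottom of the scale instead of producing log(0).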

Configurable options

Label position

The label position indicates where the name of the track will be drawn. The example on the left has been labelled as "wgEncodeSydhTfbsGm12878Rad21IggrabSig.hitile". The available values are topLeft, topRight, bottomLeft, bottomRight, and hidden.

Axis position

Line tracks can be adorned with an axis using the axisPositionHorizontal option. The default value is right, but it can be set to null or hidden if no axis is desired. For vertical line tracks, use the axisPositionVertical option with the available values top, bottom and hidden.

Stroke color

The stroke color determines how to color the drawn line. It can be configured using hex or word colors in the config file or selected from the presets shown in the track config menu.

Example configuration (horizontal line)

  {
    "server": "http://higlass.io/api/v1",
    "tilesetUid": "b6qFe7fOSnaX-YkP2kzN1w",
    "type": "horizontal-line",
    "options": {
      "labelPosition": "topLeft",
      "axisPositionHorizontal": "left",
      "lineStrokeColor": "blue"
    }
  }

Example configuration (vertical line)

  {
    "server": "http://higlass.io/api/v1",
    "tilesetUid": "b6qFe7fOSnaX-YkP2kzN1w",
    "type": "vertical-line",
    "options": {
      "axisPositionVertical": "right"
    }
  }

The rectangular heatmap is one of the central plot types in HiGlass. It depicts matrices by coloring each cell according to its value. Data is pulled in remotely from a server and rendered client-side. This configuration gives us the opportunity to dynamically change how the data is displayed by changing its scaling and color mapping.

Value scaling

Values in rectangular (and horizontal and vertical) heatmaps are scaled logarithmically. Some cells may, however, have values of 0 which make logarithmic scaling impossible. To get around this, we add the minimum non-zero value in the visible area to each value as a pseudocount. The colors used to display these values are then scaled from log(min_value) to log(min_value + max_value), where min_value is the minimum non-zero value in all of the currently visible tiles and max_value is the maximum value in all of the currently visible tiles.
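
The pseudocount scheme can be sketched in a few lines of Python. This is an illustration of the formula in the paragraph above, not the code HiGlass runs; `heatmap_scale` is a hypothetical name and the input is assumed to contain at least one non-zero value.

```python
import math


def heatmap_scale(values):
    """Map cell values onto [0, 1] on a log scale with a pseudocount."""
    min_value = min(v for v in values if v > 0)  # minimum non-zero visible value
    max_value = max(values)                      # maximum visible value
    lo = math.log(min_value)
    hi = math.log(min_value + max_value)
    # Adding min_value to every cell keeps zero-valued cells finite.
    return [(math.log(v + min_value) - lo) / (hi - lo) for v in values]
```

For the values 0, 1, 3 and 7, min_value is 1 and max_value is 7, so the cells land at 0, 1/3, 2/3 and 1 of the color range.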

Color map

The color map of the heatmap can be changed through the track configuration options menu. The presets roughly correspond to some of the examples available from matplotlib and are defined in the file app/scripts/config.js. The colors are spaced and interpolated evenly over the range of visible values.

Histogram-based color selection is planned for future releases.

Example configuration

  {
    "server": "/api/v1",
    "tilesetUid": "CQMd6V_cRw6iCI_-Unl3PQ",
    "type": "heatmap",
    "position": "center",
    "options": {
      "colorRange": [ "#FFFFFF", "#F8E71C", "#F5A623", "#D0021B"],
      "maxZoom": null
    }
  }

Custom color maps can be defined by selecting the Custom option. Up to five different colors can be selected. The cell values in the matrix will be interpolated evenly over the range of colors. More information on the color interpolation can be found in the documentation of d3's continuous scales.
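
To make "interpolated evenly" concrete, the sketch below places the colors of a colorRange at evenly spaced stops and blends linearly between neighbors. It is an assumption-laden illustration rather than d3's or HiGlass's code: the helper names are hypothetical, the blend is plain per-channel RGB, and at least two colors are expected.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_color(t, color_range):
    """Return the RGB color at position t in [0, 1] along evenly spaced stops."""
    stops = [hex_to_rgb(c) for c in color_range]
    t = min(max(t, 0.0), 1.0)             # clamp to the valid range
    scaled = t * (len(stops) - 1)         # position measured in stop intervals
    i = min(int(scaled), len(stops) - 2)  # index of the left-hand stop
    frac = scaled - i                     # fraction of the way to the next stop
    left, right = stops[i], stops[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(left, right))
```

Feeding it a value already normalized to [0, 1] (for example by the value scaling described earlier) yields the color for that cell.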

Label position

Selecting a position ('top left', 'top right', 'bottom left' and 'bottom right') from the Label position configuration menu will place a label with information about the track in that position. Currently, the track label shows the following information:

Dataset name: The name of the dataset being displayed. This is either the name that was supplied when the file was uploaded, the name of the uploaded file (if no name was explicitly provided) or the name of the track.

Current data resolution: While HiGlass provides smooth zooming, the data is stored and served at discrete resolutions. The current data resolution shows the resolution of the data being served at the current zoom level.

image

The regular axis track shows absolute positions along a given axis. It does not distinguish between chromosomes and is thus included primarily for debugging purposes.

Example configuration

{
    "type": "top-axis",
    "position": "top",
    "name": "Top Axis"
}

Tiles in HiGlass

HiGlass only requests small chunks of data corresponding to the visible region from the server. As seen on the left, any HiGlass view is composed of a number of "tiles" which are pieced together to form the visible region on the screen. Tiles are identified by their zoom level, x position and y position (shown as z/x/y in the figure).

Tiles can be classified according to the dimensionality of the data they contain (1D or 2D) and according to the structure of the data (sparse or dense). Tiles containing dense data (e.g. for matrices or lines) store the data as an array. The position of each data point can be determined from the tile's position and the index of the data point within the dense array. Dense tiles are transferred between the server and client as base64 encoded strings.

Sparse tiles are more flexible in the type of data they contain and they contain position information for each data point. This makes it possible to display features such as gene annotations that are present in some locations and not others. Each dense 2D tile contains the data for a 256x256 pixel region. Due to the free zooming, when a tile is displayed, it can take up an area smaller or larger than 256x256 pixels. Dense 1D tiles contain 1024 data points.

All tile data is compressed on the server and extracted on the client to minimize the amount of data which needs to be transferred.
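
The relationship between tile positions and data positions can be sketched as follows for the 1D case. This assumes, as an illustration, that zoom level 0 covers the whole dataset in a single tile and that each further zoom level halves the tile width; `max_width` and the function names are hypothetical and not taken from the HiGlass source.

```python
import math

POINTS_PER_1D_TILE = 1024  # dense 1D tiles contain 1024 data points


def tile_width(max_width, zoom):
    """Width in data units of one tile at the given zoom level."""
    return max_width / 2 ** zoom


def visible_tiles(max_width, zoom, start, end):
    """Return (zoom, x) ids of the tiles overlapping the interval [start, end)."""
    width = tile_width(max_width, zoom)
    first = max(0, math.floor(start / width))
    last = min(2 ** zoom - 1, math.floor((end - 1) / width))
    return [(zoom, x) for x in range(first, last + 1)]


def datapoint_position(max_width, zoom, x, index):
    """Data position of the index-th value in dense 1D tile (zoom, x)."""
    width = tile_width(max_width, zoom)
    return x * width + index * width / POINTS_PER_1D_TILE
```

Only the tiles returned by `visible_tiles` would need to be requested for a given viewport; 2D tiles extend the same idea with a y position.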

Motivation

To display large datasets, HiGlass relies on aggregation and tiling to fetch only the visible region at any given time. A high-resolution (1Kb) genomic contact map will be a matrix of roughly 3M rows and 3M columns. While the sparsity of the matrix implies that the majority of the cells in the matrix will be unpopulated, these cells still need to be rendered. Assuming one pixel per column, a monitor would need to be approximately two and a half football fields long to display it in its entirety. To fit the entire matrix on a single monitor, we need to aggregate data so that we are displaying multiple rows and columns on each pixel.

Aggregation

Aggregation is the process of reducing larger datasets to smaller ones for the purposes of displaying more data than can fit on the screen at once. While there are a multitude of ways to aggregate large datasets, we make significant use of summation and prioritization for numerical and categorical data, respectively. The following sections will provide examples for how we use aggregation to reduce the size of larger datasets.

Summation

Aggregation by summation is simply aggregation of adjacent elements by summing their values. This can be generalized to any n-dimensional data set, but we only employ it for matrices and vectors.

Contact matrices

In the case of contact matrices, this is easily accomplished by summing adjacent cells. Consider the following matrix:

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16

We can perform a round of aggregation, summing each block of four cells into one:

14 22
46 54

From this state, we could perform one more round of aggregation and end up with a matrix with just one entry which is simply the sum of all the values of our original matrix.

136

This procedure is an example of how we can lower the resolution of our matrix so that it can be displayed using fewer pixels.
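
One round of this aggregation can be written as a short Python function. This is a minimal sketch that assumes a square-ish matrix with an even number of rows and columns, stored as a list of lists; `aggregate_matrix` is a hypothetical name and not a clodius function.

```python
def aggregate_matrix(matrix):
    """Sum each 2x2 block of cells into one cell (one round of aggregation)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [
        [
            matrix[r][c] + matrix[r][c + 1]
            + matrix[r + 1][c] + matrix[r + 1][c + 1]
            for c in range(0, n_cols, 2)
        ]
        for r in range(0, n_rows, 2)
    ]


original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
once = aggregate_matrix(original)   # four cells, one per 2x2 block
twice = aggregate_matrix(once)      # a single cell holding the total
```

Each round halves the number of rows and columns, so the total of all cells is preserved while the resolution drops by a factor of two.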

Vectors

Similar to matrices, vectors can also be reduced by summation. The vector

1 2 3 4

reduces to

3 7

and can be further reduced to

10

While summation is neither the only way nor, perhaps, the best way to reduce large matrices and vectors, we find that it not only serves our purposes but meshes smoothly with the notion of binning in the creation of contact matrices. A contact matrix that starts out at 1Kb resolution becomes a matrix at 2Kb resolution after one round of aggregation. After 14 rounds of aggregation, an entire 1Kb resolution human contact matrix can fit into a 256x256 pixel image (3.2e9 < 256 * 2 ** 14 * 1000). The resolution of the data at this level of aggregation is 1Kbp * 2 ** 14 = 16384 Kbp per pixel (or bin).
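
The vector reduction and the count of 14 rounds can be checked with a short sketch. The function names are hypothetical, the vector length is assumed to be even, and the genome length of 3.2e9 bp is the approximate figure used in the inequality above.

```python
def aggregate_vector(vector):
    """Sum adjacent pairs of elements (one round of aggregation)."""
    return [vector[i] + vector[i + 1] for i in range(0, len(vector), 2)]


def rounds_needed(genome_length, base_resolution, tile_pixels=256):
    """Smallest number of rounds after which the data fits in tile_pixels bins."""
    rounds = 0
    while tile_pixels * 2 ** rounds * base_resolution < genome_length:
        rounds += 1
    return rounds


rounds = rounds_needed(3_200_000_000, 1000)  # human genome at 1Kb resolution
resolution = 1000 * 2 ** rounds              # base pairs per bin after aggregation
```

After 13 rounds a 256-pixel image covers only about 2.1e9 bp, which is why a fourteenth round is required.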

Prioritization

Another method of aggregating data is by picking out entries from a dataset according to some importance function. This is commonly found in maps. Showing every village on an overview of the world would be useless because all of the labels would overlap. Showing every village when only a county is displayed, however, makes more sense. As the size of the area increases, labels are selectively hidden to show features with a higher priority (often population or area, in the case of maps). The same holds true for gene annotations and other genomic features.

Gene Annotations

To display every gene label in the genome on one monitor is impossible. The labels would overlap. By prioritizing some labels over others and selectively hiding those with lower priority, we can maintain a nearly constant number of non-overlapping labels at any resolution. To do this, we first declare that we will attempt to display no more than 100 gene labels in any 1024 pixel region. We then aggregate adjacent regions by taking the 100 most 'important' entries from the union of the genes in the two regions. This can be illustrated using a simple example where we begin with a list of prioritized regions

region start   region end   gene   priority
1              1            A      9
2              2            B      2
3              3            C      6
4              4            D      13

and aggregate them such that no single region has more than one gene in it:

region start   region end   gene   priority
1              2            A      9
3              4            D      13

And once more for completeness:

region start   region end   gene   priority
1              4            D      13
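
The same steps can be expressed as a small Python sketch. It is illustrative only and not how clodius implements aggregation; the dictionary layout and the `aggregate_by_priority` name are assumptions, and `max_per_region` is set to 1 to match the example (the text above uses a limit of 100).

```python
def aggregate_by_priority(regions, max_per_region=1):
    """Merge adjacent pairs of regions, keeping the highest-priority genes."""
    merged = []
    for i in range(0, len(regions), 2):
        pair = regions[i:i + 2]
        # Union of the genes in the two regions, most important first.
        genes = [gene for region in pair for gene in region["genes"]]
        genes.sort(key=lambda gene: gene[1], reverse=True)
        merged.append({
            "start": pair[0]["start"],
            "end": pair[-1]["end"],
            "genes": genes[:max_per_region],
        })
    return merged


level0 = [
    {"start": 1, "end": 1, "genes": [("A", 9)]},
    {"start": 2, "end": 2, "genes": [("B", 2)]},
    {"start": 3, "end": 3, "genes": [("C", 6)]},
    {"start": 4, "end": 4, "genes": [("D", 13)]},
]
level1 = aggregate_by_priority(level0)
level2 = aggregate_by_priority(level1)
```

Because each merged region keeps at most `max_per_region` entries, the number of labels per region stays bounded no matter how far the data is aggregated.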

Track configuration

Each track can be configured to a certain extent. All configuration and track-related operations can be accessed through the track configuration menu. This menu only appears on mouseover, so if you don't see it, move the mouse out of and back onto the track. An example of the menu can be seen in the screenshot on the right.

Track information

Many tracks can display information about themselves. This can be enabled by selecting Label Position for a particular series in a track and picking a location (such as Top Left).

image

This will display some information about the track in one of its corners:

image

Zoom limiting

A common use case is to limit the resolution of the data which is visible. While this may result in a more coarse-grained image, it can also preserve features that are only visible under a more coarse-grained aggregation:

image

This option can be found under the Configure Series -> Zoom limit menu:

image

Axes

Axes can be added to horizontal-line and vertical-line track types.

image

Selecting left or right (top or bottom on vertical-line tracks) places the axis in the selected position:

image

Track operations are actions that affect how tracks are displayed and how they interact with one another. To access the track operations menu, click on the cog icon that appears when the mouse is over a track.

HiGlass can display data in a variety of different track types.

Rectangular heatmap
Horizontal and vertical heatmaps
Line
Chromosome grid
Chromosome axis
Viewport projection
Gene annotations

Views

Views are visible units with their own x and y scales. Every track within a view shares the same view-wide x and y scales. 2D tracks in the middle use both the view-wide x and y scales. Horizontal tracks use the view-wide x scale and ignore the y scale. They can define their own y scale (as in Line tracks) that scales values in the track, or they can simply display information without a y scale (gene annotations). Vertical tracks ignore the view-wide x scale and can define their own.

All tracks share the same scaling factor.

Adding new views

New views are created by clicking the copy view icon on the right side of the view header. The newly created view will be a copy of the view on which the icon was clicked. When created, it will try to place itself at the nearest available position, moving left to right, top to bottom.

Closing views

Views are closed by clicking the close view icon. The vertical space which is occupied by a view can then be compacted by views below it.

Cross-view operations

Cross-view operations involve transferring or linking parameters (such as scaling factor and location) between two separate views. They are always initiated from the view settings menu and always involve the selection of a target view. When a cross-view operation such as taking the zoom level is initiated, a target view selector appears as a green overlay. Hovering over different views moves the target view selector to that view. Clicking on a target performs the operation between the source view (the one which initiated the operation) and the target view (which was selected).

View synchronization

While scales between views are generally independent, it is possible to synchronize the axes of one view with another.

Take zoom from

Taking the zoom from a different view sets the scaling factor of this view to that of the target view. Both views remain centered on the same point that they were centered on before the operation.

Take location from

Taking the location from a different view sets the center of this view (along both the view-wide x and y axes) to the center of the target view.

Take location and zoom from

Taking the location and zoom from a different view centers and zooms this view on the same location (i.e. the same center point) as the target view.

View linking

Having independent view-wide x and y scales is useful for displaying different regions in different views, but there are situations when we may want to link (lock) scales between views. This operation maintains a constant difference between either the center points or scale factors of two views. This constant difference is equal to the difference in the parameter (center point or scale factor) at the time of linking.

It is entirely possible to link more than two views. The pairwise differences in parameters are maintained between all of the members of the zoom group.
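
The idea of maintaining pairwise differences can be sketched with a small class. This is a hypothetical illustration of the behavior described above, not the HiGlass implementation; `ParameterLock` and the view uids are made up, and the parameter is treated as a plain number (a center coordinate or a zoom level).

```python
class ParameterLock:
    """Keeps pairwise differences of one parameter constant across linked views."""

    def __init__(self, values):
        # Mapping of view uid -> parameter value at the time of linking.
        self.values = dict(values)

    def update(self, uid, new_value):
        """Set one view's parameter and shift every linked view by the same amount."""
        delta = new_value - self.values[uid]
        for other in self.values:
            self.values[other] += delta
        return dict(self.values)


# Three views linked by the x coordinate of their center points.
lock = ParameterLock({"a": 100, "b": 250, "c": 40})
after_pan = lock.update("a", 130)  # panning view "a" by 30 moves "b" and "c" too
```

Shifting every member by the same delta is what keeps the differences recorded at link time intact, regardless of which view the user interacts with.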

Linking views by zoom level (scale factor)

Views linked by zoom level maintain a constant zoom separation. When one view is zoomed, the other linked views follow. The locations remain free and panning is unconstrained between views.

Linking views by location (center point)

Views linked by location maintain a constant separation of their center points. They may be scaled independently of each other but the difference in center point location remains constant. Note that zooming often modifies the center point, so zooming operations may appear to move both views, but this is simply a byproduct of the ability to zoom into points away from the center.

Linking views by zoom and location

Views linked by zoom level and location always maintain a constant separation between both parameters. Zooming or panning in one view zooms or pans the others.

Unlinking

Any parameter linking can be removed from the view settings menu.

Syncing and linking at same time

A common operation is taking the zoom level and location from a different view and linking their zooms and locations. This is useful when one wishes to compare identical locations in multiple samples. While this operation can be accomplished by first taking the zoom and location and then linking the zoom and location, we've also included a convenience menu option which performs both operations with one action.

Searching for a gene or genomic coordinate

It's possible to search for a particular locus using the genome position search box:

image

If the genome position search box isn't visible, it can be enabled by toggling it in the view config menu.

The viewport projection track shows the bounds of one view on another. It is useful when showing the same dataset at two different resolutions. To instantiate it, select the settings of the target view (the one whose bounds we want to draw) and click 'Show this viewport on'. This will then let you select another track (2D) onto which to display the bounds of the target view.

Available options

Example configuration

  {
    "uid": "FI58zIkYQKe2S--8x6Iwfg",
    "type": "viewport-projection-center",
    "fromViewUid": "A4tM32baS9qnYB0HCAiuTg",
    "name": "Viewport Projection",
    "options": {
        "projectionFillColor": "#777",
        "projectionStrokeColor": "#777",
        "projectionFillOpacity": 0.3,
        "projectionStrokeOpacity": 0.3
    }
  }
