# Helen's Notes and Links from [EMiT 2019](https://emit.tech/emit/emit2019/programme/) All talk slides can be found [here](https://emit.tech/emit/emit2019/programme/) ### Intro: **Prof. Liz Towns-Andrews:** touched upon industrial strategy - AI funding increasing via gov investment. NB there’s a Yorkshire industrial strat group. Shout out to [3Mbic](https://3mbic.com/) and [NPL](https://www.npl.co.uk/) collaborations at Huddersfield. ### Keynote: data flow supercomputer big data **Prof. V. Milutinovic** [home.etf.rs/~vm/ISCUltimateDataFlow.ppt](http://home.etf.rs/~vm/ISCUltimateDataFlow.ppt) * inspiration sources all have famous books Feynman's notes on computing Von Neumann paradigm talks about relation on time to calculate vs time to communicate. Now we computer much faster than can communicate - moving to novel architectures moving to multiscale approach using aSoGs via FPGAs with [maxeler](https://www.maxeler.com/technology/dataflow-computing/). Used in physics computational code - bad mouthing Fortran MPI - suggesting that these methods could be helpful with a willingness to move technique for existing code. **See presentation for paper reading suggestions ** * Like all these new hardware solutions, have their own set of coding languages here prefaced with max. * highlights Kolmogorov complexity * has geophysics applications: * Schlumberger seismic data acquisition: [Nemeth et al. seg 2008 implementation of the acoustic wave equation of FPGA](https://www.maxeler.com/media/documents/MaxelerSummaryAcousticWaveEquation.pdf) - allows lower energy consumption to do calculations on the vessel * slide 34 gives weather forecasting example from internship students - see later slides for more applications [appgallers.maxeler.com](appgallers.maxeler.com). [mi.sanu.ac.rs/~appgallery.maxeler](mi.sanu.ac.rs/~appgallery.maxeler) * Possible to get access to the machine in London * potential increase in HPC quality of life impacts - supercomputers are huge and energy-hungry - control flow is very energy costly compared to the future possible data flow technology. * currently looks not sure open source * [https://github.com/maxeler](https://github.com/maxeler) * [more on dataflow programming](https://www.doc.ic.ac.uk/~georgig/OpenSPL2015/lectures/OpenSPL_4.pdf)
## Stands and posters * [National Physic Laboratory](https://www.npl.co.uk/) had a dashboard set up with sensors on a turbine * used [node.red]( https://nodered.org/)and suggested [https://thingsboard.io/](https://thingsboard.io/) as dashboard interfaces for easy drag and drop setups. Also suggested [graphly](https://graphly.io/) services.
## Novel hardware ### Mike Ashworth [exascale project](https://www.exascaleproject.org/) - weather forecasting [Similar slides to EMiT talk](https://www.ecmwf.int/sites/default/files/elibrary/2018/18627-fpga-acceleration-lfric-weather-and-climate-model-euroexa-project-using-vivado-hls.pdf) * [exascale computing](https://insidehpc.com/ecp/) * [FPGA](https://www.edgefx.in/fpga-architecture-applications/) offer large performance per W gains * [LFRic](https://www.metoffice.gov.uk/research/modelling-systems/lfric) - Dynamics from [GungHo](https://www.metoffice.gov.uk/research/foundation/dynamics/next-generation), Scalability and Accuracy focusing. Built on the concept of “separation of concerns” i.e. separating scientific code from parallel code - also speaks about [PsyClone](https://www.esiwace.eu/events/5th-enes-hpc-workshop/presentations/ford-porter-psyclone). * Helmholtz solver saw ~50%. Speed up. * requires either [c code vivado hls](https://en.wikipedia.org/wiki/Xilinx_Vivado) or [opmss bsc](https://pm.bsc.es/ompss), or the max j or [open cl](https://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/opencl/opencl-03-basics.pdf?__blob=publicationFile) etc. * focuses on the c in vivado hls to rewrite the UM in C on [zynQ ultrascale](https://www.xilinx.com/products/technology/ultrascale-mpsoc.html) blocks * currently not as fast but MUCH cheaper. Note [Breton white paper on gpu vs FPGA performance/price](http://www.bertendsp.com/pdf/whitepaper/BWP001_GPU_vs_FPGA_Performance_Comparison_v1.0.pdf). * beginning to move into LFRic code ### Event based computation **P. A. Bogdan** * moving from deep nns to spiking nns both have hardware accelerators * current applications outside of neuroscience are difficult * qr code videos around. Showed event based camera sensors i.e rather than frames just picks out changes in luminosity. * [Deep Learning in Spiking neural networks](https://arxiv.org/pdf/1804.08150.pdf)
## Deep learning and GPUS ### Porting [Telemac mascaret](http://www.opentelemac.org/index.php/22-introduction/83-welcome-to-telemac) to open power ** J. Grasset** * TELEMAC is a traditional GFD model Fortran based MPI code to work on CPUS * this guys is looking to use GPUS instead of CPUS * identified a huge bottleneck * move to an openACC with MPI to run on GPUS - run much quicker also use hybrid open mp MPI to see a smaller improvement. Also saw an increase using the pig compiler vs the IBM - on Nvidia hardware * interesting comments on smt speedups not being as good as would be expected - seen not just here ## demos supercomputer for material sciences **M. Turchetto** * [samma.hse.ru](samma.hse.ru) * the demos system was focused on GPUS focused cheap alternative to for example Intel Xeons - although more constrained on applications - but significantly cheaper * now have a hybrid supercomputer with Tesla GPUS
## machine learning parallelisation **Dr Torsten Hoefler** * [T Ben-Nun demystifying parallel deep learning, arXiv, 2018](https://arxiv.org/pdf/1802.09941.pdf) * [Scalable Parallel Computing Laboratory - ETH](https://spcl.inf.ethz.ch/) * [Nui et al 2011](http://i.stanford.edu/hazy/papers/hogwild-nips.pdf) and [Dean et al 2012]( using stale synchronous - adjusting weights according to how old the data is so modern data is weighted higher than old data * also shows examples of losing data [convergence of sparsified methods nips 18](https://papers.nips.cc/paper/7837-the-convergence-of-sparsified-gradient-methods.pdf) * [deep 500](https://www.deep500.org/) - 500 ways to optimise your Neural Network with a [github codebase](https://github.com/deep500/deep500) [https://spcl.inf.ethz.ch/Publications/.pdf/deep500_ipdps19.pdf](https://spcl.inf.ethz.ch/Publications/.pdf/deep500_ipdps19.pdf)
## Panel discussion ### key emerging technique to track? * [https://www.oerc.ox.ac.uk/projects/ops](https://www.oerc.ox.ac.uk/projects/ops) ops with caution - coding with no care for the architecture can never be done but something like this could be used works with CUDA * see [https://op-dsl.github.io/](https://op-dsl.github.io/) porting applications across parallel architectures - panel seemed to not super like this idea but perhaps for not so emerging technologies could be handy * noted that archer 2 will be lacking even GPUs - large resistance in porting code. ### ensuring requisite skills to get uptake? * spoke about UKRI [kpt](https://www.gov.uk/guidance/knowledge-transfer-partnerships-what-they-are-and-how-to-apply) push funding for increasing knowledge base across education and industry
## Novel Software ### Lattice Boltzmann method **L. Ragta** * using on multiple architectures - wants to future proof the software * HiLeMMS (High-Level Mesoscale Modelling System) using mesh solvers in domain-specific language based on [ops library](https://www.oerc.ox.ac.uk/projects/ops) to run on different architectures ### Pattern mining - **S. Titarenko** * used to clustering or classification analysis * wants to work in real time so can have limits on storage and compute time - needed optimising * use farpam for optimised constraints on pattern-mining * packs binary info into smaller amounts on memory and used binary logic * give an example of patterns in temperature change in European cities dropped the pattern mining from few hours to minutes with uncertainty constraints few hours dropped to few seconds (~6000 times faster!) * optimised by sorting first and storing only unique events to reduce length, and then transfer to binary vectors * Allows to further reduce the number of patterns required to check. * highlights the importance of compression and efficient multithreading and simplifying the problem.
## WORKSHOPS
## Deep learning workshop ## Introduction * slides available (talk similar to AI day workshop in dec) * [https://developer.nvidia.com/deep-learning](https://developer.nvidia.com/deep-learning) ### SPANNER * bio inspired neural networks - spike driven * modelled on synapses * use hardware advancements to create “homeostasis” * focused on fault tolerance and self-repairing * Astrocyte neural networks * Use FPGA systems (field programmable gate array) * Neuromorphic engineering - analogous to biological processes * looks at electrical characteristics of neurone and synapses * Spiking neural networks - input currents give output spikes * **Biological overview** Wade 2011 bidirectional coupling between astrocytes... (plos one 6.12) * Kinda cool works with currents and transmitting probability * showed an example of a fault-tolerant self-driving robot car * special hardware even alters clock speeds ### spiNNaker * Spiking neural network platform * simulate 1% human brain using a 100kW machine (low power) * [https://collab.humanbrainproject.eu](https://collab.humanbrainproject.eu) * Has no shared memory - works with spinnaker packets instead and uses multicast routing * Like GPU’s requires a special set of software to work - convert python using spinnaker c and requires extracting of spikes back to arrays/data * Has a front end for non-neuroscience applications too e.g PyNN, graph - requires writing spinnaker applications * a Million cores available can make embarrassingly parallel code e.g. Lattice Boltzmann Method for parallelising stokes theorems * SpiNNaker 2 => 10 times scale-up in most aspects - aiming to the model whole brain! * Working in the future to Map Tensor Flow to spiNNaker
### NVIDIA DIGITS class * Start off training Alex net NN - with only 100 epochs does pretty well at identifying the dog * Uses DIGITS running on caffe * Use large Kaggle datasets for training datasets - have to be “rich” enough to prevent overfitting * preparing data: must standardise images (e.g. all same size etc) and separate into classes * can specify a set of data for validation e.g. 75% for training and 25% validations * if you see the loss not changing in the validation but changing in your training data set that implies that you are at the edge of overfitting
## SPANNER WORKSHOP ### FPGA * configurable array of logic array * requires hardware description languages like verilog (similar to C) or VHDL * example a 4 bit counter in verilog with input and output ports * mostly closed source but can use free version [https://www.edaplayground.com/register](https://www.edaplayground.com/register) * use toolset simulator synopsis vcs and open epwave after run * whole set of links to run through various tools can be as simple as clicking **RUN** * example of unsupervised fault-tolerant models - with spiking nns * building a neuron in verilog with leaky inter-gate and fire method * Can see in the “waves” spikes appear with current thresholds being reached and adjust to get stable firing rates * in this example the current reduces and generates a fault but the firing rate increases to make up for this * Uses a moving average to calculate spike rates see links 6 and 7 showing difference in spikes with different input currents * The uses a transmission probability - approximates a Gaussian distribution to a step function via bandpass filter * Follows a Hebbian Learning framework. * Has to account for differences in spikes from inputs to outputs - uses correlations to adjust and the neurones can even have different learning rates. Uses bcm stdp rules for representing activation functions. * Weight adjustments are made via a synaptic calculator - like the loss function weight adjustments * example link 9 is specifically forcing the synaptic waves to never stop learning * can input multiple different patterns of spikes for pattern recognition in the example a robot picks up a pattern of spikes, for example, an object is identified and the pattern tells the robot to move left * Lastly reduction in hardware resources can be achieved by logical optimisation and partitioning * They show a raspberry pi robot that can follow a maze [Www.york.ac.uk/robot-lab/spanner](Www.york.ac.uk/robot-lab/spanner)
### spiNNaker https://spinn-20.cs.man.ac.uk Jhub tutorial. * using pyNN.spiNNaker * well written ipython notebook tutorials in the Manchester Jhub. * set up populations of neutrons and specify how to connect them with weights and delays (due to the spiking nature) - delays of up to 16 timesteps is recommended * can use static or plastic (time dependent) synapses * runs in ms rather than timesteps pyNN sends the code to spiNNaker and runs it all for you. Then you have to extract the data in a weird neo format [http://neuralensemble.org/docs/PyNN/](http://neuralensemble.org/docs/PyNN/)