Berlin Buzzwords 2017

14 Jun '17

Data

This year, Sky Betting & Gaming sent three architects from the Data Tribe to Berlin Buzzwords to learn all about storing, processing, streaming and searching large amounts of digital data. With its focus on open-source projects, it is a great conference for talking to practitioners of big data rather than vendors.

The event started with an afternoon of barcamp sessions, followed by two days of more formal conference talks split across four rooms and multiple streams: Scale, Search, Stream and Store.

Our favourite talks

Many of the more vendor-driven conferences tend to start with quite abstract keynotes and sales pitches from the sponsors. That is far from true of Berlin Buzzwords, where the keynotes were some of the best talks. Karen Sandler’s story about the importance of free and open-source software for her (Video) and Duncan Ross’s talk about data evangelism (Video) were both very inspiring.

Data science for good? A data science pledge #bbuzz pic.twitter.com/u8fYX2yUox
— Alice in Wanderland (@AliceFromOnline) June 13, 2017

Michael Häusler gave a great talk on the integration patterns for big data applications (Video) at ResearchGate. I highly recommend watching the video, as it contains some unique ideas which seem to work really well.

Lars Francke gave a good overview of securing Hadoop (Video), encouraging people to enable Kerberos authentication from the outset and then add extra security as their needs or regulatory requirements demand.

Alvaro Videla gave an interesting talk on metaphors we compute by (Video), and how we need to be careful with language.

Metaphors are the tools of thought #bbuzz pic.twitter.com/Hm1sXMy9Fk
— Stefan Rudnitzki (@stefzki) June 12, 2017

Fokko Driesprong and Vincent Warmerdam gave a thought-provoking talk on streaming Bayesian analysis to match user skill levels in online computer games (Video), starting with Pokémon.
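
To give a flavour of what "streaming Bayesian" means, here is a minimal sketch in Python. It is not the model from the talk: it assumes a much simpler Beta-Bernoulli belief about each player's win rate, ignores the strength of the opponent, and uses made-up names (`WinRateBelief`, `process_stream`) and made-up match results. The point is only that each result updates the belief in place, so nothing but the current belief needs to be stored.

```python
from dataclasses import dataclass


@dataclass
class WinRateBelief:
    """Beta(alpha, beta) belief about a player's chance of winning a match."""
    alpha: float = 1.0  # prior pseudo-count of wins (uniform prior, an assumption)
    beta: float = 1.0   # prior pseudo-count of losses

    def update(self, won: bool) -> None:
        # Conjugate update: each observed result adds one to the matching count.
        if won:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean estimate of the win probability.
        return self.alpha / (self.alpha + self.beta)


def process_stream(results):
    """Consume (player, won) events one at a time, keeping only current beliefs."""
    beliefs = {}
    for player, won in results:
        beliefs.setdefault(player, WinRateBelief()).update(won)
    return beliefs


# Hypothetical match results arriving as a stream.
stream = [("ash", True), ("ash", True), ("misty", False), ("ash", False), ("misty", True)]
beliefs = process_stream(stream)
for player, belief in beliefs.items():
    print(player, round(belief.mean, 2))
```

A real matchmaking system would also need to account for who each match was played against, which this sketch deliberately leaves out.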

Frank Lyaruu’s talk on embracing database diversity (Video) reminded us that putting data into Kafka makes it available not just for your first use-case, but for many others. Once user data is being fed through Kafka, you can plug in Elasticsearch, key-value stores and even caching layers, and push updates to web sockets.
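
The idea can be sketched without any Kafka client at all. The toy below is an assumption-laden stand-in, not Kafka's API: `Topic` is just an append-only list, `Consumer` tracks its own offset, and the "key-value store" and "search index" are plain dictionaries. It shows why a second use-case added later can still see every record: each consumer reads the same log independently.

```python
class Topic:
    """Toy stand-in for a Kafka topic: an append-only list of records."""

    def __init__(self):
        self._log = []

    def append(self, record):
        self._log.append(record)

    def read_from(self, offset):
        # Records are never removed, so any offset can be re-read.
        return self._log[offset:]


class Consumer:
    """Reads a topic from its own offset, independently of other consumers."""

    def __init__(self, topic, handler):
        self.topic = topic
        self.handler = handler
        self.offset = 0

    def poll(self):
        records = self.topic.read_from(self.offset)
        for record in records:
            self.handler(record)
        self.offset += len(records)
        return len(records)


users = Topic()
kv_store = {}      # hypothetical key-value store: id -> record
search_terms = {}  # hypothetical search index: word -> set of ids


def to_kv(record):
    kv_store[record["id"]] = record


def to_search(record):
    for word in record["name"].lower().split():
        search_terms.setdefault(word, set()).add(record["id"])


# First use-case: keep a key-value store up to date.
kv_consumer = Consumer(users, to_kv)
users.append({"id": 1, "name": "Ada Lovelace"})
users.append({"id": 2, "name": "Ada Yonath"})
kv_consumer.poll()

# Second use-case added later: it replays the same log from offset 0.
search_consumer = Consumer(users, to_search)
search_consumer.poll()
```

Neither consumer knows the other exists, which is what makes adding a cache or a web socket feed later a matter of attaching one more reader.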

Maxim Zaks asked some great questions about why we’re still using JSON (Video) and looked at how binary formats perform much better.
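
As a rough illustration of the size difference, the sketch below encodes one small record both as JSON and as a fixed binary layout using Python's standard `struct` module. This is not one of the formats from the talk, and the record and its field layout are invented for the example. A hand-rolled layout like this also has no schema evolution or self-description, which is what the purpose-built binary formats add.

```python
import json
import struct

# A made-up record for illustration.
record = {"user_id": 123456, "score": 98.5, "active": True}

# Text encoding: field names and punctuation are repeated in every message.
json_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: little-endian unsigned 32-bit int, 64-bit float, bool.
# The field names live in the layout definition, not in the payload.
LAYOUT = struct.Struct("<Id?")
binary_bytes = LAYOUT.pack(record["user_id"], record["score"], record["active"])

# Round trip to confirm nothing was lost.
user_id, score, active = LAYOUT.unpack(binary_bytes)

print(len(json_bytes), "bytes as JSON")
print(len(binary_bytes), "bytes as packed binary")
```

The saving comes mostly from not shipping the field names and from storing numbers in their native width rather than as decimal text.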

The talk about JSON inefficiency was eye opening. Would be interesting to see how Avro compares to the other formats tested @iceX33 #bbuzz pic.twitter.com/f0LCzMSR9Z
— Ville Brofeldt (@VilleVBro) June 14, 2017

I would highly recommend next year’s Berlin Buzzwords, especially as in 2018 it will be combined with a second conference on the governance and management of open source communities.

Meet the author

Alice Kaerast

Senior Automation Engineer · Infrastructure Tribe

Worked at Sky Betting & Gaming 2012–2019

Alice played a key role in building our Hadoop-based data platform as a DevOps Engineer and then Architect, and then applied this engineering experience in the infrastructure tribe on Apache Kafka and other shared platforms.
