Wednesday, November 28, 2018

Why AI and machine learning are driving data lakes to data hubs


Data lakes were built for big data and batch processing, but AI and machine learning models need more flow and third-party connections. Enter the data hub, a concept that is likely to pick up steam.

The data lake was a critical concept for companies looking to put information in one place and then tap it for business intelligence, analytics and big data. But the promise never quite played out. Enter the data hub concept, which is starting to become a rallying point for technology vendors as enterprises realize they have to connect to more than their own data to enable their algorithms.

Pure Storage last month outlined its data hub architecture in a bid to ditch data silos and enable more artificial intelligence, machine learning and cloud applications. On Oct. 9, MarkLogic, an enterprise NoSQL database provider, launched its Data Hub Service to offer better curated data for Internet of things, AI and machine learning workloads. MarkLogic claimed that its Data Hub Service is actually "data lakes done right."

Meanwhile, SAP also has a data hub that's focused on moving data around. And you could argue that the $5.2 billion merger of Cloudera and Hortonworks will put the combined company on a path to be a broad enterprise platform that will eventually have data hub features.

Rest assured that enterprise technology vendors are going to keep mentioning the term "data hub." It may also be in the running for 2019 buzzword of the year.

So what's driving this data hub buzz? AI and machine learning workloads. Simply put, the data lake is more like a concept designed for big data. You can analyze the lake, but you may not find all the signals needed to learn over time.

Jeremy Barnes, chief architect of Element AI, said "the data lake is not dead from our perspective." But the data lake model "doesn't take into account AI and the ability to learn. It needs to adapt to something that enables intelligence systems to evolve," said Barnes.

Element AI's mission is to take research and turn it into a product for businesses. Based in Montreal, Element AI leverages its own research as well as a network of academics to help clients develop their AI strategy.


Read more at:  https://www.zdnet.com/article/why-ai-machine-learning-is-driving-data-lakes-to-data-hubs/

-- -- -- -- -- -- -- -- -- --

Posted by Jayne Merdith


Thursday, November 8, 2018

The Chairman of Nokia on Ensuring Every Employee Has a Basic Understanding of Machine Learning — Including Him



I’ve long been both paranoid and optimistic about the promise and potential of artificial intelligence to disrupt — well, almost everything. Last year, I was struck by how fast machine learning was developing and I was concerned that both Nokia and I had been a little slow on the uptake. What could I do to educate myself and help the company along?

As chairman of Nokia, I was fortunate to be able to worm my way onto the calendars of several of the world’s top AI researchers. But I only understood bits and pieces of what they told me, and I became frustrated when some of my discussion partners seemed more intent on showing off their own advanced understanding of the topic than truly wanting me to get a handle on “how does it really work.”

I spent some time complaining. Then I realized that as a long-time CEO and Chairman, I had fallen into the trap of being defined by my role: I had grown accustomed to having things explained to me. Instead of trying to figure out the nuts and bolts of a seemingly complicated technology, I had gotten used to someone else doing the heavy lifting.

Why not study machine learning myself and then explain what I learned to others who were struggling with the same questions? That might help them and raise the profile of machine learning in Nokia at the same time.

Going back to school



Read More at:    https://hbr.org/2018/10/the-chairman-of-nokia-on-ensuring-every-employee-has-a-basic-understanding-of-machine-learning-including-him



Posted by: Jayne Merdith, Tendron Systems Ltd

Thursday, November 1, 2018

Project Tycho 2.0: a repository to improve the integration and reuse of data for global population health

BACKGROUND AND SIGNIFICANCE

Decisions in global population health can affect the lives of millions of people and can change the future of entire communities. For example, the decision to declare an influenza pandemic and stockpile vaccines can save millions of lives if a pandemic of highly pathogenic influenza actually occurs, or can waste millions of dollars if the decision is based on a false alarm [1]. Decision making in global health is often done under a high degree of uncertainty and with incomplete information.

New data are rapidly emerging from mobile technology, electronic health records, and remote sensing [2]. These new data can expand opportunities for data-driven decision making in global health. In reality, multiple layers of challenges, ranging from technical to ethical barriers, can limit the effective (re)use of data in global health [3,4]. For example, composing an epidemic model to inform decisions about vaccine stockpiling requires the integration of existing data from a wide range of data sources, such as a population census, disease surveillance, environmental monitoring, and research studies [5].

Integrating data can be a daunting task, especially since global health data are often stored in domain-specific data silos that can each use different formats and content standards, i.e., they can be syntactically and semantically heterogeneous. The heterogeneity of data in global health can slow down scientific progress, as researchers have to spend much time on data discovery and curation [6].

To improve access to standardized data in global health, the Project Tycho data repository was launched in 2013 [7]. The first version of Project Tycho (v1) comprised over a century of infectious disease surveillance data for the United States that had been published in weekly reports between 1888 and 2014 [7].

Read More at:     https://doi.org/10.1093/jamia/ocy123 

Willem G van Panhuis, Anne Cross, Donald S Burke
Journal of the American Medical Informatics Association, ocy123, https://doi.org/10.1093/jamia/ocy123
Published: 15 October 2018

Posted by:  Jayne Merdith, Tendron Systems Ltd, London, UK