This event is now over

Event Details



Workshop and Tutorial

Getting Started With Data Science

Date: Saturday, September 20, 2014 (9:00am to 5:00pm)

Venue: SAP Labs India Pvt. Ltd., #138, EPIP Zone, Whitefield, Bangalore - 560066 

Workshop focus areas: Participants will be able to understand

  •  The core foundations of Data Science
  •  The mathematical principles widely applied in this area
  •  Various techniques to transform data into knowledge and knowledge into intelligence
  •  The popular tools and programming languages (scripting languages) used in Data Science.
  •  Hands-on sessions:
    •  Core Concepts (Bayesian, Regression, Neural Nets) 
    •  Analyzing Large Datasets for Pattern Recognition
    •  Sentiment Analysis.

Audience: Students and Young Professionals.

Difficulty: Beginner to Medium.



Time slot


9:00 to 9:30


9:30 to 11:00

 Introduction to Data Science

  •  Overview of some common problems in Data Science
  •  Approaches

11:00 to 11:15

Short Break 

11:15 to 1:30

Concepts in Data Mining / Machine Learning

  •  Prominent Data Mining examples
  •  Clustering and Classifications
  •  Supervised & Unsupervised Learning

1:30 to 2:30

Lunch Break

2:30 to 4:00

The Machine Learning Pipeline

  •  Typical Data sources
  •  Training, Testing and Cross validation
  •  Reporting and Data Visualization

4:00 to 4:15

Short Break

4:15 to 5:30

Text Mining

  •  Key Definitions
  •  Opinion Mining Rules
  •  Case Study

5:30 to 6:00

Open Discussion

  •  Programming Languages and Libraries used in Data Science
  •  Project Ideas



Pre-requisites: Familiarity with math and the basics of computer science. Participants should bring their own laptop with the software below installed.

  1. RapidMiner Studio
  2. R (Windows or Ubuntu)
  3. Python (PyCharm Community Edition)

Data, scripts, notes and presentations required to do the class exercises will be provided to the participants during the session.

Brief introduction to Data Science 


“A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.” 


A data scientist represents an evolution from the business or data analyst role. The formal training is similar, typically with a solid foundation in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems; they will pick the right problems, the ones that have the most value to the organization.



Why is Data Science important?

Every organization will need someone wearing the data scientist hat, just as every organization has people responsible for product, sales, marketing and support. Unfortunately, to date, the tools available to data scientists have been rudimentary. Data scientists have had to learn diverse and complex computer languages for working with data. That world is changing as we create simpler ways for data scientists to use big data.

Surely we need data scientists in machine learning, right? Well, if you have very customized needs, perhaps. But most of the standard challenges that require big data, like recommendation engines and personalization systems, can be abstracted out. For example, a large part of the job of a data scientist is crafting “features,” which are meaningful combinations of input data that make machine learning effective. As much as we’d like to think that all data scientists have to do is plug data into the machine and hit “go,” the reality is people need to help the machine by giving it useful ways of looking at the world.
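
As a rough illustration of what crafting "features" can mean in practice, here is a minimal Python sketch. The order record and its field names (placed_at, total_price, item_count) are hypothetical and are not taken from the workshop material; the point is only that combinations of raw inputs (a ratio, a weekend flag) can give a learning algorithm a more useful view of the data than the raw values alone.

```python
from datetime import datetime


def craft_features(order):
    """Derive features from one raw order record (field names are hypothetical)."""
    placed = datetime.strptime(order["placed_at"], "%Y-%m-%d %H:%M")
    return {
        # Ratio feature: combines two raw inputs into spend per item.
        "price_per_item": order["total_price"] / order["item_count"],
        # Calendar feature: Monday is 0, so Saturday and Sunday are 5 and 6.
        "is_weekend": placed.weekday() >= 5,
        # Time-of-day feature extracted from the raw timestamp.
        "hour_of_day": placed.hour,
    }


# Made-up example record, for illustration only.
raw_order = {"placed_at": "2014-09-20 14:30", "total_price": 1200.0, "item_count": 4}
features = craft_features(raw_order)
print(features)
```

Which derived features are actually useful depends on the problem and the domain, and that judgment is the part a person still has to supply.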

It’s never easy to automatically surface the most valuable insights from data. There are ways to provide domain-specific lenses, however, that allow business experts to experiment – much like a data scientist. This seems to be the easiest problem to solve, as there are a variety of domain-specific analytics products already on the market.

Research Areas:


  • Cloud computing; Databases and information integration
  • Learning, natural language processing and information extraction; Signal Processing; Computer vision
  • Information retrieval and web information access; Knowledge discovery in social and information networks


Vishwas -

Mr Vijaykant Naddadur


SAP Labs
SAP Main campus
Bangalore, Karnataka