The course introduces the fundamentals of Database Management Systems (DBMS), including data, databases, and database models. It covers database architecture, schemas, and data independence, along with the roles of database users and administrators. Students learn Entity-Relationship (ER) modeling, relational model concepts, normalization, and SQL for creating, querying, and managing databases. Advanced topics include transaction management, concurrency control, recovery, and indexing techniques to ensure efficiency and reliability. The course also explores distributed databases, NoSQL systems, and modern applications of DBMS in real-world scenarios.
This course covers the fundamentals of R programming, including installation, data editing, logical statements, and set operations. It advances into data exploration, matrix operations, statistical models, and mapping models to machine learning. The course also introduces set theory, relations, logic, methods of proof, and mathematical induction. Students learn linear algebra concepts such as systems of equations, vectors, vector operations, and Support Vector Machines with Python. Finally, it explores matrix properties, eigenvalues, eigenvectors, and Principal Component Analysis (PCA) for dimensionality reduction.
The Big Data Analytics course introduces the concepts and terminology of Big Data, highlighting its key characteristics (volume, velocity, variety, veracity, and value) and the types of data (structured, unstructured, semi-structured, and metadata), along with the business motivations and the Big Data Analytics Lifecycle. It then explains parallel and distributed data processing concepts and Hadoop fundamentals, including HDFS, MapReduce, YARN, clusters, data replication, block abstraction, data locality, and high availability, as well as the challenges and benefits of batch processing. Further, it covers the HDFS architecture and the MapReduce framework, detailing job execution, failures, scheduling, shuffle and sort, task execution, and formats. The course also explores the Hadoop ecosystem, focusing on Pig (Pig Latin, execution modes, UDFs, operators), Hive (Hive Shell, services, Metastore, HiveQL, comparison with databases), and HBase (concepts, clients, examples, comparison with RDBMS). Finally, it discusses the Data Analytics Lifecycle in depth, introduces R for data analysis (exploratory analysis, statistical evaluation), and moves into advanced analytics tools and technologies such as in-database analytics, SQL essentials, text analysis, and advanced SQL methods.
CO1: Analyze core Python concepts, evaluate advanced Python techniques such as lambdas, decorators, and iterators, and synthesize solutions to complex problems.
CO2: Comprehend the core principles of scientific and numerical computing, use various Python modules to solve scientific problems, and interpret statistical methods to make informed decisions in data analysis.
CO3: Identify and process various types of data sets, and employ techniques for preprocessing, analysis, and visualization of the results.
CO4: Describe the concepts of computational graphs in TensorFlow and their associated elements, and apply the techniques for encoding and optimization.
CO5: Summarize key machine learning approaches, implement basic machine learning modules in Python, and apply various techniques with Python libraries to solve practical machine learning problems.
Credits – 4
Course Outcomes:
On completion of the module the student should be able to:
CO1. Understand a variety of techniques for designing algorithms.
CO2. Understand a wide variety of data structures and use them appropriately to solve problems.
CO3. Understand some fundamental algorithms.
Credits – 3
Course Outcomes:
Upon successful completion of the course, students will be able to:
CO1. Distinguish the major data mining problems as different types of computational tasks (prediction, classification, clustering, etc.) and the algorithms appropriate for addressing these tasks.
CO2. Analyze data through statistical and graphical summarization and through supervised and unsupervised learning algorithms.
CO3. Evaluate data mining algorithms and understand how to choose algorithms for different analysis tasks.
CO4. Analyze the methods and results from a data mining practice.
CO5. Design and implement data mining applications using real-world datasets, and evaluate and select proper data mining algorithms to apply to practical scenarios.
Course Objectives
Text analytics concepts and applications
Fundamentals of information retrieval and natural language processing
Text analytics framework
Theoretical techniques and applications in text analytics (e.g. social media)
After completing this course, the student will
· understand the basics of Python programming.
· learn how to design and program Python applications.
· acquire object-oriented skills in Python.
· be able to work with the Python standard library.