The post Bayesian Theorem: Breaking it to simple using pymc3 Modelling appeared first on Affine.

**Abstract**

This article introduces some basic concepts of Bayesian inference, along with practical implementations in Python using PyMC3, a state-of-the-art open-source probabilistic programming framework for exploratory analysis of Bayesian models.

The main concepts of Bayesian statistics are covered using a practical and computational approach. The article covers the main concepts of the Bayesian and Frequentist approaches, the Naive Bayes algorithm and its assumptions, the challenge of computational intractability in high-dimensional data, and the approximation and sampling techniques used to overcome it. The results of a Bayesian Linear Regression are interpreted and discussed to illustrate the concepts.

**Introduction**

Frequentist vs. Bayesian approaches for inferential statistics are interesting viewpoints worth exploring. Given the task at hand, it is always better to understand the applicability, advantages, and limitations of the available approaches.

In this article, we will be focusing on explaining the idea of Bayesian modeling and its difference from the frequentist counterpart. To make the discussion a little bit more intriguing and informative, these concepts are explained with a Bayesian Linear Regression (BLR) model and a Frequentist Linear Regression (LR) model.

**Bayesian and Frequentist Approaches**

**The Bayesian Approach:**

The Bayesian approach is based on the idea that, given the data and a probabilistic model (which we assume can model the data well), we can find the posterior distribution of the model’s parameters. For example:

In the Bayesian Linear Regression approach, not only the dependent variable *y* but also the parameters (β) are assumed to be drawn from a probability distribution, such as a Gaussian distribution with mean β^T X and variance *σ*²I (refer to equation 1). The output of BLR is a distribution, which can be used for inferring new data points.

The Frequentist Approach, on the other hand, is based on the idea that given the data, the model, and the model parameters, we can use this model to infer new data. This is commonly known as the Linear Regression (LR) approach. In the LR approach, the dependent variable (y) is a linear combination of the weights times the independent variables (x), plus an error term e due to random noise.

Ordinary Least Squares (OLS) is the method of estimating the unknown parameters of the LR model. In the OLS method, the parameters which minimize the sum of squared errors on the training data are chosen. The output of OLS is a “single point” estimate for the best model parameters.

Since LR can predict only one value of *y*, it is essentially a point estimation. Let’s now get started with the Naive Bayes algorithm, which is the backbone of Bayesian machine learning algorithms.

**Naive Bayes algorithm for classification**

Discussions on Bayesian Machine Learning models require a thorough understanding of probability concepts and the Bayes theorem. So, let us now discuss Bayes’ theorem. Bayes’ theorem finds the probability of an event occurring, given the probability of an already occurred event. Suppose we have a dataset with 7 features/attributes/independent variables (x_1, x_2, x_3, …, x_7); we call this data tuple **X**. In Bayesian terminology, **X** is known as the *evidence*. Assume H is the hypothesis that the tuple belongs to class C. *y* is the dependent variable/response variable (i.e., the class in a classification problem). Mathematically, Bayes’ theorem is stated as:

Where:

- P(H|X) is the probability that the hypothesis H holds, given that we know the ‘evidence’ or attribute description of X. P(H|X) is the probability of H conditioned on X, a.k.a. the Posterior Probability.
- P(X|H) is the probability of X conditioned on H, also known as the ‘Likelihood’.
- P(H) is the prior probability of H. This is the fraction of occurrences of each class out of the total number of samples.
- P(X) is the prior probability of the evidence (data tuple X), described by measurements made on a set of attributes (x_1, x_2, x_3, …, x_7).

As we can see, the posterior probability of H conditioned on X is directly proportional to the likelihood times the prior probability of the class, and inversely proportional to the ‘evidence’.
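As a quick numerical illustration of this proportionality (with made-up probabilities for a hypothetical two-class problem), Bayes’ theorem can be computed directly:

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical two-class example: the evidence P(X) is the sum of
# likelihood-times-prior over all classes (law of total probability).
p_x_given_c1, p_c1 = 0.30, 0.60  # P(X|C1), P(C1)
p_x_given_c2, p_c2 = 0.10, 0.40  # P(X|C2), P(C2)
p_x = p_x_given_c1 * p_c1 + p_x_given_c2 * p_c2  # P(X) = 0.22

print(posterior(p_x_given_c1, p_c1, p_x))  # P(C1|X) ≈ 0.818
```

Note that the two class posteriors sum to 1, since P(X) normalizes the numerators.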

**Bayesian approach for the regression problem: assumptions of the Bayes theorem, given a sales prediction problem with 7 independent variables**

i) Each pair of features in the dataset is independent of each other. For example, feature x_1 has no effect on x_2, and x_2 has no effect on feature x_7.

ii) Each feature makes an equal contribution towards the dependent variable.

**Finding the posterior distribution of model parameters is computationally intractable for continuous variables; we use Markov Chain Monte Carlo and Variational Inference methods to overcome this issue.**

From the Naive Bayes theorem (equation 3), the posterior calculation needs a prior, a likelihood, and the evidence. The prior and likelihood are calculated easily, as they are defined by the assumed model. As P(X) does not depend on H, and given the values of the features, the denominator is a constant. So, P(X) is just a normalization constant, and we need to maximize the value of the numerator in equation 3. However, the evidence (the probability of the data) is calculated as:
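The referenced equation did not survive formatting; for a model with parameters θ, it is presumably the marginalization of the likelihood over the prior:

```latex
P(X) = \int_{\theta} P(X \mid \theta)\, P(\theta)\, d\theta
```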

Calculating this integral is computationally intractable for high-dimensional data. In order to build faster and more scalable systems, we require sampling or approximation techniques to calculate the posterior distribution of the parameters given the observed data. In this section, two important methods for approximating intractable computations are discussed: the sampling-based approach, Markov chain Monte Carlo (MCMC) sampling, and the approximation-based approach known as Variational Inference (VI). A brief introduction of these techniques follows:

- **MCMC** – We use sampling techniques like MCMC to draw samples from the distribution, followed by approximating the distribution of the posterior. Refer to George’s blog [1] for more details on MCMC initialization, sampling, and trace diagnostics.
- **VI** – The Variational Inference method tries to find the best approximation of the distribution from a parameter family. It uses an optimization process over parameters to find the best approximation. In PyMC3, we can use Automatic Differentiation Variational Inference (ADVI), which tries to minimize the **Kullback–Leibler** (KL) divergence between a given parameter family distribution and the distribution proposed by the VI method.
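To make the MCMC idea concrete, here is a minimal, library-free Metropolis sampler (the simplest MCMC variant; PyMC3’s samplers are far more sophisticated) drawing from a standard normal target:

```python
import math
import random

def metropolis(logp, n_samples, x0=0.0, step=1.0, seed=42):
    """Minimal Metropolis sampler: propose a symmetric Gaussian jump,
    accept with probability min(1, p(new)/p(old)); the visited states
    approximate draws from the target distribution."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # accept/reject in log space to avoid underflow
        if math.log(rng.random() + 1e-300) < logp(proposal) - logp(x):
            x = proposal  # accept the move
        samples.append(x)  # on rejection, the old state repeats
    return samples

# Target: a standard normal, log p(x) = -x^2 / 2 (up to a constant)
trace = metropolis(lambda x: -0.5 * x * x, n_samples=20000)
mean = sum(trace) / len(trace)
var = sum((s - mean) ** 2 for s in trace) / len(trace)
print(round(mean, 2), round(var, 2))  # should land near 0 and 1
```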

**Prior Selection: Where is the prior in the data, and where do I get one?**

Bayesian modelling gives us a way to include prior information in the modelling process. If we have domain knowledge or an intelligent guess about the weight values of the independent variables, we can make use of this prior information. This is unlike the frequentist approach, which assumes that the weight values of the independent variables come from the data itself. According to Bayes’ theorem:

Now that the method for finding the posterior distribution of model parameters has been discussed, the next obvious question based on equation 5 is how to find a good prior. Refer to [2] for understanding how to select a good prior for the problem statement. Broadly speaking, the information contained in the prior has a direct impact on the posterior calculations. If we have a more “revealing” prior (a.k.a. a strong belief about the parameters), we need more data to “alter” this belief, and the posterior is mostly driven by the prior. Similarly, if we have a “vague” prior (a.k.a. no information about the distribution of parameters), the posterior is mostly driven by the data. It means that if we have a lot of data, the likelihood will wash away the prior assumptions [3]. In BLR, the prior knowledge modelled by a probability distribution is updated with every new sample (which is modelled by some other probability distribution).
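The prior-versus-data interplay can be illustrated with a conjugate Beta-Bernoulli model (a deliberately simple stand-in for the regression setting, with made-up numbers):

```python
def posterior_mean(a, b, k, n):
    """Posterior mean of a Beta(a, b) prior updated with k successes
    in n Bernoulli trials: the posterior is Beta(a + k, b + n - k)."""
    return (a + k) / (a + b + n)

# A strong ("revealing") prior that believes the success rate is ~0.9:
# Beta(90, 10).  The observed data has a 20% success rate.
for n in (10, 100, 10000):
    k = int(0.2 * n)
    print(n, round(posterior_mean(90, 10, k, n), 3))
# With little data the posterior stays near the prior mean (0.9);
# with a lot of data the likelihood washes the prior away (towards 0.2).
```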

**Modelling using PyMC3 library for Bayesian Inferencing**

The following snippet of code (borrowed from [4]) shows Bayesian linear model initialization using the PyMC3 Python package. A PyMC3 model is initialized using a “with pm.Model()” statement. The variables are assumed to follow a Gaussian distribution, and Generalized Linear Models (GLMs) are used for modelling. For an in-depth understanding of the PyMC3 library, I recommend Davidson-Pilon’s book [5] on Bayesian methods.

**Fig. 1: The traceplot shows the posterior distributions for the model parameters on the left-hand side. The progression of the samples drawn in the trace for each variable is shown on the right-hand side.**

We can use a traceplot to show the posterior distribution of the model parameters, as shown on the left-hand side of Fig. 1. The samples drawn in the trace for the independent variables and the intercept over 1,000 iterations are shown on the right-hand side of Fig. 1. The two colours, orange and blue, represent the two Markov chains.

After convergence, we get the coefficient of each feature, which reflects its effectiveness in explaining the dependent variable. The values represented in red are the maximum a posteriori (MAP) estimates, which here coincide with the mean of each variable’s posterior distribution. The sales can be predicted using the formula:

As it is a Bayesian approach, the model parameters are distributions. The following plots show the posterior distributions in the form of histograms. Here the variables show the 94% HPD (Highest Posterior Density). The HPD in Bayesian statistics is the *credible interval*, which tells us we are 94% sure that the parameter of interest falls in the given interval (for variable x_6, the value range is -0.023 to 0.36).

We can see that the posteriors are spread out, which is indicative of the small number of data points used for modelling; the range of values each independent variable can take is not confined to a small interval (the uncertainty in parameter values is very high). For example, for variable x_6, the value range is from -0.023 to 0.36, and the mean is 0.17. As we add more data, the Bayesian model can shrink this range to a smaller interval, resulting in more accurate values for the weight parameters.
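A credible interval can be read off posterior samples directly. The sketch below computes an equal-tailed 94% interval with the standard library (a simple stand-in for a proper HPD interval; the two coincide for symmetric, unimodal posteriors), using made-up samples resembling x_6’s posterior:

```python
import random

def credible_interval(samples, mass=0.94):
    """Equal-tailed credible interval from posterior samples: cut off
    (1 - mass)/2 of the sorted samples from each tail."""
    s = sorted(samples)
    lo = int((1 - mass) / 2 * len(s))
    hi = int((1 + mass) / 2 * len(s)) - 1
    return s[lo], s[hi]

# Hypothetical posterior samples for a weight parameter, chosen to
# resemble x_6's posterior (mean 0.17)
rng = random.Random(7)
samples = [rng.gauss(0.17, 0.1) for _ in range(4000)]
low, high = credible_interval(samples)
print(round(low, 2), round(high, 2))  # roughly mean ± 1.88 standard deviations
```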

**When to use LR and BLR, MAP, etc.: Do we go Bayesian or Frequentist?**

The equation for linear regression on the same dataset is obtained as:

If we compare the Linear Regression equation (eq. 7) and the Bayesian Linear Regression equation (eq. 6), there is only a slight difference in the weight values. So, which approach should we take up, Bayesian or Frequentist, given that both yield approximately the same results?

When we have a prior belief about the distributions of the weight variables (without seeing the data) and want this information to be included in the modelling process, followed by automatic belief adaptation as we gather more data, Bayesian is the preferable approach. If we don’t want to include any prior belief and prefer to model the weight variables as point estimates, go for Linear Regression. So why are the results of both models approximately the same?

The maximum a posteriori (MAP) estimate for each variable is the peak value of that variable’s posterior distribution (shown in Fig. 2), which lies close to the point estimates of the variables in the LR model. That is the theoretical explanation; for real-world problems, try using both approaches, as the performance can vary widely based on the number of data points and the data characteristics.

**Conclusion**

This blog is an attempt to discuss the concepts of Bayesian inference and its implementation using PyMC3. It started off with the decades-old Frequentist-vs-Bayesian perspective and moved on to the backbone of Bayesian modelling, which is the Bayes theorem. After setting the foundations, the intractability of evaluating posterior distributions of continuous variables was discussed, along with the solutions via sampling and approximation methods, viz. MCMC and VI. The strong connection between the posterior, the prior, and the likelihood was discussed, taking into consideration the data at hand. Next, Bayesian linear regression modelling using PyMC3 was discussed, along with the interpretation of the results and graphs. Lastly, we discussed why and when to use Bayesian linear regression.

**Resources:**

The following are the resources to get started with Bayesian inferencing using PyMC3.

[1] https://eigenfoo.xyz/bayesian-modelling-cookbook/

[2] https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

[3] https://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions

[4] https://towardsdatascience.com/bayesian-linear-regression-in-python-using-machine-learning-to-predict-student-grades-part-2-b72059a8ac7e

[5] Davidson-Pilon, Cameron. *Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference*. Addison-Wesley Professional, 2015.


The post Defying the COVID-19 with an Unstoppable Work-Force appeared first on Affine.

**Manas Agrawal**, **Affine’s CEO**, recently shared an email with his employees, urging them to stay healthy and work from home amidst the recent developments in the COVID-19 situation.

Team,

I trust that you and your families/communities are safe and doing well this season. We acknowledge the unprecedented times we are in, owing to the constantly changing COVID-19 situation, and our hearts and thoughts go out to each and every one of you.

While we are affected by the impact of COVID-19 in all aspects of our lives, I am sad to tell you that things may get worse before they get better. With no manuals to guide us through this fast-changing workflow, please remember to have deep empathy and understanding of each other’s situation.

And as we travel through uncharted territories, I understand that some of you may feel that all this is a little unsettling and overwhelming. Remind yourselves to stay grounded with a sense of purpose and the importance of acting as a community in trying times like these.

We are working with the senior leadership teams to support you in the best ways possible, prioritizing your health and safety. Our resolve is to empower individuals as we will not be able to solve a challenge like this on our own. While technology has a significant role in accelerating progress for solutions to pandemics such as this, the private and public sectors will have to work together to turn the tide on COVID-19.

I’m not alone in being grateful for the exceptional work you are all doing for Affine. The management is noticing your efforts at every level. I want to congratulate everyone for successfully achieving this Work From Home scenario without any disruptions in your deliverables. Your diligence, self-motivation, and dedication in going the extra mile are admirable.

Remember to focus on what you can do to make the world a better place. Our collective efforts will make a difference beyond measure. Times ahead may get harder than ever, but remember that we are all in this together!

Keep your hopes up and continue the exceptional teamwork knowing that you are part of an Unstoppable Work-Force!

It will only be a matter of time before we emerge victorious to an era of wellbeing and development.

But until then, stay safe, stay healthy.

Warm Regards,

**Manas Agrawal**

CEO – Affine


**Pushing facial recognition technology beyond conventional applications for real-time deployment on a large scale will require overcoming numerous challenges to achieve high accuracy rates at minimal processing time. Mentioned below are some of the best practices to follow while revolutionizing the technology.**

**Author:** Dr. Monika Singh | Senior Data Scientist


The post Facial Recognition appeared first on Affine.

The post Bidirectional Encoder Representations for Transformers (BERT) Simplified appeared first on Affine.

**Bidirectional Encoder Representations for Transformers (BERT)** has revolutionized the NLP research space. It excels at handling language problems considered to be “context-heavy” by mapping vectors onto words only after reading the entire sentence, in contrast to traditional NLP models.

This blog sheds light on the term BERT by explaining its components.

BERT (Bidirectional Encoder Representation from Transformers)

**Bidirectional** – Reads text from both directions. As opposed to directional models, which read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once. Therefore, it is considered bidirectional, though it would be more accurate to call it non-directional.

**Encoder** – Encodes the text in a format that the model can understand. It maps an input sequence of symbol representations to a sequence of continuous representations. It is composed of a stack of 6 identical layers, each with two sub-layers. The first sub-layer is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. A residual connection is employed around each of the two sub-layers, followed by **Layer Normalization**. The key feature of layer normalization is that it normalizes the inputs across the features.

**Representation** – To handle a variety of down-stream tasks, our input representation can unambiguously represent both a single sentence and a pair of sentences, e.g. Question & Answering, in one token sequence in the form of transformer representations.

**Transformers** – Transformer includes two separate mechanisms — an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT’s goal is to generate a language model, only the encoder mechanism is necessary.

**Transformers are a combination of 3 things:**

In this blog, we will only talk about the Attention Mechanism.

**Limitations of RNNs over transformers:**

- RNNs and its derivatives are sequential, which contrasts with one of the main benefits of a GPU i.e. parallel processing
- LSTM, GRU and derivatives can learn a lot of long-term information, but they can only remember sequences of 100s, not 1000s or 10,000s and above

**Attention Concept**

As you can see in the image above, attention must be paid at the stop sign. And for the text, **eating** (verb) has higher attention in relation to **oats**.

Transformers use attention mechanisms to gather information about the relevant context of a given word, then encode that context in the vector that represents the word. Thus, attention and transformers together form smarter representations.

Types of Attention:

- Self-Attention
- Scaled Dot-Product Attention
- Multi-Head Attention

**Self-Attention**

Self-attention, also called **intra-attention**, is an attention mechanism that links different positions of a single sequence to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks, including reading comprehension, abstractive summarization, etc.

**Scaled Dot-Product Attention**

Scaled Dot-Product Attention operates on queries Q and keys K of dimension d_k, and values V of dimension d_v. We compute the dot products of the query with all keys, divide each of them by √d_k, and apply a softmax function to obtain the weights on the values.
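This computation is small enough to sketch with NumPy (toy shapes, random inputs):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) compatibility scores
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 query positions attending over 3 key/value positions,
# with d_k = 4 and d_v = 2
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
V = rng.normal(size=(3, 2))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (3, 2); each row of weights sums to 1
```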

The two most commonly used attention functions are:

- **Dot-product (multiplicative) attention:** identical to the algorithm above, except for the scaling factor of 1/√d_k.
- **Additive attention:** computes the compatibility function using a feed-forward network with a single hidden layer.

While the two are similar in theoretical complexity, dot-product attention is much faster and more space-efficient in practice as it uses a highly optimized matrix multiplication code.

**Multi-Head Attention**

Instead of performing a single attention function with d_model-dimensional keys, values, and queries, it is beneficial to linearly project the queries, keys, and values h times with different, learned linear projections to d_k, d_k, and d_v dimensions, respectively. We can then perform the attention function in parallel on each of these projected versions, yielding d_v-dimensional output values. These are concatenated and once again projected, resulting in the final values. Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.

**Applications of BERT**

- **Context-based Question Answering:** the task of finding an answer to a question over a given context (e.g., a paragraph from Wikipedia), where the answer to each question is a segment of the context.
- **Named Entity Recognition (NER):** the task of tagging entities in text with their corresponding type.
- **Natural Language Inference:** the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
- **Text Classification**

**Conclusion:**

Recent experimental improvements due to transfer learning with language models have demonstrated that rich and unsupervised pre-training is an integral part of most language understanding systems. It is in our interest to further generalize these findings to deep bidirectional architectures, allowing the same pre-trained model to successfully tackle a broader set of NLP tasks.



The post TensorFlow Lite: The Future of AI in Mobile Devices appeared first on Affine.

**What is TensorFlow Lite?**

TFLite is TensorFlow’s lightweight solution for mobile, embedded, and other IoT devices. It can be described as a toolkit that helps developers run TensorFlow models on such devices.

**What is the need for TensorFlow Lite?**

Running machine learning models on mobile devices is not easy due to limited resources like memory, power, and storage. Ensuring that the deployed AI models are optimized for performance under such constraints becomes a necessary step in such scenarios.

This is where the TFLite comes into the picture. TFLite models are hyper-optimized with model pruning and quantization to ensure accuracy for a small binary size with low latency, allowing them to overcome limitations and operate efficiently on such devices.

**TensorFlow Lite consists of two main components:**

- *The TensorFlow Lite converter*: converts TensorFlow models into an efficient form and applies optimizations to improve binary size and performance.
- *The TensorFlow Lite interpreter*: runs the optimized models on different types of hardware, including mobile phones, embedded Linux devices, and microcontrollers.

**TensorFlow Lite Under the Hood**

Before deploying the model on any platform, the trained model needs to go through a conversion process. The diagram below depicts the standard flow for deploying a model using TensorFlow Lite.

**Step 1:** Train the model in TensorFlow with any API, for e.g. Keras. Save the model (h5, hdf5, etc.)

**Step 2:** Once the trained model has been saved, convert it into a TFLite flat buffer using the TFLite converter. A **Flat buffer,** a.k.a. TFLite model is a special serialized format optimized for performance. The TFLite model is saved as a file with the extension **.tflite**

**Step 3:** After converting the trained model into a TFLite flat buffer, it can be deployed to mobile or other embedded devices. Once the TFLite model is loaded by the interpreter on a mobile platform, we can go ahead and perform inferences using the model.

Converting your trained model (‘my_model.h5’) into a TFLite model (‘my_model.tflite’) can be done with just a few lines of code as shown below:
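The snippet referenced above was lost in formatting; a sketch of the conversion using the TF2 `tf.lite.TFLiteConverter` API (with a tiny stand-in model, since the original ‘my_model.h5’ is illustrative) might look like:

```python
import tensorflow as tf

# A tiny stand-in for the trained model ('my_model.h5' in the text);
# any saved Keras model is converted the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1),
])
model.save("my_model.h5")

# Load the saved model and convert it to a TFLite flat buffer
loaded = tf.keras.models.load_model("my_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(loaded)
tflite_model = converter.convert()

# The serialized .tflite file is what ships to the mobile device
with open("my_model.tflite", "wb") as f:
    f.write(tflite_model)
```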

**How does TFLite overcome these challenges? **

TensorFlow Lite uses a popular technique called **Quantization**. Quantization is a type of optimization technique that constrains an input from a large set of values (such as the real numbers) to a discrete set (such as the integers).

Quantization essentially reduces the precision of a model’s representation. For instance, in a typical deep neural network, all the weights and activation outputs are represented by 32-bit floating-point numbers. Quantization converts this representation to the nearest 8-bit integers. By doing so, the overall memory requirement of the model reduces drastically, which makes it ideal for deployment on mobile devices. While these 8-bit representations can be less precise, certain techniques can be applied to ensure that the inference accuracy of the new quantized model is not affected significantly. This means that quantization can be used to make models smaller and faster without sacrificing accuracy.
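As a rough illustration of the arithmetic (not TFLite’s actual implementation, which uses more refined schemes such as per-channel scales), here is a simple symmetric int8 quantization sketch:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric quantization: map float weights onto 256 integer
    levels, choosing the scale so that max |w| maps to 127."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes, "->", q.nbytes)  # 4000 -> 1000 bytes: a 4x size reduction
# The round-trip error is bounded by the quantization step
print(float(np.abs(w - dequantize(q, scale)).max()) <= scale)  # True
```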

Stay tuned for the follow up blog that will be a walkthrough of how to run a Deep learning model on a Raspberry Pi 4. In the meantime, you can keep track of all the latest additions to TensorFlow Lite at https://www.tensorflow.org/lite/


The post AI in Robotic Process Automation – The Missing Link appeared first on Affine.

RPA can be used to automate repetitive process-driven work with well-defined outcomes. However, there is a catch. If this is Process Automation, then why don’t we just call it Process Automation? Apart from the marketing angle of using the word Robotic in Process Automation, whoever coined the term had something beyond just process automation in mind.

The intent and endeavor to automate workflows are not new. In traditional systems, automation was achieved by software developers building a comprehensive list of APIs or Scripts to cover all pre-conceived and possible tasks. However, there was a serious drawback in this approach, it was not scalable.

In modern RPA systems, instead of writing a finite number of scripts, the software system is trained to understand any number of steps by recording the actual process as executed and then replicating it within the RPA platform. The 90s generation used Excel extensively to write macros that record a set of standard activities and store them, so that every time those processes are run, a complex but well-defined set of steps is executed in a certain sequence. Current RPA platforms execute on similar principles, although at a much larger scale of complexity and size, with more advanced technology.

While this overcomes the problem of scale, there is still one last challenge. The significance of the word Robotic comes from the fact that there is an element of intelligence expected from the process automation undertaken by an RPA platform. This intelligence allows the platform to take autonomous decisions based on a trigger. While the trigger can be programmed, the same is not the case with the decision tree that activates the trigger.

Most RPA software products like UiPath, Blue Prism, and Automation Anywhere have so far come out with platforms that are very good at process automation, being programmed to follow a certain set of standard processes. However, they fall short miserably when it comes to making the whole process intelligent. They fail to transcend from being Automatic to Autonomous.

Let’s illustrate this with an example. Let’s assume there is a complex senior management report that gets generated by collating some specific lines of data from various enterprise databases like Oracle, SAP, and others. After the report generation, an automated mail is sent out to 500 users with specific content to each user.

The process is repetitive because multiple reports need to be generated from multiple sources of data. This is a complex process with ginormous scales of data, having a multiplier effect on each of the customized reports generated for more than 500 stakeholders, which are then emailed to the intended recipients.

Sounds like quite a complex process but today’s RPA platforms are equipped to handle this easily and repeat process automation with minimum errors. This does not require a lot of development on existing RPA platforms. As mentioned, it can also be easily integrated into multiple platforms mentioned earlier.

However, current implementations still fall short of one critical feature required to certify it as a true autonomous implementation.

Taking one particular use case as an example: if the RPA platform had to decide whom to send the reports to based on some critical random content of the reports, which could be a picture, text, or numbers appearing in a random pattern, the platform would fail to do so. It fails for the simple reason that it does not know how to detect and tackle unknown situations and make the corresponding decisions, because they are not part of the standard process.

Similarly, many other use cases are missing from current RPA platforms. While they claim that some of them are AI-enabled, most of them are not there yet.

The main reason for such a shortcoming is that AI is not the core competency of process automation developers. Naturally, it is an area they are skeptical about investing in, and rightfully so, as it is going to be difficult for them to develop such a specialized competency. The RPA platforms should, therefore, drop the word Robotic unless their platforms are truly autonomous.

The RPA enterprise customers who see a real and large-scale implementation of their process automation platforms should work with AI service providers like Affine, to be able to add the element of intelligence and true autonomous Capabilities to the process automation already implemented.

One should not have the apprehensions of integration here because just like RPA platforms can be easily integrated into existing systems, AI modules can also be integrated into either RPA modules or to the end systems directly.

That is when RPA implementations will truly become Robotic in nature.


The post Capsule Network: A step towards AI mimicking human learning systems appeared first on Affine.

The field of computer vision has witnessed a paradigm shift after the introduction of Convolutional Neural Network (CNN) architectures, which have pushed AI performance to be on par with humans. There has been significant progress in CNN-driven architectures, right from the first AlexNet architecture published in 2012 to newer architectures like ResNet, DenseNet, NASNet, and more recently EfficientNet, each focusing on improving accuracy while rationalizing the computational cost of adoption (through a smaller number of parameters).

CNNs learn faster and with higher accuracy than traditional non-convolutional image models owing to features like:

- Local connectivity: While this limits learning to nearby pixels, it is sufficient to learn the correlations required to evaluate an image
- Parameter sharing across spatial locations: This makes learning easier and faster by reducing redundancy, e.g. if the algorithm has learned to detect a horizontal edge at point A, it need not learn horizontal edge detection again at point B

While CNNs have worked remarkably well, they fall short on 2 key aspects:

**Lack of Invariance:**

Human beings perceive in a translation invariance way, which means we are capable of identifying an object even if the location and orientation of the object changes in our field of view. An example below:

Humans can identify the cat in each of the above scenarios. However, a CNN needs to be trained on multiple orientation scenarios for accurate inference. While image augmentation techniques have helped to overcome the orientation challenge, they lead to higher processing and data management costs and might not work in all scenarios.

**Loss of information on related features**:

In the scenario above, a CNN will recognize both Figure A and Figure B as a face.

This happens due to the pooling layers in the CNN architecture. In simple terms, we know that the initial layers in a CNN learn low-level features, e.g. edges, points, curves, arcs, etc., while the later layers learn high-level features, e.g. eyes, nose, lips, which are further combined in subsequent layers to identify the actual object, i.e. the human face. The intermediate pooling layers help regulate these higher-order features while reducing the spatial size of the data flowing through the network. However, the dynamics of pooling layers do not take into account the spatial relationships between simple and complex objects.

Thus, in both Figure A and Figure B, the CNN recognizes a face by evaluating the presence of high-level features like eyes, nose and lips, without applying any cognizance to their spatial relationships.

According to Hinton, for accurate image classification a CNN should be able to learn and preserve the spatial relationships and hierarchies between the features. Capsule networks, introduced by Hinton and his team, are a step towards learning a better and more complete knowledge representation.

*What is a capsule? A capsule is a group of neurons that captures both the likelihood and the parameters of a specific feature.*

Capsule networks use a vector representation, as compared to the scalar representations used by the neurons in existing CNN architectures. This capsule vector representation includes (1) whether an object exists, (2) what its key features are, and (3) where it is located.

Thus, for an object capsule, the activations will consist of:

1. Presence probability a

2. Feature vector c

3. Pose matrix OV
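As an illustration of how a single capsule vector can encode both presence and parameters, here is a minimal NumPy sketch of the squash non-linearity from Hinton's dynamic-routing paper; the input vector is an arbitrary example, not a value from the post:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Shrinks the raw capsule vector s so that its LENGTH lies in (0, 1):
    # the length can then be read as the probability that the feature is
    # present, while the DIRECTION encodes the feature's parameters.
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

raw = np.array([3.0, 4.0])       # raw capsule output (length 5)
v = squash(raw)
presence = np.linalg.norm(v)     # close to 1: feature very likely present
direction = v / presence         # unit vector carrying the feature's parameters
```

A long raw vector squashes to a length near 1 (feature present); a short one to a length near 0 (feature absent), while its orientation is preserved.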

A few popular techniques for implementing a capsule architecture are discussed below:

**A. Dynamic Routing Algorithm (**source**)**

The original capsule network developed by Hinton and his team uses a dynamic routing algorithm to group child capsules into a parent capsule. The vectors of an input capsule are transformed to match the output, and if a close match is available it forms a **vote**; capsules with similar votes are grouped together.

If the **activity vector** of a parent capsule has a close similarity with a child capsule's vote, the coupling between the two capsules is strengthened.

The main drawback of this approach is that it takes a long time, both during training and during inference. Since the voting is done iteratively, each part could start by disagreeing and voting for different objects before converging on the relevant object. Hence this iterative manner of voting is highly time-consuming and inefficient.
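The iterative voting described above can be sketched in a few lines of NumPy. This is a simplified illustration of routing-by-agreement, not a full CapsNet implementation; the array shapes, the iteration count and the example votes are assumptions:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: vector length -> (0, 1), direction preserved."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u_hat, n_iters=3):
    """Routing-by-agreement over prediction vectors ("votes").

    u_hat: array of shape (n_child, n_parent, dim) -- the vote each child
    capsule casts for each candidate parent capsule."""
    n_child, n_parent, _ = u_hat.shape
    b = np.zeros((n_child, n_parent))                    # raw routing logits
    for _ in range(n_iters):
        # Coupling coefficients: each child distributes its output across
        # parents via a softmax over its logits.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)           # weighted votes per parent
        v = squash(s)                                     # parent activity vectors
        # Agreement step: children whose votes align with a parent's output
        # get their coupling to that parent increased.
        b = b + np.einsum('ipd,pd->ip', u_hat, v)
    return v, c

# Three child capsules: all agree on parent 0, disagree on parent 1.
u_hat = np.zeros((3, 2, 2))
u_hat[:, 0, :] = [1.0, 0.0]        # unanimous votes for parent 0
u_hat[0, 1, :] = [1.0, 0.0]        # conflicting votes for parent 1
u_hat[1, 1, :] = [-1.0, 0.0]
u_hat[2, 1, :] = [0.0, 1.0]

v, c = route(u_hat)
# After routing, child 0 couples more strongly to parent 0, whose output
# its vote agrees with -- at the cost of several sequential iterations.
```

The loop makes the inefficiency concrete: every inference pass repeats this vote-and-update cycle several times before the couplings settle.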

**B.** **Stacked Capsule Autoencoders (SCAE) (**source**)**

Dynamic routing can be thought of as a bottom-up approach, where the parts are used to learn the part → object relationship. Since this relationship is learnt in an iterative manner (iterative routing), it leads to many inefficiencies. SCAE takes a top-down approach, where the parts of a given object are predicted, removing the dependency on iterative routing. A further advantage of this version of capsules is that it can perform unsupervised learning.

**SCAE consists of two networks**

**Part Capsule Autoencoder (PCAE)**: Detects parts and recombines them into an image in the following manner:

- **Part Capsule Encoder:** Segments an image into constituent parts and infers their poses
- **Part Capsule Decoder:** Learns an image template for each part and reconstructs each image pixel

**Object Capsule Autoencoder (OCAE)**: Organizes parts into objects:

- **Object Capsule Encoder:** Tries to organize discovered parts and their poses into a smaller set of objects
- **Object Capsule Decoder:** Makes predictions for each object part, assigning it to one of the object types

Capsule networks have achieved better performance than CNN networks on the MNIST (98.5% to 99%) and SVHN (55% to 67%) datasets.

The major drawback of SCAE is that the part decoder uses fixed templates, which are insufficient to model complicated real-world images.

These networks can be perceived as networks with reasoning ability, and with more expressive templates in the future they could help infer complex real-world images efficiently with less data.

The post Capsule Network: A step towards AI mimicking human learning systems appeared first on Affine.

]]>Published on : Yahoo Finance

Date: August 21, 2019

The post Yahoo Finance – Affine’s Singapore Office Launch to Support Rapid AI & Data Science Growth in The APAC Region appeared first on Affine.

]]>SINGAPORE, Aug. 21, 2019 /PRNewswire/ — **Affine Analytics**, a leading Artificial Intelligence & Data Sciences solution provider, has opened its new office in Singapore. This move is in line with its ambitious expansion plans in the Asia Pacific region in 2019 and beyond.

Affine is the brain-child of three data mavericks, Manas Agrawal, Vineet Kumar and Abhishek Anand, all reputed specialists in their fields with more than 30 years of combined experience and unique strengths in the management and solutioning of AI & Advanced Analytics based services. Affine, at the cutting edge of technology-based development, has emerged as a leader in harnessing the power of cloud-based services, Big Data and Artificial Intelligence.

**Manas Agrawal, Affine Co-founder & CEO said**:

“We are celebrating a milestone moment on the opening of our new office in Singapore region. This is an exciting time for us, as we continue to expand our operations in the Asia-Pacific markets to attain significant growth and help our customers scale quickly, easily and effectively.”

Affine foresees a tremendous potential for growth & consumption in some niche areas such as Machine Learning, Deep Learning, IoT, Computer Vision, etc., especially with organizations now wanting to acquire & implement data-driven solutions for informed decision making and to stay proactive.

Catering to the ever-increasing technology-based needs of this market, they offer Data Sciences & Artificial Intelligence as a strategic tool and provide the necessary expertise & experience, acquired and honed over the years.

With the launch in Singapore & their increasing footprint globally, they plan to create competitive advantage for clients by incorporating innovation and data technology upgrades into their existing systems.

**Ankit Khandelwal, Affine’s Lead for South East Asia Region said:**

“Our prospective clients would be Asian arms of global corporates, regional businesses with global aspirations as well as Government linked companies. We also intend to partner with some of the leading technology players in the region to quickly scale up and reach out to maximum number of clients who could benefit from our set of offerings.”

**Preferred Partners**

Affine is a strategic analytics partner to medium and large-sized organizations (majorly Fortune 500 & Global 1000) around the globe that creates cutting-edge creative solutions for their business challenges.

**Human Capital**

They have strategically built up their human capital with over 300 Data Scientists, Business Analysts & Consultants, Statisticians, Researchers and Data Engineers across India, Singapore and the United States, bringing together an enviable expertise & knowledge of Programming, Statistical & Querying languages, technological orientation along with a strong pedigree of business understanding and fervent problem-solving acumen.

**About Affine**

Affine is a Data Sciences & AI services provider, offering capabilities across the analytical value chain from data engineering to analytical modeling and business intelligence to solve strategic & day to day business challenges of organizations worldwide. They empower their clients to make informed decisions & to take proactive actions through impeccable technology-based development & business acumen.

They develop solutions for multiple verticals such as Retail, CPG, E-commerce, High-Technology, BFSI, Media & Entertainment, Manufacturing among others and are respected as one of the Marquee names in the “Consultancies for Transformation” space.

Affine is headquartered in Bengaluru, India with other offices in New York & Seattle, United States and Singapore.

**Contact:**

Ankit Khandelwal

Country Manager – Sales & Client Engagement

Ankit.khandelwal@affineanalytics.com

Singapore Office Address: 5001 Beach Road

#08-11 Golden Mile Complex

Singapore (199588)

SOURCE Affine Analytics

The post Yahoo Finance – Affine’s Singapore Office Launch to Support Rapid AI & Data Science Growth in The APAC Region appeared first on Affine.

]]>Date: August 20, 2019

The post Business Insider – Affine’s Singapore Office Launch to Support Rapid AI & Data Science Growth in The APAC Region appeared first on Affine.


]]>The post Are Streaming-services like Stadia the future of Gaming? appeared first on Affine.

]]>Uber has revolutionized the way we commute since its launch. Traveling short distances has never been this hassle-free. Earlier, people used their personal vehicles to cover small distances; the other alternative was public transport, which is time-consuming and inconvenient. Uber, on the other hand, provides flexibility to non-frequent travelers and those who love commuting over shorter distances, as they do not have to spend on purchasing a vehicle and at the same time can move around very conveniently.

The same might hold true for the future of gaming! How would you feel if technology giants like Google and Amazon owned the expensive hardware to process games with the best possible CPUs and GPUs, allowing you to simply stream the games? This could potentially eliminate the need to purchase an expensive console and let you pay in proportion to usage! This could be a game changer, especially for someone who has not been able to commit to an INR 30,000/- console to play a single game. Can the entry of Google and Amazon into the gaming industry make this possible?

At the Game Developers Conference (GDC) 2019, Google unveiled its cloud streaming service called STADIA. Just as humans have built stadiums for sports over hundreds of years, Google believes it is building a virtual stadium, Stadia, to foster thousands of players playing or spectating games simultaneously while interacting with each other. Free-to-play games like Fortnite will stand out on Stadia if Google can increase the number of players participating in an instance from 100 to, say, thousands. Whether Stadia will really live up to its hype is a tricky question that only time can answer.

2. **How does it work?**

Google will make use of its massive data centers across the globe, which will act as the computational power for this service. These massive servers will use advanced CPUs, GPUs, RAM and storage to render games and stream the enhanced audio/visual output to users. The players' inputs are uploaded via keyboard or the custom Stadia controller directly to the server. Let's look at how Stadia stands against conventional console-based gaming.

3. **Comes with advantages over console-based gaming**

3.1. No hardware (other than a controller): The bare minimum piece of hardware required is a device that can run Chrome, like a laptop, PC, mobile, tablet or even a smart TV.

3.2. No upgrade costs, as they are taken care of by the shared infrastructure hosted by Google. In the recent past, we had games that were below 10 GB in size, while the recent RDR2 was above 100 GB with its patches. One can imagine how the need to upgrade hardware is the biggest driver for upgrading to next-gen consoles.

3.3. No RAM/ROM or loading-time limitations: Apart from these, YouTube integration will enable users to live-broadcast their gameplay and will allow others to join as well in the case of multiplayer games. In addition, the Google Assistant present on the Stadia controller will provide immediate help if one is stuck at some point and unable to clear a stage.

The benefits of this concept are really promising. But will the drawbacks offset these promises? Let’s go through each of them.

4. **Need to overcome challenges to expand at scale**

The drawbacks can potentially be addressed over time, but for now, scaling this remains the biggest hindrance. There are various challenges that Google (and users) will face, such as latency, pricing, markets and the game library. There are other pointers as well, but these are going to be the biggest ones.

4.1. Latency effect

The video footage must get to you, and the controller inputs must get from you to the server. Hence it is obvious that there is going to be extra latency. Latency will depend upon three elements:

– Amount of time to encode and decode the video feed: Google has tons of experience with video feeds through the likes of YouTube

– The quality of internet infrastructure at the end user's side: This worrisome problem will hinder the smooth conduct of this process. Internet speed will be good in tier-1 cities, but not necessarily in rural areas. You will also need a data connection without any cap. As per Google, a minimum speed of 25 Mbps will be required for Stadia to function. This means 11.25 GB of data will be transferred per hour. That's about 90 hours of game streaming before the bandwidth is exhausted, assuming the user has a data cap of 1 TB. In other words, 3 hours of gaming per day in a month of 30 days. This is under the assumption that there is only one user and the connection is used only for gaming.
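The data-usage figures above follow directly from the 25 Mbps requirement; a quick back-of-the-envelope check:

```python
# Sanity-check of the streaming figures, assuming a constant 25 Mbps
# stream and a 1 TB (decimal, 1000 GB) monthly data cap.
mbps = 25
gb_per_hour = mbps / 8 * 3600 / 1000     # megabits -> megabytes -> GB per hour
cap_gb = 1000
hours = cap_gb / gb_per_hour             # streaming hours before the cap
hours_per_day = hours / 30               # spread over a 30-day month

print(gb_per_hour)               # 11.25 GB per hour
print(round(hours))              # 89 hours, i.e. roughly 90
print(round(hours_per_day, 1))   # 3.0 hours per day
```

Note that a real stream is not constant-bitrate, so these are upper-bound estimates rather than exact usage.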

4.2. Dilemma for developers

The above was the issue the end user will face. Let's look at the situation from the game developer's perspective. With the advent of a new platform, developers will have yet another platform to port and test games on. Developers will have to do more research, which will increase the cost of production; at the same time, more time will be required to release games. This will be a big challenge for franchises that launch games every year. Google has partnered with Ubisoft and has promised to feature Ubisoft games at launch. Time will tell how many more developers will be willing to go a step ahead to support this concept. If not, this could potentially mean that a lot of games will never be available. From a consumer's perspective, it will be hard to justify the purchase, as they won't be able to play all the games available in the market.

4.3. Optimal pricing

Another challenge will be pricing. There is no information regarding the pricing of the overall model. Is this going to be a subscription service? Do we have to buy games? How is the revenue going to be shared with developers? Will the pricing be the same for hardcore gamers and casual gamers? Consider Activision (developer of games like Call of Duty), for example. Historical analysis tells us that slightly more than one-fourth of purchasers do not even play the game for a few hours. On the other hand, there are purchasers who play it day in and day out. The cost that each user has to pay for the game is $60; this amount goes to Activision and the platform on which it is sold. If Activision decides to release the game on Stadia, all the casual purchasers who would have bought the game to test out the hype would now just stream it on Stadia at a much lower cost. Will Activision take that chance and release the game on Stadia?

If the pricing is different for different types of users, how will the revenue be shared with the developers? Let's assume this will be a subscription model and users will be charged $30 per month, which comes out to $360 per year. For a casual gamer, this will be very high, as he can buy a console for $300 and play for years. All these questions will have to be answered before the launch. Running a cloud gaming service is expensive; if the whole selling point is making gaming accessible to more and more people, then a high price point is not going to help the cause.

4.4. Available markets

At the GDC event, the team said that the service will be available in the US, Canada, the UK, and Europe at launch. These regions have a high penetration of console-based gamers, and Google will have to make a lot of effort to make these people switch. The penetration of the Sony PlayStation and Microsoft Xbox is in single digits in India and China. With Stadia not available in Asia, Google is missing out on a lot of developing countries like India and China where people are not inclined towards consoles, hampering its user coverage. Given the high cost of consoles in developing countries like India, Stadia could become the go-to gaming platform.

4.5. Array of games available

The games library will be another hurdle in the race. We have no information regarding the list of games available at launch. Third-party support isn't enough for a gaming platform to survive; you need a list of exclusive games to bring people aboard. Google even unveiled its own Stadia Games and Entertainment studio to create Stadia-exclusive titles, but it didn't mention any details on what games it will be building. In addition, it is highly unlikely that console exclusives (first-party titles) like Spider-Man or Halo will be available for Stadia. First-party games play a significant role in console sales, and Sony and Microsoft will never let this happen as long as they stick to console-based gaming. So Google will have to come up with its own exclusive titles to be dominant in the market. Making exclusive games takes a lot of research and time; it took Sony a good 5-6 years to develop one of its best-selling games, "God of War". If Google has not already started on its exclusive games, it will be a mountain to climb.

4.6. What about other browsers?

Stadia will only be available through Chrome, Chromecast, and on Android devices initially. There was no mention of iOS support through a dedicated app or Apple’s Safari mobile browser. Will Apple be comfortable to let its user base shift completely to Chrome from Safari? Will Apple charge Google additional money for the subscription that Google gets on Apple’s devices? All these questions will be answered over time.

4.7. What if…?

Last but not least, in case Google decides to drop the idea of Stadia in the years after its launch, as it has done in the past with Google Lens or Google+, gamers will lose all their progress and games despite their subscription fees. Apart from the above drawbacks, Google is not the only company to step into this field; it already has some serious competition from existing players in the game-streaming sector.

5. **Any competition that Google might face?**

Sony already streams games to its consoles and PCs via its PlayStation Now service. Microsoft is also planning its own cloud game streaming service and can leverage its Azure data centers. Also, both Sony and Microsoft don’t require developers to port their games for their cloud streaming service. Apart from these two players, Nvidia has been quite successful in this domain allowing users to stream games from its library. This means Google has some strong competition and looks like the cloud gaming war is just getting started.

6. **Conclusion**

What is the incremental change you get from one version of a device to another? It is the absolute bare minimum needed to make people switch. Take the example of the PS4 Slim and PS4 Pro: the only difference is that the Pro supports 4K while the Slim doesn't, and we have seen 30% of people switch from the Slim to the Pro. The entrance of Google into the gaming industry will make PlayStation better, it will make Xbox better, and it will make internet infrastructure better. The success or failure of Google Stadia will cost the consumer nothing, and at the same time it will be a net positive for the gaming industry as well.

Thanks for reading this blog. For any feedback/suggestions/comments,

please drop a mail to marketing@affineanalytics.com

Contributors:

Shailesh Singh – Delivery Manager

Akash Mishra – Senior Business Analyst

The post Are Streaming-services like Stadia the future of Gaming? appeared first on Affine.

]]>