27 July 2018

[Investing] 3 Lessons To Learn from the Hyflux Saga

I first heard of Hyflux in 2006, while doing my internship at an SME in a similar industry - water treatment - around the time Hyflux IPO-ed, if my memory does not fail me. It was a reputable company with much potential. Apparently, the MD of the SME was from the same batch/company as Olivia Lum, the CEO of Hyflux, and he was pretty envious of the latter. I suppose it is a thing with businessmen, to bring their businesses public via an IPO.

I was not too sure about that. I had just started to learn to invest then. I knew water was (and still is) an important resource in Singapore, so I reckoned anything to do with water treatment should not be too bad a deal. I could not read financial statements then.

Hyflux opened at S$1.50 on the day its stock debuted, if I remember correctly. I could not participate in any of that action because of a lack of capital and knowledge then, since I was just a student. Even when I started investing in 2009, I never quite considered Hyflux because I thought it was too expensive, but I still watched the stock once in a while given my interest in water-related stocks.

Fast forward 7 years to 2016: Hyflux introduced a 6% perpetual security. A friend asked if it was a good deal. 6% is really a good deal, but I thought it was too good to be true, so I did a quick check on the financials. I concluded that Hyflux could not sustain the 6% payout; it was probably a "stunt" to raise capital. I hope the friend heeded my suggestion and stayed far, far away from the perps.

There are some interesting articles on Hyflux and its recent financial troubles. I shall list some here:
  1. Once a star company, Singapore's Hyflux faces major challenges (CNA, May 2018)
  2. Hyflux offers up to $300m perpetuals at 6% a year (ST, May 2016)
But what happened? I think it can be summarised as such: Tuaspring, loss, loss, loss and loss. Tuaspring is the second and largest desalination project, for which Hyflux won the bid in 2011. Today, Hyflux is trying to sell that plant because it is the cause of all the troubles, or so it claims. I would also attribute it to bad management (I can hear Buffett screaming that too).

What has happened may or may not be important - I am not affected anyway, since I did not buy their stocks - but the lessons we can learn from it are valuable. (I am writing this as an application of what I know, and also in the hope that it serves as a reminder to those who are risking their money for a better retirement, etc. Imagine buying Hyflux at S$1.20 apiece and seeing it fall to S$0.20 apiece over 4 years.)

Lesson 1 - Profitability is always the First, with Consistency as its Prefix 
As a stock investor, profitability of a company is a must, and it must be consistent. By consistent, I mean that revenue and profit must grow in most years over a long period (at least 5 years). It is OK to have a small dip in one of the years, as long as the company remains profitable (not making a loss). I do not consider a company further if I see more red or bracketed numbers in the net income.
In the case of Hyflux, profitability has always been a challenge it has yet to overcome. Referring to the Income Statements from SGX for 2014 to March 2018, Hyflux's profit was flat, except for a 100% increase in 2016, before turning negative in 2017. EBITDA is negative for 3 out of the 5 periods. EPS too.
Lesson 2 - Costs will erode any good performance
Costs thin profit margins and hence erode profitability. Controlling costs is probably the easiest way to improve profitability. It is also a measure of the management's competency and efficiency. Read the annual report and see if anything is being done to at least keep costs in check. High costs are always a red flag that should be scrutinised.
Hyflux is considered a high-tech company, so I would not be surprised if its operating costs were high. The next question is: did the management do anything about it? I did not investigate further because I think I have made my point for Lesson 2.

Lesson 3 - Debt is like a cancer 
As long as cash is not generated from business operations, I consider it debt. Money borrowed from banks = debt. Money raised from an IPO, stock issuance and the like = debt. Accounts Receivable = debt, since the money has not been "generated" into the company's account yet. This is a gross over-generalisation compared to the conventional debt = banks, equity = shares, and the definition of a liability. But I personally find that defining debt this way makes it easier to assess the financial health of a company.
Some extent of debt is healthy. It helps with business operations and also cashflow. But when the company's operations are not earning enough to service the debt, that is when it runs into trouble. It means the business model has a problem: it could be a cost or efficiency issue, or the business may not be profitable to begin with (quite true of many of the start-ups that have popped up recently). The company may also have issues collecting money, or the companies it holds receivables from may themselves run into financial problems (remember Keppel?). Or it may overextend its dividend payout (consider a company with a dividend payout ratio of more than 100% every year; it will run out of juice really fast).
Whenever a company issues bonds, new shares or the like, it is important to scrutinise the intent behind it. Hyflux did the 6% perps, and I found that they probably would not have the ability to fulfil that commitment. True enough:
'A big shock': Retail investors in Singapore caught out by Hyflux woes (CNA, Jun 2018) 
I think it is safe to assume that no company wants to give out money freely. After all, stocks are a way to raise money, and financial statements are marketing tools to do that.
While picking stocks, I look for three things in the financial statements - Profitability, Financial Strength and Sustainability, before diving further into analysis.

I find that Sustainability is very important. If a company overextends its payouts (be it executives' salaries, dividends, etc.) or its business operations are too costly, I do not foresee it remaining profitable for long.

I think it helps to always ask, "Where is the money coming from?", when analysing companies.

If it interests you, you can also read my thoughts on NetLink Trust (CJLU) here. The stock is now at S$0.78 (it was S$0.815 at IPO), down as I had expected.


~ZF





19 May 2018

[Investing] Book Review - The Warren Buffett Way (3rd Edition) by Robert G. Hagstrom

As the title suggests, this book is about Warren Buffett and his approach to and wisdom on investing. It is well known that Buffett's investing approach goes something like this:

  1. Regard buying a stock as owning part of a business. That is, treat stock investing as you would running a business.
  2. Buy stocks of businesses that are easy to understand, within your level of competency, have a competitive edge, and are run by competent management.
  3. Buy stocks cheap.
The book actually sums up the above 3 points in the form of a framework called the Tenets of the Warren Buffett Way. It has four categories, namely Business Tenets, Management Tenets, Financial Tenets and Market Tenets.

The book also discusses the academic forms of finance/investing that gave birth to modern portfolio theory (the idea of covariance of stock prices as risk, the Capital Asset Pricing Model and the Efficient Market Hypothesis). I found one part pretty meaningful and I shall quote it here:

Today, investors are caught in an intellectual and deeply emotional crossroads. To the left lies the pathway of modern portfolio theory. The theory has a 50-year history full of academic papers, neat formulas, and Nobel Prize winners. It seeks to get investors from Point A to Point B with as little price volatility as possible, thereby minimizing the emotional pain of a bumpy ride. Believing the market is efficient, hence price and intrinsic value are one and the same, adherents to modern portfolio theory focus on price first and asset value later -- or sometimes not at all.
To the right lies the pathway that Warren Buffett and other successful investors have taken. It has a 50-year history that is full of life experiences, simple arithmetic, and long-term business owners. It seeks to get investors from Point A to Point B not by providing a smooth short-term price ride but by orchestrating an investment approach that seeks to maximize, on an economic risk-adjusted basis, the intrinsic-value rate of growth. Proponents of the Buffett approach do not believe the market is always efficient. Instead, they focus on asset values first and prices later -- or sometimes not at all.

I have read much on value investing (see my reading list!) and I consider myself a value investor. I think this book is worth a read.

~ZF

13 February 2018

[Data Science] Thoughts After Completing Coursera's Deep Learning 5-Course Specialisation

I have just completed Coursera's 5-course specialisation in Deep Learning taught by Andrew Ng. I had been looking forward to this Deep Learning course after taking Ng's Machine Learning course, also on Coursera, and learning a little more about Deep Learning.

Course Structure

The specialisation consists of 5 courses:

1. Neural Networks and Deep Learning 
This course introduces Neural Networks and Deep Learning. Since I had taken the Machine Learning course, the first half was not totally new to me. Deep Learning is a neural network on steroids: more hidden layers, each of which can be multi-dimensional. The exciting part about this course is that I got to code Deep Learning algorithms by hand using Python and Numpy - no Tensorflow or Keras, yet.
2. Improving Deep Neural Networks: Hyperparameter tuning, Regularisation and Optimization 
One of the powerful features of Deep Nets is the ability to learn complex relationships. The compromise for this power, however, is the many parameters of the algorithm, or hyperparameters, that need tuning. This course is specifically about tuning hyperparameters. It can be quite dry. The only respite is when Ng starts to touch on optimisation algorithms. I found this part very interesting, and I now understand the workings behind what happens when I type things like opt = AdamOptimizer(...) (see the sketch after this list).
3. Structuring Machine Learning Projects 
This is one of the shorter modules, and also a drier one. I shall not describe too much about it. The next two courses are the "specialisations" within the specialisation.
4. Convolutional Neural Networks 
Or ConvNets for short. They are used for image recognition (or computer vision). Wonder how a security camera is able to pick up faces, verify objects or identify suspicious articles? That is likely the work of ConvNets. It took me a while to gain even a fuzzy understanding of this powerful application of Deep Nets.
5. Sequence Models 
What happens when a time dimension is added to the data? The data becomes a time series, or a sequence. The Recurrent Neural Network is an application of Deep Nets to time series or sequence data. This is also a short course, but a pretty heavy one. It deals mostly with Natural Language Processing (NLP) and Machine Translation. There is also a side project on Jazz music improvisation, which I found interesting too.
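As promised above, here is a sketch of the standard Adam update rule that sits behind a call like opt = AdamOptimizer(...). This is the textbook formulation, not any framework's exact code; $g_t$ is the gradient at step $t$, $\alpha$ the learning rate, and $\beta_1$, $\beta_2$, $\epsilon$ the usual hyperparameters:
\[m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t, \quad v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2\]
\[\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \quad \theta_t = \theta_{t-1} - \alpha\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}\]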
The programming exercises are coded in Python using Jupyter Notebooks. They are interesting! Along the way, there are some opportunities to use Deep Learning frameworks like Keras and Tensorflow. Keras is a "simpler" but less flexible interface built on top of Tensorflow, by the way.


Take-Away

The reasons I took this specialisation up are:

  1. Deep Learning is a building block of artificial intelligence (AI), a topic in which I have grown very interested. I have always believed AI will provide an opportunity to bring more equity to mankind, if managed properly.
  2. My current work involves analysis tasks which I think can be automated. With the knowledge of Deep Learning, I believe I can build something AI-ish, or at least a proof-of-concept, to improve work efficiency. If complicated tasks like image recognition, jazz improvisation and even translation can be accomplished by Deep Learning, I believe my work can benefit from it too; honestly, I do not think the work I do is more complicated than what Deep Learning can accomplish.
  3. Well, the name "Andrew Ng" has become somewhat a household name for machine learning.

I gained a better understanding of Deep Learning and its related concepts. However, I have come to realise a few "inconvenient" aspects of learning from online resources, based on my personal experience:

  1. In online courses, the data is provided - cleaned, transformed and ready to be used. In real life, data can be a pain to obtain, let alone clean and validate, and this consumes much time.
  2. Applying the concepts, however, requires much self-study, especially to learn how to use Keras and Tensorflow. This mirrors real life. Stackoverflow remains my best friend.

Nevertheless, the learning continues.

~ZF

19 January 2018

[Investing] Colex (567) Vs 800 (5TG)


I have been collating and filtering stock data, and Colex Holdings Ltd (567) has been appearing in my top 10 stocks to KIV. I have had the stock on my watchlist for a while and have watched it appreciate from 30-ish Singapore cents to 40-ish cents. Painful, I know.

Recently, a friend discussed the stock with me again; he had also been recommended 800 Super Holdings Ltd (5TG), a similar company.

First, why Colex (567) caught my eye:
Profile:
------
Stock Symbol = 567
Price = 0.481
Dividends Yield = 0.021
Price/Sales = 0.800
Price/BV = 1.500
Price/Cash Flow = 5.590
 
Profitability
------
Earnings per Share (EPS): 0.048
Net Profit Margin % : 7.960
Return on Equity % = 15.890
Return on Capital % = 12.640
Return on Assets % = 12.640
Inventory Turnover % = nan
Assets Turnover % = 1.590
 
Financial Strength
------
Book Value = 0.280
Current Ratio = 4.050
Quick Ratio = 3.900
Interest Coverage = 305.870
Debt-to-Equity Ratio = nan
The data was collated from [1] using a Python script I wrote. What drew my attention to Colex is its efficiency, namely its Return on Capital (or Assets) of 12.6%; that is, for every dollar of capital, Colex generates 12.6 cents of earnings. It also has a very strong current ratio of 4X! Oh, and that interest coverage too! Not shown here is that the P/E of Colex is 16X.

Next, why 800 (5TG) is a potential alternative:
Profile:
------
Stock Symbol = 5TG
Price = 1.207
Dividends Yield = 0.025
Price/Sales = 1.310
Price/BV = 2.400
Price/Cash Flow = 6.160
 
Profitability
------
Earnings per Share (EPS): 0.096
Net Profit Margin % : 10.800
Return on Equity % = 21.170
Return on Capital % = 11.120
Return on Assets % = 11.120
Inventory Turnover % = 31.630
Assets Turnover % = 1.030
 
Financial Strength
------
Book Value = 0.480
Current Ratio = 1.490
Quick Ratio = 1.420
Interest Coverage = 27.940
Debt-to-Equity Ratio = 0.470

Indeed, 800 (5TG) seems to have a better valuation. Its Return on Equity is much higher, and its Return on Capital (or Assets) is almost on par with Colex's. Financial strength-wise, it is not as impressive compared to Colex. 800 (5TG) has a P/E of about 12X; based on this metric, it seems somewhat under-valued.

Just by comparing the data shown above, it is difficult to decide which to buy. At 11-12% Return on Capital, both are equally enticing. I have to delve deeper into the financials. I did not refer to the respective Annual Reports, but relied on [2]. Again, I will be doing the Profitability, Financial Strength and Cash Flow analysis - quick, and hopefully not too dirty.


Profitability

Comparing the Net Income for the past 5 years, Colex has an increasing net income, whereas 800's seems to have plateaued, or at least is not showing solid consistency.


Financial Strength

In terms of cash stash, 800 has more (25M vs 14M). 800 also has a comparatively larger amount of receivables. I have learnt that receivables, although recorded under Assets, can be a potential write-off risk. Total Assets for 800 and Colex are 159M and 44M respectively.

However, if we compare liabilities based on the data from [2], Colex has close to 9M, whereas 800 has close to 77M.

Both companies' retained earnings are growing.


Cashflow Analysis

In this analysis, I took the Operating Cashflow, added back depreciation and subtracted CAPEX, then divided by the number of outstanding shares. This yielded 0.62 for 800 and 0.74 for Colex. Share for share, Colex can generate more cash. Colex trades at 40-plus cents, yet each share can churn out 74 cents of cash! (OK, this part I need to review the calculations again.)
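For clarity, here is a minimal sketch of that calculation in Python (the inputs below are placeholders, not the actual figures from [2]):

def cash_per_share(operating_cf, depreciation, capex, shares_outstanding):
    # my rough cash-generation measure: OCF + depreciation - CAPEX, per share
    return (operating_cf + depreciation - capex) / shares_outstanding

# hypothetical inputs: S$ millions, and millions of shares outstanding
print(cash_per_share(operating_cf=10.0, depreciation=2.0, capex=3.0, shares_outstanding=12.0))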


Conclusion

When I look for stocks, I look for efficiency based on ratios such as Return on Capital and the Free Cashflow generated per share. At this point, Colex, at approximately 12% Return on Capital and with a better cashflow number, makes the better choice between the two.

Please feel free to comment!

*** Note: This is my personal analysis and is not a recommendation to buy the stock, nor a stock tip.

~ZF


References:
[1] MSN Money
[2] Yahoo! Finance

11 January 2018

[Investing] Comparing Dollar-Cost Averaging and Buy-Low-Sell-High (B.Lo.S.Hi) Strategies

In my previous post, I applied dollar-cost averaging (DCA) on Nikko AM STI ETF (G3B) on a 5-year time-frame.

Then a friend showed me an article about the fallacy of dollar-cost averaging.

Indeed, I mentioned in my previous post that dollar-cost averaging depends much on the level at which prices start; if one starts low, the accumulation will be fast.

But the result of averaging is not average. When prices are low, unit accumulation takes place. When prices are high, the additional units contribute to the total value. The units are also what generate dividends. In good times (prices are high), dollar-cost averaging suppresses costs. In bad times, when prices are low, dollar-cost averaging helps accumulate units.

Dollar-cost averaging is a lazy man's style of investing on auto-pilot. It makes sense if investing costs are low. Obviously, it cannot do better than the Buy-Low-Sell-High (BLoSHi) strategy, as I will show later. But DCA has its merits over BLoSHi too, on which I will also share my thoughts later.


DCA Vs BLoSHi. Fight!

I have extracted the maximum available monthly data from Yahoo! Finance and coded up the following charts using Python. The hypothetical strategy is to invest $100 per month since 2009, when G3B debuted on the market.



As shown in the above chart, the average cost increases over time, but the rate of increase slows as time advances because of the large number of units accumulated (not shown here). The points marked with a red 'X' are points where the actual price is below the average price. These will be the points in consideration for the BLoSHi approach.
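Here is a minimal sketch of that bookkeeping (monthly_prices is a hypothetical placeholder; the real run uses the G3B monthly closes from Yahoo! Finance):

monthly_prices = [3.20, 3.05, 2.85, 3.10, 2.90]    # hypothetical monthly closes
contribution = 100.0                               # fixed monthly investment
units, invested = 0.0, 0.0
for price in monthly_prices:
    units += contribution / price                  # accumulate units at this month's price
    invested += contribution
    average_cost = invested / units                # running average cost per unit
    if price < average_cost:                       # a candidate red 'X' point for BLoSHi
        print('Price {:.2f} is below average cost {:.2f}'.format(price, average_cost))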


The BLoSHi Approach

In the BLoShi approach, funds are accumulated since the start and maximum units will be bought with all the funds accumulated when the price is below the average price of the DCA method. As shown, there are 5 points that mark this criteria. I shall assume that I enter at only two of these - the first (31-Aug-2011) and the last (31-Jan-2016).

The expected result is that this approach will fare better than the DCA, as we shall see in the next chart.


DCA Vs BLoSHi. Performance

The following chart compares the value of the units held over the whole period in consideration.



The relevant numbers are:
With DCA,
Total Amount invested = $10,800 
Total Units Bought = 3839 
Total Value = $13,669 
Total Yield = 26.565% 

BLoSHi (Without DCA)
Total Amount invested = $8,400 
Total Units Bought = 3280 
Total Value = $11,678 
Total Yield = 39.027%
Due to the perpetual nature of the DCA, the number of units accumulated with DCA is definitely higher than without - about 560 more units, as shown. The costs and holding value will also be higher.

Note that the analysis does not include the dividend yield.


A Simple Principle in Investing

"Buying Low and Selling High" is one of the principles of investing. "Buy a dollar for 50-cents". How low is low and how high is high? There are many metrics to determine that. For the lows, some use the 52-week low to gauge, the same is true for the highs. This form some-sort of a price resistance (I am not a trading pro BTW). Some use moving average over a certain period - if the price is below that MA, buy; if the price is above that MA, sell! Simple enough. Of course there is also the almost-mythical intrinsic value.

We may never determine the intrinsic value exactly, but we can make a best guess of it. Personally, I would say that numbers backed by fundamentals, i.e. with a basis, are probably the more reliable ones, because I cannot find satisfactory answers to questions like "How do we know if the price is going to break the 52-week high (or low)?" or "How do we know the price will not go lower even after falling below the MA?". To the latter question, one way is to keep buying, of course.


The Problem with BLoSHi

BLoSHi seems like a very sound investment strategy: I just wait for the price to be low enough to buy, and sell when the price is high enough. However, this strategy is oversimplified. Let's look at the statement again, paying attention to the key words ("wait", "price", "low/high enough"):
I just wait for the price to be low enough to buy, and sell when the price is high enough.
In essence, there is uncertainty in both the price movement and the timing. Firstly, I need to determine the price - how low/high is adequately low/high to buy/sell? The above analysis is retrospective: it is easy to pinpoint prices lower than a certain value (a yearly average, for example) on a historical price chart like the one above. Prospectively, we can use the various metrics discussed above to determine a buy price with some confidence.

Next, when prices are live and trending down after a peak, not many of us can overcome the psychological barrier (fear, that is) or muster the discipline to buy, even near the low we have been waiting for. That is because nobody is certain where the price is going - is it going to bottom out? What if it crashes?

Finally, holding out cash while waiting for the price to reach the buy-low point is not efficient at all. There are considerable opportunity costs to this approach. In the example above, about four and a half years passed between the two purchases (Aug 2011 and Jan 2016). Consider the dividends that could have been paid out in that time. Assuming the price remains stable at $2.50, a monthly contribution of $100 buys 40 units. In a year I would have accumulated 480 units, and 2400 units in 5 years. If each unit pays out 2%, or $0.05, of dividends every year, I would have missed out on $360 of dividends in total - more than 3 months' worth of contributions.
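A quick check of that arithmetic in code (using the same hypothetical figures):

price, monthly, dividend_per_unit = 2.50, 100.0, 0.05   # $0.05 = 2% of $2.50, per unit per year
units, missed = 0.0, 0.0
for year in range(5):
    units += 12 * monthly / price        # 480 units added each year
    missed += units * dividend_per_unit  # dividends the accumulated units would have paid
print(units, missed)                     # 2400.0 units and $360.00 of missed dividends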

With the 2% dividend included, the yields presented above would be about 29% and 39% for the DCA and non-DCA cases respectively. As mentioned, the BLoSHi approach forgoes the compounding effect of any dividends.


Conclusion

Dollar-cost averaging is not a perfect investment strategy. Honestly, in my opinion, it cannot beat the market per se, but it helps to suppress costs when prices are high and accumulate units when prices are low - 'high' and 'low' here being relative to the average price. It also helps one stay invested without the psychological barrier when prices dip, it optimizes the effect of compounding, and it negates the need for a large capital outlay. This way, we can still get back a fair share from the stock market, even though the outcome may not be as stellar as with the BLoSHi approach, which, as I have shared, has many barriers to applying efficiently. All this, however, is on the premise that investing costs are low.

Perhaps, to optimize the results, one could increase the contribution when prices are low. Maybe I shall leave that for another time.

Do feel free to air your comment!

~ZF







07 January 2018

[Data Science] Machine Learning - Part 3, Logistic Regression

In a Classification task, the learning algorithm assigns the number 1 if an instance belongs to a class (True), and 0 otherwise. When predicting, the outcome is a number between 0 and 1, which can be interpreted as the probability of the instance belonging to that class.

The most common algorithm for this is Logistic Regression. The hypothesis function of the algorithm is
\[\hat{y} = \frac{1}{1 + e^{-Z}}\]

known as the sigmoid or logistic function, or the S-curve. $Z$ can be any function of the features and their corresponding weights, for example $Z = w_1X_1 + w_2X_2 + w_3X_3$. In the learning algorithm, the weights $w_1, w_2, w_3$ are determined through optimization of the cost function, such as by Gradient Descent.

The output of the sigmoid function will always be between 0 and 1, as the chart below shows for $Y = \frac{1}{1 + e^{-Z}}$.
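A quick numerical check of that bound (a minimal sketch using Numpy):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ~0.00005, 0.5 and ~0.99995 - always within (0, 1)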

I shall apply Logistic Regression to the Iris dataset. We can load the dataset in Scikit-Learn using this code:
from sklearn import datasets 
iris = datasets.load_iris()
The data comes as a dictionary, so we can see the keys using iris.keys(). There are three species of the flower, with 50 instances each; hence there are 150 instances in total.

Next, the feature and label data, X and y respectively.
X = iris['data'][:, 3:]                 # petal width only
y = (iris['target'] == 2).astype(int)   # 1 if Iris-Virginica
As shown, only the petal width will be used to identify if the species is Iris-Virginica in this example.

It is always a good idea to split the data randomly into training and test sets. I will set aside 30% of the data as test data (that is, 45 instances). This can be done easily using Scikit-Learn:
from sklearn.model_selection import train_test_split
# a single call keeps X and y aligned in the same random split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
Now we can apply Logistic Regression to the data!
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
The model's attributes, such as the intercept and coefficient, can be called up using model.intercept_ and model.coef_ respectively. Note the trailing underscore '_'!
Model Coefficient = [[ 2.17900148]] 
Model Intercept = [-3.67603174]
Hence the model is:
\[\hat{y} = \frac{1}{1+e^{-(2.179X - 3.676)}}\]

We can use the model to make predictions.
pred = model.predict(X_test)
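If the raw sigmoid probabilities are wanted rather than the 0/1 labels, Scikit-Learn also provides predict_proba:

proba = model.predict_proba(X_test)
print(proba[:3])   # each row holds [P(not Virginica), P(Virginica)] and sums to 1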

To determine how the model fares, we can use a Confusion Matrix to count the correct classifications.
from sklearn.metrics import confusion_matrix 
print(confusion_matrix(y_test, pred))
The confusion matrix will come in a form:

[[ TN, FP]
 [FN, TP] ]

The result will be:

[[28  0]
 [ 2 15]]

Hence,

TP = 15
TN = 28
FP = 0
FN = 2

The rows of the matrix are the true classes; the columns are the predicted classes. Hence there are 15 + 28 = 43 instances where the model classified correctly (15 positives predicted as positive, also known as True Positives; 28 negatives predicted as negative, or True Negatives).

From the confusion matrix, the Precision - the proportion of positives predicted by the model that are truly positive - can be determined.
\[precision = \frac{TP}{TP + FP}\]

Another useful metric is the proportion of actual positive instances that are correctly predicted by the model. This is known as the Recall, or the True Positive Rate.
\[recall = \frac{TP}{TP + FN}\]

The precision and recall can be computed using the following code:
from sklearn.metrics import precision_score, recall_score 
print('Precision Score = {:0.3f}'.format(precision_score(y_test, pred))) 
print('Recall Score = {:0.3f}'.format(recall_score(y_test, pred)))

The result:

Precision Score = 1.000 (15/(15+0))
Recall Score = 0.882 (15/(15+2))

If we were to calculate the accuracy - the proportion of all instances predicted correctly, (TP+TN)/(TP+TN+FP+FN) - it would be (15+28)/45, or about 95.6%. Accuracy alone can be misleading, especially on imbalanced datasets, which is why precision and recall are preferred for classification tasks.
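For completeness, Scikit-Learn can compute the accuracy directly:

from sklearn.metrics import accuracy_score
print('Accuracy = {:0.3f}'.format(accuracy_score(y_test, pred)))   # (15+28)/45 = 0.956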

Here's the graphical representation of what was carried out.

This marks the end of the Logistic Regression example. Logistic regression is one of the most important learning algorithms. It is also frequently used as a building block for understanding Artificial Neural Networks, or Neural Nets for short, which I hope to touch on soon.

I hope this has been useful in providing a bigger picture of the Logistic Regression learning algorithm. I have deliberately left out the mathematics behind the learning algorithm because there are many resources that do a better job than I can. By the way, the reference quoted below is awesome!


~ZF



References:
[1] Aurelien Geron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly

01 January 2018

[Data Science] Machine Learning - Part 2, Essentials of Machine Learning

In my previous post, I briefly described what machine learning is. I shall attempt to delve into more (but just enough) detail in this post.

As mentioned, machine learning is about deploying algorithms on a computer to apply statistical methods in data analytics. An algorithm is a set of instructions to perform a specific task, so a machine learning algorithm is a set of instructions for the computer/programme to learn patterns from a given set of data.

Types of Machine Learning Algorithms

There are many machine learning algorithms. The common ones are:

  1. Linear Regression
  2. Logistic Regression
  3. Nearest Neighbours
  4. Support Vector Machines
  5. Decision Trees (and Random Forests)
  6. Neural Networks
  7. Clustering
  8. Dimensionality Reduction (e.g. Principal Component Analysis, or PCA)

Supervised and Unsupervised Learning Algorithms

Machine learning algorithms are categorized as supervised learning or unsupervised learning - basically, the former (supervised learning) works on data with known outcomes, or labels, and the latter (unsupervised learning) works on data without labels.

The goal of supervised learning is to find a general pattern in the data that validates the labels. Items 1-6 in the list above are supervised learning algorithms.

The goal of unsupervised learning is to find and group general characteristics within the data. Items 7-8 are unsupervised learning algorithms.


Data Types and Corresponding Objectives

Data can be quantitative (numbers, measurable on a standard scale) or qualitative (descriptions, not measurable on a standard scale). Qualitative data is also known as categorical data.

Quantitative data is generally used for projection. For example: what will next year's sales at a shopping mall be, given the number of visitors to the mall this year?

Qualitative data is generally used for classification, which can be broken down into two tasks: object verification (is this cat a cat?) and object identification (is this object a cat?).

In practice, the two types of data coexist; it is unlikely for data to be purely of either form. Extra care has to be taken when handling categorical data, because it is usually assigned numbers to represent groups or levels, which can give algorithms a false sense of scale if adequate consideration is not taken, as the sketch below shows.
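A small sketch of the usual remedy, one-hot encoding, using a hypothetical 'colour' feature (Pandas assumed):

import pandas as pd

df = pd.DataFrame({'colour': ['red', 'green', 'blue', 'green']})
# naive integer codes imply an ordering (blue < green < red) that does not exist
df['colour_code'] = df['colour'].astype('category').cat.codes
# one-hot encoding gives each category its own 0/1 column instead
print(pd.get_dummies(df['colour']))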


Hypothesis Function

Each machine learning algorithm makes an assumption about the data, perhaps except neural networks. These assumptions are normally described as a function, called the hypothesis function. For example, linear regression assumes a linear relationship between the data and the label we are interested in.

For example, the hypothesis function for linear regression is:
\[\hat{Y} = mX + c\]

By convention, $\hat{Y}$ is the approximation (to the real label, $Y$). It is related to the input data $X$ by $m$ and $c$, the parameters of the model; that is, a different set of $m$ and $c$ yields a different value of $\hat{Y}$.

Cost Function

The desired $\hat{Y}$ is the one closest to $Y$; that is, the deviation of $\hat{Y}$ from $Y$ should be as small as possible. Any deviation of $\hat{Y}$ from $Y$ is akin to a cost, hence a function of $Y - \hat{Y}$ serves as the cost function of the algorithm. The mean squared error over the $n$ data points is commonly used:
\[\min \frac{1}{n}\sum_{i=1}^{n} \left(Y_i-\hat{Y}_i\right)^{2}\]


Optimization

This is done by finding the optimal set of $m$ and $c$ that satisfies the above condition. We could also take the derivative of the cost function with respect to the parameters and set it equal to 0 to find the minima or maxima. However, this becomes impractical as the dimensionality of $X$ increases, that is, when there are $X_1, X_2, X_3, ..., X_n$ to be considered, as in the case of Multivariate Regression. Fortunately, there are optimization algorithms to help us find the optimal set of $m$ and $c$, the most common being Gradient Descent, sketched below.
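Here is a minimal sketch of Gradient Descent for the hypothesis $\hat{Y} = mX + c$ (the toy data is made up for illustration, not the dataset from the figures below):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])   # roughly y = 2x + 2

m, c = 0.0, 0.0
learning_rate = 0.02
for _ in range(5000):
    error = (m * x + c) - y
    dm = 2.0 * np.mean(error * x)   # partial derivative of the MSE with respect to m
    dc = 2.0 * np.mean(error)       # partial derivative of the MSE with respect to c
    m -= learning_rate * dm
    c -= learning_rate * dc

print(m, c)   # approaches the least-squares fit of the toy data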


An Example

For the set of data given below (Figure 1), we can fit a line (in red) that generalizes the pattern (Figure 2).
Figure 1

Figure 2

We can calculate the gradient $m$ and intercept $c$ by hand in this case to find the equation of the line that generalizes the pattern of the data. Or we can use Python and Scikit-Learn to do so. Here's the code:

import numpy as np
from sklearn.linear_model import LinearRegression

# x holds the feature values and y the labels plotted in Figure 1; note that
# Scikit-Learn expects x as a 2-D array of shape (n_samples, n_features)
lm = LinearRegression()
lm.fit(x, y)
print('The coefficient, m = {}'.format(lm.coef_))
print('The intercept, c = {}'.format(lm.intercept_))

This has the following outputs: $m = 2.03364$ and $c = 2.30736$

Graphically,
Data and the fitted Linear Regression model.


As I have shown, I first deployed the statistical approach - fitting a straight line to the data - and then the computer science approach - running code to perform Linear Regression. I could have computed the gradient ($m$) and intercept ($c$) manually, using derivatives and all, but coding negates the hassle - all in a few lines of code.

I hope this gives some idea of machine learning and also of the merits of learning to code. Data analytics gets complicated when the data becomes massive, with many dimensions and many examples.

Till then~

~ZF