19 January 2018

[Investing] Colex (567) Vs 800 (5TG)


I have been collating and filtering stock data, and Colex Holdings Ltd (567) has been appearing in my top 10 stocks to KIV. I noted the stock and have been watching it appreciate from 30-ish Singapore cents to 40-ish cents. Pain, I know.

Recently a friend discussed the stock with me again, and he had also been recommended 800 Super Holdings Ltd (5TG), a similar company.

First, why Colex (567) caught my eye:
Profile:
------
Stock Symbol = 567
Price = 0.481
Dividends Yield = 0.021
Price/Sales = 0.800
Price/BV = 1.500
Price/Cash Flow = 5.590
 
Profitability
------
Earnings per Share (EPS): 0.048
Net Profit Margin % : 7.960
Return on Equity % = 15.890
Return on Capital % = 12.640
Return on Assets % = 12.640
Inventory Turnover % = nan
Assets Turnover % = 1.590
 
Financial Strength
------
Book Value = 0.280
Current Ratio = 4.050
Quick Ratio = 3.900
Interest Coverage = 305.870
Debt-to-Equity Ratio = nan

The data is collated from [1] using a Python script I wrote. What drew my attention to Colex is its efficiency, namely that its Return on Capital (or Assets) is 12.6%; that is, for every dollar of capital, Colex is able to generate 12.6 cents of earnings. It also has a very strong current ratio of 4X! Oh, and that interest coverage too! Not shown here is that the P/E of Colex is about 16x.
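
As a flavour of what the script does, here is a minimal sketch of the screening step (not the actual script, just the idea), assuming the ratios have already been scraped from [1] into a pandas DataFrame; the column names are my own and purely illustrative:

import pandas as pd

# Illustrative values taken from the profiles above; the real script pulls these from MSN Money [1]
stocks = pd.DataFrame({
    'symbol': ['567', '5TG'],
    'return_on_capital': [12.64, 11.12],   # %
    'current_ratio': [4.05, 1.49],
    'interest_coverage': [305.87, 27.94],
})

# Keep only efficient (high return on capital) and financially strong candidates
watchlist = stocks[(stocks['return_on_capital'] > 10) & (stocks['current_ratio'] > 1.5)]
print(watchlist)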

Next, why 800 (5TG) is a potential alternative:
Profile:
------
Stock Symbol = 5TG
Price = 1.207
Dividends Yield = 0.025
Price/Sales = 1.310
Price/BV = 2.400
Price/Cash Flow = 6.160
 
Profitability
------
Earnings per Share (EPS): 0.096
Net Profit Margin % : 10.800
Return on Equity % = 21.170
Return on Capital % = 11.120
Return on Assets % = 11.120
Inventory Turnover % = 31.630
Assets Turnover % = 1.030
 
Financial Strength
------
Book Value = 0.480
Current Ratio = 1.490
Quick Ratio = 1.420
Interest Coverage = 27.940
Debt-to-Equity Ratio = 0.470

Indeed, 800 (5TG) seems to have a better valuation. Its Return on Equity is much higher, and its Return on Capital (or Assets) is almost on par with Colex's. Financial-strength-wise it is not as impressive compared to Colex. 800 (5TG) has a P/E of about 12X. Based on this metric, it seems somewhat under-valued.

Just by comparing the data shown above, it is difficult to decide which to buy; at 11-12% Return on Capital, both are equally enticing. I had to delve deeper into the financials. I did not refer to the respective Annual Reports, but relied on [2]. Again, I will be doing the Profitability, Financial Strength and Cash Flow analysis, quick and hopefully not too dirty.


Profitability

Comparing the Net Income for the past 5 years, Colex has an increasing net income, whereas 800's seems to have plateaued, or at least is not showing solid consistency.


Financial Strength

In terms of cash stash, 800 has more (25M vs 14M). 800 also has a comparatively larger amount of receivables. I learnt that receivables, although recorded under Assets, can be a potential write-off risk. Total Assets for 800 and Colex are 159M and 44M respectively.

However, if we compare liabilities based on the data from [2], Colex has close to 9M, whereas 800 has close to 77M.

Both companies' retained earnings are growing.


Cashflow Analysis

In this analysis I took the Operating Cashflow, added back depreciation and subtracted CAPEX, then divided by the number of outstanding shares. This yielded 0.62 for 800 and 0.74 for Colex. Share for share, Colex can generate more cash: the stock trades in the 40-cent range, yet each share can churn out 74 cents of cash! (OK, this part I need to review the calculations, again.)
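
For transparency, here is a sketch of that calculation with placeholder figures; this is illustration only, and the actual inputs would come from the cash flow statements in [2]:

def cash_per_share(operating_cf, depreciation, capex, shares_outstanding):
    # Operating cash flow plus depreciation, less CAPEX, spread over the outstanding shares
    return (operating_cf + depreciation - capex) / shares_outstanding

# Placeholder inputs for illustration only; substitute the actual figures from [2]
print(cash_per_share(operating_cf=10e6, depreciation=2e6, capex=3e6, shares_outstanding=12e6))  # 0.75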


Conclusion

When I look for stocks, I look for efficiency based on ratios such as Return on Capital and the Free Cashflow generated per share. At this point, Colex, with approximately 12% Return on Capital and a better cashflow number, is the better choice between the two.

Please feel free to comment!

*** Note: This is my personal analysis and is not a recommendation for a stock buy or a stock tip. 

~ZF


References:
[1] MSN Money
[2] Yahoo! Finance

11 January 2018

[Investing] Comparing Dollar-Cost Averaging and Buy-Low-Sell-High (B.Lo.S.Hi) Strategies

In my previous post, I applied dollar-cost averaging (DCA) on Nikko AM STI ETF (G3B) on a 5-year time-frame.

Then a friend showed me an article about the fallacy of dollar-cost averaging.

Indeed, I mentioned in my previous post that dollar-cost averaging depends much on where prices are when one starts; if one starts low, the accumulation will be fast.

But the result of averaging is not average. When prices are low, units are accumulated. When prices are high, the additional units also contribute to the total value. The units are also what generate the dividends. In good times (prices are high), dollar-cost averaging suppresses costs. In bad times when prices are low, dollar-cost averaging helps consolidate units.

Dollar-cost averaging is a lazy-man's style of investing on auto-pilot. It makes sense if investing costs are low. Obviously, it cannot do better than the Buy-Low-Sell-High (BLoSHi) strategy, as I will show later. But DCA has its merits over BLoSHi too, on which I will also share my thoughts later.


DCA Vs BLoSHi. Fight!

I have extracted the maximum available monthly data from Yahoo! Finance and coded up the following charts using Python. The hypothetical strategy is that I invest $100 per month from 2009, when G3B debuted on the market.
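
For those curious, a simplified sketch of the DCA bookkeeping behind the charts (not the exact script, but the idea), assuming prices is a pandas Series of monthly closing prices pulled from Yahoo! Finance:

import pandas as pd

def simulate_dca(prices, monthly_amount=100):
    # Buy a fixed dollar amount at each month's closing price and track the running average cost
    units = monthly_amount / prices
    total_units = units.cumsum()
    total_cost = monthly_amount * pd.Series(range(1, len(prices) + 1), index=prices.index)
    average_cost = total_cost / total_units   # the 'average price' line in the chart
    return pd.DataFrame({'total_units': total_units,
                         'average_cost': average_cost,
                         'value': total_units * prices})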



As shown in the above chart, the average cost increases over time, but the rate of increase slows as time advances because of the large number of units accumulated (not shown here). The points marked with a red 'X' are points where the actual price is below the average price. These are the points considered for the BLoSHi approach.


The BLoSHi Approach

In the BLoSHi approach, funds are accumulated from the start, and the maximum number of units is bought with all the accumulated funds when the price is below the average price of the DCA method. As shown, there are 5 points that meet this criterion. I shall assume that I enter at only two of these: the first (31-Aug-2011) and the last (31-Jan-2016).

The expected result is that this approach would definitely fare better than DCA, as we shall analyse in the next chart.


DCA Vs BLoSHi. Performance

The following chart compares the value of the units held over the whole period under consideration.



The relevant numbers are:
With DCA,
Total Amount invested = $10,800 
Total Units Bought = 3839 
Total Value = $13,669 
Total Yield = 26.565% 

BLoSHi (Without DCA)
Total Amount invested = $8,400 
Total Units Bought = 3280 
Total Value = $11,678 
Total Yield = 39.027%

Due to the perpetual nature of DCA, the number of units accumulated is definitely more than in the approach without it: roughly 560 more units, as shown (3,839 vs 3,280). The cost and holding value will also be higher.
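
As a quick sanity check of the yield figures above:

def total_yield(invested, value):
    return (value - invested) / invested * 100

print(round(total_yield(10800, 13669), 3))  # DCA: 26.565
print(round(total_yield(8400, 11678), 3))   # BLoSHi: ~39.02 (39.027 before rounding the totals)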

Note that the analysis does not include the dividend yield.


A Simple Principle in Investing

"Buying Low and Selling High" is one of the principles of investing. "Buy a dollar for 50-cents". How low is low and how high is high? There are many metrics to determine that. For the lows, some use the 52-week low to gauge, the same is true for the highs. This form some-sort of a price resistance (I am not a trading pro BTW). Some use moving average over a certain period - if the price is below that MA, buy; if the price is above that MA, sell! Simple enough. Of course there is also the almost-mythical intrinsic value.

We may not be able to determine the intrinsic value exactly, but we are able to make a best guess of it. Personally, I would say that numbers backed by fundamentals, i.e. with a basis, are probably the more reliable ones, because I cannot find satisfactory answers to questions like "How do we know if the price is going to break the 52-week high (or low)?" or "How do we know if the price will go lower even after falling below the MA?". To the latter question, one way is to keep buying, of course.


The Problem with BLoSHi

BLoSHi seems like a very sound investment strategy: I just wait for the price to be low enough to buy and sell when the price is high enough. However, this is oversimplified. Let's look at the statement again, this time with the key words in quotes:
I just wait for the price to be "low enough" to buy and sell when the price is "high enough".
In essence, there is uncertainty in both the price movement and the timing. Firstly, I need to determine the price: how low/high is adequately low/high to buy/sell? The above analysis is retrospective; it is easy to pinpoint prices that are lower than a certain value (a yearly average, for example) on a historical price chart like the one above. Again, we can use the various metrics discussed above to determine the buying price with some confidence.

Next, when prices are live and trending down after a peak, not many of us will be able to overcome the psychological barrier (fear, that is) or muster the discipline to buy, even when the price is near the low we have been waiting for. That is because nobody is certain where the price is going: is it going to bottom out? What if it crashes?

Finally, holding cash while waiting for the price to reach the buy-low point is not efficient at all. There is a large opportunity cost to this approach. In the example above, there are almost 5 years between the two purchases (Aug 2011 and Jan 2016). Consider the dividends that could have been paid out. Assuming the price stays stable at $2.50, a monthly contribution of $100 buys 40 units. In a year I would have accumulated 480 units, and 2,400 units in 5 years. If each unit pays out 2%, or $0.05, in dividends every year, I would have missed out on $360 of dividends in total, which is more than 3 months' worth of contributions.
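
The missed-dividend arithmetic, spelt out in a few lines of Python:

# 40 units bought per month at $2.50; dividends of $0.05 per unit assumed to be paid at each year-end
units_at_year_end = [480 * year for year in range(1, 6)]      # 480, 960, ..., 2400
missed_dividends = sum(units * 0.05 for units in units_at_year_end)
print(missed_dividends)  # 360.0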

With 2% dividend, the yields presented above will be about 29% and 39% for the DCA and Non-DCA cases respectively. As mentioned, the BLoSHi has forgone the compounding effect of any dividends.


Conclusion

Dollar-cost averaging is not a perfect investment strategy. Honestly, in my opinion, it cannot beat the market per se, but it helps to suppress costs when prices are high and accumulate units when prices are low; here 'high' and 'low' are relative to the average price. It also helps one stay invested without the psychological barrier when prices dip, it makes the most of compounding, and it negates the need for a large capital outlay. This way, we can still get a fair share back from the stock market, even if the outcome is not as stellar as the BLoSHi approach, which, as I have shared, has many barriers to applying efficiently. However, all of this is on the premise that investing costs are low.

Perhaps, to optimize the results, one way is to increase the contribution when the prices are low. Maybe I shall leave it for another time.

Do feel free to air your comment!

~ZF







07 January 2018

[Data Science] Machine Learning - Part 3, Logistic Regression

In a Classification task, the learning algorithm assigns the number 1 if an instance belongs to a class (True), and 0 otherwise. When predicting, the output will be a number between 0 and 1, which can be interpreted as the probability of the instance belonging to that class.

A common algorithm for this is Logistic Regression. The hypothesis function of the algorithm is
\[\hat{y} = \frac{1}{1 + e^{-Z}}\]

known as the sigmoid or logistic function, or the S-curve. $Z$ can be any function of the features and their corresponding weights, for example $Z = w_1 X_1 + w_2 X_2 + w_3 X_3$. In the learning algorithm, the weights $w_1, w_2, w_3$ are determined by optimizing the cost function, for example with Gradient Descent.

The output of the sigmoid function will always be between 0 and 1, as the chart below shows for $Y = \frac{1}{1 + e^{-Z}}$
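
The chart itself can be reproduced with a few lines of matplotlib, for example:

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)
y = 1 / (1 + np.exp(-z))   # the sigmoid / logistic function

plt.plot(z, y)
plt.xlabel('Z')
plt.ylabel('Y')
plt.title('Sigmoid (logistic) function')
plt.show()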

I shall apply Logistic Regression on the Iris data set. We can call out the dataset in Scikit-Learn using this code:
from sklearn import datasets 
iris = datasets.load_iris()
The data comes as a dictionary-like object, so we can see the keys using iris.keys(). There are three species of the flower, with 50 instances each; hence there are 150 instances in total.

Next, the feature and label data, X and y respectively.
import numpy as np

X = iris['data'][:, 3:]                   # petal width only
y = (iris['target'] == 2).astype(np.int)  # 1 if Iris-Virginica, else 0
As shown, only the petal width will be used to identify if the species is Iris-Virginica in this example.

It is always a good idea to split the data into training and test sets randomly. I will split off 30% of the data as test data (that is, 45 instances). This can be done easily using Scikit-Learn:
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
 Now, we can apply Logistic Regression on the data!
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
The model's attributes such as the intercept and coefficient can be called out using model.intercept_ and model.coef_ respectively. Note the underscore '_'!
Model Coefficient = [[ 2.17900148]] 
Model Intercept = [-3.67603174]
Hence the model is:
\[\hat{y} = \frac{1}{1+e^{-(2.179X - 3.676)}}\]

We can use the model to make predictions.
pred = model.predict(X_test)
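
If the probability interpretation mentioned at the start is wanted instead of the hard 0/1 labels, predict_proba returns it:

proba = model.predict_proba(X_test)  # column 0: P(not Virginica), column 1: P(Virginica)
print(proba[:5])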

To determine how the model fare, we can use a Confusion Matrix to see the number of right classifications. 
from sklearn.metrics import confusion_matrix 
print(confusion_matrix(y_test, pred))
The confusion matrix will come in a form:

[[ TN, FP]
 [FN, TP] ]

The result will be:

[[28  0]
 [ 2 15]]

Hence,

TP = 15
TN = 28
FP = 0
FN = 2

The rows of the matrix are the True classes; the columns are the Predicted classes. Hence there are 15 + 28 = 43 instances where the model classified correctly (15 Positives predicted as Positive, also known as True Positives; 28 Negatives predicted as Negative, or True Negatives).

From the confusion matrix, the Precision, which is the proportion of the positives predicted by the model that are truly positive, can be determined.
\[precision = \frac{TP}{TP + FP}\]

Another useful metric is the proportion of actual positive instances that are correctly predicted by the model. This is known as the Recall (also called the True Positive Rate).
\[recall = \frac{TP}{TP + FN}\]

The precision and recall can be computed using the following code:
from sklearn.metrics import precision_score, recall_score 
print('Precision Score = {:0.3f}'.format(precision_score(y_test, pred))) 
print('Recall Score = {:0.3f}'.format(recall_score(y_test, pred)))

The result:

Precision Score = 1.000 (15/(15 + 0))
Recall Score = 0.882 (15/(15 + 2))

If we were to calculate the accuracy, the proportion of instances predicted correctly, it would be about 96% ((15 + 28)/45). A high accuracy can still hide poor performance on the rarer class when the classes are imbalanced, which is why precision and recall, rather than accuracy alone, are the preferred metrics for classification tasks.
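
For completeness, the accuracy can be computed in the same way as the other scores:

from sklearn.metrics import accuracy_score
print('Accuracy = {:0.3f}'.format(accuracy_score(y_test, pred)))  # (15 + 28) / 45 = 0.956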

Here's a graphical representation of the steps that were carried out.

This marks the end of the Logistic Regression example. Logistic regression is one of the important learning algorithms. It is also frequently used as a building block for understanding Artificial Neural Networks, or Neural Nets for short, which I hope to touch on soon.

I hope this has been useful in providing a bigger picture of the Logistic Regression learning algorithm. I have deliberately left out the mathematics behind the learning algorithm because there are many resources that can do a better job than I can. By the way, the reference quoted below is awesome!


~ZF



References:
[1] Hands-On Machine Learning with Scikit-Learn and TensorFlow, Aurélien Géron, O'Reilly

01 January 2018

[Data Science] Machine Learning - Part 2, Essentials of Machine Learning

In my previous post, I briefly described what machine learning is. I shall attempt to go into more (but just enough) detail in this post.

As mentioned, machine learning is about deploying algorithms on a computer to apply statistical methods in data analytics. An algorithm is a set of instructions to perform a specific task, so a machine learning algorithm is a set of instructions for the computer/programme to learn patterns from a given set of data.

Types of Machine Learning Algorithms

There are many machine learning algorithms. The common ones are:

  1. Linear Regression
  2. Logistic Regression
  3. Nearest Neighbours
  4. Support Vector Machines
  5. Decision Trees (and Random Forests)
  6. Neural Networks
  7. Clustering
  8. Dimensionality Reduction (e.g. Principal Component Analysis, or PCA)

Supervised and Unsupervised Learning Algorithms

Machine learning algorithms are categorized as supervised or unsupervised learning: basically, the former (supervised learning) works with data that has a known outcome, or labels, and the latter (unsupervised learning) works with data without labels.

The outcome of supervised learning is to find a general pattern of the data that validates the labels. Items 1-6 in the list above are supervised learning algorithms.

The outcome of unsupervised learning is to find and group general characteristics within the data. Items 7-8 are unsupervised learning algorithms.


Data Types and Corresponding Objectives

Data can be quantitative (numbers, or measurables on a standard scale) or qualitative (descriptions, or non-measurables because they are not on a standard scale). Qualitative data is also known as categorical data.

Quantitative data is generally used for projection. For example, what would next year's sales of a shopping mall be, given the number of visitors to the mall this year?

Qualitative data is generally used for classification, which can be broken down into two tasks: object verification (confirming that a given cat is indeed a cat) and object identification (determining whether an object is a cat).

In practice, the two types of data co-exist; it is unlikely to have a dataset purely of either form. Extra care has to be taken when handling the data, because categorical data is usually assigned a number to represent a group or level, and this can give algorithms a false sense of scale if adequate consideration is not taken.
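
One common safeguard is one-hot encoding, which gives each category its own 0/1 column instead of an arbitrary integer, so no ordering or scale is implied. A minimal sketch with pandas (the column and categories here are made up):

import pandas as pd

df = pd.DataFrame({'colour': ['red', 'green', 'blue', 'green']})
print(pd.get_dummies(df, columns=['colour']))  # one 0/1 column per colour, no implied ordering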


Hypothesis Function

Each machine learning algorithm makes an assumption about the data, perhaps with the exception of neural networks. These assumptions are normally described as a function, called the hypothesis function. For example, linear regression assumes a linear relationship between the data and the label that we are interested in.

For example, the hypothesis function for linear regression is:
\[\hat{Y} = mX + c\]

By convention, $\hat{Y}$ is the approximation (to the real label, $Y$). It is related to the input data $X$ by $m$ and $c$, the parameters of the model; that is, a different set of $m$ and $c$ yields a different value of $\hat{Y}$.

Cost Function

The desired $\hat{Y}$ is the one that is closest to $Y$. This means that the deviation $Y - \hat{Y}$ should be at its minimum. Any deviation of $\hat{Y}$ from $Y$ is akin to a cost, hence a function of $Y - \hat{Y}$ serves as the cost function of the algorithm. Commonly, the mean squared error over the $n$ data points is used:
\[\min_{m,\,c} \; \frac{1}{n}\sum_{i=1}^{n} (Y_i-\hat{Y}_i)^{2}\]


Optimization

This is done by finding the optimal set of $m$ and $c$ that minimizes the cost function. We could also take the derivatives of the cost function with respect to $m$ and $c$ and set them equal to 0 to find the minimum analytically. However, this becomes unwieldy when the dimensionality of $X$ increases, meaning there are $X_1, X_2, X_3, ..., X_n$ to be considered, as in the case of Multivariate Regression. Fortunately, there are optimization algorithms to help us find the optimal set of $m$ and $c$, the most common being Gradient Descent.
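
As an illustration, here is a bare-bones sketch of gradient descent for the single-variable case, with x and y as 1-D NumPy arrays; the learning rate and iteration count are arbitrary choices:

import numpy as np

def gradient_descent(x, y, lr=0.01, n_iter=5000):
    # Fit y ~ m*x + c by repeatedly stepping down the gradient of the mean squared error
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        y_hat = m * x + c
        dm = (-2 / n) * np.sum(x * (y - y_hat))   # derivative of MSE with respect to m
        dc = (-2 / n) * np.sum(y - y_hat)         # derivative of MSE with respect to c
        m -= lr * dm
        c -= lr * dc
    return m, c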


An Example

For the set of data given below (Figure 1), we can fit a line (in red) that generalizes the pattern (Figure 2).
Figure 1

Figure 2

We can calculate the gradient $m$ and intercept $c$ by hand in this case to find the equation of the line that generalizes the pattern of the data. Or we can use Python and Scikit-Learn to do so. Here's the code:

from sklearn.linear_model import LinearRegression

# x: feature array of shape (n_samples, 1); y: corresponding target values (the data plotted in Figure 1)
lm = LinearRegression()
lm.fit(x, y)
print('The coefficient, m = {}'.format(lm.coef_))
print('The intercept, c = {}'.format(lm.intercept_))

This has the following outputs: $m = 2.03364$ and $c = 2.30736$

Graphically,
Data and the fitted Linear Regression model.


As I have shown, I first deployed the statistical approach, fitting a straight line to the data, and then the computer science approach, running code to perform Linear Regression. I could have computed the gradient ($m$) and intercept ($c$) manually, using derivatives and all, but coding negates the hassle: all in a few lines of code.

I hope this gives some idea about machine learning and also the merits of learning to code. Data analytics gets complicated when the data becomes massive, with many dimensions and examples.

Till then~

~ZF