Custom Hidden Markov Model Infused Stacked Recurrent Neural Network to forecast Regime-dependent ES/CVaR (for Credit Spreads)

10 min readJun 8, 2024

CVaR and Credit Spread and associated Risks

Conditional Value at Risk (CVaR), also known as Expected Shortfall, is a risk measure that quantifies the expected loss in the worst-case scenarios beyond a given confidence level. Unlike Value at Risk (VaR), which only considers the threshold loss, CVaR accounts for the extent of losses beyond the VaR threshold. It is calculated as the expected value of losses exceeding the VaR threshold, providing a more conservative and coherent measure of risk.

Here is the rolling 1-Year VaR and CVar (using historic method) of a sharpe-optimized portfolio of the following stocks

stocks = ['TGT','BHP','AMD','F']

for the following date range:

start_date = '2021-01-01'
end_date = '2024-01-01'

and heres the CVaR for the same:

I also computed the CVaR at 99% using all the three methods:

A credit spread is the difference in yield between a corporate bond and a risk-free benchmark, such as a government bond. It represents the additional compensation investors demand for taking on the credit risk associated with the corporate bond. Higher credit spreads indicate higher perceived credit risk by the market, as investors require a higher yield to compensate for the increased risk of default or credit deterioration. Conversely, lower credit spreads suggest lower perceived credit risk, reflecting the market’s belief in the creditworthiness of the issuer.

Here is the Credit Spread(Ratio) of Moody’s 10-year AAA yield rates/US Treasure yield from 2004 to 2024:

The CVaR of credit spread refers to the expected loss in the worst-case scenarios for the credit spread of a bond or a portfolio of bonds. More specifically, it estimates the potential increase in credit spread (or the increase in the yield required by investors to compensate for credit risk) under extreme market conditions. This measure is particularly important for risk management and portfolio optimization in the context of credit investments.

Calculating the CVaR of credit spread is crucial for several reasons. It helps financial institutions and portfolio managers quantify and manage their credit risk exposures more effectively. By incorporating the CVaR of credit spread into risk management frameworks, they can better understand and mitigate the potential losses arising from extreme credit events.

Secondly, the CVaR of credit spread can be integrated into portfolio optimization models, allowing investors to construct portfolios that are more resilient to severe credit risk events. By minimizing the CVaR of the portfolio, investors can potentially improve their risk-adjusted returns and better navigate periods of heightened credit market volatility.

Here’s the corresponding 24-day Rolling CVaR at 99%:

In the context of predicting credit spreads, stationarity is a crucial assumption for many traditional time series models. However, financial data often exhibits non-stationarities, such as structural breaks, regime shifts, and volatility clustering. Failing to account for these non-stationarities can lead to poor forecasting performance and unreliable risk estimates.

To address these challenges, I developed a custom Hidden Markov Model (HMM) infused Recurrent Neural Network (RNN) architecture to predict the CVaR of credit spreads for US bonds. This approach combines the ability of HMMs to capture regime shifts with the powerful sequence modeling capabilities of RNNs. See later sections for implementation details.

Regimes and Their Importance

Financial markets often exhibit distinct regimes or states, characterized by different underlying dynamics and statistical properties. These regimes can be driven by various factors, such as economic conditions, market sentiment, or policy changes. Failing to account for these regime shifts can lead to suboptimal forecasting performance and inaccurate risk estimates.

HMMs provide a natural way to model regime shifts by assuming that the observed data is generated by an underlying unobservable Markov process, where the system transitions between different states or regimes over time. By incorporating HMMs into the architecture, my model can capture these regime shifts and adjust its parameters accordingly, leading to improved forecasting accuracy and more reliable risk estimates.

Using the hmmlearn library I trained the model on my data using 4 Regimes

from hmmlearn import hmm
hidden_regimes = 4
model = hmm.GaussianHMM(n_components=hidden_regimes, covariance_type="diag", n_iter=5000)
model.fit(train_spreads)

The corresponding Regimes projected to CVaR

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a type of neural network architecture designed to handle sequential data, such as time series, text, or speech. Unlike traditional feed forward neural networks, RNNs have a recurring connection that allows them to maintain an internal state, enabling them to capture dependencies and patterns in the data over time.

The key components of an RNN are:

Input Layer: Accepts the input data at each time step.
Hidden Layer(s): Maintains the internal state or memory of the network, allowing it to capture long-term dependencies.
(Stacking)
Output Layer: Produces the output at each time step, based on the current input and the hidden state.

RNNs are particularly suitable for time series forecasting tasks, as they can effectively capture the temporal dependencies and patterns present in the data.

For this particular study I devised the following architecture:

his architecture is designed to capture regime shifts and non-stationarities in the input data, which can be particularly useful for modeling financial time series data.

The architecture consists of the following components:

Input Layer: The input layer takes two inputs: x_t and sigma_t-1. x_t represents the current input data point, while sigma_t-1 is the regime probability from the previous time step, obtained from the HMM component.
Input Weights and Biases: The input features are multiplied by the input weights, and the input bias is added to produce the weighted input.
Hidden Layer: The weighted input is then combined with the previous hidden state (from the recurrent connections) and the hidden bias to produce the current hidden state. This hidden state captures the temporal dependencies and patterns in the input data.
Regime-Specific Layer: This layer consists of multiple sub-layers, each corresponding to a specific regime or state. Each sub-layer takes the intermediate input (a combination of the weighted input and the current hidden state) and applies a set of regime-specific weights and biases. This allows the model to learn different dynamics and patterns for each regime.
Selection Layer: The outputs from the regime-specific sub-layers are combined with the transition probabilities obtained from the HMM component. This layer selects the appropriate regime-specific output based on the current transition probabilities.
Output Layer: The selected regime-specific output is then passed through additional layers (potentially with non-linear activations) to produce the final output.
HMM Model: The HMM component is trained separately on historical data to capture the regime dynamics and transitions. It provides the regime probabilities (sigma_t) and transition probabilities at each time step, which are fed into the RNN architecture.

Sure, here are the mathematical expressions for the key components of the custom stacked RNN architecture with the HMM component:

Input Layer:

Input: x_t, sigma_t-1
Weighted Input: x_i_t = x_t @ W_i + b_i

2. Hidden Layer:

Hidden State: h_t = tanh(x_i_t + h_t-1 @ W_h + b_h)

3. Regime-Specific Layer:

Intermediate Input: int_input_r = [x_i_t, h_t]
Regime-Specific Output: out_r_t = tanh(int_input_r @ W_r + b_r)

3. Selection Layer:

Selection Input: sel_input_t = [out_r1_t, out_r2_t, ..., out_rN_t, sigma_t]
Selection Output: sel_out_t = tanh(sel_input_t @ W_sel + b_sel)

4. Output Layer:

Output: y_t = sel_out_t @ W_o + b_o

5. HMM Component:

Regime Probabilities: sigma_t = p(z_t | x_1:t)
Transition Probabilities: p(z_t | z_t-1)

Where:

x_t is the input data at time t
sigma_t is the regime probability vector at time t
h_t is the hidden state vector at time t
W_i, b_i are the input weights and bias
W_h, b_h are the hidden weights and bias
W_r, b_r are the regime-specific weights and biases
W_sel, b_sel are the selection layer weights and bias
W_o, b_o are the output weights and bias
z_t is the hidden regime state at time t
p(z_t | x_1:t) is the posterior probability of the regime given the observed data
p(z_t | z_t-1) is the transition probability between regimes

The HMM component estimates the regime probabilities (sigma_t) and transition probabilities (p(z_t | z_t-1)) based on the observed data and the learned HMM parameters (initial state probabilities, transition probabilities, and emission probabilities).

The regime-specific layer computes outputs for each regime (out_r_t), which are then combined in the selection layer with the regime probabilities (sigma_t) to produce the final selection output (sel_out_t). This selection output is then passed through the output layer to obtain the final prediction (y_t).

The key advantage of this architecture is its ability to capture both regime shifts (through the HMM component) and long-term dependencies (through the RNN component), making it suitable for modeling non-stationary time series data with complex dynamics.

The typical forward pass would be :

        x = inp[i]
        reg = 0
        if i <= 0:
            prev_prev = 1
        else:
            prev_prev = inp[i-1]

        ret=0

        if prev_ans is None:
            reg = 0
        else:
            if prev_prev == 0:
              ret = 10
            else:
              ret = np.log(prev_ans/prev_prev)
              if math.isnan(ret[0]):
                reg = 0
              else:
                reg = model.predict(ret.reshape(-1,1)).flatten()

        x  = np.append(x,reg)
        x_i = (x @ i_weight)
        #print(x_i)
        if prev_hidden is None:
            x_h = x_i
        else:
            x_h = ( x_i + prev_hidden @ h_weight + h_bias).reshape(2)

        x_h = tanh(np.array(x_h))

        prev_hidden = x_h
        #print(x_h)

        intermediate_input_1 = [*x_i,*x_h]
        intermediate_input_2 = [*x_i,*x_h]
        intermediate_input_3 = [*x_i,*x_h]
        intermediate_input_4 = [*x_i,*x_h]

    # print(intermediate_input_1)

        intermediate_output_2 = (intermediate_input_2 @ r2_weight + r2_bias).reshape(1)
        intermediate_output_1 = (intermediate_input_1 @ r1_weight + r1_bias).reshape(1)
        intermediate_output_3 = (intermediate_input_3 @ r3_weight + r3_bias).reshape(1)
        intermediate_output_4 = (intermediate_input_4 @ r4_weight + r4_bias).reshape(1)
        #print(intermediate_output_4)

        activated_int_output_1 = tanh(intermediate_output_1)
        activated_int_output_2 = tanh(intermediate_output_2)
        activated_int_output_3 = tanh(intermediate_output_3)
        activated_int_output_4 = tanh(intermediate_output_4)

        if prev_ans is None:
            trans_prob = np.random.rand(1,4)
            trans_prob = trans_prob/np.sum(trans_prob)

        else:
            if math.isnan(prev_ans[0]):
              trans_prob = np.random.rand(1,4)
              trans_prob = trans_prob/np.sum(trans_prob)
            else:
              trans_prob = model.predict_proba(prev_ans[0].reshape(-1,1))

        trans_prob = np.array(trans_prob,dtype="object")
        selection_input = [*activated_int_output_1,
                    *activated_int_output_2,
                    *activated_int_output_3,
                    *activated_int_output_4,
                    *trans_prob[0]
                    ]

        selection_input = np.array(selection_input)

        selection_output = (selection_input @ s_weight + s_bias).reshape(1)
        activated_sel_output = tanh(selection_output)

        final_output = (activated_sel_output @ o_weight + o_bias).reshape(1)

        prev_ans = final_output


        prev_hidden = x_h

        hiddens[i,] = x_h


        outputs[i] = final_output[0]

Backpropogation for learning

Backpropagation Equations

Loss Gradient:

$∇Lt=∂L∂ot\nabla L_t = \frac{\partial L}{\partial o_t}∇Lt=∂ot∂L$

Output Layer Gradients:

dWoutput=stT∇LtdW_{\text{output}} = s_t^T \nabla L_tdWoutput=stT∇Lt dboutput=∇Ltdb_{\text{output}} = \nabla L_tdboutput=∇Lt

Selection Layer Gradients:

$∇st=∇LtWoutputT⊙tanh′(st)\nabla s_t = \nabla L_t W_{\text{output}}^T \odot \text{tanh}’(s_t)∇st=∇LtWoutputT⊙tanh′(st) dWselect=[rt1,rt2,rt3,rt4]T∇stdW_{\text{select}} = [r_t¹, r_t², r_t³, r_t⁴]^T \nabla s_tdWselect=[rt1,rt2,rt3,rt4]T∇st dbselect=∇stdb_{\text{select}} = \nabla s_tdbselect=∇st$

Regime-Specific Layer Gradients:

$∇rti=∇stWselectT⊙tanh′(rti)\nabla r_t^i = \nabla s_t W_{\text{select}}^T \odot \text{tanh}’(r_t^i)∇rti=∇stWselectT⊙tanh′(rti) dWregimei=htT∇rtidW_{\text{regime}}^i = h_t^T \nabla r_t^idWregimei=htT∇rti dbregimei=∇rtidb_{\text{regime}}^i = \nabla r_t^idbregimei=∇rti$

Hidden Layer Gradients:

$∇ht=∑i=14(∇rtiWregimei)⊙tanh′(ht)\nabla h_t = \sum_{i=1}^{4} (\nabla r_t^i W_{\text{regime}}^i) \odot \text{tanh}’(h_t)∇ht=i=1∑4(∇rtiWregimei)⊙tanh′(ht) dWhidden=σtT∇htdW_{\text{hidden}} = \sigma_t^T \nabla h_tdWhidden=σtT∇ht dbhidden=∇htdb_{\text{hidden}} = \nabla h_tdbhidden=∇ht$

Input Layer Gradients:

∇σt=∇htWhiddenT\nabla \sigma_t = \nabla h_t W_{\text{hidden}}^T∇σt=∇htWhiddenT dWinput=xtT∇σtdW_{\text{input}} = x_t^T \nabla \sigma_tdWinput=xtT∇σt dbinput=∇σtdb_{\text{input}} = \nabla \sigma_tdbinput=∇σt

The complete backward pass:

def backward(inp,  targets, learning_rate, wts):
    i_weight, h_weight, h_bias, r1_weight, r1_bias, r2_weight, r2_bias, r3_weight, r3_bias, r4_weight, r4_bias, s_weight, s_bias, o_weight, o_bias = wts
    d_i_weight = np.zeros_like(i_weight)
    d_h_weight = np.zeros_like(h_weight)
    d_h_bias = np.zeros_like(h_bias)
    d_r1_weight = np.zeros_like(r1_weight)
    d_r1_bias = np.zeros_like(r1_bias)
    d_r2_weight = np.zeros_like(r2_weight)
    d_r2_bias = np.zeros_like(r2_bias)
    d_r3_weight = np.zeros_like(r3_weight)
    d_r3_bias = np.zeros_like(r3_bias)
    d_r4_weight = np.zeros_like(r4_weight)
    d_r4_bias = np.zeros_like(r4_bias)
    d_s_weight = np.zeros_like(s_weight)
    d_s_bias = np.zeros_like(s_bias)
    d_o_weight = np.zeros_like(o_weight)
    d_o_bias = np.zeros_like(o_bias)



    outputs, activated_sel_output,selection_output,selection_input,activated_int_output_1,activated_int_output_2,activated_int_output_3,activated_int_output_4,hiddens,x_i = forward(inp)

    outputs = outputs.reshape(targets.shape)
    loss_grad = grad_mse(targets,outputs)
    print(loss_grad)
    next_hidden = None

    d_final_output = np.zeros_like(outputs)
    d_hiddens = np.zeros_like(hiddens)

    for t in reversed(range(len(inp))):

        l_grad = loss_grad[t].reshape(1,1)

        d_o_weight += activated_sel_output[:,np.newaxis] @ l_grad
        d_o_bias += np.mean(l_grad)

        o_grad = l_grad @ o_weight.T

        ac_s_grad = np.multiply(o_grad,tanh_derivative(selection_output))

        d_s_weight += selection_input[:,np.newaxis] @ ac_s_grad
        d_s_bias += np.mean(ac_s_grad)

        s_grad = o_grad @ s_weight.T

        d_r1_weight += activated_int_output_1[:,np.newaxis] @ s_grad[0][:1]
        d_r1_bias += np.mean(s_grad[0][:1])

        d_r2_weight += activated_int_output_2[:,np.newaxis] @ s_grad[0][1:2]
        d_r2_bias += np.mean(s_grad[0][1:2])

        d_r3_weight += activated_int_output_3[:,np.newaxis] @ s_grad[0][2:3]
        d_r3_bias += np.mean(s_grad[0][2:3])

        d_r4_weight += activated_int_output_4[:,np.newaxis] @ s_grad[0][3:4]
        d_r4_bias += np.mean(s_grad[0][3:4])

        r1_grad = s_grad[0][0:1] @ r1_weight.T
        r2_grad = s_grad[0][1:2] @ r2_weight.T
        r3_grad = s_grad[0][2:3] @ r3_weight.T
        r4_grad = s_grad[0][3:4] @ r4_weight.T

        r_grad = r1_grad+r2_grad+r3_grad+r4_grad

        if next_hidden is None:
            h_grad = r_grad[2:4]
        else:
            # print(f"r_grad: {t} : {r_grad[2:4]}")
            # print(f"next_hidden: {t} :{next_hidden}")
            # print(f"h_weight : {t} : {h_weight.T}")
            h_grad = r_grad[2:4] + next_hidden @ h_weight.T

        h_grad = np.multiply(h_grad,tanh_derivative(hiddens[t,:][np.newaxis,:]))
        next_hidden = h_grad

        if t>0:
            d_h_weight += hiddens[t-1,:][:,np.newaxis] @ h_grad
            d_h_bias += np.mean(h_grad)

        d_i_weight += x_i.T @ r_grad[0:2]


    i_weight -= learning_rate * d_i_weight
    h_weight -= learning_rate * d_h_weight
    h_bias -= learning_rate * d_h_bias
    r1_weight -= learning_rate * d_r1_weight
    r1_bias -= learning_rate * d_r1_bias
    r2_weight -= learning_rate * d_r2_weight
    r2_bias -= learning_rate * d_r2_bias
    r3_weight -= learning_rate * d_r3_weight
    r3_bias -= learning_rate * d_r3_bias
    r4_weight -= learning_rate * d_r4_weight
    r4_bias -= learning_rate * d_r4_bias
    s_weight -= learning_rate * d_s_weight
    s_bias -= learning_rate * d_s_bias
    o_weight -= learning_rate * d_o_weight
    o_bias -= learning_rate * d_o_bias

return i_weight, h_weight, h_bias, r1_weight, r1_bias, r2_weight, r2_bias, r3_weight, r3_bias, r4_weight, r4_bias, s_weight, s_bias, o_weight, o_bias

Significance of Volatility Forecasts Using This Approach

Accurate volatility forecasts are crucial for various applications in finance, including risk management, portfolio optimization, and derivative pricing. By incorporating regimes and leveraging the power of RNNs, my custom HMM-infused RNN architecture can provide more reliable volatility forecasts for credit spreads, potentially leading to better risk management practices and more informed investment decisions.

In the context of credit risk management, accurate volatility forecasts can improve the estimation of CVaR and other risk measures, enabling financial institutions to better quantify and manage their credit risk exposures. This can lead to more efficient capital allocation, improved risk-adjusted returns, and better compliance with regulatory requirements.

Furthermore, reliable volatility forecasts can aid in the pricing and hedging of credit derivatives, such as credit default swaps (CDS) and collateralized debt obligations (CDOs). These derivatives are heavily dependent on the volatility of credit spreads, and accurate volatility estimates can lead to more precise pricing and effective hedging strategies.

Overall, by combining the strengths of HMMs and RNNs, my custom implementation offers a powerful approach for modeling and forecasting the volatility of credit spreads, with potential applications in various areas of finance and risk management.