GAM Fraud Analysis

Data Science Capstone Project

Slides

Slides: slides.html (our slides HTML link will go here)

Introduction

The introduction should:

Develop a storyline that captures attention and maintains interest.
Your audience is your peers
Clearly state the problem or question you’re addressing.

Explain why the problem is relevant.
Provide an overview of your approach.

Example of writing including citing references:

This is an introduction to ….. regression, a non-parametric estimator of the conditional expectation of one random variable given another. The goal of kernel regression is to discover the non-linear relationship between two random variables. Kernel estimation, also called kernel smoothing, is the main non-parametric method for estimating that curve. In a kernel estimator, the weight function is known as the kernel function [@efr2008]. Cite this paper [@bro2014principal]. The GEE [@wang2014]. The PCA [@daffertshofer2004pca]. Topology can be used in machine learning [@adams2021topology].

For symbolic regression, see [@wang2019symbolic]. This is my work and I want to add more work…

Cite new paper [@su2012linear].

Miller (2025) investigates generalized additive models (GAMs) for identifying fraudulent financial statements, in which fraud is often hidden in complex accounting data. GAMs, combined with models like random forests, detect irregular revenue patterns and generate interpretable visualizations for auditors. Although effective, GAMs may miss sophisticated frauds involving multiple interacting factors. Even so, they provide a strong balance of accuracy and clarity for early detection of financial fraud.

Methods

Detail the models or algorithms used.
Justify your choices based on the problem and data.

The common non-parametric regression model is \(Y_i = m(X_i) + \varepsilon_i\), where each response \(Y_i\) is the sum of the regression function value \(m(X_i)\) and an error term \(\varepsilon_i\). Here \(m\) is unknown and the \(\varepsilon_i\) are random errors. This definition motivates estimation by local averaging: \(m(x)\) can be estimated by averaging the \(Y_i\) whose \(X_i\) is near \(x\). In other words, we fit a curve through the data by using the data points surrounding each location. The estimation formula is given below [@R-base]:

\[ M_n(x) = \sum_{i=1}^{n} W_n (X_i) Y_i \tag{1} \]

Here \(W_n(X_i)\) is the weight given to observation \(i\) when estimating at \(x\). The weights are positive real numbers and are small when \(X_i\) is far from \(x\).
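The local-averaging idea in equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in this project. It assumes a Gaussian kernel and weights normalized to sum to one (the Nadaraya-Watson form), neither of which is specified above. The function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, assumed here as the kernel function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(x, xs, bandwidth):
    """Weights for estimating at x: positive, summing to 1, small when X_i is far from x."""
    raw = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(raw)
    return [r / total for r in raw]


def local_average(x, xs, ys, bandwidth=0.5):
    """Estimate m(x) as the weighted sum of the Y_i, following equation (1)."""
    weights = kernel_weights(x, xs, bandwidth)
    return sum(w * y for w, y in zip(weights, ys))


# Toy data made up for illustration: noiseless y = x^2 on an evenly spaced grid.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

estimate_at_zero = local_average(0.0, xs, ys, bandwidth=0.3)
estimate_at_one = local_average(1.0, xs, ys, bandwidth=0.3)
```

A smaller `bandwidth` concentrates the weights on the closest observations and follows the data more tightly, while a larger one averages over a wider neighborhood and gives a smoother but more biased curve.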

Another equation:

\[ y_i = \beta_0 + \beta_1 X_i +\varepsilon_i \]
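For comparison with the non-parametric estimator, the linear model above can be fitted by ordinary least squares. The sketch below assumes a single predictor with one value per observation and uses the closed-form estimates of \(\beta_0\) and \(\beta_1\). The function name and the toy data are hypothetical, chosen only to illustrate the calculation.

```python
def fit_simple_linear(xs, ys):
    """Closed-form least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sample covariance of x and y divided by the sample variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Toy data made up for illustration: points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_simple_linear(xs, ys)
```

Unlike the local-averaging estimator, this fit imposes a single straight line on all of the data, so it cannot capture a non-linear relationship.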

Analysis and Results

Data Exploration and Visualization

Describe your data sources and collection process.
Present initial findings and insights through visualizations.
Highlight unexpected patterns or anomalies.

A study was conducted to determine how…

Modeling and Results

Explain your data pre-processing and cleaning steps.
Present your key findings in a clear and concise manner.
Use visuals to support your claims.
Tell a story about what the data reveals.