amits-ds/scratch_detection_assignment

 
 

Scratch Detection Assignment

Scratch Detection Assignment for a student position in the data science team at NI

Before you start working on this assignment, make sure you meet the prerequisites below: Machine Learning Engineer Student Job Description

Main mandatory requirements:

  • Studying for M.Sc/B.Sc in Computer Science or a related technical discipline (M.Sc is a big advantage)
  • Remaining studies of at least one and a half years.
  • Availability for 3-4 working days a week.
  • Located in Israel

Introduction:

In the semiconductor industry, "wafers" are thin discs of semiconductor material, such as silicon, used to fabricate microelectronic devices such as transistors, integrated circuits, and other components. A single wafer can contain hundreds or thousands of individual devices, known as "dies", which are typically cut or "diced" from the wafer after the manufacturing process is completed.

You can read more about semiconductors here: Introduction to Semiconductors

Fig.1 - An example of a standard wafer

One of the challenges in manufacturing wafers is to identify and isolate defects, including scratches, which can affect the performance and reliability of the resulting devices.

Scratches appear as elongated clusters of bad dies with a high aspect ratio, meaning they are much longer than they are wide. They can be caused by equipment misalignment or human mishandling, and may contain latent defects that affect the performance of the devices. Scratches are not always continuous, so there may be good dies within a scratch. These good dies are often marked for removal in a manual process called "inking".
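To make "high aspect ratio" concrete, here is a minimal sketch of one possible way to measure how elongated a cluster of die coordinates is. It is not part of the assignment and not a prescribed method: the function name `elongation` and the use of the coordinate covariance are assumptions made for illustration.

```python
import math

def elongation(points):
    """Return sqrt(largest / smallest eigenvalue) of the 2x2 covariance of the coordinates.

    Values near 1.0 mean a compact blob; large values mean a long, thin cluster.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n

    # Closed-form eigenvalues of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest, smallest = half_trace + spread, half_trace - spread

    if largest <= 1e-12:
        return 1.0       # a single point has no extent in any direction
    if smallest <= 1e-12:
        return math.inf  # all points lie on one straight line
    return math.sqrt(largest / smallest)

# Hypothetical clusters of (DieX, DieY) positions.
diagonal_line = [(0, 0), (1, 1), (2, 2), (3, 3)]
compact_blob = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(elongation(diagonal_line), elongation(compact_blob))
```

Unlike a bounding-box ratio, this measure does not depend on the direction of the scratch, so a diagonal scratch still scores as elongated.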

Fig.2 - A scratch on a wafer - an optical view

In the data that you receive, there may be faulty dies that are part of a scratch, which are labeled as "Scratch" as well as a few good dies that are part of a scratch, which are labeled as "Ink."

Scratch detection is often performed on the logical wafer map rather than on a visual image of the wafer.

The data that you received is called a "wafer map" because it maps the status of all dies in the wafer.

The dies on a wafer are tested at a large number of stations and operations. For each operation, a map of the dies can be created by coloring the good dies in one color and the faulty dies in another.

Fig.3 - A logical wafer map in a certain operation. Good dies in green and bad dies in red

Did you notice a scratch on this wafer?

Well, with our eyes it is easy to notice the scratch that comes out from the right side in the center of the wafer.

Note that this scratch is not continuous: not all the dies that lie on the scratch are considered faulty in this operation. We have to identify all scratched dies, both bad and good. The good dies that are part of the scratch have to be actively identified so that they can be killed. This process is called "inking".

We kill them because we fear that a physical scratch on the silicon wafer caused this sequence of faulty dies. Therefore, even dies that passed the tests may be of low quality, because they were damaged by the scratch they lie on.

Fig.4 - A wafer map in a certain operation with scratch detection marks. Good dies in green, bad dies in red, scratch in blue, ink in yellow

You can read more about the causes of die failures here: Why Chips Die

Assignment description

In this assignment you receive wafer maps from a certain operation, and the goal is to predict whether a given die belongs to a scratch or not.

The data includes information about individual dies from a number of wafers.

The table data includes the following columns:

  • WaferName: The name of the wafer from which the die came.
  • DieX: The horizontal position of the die on the wafer.
  • DieY: The vertical position of the die on the wafer.
  • IsGoodDie: A binary column indicating whether the die is good or not.
  • IsScratchDie: A binary column indicating whether the die belongs to a scratch or not.
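As an illustration of how these columns fit together, the sketch below groups rows by wafer and prints a small text version of a wafer map. The sample rows, the helper names `group_by_wafer` and `render`, and the characters used for drawing are all made up for this example; the real data has many more dies per wafer.

```python
from collections import defaultdict

# Hypothetical sample rows using the column names listed above.
rows = [
    {"WaferName": "w1", "DieX": 0, "DieY": 0, "IsGoodDie": 1, "IsScratchDie": 0},
    {"WaferName": "w1", "DieX": 1, "DieY": 0, "IsGoodDie": 0, "IsScratchDie": 1},
    {"WaferName": "w1", "DieX": 0, "DieY": 1, "IsGoodDie": 1, "IsScratchDie": 1},
    {"WaferName": "w2", "DieX": 0, "DieY": 0, "IsGoodDie": 0, "IsScratchDie": 0},
]

def group_by_wafer(rows):
    """Map each wafer name to a {(DieX, DieY): IsGoodDie} dictionary."""
    wafers = defaultdict(dict)
    for row in rows:
        wafers[row["WaferName"]][(row["DieX"], row["DieY"])] = row["IsGoodDie"]
    return dict(wafers)

def render(wafer):
    """Render one wafer as text: '.' good die, 'X' bad die, ' ' no die at that position."""
    xs = [x for x, _ in wafer]
    ys = [y for _, y in wafer]
    lines = []
    for y in range(min(ys), max(ys) + 1):
        line = ""
        for x in range(min(xs), max(xs) + 1):
            good = wafer.get((x, y))
            line += " " if good is None else ("." if good else "X")
        lines.append(line)
    return "\n".join(lines)

wafers = group_by_wafer(rows)
print(render(wafers["w1"]))
```

Working per wafer matters because DieX and DieY are only meaningful relative to other dies on the same wafer.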

Your goal is to use the training data to build a model that can predict, given a certain wafer map, which dies on the map are part of a scratch, whether they are bad ('Scratch') or good ('Ink').
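The relationship between the two binary columns and the 'Scratch' and 'Ink' labels can be sketched as follows. The function name `die_label` and the names "Good" and "Bad" for dies outside a scratch are assumptions for illustration; the data itself only contains the two binary columns.

```python
def die_label(is_good_die, is_scratch_die):
    """Combine IsGoodDie and IsScratchDie into a single descriptive label."""
    if is_scratch_die:
        # A die on a scratch is 'Ink' if it passed the tests, 'Scratch' if it failed.
        return "Ink" if is_good_die else "Scratch"
    # "Good" and "Bad" are hypothetical names for dies that are not on a scratch.
    return "Good" if is_good_die else "Bad"

for good, scratch in [(1, 0), (0, 0), (0, 1), (1, 1)]:
    print(good, scratch, die_label(good, scratch))
```

Splitting the positive class this way can be useful during data exploration, since 'Ink' dies look good in isolation and can only be found from their neighbors.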

The purpose of the assignment is mainly to reach a reasonable solution that can help the business. Please note that real industry solutions usually achieve lower scores than you may be used to from academic problems, so even a low metric score on the test set may be considered a success.

Business goals:

  • Automation. This process is currently a manual, expensive, and time-consuming procedure that is prone to errors by the tagger. The goal is to perform it faster and reduce the cost of the test.
  • Quality. Increase the quality of the dies while balancing quality and yield (on the one hand, do not miss scratches; on the other hand, do not apply too much "Ink").
  • Prediction Level. As explained above, the main goal is to detect individual dies, but sometimes it also helps to get a classification at the wafer level (binary classification: is there a scratch on this wafer or not?), because some manufacturers return scratched wafers to the factory.
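One simple way to obtain a wafer-level answer from die-level predictions is to count the predicted scratch dies per wafer. This is only a sketch: the function name `wafer_has_scratch` and the threshold `min_scratch_dies=3` are assumptions chosen for illustration, not values given by the assignment.

```python
from collections import Counter

def wafer_has_scratch(die_predictions, min_scratch_dies=3):
    """Aggregate die-level predictions into one boolean per wafer.

    die_predictions: iterable of (wafer_name, predicted_is_scratch) pairs.
    min_scratch_dies: hypothetical threshold; tune it on the training data.
    """
    scratch_counts = Counter()
    for wafer_name, is_scratch in die_predictions:
        # Adding 0 still registers the wafer, so clean wafers appear as False.
        scratch_counts[wafer_name] += int(is_scratch)
    return {name: count >= min_scratch_dies for name, count in scratch_counts.items()}

# Hypothetical die-level predictions for two wafers.
predictions = [("w1", 1), ("w1", 1), ("w1", 1), ("w1", 0), ("w2", 0), ("w2", 1)]
print(wafer_has_scratch(predictions))
```

A threshold above one die makes the wafer-level decision less sensitive to isolated false positives.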

Note. In wafers with a low yield (that is, with many faulty dies), we do not perform scratch detection, because the customer is concerned about finding randomly generated scratches there and applying unnecessary ink. In such cases, the customer will check all the dies strictly anyway, independently of scratch detection. Therefore, in these cases we do not consider a sequence of bad dies to be a scratch.
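The low-yield rule can be sketched as a filter that runs before any scratch detection. The assignment does not state a yield threshold, so `min_yield=0.7` below is purely a placeholder, and both function names are hypothetical.

```python
def wafer_yield(is_good_flags):
    """Fraction of good dies on a wafer, given its IsGoodDie values."""
    flags = list(is_good_flags)
    if not flags:
        raise ValueError("wafer has no dies")
    return sum(flags) / len(flags)

def should_run_scratch_detection(is_good_flags, min_yield=0.7):
    """Skip scratch detection on low-yield wafers (min_yield is an assumed value)."""
    return wafer_yield(is_good_flags) >= min_yield

# Hypothetical IsGoodDie values for two small wafers.
healthy_wafer = [1, 1, 1, 0]
low_yield_wafer = [1, 0, 0, 0]
print(should_run_scratch_detection(healthy_wafer))
print(should_run_scratch_detection(low_yield_wafer))
```

A reasonable threshold could be estimated from the training data, for example by checking the yield of wafers that contain no dies labeled as scratch despite having many bad dies.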

You are free to use any machine learning technique you find appropriate for solving this problem. Make sure to choose relevant metrics to test your solution's performance.
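Since scratch dies are typically a small minority of all dies, plain accuracy can look high even for a model that never predicts a scratch. As one possible starting point (not a required choice), the sketch below computes precision, recall, and F1 for the scratch class; the function name and the sample labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (scratch) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical true and predicted IsScratchDie values.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In terms of the business goals, recall relates to not missing scratches and precision relates to not applying too much ink, so reporting both separately is more informative than a single number.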

In addition to the training data, you are given a test set, which includes the x and y coordinates and the good/not status of each die, but does not include the scratch/not scratch labels.

You are asked to use your model to predict the scratch/not scratch status of the dies in the test set, and to save the predictions in a CSV file. You should submit your notebook, including the experiments you ran along the way to improve the model, the various methods you tried, and your final model.
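Saving the predictions could look roughly like the sketch below. The exact name of the prediction column is defined in the notebook, so the `IsScratchDie` output column used here is an assumption, as are the function name `write_predictions` and the sample rows.

```python
import csv
import io

# Assumed output layout: the test-set columns plus a prediction column.
FIELDS = ["WaferName", "DieX", "DieY", "IsGoodDie", "IsScratchDie"]

def write_predictions(test_rows, predictions, out_file):
    """Write one CSV row per die, adding the predicted scratch flag."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row, pred in zip(test_rows, predictions):
        writer.writerow({**row, "IsScratchDie": int(pred)})

# Hypothetical test-set rows (no scratch label) and model predictions.
test_rows = [
    {"WaferName": "w9", "DieX": 0, "DieY": 0, "IsGoodDie": 1},
    {"WaferName": "w9", "DieX": 1, "DieY": 0, "IsGoodDie": 0},
]
buffer = io.StringIO()
write_predictions(test_rows, [0, 1], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`, where `path` is whatever file name the notebook asks for.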

Pay attention to the following points:

  • Exploration and analysis of the data
  • Consideration of business goals
  • Selection of relevant machine learning models
  • Appropriate choice of metrics

Submission

  1. After completing the assignment, please review your notebook and make sure it runs properly from start to finish
  2. Create the prediction column for the test set as described in the notebook and save the results to a CSV file
  3. Send an email to one of the following:
  4. After receiving the email with the assignment we will inform you about the next steps

Good Luck!
