Order Trace Algorithm

class modules.order_trace.src.alg.OrderTraceAlg(data, poly_degree=None, expected_traces=None, orders_ccd=-1, do_post=False, config=None, logger=None)[source]

Order trace extraction.

This module defines the class ‘OrderTraceAlg’ and methods to extract order traces from a 2D spectral FITS image. The extraction steps include

  • locate clusters: smooth the image and convert image pixels to be either black or white (‘1’ or ‘0’).

  • form clusters: find cluster units (each unit containing connected pixels with value ‘1’).

  • clean the clusters: remove noisy clusters, trim noise from the clusters, split the clusters and clean the clusters along the top and bottom borders.

  • merge clusters: merge broken clusters to form order trace based on the closeness and polynomial curve fitting.

  • model order trace: approximate each order trace by using least square polynomial fit.

  • find top and bottom widths: compute the top and bottom widths along the order trace by using normal distribution to model the distribution of the spectral data along the order trace and approximate the top and bottom widths based on the magnitude of standard deviation from the mean. If the width is unresolved by the use of normal distribution, it is either assigned by a default number or further estimated based on widths of the surrounding orders.

Parameters:
  • data (numpy.ndarray) – 2D spectral data.

  • poly_degree (int) – Order of polynomial for order trace fitting.

  • config (configparser.ConfigParser) – config context.

  • logger (logging.Logger) – Instance of logging.Logger.

instrument

Imaging instrument.

Type:

str

flat_data

Numpy array storing 2d image data.

Type:

numpy.ndarray

config_ins

Related to ‘PARAM’ section or section associated with the instrument if it is defined in the config file.

Type:

ConfigHandler

data_range

Range of data to be traced, [<y_start>, <y_end>, <x_start>, <x_end>].

Type:

list

original_size

Original size of the flat data, [<y_size>, <x_size>].

Type:

list

poly_degree

Order of polynomial for order trace fitting.

Type:

int

orders_ccd

Total number of orders on the CCD. Defaults to -1.

Type:

number, optional

do_post

Whether to run post-processing to refine the upper/lower edges. Defaults to False.

Type:

bool, optional

Raises:
  • TypeError – If there is type error for data or config.

  • Exception – If the size of data is less than 20 pixels by 20 pixels.

advanced_cluster_cleaning_handler(index: ndarray, x: ndarray, y: ndarray, start_cluster: int = None, stop_cluster: int = None)[source]

Remove or clean noisy clusters.

This removal process uses polynomial fitting on all or selected clusters formed by form_clusters().

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates on cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates on cluster pixels.

  • start_cluster (int, optional) – Cluster id of the first cluster to process. Defaults to None.

  • stop_cluster (int, optional) – Cluster id of the last cluster to process. Defaults to None.

Returns:

cleaning status on clusters:

  • index_p (numpy.ndarray): Array of cluster id on cluster pixels after cleaning.

  • all_status (dict): Cleaning status on processed clusters, like:

    {
       <cluster_id_i> int: <cleaning status> dict,
                # <cluster_id_i> is cluster id of i-th cluster.
                # <cleaning status> is cleaning status for the cluster
                # See Returns in handle_noisy_cluster()
       :
    }
    

Return type:

tuple

Raises:
  • TypeError – If there is type error for x, y or index.

  • Exception – If the size of x, y, or index are not the same.

approximate_width_of_default(cluster_widths: list, cluster_points: ndarray, cluster_coeffs: ndarray, poly_fit_power: int = 2)[source]

Approximate unresolved width using least square polynomial fit to determined widths.

Parameters:
  • cluster_widths (list) – Top and bottom widths of all clusters, like [{‘top_edge’: <number>, ‘bottom_edge’: <number>},…].

  • cluster_points (numpy.ndarray) – Array containing cluster points (y values) along the trace based on the polynomial fitting. Each row holds the y values along the x axis for one cluster.

  • cluster_coeffs (numpy.ndarray) – Coefficients of Polynomial fit and area of all order traces.

  • poly_fit_power (int, optional) – Degree of polynomial fit for width estimation, degree 2 or 3 is suggested. Defaults to 2.

Returns:

top and bottom widths of all clusters after using polynomial approximation, like:

[
    {
        'top_edge': float,     # top width of first cluster,
        'bottom_edge': float   # bottom width of first cluster
    },
    :
    {
        'top_edge': float,     # top width of last cluster,
        'bottom_edge': float   # bottom width of last cluster
    }
]

Return type:

list
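The idea of approximating an unresolved width from the resolved ones can be sketched as below. This is a minimal illustration, not the module's implementation: the function name `fill_unresolved_widths`, the use of `None` to mark an unresolved width, and the choice of a per-trace position as the independent variable are all assumptions made for the example.

```python
import numpy as np


def fill_unresolved_widths(positions, widths, poly_fit_power=2):
    """Fill unresolved widths (marked None, an assumption) from a polynomial fit.

    positions: one representative position per trace (hypothetical input).
    widths:    one width per trace, or None where the width is unresolved.
    """
    positions = np.asarray(positions, dtype=float)
    resolved = np.array([w is not None for w in widths])
    known = np.array([w for w in widths if w is not None], dtype=float)

    # Least-squares polynomial fit through the resolved widths only.
    coeffs = np.polyfit(positions[resolved], known, poly_fit_power)

    # Keep resolved widths as they are; evaluate the fit for the others.
    return [float(w) if w is not None else float(np.polyval(coeffs, p))
            for p, w in zip(positions, widths)]


# Widths follow position**2 + 1 here, so the missing entry is estimated near 5.
filled = fill_unresolved_widths([0, 1, 2, 3, 4], [1.0, 2.0, None, 10.0, 17.0])
```

A degree of 2 or 3 keeps the fit smooth across neighboring orders, which matches the suggestion given for poly_fit_power.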

clean_clusters_on_border(x: ndarray, y: ndarray, index: ndarray, border_y: int)[source]

Clean clusters crossing the top or bottom boundary based on the given border position along y axis.

Parameters:
  • x (array) – Array of x coordinates of cluster pixels.

  • y (array) – Array of y coordinates of cluster pixels.

  • index (array) – Array of cluster id on cluster pixels.

  • border_y (int) – The vertical position (y coordinate) of the border to check.

Returns:

Cluster pixels after cleaning:

  • (numpy.ndarray): Array of x coordinates of cluster pixels after cleaning.

  • (numpy.ndarray): Array of y coordinates of cluster pixels after cleaning.

  • (numpy.ndarray): Array of cluster id on cluster pixels after cleaning.

Return type:

tuple

clean_clusters_on_borders(x: ndarray, y: ndarray, index: ndarray, top_border: int = None, bottom_border: int = None)[source]

Clean clusters crossing the top and bottom boundaries of the image.

Parameters:
  • x (array) – Array of x coordinates of cluster pixels.

  • y (array) – Array of y coordinates of cluster pixels.

  • index (array) – Array of cluster id on cluster pixels.

  • top_border (int, optional) – Top border vertical position (along y axis). Defaults to None.

  • bottom_border (int, optional) – Bottom border vertical position (along y axis). Defaults to None.

Returns:

Cluster pixels after cleaning:

  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels after cleaning.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels after cleaning.

  • new_index (numpy.ndarray): Array of cluster id on cluster pixels after cleaning.

Return type:

tuple

Raises:
  • TypeError – If there is type error for x, y or index.

  • Exception – If the size of x, y, or index are not the same.

collect_clusters(c_x: ndarray, c_y: ndarray)[source]

Identify cluster units per positions of cluster pixels.

Cluster units are identified from the set of cluster pixels such that no pixel in one unit is connected to a pixel in another unit.

Parameters:
  • c_x (numpy.ndarray) – Array of x coordinates for cluster pixels.

  • c_y (numpy.ndarray) – Array of y coordinates for cluster pixels.

Returns:

identified cluster units from the image, like:

{
    <y_1>: <clusters_1> list,
    <y_2>: <clusters_2> list,...,
    <y_n>: <clusters_n> list
},
where
    <y_n> is vertical location (value along y axis)
    <clusters_n> is list of cluster units ending at <y_n>, like:
        [ cluster_1, cluster_2, ..., cluster_n],
        where cluster_i (dict) contains area of the cluster and horizontal
        segments it covers, like:
             {
                 'x1': int,  # left of the cluster.
                 'x2': int,  # right of the cluster.
                 'y1': int,  # top of the cluster.
                 'y2': int,  # bottom of the cluster.
                 <y_i_1>: <segments_1> dict, ..., <y_i_n>: <segments_n> dict
             }
            where
                 <y_i_t> is one of y location ranging from cluster_i['y1'] to
                 cluster_i['y2'].
                 <segments_i> contains horizontal segments at <y_i_t> like:
                    {
                        'segments': [[x_0, x_1], [x_2, x_3], ....[x_i, x_i+1]]
                    }
                    where x_i and x_i+1 means the starting and ending index for
                    array c_x.
ex: cluster units ending at y = 10 and y = 11,
    {
        10: [
                {
                    'x1': 20, 'x2': 30, 'y1': 9,  'y2': 10,
                    9:{'segments': [[4, 8], [12, 13]]},
                    10:{'segments': [[100, 107], [109, 118]]}
                },
                {
                     'x1': 50, 'x2': 77, 'y1': 7, 'y2': 10,
                     7:{'segments': [...]},
                     8:{'segments': [....]},
                     9:{'segments': [....]},
                     10:{'segments': [....]}
                 }
             ],
         11: [
                 {<cluster unit ends at y = 11>},
                 {<cluster unit ends at y = 11>}...
             ]
     }

Return type:

dict

static common_member(a: list, b: list)[source]

Check whether two lists have any element in common.

Parameters:
  • a (list) – First list.

  • b (list) – Second list.

Returns:

True if the lists share at least one element, otherwise False.

Return type:

bool
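A check of this kind can be sketched with sets. The function name `has_common_member` is hypothetical, and the sketch assumes the list elements are hashable (for example integer cluster ids); the module's own implementation may differ.

```python
def has_common_member(a: list, b: list) -> bool:
    """Return True if the two lists share at least one element."""
    # set.isdisjoint() returns True when nothing is shared, so negate it.
    return not set(a).isdisjoint(b)


shared = has_common_member([1, 2, 3], [3, 4])   # True: 3 appears in both
disjoint = has_common_member([1, 2], [3, 4])    # False: nothing in common
```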

cross_other_cluster(polys: ndarray, cluster_nos_for_polys: ndarray, cluster_nos: ndarray, x: ndarray, y: ndarray, index: ndarray, power: int, merged_coeffs: ndarray)[source]

Detect if there is another cluster that will prevent the merge of two given clusters.

Parameters:
  • polys (numpy.ndarray) – Array contains coefficients of polynomial fit to the clusters and the area of the clusters. Each row contains the coefficients and the area for one cluster.

  • cluster_nos_for_polys (numpy.ndarray) – The map between the polys and cluster no. Value of cluster_nos_for_polys points to the row index for polys.

  • cluster_nos (numpy.ndarray) – Array containing the ids of the two clusters to be tested for merging.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • power (int) – Degree of the polynomial to fit the cluster.

  • merged_coeffs (numpy.ndarray) – The polynomial fit coefficients and the area that would result if the two clusters in cluster_nos were merged.

Returns:

True if another cluster blocks the merge, or False if the merge is safe.

Return type:

bool

curve_fitting_on_all_clusters(index: ndarray, x: ndarray, y: ndarray)[source]

Do polynomial fitting on cluster pixels for all clusters.

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates on cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates on cluster pixels.

Returns:

Coefficients and errors from polynomial fit:

  • poly_all (numpy.ndarray): Array contains coefficients of polynomial fit and the area of all clusters. Each row contains the coefficients and the area for one cluster. Please see Returns in curve_fitting_on_one_cluster() for the detail of each row.

  • errors (numpy.ndarray): Array contains least square errors of polynomial fit to all clusters.

Return type:

tuple

static curve_fitting_on_one_cluster(cluster_no: int, index: ndarray, x: ndarray, y: ndarray, power: int, poly_info: ndarray = None)[source]

Find a polynomial that fits the cluster pixels.

Parameters:
  • cluster_no (int) – cluster id

  • index (numpy.ndarray) – Array of cluster id of cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • power (int) – Degree of fitting polynomial.

  • poly_info (numpy.ndarray, optional) – Array contains the coefficients of polynomial fit and the area of the cluster. Defaults to None.

Returns:

Coefficients and errors from polynomial fit:

  • poly_info (numpy.ndarray): Array contains coefficients of fitting polynomial from higher degree to the lower and the area enclosing the cluster, minimum x, maximum x, minimum y and maximum y.

  • error (float): Polynomial fitting error.

  • area (list): Cluster area, like [min_x, max_x, min_y, max_y].

Return type:

tuple
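The return layout described above (coefficients from highest degree to lowest, followed by the bounding area) can be illustrated with `numpy.polyfit`. This is a sketch under assumptions: the name `fit_one_cluster` is hypothetical, and the error is computed here as the root-mean-square residual, which may not be the error measure the module uses.

```python
import numpy as np


def fit_one_cluster(cluster_no, index, x, y, power):
    """Fit y = p(x) to the pixels of one cluster (illustrative sketch)."""
    sel = index == cluster_no
    cx, cy = x[sel], y[sel]

    # np.polyfit returns coefficients from the highest degree to the lowest.
    coeffs = np.polyfit(cx, cy, power)

    # Assumption: report the RMS residual as the fitting error.
    residuals = cy - np.polyval(coeffs, cx)
    error = float(np.sqrt(np.mean(residuals ** 2)))

    # Bounding area of the cluster: [min_x, max_x, min_y, max_y].
    area = [int(cx.min()), int(cx.max()), int(cy.min()), int(cy.max())]

    # One row: power + 1 coefficients followed by the four area values.
    poly_info = np.concatenate([coeffs, area])
    return poly_info, error, area


# Cluster 1 lies on y = 2x + 1; cluster 2 is unrelated and is ignored.
index = np.array([1] * 10 + [2] * 3)
x = np.concatenate([np.arange(10), [20, 21, 22]])
y = np.concatenate([2 * np.arange(10) + 1, [5, 5, 5]])
poly_info, error, area = fit_one_cluster(1, index, x, y, power=1)
```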

static distance_between_clusters(cluster_nos: ndarray, x: ndarray, y: ndarray, index: ndarray)[source]

Find the horizontal and vertical distances between two clusters, where the first cluster has the smaller x position.

Parameters:
  • cluster_nos (numpy.ndarray) – Array contains the cluster id of two clusters.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

Returns:

tuple containing:

  • dist_x (float): The horizontal gap between two clusters. The distance is 0 if there is horizontal overlap between two clusters.

  • dist_y (float): The vertical gap between two clusters. The distance is 0 if there is vertical overlap between two clusters.

Return type:

tuple
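One way to picture these gaps is as the distance between the bounding intervals of the two clusters along each axis. The sketch below assumes that interpretation; the name `cluster_gaps` is hypothetical, and whether the module measures the gap between edge coordinates or counts the empty pixels in between is not stated here.

```python
import numpy as np


def cluster_gaps(cluster_nos, x, y, index):
    """Gaps between the bounding boxes of two clusters (illustrative sketch)."""
    first = index == cluster_nos[0]
    second = index == cluster_nos[1]

    def gap(lo1, hi1, lo2, hi2):
        # Intervals [lo1, hi1] and [lo2, hi2]: the gap is positive only
        # when they do not overlap, otherwise it is reported as 0.
        return float(max(0, max(lo1, lo2) - min(hi1, hi2)))

    dist_x = gap(x[first].min(), x[first].max(), x[second].min(), x[second].max())
    dist_y = gap(y[first].min(), y[first].max(), y[second].min(), y[second].max())
    return dist_x, dist_y


# Cluster 1 spans x 0..4, y 10..11; cluster 2 spans x 8..12, y 11..13.
x = np.array([0, 1, 2, 3, 4, 8, 9, 10, 11, 12])
y = np.array([10, 10, 11, 11, 11, 11, 12, 12, 13, 13])
index = np.array([1] * 5 + [2] * 5)
dist_x, dist_y = cluster_gaps(np.array([1, 2]), x, y, index)
```

In this example the clusters are separated horizontally but overlap vertically at y = 11, so only the horizontal gap is nonzero.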

extract_order_from_cluster(cluster_no: int, index: ndarray, x: ndarray, y: ndarray)[source]

Get curve fitting result on specified cluster.

Parameters:
  • cluster_no (int) – id of the cluster to find the curve fitting results.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

Returns:

Please see Returns of curve_fitting_on_one_cluster().

Return type:

tuple

extract_order_trace(power_for_width_estimation: int = -1, data_range=None, show_time: bool = False, print_debug: str = None, rows_to_reset=None, cols_to_reset=None, orderlet_gap_pixels=2)[source]

Order trace extraction.

The order trace extraction includes the steps to smooth the image, locate the clusters, form clusters, remove and trim noisy clusters, merge the clusters to form order traces, model the order trace using polynomial fit and find the top and bottom widths along the traces.

Parameters:
  • power_for_width_estimation (int) – Degree of polynomial fit for trace width estimation. Defaults to -1.

  • data_range (list) – Area of the data, x1, x2, y1, y2, to be processed, where x1, y1 and x2, y2 are the corner coordinates of the area. x1, x2 or y1, y2 respectively represents the horizontal or vertical position relative to the first column or row of the image when it is greater than or equal to 0, otherwise the position relative to the last column or the last row.

  • show_time (bool, optional) – Show running time if True. Defaults to False.

  • print_debug (str, optional) – Print debug information to stdout if it is provided as empty string, a file with path print_debug if it is non empty string, or no print if it is None. Defaults to None.

  • rows_to_reset (list, optional) – Collection of rows to reset. Defaults to None.

  • cols_to_reset (list, optional) – Collection of columns to reset. Defaults to None.

  • orderlet_gap_pixels (number, optional) – Number of pixels to ignore between orderlets during extraction. Defaults to 2.

Returns:

order trace extraction and analysis result, like:

{
    'order_trace_result': pandas.DataFrame,
                          # table storing coefficients of polynomial
                          # fit, bottom/top width, and left/right boundary.
    'cluster_index': numpy.ndarray, # Array of cluster id of cluster pixels.
    'cluster_x': numpy.ndarray, # Array of x coordinates of cluster pixels.
    'cluster_y': numpy.ndarray  # Array of y coordinates of cluster pixels.
}

Return type:

dict

find_all_cluster_widths(index_t: ndarray, new_x: ndarray, new_y: ndarray, power_for_width_estimation: int = 3, cluster_set: list = None)[source]

Compute the top and bottom widths along the order trace.

Parameters:
  • index_t (numpy.ndarray) – Array of cluster id on cluster pixels.

  • new_x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • new_y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • power_for_width_estimation (int, optional) – Degree of polynomial fit for width estimation, degree 2 or 3 is suggested. Defaults to 3. The estimation step is skipped if it is less than 0.

  • cluster_set (list, optional) – Set of selected cluster id for width finding. Defaults to None. Widths of all clusters are computed if None.

Returns:

a list of width information for each order trace. Each element in the list is like:

{
    'top_edge': float,   # top width along the trace.
    'bottom_edge': float # bottom width along the trace.
}

Return type:

list

Raises:
  • TypeError – If there is type error for new_x, new_y or index_t.

  • Exception – If the size of new_x, new_y, or index_t are not the same.

find_cluster_width_by_gaussian(cluster_no: int, poly_coeffs: ndarray, cluster_points: ndarray)[source]

Find the width of the cluster using Gaussian to approximate the distribution of collected spectral data.

Parameters:
  • cluster_no (int) – Cluster id.

  • poly_coeffs (numpy.ndarray) – Polynomial fitting information and the covered area of all clusters.

  • cluster_points (numpy.ndarray) – Pixel position (y values) along the polynomial fit of every cluster.

Returns:

cluster width information like:

{
    'cluster_no': int,   # cluster id.
    'avg_pwidth': float, # bottom width of cluster.
    'avg_nwidth': float  # top width of cluster.
}

Return type:

dict

static find_mean_from_histogram(vals: ndarray, bin_no: int = 4, c_range: list = None, cut_at: float = None)[source]

Find the mean value based on the histogram of the data set.

Calculate the mean of the data selected from the given data set based on the histogram of the set.

Parameters:
  • vals (numpy.ndarray) – Array of values.

  • bin_no (int) – Bin number for the histogram.

  • c_range (list, optional) – Range for making histogram.

  • cut_at (float, optional) – Upper limit of the mean value. Defaults to None.

Returns:

Mean value of the data set.

Return type:

float

static fit_width_by_gaussian(x_set: ndarray, y_set: ndarray, center_y: float, xs: int, sigma: float = 3.0)[source]

Find the width using Gaussian fitting.

Fit the x, y set of data using Gaussian and find the width by looking at sigma of the Gaussian fit.

Parameters:
  • x_set (numpy.ndarray) – x data set.

  • y_set (numpy.ndarray) – y data set.

  • center_y (float) – Estimation of y value at the center from x_set.

  • xs (int) – x location for center_y.

  • sigma (float, optional) – Magnitude of standard deviation to get the width. Defaults to 3.0.

Returns:

Gaussian fit results:

  • gaussian_fit: Gaussian fit object.

  • width (float): Width found by Gaussian fit.

  • gaussian_center (float): Mean of Gaussian fit.

Return type:

tuple
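The relation between the fitted standard deviation and the reported width (width = sigma multiples of the standard deviation) can be illustrated without a fitting library. The sketch below deliberately swaps the Gaussian fit for intensity-weighted moments, which is an assumption made to keep the example self-contained; the name `width_from_moments` is hypothetical and it returns only the width and the center, not a fit object.

```python
import numpy as np


def width_from_moments(x_set, y_set, sigma=3.0):
    """Estimate a profile width as `sigma` standard deviations (sketch).

    Uses weighted moments instead of an actual Gaussian fit.
    """
    # Negative samples would distort the moments, so clip them to zero.
    weights = np.clip(np.asarray(y_set, dtype=float), 0, None)
    x_set = np.asarray(x_set, dtype=float)

    center = np.sum(x_set * weights) / np.sum(weights)
    std = np.sqrt(np.sum(weights * (x_set - center) ** 2) / np.sum(weights))
    return float(sigma * std), float(center)


# A synthetic profile with standard deviation 2 centered at 0.
xs = np.arange(-10, 11, dtype=float)
profile = np.exp(-0.5 * (xs / 2.0) ** 2)
width, center = width_from_moments(xs, profile, sigma=3.0)
```

With a standard deviation of 2 and sigma of 3.0, the estimated width comes out close to 6 pixels.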

static float_to_string(afloat)[source]

Convert float to string by taking 4 decimal digits.

Parameters:

afloat (float) – A float number.

Returns:

String of a float number with 4 decimal digits.

Return type:

str
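A conversion like this is commonly written with fixed-point formatting. The sketch below rounds to 4 decimal digits and keeps trailing zeros; the name `float_to_string_sketch` is hypothetical, and whether the module rounds or truncates is an assumption.

```python
def float_to_string_sketch(afloat: float) -> str:
    """Format a float with exactly 4 digits after the decimal point."""
    # The ".4f" format spec rounds to 4 decimals and pads with zeros.
    return f"{afloat:.4f}"


pi_text = float_to_string_sketch(3.14159265)  # "3.1416"
two_text = float_to_string_sketch(2.0)        # "2.0000"
```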

form_clusters(c_x: ndarray, c_y: ndarray, th=None)[source]

Form clusters and assign id to each formed cluster.

Form the cluster units and remove the small size cluster units. There is no pixel connected between different cluster units.

Parameters:
  • c_x (numpy.ndarray) – Array of x coordinates for cluster pixels.

  • c_y (numpy.ndarray) – Array of y coordinates for cluster pixels.

  • th (int, optional) – Size threshold used for removing noisy cluster. Defaults to None.

Returns:

Information of cluster pixels after cluster units are formed,

  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels.

  • new_index (numpy.ndarray): Array of cluster id on cluster pixels.

Return type:

tuple

Raises:
  • TypeError – If there is type error for c_x or c_y.

  • Exception – If the size of c_x or c_y are not the same.
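The grouping described above, where no pixel of one unit touches a pixel of another, can be illustrated with a plain flood fill over 8-connected neighbors. This is a swapped-in technique for illustration, not the module's segment-based procedure: the name `label_clusters` is hypothetical, plain lists stand in for the arrays, and dropping units with at most `th` pixels is an assumed meaning of the threshold.

```python
from collections import deque


def label_clusters(c_x, c_y, th=0):
    """Label 8-connected pixel groups and drop small ones (sketch)."""
    pixels = set(zip(c_x, c_y))
    labels = {}          # (x, y) -> cluster id, or 0 for a dropped pixel
    next_id = 1

    for start in sorted(pixels):
        if start in labels:
            continue
        # Breadth-first flood fill from an unlabeled pixel.
        labels[start] = next_id
        members = [start]
        queue = deque([start])
        while queue:
            px, py = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (px + dx, py + dy)
                    if nb in pixels and nb not in labels:
                        labels[nb] = next_id
                        members.append(nb)
                        queue.append(nb)
        if len(members) <= th:
            # Assumption: units with at most `th` pixels count as noise.
            for m in members:
                labels[m] = 0
        else:
            next_id += 1

    kept = [p for p in sorted(pixels) if labels[p] > 0]
    new_x = [p[0] for p in kept]
    new_y = [p[1] for p in kept]
    new_index = [labels[p] for p in kept]
    return new_x, new_y, new_index


# Three touching pixels, two diagonal neighbors, and one isolated pixel.
new_x, new_y, new_index = label_clusters(
    [0, 1, 1, 5, 6, 9], [0, 0, 1, 5, 6, 0], th=1)
```

With `th=1` the isolated pixel at (9, 0) is removed and the remaining pixels receive ids 1 and 2.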

get_cluster_points(polys_coeffs: ndarray)[source]

Compute cluster points (y values) along the fitting curve within x range of the cluster.

Parameters:

polys_coeffs (numpy.ndarray) – Polynomial fit coefficients and area on clusters.

Returns:

Array containing cluster points (y values) along the trace based on the polynomial fitting. Each row holds the y values along the x axis for one cluster.

Return type:

numpy.ndarray

static get_cluster_size(c_id: int, index: ndarray, x: ndarray, y: ndarray)[source]

Compute the width, height, total pixels and pixel index collection per specified cluster id.

Parameters:
  • c_id (int) – Cluster id.

  • index (np.ndarray) – Array of cluster id on cluster pixels.

  • x (np.ndarray) – Array of x coordinates of cluster pixels.

  • y (np.ndarray) – Array of y coordinates of cluster pixels.

Returns:

Size information of the cluster,

  • w (int): width of the cluster c_id.

  • h (int): height of the cluster c_id.

  • total_pixel (int): total pixel contained in the cluster c_id.

  • crt_idx (numpy.ndarray): Array of the positions within index of all pixels belonging to cluster c_id.

Return type:

tuple
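The four returned values can be illustrated as below. The name `cluster_size` is hypothetical, and counting width and height inclusively (max minus min plus 1) is an assumption about how the module defines them.

```python
import numpy as np


def cluster_size(c_id, index, x, y):
    """Width, height, pixel count and pixel positions of one cluster (sketch)."""
    # Positions in `index` (and therefore in x and y) that belong to c_id.
    crt_idx = np.where(index == c_id)[0]

    # Assumption: extents are inclusive, so a single pixel has size 1 x 1.
    w = int(x[crt_idx].max() - x[crt_idx].min()) + 1
    h = int(y[crt_idx].max() - y[crt_idx].min()) + 1
    total_pixel = int(crt_idx.size)
    return w, h, total_pixel, crt_idx


index = np.array([1, 1, 2, 1, 2])
x = np.array([3, 4, 9, 5, 10])
y = np.array([7, 7, 2, 8, 2])
w, h, total_pixel, crt_idx = cluster_size(1, index, x, y)
```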

get_config_value(param: str, default)[source]

Get defined value from the config file.

Search for the value of the specified property in the config section. The default value is returned if the property is not found.

Parameters:
  • param (str) – Name of the parameter to be searched.

  • default (str/int/float) – Default value for the searched parameter.

Returns:

Value for the searched parameter.

Return type:

int/float/str

get_fit_error_threshold()[source]

Get polynomial fitting mean square error threshold

Returns:

error threshold

Return type:

float

get_instrument()[source]

Get imaging instrument.

Returns:

Instrument name.

Return type:

str

get_poly_degree()[source]

Order of polynomial for order trace fitting.

Returns:

Order of polynomial.

Return type:

int

static get_segments_from_index_list(id_list: ndarray, loc: ndarray)[source]

Find horizontal segments at some y location.

Horizontal segment means a segment containing continuous cluster pixels at the same y position. The finding is based on index list associated with an array of x coordinates.

Parameters:
  • id_list (numpy.ndarray) – Array of index for the array of loc.

  • loc (numpy.ndarray) – Array of x coordinates of cluster pixels.

Returns:

List of horizontal segments, like:

[[<start_idx>_i, <end_idx>_i], ..., [<start_idx>_n, <end_idx>_n]]
where
    <start_idx>_i and <end_idx>_i represent the starting and ending index
    of the i-th segment and the index is associated with parameter loc.

ex. [[1, 3], [7, 10], ..., [150, 160]] means the following segments are
    included,
    1st segment is from loc[1] to loc[3] along x-axis.
    2nd segment is from loc[7] to loc[10] along x-axis.
    last segment is from loc[150] to loc[160] along x-axis.

Return type:

list
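Finding runs of consecutive x positions can be sketched as follows. The name `segments_from_index_list` is hypothetical, and the sketch assumes `id_list` is ordered so that the referenced `loc` values are increasing.

```python
def segments_from_index_list(id_list, loc):
    """Group indices whose x positions are consecutive (illustrative sketch)."""
    segments = []
    if len(id_list) == 0:
        return segments

    start = prev = id_list[0]
    for i in id_list[1:]:
        # A jump of more than one pixel in x closes the current segment.
        if loc[i] - loc[prev] != 1:
            segments.append([start, prev])
            start = i
        prev = i
    segments.append([start, prev])
    return segments


# x positions 5..7 and 10..11 are contiguous runs; 20 stands alone.
loc = [5, 6, 7, 10, 11, 20]
segments = segments_from_index_list([0, 1, 2, 3, 4, 5], loc)
# segments == [[0, 2], [3, 4], [5, 5]]
```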

get_sigma_for_width_fititng()[source]

Get the deviation number to estimate the width of the order

Returns:

number of sigma

Return type:

float

static get_sorted_index(poly_coeffs: ndarray, cluster_no: int, power: int, x_loc: int)[source]

Get sorted index for a cluster.

Do sorting on the list with cluster id based on the cluster’s position (y values) at x_loc and find the index of the cluster with cluster_no in the newly sorted list.

Parameters:
  • poly_coeffs (numpy.ndarray) – Array contains coefficients of polynomial fit and the area of the clusters.

  • cluster_no (int) – id of the cluster to get the index from the sorted list.

  • power (int) – Degree of the polynomial fit to the clusters.

  • x_loc (int) – x position for the sorting.

Returns:

contains the sorted information, like:

{
    'idx': int,
            # index of the cluster `cluster_no` from the new sorted list.
    'index_v_pos': numpy.ndarray
            # sorted list of cluster id based on the y position at `x_loc`.
}

Return type:

dict
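The sorting step can be pictured as evaluating every cluster's polynomial at x_loc and ranking the results. In this sketch the name `sorted_position` is hypothetical, and the mapping of row i of poly_coeffs to cluster id i + 1 is an assumption made for the example; the module may map rows to ids differently.

```python
import numpy as np


def sorted_position(poly_coeffs, cluster_no, power, x_loc):
    """Rank clusters by their fitted y value at x_loc (illustrative sketch)."""
    # Assumption: row i of poly_coeffs describes cluster id i + 1.
    cluster_ids = np.arange(1, poly_coeffs.shape[0] + 1)

    # The first power + 1 values of each row are the polynomial coefficients.
    y_at_x = np.array([np.polyval(row[:power + 1], x_loc) for row in poly_coeffs])

    index_v_pos = cluster_ids[np.argsort(y_at_x)]
    idx = int(np.where(index_v_pos == cluster_no)[0][0])
    return {"idx": idx, "index_v_pos": index_v_pos}


# Three flat traces at y = 50, 10 and 30 (degree 1, followed by area values).
polys = np.array([
    [0.0, 50.0, 0, 100, 40, 60],
    [0.0, 10.0, 0, 100, 5, 15],
    [0.0, 30.0, 0, 100, 25, 35],
])
result = sorted_position(polys, cluster_no=1, power=1, x_loc=10)
```

Here cluster 1 has the largest y value at x_loc, so it lands last in the sorted list.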

get_spectral_data()[source]

Get spectral information including data and dimension.

Returns:

Information of spectral data,

  • (numpy.ndarray): 2D spectral data.

  • nx (int): Width of the data.

  • ny (int): Height of the data.

Return type:

tuple

get_trace_vertical_gap()[source]

Get the estimated vertical gap between the traces

Returns:

Vertical gap between traces

Return type:

float

get_width_default()[source]

Get the trace width default

Returns:

number of width default

Return type:

float

handle_noisy_cluster(index_t: ndarray, x: ndarray, y: ndarray, num_set: list)[source]

Handle the cluster which is not well fitted by polynomial curve.

Parameters:
  • index_t (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates on cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates on cluster pixels.

  • num_set (list) – The cluster with the specified id (currently, the first member in the list) is handled.

Returns:

Status after processing:

  • new_index_t (np.ndarray): updated version of index_t after processing

  • status (dict): One of the following possible process results is returned:

    1. the cluster is to be deleted.

    2. the cluster pixels are to be changed.

    3. the cluster is to be split into multiple cluster units.

    4. the cluster remains the same.

    it is like:

    {
        'msg': 'delete'/'change'/'split'/'same',
        'cluster_id': <target_cluster_id> int,
        'cluster_added': [<new_id_1>, <new_id_2>,...,<new_id_i>],
        'poly_fitting':{
            <cluster_id>: {
                'errors': float,
                            # Least square error by using polynomial fit.
                'coeffs': numpy.ndarray,
                            # Coefficients of polynomial fit.
                'area': list,     # Area of the cluster, like
                                  # [<min_x>, <max_x>, <min_y>, <max_y>]
                                  # for 4 borders of the cluster.
            }
            <new_id_1>: {'errors': ..., 'coeffs': ..., 'area': ...},
            <new_id_n>: {'errors': ..., 'coeffs': ..., 'area': ...}}
    }
    # <new_id_i> is the id for newly created cluster, if 'split'.
    

Return type:

tuple

locate_clusters(img_rows_to_reset=None, img_cols_to_reset=None)[source]

Find cluster pixels from 2D data array.

Smooth the image, convert each pixel to 1 or 0, and find the cluster pixels. Cluster pixels are pixels with value 1, each connected to at least one neighboring pixel in the vertical, horizontal, or diagonal direction.

Parameters:
  • img_rows_to_reset (list, optional) – Collection of rows to be reset.

  • img_cols_to_reset (list, optional) – Collection of columns to be reset.

Returns:

result of formed clusters, like:

{
    'x': numpy.ndarray,  # Array of x coordinates of cluster pixels.
    'y': numpy.ndarray,  # Array of y coordinates of cluster pixels.
    'cluster_image': numpy.ndarray
                         # 2D image in which the cluster pixels are with
                         # value 1 and non cluster pixels are with value 0.
}

Return type:

dict
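The shape of the returned dictionary can be illustrated with a simple binarization. Everything about the method here is an assumption: the name `binarize`, the vertical moving average used as the smoothed reference, and the `window` and `margin` parameters are hypothetical stand-ins, since the smoothing method used by the module is not described in this section.

```python
import numpy as np


def binarize(image, window=5, margin=0.0):
    """Mark pixels that stand above a smoothed background (sketch)."""
    kernel = np.ones(window) / window

    # Moving average along y (axis 0), applied to every column.
    smooth = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, image)

    # A pixel becomes 1 when it exceeds its smoothed neighborhood.
    binary = (image - smooth > margin).astype(int)
    ys, xs = np.nonzero(binary)
    return {"x": xs, "y": ys, "cluster_image": binary}


# One bright horizontal trace at row 5 of an otherwise empty image.
image = np.zeros((11, 6))
image[5, :] = 10.0
clusters = binarize(image)
```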

make_2d_data(index: ndarray, x: ndarray, y: ndarray, selected_clusters: ndarray = None)[source]

Create 2D data based on cluster pixels related information.

Parameters:
  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • selected_clusters (numpy.ndarray, optional) – Make 2D data based on selected clusters only. Defaults to None.

Returns:

2D data with pixels set as 1 on cluster pixels, or 0 on non cluster pixels.

Return type:

numpy.ndarray
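Rebuilding a binary image from the pixel arrays can be sketched as below. The name `make_2d_sketch` and the explicit `shape` argument are hypothetical; the method documented above presumably takes the image size from the instance rather than from a parameter.

```python
import numpy as np


def make_2d_sketch(index, x, y, shape, selected_clusters=None):
    """Build a 0/1 image from cluster pixels (illustrative sketch)."""
    img = np.zeros(shape, dtype=int)
    if selected_clusters is None:
        keep = np.ones(index.shape, dtype=bool)      # use every cluster pixel
    else:
        keep = np.isin(index, selected_clusters)     # only the chosen clusters

    # Rows are addressed by y and columns by x.
    img[y[keep], x[keep]] = 1
    return img


index = np.array([1, 1, 2])
x = np.array([0, 1, 3])
y = np.array([0, 0, 2])
full = make_2d_sketch(index, x, y, shape=(3, 4))
only_first = make_2d_sketch(index, x, y, shape=(3, 4), selected_clusters=[1])
```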

merge_clusters(index: ndarray, x: ndarray, y: ndarray)[source]

Merge clusters based on the closeness between the clusters and the fitting quality by the same polynomial.

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.array) – Array of x coordinates of cluster pixels.

  • y (numpy.array) – Array of y coordinates of cluster pixels.

Returns:

Information of cluster pixels after processing,
  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels after processing.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels after processing.

  • new_index (numpy.ndarray): Array of cluster id of cluster pixels.

  • m_coeffs (numpy.ndarray): Array containing polynomial fitting coefficients and the area of the clusters. Each row of the array has the data for one cluster.

Return type:

tuple

merge_clusters_and_clean(index: ndarray, x: ndarray, y: ndarray)[source]

Merge clusters and remove the clusters with big opening in the center.

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

Returns:

Information of cluster pixels after merging,

  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels after processing.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels after processing.

  • new_index (numpy.ndarray): Array of cluster id of cluster pixels after processing.

Return type:

tuple

Raises:
  • TypeError – If there is type error for x, y or index.

  • Exception – If the size of x, y, or index are not the same.

merge_fitting_curve(poly_curves: ndarray, index: ndarray, x: ndarray, y: ndarray, threshold=2.5)[source]

Merge the cluster to the closest neighbor.

The merge iterates over cluster pairs and stops when one merge is made or all pairs have been tested.

Parameters:
  • poly_curves (numpy.ndarray) – Array containing coefficients of polynomial fitting to all clusters and the area of the clusters. Each row contains the coefficients and the area for one cluster.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • threshold (float) – error threshold to determine the polynomial fitting quality.

Returns:

merge status, like:

{
    'status': 'changed'|'nochange',
    'index': numpy.ndarray,
                    # Array of cluster id on cluster pixels after merge.
    'kept_curves': list,     # Array of cluster id of unchanged clusters.
    'log': <message>
}

# 'status' means if clusters are 'changed' (if merge happens) or 'nochange'.
# 'log' contains the message regarding any merge action if there is,
# like 'remove id' or 'merge id_1 and id_2'.

Return type:

dict

static merge_two_clusters(cluster_nos: ndarray, x: ndarray, y: ndarray, index: ndarray, power: int)[source]

Calculate the polynomial fitting error and distance in case two clusters are merged.

Parameters:
  • cluster_nos (numpy.ndarray) – Ids of the two clusters; the first is the leftmost cluster.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • power (int) – Degree of polynomial to fit two clusters.

Returns:

Information of polynomial fit to two clusters,

  • poly_info (numpy.ndarray): Array contains coefficients of fitting polynomial and area of the cluster after the merge.

  • errors (float): Least square error of polynomial fitting.

Return type:

tuple
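
The idea of fitting one polynomial to the pixels of two clusters can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name fit_merged_clusters, the use of numpy.polyfit, the root-mean-square residual as the error measure, and the [min_x, max_x, min_y, max_y] area layout are all assumptions.

```python
import numpy as np

def fit_merged_clusters(cluster_nos, x, y, index, power):
    """Fit one polynomial to the pixels of two clusters (hypothetical helper)."""
    # Select the pixels that belong to either of the two clusters.
    mask = np.isin(index, cluster_nos)
    xs, ys = x[mask], y[mask]
    # np.polyfit returns coefficients from the highest power to the lowest.
    coeffs = np.polyfit(xs, ys, power)
    # Root-mean-square residual as a simple fit-quality measure (assumption).
    residual = ys - np.polyval(coeffs, xs)
    error = float(np.sqrt(np.mean(residual ** 2)))
    # Bounding box of the merged pixels: [min_x, max_x, min_y, max_y] (assumption).
    area = [xs.min(), xs.max(), ys.min(), ys.max()]
    return coeffs, area, error

# Two clusters lying on the same straight line y = 2x + 1.
x = np.array([0, 1, 2, 3, 10, 11, 12, 13], dtype=float)
y = 2.0 * x + 1.0
index = np.array([1, 1, 1, 1, 2, 2, 2, 2])
coeffs, area, error = fit_merged_clusters(np.array([1, 2]), x, y, index, power=1)
```

A small error for the combined fit suggests that the two clusters belong to the same trace; a caller could compare it against a threshold such as the one taken by merge_fitting_curve().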

static mirror_data(x_set: ndarray, y_set: ndarray, mirror_side: int)[source]

Mirror y value to the left or right side of x_set.

Parameters:
  • x_set (numpy.ndarray) – Array of x values.

  • y_set (numpy.ndarray) – Array of y values paired to each of x_set.

  • mirror_side (int) – Mirror direction. Mirror to the left side of x_set if 0, or to the right side of x_set if 1.

Returns:

Data after mirroring,

  • x_new_set (numpy.ndarray): Array containing x coordinates from left to the right after mirroring.

  • y_new_set (numpy.ndarray): Array containing y coordinates relevant to x_new_set.

Return type:

tuple
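
One possible reading of the mirroring step is a reflection about the leftmost or rightmost x value. The sketch below follows that reading only; the helper name mirror_points, the choice of pivot, the assumption that x_set is sorted in ascending order, and the dropping of the duplicated pivot point are all assumptions and may differ from the actual method.

```python
import numpy as np

def mirror_points(x_set, y_set, mirror_side):
    """Reflect (x, y) points about one end of x_set (hypothetical helper)."""
    # Assumes x_set is sorted in ascending order.
    pivot = x_set[0] if mirror_side == 0 else x_set[-1]
    # Reflect x about the pivot and reverse so the mirrored x is ascending.
    x_mirror = (2 * pivot - x_set)[::-1]
    y_mirror = y_set[::-1]
    if mirror_side == 0:
        # Mirrored points go to the left; skip the duplicated pivot point.
        x_new = np.concatenate((x_mirror[:-1], x_set))
        y_new = np.concatenate((y_mirror[:-1], y_set))
    else:
        # Mirrored points go to the right; skip the duplicated pivot point.
        x_new = np.concatenate((x_set, x_mirror[1:]))
        y_new = np.concatenate((y_set, y_mirror[1:]))
    return x_new, y_new

x_left, y_left = mirror_points(np.array([2, 3, 4]), np.array([10, 20, 30]), 0)
x_right, y_right = mirror_points(np.array([2, 3, 4]), np.array([10, 20, 30]), 1)
```

With this reading, the result keeps x in left-to-right order, which matches the description of x_new_set above.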

one_step_merge_cluster(crt_coeffs: ndarray, crt_index: ndarray, crt_x: ndarray, crt_y: ndarray)[source]

Single step of cluster merging; at most one pair of clusters is merged.

Parameters:
  • crt_coeffs (numpy.ndarray) – Coefficients of polynomial fit and the area of the clusters.

  • crt_index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • crt_x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • crt_y (numpy.ndarray) – Array of y coordinates of cluster pixels.

Returns:

Information of cluster pixels after merge and merge status:

  • (numpy.ndarray): Array of cluster id of cluster pixels after merge.

  • (numpy.ndarray): Array of x coordinates of cluster pixels after merge.

  • (numpy.ndarray): Array of y coordinates of cluster pixels after merge.

  • (numpy.ndarray): Coefficients of polynomial fit and the area of the clusters after the merge.

  • merge_status (dict): Merge status; see merge_fitting_curve() for details.

Return type:

tuple
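
Because each step merges at most one pair, a caller would typically repeat the step until the status reports 'nochange'. The loop below sketches that pattern with a toy step function. The names merge_until_stable and toy_step, the max_steps guard, and the toy merge rule are hypothetical; only the 'status' and 'log' keys come from the description of merge_fitting_curve().

```python
def toy_step(positions):
    """Hypothetical stand-in for a single merge step on 1D positions."""
    for i in range(len(positions) - 1):
        # Merge the first adjacent pair that is closer than 2 units.
        if positions[i + 1] - positions[i] < 2:
            merged = (positions[i] + positions[i + 1]) / 2
            new_positions = positions[:i] + [merged] + positions[i + 2:]
            return new_positions, {'status': 'changed',
                                   'log': f'merge {i} and {i + 1}'}
    return positions, {'status': 'nochange', 'log': ''}

def merge_until_stable(step, state, max_steps=1000):
    """Apply a one-merge-per-call step until it reports 'nochange'."""
    logs = []
    for _ in range(max_steps):
        state, status = step(state)
        if status['status'] == 'nochange':
            break
        logs.append(status['log'])
    return state, logs

final_positions, merge_logs = merge_until_stable(
    toy_step, [0.0, 1.0, 10.0, 11.0, 30.0])
```

Restarting the pair search after every merge keeps each step simple, at the cost of rescanning pairs that were already tested.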

static opt_filter(y_data: ndarray, par: int, weight: ndarray = None)[source]

A smoothing filter.

post_process(orig_coeffs, orig_widths, orderlet_gap=2)[source]

Post-process and refine the calculated widths so that they lie closer to the valley between two consecutive orderlet traces and are more symmetric about the valley.

Parameters:
  • orig_coeffs – Polynomial coefficients, ordered from high order to low order.

  • orig_widths – Array of lower and upper widths.

Returns:

  • new_coeffs: orig_coeffs with one extra row added, in the same format as the parameter cluster_coeffs of write_cluster_info_to_dataframe().

  • list: orig_widths containing the width information, in the same format as the parameter cluster_widths of write_cluster_info_to_dataframe().

Return type:

numpy.array

remove_broken_cluster(index: ndarray, x: ndarray, y: ndarray)[source]

Remove the cluster which has a big opening around the center of the image.

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

Returns:

Information of cluster pixels after processing,

  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels after processing.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels after processing.

  • new_index (numpy.ndarray): Array of cluster id on cluster pixels after processing.

Return type:

tuple

remove_cluster_by_size(clusters_endy_dict: dict, x_index: ndarray, y_index: ndarray, th=None)[source]

Remove noisy clusters.

Removal is based on the pixel count and the size of each cluster. An id is assigned to each non-noisy cluster.

Parameters:
  • clusters_endy_dict (dict) – Clusters collected by collect_clusters(); see the Returns section of collect_clusters() for more detail.

  • x_index (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y_index (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • th (int, optional) – Size threshold for removing the noisy cluster. Defaults to None.

Returns:

cluster information containing assigned id, like:

{
    'index': numpy.ndarray,
                        # array of cluster id associated with cluster pixels.
    'n_regions': int    # total number of clusters.
}

Return type:

dict

remove_noise_in_cluster(cluster_curves: list, x_index: ndarray, y_index: ndarray, crt_cluster_idx: ndarray, th=None)[source]

Remove noisy clusters, trim noise from a cluster, or split a cluster into several clusters.

The removal works on the clusters collected by handle_noisy_cluster(). Whether the cluster in the collection is kept or removed depends on the size and polynomial fitting result.

Parameters:
  • cluster_curves (list) – Clusters collected by handle_noisy_cluster(), to be tested for keeping or removal.

  • x_index (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y_index (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • crt_cluster_idx (numpy.ndarray) – Indices of the clusters included in cluster_curves; each index refers to a position in the cluster-pixel arrays, like x_index or y_index.

  • th (float, optional) – Threshold for cluster size. Defaults to None.

Returns:

Polynomial fit results and cluster ids for the clusters that are kept,

  • index (numpy.ndarray): Array associated with cluster pixels, in which the pixels covered by any kept cluster of cluster_curves are marked by a cluster number starting from 1.

  • poly_fitting_results (dict): Polynomial fitting results for not removed clusters in cluster_curves, like:

    {
        'errors': float,  # Least square errors of polynomial fitting.
        'coeffs': numpy.ndarray,  # Coefficients of polynomial fitting.
        'area': list              # area of the cluster, like
                                  # [<min_x>, <max_x>, <min_y>, <max_y>]
                                  # for 4 borders of the cluster.
    }
    

Return type:

tuple

static remove_unassigned_cluster(x: ndarray, y: ndarray, index: ndarray)[source]

Remove the cluster pixels which have no cluster number assigned.

Parameters:
  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

Returns:

Information of cluster pixels after processing,

  • x_r (numpy.ndarray): Array of x coordinates of cluster pixels after processing.

  • y_r (numpy.ndarray): Array of y coordinates of cluster pixels after processing.

  • index_r (numpy.ndarray): Array of cluster id on cluster pixels after processing.

Return type:

tuple

reorganize_index(index: ndarray, x: ndarray, y: ndarray, return_map: bool = False)[source]

Remove cluster pixels with an unassigned cluster number and reorder the clusters.

Remove the cluster pixels with cluster number less than 1 and re-assign the cluster id to existing cluster pixels.

Parameters:
  • index (numpy.ndarray) – Array of cluster id on cluster pixels.

  • x (numpy.ndarray) – Array of x coordinates of cluster pixels.

  • y (numpy.ndarray) – Array of y coordinates of cluster pixels.

  • return_map (bool, optional) – Return map between old cluster id and new cluster id if True.

Returns:

Information of cluster pixels after processing,

  • new_x (numpy.ndarray): Array of x coordinates of cluster pixels after processing.

  • new_y (numpy.ndarray): Array of y coordinates of cluster pixels after processing.

  • new_index (numpy.ndarray): Array of cluster id on cluster pixels after processing.

  • return_map (dict): Map between old cluster id and new cluster id like:

    {
        <old cluster id> int: <new cluster id> int
    }
    

Return type:

tuple
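
The filtering and renumbering described above can be sketched with a boolean mask and numpy.unique. This is only an illustration: the helper name compact_cluster_ids is hypothetical, and assigning new ids in ascending order of the old ids is an assumption about the ordering.

```python
import numpy as np

def compact_cluster_ids(index, x, y):
    """Drop pixels with id < 1 and renumber the rest as 1..n (hypothetical)."""
    # Keep only pixels that carry a valid cluster id.
    keep = index >= 1
    new_x, new_y, kept_ids = x[keep], y[keep], index[keep]
    # np.unique sorts the old ids; 'inverse' maps each pixel to its rank.
    old_ids, inverse = np.unique(kept_ids, return_inverse=True)
    new_index = inverse + 1
    # Map from old cluster id to new cluster id.
    id_map = {int(old): new for new, old in enumerate(old_ids, start=1)}
    return new_x, new_y, new_index, id_map

index = np.array([0, 5, 5, 2, 0, 9])
x = np.arange(6)
y = np.arange(6) * 10
new_x, new_y, new_index, id_map = compact_cluster_ids(index, x, y)
```

Here the pixels with id 0 are dropped, and the surviving ids 2, 5 and 9 become 1, 2 and 3.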

static reset_row_or_column(imm: ndarray, reset_ranges: list = None, row_or_column: int = 0, val: int = 0)[source]

Set a value to columns or rows in 2D image array.

Assign a value to pixels of some columns or rows.

Parameters:
  • imm (numpy.ndarray) – Data of 2D array.

  • reset_ranges (list) – Range of columns or rows to be set.

  • row_or_column (int, optional) – Set value to rows if zero or to columns if non-zero. Defaults to 0.

  • val (int, optional) – Value to be set. Defaults to 0.

Returns:

Pixel information after resetting,

  • imm (numpy.ndarray): 2D array with reset value.

  • (numpy.ndarray): Array of x coordinate of pixels with value greater than 0.

  • (numpy.ndarray): Array of y coordinate of pixels with value greater than 0.

Return type:

tuple
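
Resetting rows or columns amounts to slice assignment on the image array. The sketch below assumes that each entry of reset_ranges is a half-open [start, end) pair and that the input is copied rather than modified in place; both points, and the helper name reset_lines, are assumptions rather than documented behavior.

```python
import numpy as np

def reset_lines(imm, reset_ranges=None, row_or_column=0, val=0):
    """Set whole rows or columns of a 2D array to val (hypothetical helper)."""
    out = imm.copy()  # Work on a copy (assumption; the method may edit in place).
    for start, end in (reset_ranges or []):
        if row_or_column == 0:
            out[start:end, :] = val   # Reset rows start..end-1.
        else:
            out[:, start:end] = val   # Reset columns start..end-1.
    # np.nonzero returns (row indices, column indices), i.e. (y, x).
    y_idx, x_idx = np.nonzero(out > 0)
    return out, x_idx, y_idx

image = np.ones((3, 4), dtype=int)
cleared, x_idx, y_idx = reset_lines(image, [[0, 1]], row_or_column=0, val=0)
```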

set_data_range(data_range=None)[source]

Set the data range to be processed.

Parameters:

data_range (list) – Area of the data, [x1, x2, y1, y2], to be processed. A non-negative value counts columns (or rows) from the first column (or row); a negative value counts from the last one.

Returns:

Data range position relative to the first column and first row of the raw image.

Return type:

list
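
The handling of negative positions can be sketched as below. The helper name resolve_range is hypothetical, and the convention that -1 refers to the last column or row (as in Python indexing) is an assumption.

```python
def resolve_range(data_range, n_cols, n_rows):
    """Convert [x1, x2, y1, y2] with negative entries to absolute positions."""
    def resolve(pos, size):
        # Non-negative values count from the first column/row;
        # negative values count back from the last one (assumed: -1 is last).
        return pos if pos >= 0 else size + pos

    x1, x2, y1, y2 = data_range
    return [resolve(x1, n_cols), resolve(x2, n_cols),
            resolve(y1, n_rows), resolve(y2, n_rows)]

# Image with 100 columns and 50 rows.
absolute_range = resolve_range([0, -1, 10, -11], n_cols=100, n_rows=50)
```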

sort_cluster_in_y(cluster_coeffs: ndarray)[source]

Sort cluster based on vertical position.

Parameters:

cluster_coeffs (np.ndarray) – Array containing the coefficients of the polynomial fit and the area of the clusters.

Returns:

Sorted list of cluster id based on the vertical position of the clusters.

Return type:

np.ndarray

static sort_cluster_on_loc(clusters: list, loc)[source]

Sort the clusters based on the specified location key.

Parameters:
  • clusters (list) – List of clusters to be sorted.

  • loc (str/int) – The key that the sorting is based on.

Returns:

Sorted result.

Return type:

list

static sort_cluster_segments(segments: list)[source]

Sort a set of segments based on the first number contained in each segment.

Parameters:

segments (list) – Array of segments. Each element in segments is list-like type.

Returns:

Sorted result.

Return type:

list
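
Sorting by the first number of each segment maps directly onto Python's built-in sorted() with a key function. The sketch below is an assumed equivalent, and the helper name sort_segments is hypothetical.

```python
def sort_segments(segments):
    """Sort list-like segments by their first element (hypothetical helper)."""
    return sorted(segments, key=lambda segment: segment[0])

ordered = sort_segments([[5, 9], [1, 3], [2, 8]])
```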

write_cluster_info_to_dataframe(cluster_widths: list, cluster_coeffs: ndarray)[source]

Write the coefficients of polynomial fit, area and top/bottom widths of order trace to DataFrame object.

Parameters:
  • cluster_widths (list) –

    Array containing the top and bottom widths of clusters, like:

    [
        {
            'top edge': float,     # top width of first cluster
            'bottom edge': float   # bottom width of first cluster
        }, ....,
        {
            'top edge': float,     # top width of last cluster
            'bottom edge': float   # bottom width of last cluster
        }
    ]
    

  • cluster_coeffs (numpy.ndarray) – Array containing the coefficients of the polynomial fit and the area of the clusters.

Returns:

Instance of DataFrame containing columns (for polynomial of degree 3) like,

Coeff0, Coeff1, Coeff2, Coeff3, BottomEdge, TopEdge, X1, X2

The columns hold the coefficients of the polynomial fit from lower order to higher, the bottom and top widths, and the left and right boundaries of the orders.

Return type:

pandas.DataFrame
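
The column layout above can be illustrated by building one record per order. The sketch assumes that each row of cluster_coeffs holds the coefficients from high order to low order followed by [min_x, max_x, min_y, max_y]; that row layout and the helper name build_trace_rows are assumptions, not the documented format.

```python
def build_trace_rows(cluster_widths, cluster_coeffs, poly_degree):
    """Build one dict per order with the columns listed above (hypothetical)."""
    rows = []
    for widths, row in zip(cluster_widths, cluster_coeffs):
        # Assumed row layout: [c_n, ..., c_0, min_x, max_x, min_y, max_y].
        coeffs_high_to_low = list(row[:poly_degree + 1])
        record = {}
        # Output columns run from the lowest order (Coeff0) to the highest.
        for i, coeff in enumerate(reversed(coeffs_high_to_low)):
            record[f'Coeff{i}'] = float(coeff)
        record['BottomEdge'] = widths['bottom edge']
        record['TopEdge'] = widths['top edge']
        record['X1'] = row[poly_degree + 1]   # Left boundary of the order.
        record['X2'] = row[poly_degree + 2]   # Right boundary of the order.
        rows.append(record)
    return rows

widths = [{'top edge': 3.5, 'bottom edge': 4.5}]
coeffs = [[4.0, 3.0, 2.0, 1.0, 0, 100, 5, 20]]
rows = build_trace_rows(widths, coeffs, poly_degree=3)
```

A list of records in this form can be passed to pandas.DataFrame(rows) to obtain a table with these column names.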