diff --git a/Management-Utilities/Workload-Factory-API-Samples/link_associate b/Management-Utilities/Workload-Factory-API-Samples/link_associate
index 14c8d27..55eaac3 100755
--- a/Management-Utilities/Workload-Factory-API-Samples/link_associate
+++ b/Management-Utilities/Workload-Factory-API-Samples/link_associate
@@ -27,7 +27,7 @@ Where:
 refresh_token - Is a refresh token used to obtain an access token needed 'list_credentials' to get a list of credentials you have access to
 aws_region - is the AWS region where the FSx file systems are located
 filesystem_ID - is the AWS file system ID of the FSx file system where the volume resides
-link_ID - Is the id of the link you want to disassociate
+link_ID - Is the id of the link you want to associate

 Instead of passing parameters on the command line, you can set the following environment variables:
diff --git a/Monitoring/CloudWatch-FSx/README.md b/Monitoring/CloudWatch-FSx/README.md
index cb24503..f29d76c 100644
--- a/Monitoring/CloudWatch-FSx/README.md
+++ b/Monitoring/CloudWatch-FSx/README.md
@@ -4,163 +4,3 @@ Continuous development for this solution has moved to a separate GitHub reposito
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/CloudWatch-Dashboard-FSx](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/CloudWatch-Dashboard-FSx). Please refer to that repository for the latest updates.

-# AWS CloudWatch Dashboard for FSx for ONTAP
-
-## Introduction
-This sample provides a CloudFormation template to deploy an AWS CloudWatch dashboard for monitoring FSx for ONTAP systems. The dashboard offers comprehensive insights into your FSx for ONTAP resources, helping you monitor performance, track metrics, and manage alarms efficiently.
-![Screenshot](images/CW-Dashboard-Screenshot.png)
-The template creates the following resources:
-
-1. Dashboard - The Amazon CloudWatch dashboard, divided into four main sections:
-    1. Overview metrics of all FSx systems in the region.
-    1. Metrics by individual FSx system.
-    1. Metrics by volume ID.
-    1. Alarms.
-2. Lambda function - The function does the following:
-    1. Builds custom widgets for the dashboard.
-    1. Collects metrics directly from ONTAP (like SnapMirror health status).
-    1. Creates CloudWatch alarms for all file systems in the region.
-3. Schedulers - Two Amazon EventBridge schedulers that trigger the Lambda function to:
-    1. Collect ONTAP metrics. Scheduled to trigger every minute.
-    1. Create, update, or delete CloudWatch alarms. Scheduled to trigger once an hour.
-4. Lambda Role - The IAM role that allows the Lambda function to run.
-5. Scheduler Role - The IAM role that allows the scheduler to trigger the Lambda function.
-6. SecretManager endpoint - The Lambda function runs inside a VPC, which by default lacks outgoing internet connectivity. To enable the function to securely access the FSx credentials stored in AWS Secrets Manager, a VPC endpoint for the Secrets Manager service is required. This endpoint allows the Lambda function to retrieve sensitive information from Secrets Manager without needing direct internet access, maintaining security while ensuring the function can access the necessary credentials.
-7. CloudWatch endpoint - The Lambda function runs inside a VPC, which by default lacks outgoing internet connectivity. To enable the function to send logs and metrics to CloudWatch, a VPC endpoint for the CloudWatch service is required.
-This endpoint allows the Lambda function to communicate with CloudWatch without needing direct internet access, maintaining security while ensuring proper monitoring and logging functionality.
-8. FSxService endpoint - The Lambda function runs inside a VPC, which by default lacks outgoing internet connectivity. A VPC endpoint for the FSx service is required to enable the function to call the FSx service to retrieve file system information.
-
-## Prerequisites
-1. You should have an AWS Account with the following permissions to create and manage resources:
-   * "cloudformation:DescribeStacks"
-   * "cloudformation:ListStacks"
-   * "cloudformation:DescribeStackEvents"
-   * "cloudformation:ListStackResources"
-   * "cloudformation:CreateChangeSet"
-   * "ec2:DescribeSubnets"
-   * "ec2:DescribeSecurityGroups"
-   * "ec2:DescribeVpcs"
-   * "iam:ListRoles"
-   * "iam:GetRolePolicy"
-   * "iam:GetRole"
-   * "iam:DeleteRolePolicy"
-   * "iam:CreateRole"
-   * "iam:DetachRolePolicy"
-   * "iam:PassRole"
-   * "iam:PutRolePolicy"
-   * "iam:DeleteRole"
-   * "iam:AttachRolePolicy"
-   * "lambda:AddPermission"
-   * "lambda:RemovePermission"
-   * "lambda:InvokeFunction"
-   * "lambda:GetFunction"
-   * "lambda:CreateFunction"
-   * "lambda:DeleteFunction"
-   * "lambda:TagResource"
-   * "codestar-connections:GetSyncConfiguration"
-   * "ecr:BatchGetImage"
-   * "ecr:GetDownloadUrlForLayer"
-   * "scheduler:GetSchedule"
-   * "scheduler:CreateSchedule"
-   * "scheduler:DeleteSchedule"
-   * "logs:PutRetentionPolicy"
-   * "secretsmanager:GetSecretValue" (on specific secret)
-2. Optional: create a secret in AWS Secrets Manager with key-value pairs of file system IDs and their corresponding credentials. The value can be provided in two formats. The first format is simply the password for the 'fsxadmin' user. The second format includes both the username and password, separated by a colon. This secret is necessary for making direct ONTAP API calls to monitor resources, such as SnapMirror relations.
-Example secret structure:
-```
-   {
-     "fs-111222333": "Password1",
-     "fs-444555666": "Password2"
-   }
-   or
-   {
-     "fs-111222333": "myUserName:Password1",
-     "fs-444555666": "Password2"
-   }
-```
-When deploying the CloudFormation template, you will need to provide the ARN of this secret as a parameter. This allows the Lambda function to securely access the passwords for monitoring purposes.
-Note: If you choose not to provide this secret, some monitoring capabilities (such as SnapMirror relations metrics) may be limited.
-
-## Usage
-To use this solution, you will need to run the CloudFormation template in your AWS account.
-The CloudFormation template requires the following parameters:
-
-1. Stack name - Identifier for the CloudFormation stack. Must not exceed 25 characters. (Note: While AWS allows stack names up to 128 characters, limit yours to 25. This name is used as the base name for other resources created within the stack, so keeping it short prevents those resource names from getting too long.)
-2. VPC ID - The ID of the VPC in which the Lambda function will run. This VPC must have connectivity to all target file systems. It can be either the same VPC where the file systems are located, or a different VPC with established connectivity (e.g. VPC peering, Transit Gateway) to the file systems' VPCs.
-3. Subnet IDs - The IDs of the subnets in which the Lambda function will run. These subnets must have connectivity to the file systems.
-4. Security Group IDs - The IDs of the Security Groups that will be associated with the Lambda function when it runs.
-These Security Groups must allow connectivity to the file systems.
-5. Create FSx Service Endpoint - A boolean flag indicating whether you plan to create an FSxService VPC endpoint inside the VPC. Set this to true if you want to create the endpoint, or false if you don't. The decision depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
-6. Create Secret Manager Endpoint - A boolean flag indicating whether you plan to create a SecretManager VPC endpoint inside the VPC. Set this to true if you want to create the endpoint, or false if you don't. The decision depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
-7. Create CloudWatch Endpoint - A boolean flag indicating whether you plan to create a CloudWatch VPC endpoint inside the VPC. Set this to true if you want to create the endpoint, or false if you don't. The decision depends on whether you already have this type of endpoint in the subnet where the Lambda function is to run. If you already have one, set this to false; otherwise, set it to true.
-8. Secret Manager FSx Admin Passwords ARN - Optional - The ARN of the AWS Secrets Manager secret containing the FSx credentials. This ARN is required for certain functionalities, such as SnapMirror metrics collection. If not provided, some features may not operate correctly. This secret should contain key-value pairs as described in the Prerequisites section above.
-9. SNS Topic ARN for CloudWatch alarms - Optional - The ARN of the SNS topic to which CloudWatch alarms will be sent. If not provided, alarms will not be sent to any SNS topic.
-
-## Alarms Configuration
-The Lambda function is responsible for creating alarms based on the thresholds set via environment variables. These environment variables can be set from the AWS console, under the Configuration tab of the dashboard Lambda function. You can find the specific Lambda function by its name prefix "FSxNDashboard-".
-The following environment variables are used:
-1. CLIENT_THROUGHPUT_ALARM_THRESHOLD: This sets the threshold for the client throughput alarm. The default value is "90", but this can be customized as needed. When the client throughput exceeds this value (expressed as a percentage), an alarm will be triggered.
-1. DISK_PERFORMANCE_ALARM_THRESHOLD: This sets the threshold for the disk performance alarm. The default value is "90", but this can be customized as needed. When the disk performance exceeds this value (expressed as a percentage), an alarm will be triggered.
-1. DISK_THROUGHPUT_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the disk throughput utilization alarm. The default value is "90", but this can be customized as needed. When disk throughput utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
-1. SNAPMIRROR_UNHEALTHY_ALARM_THRESHOLD: This sets the threshold for the SnapMirror unhealthy alarm. The default value is "0", but this can be customized as needed. When the number of unhealthy SnapMirror relationships exceeds this value, an alarm will be triggered.
-1. STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the storage capacity utilization alarm. The default value is "80", but this can be customized as needed.
-When storage capacity utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
-1. VOLUME_STORAGE_CAPACITY_UTILIZATION_ALARM_THRESHOLD: This sets the threshold for the volume storage capacity utilization alarm. The default value is "80", but this can be customized as needed. When volume storage capacity utilization exceeds this value (expressed as a percentage), an alarm will be triggered.
-
-In addition to the environment variables, you can use tags on the FSx and volume resources to override default thresholds or skip alarm management for specific resources. If a threshold is set to 100, the alarm will not be created. Similarly, if the skip tag is set to true, the alarm will be skipped.
-
-The tag keys used for this purpose are:
-
-1. client-throughput-alarm-threshold
-1. skip-client-throughput-alarm
-1. disk-performance-alarm-threshold
-1. skip-disk-performance-alarm
-1. disk-throughput-utilization-threshold
-1. skip-disk-throughput-utilization-alarm
-1. storage-capacity-utilization-alarm-threshold
-1. skip-storage-capacity-utilization-alarm
-1. volume-storage-capacity-utilization-alarm-threshold
-1. skip-volume-storage-capacity-utilization-alarm
-1. snapMirror-unhealthy-relations-alarm-threshold
-1. skip-snapmirror-unhealthy-relations-alarm
-
-## Important Disclaimer: CloudWatch Alarms Deletion
-Please note that when you delete the CloudFormation stack associated with this project, the CloudWatch Alarms created by the stack will not be automatically deleted.
-
-CloudFormation does not manage the lifecycle of CloudWatch Alarms created by the Lambda function. This means that even after stack deletion, these alarms will persist in your AWS account.
-
-To fully clean up resources after using this solution:
-1. Delete the CloudFormation stack as usual.
-2. Manually review and delete any associated CloudWatch Alarms through the AWS Console or using the AWS CLI/SDK. You can find the alarms by searching for the name prefix "FSx-ONTAP" in the CloudWatch Alarms section.
-
-This behavior ensures that important monitoring setups are not unintentionally removed, but it requires additional steps for complete resource cleanup.
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
diff --git a/Monitoring/auto-add-cw-alarms/README.md b/Monitoring/auto-add-cw-alarms/README.md
index aa71d70..278735a 100644
--- a/Monitoring/auto-add-cw-alarms/README.md
+++ b/Monitoring/auto-add-cw-alarms/README.md
@@ -4,250 +4,3 @@ Continuous development for this solution has moved to a separate GitHub reposito
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx_Alerting/Auto-Add-CloudWatch-Alarms](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx_Alerting/Auto-Add-CloudWatch-Alarms). Please refer to that repository for the latest updates.
-# Automatically Add CloudWatch Alarms to Monitor Aggregate, Volume and CPU Utilization
-
-## Introduction
-There are times when you want to be notified when an FSx for ONTAP file system, or one of its volumes, is reaching its capacity. AWS CloudWatch has metrics that can give you this information. The only problem is that they are on a per instance basis. This means as you add and delete file systems and/or volumes, you have to add and delete alarms. This can be tedious and error prone. This script will automate the creation of AWS CloudWatch alarms that monitor the utilization of the file system and its volumes. It will also create alarms to monitor the CPU utilization of the file system. And if a volume or file system is removed, it will remove the associated alarms.
-
-To implement this, you might think to just create EventBridge rules that trigger on the creation or deletion of an FSx Volume. This would mostly work, but since you have command line access to the FSx for ONTAP file system, you can create and delete volumes without generating any CloudTrail events, so depending on CloudTrail events would not be reliable. Therefore, instead of relying on those events, this script will scan all the file systems and volumes in all the regions, then create and delete alarms as needed.
-
-## Invocation
-The preferred way to run this script is as a Lambda function, because it is very inexpensive to run without having to maintain any compute resources. You can use an `EventBridge Schedule` to run it on a regular basis to ensure that all the CloudWatch alarms are kept up to date. Since there are several steps involved in setting up a Lambda function, a CloudFormation template is included in the repo, named `cloudformation.yaml`, that will do the following steps for you:
-- Create a role that will allow the Lambda function to:
-    - List AWS regions. This is so it knows which regions to scan for FSx for ONTAP file systems and volumes.
-    - List the FSx for ONTAP file systems.
-    - List the FSx volumes.
-    - List the CloudWatch alarms.
-    - List tags for the resources. This is so you can customize the thresholds for the alarms on a per instance basis. More on that below.
-    - Create CloudWatch alarms.
-    - Delete CloudWatch alarms that it has created (based on alarm names).
-- Create a Lambda function with the Python program.
-- Create an EventBridge schedule that will run the Lambda function on a user defined basis.
-- Create a role that will allow the EventBridge schedule to trigger the Lambda function.
-
-To use the CloudFormation template perform the following steps:
-
-1. Download the `cloudformation.yaml` file from this repo.
-2. Go to the `CloudFormation` services page in the AWS console and select `Create Stack -> With new resources (standard)`.
-3. Select `Choose an existing template` and `Upload a template file`.
-4. Click `Choose file` and select the `cloudformation.yaml` file you downloaded in step 1.
-5. Click `Next` and fill in the parameters presented on the next page. The parameters are:
-    - `Stack name` - The name of the CloudFormation stack. Note this name is also used as a base name for some of the resources that are created, to make them unique, so keep it under 25 characters to ensure the resource names don't exceed their length limits.
-    - `SNStopic` - The SNS Topic name where CloudWatch will send alerts to.
-Note that since CloudWatch can't send messages to an SNS topic residing in a different region, it is assumed that an SNS topic with the same name will exist in all the regions where alarms are to be created.
-    - `accountId` - The AWS account ID associated with the SNS topic. This is only used to compute the ARN of the SNS Topic set above.
-    - `customerId` - This is optional. If provided, the string entered is included in the description of every alarm created.
-    - `defaultCPUThreshold` - This will define a default CPU utilization threshold. You can override the default by having a specific tag associated with the file system (see below for more information).
-    - `defaultSSDThreshold` - This will define a default SSD (aggregate) utilization threshold. You can override the default by having a specific tag associated with the file system (see below for more information).
-    - `defaultVolumeThreshold` - This will define the default Volume utilization threshold. You can override the default by having a specific tag associated with the volume (see below for more information).
-    - `checkInterval` - This is the interval in minutes that the program will run.
-    - `alarmPrefixString` - This defines the string that will be prepended to every CloudWatch alarm name that the program creates. Having a known prefix is how it knows it is the one maintaining the alarm.
-    - `regions` - This is a comma separated list of AWS region names (e.g. us-east-1) that the program will act on. If not specified, the program will scan all regions that support an FSx for ONTAP file system. Note that no checking is performed to ensure that the regions you provide are valid.
-6. Click `Next`. There aren't any recommended changes to make on any of the following pages, so just click `Next` again.
-7. On the final page, check the box that says `I acknowledge that AWS CloudFormation might create IAM resources with custom names.` and then click `Submit`.
-
-If you prefer, you can run this Python program on any UNIX based computer that has Python installed. See the "Running on a computer" section below for more information.
-
-### Configuring the program
-If you use the CloudFormation template to deploy the program, it will create the appropriate environment variables for you. However, if you didn't use the CloudFormation template, you will need to configure the program yourself. Here are the various ways you can do so:
-* By editing the top part of the program itself where there are the following variable definitions.
-* By setting environment variables with the same names as the variables in the program.
-* If running it as a standalone program, via some command line options.
-
-:bulb: **NOTE:** The CloudFormation template will prompt for these values when you create the stack and will set the appropriate environment variables for you.
-
-Here is the list of variables, and what they define:
-
-| Variable | Description |Command Line Option|
-|:---------|:------------|:--------------------------------|
-|SNStopic | The SNS Topic name where CloudWatch will send alerts to. Note that it is assumed that an SNS topic with the same name will exist in all the regions where alarms are to be created.|-s SNS\_Topic\_Name|
-|accountId | The AWS account ID associated with the SNS topic. This is only used to compute the ARN of the SNS Topic.|-a Account\_number|
-|customerId| This is just an optional string that will be added to the alarm description.|-c Customer\_String|
-|defaultCPUThreshold | This will define the default CPU utilization threshold. You can override the default by having a specific tag associated with the file system. See below for more information.|-C number|
-|defaultSSDThreshold | This will define the default SSD (aggregate) utilization threshold. You can override the default by having a specific tag associated with the file system. See below for more information.|-S number|
-|defaultVolumeThreshold | This will define the default Volume utilization threshold. You can override the default by having a specific tag associated with the volume. See below for more information.|-V number|
-|alarmPrefixCPU | This defines the string that will be put in front of the name of every CPU utilization CloudWatch alarm that the program creates. Having a known prefix is how it knows it is the one maintaining the alarm.|N/A|
-|alarmPrefixSSD | This defines the string that will be put in front of the name of every SSD utilization CloudWatch alarm that the program creates. Having a known prefix is how it knows it is the one maintaining the alarm.|N/A|
-|alarmPrefixVolume | This defines the string that will be put in front of the name of every volume utilization CloudWatch alarm that the program creates. Having a known prefix is how it knows it is the one maintaining the alarm.|N/A|
-|regions | This is a comma separated list of AWS region names (e.g. us-east-1) that the program will act on. If not specified, the program will scan all regions that support an FSx for ONTAP file system. Note that no checking is performed to ensure that the regions you provide are valid.|-r region -r region ...|
-
-There are a few command line options that don't have a corresponding variable:
-|Option|Description|
-|:-----|:----------|
-|-d| This option will cause the program to run in "Dry Run" mode. In this mode, the program will only display messages showing what it would have done, and not really create or delete any CloudWatch alarms.|
-|-F filesystem\_ID| This option will cause the program to only add or remove alarms that are associated with the filesystem\_ID.|
-
-As mentioned with the threshold variables, you can create a tag on the specific resource to override the default value set by the associated threshold variable. Here is the list of tags and where they should be located:
-
-|Tag|Description|Location|
-|:---|:------|:---|
-|alarm\_threshold | Sets the volume utilization threshold. | Volume |
-|cpu\_alarm\_threshold| Sets the CPU utilization threshold. | File System |
-|ssd\_alarm\_threshold| Sets the SSD utilization threshold. | File System |
-
-:bulb: **NOTE:** When the alarm threshold is set to 100, the alarm will not be created. So, if you set the default to 100, then you can selectively add alarms by setting the appropriate tag.
-
-### Running on a computer
-To run the program on a computer, you must have Python installed. You will also need to install the boto3 library. You can do that by running the following command:
-
-```bash
-pip install boto3
-```
-Once you have Python and boto3 installed, you can run the program by executing the following command:
-
-```bash
-python3 auto_add_cw_alarms.py
-```
-This will run the program based on all the variables set at the top. If you want to change the behavior without having to edit the program, you can either use the Command Line Option specified in the table above or you can set the appropriate environment variable. Note that you can give a `-h` (or `--help`) command line option and the program will display a list of all the available options.
-
-You can limit the regions that the program will act on by using the `-r region` option. You can specify that option multiple times to act on multiple regions.
-
-You can run the program in "Dry Run" mode by specifying the `-d` (or `--dryRun`) option. This will cause the program to only display messages showing what it would have done, and not really create or delete any CloudWatch alarms.
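-For example, a hypothetical invocation that overrides a couple of the defaults via environment variables and then does a dry run against two regions might look like the following (a sketch only; the topic name and threshold value are placeholders, not values from this repo):
-
-```bash
-# Override two of the program's defaults via environment variables.
-export SNStopic=fsx-alerts
-export defaultVolumeThreshold=85
-
-# Dry-run against two regions; nothing is created or deleted.
-python3 auto_add_cw_alarms.py -r us-east-1 -r us-west-2 -d
-```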
-### Running as a Lambda function
-A CloudFormation template is included in the repo that will do the steps below. If you don't want to use that, here are the detailed steps required to install the program as a Lambda function.
-
-#### Create a Lambda function
-1. Download the `auto_add_cw_alarms.py` file from this repo.
-2. Create a new Lambda function in the AWS console by going to the Lambda services page and clicking on the `Create function` button.
-3. Choose `Author from scratch` and give the function a name. For example `auto_add_cw_alarms`.
-4. Choose the latest version of Python (currently Python 3.11) as the runtime and click on `Create function`.
-5. In the function code section, copy and paste the contents of the `auto_add_cw_alarms.py` file into the code editor.
-6. Click on the `Deploy` button to save the function.
-7. Click on the Configuration tab and then the "General configuration" sub tab and set the "Timeout" to be at least 3 minutes.
-8. Click on the "Environment variables" tab and add the following environment variables:
-    - `SNStopic` - The SNS Topic name where CloudWatch will send alerts to.
-    - `accountId` - The AWS account ID associated with the SNS topic.
-    - `customerId` - This is optional. If provided, the string entered is included in the description of every alarm created.
-    - `defaultCPUThreshold` - This will define a default CPU utilization threshold.
-    - `defaultSSDThreshold` - This will define a default SSD (aggregate) utilization threshold.
-    - `defaultVolumeThreshold` - This will define the default Volume utilization threshold.
-    - `alarmPrefixString` - This defines the string that will be prepended to every CloudWatch alarm name that the program creates.
-    - `regions` - This is an optional comma separated list of AWS region names (e.g. us-east-1) that the program will act on. If not specified, the program will scan all regions that support an FSx for ONTAP file system.
-
-You will also need to set up the appropriate permissions for the Lambda function to run. It doesn't need many permissions. It just needs to be able to:
-* List the FSx for ONTAP file systems.
-* List the FSx volume names.
-* List tags associated with an FSx file system or volume.
-* List the CloudWatch alarms.
-* List all the AWS regions.
-* Create CloudWatch alarms.
-* Delete CloudWatch alarms. You can set the resource to `arn:aws:cloudwatch:*:`*AccountId*`:alarm:`*alarmPrefixString*`*` to limit the deletion to only the alarms that it creates.
-* Create CloudWatch Log Groups and Log Streams in case you need to diagnose an issue.
-
-The following is an example AWS policy that has all the required permissions to run the script (although you could narrow the "Resource" specification to suit your needs). Note it assumes that the alarmPrefixString is set to "FSx-ONTAP-Auto".
-```JSON
-{
-    "Version": "2012-10-17",
-    "Statement": [
-        {
-            "Sid": "VisualEditor0",
-            "Effect": "Allow",
-            "Action": [
-                "fsx:DescribeFileSystems",
-                "fsx:DescribeVolumes",
-                "fsx:ListTagsForResource",
-                "cloudwatch:DescribeAlarms",
-                "cloudwatch:DescribeAlarmsForMetric",
-                "ec2:DescribeRegions",
-                "cloudwatch:PutMetricAlarm"
-            ],
-            "Resource": "*"
-        },
-        {
-            "Sid": "VisualEditor1",
-            "Effect": "Allow",
-            "Action": [
-                "cloudwatch:DeleteAlarms"
-            ],
-            "Resource": "arn:aws:cloudwatch:*:*:alarm:FSx-ONTAP-Auto*"
-        },
-        {
-            "Sid": "VisualEditor2",
-            "Effect": "Allow",
-            "Action": [
-                "logs:CreateLogStream",
-                "logs:PutLogEvents"
-            ],
-            "Resource": "arn:aws:logs:*:*:log-group:*:log-stream:*"
-        },
-        {
-            "Sid": "VisualEditor3",
-            "Effect": "Allow",
-            "Action": "logs:CreateLogGroup",
-            "Resource": "arn:aws:logs:*:*:log-group:*"
-        }
-    ]
-}
-```
-
-Once you have deployed the Lambda function it is recommended to set up a schedule to run it on a regular basis. The easiest way to do that is:
-1. Click on the `Add trigger` button from the Lambda function page.
-2. Select `EventBridge (CloudWatch Events)` as the trigger type.
-3. Click on the `Create a new rule` button.
-4. Give the rule a name and a description.
-5. Set the `Schedule expression` to be the interval you want the function to run. For example, if you want it to run every 15 minutes, you would set the expression to `rate(15 minutes)`.
-6. Click on the `Add` button.
-
-### Expected Action
-Once the script has been configured and invoked, it will:
-* Scan for every FSx for ONTAP file system in every region, unless you have specified a specific list of regions to scan. For every file system that it finds it will:
-    * Create a CPU utilization CloudWatch alarm, unless the threshold value is set to 100 for the specific alarm.
-    * Create an SSD utilization CloudWatch alarm, unless the threshold value is set to 100 for the specific alarm.
-* Scan for every FSx for ONTAP volume in every region, unless you have specified a specific list of regions to scan. For every volume it finds it will:
-    * Create a Volume Utilization CloudWatch alarm, unless the threshold value is set to 100 for the specific alarm.
-* Scan the CloudWatch alarms and remove any alarms whose associated resource no longer exists.
-
-### Cleaning up
-If you decide you don't want to use this program anymore, you can delete the CloudFormation stack that you created. This will remove the Lambda function, the EventBridge schedule, and the roles that were created for you. If you did not use the CloudFormation template, you will have to do these steps yourself.
-
-Once you have removed the program, you can remove all the CloudWatch alarms that were created by the program by running the following command:
-
-```bash
-region=us-west-2
-aws cloudwatch describe-alarms --region=$region --alarm-name-prefix "FSx-ONTAP-Auto" --query "MetricAlarms[*].AlarmName" --output text | xargs -n 50 aws cloudwatch delete-alarms --region $region --alarm-names
-```
-This command will remove all the alarms that have an alarm name that starts with "FSx-ONTAP-Auto" in the us-west-2 region. Make sure to adjust the alarm-name-prefix to match the AlarmPrefix you set when you deployed the program. You will also need to adjust the region variable and run the `aws` command again for each region where you have alarms.
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2024 NetApp, Inc. All Rights Reserved.
diff --git a/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README-MANUAL.md b/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README-MANUAL.md
index 31477c6..0c5c7db 100644
--- a/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README-MANUAL.md
+++ b/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README-MANUAL.md
@@ -4,145 +4,3 @@ Continuous development for this solution has moved to a separate GitHub reposito
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx-Audit-Logs-CloudWatch](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx-Audit-Logs-CloudWatch). Please refer to that repository for the latest updates.

-# Ingest FSx for ONTAP NAS audit logs into CloudWatch
-
-## Overview
-This sample demonstrates a way to ingest the NAS audit logs from an FSx for Data ONTAP file system into a CloudWatch log group without having to NFS or CIFS mount a volume to access them.
-It will attempt to gather the audit logs from all the SVMs within all the FSx for Data ONTAP file systems that are within a specified region.
-It will skip any file systems where the credentials aren't provided in the supplied AWS Secrets Manager secret, or that do not have the appropriate NAS auditing configuration enabled.
-It will maintain a "stats" file in an S3 bucket that keeps track of the last time it successfully ingested audit logs from each SVM, to try to ensure it doesn't process an audit file more than once.
-You can run this script as a standalone program or as a Lambda function. These directions assume you are going to run it as a Lambda function.
-**NOTE**: There are two ways to install this program. Either with the [CloudFormation script](cloudformation-template.yaml) found in this repo, or by following the manual instructions found in this file.
-
-## Prerequisites
-- An FSx for Data ONTAP file system.
-- An S3 bucket to store the "stats" file and optionally a copy of all the raw NAS audit log files. It will also hold the Lambda layer file needed to add a Lambda layer from a CloudFormation script.
-    - You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/ingest_nas_audit_logs_into_cloudwatch/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
-    - The "stats" file is maintained by the program. It is used to keep track of the last time the Lambda function successfully ingested audit logs from each SVM. Its size will be small (i.e. less than a few megabytes).
-- A CloudWatch log group to ingest the audit logs into. Each audit log file will get its own log stream within the log group.
-- Have NAS auditing configured and enabled on the SVM within an FSx for Data ONTAP file system. **Ensure you have selected the XML format for the audit logs.** Also, ensure you have set up a rotation schedule.
-The program will only act on audit log files that have been finalized, not the "active" one. You can read this [knowledge base article](https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/How_to_set_up_NAS_auditing_in_ONTAP_9) for instructions on how to set up NAS auditing.
-- Have the NAS auditing configured to store the audit logs in a volume with the same name in all SVMs on all the FSx for Data ONTAP file systems that you want to ingest the audit logs from.
-- An AWS Secrets Manager secret that contains the credentials you want to use to obtain the NAS audit logs for all the FSxN file systems.
-    - The secret should be in the form of key/value pairs where the key is the file system ID and the value is a dictionary with the keys `username` and `password`. For example:
-```json
-    {
-      "fs-0e8d9172fa5411111": {"username": "fsxadmin", "password": "superSecretPassword"},
-      "fs-0e8d9172fa5422222": {"username": "service_account", "password": "superSecretPassword"}
-    }
-```
-- You have applied the necessary SACLs to the files you want to audit. The knowledge base article linked above provides guidance on how to do this.
-
-- AWS Endpoints. Since the Lambda function runs within your VPC it will not have access to the Internet, even if you can access the Internet from the Subnet it runs from. Therefore, there needs to be a VPC endpoint for all the AWS services that the Lambda function uses. Specifically, the Lambda function needs to be able to access the following AWS services:
-    - FSx.
-    - Secrets Manager.
-    - CloudWatch Logs.
-    - S3 - Note that typically there is a Gateway type VPC endpoint for S3, so you should not need to create a VPC endpoint for S3.
-- Role for the Lambda function. Create a role with the necessary permissions to allow the Lambda function to do the following:
-| Service | Actions | Resources |
-| --- | --- | --- |
-| FSx | fsx:DescribeFileSystems | * |
-| ec2 | DescribeNetworkInterfaces | * |
-| | CreateNetworkInterface | arn:aws:ec2:<region>:<accountID>:* |
-| | DeleteNetworkInterface | arn:aws:ec2:<region>:<accountID>:* |
-| CloudWatch Logs | CreateLogGroup | arn:aws:logs:<region>:<accountID>:log-group:* |
-| | CreateLogStream | arn:aws:logs:<region>:<accountID>:log-group:* |
-| | PutLogEvents | arn:aws:logs:<region>:<accountID>:log-group:* |
-| s3 | ListBucket | arn:aws:s3:<region>:<accountID>:* |
-| | GetObject | arn:aws:s3:<region>:<accountID>:*/* |
-| | PutObject | arn:aws:s3:<region>:<accountID>:*/* |
-| Secrets Manager | GetSecretValue | arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName>* |
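-For illustration, the permissions in the table above might translate into a policy like the following sketch (the account ID 123456789012, region us-east-1, bucket name my-audit-bucket, and secret name fsxn-credentials are hypothetical placeholders, not values from this repo). Note that S3 bucket ARNs omit the region and account fields, which is why the S3 resources in the sketch differ slightly from the table:
-```json
-{
-    "Version": "2012-10-17",
-    "Statement": [
-        {
-            "Effect": "Allow",
-            "Action": ["fsx:DescribeFileSystems", "ec2:DescribeNetworkInterfaces"],
-            "Resource": "*"
-        },
-        {
-            "Effect": "Allow",
-            "Action": ["ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface"],
-            "Resource": "arn:aws:ec2:us-east-1:123456789012:*"
-        },
-        {
-            "Effect": "Allow",
-            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
-            "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:*"
-        },
-        {
-            "Effect": "Allow",
-            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
-            "Resource": ["arn:aws:s3:::my-audit-bucket", "arn:aws:s3:::my-audit-bucket/*"]
-        },
-        {
-            "Effect": "Allow",
-            "Action": "secretsmanager:GetSecretValue",
-            "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:fsxn-credentials*"
-        }
-    ]
-}
-```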
-Where:
-
-- <accountID> - is your AWS account ID.
-- <region> - is the region where the FSx for ONTAP file systems are located.
-- <secretName> - is the name of the secret that contains the credentials for the fsxadmin accounts.
-
-Notes:
-- Since the Lambda function runs within your VPC it needs to be able to create and delete network interfaces.
-- The AWS Security Group Policy builder incorrectly generates resource lines for the `CreateNetworkInterface` and `DeleteNetworkInterface` actions. The correct resource line is `arn:aws:ec2:::*`.
-- It needs to be able to create log groups so it can create a log group for the diagnostic output from the Lambda function.
-- Since the ARN of any Secrets Manager secret has random characters at the end of it, you must add the `*` at the end, or provide the full ARN of the secret.
-
-## Deployment
-1. Create a Lambda deployment package by:
-    1. Downloading the [ingest_nas_audit_logs.py](ingest_nas_audit_logs.py) file from this repository and placing it in an empty directory.
-    1. Renaming the file to `lambda_function.py`.
-    1. Installing a couple of dependencies that aren't included with AWS's base Lambda runtime by executing the following command:
-       `pip install --target . xmltodict requests_toolbelt`
-    1. Zipping the contents of the directory into a zip file:
-       `zip -r ingest_nas_audit_logs.zip .`
-
-2. Within the AWS console, or using the AWS API, create a Lambda function with:
-    1. Python 3.11, or higher, as the runtime.
-    1. The permissions set to the role created above.
-    1. Under `Additional Configurations` select `Enable VPC` and select a VPC and Subnet that will have access to all the FSx for ONTAP file system management endpoints that you want to gather audit logs from. Also, select a Security Group that allows TCP port 443 outbound. Inbound rules don't matter since the Lambda function is not accessible from a network.
-    1. Click `Create Function` and on the next page, under the `Code` tab, select `Upload From -> .zip file.` Provide the .zip file created by the steps above.
-    1. From the `Configuration -> General` tab set the timeout to at least 30 seconds. You may need to increase that if it has to process a lot of audit entries and/or a lot of SVMs.
-
-3. Configure the Lambda function by setting the following environment variables. For a Lambda function you do this by clicking on the `Configuration` tab and then the `Environment variables` sub tab.
-
-    | Variable | Required| Description |
-    | --- | --- | --- |
-    | fsxRegion | Yes |The region where the FSx for ONTAP file systems are located. |
-    | s3BucketRegion |Yes | The region of the S3 bucket where the stats file is stored. |
-    | s3BucketName | Yes |The name of the S3 bucket where the stats file is stored. |
-    | copyToS3 | No| Set to `true` if you want to copy the raw audit log files to the S3 bucket.|
-    |fsxnSecretARNsFile|No|The name of a file within the S3 bucket that contains the Secret ARNs for each of the FSxN file systems. The format of the file should be one `fileSystemID=secretARN` pair per line. For example: `fs-0e8d9172fa5411111=arn:aws:secretsmanager:us-east-1:123456789012:secret:fsxadmin-abc123`|
-    |fileSystem1ID|No|The ID of the first FSxN file system to ingest the audit logs from.|
-    |fileSystem1SecretARN|No|The ARN of the secret that contains the credentials for the first FSx for Data ONTAP file system.|
-    |fileSystem2ID|No|The ID of the second FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem2SecretARN|No|The ARN of the secret that contains the credentials for the second FSx for Data ONTAP file system.|
-    |fileSystem3ID|No|The ID of the third FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem3SecretARN|No|The ARN of the secret that contains the credentials for the third FSx for Data ONTAP file system.|
-    |fileSystem4ID|No|The ID of the fourth FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem4SecretARN|No|The ARN of the secret that contains the credentials for the fourth FSx for Data ONTAP file system.|
-    |fileSystem5ID|No|The ID of the fifth FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem5SecretARN|No|The ARN of the secret that contains the credentials for the fifth FSx for Data ONTAP file system.|
-    | statsName | Yes| The name you want to use as the stats file. |
-    | logGroupName | Yes| The name of the CloudWatch log group to ingest the audit logs into. |
-    | volumeName | Yes| The name of the volume, on all the FSx for ONTAP file systems, where the audit logs are stored. |
-
-    **NOTE:** You only need to set either the `fsxnSecretARNsFile` variable or the `fileSystemXID` and `fileSystemXSecretARN` variables. If both are provided, then the `fsxnSecretARNsFile` will be used and the `fileSystemXID` and `fileSystemXSecretARN` variables will be ignored.
-
-4. Test the Lambda function by clicking on the `Test` tab and then clicking on the `Test` button. You should see "Executing function: succeeded". If not, click on the "Details" button to see what errors there are.
-
-5. After you have tested that the Lambda function is running correctly, add an EventBridge trigger to have it run periodically. You can do this by clicking on the `Add Trigger` button within the AWS console on the Lambda page and selecting `EventBridge (CloudWatch Events)` from the drop-down menu. You can then configure the schedule to run as often as you want. How often depends on how often you have set up your FSx for ONTAP file systems to rotate audit logs, and how up-to-date you want the CloudWatch logs to be.
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2025 NetApp, Inc. All Rights Reserved.
diff --git a/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README.md b/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README.md
index 462dc18..0c5c7db 100644
--- a/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README.md
+++ b/Monitoring/ingest_nas_audit_logs_into_cloudwatch/README.md
@@ -4,186 +4,3 @@ Continuous development for this solution has moved to a separate GitHub reposito
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx-Audit-Logs-CloudWatch](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx-Audit-Logs-CloudWatch). Please refer to that repository for the latest updates.

-# Ingest FSx for ONTAP NAS audit logs into CloudWatch
-
-## Overview
-This sample demonstrates a way to ingest the NAS audit logs from an FSx for Data ONTAP file system into a CloudWatch log group without having to NFS or CIFS mount a volume to access them.
-It will attempt to gather the audit logs from all the SVMs within all the FSx for Data ONTAP file systems that are within a specified region.
-It will skip any file systems where the credentials aren't provided in the supplied AWS Secrets Manager secret, or that do not have the appropriate NAS auditing configuration enabled.
-It will maintain a "stats" file in an S3 bucket that keeps track of the last time it successfully ingested audit logs from each SVM, to try to ensure it doesn't process an audit file more than once.
-You can run this script as a standalone program or as a Lambda function. These directions assume you are going to run it as a Lambda function.
-
-**NOTE**: There are two ways to install this program. Either with the [CloudFormation script](cloudformation-template.yaml) found in this repo, or by following the manual instructions found in the [README-MANUAL.md](README-MANUAL.md) file.
-
-## Prerequisites
-- An FSx for Data ONTAP file system.
-- An S3 bucket to store the "stats" file and optionally a copy of all the raw NAS audit log files. It will also hold the Lambda layer file needed to add a Lambda layer from a CloudFormation script.
-    - You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/ingest_nas_audit_logs_into_cloudwatch/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
-    - The "stats" file is maintained by the program. It is used to keep track of the last time the Lambda function successfully ingested audit logs from each SVM. Its size will be small (i.e. less than a few megabytes).
-- A CloudWatch log group to ingest the audit logs into. Each audit log file will get its own log stream within the log group.
-- Have NAS auditing configured and enabled on the SVM within an FSx for Data ONTAP file system. **Ensure you have selected the XML format for the audit logs.** Also, ensure you have set up a rotation schedule. The program will only act on audit log files that have been finalized, not the "active" one. You can read this [knowledge base article](https://kb.netapp.com/on-prem/ontap/da/NAS/NAS-KBs/How_to_set_up_NAS_auditing_in_ONTAP_9) for instructions on how to set up NAS auditing.
-- Have the NAS auditing configured to store the audit logs in a volume with the same name in all SVMs on all the FSx for Data ONTAP file systems that you want to ingest the audit logs from.
-- An AWS Secrets Manager secret for each of the FSxN file systems you wish to ingest the audit logs from. The secret should have two keys, `username` and `password`. For example:
-    ```json
-    {
-      "username": "fsxadmin",
-      "password": "superSecretPassword"
-    }
-    ```
-    You can use the same secret for multiple file systems if the credentials are the same.
-- You have applied the necessary SACLs to the files you want to audit. The knowledge base article linked above provides guidance on how to do this.
-
-**You can either create the following items before running the CloudFormation script, or allow it to create the items for you.**
-
-- AWS Endpoints. Since the Lambda function runs within your VPC it will have restrictions as to how it can access the Internet. It will not be able to access the Internet from a "Public" subnet (i.e. one that has an Internet gateway attached to it). It will, however, be able to access the Internet through a Transit or a NAT gateway. So, if the subnets you plan to run this Lambda function from don't have a Transit or NAT gateway, then there needs to be a VPC AWS service endpoint for all the AWS services that this Lambda function uses. Specifically, the Lambda function needs to be able to access the following AWS services:
-    - FSx.
-    - Secrets Manager.
-    - CloudWatch Logs.
-    - S3 - Note that typically there is a Gateway type VPC endpoint for S3, so you typically don't need to create a VPC endpoint for S3.
-
-    **NOTE**: If you specify to have the CloudFormation template create an endpoint and one already exists, it will cause the CloudFormation script to fail.
-
-- Role for the Lambda function. Create a role with the necessary permissions to allow the Lambda function to do the following:
-| Service | Actions | Resources |
-| --- | --- | --- |
-| FSx | fsx:DescribeFileSystems | * |
-| ec2 | DescribeNetworkInterfaces | * |
-| | CreateNetworkInterface | arn:aws:ec2:<region>:<accountID>:* |
-| | DeleteNetworkInterface | arn:aws:ec2:<region>:<accountID>:* |
-| CloudWatch Logs | CreateLogGroup | arn:aws:logs:<region>:<accountID>:log-group:* |
-| | CreateLogStream | arn:aws:logs:<region>:<accountID>:log-group:* |
-| | PutLogEvents | arn:aws:logs:<region>:<accountID>:log-group:* |
-| s3 | ListBucket | arn:aws:s3:<region>:<accountID>:* |
-| | GetObject | arn:aws:s3:<region>:<accountID>:*/* |
-| | PutObject | arn:aws:s3:<region>:<accountID>:*/* |
-| Secrets Manager | GetSecretValue | arn:aws:secretsmanager:<region>:<accountID>:secret:<secretName>* |
-Where:
-
-- <accountID> - is your AWS account ID.
-- <region> - is the region where the FSx for ONTAP file systems are located.
-- <secretName> - is the name of the secret that contains the credentials for the fsxadmin accounts. **Note** that this resource string, through the use of wild card characters, must include all the secrets that the Lambda function will access. Or you must list each secret ARN individually.
-
-Notes:
-- Since the Lambda function runs within your VPC it needs to be able to create and delete network interfaces.
-- The AWS Security Group Policy builder incorrectly generates resource lines for the `CreateNetworkInterface` and `DeleteNetworkInterface` actions. The correct resource line is `arn:aws:ec2:::*`.
-- It needs to be able to create log groups so it can create a log group for the diagnostic output from the Lambda function.
-- Since the ARN of any Secrets Manager secret has random characters at the end of it, you must add the `*` at the end, or provide the full ARN of the secret.
-
-## Deployment
-1. Download the [cloudformation-template.yaml](cloudformation-template.yaml) file from this repository.
-1. Go to the CloudFormation page within the AWS console and click on the `Create stack -> With new resources` button.
-1. Select the `Upload a template file` radio button and click on the `Choose file` button. Select the `cloudformation-template.yaml` that you downloaded in step 1.
-1. Click on the `Next` button.
-1. The next page will provide all the configuration parameters you can provide:
-
-    |Parameter|Required|Description|
-    |---|---|--|
-    |Stack Name|Yes|The name of the CloudFormation stack. This can be anything, but since it is used as a suffix for some of the resources it creates, keep it under 40 characters.|
-    |volumeName|Yes|This is the name of the volume that will contain the audit logs. This should be the same on all SVMs on all the FSx for ONTAP file systems you want to ingest the NAS audit logs from.|
-    |checkInterval|Yes|The interval in minutes that the Lambda function will check for new audit logs. You should set this to match the rotation frequency you have set for your audit logs.|
-    |logGroupName|Yes|The name of the CloudWatch log group to ingest the audit logs into. This should have already been created based on your business requirements.|
-    |subNetIds|Yes|Select the subnets that you want the Lambda function to run in. Any subnet selected must have connectivity to all the FSxN file system management endpoints that you want to gather audit logs from.|
-    |lambdaSecruityGroupsIds|Yes|Select the security groups that you want the Lambda function associated with. The security group must allow outbound traffic on TCP port 443. Inbound rules don't matter since the Lambda function is not accessible from a network.|
-    |s3BucketName|Yes|The name of the S3 bucket where the stats file is stored. This bucket must already exist.|
-    |s3BucketRegion|Yes|The region where the S3 bucket resides.|
-    |copyToS3|No|If set to `true` it will copy the audit logs to the S3 bucket specified in `s3BucketName`.|
-    |createWatchdogAlarm|No|If set to `true` it will create a CloudWatch alarm that will alert you if the Lambda function throws an error.|
-    |snsTopicArn|No|The ARN of the SNS topic to send the alarm to. This is required if `createWatchdogAlarm` is set to `true`.|
-    |fsxnSecretARNsFile|No|The name of a file within the S3 bucket that contains the Secret ARNs for each of the FSxN file systems. The format of the file should have one line for each file system where it specifies the file system ID, an equal sign, and then the Secret ARN to use. For example: `fs-0e8d9172fa5411111=arn:aws:secretsmanager:us-east-1:123456789012:secret:fsxadmin-abc123`|
-    |fileSystem1ID|No|The ID of the first FSxN file system to ingest the audit logs from.|
-    |fileSystem1SecretARN|No|The ARN of the secret that contains the credentials for the first FSx for Data ONTAP file system.|
-    |fileSystem2ID|No|The ID of the second FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem2SecretARN|No|The ARN of the secret that contains the credentials for the second FSx for Data ONTAP file system.|
-    |fileSystem3ID|No|The ID of the third FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem3SecretARN|No|The ARN of the secret that contains the credentials for the third FSx for Data ONTAP file system.|
-    |fileSystem4ID|No|The ID of the fourth FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem4SecretARN|No|The ARN of the secret that contains the credentials for the fourth FSx for Data ONTAP file system.|
-    |fileSystem5ID|No|The ID of the fifth FSx for Data ONTAP file system to ingest the audit logs from.|
-    |fileSystem5SecretARN|No|The ARN of the secret that contains the credentials for the fifth FSx for Data ONTAP file system.|
-    |lambdaRoleArn|No|The ARN of the role that the Lambda function will use. If not provided, the CloudFormation script will create a role for you.|
-    |schedulerRoleArn|No|The ARN of the role that the EventBridge scheduler will run as. If not provided, the CloudFormation script will create a role for you.|
-    |createFsxEndpoint|No|If set to `true` it will create the VPC endpoint for the FSx service.|
-    |createCloudWatchLogsEndpoint|No|If set to `true` it will create the VPC endpoint for the CloudWatch Logs service.|
-    |createSecretsManagerEndpoint|No|If set to `true` it will create the VPC endpoint for the Secrets Manager service.|
-    |createS3Endpoint|No|If set to `true` it will create the VPC endpoint for the S3 service.|
-    |routeTableIds|No|If creating an S3 gateway endpoint, these are the routing tables you want updated to use the endpoint.|
-    |vpcId|No|This is the VPC that the endpoint(s) will be created in. Only needed if you are creating an endpoint.|
-    |endpointSecurityGroupIds|No|The security group that the endpoint(s) will be associated with. Must allow incoming TCP traffic over port 443. Only needed if you are creating an endpoint.|
-
-    **Note**: You must either provide the fsxnSecretARNsFile or the fileSystem1ID, fileSystem1SecretARN, fileSystem2ID, fileSystem2SecretARN, etc. parameters.
-
-6. Click on the `Next` button.
-7. The next page will provide some additional configuration options. You can leave these as the default values. At the bottom of the page, there is a checkbox that you must check to allow the CloudFormation script to create the necessary IAM roles and policies. Note that if you have provided the ARNs to the two required roles, then the CloudFormation script will not create any roles.
-8. Click on the `Next` button.
-9. The next page will provide a summary of the configuration you have provided. Review it to ensure it is correct.
-10. Click on the `Create stack` button.
-
-## After deployment tasks
-### Confirm that the Lambda function is ingesting audit logs.
-After the CloudFormation deployment has completed, go to the "Resources" tab of the CloudFormation stack and click on the Lambda function hyperlink. This will take you to the Lambda function's page.
-Click on the Monitoring sub tab and then click on "View CloudWatch logs". This will take you to the CloudWatch log group where the Lambda function writes its diagnostic output. You should see a log stream. If you don't, wait a few minutes, and then refresh the page. If you still don't see a log stream, check the Lambda function's configuration to ensure it is correct. Once a log stream appears, click on it to see the diagnostic output from the Lambda function. You should see log messages indicating that it is ingesting audit logs. If you see any "Errors" then you will need to investigate and correct the issue. If you can't figure it out, please open an [issue](https://github.com/NetApp/FSx-ONTAP-samples-scripts/issues) in this repository.
-
-### Add more FSx for ONTAP file systems.
-The way the program is written, it will automatically discover all FSxN file systems within a region, and then all the vservers under each FSxN. So, if you add another FSxN it will automatically attempt to ingest the audit files from all the vservers under it. Unfortunately, it won't be able to until you provide a Secret ARN for that file system.
-
-The best way to add a secret ARN is to either update the secretARNs file you initially passed to the CloudFormation script, which should be in the S3 bucket you specified in the `s3BucketName` parameter, or create that file with the information for all the FSxN file systems you want to ingest the audit logs from and then store it in the S3 bucket. See the description of the `fsxnSecretARNsFile` parameter above for the format of the file.
-
-If you are creating the file for the first time, you'll also need to set the `fsxnSecretARNsFile` environment variable to point to the file. You can leave all the other parameters as they are, including the `fileSystem1ID`, `fileSystem1SecretARN`, etc. ones. The program will ignore those parameters if the `fsxnSecretARNsFile` environment variable is set. To set the environment variable, go to the Lambda function's configuration page and click on the "Configuration" tab. Then click on the "Environment variables" sub tab. Click on the "Edit" button. The `fsxnSecretARNsFile` environment variable should already be there, but the value should be blank. If the variable isn't there, click on the 'add' button and add it. Once the line is there with the `fsxnSecretARNsFile` variable, set the value to the name of the file you created.
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2025 NetApp, Inc. All Rights Reserved.
diff --git a/Monitoring/monitor-ontap-services/README.md b/Monitoring/monitor-ontap-services/README.md
index 9be8bf1..aefc437 100644
--- a/Monitoring/monitor-ontap-services/README.md
+++ b/Monitoring/monitor-ontap-services/README.md
@@ -3,525 +3,3 @@
 Continuous development for this solution has moved to a separate GitHub repository found here
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx_Alerting/FSx_ONTAP_Alerting](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/FSx_Alerting/FSx_ONTAP_Alerting).
 Please refer to that repository for the latest updates.
-
-# Monitoring ONTAP Services
-
-## Introduction
-This program is used to monitor various services of an AWS FSx for NetApp ONTAP file system and alert you if anything
-is outside of the specified conditions. It uses the ONTAP APIs to obtain the information required to
-determine if any of the conditions being monitored have been met.
-If any have, the program will send an SNS message to the specified SNS topic. The program can also send
-a syslog message to a syslog server, as well as store the event information in a CloudWatch log stream.
-The program stores the event information in an S3 bucket so that it can be compared against, to ensure
-it doesn't send multiple messages for the same event. You can configure the program either via environment variables
-or via a configuration file. The configuration file is kept in the S3 bucket for easy access.
-
-Here is an itemized list of the services that this program can monitor:
-- If the file system is available.
-- If the underlying Data ONTAP version has changed.
-- If the file system is running off its partner node (i.e. is running in failover mode).
-- If any of the network interfaces are down.
-- Any EMS message. Filtering is provided to allow you to only be alerted on the EMS messages you care about.
-- If any of the vservers are down.
-- If any of the protocol (NFS & CIFS) servers within a vserver are down.
-- If a SnapMirror relationship hasn't been updated within either a specified amount of time or as a percentage of time since its last scheduled update.
-- If a SnapMirror update has stalled.
-- If a SnapMirror relationship is in a "non-healthy" state.
-- If the aggregate is over a certain percentage full. You can set two thresholds (Warning and Critical).
-- If a volume is over a certain percentage full. You can set two thresholds (Warning and Critical).
-- If a volume is using more than a specified percentage of its inodes. You can set two thresholds (Warning and Critical).
-- If a volume is offline.
-- If any quotas are over a certain percentage full. You can be alerted on both soft and hard limits.
-
-## Architecture
-The program is designed to be run as a Lambda function but can be run as a standalone program.
-As a Lambda function, it can be set up to run on a regular basis by creating an EventBridge schedule.
-Once the program has been invoked, it will use the ONTAP APIs to obtain the required information
-from the ONTAP system. It will compare this information against the conditions that have been
-specified in the conditions file. If any conditions are met, the program will send an SNS message
-to the specified SNS topic and, optionally, send a syslog message as well as put an event
-into a CloudWatch log group. The program stores event information in an S3 bucket so it can ensure that it doesn't
-send duplicate messages for the same event. The configuration file is also kept in the S3 bucket for easy access.
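-
-For example, once the Lambda function exists, an EventBridge schedule could invoke it every 15 minutes along the following lines. This is a minimal sketch using the AWS CLI; `<lambda-arn>` and `<scheduler-role-arn>` are placeholders for your function's ARN and a role that is allowed to invoke it, and the CloudFormation template described below creates an equivalent schedule for you:
-
-```sh
-# Invoke the monitoring Lambda function every 15 minutes.
-# <lambda-arn> and <scheduler-role-arn> are placeholders; the role only needs
-# permission to invoke the Lambda function.
-aws scheduler create-schedule \
-    --name monitor-ontap-services \
-    --schedule-expression 'rate(15 minutes)' \
-    --flexible-time-window '{"Mode": "OFF"}' \
-    --target '{"Arn": "<lambda-arn>", "RoleArn": "<scheduler-role-arn>"}'
-```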
-
-Since the program must be able to communicate with the FSxN file system management endpoint, it must
-run within a VPC that has connectivity to the FSxN file system. This requires special considerations for
-a Lambda function, both in how it is deployed and in how it accesses AWS services. You can read more about
-that in the [Endpoints for AWS Services](#endpoints-for-aws-services) section below.
-
-![Architecture](images/Monitoring_ONTAP_Services_Architecture-2.png)
-
-## Prerequisites
-- An FSx for NetApp ONTAP file system you want to monitor.
-- An S3 bucket to store the configuration and event status files, as well as the Lambda layer zip file.
-  - You will need to download the [Lambda layer zip file](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor-ontap-services/lambda_layer.zip) from this repo and upload it to the S3 bucket. Be sure to preserve the name `lambda_layer.zip`.
-- The security group associated with the FSx for ONTAP file system must allow inbound traffic from the Lambda function over TCP port 443.
-- An SNS topic to send the alerts to.
-- An AWS Secrets Manager secret that holds the FSx for ONTAP file system credentials. There should be two keys in the secret, one for the username and one for the password.
-- Optionally:
-  - A CloudWatch Log Group to store events.
-  - A syslog server to receive event messages.
-
-## Installation
-There are two ways to install this program. You can either perform all the steps shown in the
-[Manual Installation](#manual-installation) section below, or run the [CloudFormation template](cloudformation.yaml)
-that is provided in this repository. The manual installation is more involved, but it gives you
-more control and allows you to change settings that aren't exposed by the CloudFormation template.
-The CloudFormation template is easier to use, but it doesn't allow for as much customization.
-
-### Installation using the CloudFormation template
-The CloudFormation template will do the following:
-- Create a role for the Lambda function to use. The permissions will be the same as what
-  is outlined in the [Create an AWS Role](#create-an-aws-role) section below.
-  **NOTE:** You can provide the ARN of an existing role to use instead of having it create a new one.
-- Create a role that allows the EventBridge schedule to trigger the Lambda function.
-  The only permission that this role needs is to be able to invoke a Lambda function.
-  **NOTE:** You can provide the ARN of an existing role to use instead of having it create a new one.
-- Create the Lambda function with the Python code provided in this repository.
-- Create an EventBridge schedule to trigger the Lambda function. By default, it will trigger
-  the function every 15 minutes, although a parameter allows you to set whatever interval you want.
-- Optionally create a CloudWatch alarm that will alert you if the Lambda function fails.
-  - Create a Lambda function to send the CloudWatch alarm alert to an SNS topic. This is done so the SNS topic can be in another region, since CloudWatch doesn't support doing that natively.
-- Optionally create VPC endpoints for the SNS, Secrets Manager, CloudWatch, and/or S3 AWS services.
-
-To install the program using the CloudFormation template, you will need to do the following:
-1. Download the CloudFormation template from this repository. You can do that by clicking on
-the [cloudformation.yaml](./cloudformation.yaml) file in the repository, then clicking on
-the download icon next to the "Raw" button at the top right of the page. That should
-cause your browser to download the file to your local computer.
-2. Go to the [CloudFormation service in the AWS console](https://us-west-2.console.aws.amazon.com/cloudformation/) and click on "Create stack (with new resources)".
-3. Choose the "Upload a template file" option and select the CloudFormation template you downloaded in step 1.
-4. This should bring up a new window with several parameters to provide values for. Most have
-defaults, but some do require values to be provided. See the list below for what each parameter is for.
-
-|Parameter Name | Notes|
-|---|---|
-|Stackname|The name you want to assign to the CloudFormation stack. Note that this name is used as a base name for some of the resources it creates, so please keep it **under 25 characters**.|
-|OntapAdminServer|The DNS name, or IP address, of the management endpoint of the FSxN file system you wish to monitor.|
-|S3BucketName|The name of the S3 bucket where you want the program to store event information. It should also have a copy of the `lambda_layer.zip` file. **NOTE** This bucket must be in the same region where this CloudFormation stack is being created.|
-|SubnetIds|The subnet IDs that the Lambda function will be attached to. They must have connectivity to the FSxN file system management endpoint that you wish to monitor. It is recommended to select at least two.|
-|SecurityGroupIds|The security group IDs that the Lambda function will be attached to. The security group must allow outbound traffic over port 443 to the SNS, Secrets Manager, CloudWatch, and S3 AWS service endpoints, as well as to the FSxN file system you want to monitor.|
-|SnsTopicArn|The ARN of the SNS topic you want the program to publish alert messages to.|
-|CloudWatchLogGroupARN|The ARN of **an existing** CloudWatch Log Group that the Lambda function can send event messages to. It will create a new Log Stream within the Log Group every day that is unique to this file system, so you can use the same Log Group for multiple instances of this program. If this field is left blank, alerts will not be sent to CloudWatch.|
-|SecretArn|The ARN of the secret within the AWS Secrets Manager that holds the FSxN file system credentials.|
-|SecretUsernameKey|The name of the key within the secret that holds the username portion of the FSxN file system credentials. The default is 'username'.|
-|SecretPasswordKey|The name of the key within the secret that holds the password portion of the FSxN file system credentials. The default is 'password'.|
-|CheckInterval|The interval, in minutes, at which the EventBridge schedule will trigger the Lambda function. The default is 15 minutes.|
-|CreateCloudWatchAlarm|Set to "true" if you want to create a CloudWatch alarm that will alert you if the monitoring Lambda function fails.|
-|ImplementWatchdogAsLambda|If set to "true", a Lambda function will be created that will allow the CloudWatch alarm to publish an alert to an SNS topic in another region. Only necessary if the SNS topic is in another region, since CloudWatch cannot send alerts across regions.|
-|WatchdogRoleArn|The ARN of the role assigned to the Lambda function that the watchdog CloudWatch alarm will use to publish SNS alerts with. The only required permission is to publish to the SNS topic listed above, although it is highly recommended that you also add the AWS managed "AWSLambdaBasicExecutionRole" policy, which allows the Lambda function to create and write to a CloudWatch log stream so it can provide diagnostic output if something goes wrong. Only required if you are creating a CloudWatch alert, implemented as a Lambda function, and you want to provide your own role. If left blank, a role will be created for you if needed.|
-|LambdaRoleArn|The ARN of the role that the Lambda function will use. This role must have the permissions listed in the [Create an AWS Role](#create-an-aws-role) section below. If left blank, a role will be created for you.|
-|CreateSecretsManagerEndpoint|Set to "true" if you want to create a Secrets Manager endpoint. **NOTE:** If a Secrets Manager endpoint already exists for the specified subnet, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
-|CreateSNSEndpoint|Set to "true" if you want to create an SNS endpoint. **NOTE:** If an SNS endpoint already exists for the specified subnet, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
-|CreateCWEndpoint|Set to "true" if you want to create a CloudWatch endpoint. **NOTE:** If a CloudWatch endpoint already exists for the specified subnet, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
-|CreateS3Endpoint|Set to "true" if you want to create an S3 endpoint. **NOTE:** If an S3 Gateway endpoint already exists for the specified VPC, the endpoint creation will fail, causing the entire CloudFormation stack to fail. Note that this will be a "Gateway" type endpoint, since they are free to use. Please read the [Endpoints for AWS services](#endpoints-for-aws-services) section for more information.|
-|RoutetableIds|The route table IDs to update to use the S3 endpoint. Since the S3 endpoint is of type `Gateway`, route tables have to be updated to use it. This parameter is only needed if you are creating an S3 endpoint.|
-|VpcId|The ID of the VPC where the subnets provided above are located. Required if you are creating an endpoint, not needed otherwise.|
-|EndpointSecurityGroupIds|The security group IDs that the endpoint will be attached to. The security group must allow traffic over TCP port 443 from the Lambda function. This is required if you are creating an SNS, CloudWatch, or Secrets Manager endpoint.|
-
-The remaining parameters are used to create the matching conditions configuration file, which specifies when the program will send an alert.
-You can read more about it in the [Matching Conditions File](#matching-conditions-file) section below. All these parameters have reasonable default values,
-so you probably won't have to change any of them. Note that if you enable EMS alerts, then the default rule will
-alert on all EMS messages that have a severity of `Error`, `Alert` or `Emergency`. You can change the
-matching conditions at any time by updating the matching conditions file that is created in the S3 bucket.
-The name of the file will be `<OntapAdminServer>-conditions`, where `<OntapAdminServer>` is the value you
-set for the OntapAdminServer parameter.
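-
-Since the matching conditions file lives in the S3 bucket, one straightforward way to adjust the rules after deployment is to download the file, edit it, validate the JSON, and upload it again. Below is a minimal sketch using the AWS CLI, where `<bucket>` and `<OntapAdminServer>` are placeholders for the values you gave the S3BucketName and OntapAdminServer parameters:
-
-```sh
-# Download the current matching conditions file from the S3 bucket.
-aws s3 cp s3://<bucket>/<OntapAdminServer>-conditions conditions.json
-vi conditions.json                      # Adjust the rules as needed.
-python3 -m json.tool conditions.json    # Reports an error if the JSON is invalid.
-# Upload the edited file; the next run of the Lambda function picks it up.
-aws s3 cp conditions.json s3://<bucket>/<OntapAdminServer>-conditions
-```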
-
-### Post Installation Checks
-After the stack has been created, check the status of the Lambda function to make sure it is
-not in an error state. To find the Lambda function, go to the Resources tab of the CloudFormation
-stack and click on the "Physical ID" of the Lambda function. This should bring you to the Lambda service in the AWS
-console. Once there, click on the "Monitor" tab to see if the function has been invoked. Note that it will take
-at least the configured iteration time before the function is invoked for the first time. Locate the
-"Error count and success rate (%)" chart, which is usually found at the top right corner of the "Monitor" dashboard.
-After the "CheckInterval" number of minutes, there should be at least one dot on that chart.
-Hover your mouse over the dot and you should see the "success rate" and "number of errors."
-The success rate should be 100% and the number of errors should be 0. If they are not, then scroll up a little bit and
-click on the "View CloudWatch Logs" link. Once on this page, click on the first log stream and review any output.
-If there are any errors, they will be displayed there. If you can't figure out what is causing an error,
-please create an issue on the [Issues](https://github.com/NetApp/FSx-ONTAP-samples-scripts/issues) section
-in this repository and someone will help you.
-
----
-
-### Manual Installation
-If you want more control over the installation, then you can install it manually by following the steps below. Note that these
-instructions assume you are familiar with how to create the various AWS services mentioned below. If you are not,
-the recommended course of action is to use the CloudFormation method of deploying the program. Then, if you need to change things,
-you can make the required modifications using the information found below.
-
-#### Create an AWS Role
-This program doesn't need many AWS permissions. It just needs to be able to read the FSxN file system credentials stored in a Secrets Manager secret,
-read and write objects in an S3 bucket, publish SNS messages, and optionally create CloudWatch log streams and put events.
-Below is the specific list of permissions needed.
-
-| Permission | Reason |
-|:------------------------------|:----------------|
-|secretsmanager:GetSecretValue | To be able to retrieve the FSxN administrator credentials.|
-|sns:Publish | To allow it to send messages (alerts) via SNS.|
-|s3:PutObject | So it can store its state information in various S3 objects.|
-|s3:GetObject | So it can retrieve previous state information, as well as configuration files, from various S3 objects. |
-|s3:ListBucket | So it can detect whether an object exists or not. |
-|logs:CreateLogStream | If you want the program to send its logs to CloudWatch, it needs to be able to create a log stream. |
-|logs:PutLogEvents | If you want the program to send its logs to CloudWatch, it needs to be able to put log events into the log stream. |
-|logs:DescribeLogStreams | If you want the program to send its logs to CloudWatch, it needs to be able to see if a log stream already exists before attempting to send events to it. |
-|ec2:CreateNetworkInterface | Since the program runs as a Lambda function within your VPC, it needs to be able to create a network interface in your VPC. You can read more about that [here](https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html). |
-|ec2:DeleteNetworkInterface | Since it created a network interface, it needs to be able to delete it when it is not needed anymore. |
-|ec2:DescribeNetworkInterfaces | So it can check to see if a network interface already exists. |
-
-#### Create an S3 Bucket
-The first use of the S3 bucket will be to store the Lambda layer zip file. This is required to include some dependencies that
-aren't included in the AWS Lambda environment. Currently the only dependency in the zip file is [cronsim](https://pypi.org/project/cronsim/).
-This is used to interpret the SnapMirror schedules in order to report on lag issues. You can download the zip file from this repository by clicking on
-the [lambda_layer.zip](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor-ontap-services/lambda_layer.zip) link.
-You will refer to this file, and bucket, when you create the Lambda function.
-
-Another use of the S3 bucket is to store events that have already been reported on, so they can be compared against
-to ensure the program does not send multiple messages for the same event.
-Note that it doesn't keep every event indefinitely; it only stores them while the condition is true. So, say for
-example, it sends an alert for a SnapMirror relationship that has a lag time that is too long. It will
-send the alert and store the event. Once a successful SnapMirror synchronization has happened, the event will be removed
-from the S3 object, allowing for a new event to be created and alerted on. If you want to keep the event information
-longer than that, please configure the program to store events in a CloudWatch log group.
-
-So, for the program to function, you will need to provide an S3 bucket for it to store event history. It is recommended to
-have a separate bucket for each deployment of this program. However, that isn't required, since you can
-specify the object names for the event files and therefore you could manually ensure that each instance of the Lambda function doesn't
-overwrite the event files of another instance.
-
-This bucket is also used to store the Matching Conditions file. You can read more about it in the [Matching Conditions File](#matching-conditions-file) section below.
-
-#### Create an SNS Topic
-Since this program sends alerts via an SNS topic, you need to either create an SNS topic or use an
-existing one.
-
-#### Create a Secrets Manager Secret
-Since the program issues API calls to the FSxN file system, it needs to be able to authenticate itself to the FSxN file system.
-The safest way to provide credentials to the program is to use the AWS Secrets Manager service. Therefore, a Secrets Manager
-secret must be created that contains the FSxN file system credentials. The secret should contain two keys, one for the
-username and one for the password.
-
-The following command will create the secret for you. Just replace the placeholder values in the command with your own.
-
-```bash
-aws secretsmanager create-secret --name <secret-name> --secret-string '{"username":"<username>","password":"<password>"}'
-```
-
-#### Create a CloudWatch Log Group
-If you want the program to send its logs to CloudWatch, you will need to create a CloudWatch Log Group for it to
-send its logs to. The program will create a new log stream within the log group every day that is unique to the file system.
-This step is optional if you don't want to send the logs to CloudWatch.
-
-#### Endpoints for AWS Services
-If you deploy this program as a Lambda function, you will have to run it within a VPC that has connectivity to the FSxN file
-system that you want to monitor. The Lambda function will also need access to the endpoints of the AWS services
-that it uses (S3, SNS, CloudWatch, and Secrets Manager). Access to these service endpoints is typically routed through the Internet.
-However, because of the way AWS attaches a Lambda function to your subnet, the function cannot reach the Internet through an Internet Gateway,
-although it can through a NAT Gateway. Therefore, in order to allow access to these service endpoints, you'll need to do one of the following:
-- Deploy the Lambda function in a subnet that does **not** have an Internet Gateway, yet does have access
-  to the Internet via a NAT Gateway or through a Transit Gateway.
-- Deploy an AWS service endpoint for each of the AWS services that the Lambda function uses.
-
-If you do need to deploy AWS service endpoints, keep the following in mind:
-- Since VPC endpoints can't traverse AWS regions, all AWS assets (e.g. the FSx for ONTAP file system, Secrets Manager secret, S3 bucket,
-  SNS topic, and CloudWatch log group) must be in the same region as the VPC endpoint.
-- For interface type endpoints, a network interface will be created in the VPC's subnet to allow access.
-  To transparently access the service via the VPC endpoint, AWS will update its
-  DNS entry for the service endpoint to point to the IP address of the VPC interface endpoint. This is only
-  done if the "Private DNS names" option is enabled for the endpoint and "DNS Hostnames" is enabled for the subnet.
-  If those two options aren't possible, and/or you aren't using AWS's DNS resolver, then you can set the
-  following configuration parameters to override the hostname that the program
-  uses for the specified AWS service endpoints. You will want to set the parameter to the DNS hostname for the
-  endpoint. It typically starts with 'vpce'.
-
-  |Configuration Parameter|AWS Service Name|
-  |---|---|
-  |snsEndPointHostname|Simple Notification Service (SNS)|
-  |secretsManagerEndPointHostname|Secrets Manager|
-  |cloudWatchEndPointHostname|CloudWatch Logs|
-- For the S3 service endpoint, it is best to deploy a "Gateway" type endpoint, since they are free. The other
-  services will have to be of type "interface."
-  - In order for a gateway type service endpoint to be accessible, a route has to be created in the
-    subnet's route table.
-
-**NOTE:** One indication that the Lambda function doesn't have access to an AWS service endpoint is if the Lambda function
-times out, even after adjusting the timeout to more than several minutes.
-
-#### Lambda Function
-There are a few things you need to do to properly configure the Lambda function:
-- Assign it the role you created above.
-- Put it in a VPC and subnet that has access to the FSxN file system management endpoint.
-- Assign it a security group that allows outbound traffic over TCP port 443 to the FSxN file system management endpoint.
-
-Once you have created the function you will be able to:
-- Copy the Python code from the [monitor_ontap_services.py](monitor_ontap_services.py)
-  file found in this repository into the code box and deploy it.
-- Add the Lambda layer to the function. You do this by first creating a Lambda layer and then adding it to your function.
-  To create a Lambda layer, go to the Lambda service page in the AWS console and click on the "Layers"
-  tab under the "Additional resources" section. Then, click on the "Create layer" button.
-  From there you'll need to provide a name for the layer and the path to the
-  [lambda_layer.zip](https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor-ontap-services/lambda_layer.zip)
-  file that you should download from this repository. If you uploaded it into the S3 bucket you created above, then
-  just provide the S3 path to the file. For example, `https://<bucketName>.s3.<region>.amazonaws.com/lambda_layer.zip`.
-  Once you have the layer created, you can add it to your Lambda function by going to the Lambda
-  function in the AWS console, clicking on the `Code` tab, and scrolling down to the Layers section.
-  Click on the "Add a layer" button. From there you can select the layer you just created.
-- Increase the function's timeout to at least 20 seconds. You might have to raise that if you have a lot
-  of components in your FSxN file system. However, if you have to raise it to more than a couple of minutes
-  and the function still times out, then it could be an issue with an endpoint causing the calls to the
-  AWS services to hang. See the [Endpoints for AWS Services](#endpoints-for-aws-services) section above
-  for more information.
-- Provide the base configuration via environment variables and/or a configuration file.
-  See the [Configuration Parameters](#configuration-parameters) section below for more information.
-- Create the "Matching Conditions" file, which specifies when the Lambda function should send alerts.
-  See the [Matching Conditions File](#matching-conditions-file) section below for more information.
-- Once you have tested the function to ensure it works, set up an EventBridge schedule
-  rule to trigger the function on a regular basis.
-
-##### Configuration Parameters
-Below is a list of the parameters that are used to configure the program. Some parameters are required to be set
-while others are optional. Some of the optional ones still need a value, but have a usable default if the parameter
-is not explicitly set. The parameters that aren't required to be
-set via an environment variable can be set by creating a "configuration file" and putting the assignments
-in it. The assignments should be of the form "parameter=value". The default filename for the configuration
-file is what you set the OntapAdminServer variable to plus the string "-config". If you want to use a different
-filename, then set the configFilename environment variable to the name of your choosing.
-
-**NOTE:** Parameter names are case sensitive.
-
-|Parameter Name | Required | Required as an Environment Variable | Default Value | Description |
-|:--------------|:--------:|:-----------------------------------:|:--------------|:------------|
-| s3BucketName | Yes | Yes | None | Set to the name of the S3 bucket where you want the program to store events. It will also read the matching conditions file from this bucket. |
-| s3BucketRegion | Yes | Yes | None | Set to the region where the S3 bucket is located. |
-| OntapAdminServer | Yes | Yes | None | Set to the DNS name, or IP address, of the ONTAP server you wish to monitor. |
-| configFilename | No | No | OntapAdminServer + "-config" | Set to the filename (S3 object) that contains parameter assignments. It's okay if it doesn't exist, as long as there are environment variables for all the required parameters. |
-| emsEventsFilename | No | No | OntapAdminServer + "-emsEvents" | Set to the filename (S3 object) where you want the program to store the EMS events that it has alerted on. This file will be created as necessary. |
-| smEventsFilesname | No | No | OntapAdminServer + "-smEvents" | Set to the filename (S3 object) where you want the program to store the SnapMirror events that it has alerted on. This file will be created as necessary. |
-| smRelationshipsFilename | No | No | OntapAdminServer + "-smRelationships" | Set to the filename (S3 object) where you want the program to store the SnapMirror relationships. This file is used to track the number of bytes transferred so it can detect stalled SnapMirror updates. This file will be created as necessary. |
-| storageEventsFilename | No | No | OntapAdminServer + "-storageEvents" | Set to the filename (S3 object) where you want the program to store the Storage Utilization events it has alerted on. This file will be created as necessary. |
-| quotaEventsFilename | No | No | OntapAdminServer + "-quotaEvents" | Set to the filename (S3 object) where you want the program to store the Quota Utilization events it has alerted on. This file will be created as necessary. |
-| vserverEventsFilename | No | No | OntapAdminServer + "-vserverEvents" | Set to the filename (S3 object) where you want the program to store the vserver events it has alerted on. This file will be created as necessary. |
-| systemStatusFilename | No | No | OntapAdminServer + "-systemStatus" | Set to the filename (S3 object) where you want the program to store the overall system status information. This file will be created as necessary. |
-| snsTopicArn | Yes | No | None | Set to the ARN of the SNS topic you want the program to publish alert messages to. |
-| cloudWatchLogGroupName | No | No | None | The name of **an existing** CloudWatch log group that the Lambda function will also send alerts to. If left blank, alerts will not be sent to CloudWatch.|
-| conditionsFilename | Yes | No | OntapAdminServer + "-conditions" | Set to the filename (S3 object) from which you want the program to read the matching condition information. |
-| secretArn | Yes | No | None | Set to the ARN of the secret within the AWS Secrets Manager that holds the FSxN credentials. |
-| secretUsernameKey | Yes | No | None | Set to the key name within the AWS Secrets Manager secret that holds the username portion of the FSxN credentials. |
-| secretPasswordKey | Yes | No | None | Set to the key name within the AWS Secrets Manager secret that holds the password portion of the FSxN credentials. |
-| snsEndPointHostname | No | No | None | Set to the DNS hostname assigned to the SNS endpoint. Only needed if you had to create a VPC endpoint for the SNS service. |
-| secretsManagerEndPointHostname | No | No | None | Set to the DNS hostname assigned to the Secrets Manager endpoint created above. Only needed if you had to create a VPC endpoint for the Secrets Manager service.|
-| cloudWatchLogsEndPointHostname | No | No | None | Set to the DNS hostname assigned to the CloudWatch Logs endpoint created above. Only needed if you had to create a VPC endpoint for the CloudWatch Logs service.|
-| syslogIP | No | No | None | Set to the IP address (or DNS hostname) of the syslog server where you want alerts sent.|
-
-##### Matching Conditions File
-The Matching Conditions file allows you to specify which events you want to be alerted on. The format of the
-file is JSON. JSON is basically a series of "key" : "value" pairs, where a value can itself be an object that also has
-"key" : "value" pairs. For more information about the format of a JSON file, please refer to this [page](https://www.json.org/json-en.html).
-The JSON schema in this file is made up of an array of objects, with a key name of "services". Each element of the "services" array
-is an object with at least two keys. The first key is "name", which specifies the name of the service it is going to provide
-matching conditions (rules) for. The second key is "rules", which is an array of objects that provide the specific
-matching conditions. Note that each service's rules have their own schema. The following is the definition of each service's schema.
-
-###### Matching condition schema for System Health (systemHealth)
-Each rule should be an object with one, or more, of the following keys:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|versionChange|Boolean (true, false)|If `true` the program will send an alert when the ONTAP version changes. If it is set to `false`, it will not report on version changes.|
-|failover|Boolean|If `true` the program will send an alert if the FSxN cluster is running on its standby node. If it is set to `false`, it will not report on failover status.|
-|networkInterfaces|Boolean|If `true` the program will send an alert if any of the network interfaces are down. If it is set to `false`, it will not report on any network interfaces that are down.|
-
-###### Matching condition schema for EMS Events (ems)
-Each rule should be an object with three keys, plus an optional fourth key:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|name|String|Regular expression that will match on an EMS event name.|
-|message|String|Regular expression that will match on an EMS event message text.|
-|severity|String|Regular expression that will match on the severity of the EMS event (debug, informational, notice, error, alert or emergency).|
-|filter|String|If an event's message text matches this regular expression, then the EMS event will be skipped. Try to be as specific as possible to avoid unintentional filtering. This key is optional.|
-
-Note that all the values of the keys are used as regular expressions against the associated EMS component. For
-example, if you want to match on any event message text that starts with "snapmirror", then you would put `^snapmirror`.
-The `^` character matches the beginning of the string. If you want to match on a specific EMS event name, then you should
-anchor it with a regular expression that starts with `^` for the beginning of the string and ends with `$` for the end of
-the string. For example, `^arw.volume.state$`. For a complete explanation of the regular expression syntax and special
-characters, please refer to the [Python documentation](https://docs.python.org/3/library/re.html).
-
-###### Matching condition schema for SnapMirror relationships (snapmirror)
-Each rule should be an object with one, or more, of the following keys:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|maxLagTime|Integer|Specifies the maximum allowable time, in seconds, since the last successful SnapMirror update before an alert will be sent. Only used if maxLagTimePercent hasn't been provided, or if the SnapMirror relationship, and the policy it is assigned to, don't have a schedule associated with them. Best practice is to provide both maxLagTime and maxLagTimePercent to ensure all relationships get monitored, in case a schedule gets accidentally removed.|
-|maxLagTimePercent|Integer|Specifies the maximum allowable time, as a percentage of the time since the last scheduled SnapMirror update, before an alert will be sent. Should be over 100. For example, a value of 200 means 2 times the period since the last scheduled update, so if the update was supposed to have happened 1 hour ago, an alert would be sent if the relationship hadn't been updated within 2 hours.|
-|stalledTransferSeconds|Integer|Specifies the minimum number of seconds that have to transpire before a SnapMirror transfer will be considered stalled.|
-|healthy|Boolean|If `true`, an alert will be sent when the relationship is healthy. If `false`, an alert will be sent when the relationship is unhealthy.|
-
-###### Matching condition schema for Storage Utilization (storage)
-Each rule should be an object with one, or more, of the following keys:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|aggrWarnPercentUsed|Integer|Specifies the maximum allowable physical storage (aggregate) utilization (between 0 and 100) before a warning alert is sent.|
-|aggrCriticalPercentUsed|Integer|Specifies the maximum allowable physical storage (aggregate) utilization (between 0 and 100) before a critical alert is sent.|
-|volumeWarnPercentUsed|Integer|Specifies the maximum allowable volume utilization (between 0 and 100) before a warning alert is sent.|
-|volumeCriticalPercentUsed|Integer|Specifies the maximum allowable volume utilization (between 0 and 100) before a critical alert is sent.|
-|volumeWarnFilesPercentUsed|Integer|Specifies the maximum allowable volume files (inodes) utilization (between 0 and 100) before a warning alert is sent.|
-|volumeCriticalFilesPercentUsed|Integer|Specifies the maximum allowable volume files (inodes) utilization (between 0 and 100) before a critical alert is sent.|
-|offline|Boolean|If `true`, an alert will be sent if the volume is offline.|
-
-###### Matching condition schema for Quota (quota)
-Each rule should be an object with one, or more, of the following keys:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|maxHardQuotaSpacePercentUsed|Integer|Specifies the maximum allowable storage utilization (between 0 and 100) against the hard quota limit before an alert is sent.|
-|maxSoftQuotaSpacePercentUsed|Integer|Specifies the maximum allowable storage utilization (between 0 and 100) against the soft quota limit before an alert is sent.|
-|maxQuotaInodesPercentUsed|Integer|Specifies the maximum allowable inode utilization (between 0 and 100) before an alert is sent.|
-
-###### Matching condition schema for Vserver (vserver)
-Each rule should be an object with one, or more, of the following keys:
-
-|Key Name|Value Type|Notes|
-|---|---|---|
-|vserverState|Boolean|If `true`, an alert will be sent if the vserver is not in the `running` state.|
-|nfsProtocolState|Boolean|If `true`, an alert will be sent if the NFS protocol is not enabled on a vserver.|
-|cifsProtocolState|Boolean|If `true`, an alert will be sent if the CIFS protocol is enabled for a vserver but doesn't have an `online` status.|
-
-###### Example Matching conditions file:
-```json
-{
-  "services": [
-    {
-      "name": "systemHealth",
-      "rules": [
-        {
-          "versionChange": true,
-          "failover": true
-        },
-        {
-          "networkInterfaces": true
-        }
-      ]
-    },
-    {
-      "name": "ems",
-      "rules": [
-        {
-          "name": "^passwd.changed$",
-          "severity": "",
-          "message": ""
-        },
-        {
-          "name": "",
-          "severity": "alert|emergency",
-          "message": ""
-        }
-      ]
-    },
-    {
-      "name": "snapmirror",
-      "rules": [
-        {
"maxLagTime": 86400 - "maxLagTimePercent": 200 - }, - { - "healthy": false - }, - { - "stalledTransferSeconds": 600 - } - ] - }, - { - "name": "storage", - "exceptions": [{"svm": "fsx", "name": "fsx_root"}], - "rules": [ - { - "aggrWarnPercentUsed": 80, - "aggrCriticalPercentUsed": 95 - }, - { - "volumeWarnPercentUsed": 85, - "volumeCriticalPercentUsed": 90 - }, - { - "volumeWarnFilesPercentUsed": 85, - "volumeCriticalFilesPercentUsed": 90 - } - ] - }, - { - "name": "storage", - "match": [{"svm": "fsx", "name": "fsx_root"}], - "rules": [ - { - "volumeWarnPercentUsed": 75, - "volumeCriticalPercentUsed": 85 - }, - { - "volumeWarnFilesPercentUsed": 80, - "volumeCriticalFilesPercentUsed": 90 - } - ] - }, - { - "name": "quota", - "rules": [ - { - "maxHardQuotaSpacePercentUsed": 95 - }, - { - "maxSoftQuotaSpacePercentUsed": 100 - }, - { - "maxQuotaInodesPercentUsed": 95 - } - ] - } - ] -} -``` -In the above example, it will alert on: - -- Any version change, including patch level, of the ONTAP O/S. -- If the system is running off of the standby node. -- Any network interfaces that are down. -- Any EMS message that has an event name of “passwd.changed”. -- Any EMS message that has a severity of "alert" or “emergency”. -- Any SnapMirror relationship with a lag time more than 200% the amount of time since its last scheduled update, if it has a schedule assoicated with it. - Otherwise, if the last successful update has been more than 86400 seconds (24 hours). -- Any SnapMirror relationship with a lag time more than 86400 seconds (24 hours). -- Any SnapMirror relationship that has a non-healthy status. -- Any SnapMirror update that hasn't had any flow of data in 600 seconds (10 minutes). -- If the cluster aggregate is more than 80% full. -- If the cluster aggregate is more than 95% full. -- If any volume, except for the 'fsx_root' volume in the 'fsx' SVM, that is more than 85% full. -- if any volume, except for the 'fsx_root' volume in the 'fsx' SVM, that is more than 90% full. -- if any volume, except for the 'fsx_root' volume in the 'fsx' SVM, that is using more than 85% of its inodes. -- if any volume, except for the 'fsx_root' volume in the 'fsx' SVM, that is using more than 90% of its inodes. -- If for the 'fsx_root' volume in the 'fsx SVM, when it is more than 75% full. -- if for the 'fsx_root' volume in the 'fsx SVM, when it is more than 85% full. -- if for the 'fsx_root' volume in the 'fsx SVM, when it is using more than 80% of its inodes. -- if for the 'fsx_root' volume in the 'fsx SVM, when it is using more than 90% of its inodes. -- If any quota policies where the space utilization is more than 95% of the hard limit. -- If any quota policies where the space utilization is more than 100% of the soft limit. -- If any quota policies are showing any inode utilization more than 95% - -A matching conditions file must be created and stored in the S3 bucket with the name given as the "conditionsFilename" -configuration variable. Feel free to use the example above as a starting point. Note that you should ensure it -is in valid JSON format, otherwise the program will fail to load the file. There are various programs and -websites that can validate a JSON file for you. - -## Author Information - -This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors). - -## License - -Licensed under the Apache License, Version 2.0 (the "License"). 
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
diff --git a/Monitoring/monitor_fsxn_with_harvest_on_ec2/README-Manual.md b/Monitoring/monitor_fsxn_with_harvest_on_ec2/README-Manual.md
index 87ff813..e8ba555 100644
--- a/Monitoring/monitor_fsxn_with_harvest_on_ec2/README-Manual.md
+++ b/Monitoring/monitor_fsxn_with_harvest_on_ec2/README-Manual.md
@@ -4,366 +4,3 @@
 Continuous development for this solution has moved to a separate GitHub repository found here
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EC2](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EC2).
 Please refer to that repository for the latest updates.
-
-# Deploy NetApp Harvest on EC2
-
-A Harvest installation for monitoring Amazon FSxN with the Prometheus and Grafana stack, using AWS Secrets Manager for the FSxN credentials.
-
-## Introduction
-
-### What to Expect
-
-The Harvest installation will result in the following:
-* The latest version of NetApp Harvest installed on your EC2 instance.
-* Metrics collected about your FSxNs, with existing Grafana dashboards added for better visualization.
-
-### Prerequisites
-* An FSx for ONTAP file system running in the same VPC as the EC2 instance.
-* If not running an AWS-based Linux, ensure that the `aws` command has been installed and configured.
-
-## Installation Steps
-
-### 1. Create an AWS Secrets Manager Secret with a Username and Password for each FSxN
-Since this solution uses an AWS Secrets Manager secret to authenticate with the FSx for ONTAP file system,
-you will need to create a secret for each FSxN you want to monitor. You can use the following command to create a secret:
-
-```sh
-aws secretsmanager create-secret --name <secret-name> --secret-string '{"username":"fsxadmin","password":"<password>"}'
-```
-
-### 2. Create an Instance Profile with Permissions to AWS Secrets Manager and CloudWatch Metrics
-
-#### 2.1. Create Policy
-
-Edit the harvest-policy.json file found in this repo with the ARNs of the AWS Secrets Manager secrets created above.
-If you only have one FSxN, and therefore only one secret, remove the comma after the one secret ARN (i.e. the last
-entry should not have a comma after it).
-
-```
-{
-  "Statement": [
-    {
-      "Effect": "Allow",
-      "Action": [
-        "secretsmanager:GetSecretValue",
-        "secretsmanager:DescribeSecret",
-        "secretsmanager:ListSecrets"
-      ],
-      "Resource": [
-        "<secret_ARN_1>",
-        "<secret_ARN_2>"
-      ]
-    },
-    {
-      "Effect": "Allow",
-      "Action": [
-        "tag:GetResources",
-        "cloudwatch:GetMetricData",
-        "cloudwatch:GetMetricStatistics",
-        "cloudwatch:ListMetrics",
-        "apigateway:GET",
-        "aps:ListWorkspaces",
-        "autoscaling:DescribeAutoScalingGroups",
-        "dms:DescribeReplicationInstances",
-        "dms:DescribeReplicationTasks",
-        "ec2:DescribeTransitGatewayAttachments",
-        "ec2:DescribeSpotFleetRequests",
-        "shield:ListProtections",
-        "storagegateway:ListGateways",
-        "storagegateway:ListTagsForResource",
-        "iam:ListAccountAliases"
-      ],
-      "Resource": [
-        "*"
-      ]
-    }
-  ],
-  "Version": "2012-10-17"
-}
-```
-
-Run the following command to create the policy and obtain the policy ARN:
-```sh
-POLICY_ARN=$(aws iam create-policy --policy-name harvest-policy --policy-document file://harvest-policy.json --query Policy.Arn --output text)
-```
-
-#### 2.2. Create Instance Profile Role
-
-Run the following commands to create the instance profile role and attach the policy to it:
-```sh
-aws iam create-role --role-name HarvestRole --assume-role-policy-document file://trust-policy.json
-aws iam attach-role-policy --role-name HarvestRole --policy-arn $POLICY_ARN
-aws iam create-instance-profile --instance-profile-name HarvestProfile
-aws iam add-role-to-instance-profile --instance-profile-name HarvestProfile --role-name HarvestRole
-```
-
-Note that the `trust-policy.json` file can be found in this repo.
-
-### 3. Create an EC2 Instance
-
-We recommend using a `t2.xlarge` or larger instance type with at least a 20GB disk.
-
-Once you have created your EC2 instance, you can use the following command to attach the instance profile:
-
-```sh
-aws ec2 associate-iam-instance-profile --instance-id <instance-id> --iam-instance-profile Arn=<instance-profile-arn>,Name=HarvestProfile
-```
-You should get the instance profile ARN from step 2.2 above.
-
-If your existing EC2 instance already has an instance profile, then simply add the policy created in step 2.1 above to its instance profile role.
-
-### 4. Install Docker and Docker Compose
-
-To install Docker, use the following commands if you are running a Red Hat-based Linux:
-```sh
-sudo yum install docker
-sudo systemctl start docker
-sudo systemctl enable docker
-```
-If you aren't running a Red Hat-based Linux, you can follow the instructions [here](https://docs.docker.com/engine/install/).
-
-Install Docker Compose:
-```sh
-LATEST_COMPOSE_VERSION=$(curl -s https://api.github.com/repos/docker/compose/releases/latest | jq -r '.tag_name')
-ARCH=$(uname -m)
-if [ -z "$ARCH" -o -z "$LATEST_COMPOSE_VERSION" ]; then
-  echo "Error: Unable to determine latest version or architecture."
-else
-  sudo curl -s -L "https://github.com/docker/compose/releases/download/$LATEST_COMPOSE_VERSION/docker-compose-linux-$ARCH" -o /usr/local/bin/docker-compose
-  sudo chmod +x /usr/local/bin/docker-compose
-  # Create a symlink in /usr/bin for more accessibility.
-  [ ! -L /usr/bin/docker-compose ] && sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
-fi
-```
-
-To confirm that Docker has been installed correctly, run the following command:
-
-```sh
-sudo docker run hello-world
-```
-
-You should get output similar to the following:
-```
-Hello from Docker!
-This message shows that your installation appears to be working correctly.
-
-To generate this message, Docker took the following steps:
- 1. The Docker client contacted the Docker daemon.
- 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
-    (amd64)
- 3. The Docker daemon created a new container from that image which runs the
-    executable that produces the output you are currently reading.
- 4. The Docker daemon streamed that output to the Docker client, which sent it
-    to your terminal.
-
-To try something more ambitious, you can run an Ubuntu container with:
- $ docker run -it ubuntu bash
-
-Share images, automate workflows, and more with a free Docker ID:
- https://hub.docker.com/
-
-For more examples and ideas, visit:
- https://docs.docker.com/get-started/
-```
-
-### 5. Install Harvest on EC2
-
-Perform the following steps to install Harvest on your EC2 instance:
-
-#### 5.1. Generate Harvest Configuration File
-
-Modify the `harvest.yml` found in this repo with your clusters' details. You should just have to replace `<FSxN management IP>` with the management IP addresses of your FSxNs.
-Add as many pollers as you need to monitor all your FSxNs. There should be an AWS Secrets Manager secret for each FSxN.
-
-Note that the example below has entries for two FSxNs. If you only have one, remove the "fsx02" section.
-
-```yaml
-Exporters:
-  prometheus1:
-    exporter: Prometheus
-    port_range: 12990-14000
-    add_meta_tags: false
-Defaults:
-  use_insecure_tls: true
-Pollers:
-  fsx01:
-    datacenter: fsx
-    addr: <FSxN management IP>
-    collectors:
-      - Rest
-      - RestPerf
-      - Ems
-    exporters:
-      - prometheus1
-    credentials_script:
-      path: /opt/fetch-credentials
-      schedule: 3h
-      timeout: 10s
-  fsx02:
-    datacenter: fsx
-    addr: <FSxN management IP>
-    collectors:
-      - Rest
-      - RestPerf
-      - Ems
-    exporters:
-      - prometheus1
-    credentials_script:
-      path: /opt/fetch-credentials
-      schedule: 3h
-      timeout: 10s
-```
-
-#### 5.2. Generate a Docker Compose from the Harvest Configuration
-
-Run the following command to generate a Docker Compose file from the Harvest configuration:
-
-```sh
-docker run --rm \
-  --env UID=$(id -u) --env GID=$(id -g) \
-  --entrypoint "bin/harvest" \
-  --volume "$(pwd):/opt/temp" \
-  --volume "$(pwd)/harvest.yml:/opt/harvest/harvest.yml" \
-  ghcr.io/netapp/harvest \
-  generate docker full \
-  --output harvest-compose.yml
-```
-
-:warning: Ignore the command it outputs claiming that it will start the cluster.
-
-#### 5.3. Replace the Harvest image in the harvest-compose.yml:
-
-Replace the Harvest image with one that supports using AWS Secrets Manager for the FSxN credentials:
-
-```sh
-sed -i 's|ghcr.io/netapp/harvest:latest|ghcr.io/tlvdevops/harvest-fsx:latest|g' harvest-compose.yml
-```
-
-#### 5.4. Add AWS Secrets Manager Names to the Docker Compose Environment Variables
-
-Edit the `harvest-compose.yml` file by adding an "environment" section for each FSxN with two variables: `SECRET_NAME` and `AWS_REGION`.
-These environment variables are required for the credentials script.
-
-For example:
-```yaml
-services:
-  fsx01:
-    image: ghcr.io/tlvdevops/harvest-fsx:latest
-    container_name: poller-fsx01
-    restart: unless-stopped
-    ports:
-      - "12990:12990"
-    command: '--poller fsx01 --promPort 12990 --config /opt/harvest.yml'
-    volumes:
-      - ./cert:/opt/harvest/cert
-      - ./harvest.yml:/opt/harvest.yml
-      - ./conf:/opt/harvest/conf
-    environment:
-      - SECRET_NAME=<secret-name>
-      - AWS_REGION=<region>
-    networks:
-      - backend
-```
-#### 5.5. Download the FSxN dashboards and import them into the Grafana container:
-The following commands will download the dashboards designed for FSxN from this repo and replace the default Grafana dashboards with them:
-```sh
-wget https://raw.githubusercontent.com/NetApp/FSx-ONTAP-samples-scripts/main/Monitoring/monitor_fsxn_with_harvest_on_ec2/fsx_dashboards.zip
-unzip fsx_dashboards.zip
-rm -rf grafana/dashboards
-mv dashboards grafana/dashboards
-```
-
-#### 5.6. Configure Prometheus to use yet-another-exporter (yace) to gather AWS FSxN metrics
-AWS has useful metrics regarding the FSxN file system that ONTAP doesn't provide. Therefore, it is recommended to install
-an exporter that will expose these metrics. The following steps show how to install a recommended exporter.
-
-##### 5.6.1 Create the yace configuration file.
-Edit the `yace-config.yaml` file found in this repo and replace `<region>`, in both places, with the region where your FSxN resides:
-```yaml
-apiVersion: v1alpha1
-sts-region: <region>
-discovery:
-  jobs:
-    - type: AWS/FSx
-      regions: [<region>]
-      period: 300
-      length: 300
-      metrics:
-        - name: DiskReadOperations
-          statistics: [Average]
-        - name: DiskWriteOperations
-          statistics: [Average]
-        - name: DiskReadBytes
-          statistics: [Average]
-        - name: DiskWriteBytes
-          statistics: [Average]
-        - name: DiskIopsUtilization
-          statistics: [Average]
-        - name: NetworkThroughputUtilization
-          statistics: [Average]
-        - name: FileServerDiskThroughputUtilization
-          statistics: [Average]
-        - name: CPUUtilization
-          statistics: [Average]
-```
-
-##### 5.6.2 Add Yet-Another-Exporter to harvest-compose.yml
-
-Copy the following to the end of the `harvest-compose.yml` file:
-```yaml
-  yace:
-    image: quay.io/prometheuscommunity/yet-another-cloudwatch-exporter:latest
-    container_name: yace
-    restart: always
-    expose:
-      - 8080
-    volumes:
-      - ./yace-config.yaml:/tmp/config.yml
-      - $HOME/.aws:/exporter/.aws:ro
-    command:
-      - -listen-address=:8080
-      - -config.file=/tmp/config.yml
-    networks:
-      - backend
-```
-
-##### 5.6.3. Add a Yet-Another-Exporter target to prometheus.yml:
-```sh
-sudo sed -i -e "\$a\- job_name: 'yace'" -e "\$a\  static_configs:" -e "\$a\  - targets: ['yace:8080']" container/prometheus/prometheus.yml
-```
-
-### 6. Bring Everything Up
-
-```sh
-sudo docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
-```
-
-After bringing up the prom-stack.yml compose file, you can access Grafana at
-http://IP_OF_GRAFANA:3000.
-
-You will be prompted to create a new password the first time you log in. Grafana's default credentials are:
-```
-username: admin
-password: admin
-```
-
-## Adding additional FSx for ONTAP file systems.
-If you need to add additional FSxN file systems to monitor after the initial installation,
-you can do so by following the steps mentioned at the bottom of the [CloudFormation deployment](README.md) version of this README file.
-
----
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-utils/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2025 NetApp, Inc. All Rights Reserved.
diff --git a/Monitoring/monitor_fsxn_with_harvest_on_ec2/README.md b/Monitoring/monitor_fsxn_with_harvest_on_ec2/README.md
index dd3989a..efcd9ad 100644
--- a/Monitoring/monitor_fsxn_with_harvest_on_ec2/README.md
+++ b/Monitoring/monitor_fsxn_with_harvest_on_ec2/README.md
@@ -4,198 +4,3 @@
 Continuous development for this solution has moved to a separate GitHub repository found here
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EC2](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EC2).
 Please refer to that repository for the latest updates.
-
-# Harvest/Grafana Deployment using AWS CloudFormation
-This is the NetApp FSx for ONTAP deployment for monitoring FSx for ONTAP file systems with Grafana.
-The following solution leverages Harvest and YACE (Yet Another CloudWatch Exporter) as the exporters for ONTAP and CloudWatch metrics.
-
-YACE, or Yet Another CloudWatch Exporter, is a Prometheus exporter for AWS CloudWatch metrics. It is written in
-Go and uses the official AWS SDK. YACE supports auto-discovery of resources via tags, structured logging,
-filtering monitored resources via regex, and more. You can read more about YACE's capabilities in its
-[documentation](https://github.com/prometheus-community/yet-another-cloudwatch-exporter).
-
-Here are some screenshots of a couple of the dashboards that are included to visualize the metrics collected by Harvest and YACE.
-
-![Screenshot-01](images/grafana-dashboard-01.png)
-
-![Screenshot-02](images/grafana-dashboard-02.png)
-
-## Prerequisites
-The only prerequisite is an FSx for ONTAP file system running in your AWS account.
-
-## Architectural Overview
-
-This solution uses several components to collect and display all the pertinent metrics from your FSx for ONTAP file system.
-Instead of trying to describe them in words, the following architectural diagram does a great job of showing the components and how they interact with each other:
-![Architectural Diagram](images/FSxN-MonitoringStack-EC2.png)
-
-## Deployment Overview
-
-There are two methods to deploy this solution: via the AWS CloudFormation template, or manually.
-The steps below are geared towards the CloudFormation deployment method. If you want to deploy manually,
-please refer to these [Manual Deployment Instructions](README-Manual.md).
-
-This deployment includes:
-- **Harvest**: Collects ONTAP metrics. [Documentation](https://github.com/NetApp/harvest).
-- **Yet Another CloudWatch Exporter (YACE)**: Collects FSxN CloudWatch metrics. [Documentation](https://github.com/prometheus-community/yet-another-cloudwatch-exporter).
-- **Prometheus**: Stores the metrics.
-- **Grafana**: Visualizes the metrics.
-
-## Deployment Steps
-
-1. **Download the AWS CloudFormation Template file**
-   - Download the `harvest-grafana-cf-template.yaml` file from this repo.
-
-2. **Create the Stack**
-   - Open the AWS console and go to the CloudFormation service page.
-   - Choose **Create stack** and select **With new resources**.
-   - Select **Choose an existing template** and **Upload a template file**.
-   - Upload the `harvest-grafana-cf-template.yaml` file.
-   - Click **Next**.
-
-3. **Specify Stack Details**
-   - **Parameters**: Review and modify the parameters as needed for your file system. The default values are:
-
-   - **InstanceType**: Select the instance type to run the Harvest+Grafana+Prometheus stack. You should allocate at least 2 vCPUs and 1GB of RAM for every 10 FSxN file systems you plan to monitor. The default is `t3.medium`.
-   - **KeyPair**: Specify the key pair to access the EC2 instance.
-   - **SecurityGroup**: Ensure inbound ports 22, 3000 and 9090 are open.
-   - **SubnetType**: Choose `public` or `private`. `Public` will allocate a public IP address to the EC2 instance.
-   - **Subnet**: Specify a subnet that will have connectivity to all the FSxN file systems you plan to monitor over TCP port 443.
-   - **InstanceAmiId**: Specify the Amazon Linux 2 AMI ID for the EC2 instance. The default is the latest version.
-   - **FSxEndPoint**: Specify the management endpoint IP address of your FSx file system.
-   - **SecretName**: Specify the AWS Secrets Manager secret name containing the password for the `fsxadmin` user.
-
-4. **Configure Stack Options**
-   - Click **Next** for stack options.
-
-5. **Review and Create**
-   - Review the stack details and confirm the settings.
-   - Select the check box to acknowledge that the template creates IAM resources.
-   - Choose **Create stack**.
-
-6. **Monitor Stack Creation**
-   - Monitor the status of the stack in the AWS CloudFormation console. The status should change to `CREATE_COMPLETE` in about five minutes.
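-
-If you prefer a scripted deployment, the same stack can also be created with the AWS CLI. This is only a sketch: the parameter keys come from the list above, the stack name `fsxn-monitoring` and all `<...>` values are placeholders, and the `--capabilities` flag reflects the IAM-resource acknowledgement from step 5:
-```bash
-aws cloudformation create-stack \
-    --stack-name fsxn-monitoring \
-    --template-body file://harvest-grafana-cf-template.yaml \
-    --capabilities CAPABILITY_IAM \
-    --parameters ParameterKey=KeyPair,ParameterValue=<key-pair-name> \
-                 ParameterKey=Subnet,ParameterValue=<subnet-id> \
-                 ParameterKey=FSxEndPoint,ParameterValue=<management-endpoint-IP> \
-                 ParameterKey=SecretName,ParameterValue=<secret-name>
-
-# Wait for the stack to reach CREATE_COMPLETE.
-aws cloudformation wait stack-create-complete --stack-name fsxn-monitoring
-```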
-
-## Accessing Grafana
-
-- After the deployment is complete, log in to the Grafana dashboard using your browser:
-  - URL: `http://<EC2-instance-IP>:3000`
-  - Default credentials:
-    - Username: `admin`
-    - Password: `admin`
-  - **Note**: You will be prompted to change your password upon the first login.
-
-## Supported Dashboards
-
-Amazon FSx for NetApp ONTAP exposes a different set of metrics than on-premises NetApp ONTAP.
-Therefore, only the following out-of-the-box Harvest dashboards, tagged with `fsx`, are currently supported for use with FSx for ONTAP.
-Some panels in these dashboards may be missing information where the underlying metrics are not available from FSx:
-
-- **FSxN_Clusters**
-- **FSxN_CW_Utilization**
-- **FSxN_Data_protection**
-- **FSxN_LUN**
-- **FSxN_SVM**
-- **FSxN_Volume**
-
----
-
-## Monitor additional AWS FSx for NetApp ONTAP
-
-To monitor additional FSxN resources, follow these steps:
-
-1. **Log in via SSH to the EC2 instance**
-
-2. **Move to the Harvest Directory**
-   - Navigate to the Harvest directory:
-     ```bash
-     cd /opt/harvest
-     ```
-
-3. **Configure Additional AWS FSx for NetApp ONTAP in `harvest.yml`**
-   - Edit the `harvest.yml` file to add the new AWS FSx for NetApp ONTAP configuration. For example:
-
-     ```yaml
-     fsx02:
-       datacenter: fsx
-       addr: <management-endpoint-IP>
-       collectors:
-         - Rest
-         - RestPerf
-         - Ems
-       exporters:
-         - prometheus1
-       credentials_script:
-         path: /opt/fetch-credentials
-         schedule: 3h
-         timeout: 10s
-     ```
-
-4. **Update `harvest-compose.yml` with the Additional FSx for NetApp ONTAP**
-   - In the same directory, edit the `harvest-compose.yml` file to include the new FSx for NetApp ONTAP configuration:
-
-     ```yaml
-     fsx02:
-       image: ghcr.io/tlvdevops/harvest-fsx:latest
-       container_name: poller-fsx02
-       restart: unless-stopped
-       ports:
-         - "12991:12991"
-       command: '--poller fsx02 --promPort 12991 --config /opt/harvest.yml'
-       volumes:
-         - ./cert:/opt/harvest/cert
-         - ./harvest.yml:/opt/harvest.yml
-         - ./conf:/opt/harvest/conf
-       environment:
-         - SECRET_NAME=<secret-name>
-         - AWS_REGION=<region>
-     ```
-
-   **Note**: Make the following changes for each system you add:
-
-   - The name of the block (i.e. the first line of the block).
-   - The `container_name`.
-   - The `ports`. All pollers must use a different port. Just increment by one for each system.
-   - The `command` parameter should be updated with:
-     - The name after the `--poller` should match the block name.
-     - The `promPort` port should match the port in the `ports` line set above.
-   - The `SECRET_NAME` as needed.
-
-5. **Add FSx for NetApp ONTAP to Prometheus Targets**
-   - Navigate to the Prometheus directory:
-     ```bash
-     cd /opt/harvest/container/prometheus/
-     ```
-   - Edit the `harvest_targets.yml` file to add the new FSx for NetApp ONTAP target:
-     ```yaml
-     - targets: ['fsx01:12990','fsx02:12991']
-     ```
-
-6. **Restart Docker Compose**
-   - Navigate to the Harvest directory:
-     ```bash
-     cd /opt/harvest
-     ```
-   - Bring down the Docker Compose stack:
-     ```bash
-     docker-compose -f prom-stack.yml -f harvest-compose.yml down
-     ```
-   - Bring the Docker Compose stack back up:
-     ```bash
-     docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
-     ```
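-
-Once everything is back up, you can optionally confirm that the new poller is healthy from the EC2 instance. The port below assumes the `12991` mapping used in the example above:
-```bash
-# The new poller container (poller-fsx02) should show as 'Up'.
-docker-compose -f prom-stack.yml -f harvest-compose.yml ps
-
-# Harvest pollers serve Prometheus metrics on their promPort.
-curl -s http://localhost:12991/metrics | head
-```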
-
----
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2025 NetApp, Inc. All Rights Reserved.
diff --git a/Monitoring/monitor_fsxn_with_harvest_on_eks/README.md b/Monitoring/monitor_fsxn_with_harvest_on_eks/README.md
index 2cdcfad..197f2a0 100644
--- a/Monitoring/monitor_fsxn_with_harvest_on_eks/README.md
+++ b/Monitoring/monitor_fsxn_with_harvest_on_eks/README.md
@@ -4,396 +4,3 @@ Continuous development for this solution has moved to a separate GitHub reposito
 [https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EKS](https://github.com/NetApp/FSx-ONTAP-monitoring/tree/main/Grafana-Prometheus-FSx/Monitor-FSxN-with-Harvest-on-EKS). Please refer to that repository for the latest updates.
-# Deploy NetApp Harvest on EKS
-
-This subfolder contains a Helm chart to install [NetApp Harvest](https://github.com/NetApp/harvest/blob/main/README.md)
-into an AWS EKS cluster to monitor multiple FSx for ONTAP file systems using the
-Grafana + Prometheus stack. It uses the AWS Secrets Manager to obtain
-credentials for each of the FSxN file systems so those credentials aren't stored insecurely.
-
-Here are some screenshots of a couple of the dashboards that are included to visualize the metrics collected by Harvest and YACE.
-
-![Screenshot-01](images/grafana-dashboard-01.png)
-
-![Screenshot-02](images/grafana-dashboard-02.png)
-
-## Introduction
-
-### Architectural Overview
-
-This solution uses several components to collect and display all the pertinent metrics from your FSx for ONTAP file system.
-Instead of trying to describe them in words, the following architectural diagram does a great job of showing the components and how they interact with each other:
-![Architectural Diagram](images/FSxN-MonitoringStack-EKS.png)
-
-### What to expect
-
-Installing the Harvest Helm chart will result in the following:
-* The latest version of NetApp Harvest installed onto your EKS cluster.
-* A separate Harvest poller in the EKS cluster for each FSxN cluster.
-* Metrics collected about your FSxNs, with existing Grafana dashboards added for better visualization.
-
-### Integration with AWS Secrets Manager
-This Harvest installation uses the AWS Secrets Manager to obtain the credentials for each of the FSxN file systems.
-The format of the secret string should be a JSON structure with `username` and `password` keys. For example:
-```json
-{
-  "username": "fsxadmin",
-  "password": "fsxadmin's_password"
-}
-```
-A service account should be created during the installation of Harvest with sufficient permissions to fetch the secrets.
-
-### Prerequisites
-* An AWS EKS cluster.
-* An FSx for ONTAP file system with connectivity to the EKS cluster.
-  * If you don't have an EKS cluster with an FSx for ONTAP file system, you can follow the steps in the [FSx as PVC for EKS](https://github.com/NetApp/FSx-ONTAP-samples-scripts/tree/add_grafana_eks/EKS/FSxN-as-PVC-for-EKS) repository to build one.
-* `Helm` - for installing the resources.
-* `kubectl` - for managing Kubernetes resources.
-* `eksctl` - for creating and managing EKS clusters.
-* `jq` - for parsing JSON data in the command line. This is optional but recommended for some of the commands below.
-
-## Deployment
-
-### Deployment of Prometheus and Grafana
-If you don't already have Prometheus and Grafana running in your EKS cluster, you can deploy both of them
-using the Helm chart from the Prometheus community repository by using the following commands:
-
-:memo: **NOTE:** You need to make a substitution in the command below before running it.
-```bash
-helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-helm repo update
-helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace \
-    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
-    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=<storage class>
-```
-Where:
-* \<storage class\> is the storage class you want to use to store the data collected from the FSxN file systems.
-If you don't care about persistent storage, you can omit the last two lines from the above command.
-
-The above will create a 50GiB PVC for Prometheus to use. You can adjust the size as needed.
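-
-If you are not sure which storage class to use, you can list the ones defined in your cluster before running the install:
-```bash
-kubectl get storageclass
-```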
-
-A successful installation should look like this:
-```
-$ helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace \
-    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
-    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fsx-basic-nas
-NAME: kube-prometheus-stack
-LAST DEPLOYED: Fri Jul 26 22:57:04 2024
-NAMESPACE: prometheus
-STATUS: deployed
-REVISION: 1
-NOTES:
-kube-prometheus-stack has been installed. Check its status by running:
-  kubectl --namespace prometheus get pods -l "release=kube-prometheus-stack"
-
-Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure the Alertmanager and Prometheus instances using the Operator.
-```
-To check the status, you can run the following command:
-```bash
-kubectl get pods -n prometheus
-```
-The output should look something like this:
-```
-$ kubectl get pods -n prometheus
-NAME                                                        READY   STATUS    RESTARTS   AGE
-alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          51s
-kube-prometheus-stack-grafana-86844f6b47-njw6n              3/3     Running   0          56s
-kube-prometheus-stack-kube-state-metrics-7c8d64d446-rj4tv   1/1     Running   0          56s
-kube-prometheus-stack-operator-85b765d6bc-ll5q2             1/1     Running   0          56s
-kube-prometheus-stack-prometheus-node-exporter-7rtbp        1/1     Running   0          56s
-kube-prometheus-stack-prometheus-node-exporter-ffckd        1/1     Running   0          56s
-prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          50s
-```
-
-### Deployment of the Harvest Helm chart
-
-#### 1. Download the Harvest Helm chart
-Download the Harvest Helm chart by copying the contents of the `harvest` directory found in this repo. The easiest
-way to do that is to simply clone the entire repo and change into the `harvest` directory:
-```bash
-git clone https://github.com/NetApp/FSx-ONTAP-samples-scripts.git
-cd FSx-ONTAP-samples-scripts/Monitoring/monitor_fsxn_with_harvest_on_eks/harvest
-```
-This custom Helm chart includes:
-* `deployment.yaml` - Harvest deployment using the latest Harvest image.
-* `harvest-config.yaml` - Harvest backend configuration.
-* `harvest-cm.yaml` - Environment variables configuration for the credentials script.
-* `service-monitor.yaml` - Prometheus ServiceMonitor for collecting Harvest metrics.
-
-:memo: **NOTE:** You should not have to modify these files.
-
-#### 2. Specify which FSxNs to monitor
-
-The Helm chart supports monitoring multiple FSxNs. You can add multiple FSxNs by editing the `values.yaml` file
-and updating the `clusters` section. The following is an example with two FSxNs.
-```yaml
-fsxs:
-  clusters:
-    - name: <cluster-name-1>
-      managment_lif: <management-endpoint-IP-1>
-      promPort: 12990
-      secretName: <secret-name-1>
-      region: <region>
-    - name: <cluster-name-2>
-      managment_lif: <management-endpoint-IP-2>
-      promPort: 12991
-      secretName: <secret-name-2>
-      region: <region>
-```
-Of course replace the strings within the <> with your own values.
-
-:memo: **NOTE:** Each FSxN cluster must have a unique promPort number in the range of 12990 to 14000.
-
-#### 3. Create AWS Secrets Manager secrets for FSxN credentials
-If you don't already have an AWS Secrets Manager secret with your FSxN credentials, you can create one using the AWS CLI:
-```bash
-aws secretsmanager create-secret --region <region> --name <secret-name> \
-    --secret-string '{"USERNAME":"fsxadmin", "PASSWORD":"<password>"}'
-```
-Replace `<password>` with the actual password for the `fsxadmin` user on your FSxN file system.
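-
-To double-check that the secret exists and is readable, you can fetch it back with the AWS CLI (same placeholders as above):
-```bash
-aws secretsmanager get-secret-value --region <region> --secret-id <secret-name> \
-    --query SecretString --output text
-```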
-
-#### 4. Create Service Account with permissions to read the AWS Secrets Manager secrets
-
-##### 4a. Create Policy
-The following IAM policy can be used to grant all the permissions required by Harvest to fetch the secrets.
-Note that this example has places to put two AWS Secrets Manager ARNs. You should add the secret ARNs
-for all the FSxNs you plan to monitor. Typically there is one per FSxN, but it is okay to use the same secret for multiple
-FSxNs as long as the credentials are the same.
-
-```json
-{
-    "Statement": [
-        {
-            "Action": [
-                "secretsmanager:GetSecretValue",
-                "secretsmanager:DescribeSecret",
-                "secretsmanager:ListSecrets"
-            ],
-            "Effect": "Allow",
-            "Resource": [
-                "<secret-ARN-1>",
-                "<secret-ARN-2>"
-            ]
-        }
-    ],
-    "Version": "2012-10-17"
-}
-```
-Of course replace the strings within the <> with your own values. Save the edited policy in a file named `harvest-read-secrets-policy.json`.
-
-You can use the following command to create the policy:
-```bash
-POLICY_ARN=$(aws iam create-policy --policy-name harvest_read_secrets --policy-document file://harvest-read-secrets-policy.json --query Policy.Arn --output text)
-```
-Note that this sets a variable named `POLICY_ARN` to the ARN of the policy that is created.
-It is done this way to make it easy to pass that policy ARN when you create the service account in the next step.
-
-##### 4b. Create ServiceAccount
-The following command will create a role, associated with the policy created above, and a Kubernetes service account that Harvest will run under:
-```bash
-eksctl create iamserviceaccount --name harvest-sa --region=<region> --namespace <namespace> --role-name harvest-role --cluster <cluster-name> --attach-policy-arn "$POLICY_ARN" --approve
-```
-Of course replace all the strings within the <> with your own values. Note that the `<namespace>` should
-be where your Prometheus stack is deployed. If you used the command above to install Prometheus,
-then the namespace should be `prometheus`.
-
-#### 5. Install the Harvest Helm chart
-Once you have updated the `values.yaml` file, created the AWS Secrets Manager secrets,
-and created the service account with permissions to read the secrets, you are ready to install the Harvest Helm chart
-by running:
-```bash
-helm upgrade --install harvest -f values.yaml ./ --namespace=<namespace> --set prometheus=<namespace>
-```
-Note that the `<namespace>` should be where your Prometheus stack is deployed. If you used the command above to install Prometheus,
-then it will be `prometheus`.
-
-Once the deployment is complete, Harvest should be listed as a target on Prometheus. You can check that by running
-the following commands. The first one sets up a port forwarder for port 9090 on your local machine to the Prometheus server running
-in the EKS cluster as a background job.
-```bash
-kubectl port-forward -n prometheus prometheus-kube-prometheus-stack-prometheus-0 9090 &
-sleep 4 # Give it a few seconds to establish the connection
-curl -s http://localhost:9090/api/v1/targets | jq -r '.data.activeTargets[] | select(.labels.service[0:14] == "harvest-poller") | "\(.labels.service) Status = \(.health)"'
-```
-It should list a status of 'up' for each of the FSxN clusters you are monitoring. For example:
-```
-$ curl -s http://localhost:9090/api/v1/targets | jq -r '.data.activeTargets[] | select(.labels.service[0:14] == "harvest-poller") | "\(.labels.service) Status = \(.health)"'
-Handling connection for 9090
-harvest-poller-dr Status = up
-harvest-poller-prod Status = up
-```
-You might have to give it a minute before getting an 'up' status.
-
-Once you have obtained the status, you don't need the `kubectl port-forward` command running anymore. You can stop it by running:
-```bash
-kill %?9090
-```
-That kills any background job that has 9090 in the command line, which the port forwarding command should have.
-
-### Import FSxN CloudWatch metrics into your monitoring stack using YACE
-AWS CloudWatch provides metrics for the FSx for ONTAP file systems which cannot be collected by Harvest.
-Therefore, we recommend using the [yet-another-cloudwatch-exporter](https://github.com/prometheus-community/yet-another-cloudwatch-exporter)
-(by the Prometheus community) to collect these metrics.
-
-#### 1. Create a policy with permissions to get AWS CloudWatch metrics
-The following IAM policy can be used to grant all the permissions required by YACE to fetch the CloudWatch metrics:
-
-```json
-{
-    "Version": "2012-10-17",
-    "Statement": [
-        {
-            "Action": [
-                "tag:GetResources",
-                "cloudwatch:GetMetricData",
-                "cloudwatch:GetMetricStatistics",
-                "cloudwatch:ListMetrics",
-                "apigateway:GET",
-                "aps:ListWorkspaces",
-                "autoscaling:DescribeAutoScalingGroups",
-                "dms:DescribeReplicationInstances",
-                "dms:DescribeReplicationTasks",
-                "ec2:DescribeTransitGatewayAttachments",
-                "ec2:DescribeSpotFleetRequests",
-                "shield:ListProtections",
-                "storagegateway:ListGateways",
-                "storagegateway:ListTagsForResource"
-            ],
-            "Effect": "Allow",
-            "Resource": "*"
-        }
-    ]
-}
-```
-The policy shown above is in a file named `yace-exporter-policy.json` in the repo. You shouldn't
-have to modify the file, so just run the following command in order to create the policy:
-```bash
-POLICY_ARN=$(aws iam create-policy --policy-name yace-exporter-policy --policy-document file://yace-exporter-policy.json --query Policy.Arn --output text)
-```
-Note that this sets a variable named `POLICY_ARN` to the ARN of the policy that is created.
-It is done this way to make it easy to pass that policy ARN when you create the service account in the next step.
-
-#### 2. Create the service account
-The following command will create a role associated with the policy created above, and a Kubernetes service account that YACE will run under:
-
-```bash
-eksctl create iamserviceaccount --name yace-exporter-sa --region=<region> --namespace <namespace> --role-name yace-cloudwatch-exporter-role --cluster <cluster-name> --attach-policy-arn "$POLICY_ARN" --approve
-```
-Of course replace the strings within the <> with your own values. Note, the overrides file below assumes the account
-name is `yace-exporter-sa`, so if you change it, you will need to update the overrides file accordingly.
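-
-You can verify that `eksctl` created and annotated the service account correctly; with IAM roles for service accounts, the IAM role shows up as an annotation on the account (the namespace is assumed to be the one used above):
-```bash
-kubectl get serviceaccount yace-exporter-sa -n <namespace> -o yaml
-# Look for an annotation similar to:
-#   eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/yace-cloudwatch-exporter-role
-```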
-
-#### 3. Install the yace-exporter Helm chart
-First add the nerdswords Helm repository to your local Helm client. This repository contains the YACE exporter chart.
-
-```bash
-helm repo add nerdswords https://nerdswords.github.io/helm-charts
-helm repo update
-```
-Edit the `yace-override-values.yaml` file found in this repo by changing the Prometheus release name in the ServiceMonitor section:
-```yaml
-serviceMonitor:
-  enabled: true
-  labels:
-    release: <prometheus-release-name>
-```
-If you installed Prometheus using the previous steps, the release name will be `kube-prometheus-stack`.
-
-While editing that file, also update the region name, in both places, to FSxN's region in the "config" section:
-```yaml
-apiVersion: v1alpha1
-sts-region: <region>
-discovery:
-  jobs:
-    - type: AWS/FSx
-      regions:
-        - <region>
-      period: 300
-      length: 300
-      metrics:
-        - name: DiskReadOperations
-          statistics: [Average]
-        - name: DiskWriteOperations
-          statistics: [Average]
-        - name: DiskReadBytes
-          statistics: [Average]
-        - name: DiskWriteBytes
-          statistics: [Average]
-        - name: DiskIopsUtilization
-          statistics: [Average]
-        - name: NetworkThroughputUtilization
-          statistics: [Average]
-        - name: FileServerDiskThroughputUtilization
-          statistics: [Average]
-```
-
-Finally, run the following command to install the yace-exporter Helm chart:
-```bash
-helm install yace-cw-exporter --namespace <namespace> nerdswords/yet-another-cloudwatch-exporter -f yace-override-values.yaml
-```
-Of course replace the strings within the <> with your own values.
-
-### Accessing Grafana
-If you newly installed the Prometheus stack, which includes Grafana, you will need to provide a way of accessing it from outside the Kubernetes cluster.
-One way to do that is to set up a "port-forward" from your local machine using the following command:
-
-```bash
-kubectl --namespace prometheus port-forward svc/kube-prometheus-stack-grafana 3000:80 --address 0.0.0.0 &
-```
-
-This is okay for a test, but this method is not persistent and would force everyone to go through your local machine to access the Grafana dashboards.
-To allow for more permanent access to Grafana, you should consider setting up a LoadBalancer service.
-That can easily be done by running:
-```bash
-kubectl expose deployment kube-prometheus-stack-grafana --namespace prometheus --port=80 --target-port=3000 --name=load-balancer-service --type=LoadBalancer
-```
-
-This will create an AWS Elastic Load Balancer (ELB) in front of the Grafana service, which will allow you to access Grafana via the ELB's DNS name.
-To get the DNS name, you can run the following command:
-```bash
-kubectl get svc load-balancer-service --namespace prometheus
-```
-The output should be similar to this:
-```
-NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)        AGE
-load-balancer-service   LoadBalancer   172.20.85.44   ab130084a571d4e59afeabafb0477037-1196442078.us-west-2.elb.amazonaws.com   80:30611/TCP   56m
-```
-The `EXTERNAL-IP` column will show the DNS name of the ELB that was created. You can use this DNS name to access Grafana from your web browser.
-Once you have access to Grafana, you can log in using the default credentials:
-* **Username:** `admin`
-* **Password:** `prom-operator`
-
-### Adding Grafana dashboards to visualize your FSxN metrics
-Once you log in, you'll want to import some dashboards to visualize the metrics collected by Harvest and YACE. You will find
-some example dashboards in the [dashboards](dashboards) folder in this repository. You can import these dashboards into Grafana by following these steps:
-1. Download the dashboards from the `dashboards` folder in this repository to your local PC.
-1. Log in to your Grafana instance.
-1. Click on the "+" icon on the left-hand side menu and select "Import Dashboard".
-1. Click in the box with "Upload dashboard JSON file" and browse to one of the dashboard JSON files from the `dashboards` folder in this repository.
-1. Click "Import."
-
-You can repeat the steps above for each of the dashboard JSON files you want to import.
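-
-If you have many dashboard files to load, Grafana's HTTP API can import them in a loop instead of going through the UI. This is just a sketch, assuming the default credentials above and the load balancer DNS name from the previous section (`jq` is already listed in the prerequisites):
-```bash
-GRAFANA_URL='http://<ELB-DNS-name>'
-for f in dashboards/*.json; do
-    # Wrap each dashboard JSON in the envelope the import API expects.
-    jq -n --slurpfile d "$f" '{dashboard: $d[0], overwrite: true}' |
-        curl -s -u admin:prom-operator -H 'Content-Type: application/json' \
-             -X POST "$GRAFANA_URL/api/dashboards/db" -d @-
-done
-# If an import fails, try removing the top-level "id" field from that JSON file.
-```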
-
-You can also import the "default" dashboards from the Harvest repo found [here](https://github.com/NetApp/harvest/tree/main/grafana/dashboards).
-Only consider the dashboards in the `cmode` and `cmode-details` directories.
-
-:memo: **NOTE:** Since the special 'fsxadmin' account doesn't have access to all the metrics that a traditional ONTAP 'admin' account would have,
-some of the metrics and dashboards may not be fully applicable or available. The ones with the 'fsx' tag are the most relevant for FSxN.
-
----
-
-## Author Information
-
-This repository is maintained by the contributors listed on [GitHub](https://github.com/NetApp/FSx-ONTAP-samples-scripts/graphs/contributors).
-
-## License
-
-Licensed under the Apache License, Version 2.0 (the "License").
-
-You may obtain a copy of the License at [apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0).
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an _"AS IS"_ basis, without WARRANTIES or conditions of any kind, either express or implied.
-
-See the License for the specific language governing permissions and limitations under the License.
-
-© 2025 NetApp, Inc. All Rights Reserved.