Tag Archives: chkdsk

NTFS Errors: Event ID 55 – KB10391728

Description:  Found NTFS errors in the Event Viewer while performing maintenance

NTFS errors are caused by one of two things: the disk has bad sectors, or I/O requests issued by the file system to the disk subsystem did not complete successfully

Common customer description: Computer seems slow or is telling me that it’s corrupted

Probing questions: Is this a recent occurrence?

Steps to isolate: Check Event Viewer for Event ID 55

Determine which drive the issue is located on
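If you prefer the command line to the Event Viewer GUI, the same check can be done from an elevated Command Prompt with the built-in wevtutil tool. The sketch below only builds the command; build_event_query is a hypothetical helper name, and the limit of 10 events and the System log are assumptions for illustration.

```python
from typing import List


def build_event_query(event_id: int, max_events: int = 10, log: str = "System") -> List[str]:
    """Build a wevtutil command that lists the newest events with the given ID."""
    return [
        "wevtutil", "qe", log,
        f"/q:*[System[(EventID={event_id})]]",  # XPath filter on the event ID
        f"/c:{max_events}",                      # stop after this many events
        "/rd:true",                              # newest events first
        "/f:text",                               # human-readable output
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a Command Prompt on the server.
    print(" ".join(build_event_query(55)))
```

The text output includes the event description, which normally names the affected volume.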

Steps to resolve:

1. Open the Command Prompt as Administrator

2. Run chkntfs on the bad drive

  • chkntfs checks the file system to see if there are any “dirty” bits
  • If the status comes back “[Drive] is dirty” or even “[Drive] is clean,” go to step 3

3. Run the read-only chkdsk (chkdsk [Drive Letter:]) to confirm that the drive really has an issue

  • chkdsk verifies the file system integrity of a volume; with the /f switch it also fixes logical file system errors
  • If chkdsk [Drive Letter:] comes back with an error, run chkdsk [Drive Letter:] /f

4. Call the customer to see when we can reboot their server so the chkdsk can run

5. After the reboot happens, watch the Event Viewer for a couple of days to see if the Event ID 55 comes back.

  • If it doesn’t, you can close the ticket
  • If it does, try running the command chkdsk /r

6. Open command prompt as administrator

7. Run the command chkdsk /r

  • chkdsk /r locates bad sectors and recovers readable information: it checks the entire disk surface and attempts to repair or work around any bad sectors it finds

8. You might need to reboot the computer again

9. Repeat Step 7 and watch to see if the NTFS errors come back
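The order of commands in the steps above can be summed up as a short sketch. chkdsk_ladder is a hypothetical helper, not part of any tool; it only builds the command strings, from least to most invasive, so nothing here touches a disk.

```python
from typing import List


def chkdsk_ladder(drive: str) -> List[str]:
    """Return the commands from the steps above, least to most invasive."""
    letter = drive.rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected a drive letter such as 'D:', got {drive!r}")
    volume = f"{letter}:"
    return [
        f"chkntfs {volume}",    # step 2: report whether the dirty bit is set
        f"chkdsk {volume}",     # step 3: read-only check, changes nothing
        f"chkdsk {volume} /f",  # fix file system errors; may need a reboot
        f"chkdsk {volume} /r",  # step 7: locate bad sectors (implies /f)
    ]


if __name__ == "__main__":
    for command in chkdsk_ladder("D:"):
        print(command)
```

Only move down the list when the previous step still shows a problem, and get the customer's approval before anything that needs a reboot.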

 

Driver Detected a Controller Error – KB1039904

Description:  A storage event reports that the controller for the hard drive is going bad

Common customer description: Hard drive is making a funny sound

Noticing errors in Event Viewer

Probing questions: Was this a recent occurrence?

Were there any recent power outages/surges/brownouts?

Steps to isolate: Check previous tickets

Check HP System Management or Dell Server Administrator

Check Event Viewer

Steps to resolve:

1. As with the majority of alerts you will encounter, start by checking HP System Management or Dell Server Administrator.

  • Are there any issues with the hard drives?

2. If not, check that the firmware and the driver are up to date; out-of-date firmware or drivers have been known to cause false alerts.

  • If they are out of date, install the latest firmware and/or driver

3. If there are no errors in HP System Management or Dell Server Administrator, check Event Viewer for errors.

  • HP System Management shows things as:
    • Server Agents
    • NIC Agents
    • Storage Agents
  • Dell Server Administrator shows things as:
    • Server Administrator

4. If you are finding errors, try using the command chkdsk [Drive Letter:] /f and schedule a reboot of the server with the customer’s approval.

5. If errors are still being reported, call the customer and see if we can send an Onsite Technician out to troubleshoot further, since this points to a potential future failure.

Additional considerations: Remember to get approval from the customer when scheduling a reboot of their server. If we have to escalate the ticket, please get approval from Remote Maintenance Lead, Tier 2 or Tier 3.

Disk Paging Operation Alert – KB1039897

Description: An error was detected on device \Device\Harddisk0\DR0 during a paging operation. This issue comes from the OS swapping memory to disk or disk to memory. Refer to article http://support.microsoft.com/kb/244780

Common customer description:

My server seems to be running slow

Probing questions:

Is the server slow often?

Is the server a terminal server?

Steps to isolate:

Check the memory. Is it adequate for what they need and use?

Steps to resolve:

There can be multiple reasons for this alert, ranging from a USB device being plugged in, to Intel Storage Manager causing issues, to an indication of a future drive failure.

1. Open command prompt and run chkdsk [Drive Letter]: to make sure there aren’t any issues with the drive

  • If chkdsk comes back with errors, call the customer to see when the server can be rebooted, then run the command chkdsk [Drive Letter]: /f

2. Check the server to make sure that no errors are being reported

3. If chkdsk reports no errors, the next possible issue is the memory

If there is not enough memory for the server or applications to function, it has been known to throw out an error

1. Open Task Manager and see if the memory is running really high.

  • If the memory is running really high, see what is taking up most of the memory.

2. Consult with a Tier 2 tech to see what the best options are; most likely the customer will need to upgrade the memory, typically to double what they currently have

3. If they don’t want to purchase memory, there is a tool we can use called ADSI Edit that can limit the memory usage of the program or service.

  • This is a billable task and must be approved by the customer first; it might also require a server reboot.
  • If you have never used ADSI Edit before, have a Tier 2 tech walk you through the process
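Task Manager is the quickest way to see what is using the memory, but the same information can be captured as text with tasklist /fo csv /nh, which is handy for pasting into a ticket. The sketch below parses that output; top_memory_processes is a hypothetical helper, and the sample rows are made up for illustration, not taken from a real server.

```python
import csv
import io
from typing import List, Tuple


def top_memory_processes(tasklist_csv: str, limit: int = 5) -> List[Tuple[str, int]]:
    """Return (image name, memory in KB) pairs, largest first.

    Expects the output of: tasklist /fo csv /nh
    """
    rows = []
    for record in csv.reader(io.StringIO(tasklist_csv)):
        if len(record) < 5:
            continue  # skip blank or malformed lines
        name, mem = record[0], record[4]
        # "6,291,456 K" -> 6291456; keeping digits only also copes with
        # other thousands separators.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if digits:
            rows.append((name, int(digits)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Made-up sample in the tasklist CSV layout (name, PID, session, session#, memory).
SAMPLE = '''"System Idle Process","0","Services","0","24 K"
"store.exe","2140","Services","0","6,291,456 K"
"sqlservr.exe","1788","Services","0","1,048,576 K"
'''

if __name__ == "__main__":
    for name, kb in top_memory_processes(SAMPLE, limit=2):
        print(f"{name}: {kb / 1024:.0f} MB")
```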

If the error repeats every few hours or every day, consult with a Tier 2 tech

Additional considerations: The easiest answer would be to replace the RAM. Consult a Tier 2 tech to make sure that there isn’t anything else we can try. Also if we have to send this onsite, make sure that you have approval from the customer and Remote Maintenance Lead, Tier 2 or Tier 3.

Disk Has a Bad Block – KB1039598

Description:  Received an alert that there is a bad block on a disk

Common customer description: Customer doesn’t call for this issue unless they check their event logs

Probing questions: Does Event Viewer show this issue happening a lot?

Were there recent changes to the server since this happened?

Steps to isolate: Check Event Viewer for errors. If it shows a lot of errors or warnings, we need to run some disk checks. If you find only one or two warnings, watch it and see if it gets better or worse.

Steps to resolve:

Windows Error-Checking (windows version of chkdsk)

1. Go to Computer, right-click the drive, and go to Properties

2. Under Tools, click on Check Now under Error-checking

  • There are two options that you can choose, “Automatically fix file system errors” and “Scan for and attempt recovery of bad sectors”
  • The quick way is to run the Windows Error-Checking without checking the boxes

3. Press Start

  • The scan will either come back clean and say that no errors were found or, as in this example, fail out

  • For “Automatically fix file system errors” to work (Server 2008 and newer), the server needs to be rebooted at the customer’s earliest convenience. When you call, explain that we received an alert that the disk has a bad block, which could be a sign of a future issue with the drive, but that to be sure we need to reboot the server to run this test.

4. If we got approval to reboot the server, set the reboot schedule to reflect what they wanted

5. Once the server has rebooted, run chkdsk on the drive letter to see if it still has an issue (covered below)

 

Chkdsk with Command Prompt

1. To confirm that there is an error, you could also run the chkdsk command in Command Prompt

2. chkdsk runs in read-only mode when you run it as chkdsk [drive letter]: with no switches. It will either complete and say that Windows didn’t detect any errors, or fail out

3. If chkdsk does fail, you could try running the command chkdsk [drive letter]: /f, but this requires a reboot of the server to complete. If you do this, call the customer and ask when it is appropriate for us to reboot their server. When you call, explain that we received an alert that the disk has a bad block, which could be a sign of a future issue with the drive, but that to be sure we need to reboot the server to run this test.

4. If we got approval to reboot the server, set the reboot schedule to reflect what they wanted

5. After the server reboots, run a chkdsk on the drive letter to see if it still has an issue
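Besides the on-screen message, chkdsk sets an exit code that you can read right after the run with echo %errorlevel%. The table below follows the exit codes in Microsoft's chkdsk documentation; the helper names are hypothetical, and treating any code other than 0 or 1 as a reason to schedule chkdsk /f is an assumption of this sketch, not a documented rule.

```python
CHKDSK_EXIT_CODES = {
    0: "No errors were found.",
    1: "Errors were found and fixed.",
    2: "Disk cleanup was performed, or was skipped because /f was not specified.",
    3: "The disk could not be checked, or errors were not fixed "
       "(for example, /f was not specified).",
}


def describe_exit_code(code: int) -> str:
    """Translate a chkdsk exit code into its documented meaning."""
    return CHKDSK_EXIT_CODES.get(code, f"Unrecognized exit code {code}.")


def needs_follow_up(code: int) -> bool:
    """Assumption of this sketch: anything other than 0 or 1 warrants chkdsk /f."""
    return code not in (0, 1)


if __name__ == "__main__":
    for code in (0, 3):
        print(code, describe_exit_code(code), "follow up:", needs_follow_up(code))
```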

 

Verify the drive is having an issue with CrystalDiskInfo

1. In order for us to use this, in Remote Control, check the box for redirect local drives

2. Connect to the server

3.  Under Computer, find the Z: (Z on MP-X-Username)

4. Double-click into the Z: and find CrystalDiskInfo5_6_2

5. Copy the folder to the desktop on the server (running the program from the network share is slow)

6. Once the folder has copied to the desktop, click inside the folder

7. Double click on DiskInfo (the application)

8. A window will pop up asking if you want to let it access the hard drive; press Yes

9. Let the program load and read the drive

  • This program will only work if the drive has the S.M.A.R.T features
  • If the drive doesn’t have the S.M.A.R.T features, it will say “No Drive Found” or “No Disk Info”

 

10. The report describes the drive and its attributes (temperature, read/write errors, etc.). All blue means the drive is healthy. Yellow means caution: potential failures or issues in the future. Red means error or failure: the drive could fail soon.

  • If there are yellow alerts, contact the customer and see if they want us to escalate the ticket to onsite to look into the issue further, since it might require a hard drive replacement.

  • If there are red alerts, call the customer to see if we can get someone out to replace the hard drive. Take a screenshot (Snipping Tool) of the issue, then contact the AM with the screenshot, the device information, and the server information, and say that you got approval to escalate the ticket and will be escalating it. In the escalation, state that the drive still needs to be ordered.

  • If you did not get approval from the customer to escalate, note that in the ticket along with any reason they give (not worried about it, not enough hours, etc.), then close the ticket. If they did not give a reason, simply state that they declined an onsite at this time and close the ticket.
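The color rules above boil down to a small lookup table. This is only a summary sketch of step 10 for quick reference; HEALTH_ACTIONS and next_action are hypothetical names and are not part of CrystalDiskInfo.

```python
HEALTH_ACTIONS = {
    "blue": "Drive is healthy; no action needed.",
    "yellow": "Caution: ask the customer whether to escalate to onsite.",
    "red": "Failure likely: get approval to replace the drive, "
           "take a screenshot, and notify the AM.",
}


def next_action(color: str) -> str:
    """Map the status color shown in the report to the next step."""
    key = color.strip().lower()
    if key not in HEALTH_ACTIONS:
        raise ValueError(f"unexpected status color: {color!r}")
    return HEALTH_ACTIONS[key]


if __name__ == "__main__":
    print(next_action("Yellow"))
```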

 

Verify the drive is having an issue with GSmartControl

1. In order for us to use this, in Remote Control, check the box for redirect local drives

2. Connect to the server

3.  Under Computer, find the Z: (Z on MP-X-Username)

4. Double-click into the Z: and find Gsmart Control

5. Copy the folder to the desktop on the server (running the program from the network share is slow)

6. Once the folder has copied to the desktop, click inside the folder

7. Double click on the application

  • This one installs on the company’s server
  • It will trigger a UAC prompt; choose to allow it
  • Keep the defaults for the install

8. Start the GSmart Control program

  • This program will only work if the drive has the S.M.A.R.T features
  • If the drive doesn’t have the S.M.A.R.T features, it will say “No Drive Found” or “No Disk Info”

9. Find the correct drive that you need

  • In this case, it’s WDC-WD10PVT-00HT5T0

10. Double-click the drive to open its information window

  • The first tab, Identity, gives you all the information about the drive (name, model, drive size, etc.), which will be handy in case there is an issue

11. The next tab, Attributes, is the main focus. Just like CrystalDiskInfo, it reports on the same things (it also uses the S.M.A.R.T features)

  • In this example, it found an issue with the “Current Pending Sector Count”; any issues are highlighted in pink

12. If there are issues, contact the customer and see if we can escalate this to have someone look into this further since it’s a possibility that the drive might need to be replaced.

  • If you escalate, take a screenshot (Snipping Tool) of the issue, then contact the AM with the screenshot, the device information, and the server information, and say that you got approval to escalate the ticket and will be escalating it. In the escalation, state that the drive still needs to be ordered.

 

  • If you did not get approval from the customer, note that in the ticket along with any reason they give (not worried about it, not enough hours, etc.), then close the ticket. If they did not give a reason, simply state that they declined an onsite at this time and close the ticket.

13. When you are done, uninstall the program from their server
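GSmartControl is a graphical front end for smartctl (from smartmontools), so the same attribute table can also be read as text with smartctl -A. The sketch below picks out the sector-related attributes with a non-zero raw value. The helper name, the list of watched attributes, and the rule that any raw value above zero deserves a look are assumptions for illustration, and the sample rows are made up rather than taken from the drive in this article.

```python
from typing import Dict

# Attribute names as smartctl prints them; this selection is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def nonzero_sector_attributes(smartctl_output: str) -> Dict[str, int]:
    """Return watched attributes whose raw value is above zero."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Made-up sample in the layout of the smartctl -A attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(nonzero_sector_attributes(SAMPLE))
```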

 

Additional considerations:  If the alert still shows up, we need to get someone onsite to replace the drive. Tier 2 can also help with this issue; they can help determine whether there are any other possibilities left or whether we need to escalate.